m (→Notes for other users: 3D models and PLA) |
m (more PDF notes) |
||
Line 248: | Line 248: | ||
Can extract pages from a pdf | Can extract pages from a pdf | ||
pdfimages --all infile.pdf infile-extracted # the last argument there is the prefix for the output file names | $ pdfimages --all infile.pdf infile-extracted # the last argument there is the prefix for the output file names | ||
Remove any duff pages, rotate any pages which are better rotated, then img2pdf and ocrmypdf | Remove any duff pages, rotate any pages which are better rotated, then img2pdf and ocrmypdf to rebuild the pdf | ||
Create PDF | Create PDF from images | ||
$ img2pdf *.jpeg --output T940-letter-June-1983.pdf | $ img2pdf *.jpeg --output T940-letter-June-1983.pdf | ||
Running OCR | Running OCR | ||
$ ocrmypdf T940-letter-June-1983.pdf T940-letter-June-1983-OCR.pdf | $ ocrmypdf T940-letter-June-1983.pdf T940-letter-June-1983-OCR.pdf | ||
I've had a couple of errors with ocrmypdf when working with some of the files taken from this wiki - most recently, a pdfmark destination beyond the end of the document. I'm pretty sure that extracting all the page images and rebuilding the PDF from images will drop all the extra data from such documents - and maybe I'll find the right combination of options to ignore them or even fix those rogue pointers... | |||
* [https://itsfoss.com/pdf-editors-linux/ a list of PDF editors for Linux] | |||
* [https://code-industry.net/free-pdf-editor/ Master PDF Editor] | |||
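The extract, tidy and rebuild steps noted above could be wrapped up roughly as below. This is only a sketch: the function names <code>extract_pages</code> and <code>rebuild_pdf</code>, the <code>DRY_RUN</code> switch, the <code>page</code> prefix and the intermediate file name <code>rebuilt.pdf</code> are all made up for illustration. It uses <code>-all</code>, which is how the poppler man page spells the flag, where the notes above have <code>--all</code>. It also assumes every extracted image is in a format img2pdf can read, which may not hold for every PDF.

```shell
#!/bin/sh
# Sketch only: function names, DRY_RUN and file names are illustrative assumptions.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# extract_pages INFILE WORKDIR
# Pull every embedded image out of INFILE into WORKDIR as page-NNN.<ext>.
extract_pages() {
    run mkdir -p "$2" || return 1
    run pdfimages -all "$1" "$2/page"
}

# (Manual step in between: delete duff pages and rotate images in WORKDIR.)

# rebuild_pdf WORKDIR OUTFILE
# Build a fresh PDF from the remaining images, then add an OCR text layer.
rebuild_pdf() {
    # The glob relies on the zero-padded numbering to keep page order.
    run img2pdf "$1"/page-* --output "$1/rebuilt.pdf" || return 1
    run ocrmypdf "$1/rebuilt.pdf" "$2"
}
```

With the functions loaded into a shell, <code>extract_pages infile.pdf ./pages</code> does the extraction, and after the manual tidy-up <code>rebuild_pdf ./pages infile-OCR.pdf</code> builds the new file. Setting <code>DRY_RUN=1</code> first only prints the commands, which is handy for checking the paths before anything is written.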
=Experimental pages= | =Experimental pages= |