User:Qfissler: Difference between revisions

User:Qfissler (view source)

551 bytes added , 26 December 2024

m

more PDF notes


@@ Line 248: / Line 248: @@
 Can extract page images from a PDF
-  pdfimages --all infile.pdf infile-extracted    # the last argument there is the prefix for the output file names
+  $ pdfimages --all infile.pdf infile-extracted    # the last argument there is the prefix for the output file names
-Remove any duff pages, rotate any pages which are better rotated, then img2pdf and ocrmypdf and then rebuild the pdf
+Remove any duff pages, rotate any pages which are better rotated, then run img2pdf and ocrmypdf to rebuild the PDF
-Create PDF for images
+Create PDF from images
   $ img2pdf *.jpeg  --output T940-letter-June-1983.pdf
 Running OCR
   $ ocrmypdf T940-letter-June-1983.pdf T940-letter-June-1983-OCR.pdf
+I've had a couple of errors with ocrmypdf when working with some of the files taken from this wiki; the most recent was a pdfmark destination beyond the end of the document. I'm pretty sure that extracting all the page images and rebuilding the PDF from those images will drop the extra data from such documents. Maybe I'll also find the right combination of options to ignore these errors, or even to fix those rogue pointers...
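A minimal sketch of the extract-and-rebuild steps above as two shell functions, with a pause between them for removing or rotating pages by hand. The function names (extract_pages, rebuild_from_pages), the DRY_RUN switch and the output file naming are all made up for illustration. The poppler man page documents the pdfimages option as -all, so that spelling is used here. It also assumes every extracted image is in a format img2pdf accepts, which may not hold for every PDF.

```shell
#!/bin/sh
# Sketch only: rebuild a PDF from its page images, then OCR the result.
# extract_pages, rebuild_from_pages and DRY_RUN are hypothetical names.

run() {
    # With DRY_RUN=1 just print the command; otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

extract_pages() {
    infile=$1
    base=${infile%.pdf}        # strip the .pdf suffix
    workdir=${base}-pages      # assumed layout: one directory per document

    run mkdir -p "$workdir"
    # -all keeps each image in its native format; the last argument is
    # the prefix for the output file names (page-000.jpg, page-001.jpg, ...).
    run pdfimages -all "$infile" "$workdir/page"
}

rebuild_from_pages() {
    infile=$1
    base=${infile%.pdf}
    workdir=${base}-pages

    # The zero-padded names from pdfimages sort correctly under the glob.
    run img2pdf "$workdir"/page-* --output "${base}-rebuilt.pdf"
    run ocrmypdf "${base}-rebuilt.pdf" "${base}-OCR.pdf"
}
```

Typical use would be to call extract_pages infile.pdf, tidy up the images in infile-pages/ by hand, then call rebuild_from_pages infile.pdf. Setting DRY_RUN=1 first prints the commands without running them.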
+* [https://itsfoss.com/pdf-editors-linux/ a list of PDF editors for Linux]
+* [https://code-industry.net/free-pdf-editor/ Master PDF Editor]
 =Experimental pages=