User:Qfissler: Difference between revisions

m (→‎Notes for other users: 3D models and PLA)
m (more PDF notes)
Line 248:


Can extract the page images from a PDF
  $ pdfimages --all infile.pdf infile-extracted    # the last argument there is the prefix for the output file names


Remove any duff pages, rotate any pages that need it, then use img2pdf and ocrmypdf to rebuild the PDF
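
A minimal sketch of that clean-up step, assuming the extracted images sit in the current directory. <code>drop_pages</code> and <code>rotate_page</code> are hypothetical helper names, not part of any tool. The rotation uses ImageMagick's <code>mogrify</code>, which is an assumption (these notes don't say which tool does the rotating), and it re-encodes JPEGs rather than rotating them losslessly.

```shell
# Move unwanted page images aside instead of deleting them, so a later
# glob such as infile-extracted-* no longer picks them up.
drop_pages() {
    mkdir -p rejected || return 1
    for f in "$@"; do
        mv -- "$f" rejected/ || return 1
    done
}

# Rotate one image in place by the given number of degrees clockwise.
# Assumes ImageMagick is installed; mogrify overwrites the file.
rotate_page() {
    mogrify -rotate "$2" "$1"
}

# Example (hypothetical file names):
#   drop_pages infile-extracted-003.jpg infile-extracted-007.jpg
#   rotate_page infile-extracted-005.jpg 90
```

Moving rejected pages into a subdirectory keeps them recoverable if the wrong page was dropped.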


Create PDF from images
  $ img2pdf *.jpeg  --output T940-letter-June-1983.pdf


Running OCR
  $ ocrmypdf T940-letter-June-1983.pdf T940-letter-June-1983-OCR.pdf
I've had a couple of errors with ocrmypdf when working with some of the files taken from this wiki, most recently a pdfmark destination beyond the end of the document. I'm pretty sure that extracting all the page images and rebuilding the PDF from them will drop the extra data from such documents. Maybe I'll also find the right combination of options to ignore those rogue pointers, or even fix them...
* [https://itsfoss.com/pdf-editors-linux/ a list of PDF editors for Linux]
* [https://code-industry.net/free-pdf-editor/ Master PDF Editor]
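
The extract, rebuild and OCR steps from these notes can be chained roughly as follows. This is a sketch rather than a tested recipe: it assumes pdfimages (poppler-utils), img2pdf and ocrmypdf are on the PATH, that every extracted image is in a format img2pdf accepts, and that the PDF holds one image per page. The function names and the <code>page</code> prefix are made up for illustration. The pdfimages man page spells the option <code>-all</code>, so that form is used here.

```shell
# Hypothetical helper: derive the OCR output name, foo.pdf -> foo-OCR.pdf
ocr_name() {
    printf '%s-OCR.pdf\n' "${1%.pdf}"
}

# Step 1: extract the embedded images. $1 = input PDF, $2 = work directory.
# "$2/page" is the prefix for the output file names.
extract_pages() {
    mkdir -p "$2" && pdfimages -all "$1" "$2/page"
}

# Step 2 happens by hand: remove or rotate images in the work directory.

# Step 3: rebuild an image-only PDF, then add the OCR text layer.
# $1 = work directory, $2 = original PDF (used only to name the output).
# The glob expands in sorted order, which follows the zero-padded
# numbers pdfimages puts in the file names.
rebuild_pdf() {
    img2pdf "$1"/page-* --output "$1/rebuilt.pdf" &&
    ocrmypdf "$1/rebuilt.pdf" "$(ocr_name "$2")"
}

# Example:
#   extract_pages infile.pdf work
#   ... tidy up the images in work/ ...
#   rebuild_pdf work infile.pdf      # writes infile-OCR.pdf
```

Because the rebuilt file contains nothing but the images, any bad destinations or other extras in the original should not carry over, though that is the same untested expectation as in the note above.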


=Experimental pages=