m (Plugins compared. Sections rationalised. TOC added.) |
m (SM pdf parts extract note) |
Line 54: | Line 54: | ||
[[Tekwiki_Guidelines]] | [[Tekwiki_Guidelines]] | ||
<br> | <br> | ||
==Working with PDFs== | |||
===Extracting Part Refs=== | |||
I had fun extracting lists of references to the [[151-0367-00]] transistor from the [[475]] service manual. | |||
Get the text from the manual | |||
$ pdftotext 475_SM.pdf 475_SM.txt | |||
The extracted tables are split across lines but remarkably consistent, so it's not too difficult to work back from the part number and pick up the previous column (`-B12` prints the 12 lines before each match) | |||
$ grep -B12 151-0367 475_SM.txt | |||
Grab the refs I want and paste them into a text editor, then replace each newline with a comma and a space to get a single comma-separated list. | |||
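The newline-to-comma step could also be done on the command line instead of in a text editor. This is only a sketch: it assumes the refs picked out of the grep output have been saved one per line in a hypothetical file `refs.txt`, and `join_refs` is an illustrative helper name, not something from the notes above.

```shell
# Assumption: refs.txt (hypothetical name) holds one circuit reference per line,
# picked by hand out of the grep output.
join_refs() {
    # paste -s joins all the lines into one; -d, puts a comma between them.
    # sed then adds a space after each comma.
    paste -sd, "$1" | sed 's/,/, /g'
}
```

Running `join_refs refs.txt` prints the references as one comma-separated line, ready to paste into a wiki page.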
===OCR=== | ===OCR=== | ||
Line 62: | Line 76: | ||
`ocrmypdf` seems to work very well - the recognised text lines up with the image text - best results so far. | `ocrmypdf` seems to work very well - the recognised text lines up with the image text - best results so far. | ||
Will also try `pdfsandwich` | Will also try `pdfsandwich` | ||
Can extract the pages from a PDF, remove any duff pages, rotate any pages that need it, then run `img2pdf` and `ocrmypdf` and rebuild the PDF | |||
~/Electronics/Scopes/TekTronix/2215/Letter$ ls -l | ~/Electronics/Scopes/TekTronix/2215/Letter$ ls -l |