Talk:422

From TekWiki
Revision as of 07:53, 11 January 2024 by Qfissler (talk | contribs) (ah - thanks)

The PDFs that were marked "bad OCR" were done with ABBYY FineReader and have major problems. FineReader overzealously "corrected" what it thought was skew. See page 54 of 070-0434-02.pdf for an example. In general, FineReader mangles the page images rather than just adding an invisible text layer. I don't know whether the mangling can be turned off. I stopped using it. For more examples, see page 4 of 070-0895-00.pdf, where it again incorrectly "corrected" the skew, the figures on pages 17 and 18 of the same file, and page 213 for an even more bizarre case. Tools that seem to work correctly are Adobe Acrobat and the open-source tools based on Tesseract. I've been using OCRmyPDF and it works fine most of the time. If it encounters an error, it prints error messages rather than silently mangling your document the way ABBYY FineReader does. It isn't really practical to babysit OCR software to make sure it didn't mangle the page images. It needs to be reliable. Kurt (talk) 11:26, 10 January 2024 (PST)

Thanks, Kurt - now I understand :-) I'll re-OCR them and see what I get... Qfissler (talk) 06:53, 11 January 2024 (PST)
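
For anyone re-OCRing scans with OCRmyPDF as discussed above, here is a minimal sketch of how the call might be wrapped in Python. It is only an illustration, not the workflow anyone here actually uses. The function names and the file names in the usage note are hypothetical. It assumes the `ocrmypdf` command is installed and on the PATH. The `--deskew` option is deliberately left out so that page images are not straightened, and `--output-type pdf` is assumed here in order to skip the PDF/A conversion step.

```python
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, language="eng"):
    """Return an argv list for OCRmyPDF that adds a text layer to src.

    --deskew is intentionally not passed, so OCRmyPDF is not asked to
    straighten the page images. --output-type pdf requests a plain PDF
    rather than the default PDF/A output.
    """
    return [
        "ocrmypdf",
        "-l", language,
        "--output-type", "pdf",
        str(src),
        str(dst),
    ]


def ocr_file(src, dst, language="eng"):
    """Run OCRmyPDF on one file.

    Raises RuntimeError if the ocrmypdf executable cannot be found, and
    subprocess.CalledProcessError if OCRmyPDF exits with a nonzero status,
    so a failure is reported instead of passing silently.
    """
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocrmypdf_command(src, dst, language), check=True)
```

A call such as `ocr_file("scan.pdf", "scan_ocr.pdf")` (placeholder file names) would then either produce the output file or raise an exception. It is still worth spot-checking a few pages of the result against the original scan.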