User talk:Qfissler

[[User:Kurt|Kurt]] ([[User talk:Kurt|talk]]) 04:58, 7 July 2024 (PDT)
:Thanks, Kurt - I was running on limited info - just the markings on the [[010-0367-00]] 10X attenuator I found in a bunch of parts I acquired. I'll aim to search part refs in OCR'd manuals to confirm I've not slipped up - but the two places I added it were the [[P6038]] and [[S-3]] pages - and I was more confused when I saw the '''P6038 / S3S''' refs on the [[P6038 Signal Chopper]] page - my tiredness mixing up '''S-3''' and '''S3S'''. I ''think'' I understand now, but sorry if I messed up! [[User:Qfissler|Qfissler]] ([[User talk:Qfissler|talk]]) 07:11, 7 July 2024 (PDT)
----
Regarding the [[User:Qfissler#OCR]] commands, one additional thing to watch out for is preserving original page sizes.
For example, the PDF metadata of T940-letter-June-1983.pdf claims a page size of 15.63 x 20.84 inches, but the original document
is presumably 8.5 x 11 inches.
The page size can be obtained in various ways. One way to get the page size in inches is to invoke pdfinfo on the file and then
divide the dimensions, which are reported in points, by 72. Note that page size is a per-page attribute in a PDF, so pdfinfo needs to be given the -f and -l
flags to make it output the page sizes of all pages.
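
The pdfinfo approach can be sketched in Python roughly as follows. This is only a sketch under some assumptions: it assumes the Poppler version of pdfinfo, whose per-page lines look like "Page    1 size: 612 x 792 pts (letter)", and it assumes an oversized -l value is clamped to the real page count. The function names and the sample text are made up for illustration.

```python
import re
import subprocess

# Matches per-page lines such as "Page    1 size: 612 x 792 pts (letter)".
# The line format is an assumption based on Poppler's pdfinfo output.
PAGE_SIZE_RE = re.compile(
    r"^Page\s+(\d+)\s+size:\s+([\d.]+)\s+x\s+([\d.]+)\s+pts", re.MULTILINE
)

POINTS_PER_INCH = 72.0


def parse_page_sizes(pdfinfo_text):
    """Return {page_number: (width_in, height_in)} from pdfinfo output."""
    sizes = {}
    for page, width_pts, height_pts in PAGE_SIZE_RE.findall(pdfinfo_text):
        sizes[int(page)] = (
            round(float(width_pts) / POINTS_PER_INCH, 2),
            round(float(height_pts) / POINTS_PER_INCH, 2),
        )
    return sizes


def page_sizes_in_inches(pdf_path, last_page=9999):
    """Run pdfinfo over pages 1..last_page and parse the result.

    Assumes pdfinfo is on PATH and clamps an oversized -l value to the
    real page count; pass the true page count if yours does not.
    """
    result = subprocess.run(
        ["pdfinfo", "-f", "1", "-l", str(last_page), pdf_path],
        capture_output=True, text=True, check=True,
    )
    return parse_page_sizes(result.stdout)


# Hypothetical pdfinfo output for a two-page file, for illustration only.
sample = """Pages:          2
Page    1 size: 1125.36 x 1500.48 pts
Page    2 size: 612 x 792 pts (letter)
"""
print(parse_page_sizes(sample))
```

Pages whose size is far from the expected paper size (like page 1 in the sample) are the ones worth checking by hand.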
Many of the PDFs I made over the years have incorrect page sizes.
They can be fixed losslessly, but it would have been better to get it right from the beginning.
Since the original physical size info was lost, it's not something that can be trivially fixed in an automated fashion.
I have a script that normalizes page sizes to a height of 11 inches, which works for a lot of PDFs.
But it is not correct in cases where some pages were letter-sized, but scanned in landscape orientation.
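
To illustrate the arithmetic behind that kind of normalization (this is not the actual script, just a sketch of the idea), the scale factor for one page could be computed like this. The function name and the landscape check are made up for illustration, and applying the scale to the PDF itself is left out.

```python
POINTS_PER_INCH = 72.0
TARGET_HEIGHT_PTS = 11 * POINTS_PER_INCH  # 792 pt


def normalize_to_letter_height(width_pts, height_pts):
    """Scale a page so its height becomes 11 inches, keeping aspect ratio.

    Returns (new_width_pts, new_height_pts, scale, looks_landscape).
    looks_landscape is only a hint that the 11-inch-height assumption
    is probably wrong for this page.
    """
    scale = TARGET_HEIGHT_PTS / height_pts
    looks_landscape = width_pts > height_pts
    return (
        round(width_pts * scale, 2),
        TARGET_HEIGHT_PTS,
        round(scale, 4),
        looks_landscape,
    )


# The oversized portrait page from the example above: 15.63 x 20.84 inches.
print(normalize_to_letter_height(1125.36, 1500.48))

# A letter page scanned in landscape at double size (22 x 17 inches).
# Forcing the height to 11 inches makes it about 14.2 inches wide,
# which is wrong; the flag marks it for manual review.
print(normalize_to_letter_height(1584.0, 1224.0))
```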
If a document is scanned to PDF, it usually starts with the correct page size because the scanner knows the physical size of the page
it's scanning. If the scan stays in PDF through whatever processing is done (e.g., OCR), the page size info is generally preserved.
Scanning to an image format, or converting to one as part of processing, is the most common way for page size info to be lost.
Of course the page size or DPI info can be passed out-of-band to the program that generates PDF files from the image files.
It's just easy to forget to do that.
Also, if some pages (e.g., schematics) are scanned at a higher DPI than other pages, it's easy for the page size info to get messed up,
mainly due to human error. I've gotten tangled up in that many times.
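
The relationship between pixel dimensions, DPI, and page size is simple division, which is why a wrong DPI value silently produces a wrong page size. A small sketch, with made-up pixel dimensions and a made-up function name:

```python
def page_size_inches(width_px, height_px, dpi):
    """Physical page size implied by pixel dimensions and scan resolution."""
    return (width_px / dpi, height_px / dpi)


# A letter page scanned at 300 DPI.
print(page_size_inches(2550, 3300, 300))  # (8.5, 11.0)

# A letter-sized schematic scanned at 600 DPI, with the DPI passed correctly.
print(page_size_inches(5100, 6600, 600))  # (8.5, 11.0)

# The same schematic with 300 DPI assumed by mistake: twice the real size.
print(page_size_inches(5100, 6600, 300))  # (17.0, 22.0)
```

Multiplying the result by 72 gives the page size in points.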
Some image formats support metadata tags that convey information about the physical size.
I don't know how widely (and correctly) those tags are supported.
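
One example of such a tag is the PNG pHYs chunk, which stores pixels per unit for each axis plus a unit byte (1 means meters). The following is a minimal sketch of reading it with only the standard library. It does not verify chunk CRCs, the function name is made up, and it says nothing about whether a given scanner or converter actually writes the chunk correctly.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METERS_PER_INCH = 0.0254


def png_dpi(data):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if unavailable.

    Walks the chunk list without verifying CRCs.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        body = data[offset + 8:offset + 8 + length]
        if chunk_type == b"pHYs" and len(body) == 9:
            x_ppu, y_ppu, unit = struct.unpack(">IIB", body)
            if unit != 1:  # 1 = pixels per meter, 0 = unit unknown
                return None
            return (
                round(x_ppu * METERS_PER_INCH),
                round(y_ppu * METERS_PER_INCH),
            )
        if chunk_type in (b"IDAT", b"IEND"):
            break  # pHYs, if present, comes before the image data
        offset += 8 + length + 4
    return None
```

A value of 11811 pixels per meter corresponds to 300 DPI after rounding.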
The VintageTek Museum's microfiche scans have PDF page sizes of 8.5 x 11 inches despite the physical size of the page of microfiche being very small.
That behavior seems appropriate to me.
[[User:Kurt|Kurt]] ([[User talk:Kurt|talk]]) 15:18, 26 December 2024 (PST)