[[User:Kurt|Kurt]] ([[User talk:Kurt|talk]]) 04:58, 7 July 2024 (PDT)
:Thanks, Kurt - I was running on limited info - just the markings on the [[010-0367-00]] 10X attenuator I found in a bunch of parts I acquired. I'll aim to search part refs in OCR'd manuals to confirm I've not slipped up - but the two places I added it were [[P6038]] and [[S-3]] pages - and was more confused when I saw the '''P6038 / S3S''' refs on the [[P6038 Signal Chopper]] page - my tiredness mixing up '''S-3''' and '''S3S'''. I ''think'' I understand now, but sorry if I messed up! [[User:Qfissler|Qfissler]] ([[User talk:Qfissler|talk]]) 07:11, 7 July 2024 (PDT)
----
Regarding the [[User:Qfissler#OCR]] commands, one additional thing to watch out for is preserving original page sizes.
For example, the PDF metadata of T940-letter-June-1983.pdf claims a page size of 15.63 × 20.84 inches, but the original document
is presumably 8.5 × 11 inches.
The page size can be obtained in various ways. One way to get the page size in inches is to invoke pdfinfo on the file and then
divide the "points" dimensions by 72. Note that page size is a per-page attribute in a PDF, so pdfinfo needs to be given the -f and -l
flags to get it to output the page sizes of all pages.
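
The pdfinfo approach above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and it assumes the poppler version of pdfinfo, whose per-page lines look like "Page    1 size: 612 x 792 pts (letter)". Other pdfinfo builds may format their output differently.

```python
import re
import subprocess

POINTS_PER_INCH = 72.0

# Assumed per-page line format: "Page    1 size: 612 x 792 pts (letter)".
PAGE_SIZE_RE = re.compile(
    r"^Page\s+(\d+)\s+size:\s+([\d.]+)\s+x\s+([\d.]+)\s+pts", re.MULTILINE
)
# Assumed summary line format: "Pages:          12".
PAGE_COUNT_RE = re.compile(r"^Pages:\s+(\d+)", re.MULTILINE)


def parse_page_sizes(pdfinfo_text):
    """Return [(page_number, width_in, height_in), ...] from pdfinfo output."""
    return [
        (int(n), float(w) / POINTS_PER_INCH, float(h) / POINTS_PER_INCH)
        for n, w, h in PAGE_SIZE_RE.findall(pdfinfo_text)
    ]


def page_sizes_in_inches(pdf_path):
    """Run pdfinfo once for the page count, then again with -f/-l for all pages."""
    summary = subprocess.run(
        ["pdfinfo", pdf_path], capture_output=True, text=True, check=True
    ).stdout
    last_page = int(PAGE_COUNT_RE.search(summary).group(1))
    detail = subprocess.run(
        ["pdfinfo", "-f", "1", "-l", str(last_page), pdf_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_page_sizes(detail)
```

Calling page_sizes_in_inches() on a file and looking for pages that are far from 8.5 × 11 is a quick way to spot PDFs with inflated page sizes.
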
Many PDFs that I made over the years were done incorrectly with regard to page size.
They can be fixed losslessly, but it would have been better to have gotten it right from the beginning.
Since the original physical size info was lost, it's not something that can be trivially fixed in an automated fashion.
I have a script that normalizes page sizes to a height of 11 inches, which works for a lot of PDFs.
But it is not correct in cases where some pages were letter-sized but scanned in landscape orientation.
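
To illustrate the arithmetic behind height normalization and why landscape pages break it, here is a minimal sketch. It is not the script mentioned above; the function name and the example dimensions are hypothetical, and it only computes the new page size rather than rewriting a PDF.

```python
LETTER_HEIGHT_PT = 11 * 72  # 11 inches expressed in PDF points


def normalize_to_height(width_pt, height_pt, target_height_pt=LETTER_HEIGHT_PT):
    """Scale a page uniformly so that its height equals target_height_pt."""
    scale = target_height_pt / height_pt
    return width_pt * scale, height_pt * scale


# A portrait letter page recorded at twice its real size (17 x 22 inches).
portrait = normalize_to_height(1224, 1584)

# A landscape letter page recorded at twice its real size (22 x 17 inches).
# Forcing the height to 11 inches makes the page about 14.24 inches wide,
# whereas the physical page was 11 x 8.5 inches.
landscape = normalize_to_height(1584, 1224)
```

The portrait page comes out at 612 × 792 points (8.5 × 11 inches) as intended, while the landscape page ends up too large because its height, not its width, should have been 8.5 inches.
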
If a document is scanned to PDF, it usually starts with the correct page size because the scanner knows the physical size of the page
it's scanning. If the scan stays in PDF through whatever processing is done (e.g., OCR), the page size info is generally preserved.
Scanning to an image format, or converting to an image format as part of processing, are the most common ways for page size info to be lost.
Of course, the page size or DPI info can be passed out-of-band to the program that generates PDF files from the image files.
It's just easy to forget to do that.
Also, if some pages (e.g., schematics) are scanned at higher DPI than other pages, it's easy for the page size info to get messed up,
mainly due to human error. I've gotten tangled up in that many times.
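
The relationship between pixel dimensions, DPI, and physical page size is simple division, which is why a wrong DPI value silently produces a wrong page size. The sketch below uses a hypothetical helper name and made-up scan dimensions to show the mixed-DPI mistake.

```python
def page_size_from_pixels(width_px, height_px, dpi):
    """Return (width_in, height_in) for an image scanned at the given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return width_px / dpi, height_px / dpi


# A letter-sized text page scanned at 300 DPI.
text_page = page_size_from_pixels(2550, 3300, 300)

# A landscape letter-sized schematic scanned at 600 DPI.
schematic = page_size_from_pixels(6600, 5100, 600)

# The same schematic if the 300 DPI setting used for the text pages is
# applied to it by mistake: the page comes out twice as large as it should.
schematic_wrong = page_size_from_pixels(6600, 5100, 300)
```

Here text_page is 8.5 × 11 inches and schematic is 11 × 8.5 inches, but schematic_wrong is 22 × 17 inches, so each image needs to carry its own DPI through to the PDF generation step.
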
Some image formats support metadata tags that convey information about the physical size.
I don't know how widely (and correctly) those tags are supported.
The VintageTek Museum's microfiche scans have PDF page sizes of 8.5 × 11 inches despite the physical size of a page of microfiche being very small.
That behavior seems appropriate to me.
[[User:Kurt|Kurt]] ([[User talk:Kurt|talk]]) 15:18, 26 December 2024 (PST)