
How to number lines in scanned text?

Posted: 10 Jan 2021, 09:25
by Adam32
I need to number the lines in scanned text (legal documents) running to a few thousand pages. The scanned text is in image format and has been bound into PDF. I don't want to OCR the text and import it into LibreOffice etc., as that would be a lot of work and would mess up the formatting. What I would like to do is somehow recognise the lines and then overlay numbers. Any idea how this could be done simply?
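One common way to "recognise the lines" on a scan without OCR is a horizontal projection profile: count the dark pixels in each row of the page image, and treat each run of rows that contains ink as one text line. Below is a minimal Python sketch, assuming each page has already been rendered to an image and thresholded into a grid of 0/1 values (1 = ink). That rendering step, and actually drawing the numbers onto the PDF, would need an image or PDF library and are not shown. The function names are hypothetical, and the approach assumes straight, single-column pages; skewed scans, tables, or headers and footers would need extra handling.

```python
from typing import List, Tuple

# Assumed input format: one list per pixel row, 1 = ink, 0 = background.
Bitmap = List[List[int]]


def row_profile(bitmap: Bitmap) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in bitmap]


def find_text_lines(bitmap: Bitmap, min_ink: int = 1,
                    min_height: int = 2) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges (bottom exclusive) of inked bands.

    min_ink: rows with fewer ink pixels count as blank (filters stray specks).
    min_height: bands shorter than this many rows are discarded as noise.
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(row_profile(bitmap)):
        if ink >= min_ink and start is None:
            start = y  # a new band of text starts here
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                lines.append((start, y))
            start = None
    # Close a band that runs to the bottom edge of the image.
    if start is not None and len(bitmap) - start >= min_height:
        lines.append((start, len(bitmap)))
    return lines


def label_positions(lines: List[Tuple[int, int]],
                    every: int = 1) -> List[Tuple[int, str]]:
    """Hypothetical helper: (centre row, label) for every Nth detected line."""
    return [((top + bottom) // 2, str(i))
            for i, (top, bottom) in enumerate(lines, start=1)
            if i % every == 0]


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 1, 0, 1, 1, 1],
        [0, 1, 1, 1, 0, 1],
        [0, 0, 0, 0, 0, 0],
    ]
    lines = find_text_lines(page)
    print(lines)                   # [(1, 3), (5, 7)]
    print(label_positions(lines))  # [(2, '1'), (6, '2')]
```

On a real scan the row coordinates would be in pixels, so they would have to be scaled from the image resolution back to the PDF page size before placing the numbers, and `min_ink`/`min_height` would need tuning to the scan quality.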

Re: How to number lines in scanned text?

Posted: 10 Jan 2021, 14:15
by cday
Adding line numbers to scans of a document is unlikely to be a problem forum members have encountered, though there's no reason not to ask. Have you searched online for something like 'add line numbers to pages of pdf document'?

A quick search shows that others have evidently had this need, and it may be worth looking beyond the first page of results. Realistically, though, there quite likely isn't an easy solution, especially as your PDF file contains scanned images rather than editable text. If you have Adobe Acrobat (the full program) available, you could if necessary OCR it to ClearScan editable text.

In the absence of an ideal solution, could you manage by referencing passages by the page number plus a small number in the (left) margin that isn't exactly aligned with a particular line? I think I see a reasonably simple way of creating a suitable scale with as fine a resolution as you need (subject to the resolution of the page images) and then adding it to each page, so that individual lines could be closely referenced.
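A rough sketch of the margin-scale idea, assuming an A4 page size in PDF points (roughly 595 x 842) and margin, spacing, and label settings chosen arbitrarily for illustration: it builds a one-page PDF containing only a ruler of tick marks down the left margin, with every fifth tick longer and numbered. All function names and parameters are hypothetical. Because the scale is independent of the text, it does not have to line up with the scanned lines; it only has to be fine enough to point at one.

```python
from typing import List, Optional, Tuple

Tick = Tuple[float, Optional[str]]  # (y position in points, label or None)


def scale_ticks(page_height: float, top: float, bottom: float,
                step: float, label_every: int) -> List[Tick]:
    """Tick positions from the top margin down to the bottom margin.

    PDF y coordinates grow upwards from the bottom of the page, so the
    first tick sits at page_height - top.
    """
    ticks: List[Tick] = []
    i = 0
    while True:
        y = page_height - top - i * step  # computed from i to avoid float drift
        if y < bottom:
            break
        n = i + 1
        ticks.append((y, str(n) if n % label_every == 0 else None))
        i += 1
    return ticks


def content_stream(ticks: List[Tick], x: float = 24.0, tick_len: float = 4.0,
                   font_size: int = 6) -> str:
    """PDF drawing operators: a short stroke per tick, longer and labelled every Nth."""
    ops = ["0.5 w"]  # thin line width
    for y, label in ticks:
        length = tick_len * 2 if label else tick_len
        ops.append(f"{x:.2f} {y:.2f} m {x + length:.2f} {y:.2f} l S")
        if label:
            # Label to the left of the tick, nudged down to sit roughly centred.
            ops.append(f"BT /F1 {font_size} Tf {x - 16:.2f} {y - 2:.2f} Td ({label}) Tj ET")
    return "\n".join(ops)


def build_pdf(stream: str, width: float, height: float) -> bytes:
    """Wrap a content stream in a minimal single-page PDF with a cross-reference table."""
    body = stream.encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width:g} {height:g}] "
         "/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>").encode("ascii"),
        b"<< /Length %d >>\nstream\n" % len(body) + body + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, obj in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += b"%d 0 obj\n" % num + obj + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_pos)
    return bytes(out)


if __name__ == "__main__":
    ticks = scale_ticks(page_height=842, top=36, bottom=36, step=6, label_every=5)
    pdf = build_pdf(content_stream(ticks), width=595, height=842)
    print(f"{len(ticks)} ticks, {len(pdf)} bytes")
```

The resulting bytes could be written to a file (for example `open("margin_scale.pdf", "wb").write(pdf)`) and overlaid on every page of the scanned document with a tool that stamps one PDF onto another, such as `pdftk scan.pdf stamp margin_scale.pdf output numbered.pdf`. That assumes all the scanned pages share the same page size; pages of a different size would need their own scale.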

You don't say what software you have available other than LibreOffice, or how much effort the task is worth. On a detail: given the large number of pages to be processed, you may have to split long documents into sections in order to use some tools.

Re: How to number lines in scanned text?

Posted: 10 Jan 2021, 15:58
by dpc
Well, there's this bit of chicanery: ... -to-a-pdf/

You might also want to ping the folks in the PDF forum of ( ... .php?f=184).