How to number lines in scanned text?

Adam32
Posts: 29
Joined: 28 Jun 2014, 08:55
Number of books owned: 500
Country: United Kingdom

How to number lines in scanned text?

Post by Adam32 »

I need to number the lines in scanned text (legal documents) running to a few thousand pages. The scans are images that have been bound into a PDF. I don't want to OCR the text and import it into LibreOffice etc., as that would be a lot of work and would mess up the formatting. What I would like to do is somehow recognise the lines and then overlay numbers on them. Any ideas how this could be done simply?
cday
Posts: 451
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: How to number lines in scanned text?

Post by cday »

Adding line numbers to scans of a document is unlikely to be a problem forum members have encountered, though there is no reason not to ask. Have you searched online for something like 'add line numbers to pages of pdf document'?

A quick search shows that others have had this need, and it may be worth looking beyond the first page of results. Realistically, though, there is quite likely no easy solution, especially as your PDF file contains scan images rather than editable text. If you have Adobe Acrobat (the full program) available, you could if necessary OCR it to ClearScan editable text.

In the absence of an ideal solution, could you manage by referencing passages of text by page number plus a small number in the (left) margin that is not exactly aligned with a particular line? I think I see a reasonably simple way of creating a suitable scale with as fine a resolution as you need, subject to the resolution of the page images, and then adding it to each page so that individual lines could be closely referenced.
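To make the margin-scale idea a bit more concrete, here is a rough sketch of how the tick positions for such a scale might be worked out. The function name `margin_scale` and its parameters (`step_mm`, `top_margin_mm`) are made up for illustration, and it only computes where the numbers would go; drawing them onto the page images is left to whatever imaging tool you have.

```python
def margin_scale(page_height_px, dpi, step_mm=5.0, top_margin_mm=0.0):
    """Return (label, y_pixel) pairs for a numbered scale down the margin.

    Hypothetical helper: one tick every `step_mm` millimetres, starting
    `top_margin_mm` below the top edge of the page image.
    """
    px_per_mm = dpi / 25.4  # 25.4 mm per inch
    ticks = []
    label = 1
    while True:
        # Multiply rather than accumulate, to avoid floating-point drift.
        y_px = (top_margin_mm + label * step_mm) * px_per_mm
        if y_px >= page_height_px:
            break
        ticks.append((label, round(y_px)))
        label += 1
    return ticks


# Example: an A4 page scanned at 300 dpi (about 3508 px high), 10 mm steps.
ticks = margin_scale(3508, 300, step_mm=10)
print(len(ticks), ticks[0], ticks[-1])  # 29 (1, 118) (29, 3425)
```

A passage could then be cited as, say, "page 412, mark 17", and a smaller `step_mm` gives a finer scale, limited by the scan resolution.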

You don't give any indication of the software you have available other than LibreOffice, or how much effort the task is worth. On a detail, given the large number of pages to be processed, you may possibly have to split long documents into sections in order to use some tools.
dpc
Posts: 379
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: How to number lines in scanned text?

Post by dpc »

Well there's this bit of chicanery: https://tex.stackexchange.com/questions ... -to-a-pdf/

You might also want to ping the folks in the PDF forum of mobileread.com (https://www.mobileread.com/forums/forum ... .php?f=184).
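
As for recognising the lines without OCR, one common trick (not something from the links above, just a suggestion) is a horizontal projection profile: count the dark pixels in each pixel row and treat each run of inked rows as one text line. A minimal sketch, assuming the page has already been loaded as a list of rows of grayscale values (0 = black, 255 = white); the names `find_text_lines` and `number_lines` and the threshold defaults are made up for illustration.

```python
def find_text_lines(rows, dark_threshold=128, min_dark_pixels=1, min_height=2):
    """Return (top, bottom) row indices (inclusive) of each band of ink.

    rows: list of lists of grayscale values, one inner list per pixel row.
    A row counts as inked if it has at least `min_dark_pixels` pixels
    darker than `dark_threshold`; bands shorter than `min_height` rows
    are discarded as specks.
    """
    bands = []
    start = None
    for y, row in enumerate(rows):
        dark = sum(1 for v in row if v < dark_threshold)
        if dark >= min_dark_pixels:
            if start is None:
                start = y  # a new band begins here
        elif start is not None:
            if y - start >= min_height:
                bands.append((start, y - 1))
            start = None
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(rows) - start >= min_height:
        bands.append((start, len(rows) - 1))
    return bands


def number_lines(bands):
    """Pair each band with a line number and the row at its vertical centre."""
    return [(n, (top + bottom) // 2)
            for n, (top, bottom) in enumerate(bands, start=1)]


# Tiny synthetic "page": two 3-row lines of ink separated by white space.
WHITE = [255] * 8
INK = [255, 0, 0, 0, 0, 0, 0, 255]
page = [WHITE] * 2 + [INK] * 3 + [WHITE] * 2 + [INK] * 3 + [WHITE]
print(number_lines(find_text_lines(page)))  # [(1, 3), (2, 8)]
```

On real scans this would probably need the pages deskewed first and `min_dark_pixels` raised well above 1 to ignore speckle, and stamps, signatures, or multi-column layouts would confuse it, so treat it as a starting point only.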