How to edit the text layer of a PDF?

Convert page images into searchable text. Talk about software, techniques, and new developments here.

Moderator: peterZ

hacecalor
Posts: 5
Joined: 28 Sep 2016, 08:16
Number of books owned: 0
Country: United States

Post by hacecalor »

Hi all,

I'm in the process of building my scanner, so I haven't completed any projects yet, but I have made a couple of PDFs using photos from the Internet Archive.

My question is about how one would modify the text layer of a PDF. Google doesn't seem to be very helpful about it.

I'm using Tesseract with hOCR via ocrmypdf and the results are good, but need a few corrections here and there.

It'd be nice if there were a tool that let you see the text layer of a PDF and edit it on the fly. Anyone know of a program (that runs on Linux, preferably free) that can do something like this?
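
For reference, one possible workaround, sketched under assumptions rather than offered as a known feature of ocrmypdf: hOCR is XHTML in which Tesseract wraps each recognized word in a span with class "ocrx_word", so if the intermediate hOCR file is available, corrections could be applied to it before the text layer is rendered. The sample markup, the function name correct_words, and the corrections mapping below are all hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, hand-written hOCR fragment (real Tesseract output has more
# attributes and a DOCTYPE; this is only enough to show the idea).
SAMPLE_HOCR = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="ocr_page" title="bbox 0 0 800 1200">
<span class="ocr_line" title="bbox 50 40 400 70">
<span class="ocrx_word" title="bbox 50 40 120 70; x_wconf 91">Tbe</span>
<span class="ocrx_word" title="bbox 130 40 220 70; x_wconf 96">scanner</span>
</span>
</div>
</body>
</html>"""


def correct_words(hocr_text, corrections):
    """Replace the text of ocrx_word spans according to `corrections`.

    Returns the modified hOCR as a string plus the number of words changed.
    Bounding boxes are left untouched, so the corrected word keeps its
    original position on the page.
    """
    # Serialize XHTML as the default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(hocr_text)
    changed = 0
    for span in root.iter("{%s}span" % XHTML_NS):
        # Words wrapped in nested markup (e.g. <strong>) have no direct
        # text here and are skipped by this simple sketch.
        if span.get("class") == "ocrx_word" and span.text in corrections:
            span.text = corrections[span.text]
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


fixed, n = correct_words(SAMPLE_HOCR, {"Tbe": "The"})
print(n, "word(s) corrected")
```

The corrected hOCR would still have to be turned back into a PDF text layer by whatever tool produced the PDF in the first place; that step is not shown here.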
qqmxdpo
Posts: 12
Joined: 24 Sep 2016, 02:13
Number of books owned: 0
Country: China

Re: How to edit the text layer of a PDF?

Post by qqmxdpo »

Hi
I have also run some experiments on how the OCR in some software identifies the words in pictures. It is not 100% successful. Sorry.
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: How to edit the text layer of a PDF?

Post by L.Willms »

hacecalor wrote: 29 Sep 2016, 08:34
I'm in the process of building my scanner, so I haven't completed any projects yet, but I have made a couple of PDFs using photos from the Internet Archive.
[...]
It'd be nice if there were a tool that let you see the text layer of a PDF and edit it on the fly.
I know that version 14 of ABBYY FineReader can do this, and of course Adobe Acrobat Pro in some version later than 8. I have Acrobat Pro 8, which can OCR an image PDF but provides no means to edit the recognized text. I know that later versions can do that, but I don't know from which version onward that capability is provided.
b0bcat
Posts: 49
Joined: 30 Nov 2012, 21:37
Number of books owned: 0
Country: UK

Re: How to edit the text layer of a PDF?

Post by b0bcat »

I may be sending you on a wild goose chase but this may have some leads:

https://github.com/manisandro/gImageReader/

I haven't looked at it for some time, but I recall that a fairly recent version could take an image PDF as input and output it with a text layer instead of OCR text only, and (maybe) the recognized text could also be edited. Whether that applies to a text layer as well as to text-only output, I don't know. Happy exploring!