How to edit the text layer of a PDF?

hacecalor
Posts: 1
Joined: 28 Sep 2016, 08:16
Number of books owned: 0
Country: United States

How to edit the text layer of a PDF?

Post by hacecalor » 29 Sep 2016, 08:34

Hi all,

I'm in the process of building my scanner, so I haven't completed any projects yet, but I have made a couple of PDFs using photos from the Internet Archive.

My question is about how one would modify the text layer of a PDF. Google doesn't seem to be very helpful about it.

I'm using Tesseract with hOCR via ocrmypdf and the results are good, but need a few corrections here and there.

It'd be nice if there were a tool that let you see the text layer of a PDF and edit it on the fly. Anyone know of a program (that runs on Linux, preferably free) that can do something like this?

qqmxdpo
Posts: 12
Joined: 24 Sep 2016, 02:13
Number of books owned: 0
Country: China

Re: How to edit the text layer of a PDF?

Post by qqmxdpo » 29 Sep 2016, 10:35

Hi,
I have also run some experiments on how well the OCR in some programs identifies the words in pictures. It is not 100% successful. Sorry.

L.Willms
Posts: 129
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: How to edit the text layer of a PDF?

Post by L.Willms » 05 Mar 2018, 03:36

hacecalor wrote:
29 Sep 2016, 08:34
I'm in the process of building my scanner, so I haven't completed any projects yet, but I have made a couple of PDFs using photos from the Internet Archive.
[...]
It'd be nice if there were a tool that let you see the text layer of a PDF and edit it on the fly.
I know that version 14 of ABBYY FineReader can do this, and of course Adobe Acrobat Pro can too, in some version later than 8. I have Acrobat Pro 8, which can OCR an image PDF but provides no means to edit the recognized text. I know that later versions can do that, but I don't know from which version on that capability is provided.

b0bcat
Posts: 36
Joined: 30 Nov 2012, 21:37
Number of books owned: 0
Country: UK

Re: How to edit the text layer of a PDF?

Post by b0bcat » 06 Mar 2018, 01:52

I may be sending you on a wild goose chase but this may have some leads:

https://github.com/manisandro/gImageReader/

I haven't looked at it for some time, but I recall that a fairly recent version could take an image PDF as input and output it with a text layer, instead of producing OCR text only, and (maybe) the recognized text could also be edited. Whether that also applies to a text layer, as opposed to text-only output, I don't know. Happy exploring!
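
Since the original question mentions Tesseract with hOCR: one low-tech route is to fix the words in the hOCR file itself before it gets rendered into the PDF text layer. hOCR is plain HTML in which each recognized word sits in a span of class ocrx_word, with its bounding box in the title attribute. Below is a minimal sketch of that idea using only the Python standard library. The function name correct_hocr_words and the corrections dictionary are made up for illustration, and it assumes you can get at the hOCR file in your workflow. Rebuilding the PDF from the corrected hOCR is a separate step that is not shown here.

```python
import html
import re

# Matches a word span as written in hOCR, e.g.
# <span class='ocrx_word' id='word_1_1' title='bbox 10 10 50 30'>teh</span>
# Words wrapped in nested markup (<strong>, <em>) are deliberately left alone.
WORD_SPAN = re.compile(
    r"(<span[^>]*\bclass=['\"]ocrx_word['\"][^>]*>)([^<]*)(</span>)"
)


def correct_hocr_words(hocr_text, corrections):
    """Return hocr_text with whole-word OCR errors replaced.

    corrections (a hypothetical name) maps a misrecognized word to its fix.
    The title attribute with the bounding box is not touched, so the
    corrected word stays at the same position on the page.
    """
    def fix(match):
        opening, word, closing = match.groups()
        plain = html.unescape(word)  # hOCR text is HTML-escaped
        if plain not in corrections:
            return match.group(0)  # leave the span exactly as it was
        return opening + html.escape(corrections[plain], quote=False) + closing

    return WORD_SPAN.sub(fix, hocr_text)


if __name__ == "__main__":
    sample = (
        "<span class='ocrx_word' id='word_1_1' "
        "title='bbox 10 10 50 30; x_wconf 71'>teh</span> "
        "<span class='ocrx_word' id='word_1_2' "
        "title='bbox 60 10 120 30; x_wconf 93'>scanner</span>"
    )
    print(correct_hocr_words(sample, {"teh": "the"}))
```

This only does exact whole-word replacement, so it suits recurring misreads rather than one-off fixes. For one-off fixes you would still want a viewer that shows the page image next to the text.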