Any way to convert book scans into PDF with vector text?


Moderator: peterZ

Jackson342
Posts: 8
Joined: 11 May 2017, 04:09
Number of books owned: 0
Country: USA

Any way to convert book scans into PDF with vector text?

Post by Jackson342 »

Any way to convert PDF book scans into PDFs with vector text? I want to convert the IMAGES of text into VECTOR text.

What is the best way to do this? What software do I need?
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Any way to convert book scans into PDF with vector text?

Post by dtic »

Adobe Acrobat Pro has a good vectorization/"fontirization" feature. It was previously called ClearScan but now the option is called "Editable text and images".

Read more about it here
http://blogs.adobe.com/acrolaw/2009/05/ ... n_is_smal/
https://forums.adobe.com/thread/1810210

Adobe has a trial version of Acrobat Pro DC where you can test the feature.
Jackson342
Posts: 8
Joined: 11 May 2017, 04:09
Number of books owned: 0
Country: USA

Re: Any way to convert book scans into PDF with vector text?

Post by Jackson342 »

dtic wrote: 06 Sep 2017, 15:16 Adobe Acrobat Pro has a good vectorization/"fontirization" feature. It was previously called ClearScan but now the option is called "Editable text and images".

Read more about it here
http://blogs.adobe.com/acrolaw/2009/05/ ... n_is_smal/
https://forums.adobe.com/thread/1810210

Adobe has a trial version of Acrobat Pro DC where you can test the feature.
Thank you for your response, dtic! Do you know if this is the only program with this feature? Do you know if ABBYY does this? So far I have only used ABBYY to convert PDFs to Microsoft Word.
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Any way to convert book scans into PDF with vector text?

Post by dtic »

Jackson342 wrote: 07 Sep 2017, 16:29 Do you know if this is the only program with this feature?
I don't know of any other program that has it, though I haven't tested ABBYY or similar competitors in a long time, so there might be something out there. https://github.com/ncraun/smoothscan is an open-source attempt, but it is far from Acrobat in text quality and seems to be no longer in development.
duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: Any way to convert book scans into PDF with vector text?

Post by duerig »

I should add that there are a number of open source pixel-image-to-vector conversion tools around. I tried several of them and was impressed by neither the compression nor the quality. It is possible that one or more of them has improved since then, and a search on GitHub or the web in general will turn up a few different projects.

But I think if you really want a vector representation, your best bet is to try Adobe ClearScan, as dtic recommends.

-Jonathon Duerig
cday
Posts: 447
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Any way to convert book scans into PDF with vector text?

Post by cday »

Adobe ClearScan, as noted above, can produce excellent quality output and very small file sizes, but it requires fairly high-resolution images (ideally around 600 DPI) to work well.
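As a rough sanity check on the resolution point, here is a minimal Python sketch that works out the effective DPI of a scan from its pixel size and the physical page size. The 600 DPI target is the figure mentioned above; the function names and the 6 x 9 inch page with 1800 x 2700 pixel capture are hypothetical values chosen only for illustration.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Resolution of a scan along one axis, in dots per inch."""
    if inches <= 0:
        raise ValueError("inches must be positive")
    return pixels / inches


def pixels_needed(inches: float, dpi: int = 600) -> int:
    """Pixels required along one axis to reach the target DPI."""
    return round(inches * dpi)


# Hypothetical example: a 6 x 9 inch page captured at 1800 x 2700 pixels.
print(effective_dpi(1800, 6.0))                 # resolution across the width
print(pixels_needed(6.0), pixels_needed(9.0))   # pixels needed for 600 DPI
```

If the effective DPI comes out well below the target, rescanning or capturing at a higher resolution is probably worth trying before running the conversion.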

Another approach to producing a PDF with vector text, not mentioned above, is to use the option in ABBYY FineReader or Nuance OmniPage to output pages that are set in standard vector fonts, as would be used in a word processor document.

While that option can also produce excellent quality output and minimal file sizes, a practical limitation is that in general it will be difficult to maintain the original page layout, especially when the page layout is complex. Even if fonts that nominally match the original fonts used are available, line breaks for example are likely to change, while preserving the layout of a complex page is likely to require a lot of effort. Lower resolution images can however be used satisfactorily, subject to the OCR text recognition accuracy achieved and the time that can be spent on proof-reading. If preserving the layout of the original pages is not important, it could be a good option.
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: Any way to convert book scans into PDF with vector text?

Post by L.Willms »

Jackson342 wrote: 07 Sep 2017, 16:29
Do you know if ABBYY does this? So far I have only been using ABBYY to convert from PDF to Microsoft Word.
Yes, ABBYY FineReader does OCR (Optical Character Recognition) and can create PDFs that keep the input images and place a layer of the recognized text behind them, so that one can search for text in the PDF.

OmniPage by Nuance is another option; I think it is the main competitor to ABBYY FineReader.

There is a "List of optical character recognition software" on en.Wikipedia.org, which also indicates the main features of each program.

ABBYY software is, as far as I can see, the only one that recognizes Gothic script (Fraktur in German), though no longer in FineReader but in the "Recognition Server" (intended to be used as a distributed workflow, with a number of workstations each responsible for a specific step in the process from scanning via OCR to final output). I recently learned that they have a new pricing model for this: it is by page. One buys a given number of pages to be recognized; the cheapest is 2500 pages for 149 euros.
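To put the per-page pricing in perspective, here is a small Python sketch of the arithmetic. The 2500 pages and 149 euros come from the figures above; the function names and the 300-page book length are assumptions made up for the example.

```python
def cost_per_page(price_eur: float, pages: int) -> float:
    """Price of recognizing a single page, in euros."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return price_eur / pages


def books_covered(pages: int, pages_per_book: int) -> int:
    """Whole books that fit into a purchased page allowance."""
    return pages // pages_per_book


# 2500 pages for 149 euros, as quoted above.
print(f"{cost_per_page(149.0, 2500):.4f} EUR per page")

# Assumption: a typical book of 300 pages (hypothetical figure).
print(books_covered(2500, 300), "books of 300 pages")
```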