Any way to convert book scans into PDF with vector text?

Jackson342
Posts: 8
Joined: 11 May 2017, 04:09
Number of books owned: 0
Country: USA

Any way to convert book scans into PDF with vector text?

Post by Jackson342 » 04 Sep 2017, 18:06

Any way to convert PDF book scans into PDFs with vector text? I want to convert the IMAGES of text into VECTOR text.

What is the best way to do this? What software do I need?

dtic
Posts: 430
Joined: 06 Mar 2010, 18:03

Re: Any way to convert book scans into PDF with vector text?

Post by dtic » 06 Sep 2017, 15:16

Adobe Acrobat Pro has a good vectorization/"fontirization" feature. It was previously called ClearScan but now the option is called "Editable text and images".

Read more about it here
http://blogs.adobe.com/acrolaw/2009/05/ ... n_is_smal/
https://forums.adobe.com/thread/1810210

Adobe has a trial version of Acrobat Pro DC where you can test the feature.

Jackson342
Posts: 8
Joined: 11 May 2017, 04:09
Number of books owned: 0
Country: USA

Re: Any way to convert book scans into PDF with vector text?

Post by Jackson342 » 07 Sep 2017, 16:29

Thank you for your response, dtic! Do you know if this is the only program with this feature? Do you know if ABBYY does this? So far I have only been using ABBYY to convert from PDF to Microsoft Word.

dtic
Posts: 430
Joined: 06 Mar 2010, 18:03

Re: Any way to convert book scans into PDF with vector text?

Post by dtic » 08 Sep 2017, 17:40

Jackson342 wrote:
07 Sep 2017, 16:29
Do you know if this is the only program with this feature?
I don't know of any other that has it, though I haven't tested ABBYY and similar competitors in a long time, so there might be something out there. https://github.com/ncraun/smoothscan is an open-source attempt, but it is far from Acrobat in text quality and seems to be no longer in development.

duerig
Posts: 343
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: Any way to convert book scans into PDF with vector text?

Post by duerig » 09 Sep 2017, 12:27

I should add that there are a number of open source pixel-image-to-vector conversion tools around. I tried several of them and was impressed by neither the compression nor the quality. It is possible that one or more of them has made advances since then, and you can search GitHub or the web generally to find a few different projects.

But I think if you really want a vector representation, your best bet is to try Adobe ClearScan, as dtic recommends.

-Jonathon Duerig

cday
Posts: 216
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Any way to convert book scans into PDF with vector text?

Post by cday » 11 Oct 2017, 05:51

Adobe ClearScan, as noted above, can produce excellent quality output and also very small file sizes, but requires quite high-resolution images (ideally around 600 DPI) in order to work well.
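
As a rough way to check whether existing scans are near that resolution, the effective DPI can be worked out from the image's pixel dimensions and the physical page size. This is only a sketch: the function names are made up for illustration, the page size has to be measured by hand, and the 600 DPI default is simply the figure mentioned above, not a documented Acrobat requirement.

```python
def effective_dpi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Return (horizontal, vertical) DPI for a scan of a page measured in inches."""
    return pixel_width / page_width_in, pixel_height / page_height_in


def meets_target(pixel_width, pixel_height, page_width_in, page_height_in,
                 target_dpi=600):
    """True if the lower of the two axis resolutions reaches target_dpi."""
    horizontal, vertical = effective_dpi(
        pixel_width, pixel_height, page_width_in, page_height_in
    )
    return min(horizontal, vertical) >= target_dpi


if __name__ == "__main__":
    # Hypothetical example: a 6 x 9 inch page captured at 3600 x 5400 pixels.
    print(effective_dpi(3600, 5400, 6, 9))
    print(meets_target(3600, 5400, 6, 9))
```

A camera-based scanner has no fixed DPI setting, so this kind of calculation is the only way to know what resolution a given setup actually delivers for a given book size.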

Another approach to producing a PDF with vector text, not mentioned above, is to use the option in ABBYY FineReader or Nuance OmniPage to output pages that are created using standard vector fonts, as would be used in a word processor document.

While that option can also produce excellent quality output and minimal file sizes, a practical limitation is that it is generally difficult to maintain the original page layout, especially when the layout is complex. Even if fonts that nominally match the originals are available, line breaks, for example, are likely to change, and preserving the layout of a complex page is likely to take a lot of effort. Lower-resolution images can, however, be used satisfactorily, subject to the OCR accuracy achieved and the time that can be spent on proofreading. If preserving the layout of the original pages is not important, it could be a good option.
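
To make it concrete what "pages created using standard vector fonts" means at the file level, here is a minimal sketch in plain Python that writes a one-page PDF whose text is drawn with the built-in Helvetica font rather than stored as an image. It is not what FineReader or OmniPage do internally; the function name `build_text_pdf` and the fixed margins are assumptions for illustration, the input is assumed to be already-proofread ASCII text from OCR, and no attempt is made to reproduce the original layout.

```python
def build_text_pdf(lines, font_size=12, page_w=612, page_h=792):
    """Return the bytes of a one-page PDF that draws `lines` in Helvetica."""

    def esc(text):
        # Backslashes and parentheses must be escaped inside PDF string literals.
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Content stream: select the font, set line spacing, start 1 inch from the
    # top-left corner, then show one string per line and move to the next line.
    ops = ["BT", f"/F1 {font_size} Tf", f"{font_size * 1.2:.1f} TL",
           f"72 {page_h - 72} Td"]
    for line in lines:
        ops.append(f"({esc(line)}) Tj T*")
    ops.append("ET")
    stream = "\n".join(ops).encode("ascii")  # ASCII-only input is assumed

    page = (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {page_w} {page_h}] "
            f"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        page.encode("ascii"),
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        # Helvetica is one of the standard PDF fonts, so nothing is embedded.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    with open("vector_text.pdf", "wb") as handle:
        handle.write(build_text_pdf(["Recognised line one", "Recognised line two"]))
```

The resulting file is well under a kilobyte and the text stays sharp at any zoom level, which is why this route gives minimal file sizes. It also shows the limitation described above: the line positions come from the script, not from the scanned page.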
