Noob Questions on Scanning Process and E-Reader Formats


Moderator: peterZ

dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Noob Questions on Scanning Process and E-Reader Formats

Post by dtic »

Some tools for turning images into a PDF have an OCR step built in, for example Adobe Acrobat. With such a tool, OCR is included in your step 4. Otherwise you do the OCR after the PDF/DjVu file is created and use a separate tool to insert the OCR'ed text into the file.
rkomar
Posts: 98
Joined: 12 May 2013, 16:36
E-book readers owned: PRS-505, PocketBook 902, PRS-T1, PocketBook 623, PocketBook 840
Number of books owned: 3000
Country: Canada


Post by rkomar »

I converted a lot of my old textbooks and references into PDF files a few years ago. At the time, mathematical equations, computer code snippets and tables were very hard to put into EPUBs, so I just left them as scanned images inside the PDF files. I also found that such a bare-bones document was very hard to use as a reference. I ended up adding all of the chapters and sections to the "bookmarks" section in each file. _That_ turned out to be as much work as everything else when dealing with books with detailed contents (computing the page offsets, typing in the text for each entry, ...). Still, it was needed if I wanted to be able to find information easily in the documents. You can use OCR to add a text layer to the document and search that when looking for information, but I personally don't think that's as good as having a table of contents.
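
To illustrate the page-offset bookkeeping mentioned above, here is a minimal Python sketch. It is not the method anyone in this thread used; the function names and the sample table of contents are made up, and it assumes a single constant offset between the page numbers printed in the book and the page positions in the PDF.

```python
# Sketch: map printed page numbers from a book's table of contents to PDF pages.
# All names here are hypothetical; the offset must be measured for each book.

def find_offset(pdf_page, printed_page):
    """Offset such that pdf_page = printed_page + offset.

    Measure it once: open the PDF, find a page whose printed number you can
    read, and note which PDF page (1-based) it sits on.
    """
    return pdf_page - printed_page


def build_bookmarks(toc, offset):
    """Turn (level, title, printed_page) entries into (level, title, pdf_page)."""
    return [(level, title, printed + offset) for level, title, printed in toc]


def format_outline(bookmarks, indent="  "):
    """Render bookmarks as indented text, one entry per line, for checking by eye."""
    return "\n".join(
        f"{indent * level}{title} -> {page}" for level, title, page in bookmarks
    )


if __name__ == "__main__":
    # Made-up example: printed page 1 sits on PDF page 13 (12 pages of front matter).
    offset = find_offset(pdf_page=13, printed_page=1)
    toc = [
        (0, "1 Introduction", 1),
        (1, "1.1 Notation", 4),
        (0, "2 Linear Systems", 27),
    ]
    print(format_outline(build_bookmarks(toc, offset)))
```

A single offset only holds while the printed numbering runs without gaps. Front matter numbered in roman numerals, or unnumbered plates in the middle of the book, would need their own offsets.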
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03


Post by dtic »

@rkomar: Yeah, manual bookmarking takes a lot of time. I posted a script that, combined with jpdfbookmarks, speeds up bookmark creation a lot. See this thread http://diybookscanner.org/forum/viewtop ... =19&t=2837 , especially post number 4.
recaptcha
Posts: 64
Joined: 03 Sep 2010, 13:23
Number of books owned: 0
Location: Calgary, Alberta, Canada


Post by recaptcha »

So if I want to have a lot of searchable reference books and articles on a tablet/e-reader, what would you recommend in terms of saving processing time? It's starting to sound like a major undertaking.
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03


Post by dtic »

If you're on Windows and have access to Acrobat (it sounded earlier like you did), then I suggest you start out simple:
1. Start with a simple DIY cardboard scanner and a sheet of glass/plastic.
2. Run the book page photos through Scan Tailor.
3. Then turn the images into an OCR'ed PDF in Acrobat (try the ClearScan OCR setting).

Once you get the hang of it you can add more steps, test out different software here, build a scanner that is faster to operate and so on.
Do save a backup of all unedited book photos from the start. That way you can always reprocess at a later time when you have more experience with the different settings and can add additional postprocessing steps.
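
As one possible way to automate that backup step, here is a small Python sketch. The function name and the list of file extensions are assumptions for illustration; it only copies files and never overwrites or deletes anything in the backup folder.

```python
import shutil
from pathlib import Path


def backup_photos(src_dir, backup_dir,
                  extensions=(".jpg", ".jpeg", ".png", ".tif", ".tiff")):
    """Copy image files from src_dir to backup_dir, skipping ones already there.

    Returns the list of file names copied in this run.
    """
    src = Path(src_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for photo in sorted(src.iterdir()):
        # Only plain files with an image extension (case-insensitive) are copied.
        if not photo.is_file() or photo.suffix.lower() not in extensions:
            continue
        target = dst / photo.name
        if target.exists():
            continue  # never overwrite an existing backup copy
        shutil.copy2(photo, target)  # copy2 also tries to keep file timestamps
        copied.append(photo.name)
    return copied
```

Calling something like `backup_photos("raw_photos/my_book", "backup/my_book")` (both folder names are placeholders) right after each scanning session, before opening the photos in any editing tool, keeps the originals untouched.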