Most Efficient Workflow / Process Available Currently

Share your software workflow. Write up your tips and tricks on how to scan, digitize, OCR, and bind ebooks.

Moderator: peterZ

Leolopez
Posts: 2
Joined: 04 Mar 2014, 00:53

Re: Most Efficient Workflow / Process Available Currently

Post by Leolopez »

Hi. To reduce processing time, you can open multiple Scan Tailor sessions at once. This takes advantage of a dual- or quad-core processor and cuts the overall processing time for each book.
I process books that average 1000 pages, so I divide each book into 3 parts.
I've attached snapshots of the PC doing the job.
I hope this helps in your work.
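
A minimal sketch of the "divide the book into 3 parts" step, assuming the page images sort correctly by file name. The file names and the `split_into_parts` helper are hypothetical; each resulting part would be loaded into its own Scan Tailor session by hand.

```python
from pathlib import Path


def split_into_parts(files, parts=3):
    """Split a sorted list of files into `parts` contiguous, near-equal chunks."""
    files = sorted(files)
    base, extra = divmod(len(files), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional file each.
        size = base + (1 if i < extra else 0)
        chunks.append(files[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical file names for a 1000-page book.
    pages = [Path(f"page_{n:04d}.jpg") for n in range(1, 1001)]
    for index, chunk in enumerate(split_into_parts(pages, parts=3), start=1):
        print(f"part{index}: {chunk[0].name} .. {chunk[-1].name} ({len(chunk)} pages)")
```

Contiguous chunks keep each part in reading order, so the three outputs can be concatenated afterwards without re-sorting.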

Bye
Attachments
output - 3 at time.JPG (153.81 KiB)
content in one.JPG (166.45 KiB)
part1-2-3.JPG (154.68 KiB)
emmerick

Re: Most Efficient Workflow / Process Available Currently

Post by emmerick »

mellow-yellow wrote:Since writing my post above, I have experimented, corrected, and improved this proposal substantially. Feedback welcome! :)

NOTE: A=Attended ("your" time), U=Unattended ("CPU" time)

Fastest (300 pg book)
1. Scan with SDM using S_FAST* (8 min A)
2. Transfer L and R images to PC (2 min A)
3. Rename L (001.jpg, 003.jpg, etc.) and R (002.jpg, 004.jpg) with IrfanView in Batch (1 min A)
4. Combine results into a single folder, move to ABBYY Hot Folder** and convert to PDFs (1 min A, 20 min. U)
5. Acrobat Standard - Combine Files - to create a single PDF (1 min A, 2 min U)
Total: 13 minutes (A) or 35 minutes (A+U)
Advantages: Speed (a 300-page, OCR'd book in 13 min!), Less time waiting for and returning to the PC (#4 to #5)
Disadvantages: Poor contrast (JJM's correct), no cropping*** (rig visible, IrfanView can crop but you'll add 1 min. A and 6 min. U)
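
The renaming in step 3 was done with IrfanView's batch mode. As an illustration only, the same odd/even numbering could be sketched in Python. The helper names and the camera file names below are hypothetical, and the sketch only prints a rename plan rather than touching any files.

```python
def interleaved_names(count, side, ext=".jpg", width=3):
    """Return target names for one camera's images.

    side="L" yields 001, 003, 005, ...; side="R" yields 002, 004, 006, ...
    """
    if side not in ("L", "R"):
        raise ValueError("side must be 'L' or 'R'")
    first = 1 if side == "L" else 2
    return [f"{first + 2 * i:0{width}d}{ext}" for i in range(count)]


def rename_plan(left_files, right_files):
    """Pair each source file (sorted by name) with its new interleaved name."""
    plan = list(zip(sorted(left_files), interleaved_names(len(left_files), "L")))
    plan += list(zip(sorted(right_files), interleaved_names(len(right_files), "R")))
    return plan


if __name__ == "__main__":
    # Hypothetical camera file names; real names depend on the cameras used.
    left = ["IMG_0001.JPG", "IMG_0002.JPG", "IMG_0003.JPG"]
    right = ["DSC_0001.JPG", "DSC_0002.JPG", "DSC_0003.JPG"]
    for source, target in rename_plan(left, right):
        print(f"{source} -> {target}")
```

Once both sets are renamed this way, copying them into one folder and sorting by name gives the pages in reading order.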


Better Quality (300 pg book)
1. Scan with SDM using S_FAST* (8 min A)
2. Transfer L and R images to PC (2 min A)
3. Rename L (001.jpg, 003.jpg, etc.) and R (002.jpg, 004.jpg) with IrfanView in Batch (1 min A)
4. ScanTailor L then ScanTailor R: steps #1-#4 (5 min A, 3 min U)
5. ScanTailor Cropping*** Fix (http://diybookscanner.org/forum/viewtop ... =466#p4791) (2 min. A)
6. ScanTailor L then ScanTailor R: steps #4-#6 with Mixed selected (5 min A, 7 min U)
7. Copy L and R "out" folder to ABBYY Hot Folder** for conversion to PDFs (1 min A, 20 min. U)
8. Acrobat Standard - Combine Files - to create a single PDF (1 min A, 2 min U)
Total: 25 min (A), 57 min (A+U)
Advantages: Black text on white backgrounds, good colors and contrast, cropped images
Disadvantages: Cumbersome, about 63% slower ((57-35)/35 * 100 = 62.857), more time waiting for and returning to the PC (#4 to #5, #6 to #7, #7 to #8)
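
The totals for both workflows can be re-derived from the per-step times listed above. This is just a small arithmetic check; the list and function names are made up for the example.

```python
# (attended, unattended) minutes per step, copied from the two lists above.
FASTEST = [(8, 0), (2, 0), (1, 0), (1, 20), (1, 2)]
BETTER = [(8, 0), (2, 0), (1, 0), (5, 3), (2, 0), (5, 7), (1, 20), (1, 2)]


def totals(steps):
    """Return (attended minutes, attended + unattended minutes)."""
    attended = sum(a for a, _ in steps)
    unattended = sum(u for _, u in steps)
    return attended, attended + unattended


def percent_slower(slow, fast):
    """Relative slowdown of `slow` compared with `fast`, in percent."""
    return (slow - fast) / fast * 100


if __name__ == "__main__":
    fast_a, fast_total = totals(FASTEST)
    better_a, better_total = totals(BETTER)
    print(f"Fastest: {fast_a} min (A), {fast_total} min (A+U)")
    print(f"Better:  {better_a} min (A), {better_total} min (A+U)")
    print(f"Slowdown (A+U): {percent_slower(better_total, fast_total):.1f}%")
    print(f"Slowdown (A only): {percent_slower(better_a, fast_a):.1f}%")
```

Measured on attended time alone (25 vs. 13 minutes), the gap is larger than on total time, which matters if your own time is the bottleneck.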

* S_FAST: http://www.diybookscanner.org/forum/vie ... 5528#p5528
** ABBYY Hot Folder Settings included: PDF/A, Mixed Raster Content (MRC), text under image
*** JPEGCrops was unstable (crashes, slow, probably due to the hundreds of 12MP color images) for me on both Windows 7 and XP SP3.

The secret to keeping JPEGCrops from crashing is to load a maximum of about 100 images at a time. Really, though, it is much easier to work with Scan Tailor's Select Content step.
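
A rough sketch of the batching idea, assuming the limit is about 100 images per batch. The `batches` helper and the file names are hypothetical; each batch would be copied to its own folder and opened separately.

```python
def batches(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical names for a 300-page scan.
    images = [f"{n:03d}.jpg" for n in range(1, 301)]
    for number, batch in enumerate(batches(images), start=1):
        print(f"batch {number}: {batch[0]} .. {batch[-1]} ({len(batch)} images)")
```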
Anonymous1

Re: Most Efficient Workflow / Process Available Currently

Post by Anonymous1 »

I would discourage running so many Scan Tailor instances. The output size for each batch will be different, and I doubt you'll get much of a boost in performance (I think ST utilizes multiple QThreads, but I'm not entirely sure).
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Most Efficient Workflow / Process Available Currently

Post by dtic »

Anonymous1: ST only uses one CPU core. I think my script for using two ST instances (only) for the final processing step avoids the problem you describe. I at least haven't noticed differences in page size for the two output halves. Check it out here:
http://diybookscanner.org/forum/viewtop ... f=8&t=1249
mellow-yellow
Posts: 46
Joined: 28 Jun 2010, 13:33
Number of books owned: 1
Country: USA
Location: Portland, OR, USA

Re: Most Efficient Workflow / Process Available Currently

Post by mellow-yellow »

Since I haven't updated this post lately, and because I've been working on the software side of this workflow a *lot* lately, it now seems to me that the fastest and easiest way to convert your images to e-books (i.e. PDF using Scantailor and ABBYY FineReader) is ... (sorry, shameless plug follows) ... to use my open-source DIY E-book Creator, especially if you have Microsoft Windows XP, Windows Vista, or Windows 7.

http://diybookscanner.org/forum/viewforum.php?f=23

Please consider helping us improve that software though, as it needs bug fixes and flexibility improvements (e.g. more output formats, open source OCR engine like Tesseract).
mellow-yellow
Posts: 46
Joined: 28 Jun 2010, 13:33
Number of books owned: 1
Country: USA
Location: Portland, OR, USA

Re: Most Efficient Workflow / Process Available Currently

Post by mellow-yellow »

Just a quick update on our workflow: I've posted a roughly 7-minute "book to e-book" video showing my current (fastest) workflow:



Enjoy!
sam_brewer
Posts: 6
Joined: 17 Aug 2013, 02:28
Number of books owned: 200
Country: USA

Re: Most Efficient Workflow / Process Available Currently

Post by sam_brewer »

Excellent thread, really helped me get through my first book!

For the "freeware" types (i.e. - those who won't or can't shell out $$ for "real" software ;) ), I use the following workflow. All programs (AFAIK) are freeware and/or open-source.

1) Shoot pictures. I use a single-camera setup, shooting several chapters of a book (was a college-text) to get about 300 pages per session. (Since I have to fiddle with moving the book anyway, this broke the sessions and resulting data into manageable chunks)

1a) I shoot one side of the book first, flip the book around and shoot the other. I do the pages in-order for each session (front to back of book), as this makes the renaming easier for me (less "stuff" to keep track of, especially with the camera's date-index autonaming). Even with my totally manual setup, I was still able to snap photos on average about every 6-7 seconds, including time to readjust the book, refocus, etc. ("Necessary" breaks not included :) )

2) Download the pictures into the computer. USB in my case, or pull the SD card if you must.

3) Sort the pictures into "Odd" and "Even" folders - this is easier for me instead of "Right" and "Left". YMMV. I used a separate directory for each chapter. Take notes during the photo-session, or use Irfanview to preview the JPGs to find out which blocks of photos to put where.

4) Rename the photos using Bulk Rename Utility (BRU). I used "Chapter_n_pagenumber" for the new filename. This makes the files easily human-readable and gives you automatic sorting when you merge the Odd and Even pages in step 6. (Note: for Windoze users, be sure to use leading zeros for the pagenumber field. Otherwise, pages will sort funny, like "1", "10", "11"..., "2", "20", etc.) Using the actual page numbers lets you quickly spot any missing pages, since each filename should correspond to the page number printed in the image.

5) Process photos using Scan Tailor. Lots of info on using it, it's VERY easy to learn - kudos to the author on this one!

6) Copy the Scan Tailor output TIF Files into a separate directory named "Merge". Now, the entire chapter is in a single, clean directory for the publishing steps.

7) Create a multipage TIF file using Irfanview ( Options --> Multipage Images --> Create Multipage TIF ). Open the new multipage TIF in Irfanview for a final check that there are no missing pages or artifacts and that Scan Tailor windowed the content correctly (sometimes it captures "stuff" that expands the content window). Repeat steps 1-6 as needed to clean up the chapter TIFFs.

8) When all is ready, do a multipage print of the TIF file using Irfanview. ( Options --> Multipage Images --> Print All Pages ) I use PDF Creator, installed as a printer on the computer. Make sure the preview window looks OK, you might have to diddle the settings in the PDF Creator window when it comes up.

9) (Optional) Merge all of the chapters into a single PDF using "Merge PDF". (Either I downloaded the wrong version or it's only available in French, but the UI is easy to follow and I had no trouble merging the chapters into a single PDF File.)

10) (Optional) - Add a Table of Contents to the PDF file using "Jpdfbookmarks". This neat little program will take a text file (I couldn't OCR the TOC from my PDF, so I just manually typed it using Word) and allows you to go through your PDF to link the TOC to specific pages. A live-TOC is especially useful for a textbook, where a reader may be flipping between chapters quite often, maybe not as much for a novel (easier ways to insert an electronic bookmark for a particular page).

11) All done!
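
The naming advice in step 4 can be illustrated with a short Python sketch. It assumes the "Chapter_n_pagenumber" pattern described above; the helper names, the 4-digit padding, and the sample pages are made up for the example. It shows why unpadded numbers sort badly, and how page-numbered filenames make gaps easy to detect after the Odd and Even folders are merged.

```python
import re


def page_filename(chapter, page, width=4, ext=".jpg"):
    """Build a name like Chapter_3_0012.jpg with a zero-padded page number."""
    return f"Chapter_{chapter}_{page:0{width}d}{ext}"


def page_number(name):
    """Extract the trailing page number from a Chapter_n_pagenumber name."""
    match = re.search(r"_(\d+)\.\w+$", name)
    if match is None:
        raise ValueError(f"no page number found in {name!r}")
    return int(match.group(1))


def missing_pages(names, first, last):
    """List page numbers in first..last that have no matching file."""
    present = {page_number(name) for name in names}
    return [page for page in range(first, last + 1) if page not in present]


if __name__ == "__main__":
    # Without leading zeros, a plain text sort puts "10" before "2".
    print(sorted(["1", "2", "10", "11", "20"]))

    # Hypothetical chapter where page 4 was never photographed.
    odd = [page_filename(3, p) for p in (1, 3, 5)]
    even = [page_filename(3, p) for p in (2, 6)]
    merged = sorted(odd + even)  # padded names sort in page order
    print(merged)
    print("missing:", missing_pages(merged, first=1, last=6))
```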

Back up your work, and if possible publish it to some of the many repositories linked on this site. All that work you did might be valuable to others (it likely is). Be absolutely certain to observe the applicable copyright laws in doing this: not all books are in the public domain, and changes to the law make it likely that anything published after 1964 will be under copyright for the next 97 years. Thank you, VHS-Betamax videotape, for this!

My first (and so-far only) scan took about 2 working-days from start-to-finish, including putting the scan-setup together and learning how to use the new software (Scan Tailor, BRU, MergePDF, and Jpdfbookmarks). Murphy picked a largish book for my first effort (1036 pages!), but the workflow helped me grind through it and at no point did I feel I wasn't making good time.
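
For a rough sense of how much of that time is pure camera work, here is a back-of-the-envelope estimate. It assumes one photo per page and the 6-7 seconds per shot mentioned earlier; the function name is made up for the example.

```python
def shooting_minutes(pages, seconds_per_shot):
    """Estimated shooting time in minutes, assuming one photo per page."""
    return pages * seconds_per_shot / 60


if __name__ == "__main__":
    pages = 1036
    low = shooting_minutes(pages, 6)
    high = shooting_minutes(pages, 7)
    print(f"{pages} pages: roughly {low:.0f} to {high:.0f} minutes of shooting")
```

That works out to somewhere under two hours at the camera, so most of the two days went into setup, learning the software, and processing.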

I know this isn't the least labor-intensive workflow, especially for the archivists facing stacks of books, and it certainly won't win any awards in a Scanner Shootoff contest. But for the occasional scanner just wanting to capture a book or 2 on loan from the library it's an inexpensive path to getting a VERY good-quality scan for your digital library (and possibly to share, observing relevant IP ownership laws and all that). Beats the HECK out of using a flatbed scanner or (worse) a photocopy machine.

And who knows where it might lead...

Making a personal eBook cost me nothing but my time. I already had an old Epson printer/scanner in the recycle pile, and fortunately have a camera, tripod, a computer, and duct-tape. The software was easily downloadable and free. I made not a single trip to the store or even the dumpster for anything I used (although I did use the last couple of feet of duct-tape in the house, so a resupply run is in order STAT!)

Thanks to Daniel and all of the members posting here - you guys and gals are a true inspiration!

Sam

(Note to Modz - there may be a more appropriate Topic for this post, please feel free to move it if necessary. "Most Efficient" is relative, this workflow got me through my first bookscan attempt and is thus "Most Efficient" for me :) )