Re: Plustek 3800
Posted: 24 Aug 2017, 20:19
Same here. However, I admit that I have bought some books that were available only in paper form and turned them into e-books just for fun, and to test some workflow improvements that had come to mind.
I don't bother scanning any book I can get some other way. It just isn't worth the effort. Most of the books I want I can get one way or another, but some just aren't available for one reason or another. So those I scan.
I am particularly impressed with your 3 hour time for creation of the ebook.
I had to gain some experience and develop my own workflow first. It was a rather gradual and lengthy process. Three hours for a 200-page novel is not even my best result, but there are some assumptions behind it.
The crucial issue is to have good-quality images. All of them must be sharp, with good contrast, evenly lit, etc. This condition is sometimes difficult to meet, as I do not use any special device for taking photos (only an HTC 10 or iPhone 6 camera and a mobile phone holder that can be attached to a window pane). As a result, if some pages have issues, each subsequent stage (especially image preprocessing and proofreading) takes a lot of additional work. However, if everything goes well, the time spent on the individual steps looks like this:
1. Taking photos - 15 min (200 pages, so I need to take 100 photos, as I usually capture 2 pages at once).
2. Scan Tailor preprocessing - it depends, but there must be some serious problem with the input pictures if it takes more than 1 hour.
3. OCR - I do not even count this as time spent on creating the ebook, as it requires only initial setup (about 3 min) and then everything runs automatically without any supervision from me. However, we may assume that it takes 30 min.
4. Proofreading – approx. 1 hour, provided that the OCR output is of reasonable quality and there are no footnotes. There are always misspellings and errors, but PepitoCleaner really helps to correct them. I also use regular expressions for find & replace and the spellchecking dialog provided by LanguageTool. After this I go through the whole text, checking and correcting errors and comparing it with the original contents of the book. At this stage I try to restore the original text formatting (italic, bold, etc.; headers and chapters are usually reconstructed at the PepitoCleaner stage). My proofreading is definitely not as thorough and careful as yours, and some errors usually go unnoticed. But typically they are few.
5. Epub conversion and final edits – 10 min. writer2epub usually creates very clean HTML code, so there is not much to do. I used to spend a lot of time editing the code output by calibre, but with writer2epub that shouldn't be a problem anymore.
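As a rough check, the estimates above can simply be added up. This is only a sketch in Python: the step names are shortened labels, and steps 2 and 3 are entered at the upper bounds mentioned above (1 hour and 30 min), so the real figure may well be lower.

```python
# Per-step estimates in minutes, taken from the list above.
# Steps 2 and 3 use the upper bounds given there (assumption: worst case).
steps = {
    "Taking photos": 15,
    "Scan Tailor preprocessing": 60,
    "OCR": 30,
    "Proofreading": 60,
    "Epub conversion and final edits": 10,
}

total = sum(steps.values())
hours, minutes = divmod(total, 60)
print(f"Total: {total} min ({hours} h {minutes} min)")
# prints "Total: 175 min (2 h 55 min)"
```

So even with the cautious figures for Scan Tailor and OCR, the total stays just under 180 minutes.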
As you can see, keeping to 3 hours is actually doable.
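To give an idea of the regular-expression find & replace mentioned in step 4, here is a minimal sketch in Python. The function name `clean_ocr_text` and the four patterns are my own illustrative assumptions about typical OCR artifacts, not the exact expressions used in the workflow and not anything PepitoCleaner itself does.

```python
import re

def clean_ocr_text(text: str) -> str:
    """Apply a few illustrative (hypothetical) cleanup rules to raw OCR text."""
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks inside a paragraph into spaces;
    # blank lines (paragraph breaks) are left untouched.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs into one space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Remove stray spaces before punctuation.
    text = re.sub(r" +([,.;:!?])", r"\1", text)
    return text

sample = "This is an exam-\nple of OCR  out-\nput , with errors .\n\nNew paragraph."
print(clean_ocr_text(sample))
```

Note that the first rule also merges genuine compounds that happen to break at the hyphen ("well-\nknown" becomes "wellknown"), so the result still needs the manual read-through described in step 4.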