
Plustek 3800

genarcher
Posts: 2
Joined: 21 Jul 2017, 00:29
E-book readers owned: kindle, sony
Number of books owned: 5000
Country: Australia

Plustek 3800

Post by genarcher » 30 Jul 2017, 02:46

I've got a project of scanning my library, about 4,000 books, and I'm using the PDF option to create files. I've set the parameters to color 150 dpi, gray 250 dpi and black & white 330 dpi. I want as good an image as I can get in a smaller PDF file, but with the right dpi to get OCR working as well as I can. I was using Omnipage 19, which worked well and gave me prompts when the spelling wasn't correct. Now I've got Finereader 12 and Finereader 14, which allow me to save as docx, epub & text but don't have the spellchecker on.
I can scan one page about every 12 seconds, so that gives me 5 pages a minute, and a book takes about 60 to 90 minutes depending on how many pages it has. Is there a faster way of scanning for the best results, or are these formats okay for my purpose? Seeing that I'll get through about 4 to 5 books a day, I'm looking at a couple of years!
Alternatively, will I get better and faster results with the Plustek 4800?
Thanks for your assistance and best wishes from downunder.
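
The arithmetic in the post above can be checked with a few lines of Python. This is only a rough sketch using the figures quoted in the post (12 seconds per page, 60 to 90 minutes per book, 4 to 5 books a day, about 4,000 books). The function names are made up for illustration, and scanning every day of the year is an assumption, not something the poster states.

```python
def pages_per_minute(seconds_per_page: float) -> float:
    """Pages scanned per minute at a steady pace."""
    return 60.0 / seconds_per_page


def pages_in_session(minutes: float, seconds_per_page: float) -> float:
    """Pages scanned in a session of the given length."""
    return minutes * pages_per_minute(seconds_per_page)


def years_to_finish(total_books: int, books_per_day: float,
                    days_per_year: int = 365) -> float:
    """Years needed for the whole library.

    Assumption: scanning happens every day of the year.
    """
    return total_books / books_per_day / days_per_year


# Figures from the post: 12 s per page, 60-90 min per book, 4-5 books a day.
print(pages_per_minute(12))
print(pages_in_session(60, 12), pages_in_session(90, 12))
print(round(years_to_finish(4000, 4), 2), round(years_to_finish(4000, 5), 2))
```

At 12 seconds a page, a 60 to 90 minute book works out to roughly 300 to 450 pages, and 4,000 books at 4 to 5 a day comes to somewhat over two years, which fits the "couple of years" estimate.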

BruceG
Posts: 60
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: Plustek 3800

Post by BruceG » 31 Jul 2017, 08:03

Using a scanner with 2 cameras, such as David Landin's made with PVC piping (details can be found on the site), will increase throughput to 1000 pages an hour. It captures 2 pages at a time, and apart from moving the book to and from the platen all you do is turn the page. He has made some videos on YouTube, just search for Easy Book Scanner. Post-processing will take longer, however.
I am currently making epub files for my ereader from someone else's PDF files, which were scanned on a flatbed book scanner. I go from Omnipage to docx, then use Jutoh to create the chapter index and the epub. You can go straight from Omnipage to epub, but I prefer using Jutoh to prepare epub files.
I am also using a Czur E16 scanner, which does not use a platen. The book just sits on a table, so the distance from camera to page changes from close at the start of the book to further away by the end. I did a series of minute books that had lots of pasted-in letters, handwritten notes, etc. of different sizes, running either up and down or across the pages, which would have been difficult with a flatbed scanner. The Czur scanner uses the Finereader engine for OCR. When scanning in grayscale it produces a grey hue, so I scan in B&W. Omnipage does not do this.
One of the reasons people use cameras with a platen scanner is the difficulty of scanning the middle of a book with a flatbed scanner. I have found magazines are a bigger problem, and it is worst when the printing goes across the gutter between pages. Book scanners such as the Plustek, which can scan up to 2mm from the edge of the scanner, help. They also do less damage to the books, as do the camera platen scanners.
300dpi is the usual minimum for OCR.
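
As a small illustration, the settings from the first post can be compared against that 300 dpi figure. This is only a sketch: the dictionary keys and the function name are made up here, and the threshold is the rule of thumb quoted above, not a hard limit of any OCR package.

```python
OCR_MIN_DPI = 300  # "usual minimum" quoted in the thread, a rule of thumb


def below_ocr_minimum(settings: dict, minimum: int = OCR_MIN_DPI) -> dict:
    """Return the scan modes whose resolution is under the minimum."""
    return {mode: dpi for mode, dpi in settings.items() if dpi < minimum}


# Settings quoted in the first post (key names are illustrative).
settings = {"color": 150, "gray": 250, "black_and_white": 330}
print(below_ocr_minimum(settings))
```

By this rule of thumb only the black & white setting clears the minimum, so raising the color and gray settings may be worth testing on pages where OCR struggles.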

genarcher
Posts: 2
Joined: 21 Jul 2017, 00:29
E-book readers owned: kindle, sony
Number of books owned: 5000
Country: Australia

Re: Plustek 3800

Post by genarcher » 04 Aug 2017, 23:41

Hi Bruce and thanks for your suggestions. I've scanned about 400 books in six months with the Plustek and most of the results have been okay. Every so often I get a book which has been bound with very tight margins, and those have been a problem. Otherwise, new books seem to get better results than older books (which might be due to improved printing machinery, cleaner fonts and better quality paper). Older books, particularly those hand typeset in the 1930s and 1940s on coarse paper, tend to produce more difficult files. I agree that the epub output from both Omnipage and Finereader doesn't give the best results. I need to re-edit the files in Word to get section and chapter headings, section breaks and the occasional spelling clean-up before I can import them into an electronic book reader. Who would have thought fifteen years ago that we would be able to create such content now?

BillGill
Posts: 42
Joined: 18 Dec 2016, 17:13
E-book readers owned: Calibre, FBReader
Number of books owned: 7000
Country: USA

Re: Plustek 3800

Post by BillGill » 05 Aug 2017, 09:36

I don't worry too much about speed of scanning. Lately with my new 1 camera scanner I am getting about 250 pages per hour. That is a very small part of the process of creating an epub. I spend far more time proof reading the text after it has been converted. Even the best OCR will leave a lot of errors. And that is if the scan is really good.

The quality of the scan varies a lot depending on the book. I have a lot of older books that have yellowed pages that don't provide high quality scans. There are a lot of garbled words, incorrect punctuation, and misspelled words. Some common errors are confusion between 'h' and 'b' and substitution of '1' for 'l' or 'I'. Getting the text corrected is definitely the long end of the pole in book scanning.
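
The error patterns described above can be partly flagged by a script before proofreading. The following is a minimal sketch, not anything the posters describe using: it flags words that contain the digit 1 among letters, and words that become a known word when a single 'h' and 'b' are swapped. The `vocabulary` argument is a hypothetical word list supplied by the caller, and all function names are invented for illustration.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9']+")
H_B_SWAPS = {"h": "b", "b": "h", "H": "B", "B": "H"}


def digit_in_word(token: str) -> bool:
    """True if the token mixes letters with the digit 1, e.g. 'wi1l'."""
    return "1" in token and any(c.isalpha() for c in token)


def h_b_candidates(token: str):
    """Yield variants of the token with one h/b swapped at a time."""
    for i, ch in enumerate(token):
        if ch in H_B_SWAPS:
            yield token[:i] + H_B_SWAPS[ch] + token[i + 1:]


def suggest(text: str, vocabulary: set) -> list:
    """Return (token, note) pairs for tokens that look like OCR errors.

    `vocabulary` is a caller-supplied set of lowercase known words.
    """
    suggestions = []
    for token in TOKEN.findall(text):
        if token.lower() in vocabulary:
            continue
        if digit_in_word(token):
            suggestions.append((token, "digit 1 inside a word"))
            continue
        for candidate in h_b_candidates(token):
            if candidate.lower() in vocabulary:
                suggestions.append((token, "maybe '" + candidate + "'"))
                break
    return suggestions


words = {"the", "house", "will", "be", "old"}
print(suggest("Tbe bouse wi1l be old", words))
```

A script like this only produces a list of suspects for a human to review. It will not catch errors that happen to form real words, so it would not replace the proofreading passes.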

I generally work on one chapter at a time and run through each chapter 3 times looking for errors. Then I put all the chapters together and go through it one more time. Then I convert it to epub and go through it one more time. And I am still finding errors on the last pass. All told this takes me about 1 1/2 to 2 weeks per book. That is working a few hours a day.

I use Calibre for converting it to epub and then editing it.

Bill

BruceG
Posts: 60
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: Plustek 3800

Post by BruceG » 13 Aug 2017, 23:45

The errors I find most often are, as you say, 1 for I in dates, and the pound sign.

Other issues I have are:
- do I move footnotes to the actual location and put them in brackets?
- do I remove footnotes that refer to other pages, as they are not relevant to ereader page numbers?
I have only just started doing an index with chapters etc. A recent book had sketches before the chapter heading. So do I move the chapter heading before the sketches so they are included with the chapter, or leave it so the sketches go with the previous chapter?

Printing today is different from the past, when spaces were used before and after ' ! " etc. Do I leave them or change them? I tend to leave them, but where a space is missing I insert it as per today's usage.
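
Whichever way that choice goes, it may help to list the spaced punctuation first and decide afterwards. A minimal sketch using Python's `re` module follows. The set of punctuation marks and the function names are assumptions made for illustration, and quotation marks are left out because a space before an opening quote is normal.

```python
import re

# A space (or several) directly before one of these marks.
# Assumption: this set of marks is only an example.
SPACED_PUNCT = re.compile(r"\s+([!?;:])")


def find_spaced_punctuation(text: str) -> list:
    """Return (offset, match) pairs so each case can be reviewed by eye."""
    return [(m.start(), m.group(0)) for m in SPACED_PUNCT.finditer(text)]


def close_up(text: str) -> str:
    """Remove the space before the punctuation (modern convention)."""
    return SPACED_PUNCT.sub(r"\1", text)


sample = "What ! Never ; well."
print(find_spaced_punctuation(sample))
print(close_up(sample))
```

Keeping the find step separate from the change step means the original spacing can be left alone where it is wanted.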

I have not found any paper that discusses these issues.

As well as creating epub files, I create PDF files to index for searching purposes, i.e. many files can be searched at once.

BillGill
Posts: 42
Joined: 18 Dec 2016, 17:13
E-book readers owned: Calibre, FBReader
Number of books owned: 7000
Country: USA

Re: Plustek 3800

Post by BillGill » 14 Aug 2017, 10:02

I haven't run into some of the issues you have, since I am scanning mostly old fiction, and mostly old paperbacks. I have only run into one book that had images, and they were in the middle of a chapter. I don't know how you are creating your epubs, but I am creating word processor files, then importing them into Calibre and letting Calibre convert them to epub. I separate the chapters with page breaks, and Calibre uses the page breaks to create the chapter files. So if I have an image in the chapter it should wind up ahead of the chapter heading when it is read. I don't know that for a fact, but it seems reasonable to me.

For the other formatting matters I usually try to keep the final version as much like the original as I can. So I put in spaces where they have spaces. Sometimes I look at something and realize that they have it wrong in the original book. If it particularly bothers me I may correct it, but mostly I go along with their mistakes.

Bill

BruceG
Posts: 60
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: Plustek 3800

Post by BruceG » 16 Aug 2017, 20:19

Most books I am doing have illustrations and a number have footnotes. They are mostly Missionary History or Biographies. I was going from Omnipage directly to epub, then Calibre for final editing. I thought I would like to put chapters on new pages and couldn't work out how Calibre did it, so I bought Jutoh, which uses 'Heading 1' from Word, plus other ways which I have not tried yet. So now I go OmniPage > Word > Jutoh > epub.
It is because I have selected 'Heading 1' in Jutoh for chapters that any illustrations on the chapter page need to come after the chapter number or name. I will need to check out the other methods for selecting chapters.
As I am so far the only one reading the epub files, I am not so particular with editing. I edit once in OmniPage, then once in Word, mostly for spaces between pages, and once again in Jutoh.
Someone else is scanning the books at a rate of about 3 a week and I am already slipping behind.
