Daniel Reetz, the founder of the DIY Book Scanner community, has recently started making videos of prototyping and shop tips. If you are tinkering with a book scanner (or any other project) in your home shop, these tips will come in handy. https://www.youtube.com/channel/UCn0gq8 ... g_8K1nfInQ

Creating a Digital Library

Posts: 6
Joined: 25 Sep 2013, 16:54
E-book readers owned: iPad
Number of books owned: 10000
Country: US

Creating a Digital Library

Post by John_Latta » 13 Dec 2019, 14:55


Creating a digital library of 3,000 books means scanning about 1 million pages. Any DIY scanner, with its associated software, is much too slow for that. I designed and built an 80/20 DIY scanner, but in spite of its being “fast,” scanning the books and processing the images took far too long. Hence the search for a better approach based on cutting the books into individual pages.

I have 8 scanners of various types, and an ADF (automatic document feeder) scanner was essential for this task. Yet most consumer ADF scanners are not built for high volumes. I found the enterprise-level Fujitsu fi-6670 superb. It has a robust scanner driver that can be tailored to specific scanning needs. The PaperStream Capture software is excellent, and it includes a version of ABBYY for OCR.

The production setup included three fi-6670s and five computers. Two of the fi-6670s were on a USB switch so that each could be operated by whichever computer was not busy doing OCR.


To reduce the overall time, the objective was to keep all the scanners running constantly. Once a scanner finished a book, its computer began an OCR batch. The scanner was then switched to another computer, which started scanning the next book. By the time that scan was complete, the OCR on the first computer had usually finished. Operationally, then, the task was to keep the scanners fed. All scanners ran at 600 dpi for the best OCR. The scanner driver was also set up to detect color and page size automatically: I set the scan area as large as A3, and each page was cropped to its actual size as it passed through the scanner.
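The scheduling idea above can be sketched as a small simulation. This is a minimal model under assumptions the post does not state: each book has a fixed scan time and OCR time, a computer is busy while it scans a book and then while it OCRs that book, and the scanner is switched round-robin across computers. The function name makespan and its parameters are hypothetical.

```python
def makespan(scan_times, ocr_times, n_computers):
    """Time until the last book is OCRed, for one scanner shared via a USB switch.

    Hypothetical model: computer i % n_computers scans and then OCRs book i;
    the scanner waits only if that computer is still busy with earlier OCR.
    """
    free_at = [0.0] * n_computers  # when each computer finishes its OCR
    scanner_free = 0.0             # when the scanner finishes its current book
    finish = 0.0
    for i, (scan, ocr) in enumerate(zip(scan_times, ocr_times)):
        computer = i % n_computers              # round-robin switching
        start = max(scanner_free, free_at[computer])
        scanner_free = start + scan             # scanner is tied up only while scanning
        free_at[computer] = scanner_free + ocr  # computer then runs OCR on its own
        finish = max(finish, free_at[computer])
    return finish


if __name__ == "__main__":
    books = 4
    # Equal scan and OCR times (arbitrary units) make the overlap easy to see.
    print(makespan([1.0] * books, [1.0] * books, n_computers=1))  # 8.0
    print(makespan([1.0] * books, [1.0] * books, n_computers=2))  # 5.0
```

In this model, with two computers and OCR no slower than scanning, the scanner never waits for OCR, which matches the effect described in the post.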

The fi-6670 has a multifeed sensor that is excellent at detecting pages stuck together. As a result, I found no pages missing from this library because two sheets had passed through the scanner as one. It is also very common for glue, especially on the first or last pages, to rub off onto the scanner glass. If not removed, it creates a line through the text, so I checked the image sensor glass after each book and cleaned it as required.
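A line caused by glue on the glass lands in the same pixel column on every row of the page, because the fixed sensor sees the speck as the paper feeds past it. As a hypothetical automated aid to the manual glass check, one could flag columns that are dark on nearly every row. This is a minimal sketch on a grayscale image given as a list of rows; the function name and both thresholds are assumptions, and legitimate vertical rules in tables or page borders would also be flagged.

```python
def find_streak_columns(gray, dark_threshold=100, min_fraction=0.9):
    """Return indices of columns that are dark in at least min_fraction of rows.

    gray: grayscale page as a list of equal-length rows of 0-255 values.
    Thresholds are illustrative assumptions, not values from the post.
    """
    if not gray:
        return []
    height, width = len(gray), len(gray[0])
    streaks = []
    for x in range(width):
        dark_rows = sum(1 for row in gray if row[x] < dark_threshold)
        if dark_rows >= min_fraction * height:
            streaks.append(x)
    return streaks


if __name__ == "__main__":
    # 10 x 8 white page with one line of "text" and a streak in column 3.
    page = [[255] * 8 for _ in range(10)]
    page[2][0:6] = [0] * 6   # text touches only one row
    for row in page:
        row[3] = 20          # streak runs the full height
    print(find_streak_columns(page))  # [3]
```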

Book cutting requires special care. I bought a relatively low-cost paper cutter, used.

This worked very well; the blade needed sharpening only once. Cutting books takes more than putting them in the cutter and chopping. Thicker books, typically those over 1/2 inch thick, curl during the cut, and in the extreme the blade can cut into the gutter and remove text. There is also a fine line between cutting right at the edge of the binding and cutting deep enough that no glue remains on the pages. Glue can seep into the pages during binding, leaving pages stuck together. When that happens, the scanner halts and the pages must be separated by hand, which slows the process. So after the cut, I fanned the pages to catch any that were glued. I never cut off any text, but balancing the cut depth, glued pages, and the book gutter required constant vigilance.

For any thick book, I had to cut through the spine to split it into sections about 1/2 inch thick and then scan every section. Keeping the sections in order was essential so that the final scan exactly matched the original book. Scanning books of 1,000 to 1,500 pages was routine.
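The 1/2 inch sections for thick books can be planned with a little arithmetic. This sketch assumes a sheet (two pages) thickness of 4 mils, i.e. 0.004 inch, a plausible figure for book paper but not one from the post, so measure the actual book. It also assumes the page count covers every printed page. The function names and the "_partNN" file naming are hypothetical; zero-padded section numbers are one simple way to keep the section files sorting in book order.

```python
def plan_sections(total_pages, sheet_mils=4, max_section_mils=500):
    """Split a book into (first_page, last_page) ranges no thicker than max_section_mils.

    1 mil = 0.001 inch, so the default 500 mils is the 1/2 inch from the post.
    Each sheet carries two pages; sheet_mils is an assumed paper thickness.
    """
    total_sheets = (total_pages + 1) // 2
    sheets_per_section = max(1, max_section_mils // sheet_mils)
    sections = []
    for first_sheet in range(0, total_sheets, sheets_per_section):
        last_sheet = min(first_sheet + sheets_per_section, total_sheets) - 1
        first_page = 2 * first_sheet + 1
        last_page = min(2 * last_sheet + 2, total_pages)
        sections.append((first_page, last_page))
    return sections


def section_filenames(book_id, n_sections):
    """Hypothetical naming scheme: zero padding keeps lexical order equal to book order."""
    return [f"{book_id}_part{i:02d}.pdf" for i in range(1, n_sections + 1)]


if __name__ == "__main__":
    sections = plan_sections(1500)
    print(len(sections), sections[0], sections[-1])  # 6 (1, 250) (1251, 1500)
    names = section_filenames("bin07-book03", len(sections))
    assert sorted(names) == names
```

Under these assumptions a 1,500-page book comes out at about 3 inches of paper, or six sections.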

In general, soft cover books were easier to cut, since they did not need the separate step of cutting off the hard cover. Once that was done, cutting the pages was the same for both.

One of the steps I set up was to scan the front and back covers of each book before it was cut. There were two reasons for this: the full extent of the cover was captured (the cover of a soft cover book was slightly smaller after the cut), and the cover scans served as an independent check on which books had been scanned. Occasionally a scanned book was “lost,” and without an independent check I would not have caught this at the final stage of the process. The cover scans provided that check.
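The cover scans work as an independent check because every book gets one before it is cut. A minimal sketch of that reconciliation as a set comparison, assuming a hypothetical naming scheme in which cover images are "<book id>_front.jpg" and "<book id>_back.jpg", finished PDFs are "<book id>.pdf", and the check runs before the PDFs are renamed to title and author. The function name and file names are illustrative only.

```python
from pathlib import PurePath


def reconcile(cover_files, pdf_files):
    """Compare books with cover scans against books with finished PDFs.

    Assumes hypothetical names '<id>_front.jpg' / '<id>_back.jpg' and '<id>.pdf'.
    Returns (lost, unexpected): covered books with no PDF, and PDFs with no cover.
    """
    # Strip the last '_front' / '_back' suffix to recover the book id.
    covered = {PurePath(f).stem.rsplit("_", 1)[0] for f in cover_files}
    finished = {PurePath(f).stem for f in pdf_files}
    return sorted(covered - finished), sorted(finished - covered)


if __name__ == "__main__":
    covers = ["bin07-b01_front.jpg", "bin07-b01_back.jpg",
              "bin07-b02_front.jpg", "bin07-b02_back.jpg"]
    pdfs = ["bin07-b01.pdf"]
    print(reconcile(covers, pdfs))  # (['bin07-b02'], [])
```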

Another step was to name each book's PDF file with its title and author. Since all the books were OCRed, they were also searchable. A scan of a cut hardbound book does not include the dust jacket, so the cover scan captured the dust jacket and was added to the final PDF. Having the cover as the first page of the PDF worked very well: in Windows, with a large-icon view selected, the thumbnail for a PDF file shows its first page, which under these procedures is the cover. When I open the folder of PDF books, it looks like a bookshelf with the covers visible.
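Naming each PDF "Title, Author" needs a little care on Windows, which does not allow the characters < > : " / \ | ? * in file names. A minimal sketch that drops those characters and control characters and collapses whitespace; the function name and the 200-character cap (to leave room for the folder path) are assumptions, not part of the post.

```python
import re

# Characters Windows forbids in file names, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def pdf_filename(title, author, max_len=200):
    """Build a 'Title, Author.pdf' file name that Windows will accept."""
    name = _FORBIDDEN.sub("", f"{title}, {author}")
    name = re.sub(r"\s+", " ", name).strip()  # collapse runs of whitespace
    # Cap the length (assumed limit); trim any space left at the cut.
    return name[:max_len].rstrip() + ".pdf"


if __name__ == "__main__":
    print(pdf_filename('What? "Why"', "A/B"))  # What Why, AB.pdf
```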

The process steps were:
(1) Place a group of books, typically 25 to 30, into a numbered plastic bin.
(2) Scan the front and back cover of each book.
(3) Cut the books.
(4) Scan each book.

It typically took 4 to 6 hours per bin to go from books to OCRed PDF files. When done, the books were discarded. Working off and on, I turned the 1,000 books into the digital library in 3 months.
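The bin figures give a rough sense of the workload. A back-of-the-envelope sketch using the post's ranges of 25 to 30 books and 4 to 6 hours per bin, and assuming (not from the post) that 3 months is about 13 weeks. The hours per bin include time spent waiting on scanning and OCR, so not all of it is hands-on. The function name is hypothetical.

```python
import math


def bin_workload(n_books, books_per_bin, hours_per_bin):
    """Return (bins needed, total hours) for processing n_books in bins."""
    bins = math.ceil(n_books / books_per_bin)
    return bins, bins * hours_per_bin


if __name__ == "__main__":
    weeks = 13  # assumption: 3 months is roughly 13 weeks
    best = bin_workload(1000, books_per_bin=30, hours_per_bin=4)
    worst = bin_workload(1000, books_per_bin=25, hours_per_bin=6)
    print(best, worst)  # (34, 136) (40, 240)
    print(round(best[1] / weeks, 1), round(worst[1] / weeks, 1))  # 10.5 18.5
```

So 1,000 books correspond to roughly 34 to 40 bins, or about 10.5 to 18.5 hours of bin processing per week over the 3 months.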

The fi-6670s and the paper cutter were bought used. With diligence and care in buying, all the units were in excellent condition and cost a fraction of the new price. I will eventually sell everything, so in effect all this hardware was “rented.”

For reading, I use Acrobat Reader on the PC and GoodReader on iOS devices. The latter app is superb.

Posts: 2786
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Creating a Digital Library

Post by daniel_reetz » 31 Dec 2019, 18:59

Thanks for sharing all this experience, John_Latta.

Posts: 63
Joined: 03 Sep 2010, 13:23
Number of books owned: 0
Location: Calgary, Alberta, Canada

Re: Creating a Digital Library

Post by recaptcha » 31 Jan 2020, 11:20

Hi John. Thanks for posting.

Just curious, what kind of cutter is that? And how much did it cost? I think I might need something like that.

Posts: 6
Joined: 25 Sep 2013, 16:54
E-book readers owned: iPad
Number of books owned: 10000
Country: US

Re: Creating a Digital Library

Post by John_Latta » 03 Feb 2020, 18:25

It is a low-cost Chinese paper cutter, usually found by its model number: 450VS+ or 480VS+. Mine is a 450VS+. It cost about $900, and I bought it used.

It is widely available on eBay and other sources; I recall that even Amazon lists it.

But this is not the full story. There are several issues to address with a paper cutter.
(1) Shipping.
(2) Weight and placement.
(3) Source.
(4) Blade sharpening.

I have had to contend with all four.

Virtually all paper cutters are shipped by truck freight, and my experience with freight has not been good. In spite of excellent packaging, the freight company dropped my unit even though it was on a pallet. It took over a month to get it resolved. I bought it on eBay, and the seller was excellent.

Once it arrives, finding a place for the unit and moving it can be difficult. It takes two people to lift the 450VS+. Do not keep the unit outdoors, as the blade will rust; mine is inside. Most commercial paper cutters weigh a lot more.

There are many paper cutters available; I watched Craigslist and eBay for months. A few brands of cutter serve the print industry, and these frequently come up at reasonable prices when print shops close. Such units are heavier and larger still. They are industrial grade, but I did not need that level of quality.

The cutter's effectiveness depends on how sharp the blade is. I have 3 spare blades and can replace them easily. Finding a company to sharpen the blades was a challenge, as these are specialty companies. That said, I found that a sharp blade lasted me about 2,000 books. Most printers with paper cutters do not think of "debinding."

Overall, I am very pleased with the unit. Cutting books is a process, and I have it down. The paper cutter is just a tool in that process.
