
Digitizing Homeschool material and illustrated books- help?

PacMama
Posts: 3
Joined: 28 Apr 2013, 15:50
E-book readers owned: iphone, Kindle Fire HD
Number of books owned: 2500
Country: USA

Post by PacMama » 29 Apr 2013, 10:17

Hello! I am new here, but have been lurking/reading for a few weeks now. I have done multiple searches and read threads, but I can't seem to complete my task successfully. I would really appreciate some help, if anyone has the time and patience! I'm on OS X Mountain Lion, if it matters, and my computer skills are somewhere around advanced-novice. If it can be googled, I can usually do it, but not without a headache and a lot of time! :lol: Also, I'm a rambler... I'll summarize in bold at the end of this, if you want the CliffsNotes version! Moving on...

I am trying to digitize some of the media in our home, including DVDs, music (lotsa CDs!), and BOOKS! I am not interested in scanning mine or my husband's books, however, and will be using an all-in-one printer for my scanning needs. I don't have another option at this point, nor the time to set up a whole dedicated scanning station. I want to scan illustrated children's books that we already own, so that my son can read them on an e-reader, on the go. He is only 4.5, but has been learning to read, mostly self-driven, for the last 6-8 months! I can't keep up with the guy! Some of our very favorite books aren't available in ebook format yet, so I thought I might be able to work around that issue by doing this. Additionally, we'll be doing a summer bridge/kindergarten program over the summer, and I want to be able to scan some of his learning materials in so he can read them on the go. First things first: everything I want to digitize DOES fit on the scanner... I had to eliminate a few that were too big! ;)

So this is what I tried:

I scanned Jamie Lee Curtis' FABULOUS "I'm Gonna Like Me: Letting Off a Little Self-Esteem," using Image Capture, compiling each page into one big PDF file. I think I did it at 150ppi. The resulting file size was ginormous! I then followed a tutorial (and tips from many threads I found here) that said to use PDF Editor Pro for Mac. After I scanned each page again, as individual PDF files, it put the files into a Word program, which then saved as HTML (I think?). Then I used Calibre to convert that to epub, and then to mobi for the Kindle. Sounds good, right...? Nope. Unfortunately it didn't work. The images on my Kindle, and in the Aldiko program on the Google Nexus, were not only tiny, taking up about 1/4 of the screen, but they were also mirror images. I checked and rechecked my settings, and I just don't know what I did wrong. I'd just buy the ebooks (our favorites!) but they aren't available in e-format, and I still need to figure it out for our homeschool material and worksheets that I come up with. Oh, also: something to think about here is that the text in many of these books is swirly-whirly, and incorporated in with the pictures. I think that's why the OCR program didn't recognize the file "slides" as text.

I am feeling super frustrated that I can't manage something that would seem to be an easy task! I need the file sizes to be small, because my son only has an 8GB ereader device... the books can't be 1GB each, ya know? (And they are!) And the file size was still large even after getting it to epub, larger than I'd like it to be, honestly.

Is there a kind, patient soul who can help me out? I don't even know what questions to google at this point because I thought I was on the right track. But clearly, I'm missing some major part of this process! Um and a bonus round- is there any way I could add page-by-page narration for some of these books? Definitely out of my league, but if that's even possible, it would be amazing to have as a feature for my younger son to be able to follow along with the story, in my voice. The Kindle audio-reader function is pretty lackluster!

Thanks!

CliffsNotes:
*Need to scan illustrated books and homeschool worksheets/material, using an AIO printer with scanner, OS X Mountain Lion
*Current e-reader is a Kindle Fire HD, but that will likely change when I find something better
*File size needs to be fairly small relative to content
*Much of the text is part of the image, not recognized by OCR software
*After converting to epub, image is very small on screen and pages are inverted horizontally/mirror-image.
*Help is much appreciated! Thank you!!

dtic
Posts: 446
Joined: 06 Mar 2010, 18:03

Re: Digitizing Homeschool material and illustrated books- he

Post by dtic » 29 Apr 2013, 17:11

If you want books that are mostly illustrations and only a little text, then you could try scanning to individual jpg files, resizing them down to match the resolution of the Kindle Fire screen (or whatever image size you prefer), zipping the resized images, renaming the .zip to .cbr, moving the cbr to your device and reading it with a comic book reader app, e.g. ComiCat.

Do a backup of the scanned images before downsizing so that you can reuse the images for a new cbr file later on if you get a bigger tablet with a higher resolution screen.

If you want to scan books that are mostly text with only a few illustrations, then try scanning to jpg, processing the images in Scan Tailor and only thereafter converting to pdf.
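For the "resize down to match the resolution of the screen" step, here is a minimal Python sketch of the arithmetic only. It works out what pixel size a scan should be shrunk to so that it fits the screen without distorting the page. The 800 x 1280 portrait screen and the US Letter page at 150 ppi are assumptions for illustration, so check your own device and scan settings. The function name fit_within is made up for this example.

```python
def fit_within(width, height, max_width, max_height):
    """Return (width, height) scaled to fit the box, keeping the aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical example: a US Letter page (8.5 x 11 in) scanned at 150 ppi,
# shown on an assumed 800 x 1280 portrait screen.
scan_size = (round(8.5 * 150), 11 * 150)  # (1275, 1650) pixels
print(fit_within(*scan_size, 800, 1280))
```

The actual resizing can then be done in any image tool at the size this prints. An imaging library such as Pillow could do it in a script as well, but that is not shown here. Keep the full-size originals, as noted above.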

PacMama

Post by PacMama » 29 Apr 2013, 23:23

dtic wrote:If you want books that are mostly illustrations and only a little text, then you could try scanning to individual jpg files, resizing them down to match the resolution of the Kindle Fire screen (or whatever image size you prefer), zipping the resized images, renaming the .zip to .cbr, moving the cbr to your device and reading it with a comic book reader app, e.g. ComiCat.

Do a backup of the scanned images before downsizing so that you can reuse the images for a new cbr file later on if you get a bigger tablet with a higher resolution screen.

If you want to scan books that are mostly text with only a few illustrations, then try scanning to jpg, processing the images in Scan Tailor and only thereafter converting to pdf.
Thank you! I really appreciate this- so basically does this mean that there isn't a reasonable way to do this in a regular ereader format? Trying to make it as simple as possible for my pre-k son (I like to lock him into his sandbox app on my GN7! ha ha )

I really appreciate the step-by-step. I'll give it a whirl tomorrow!

dtic

Post by dtic » 30 Apr 2013, 12:03

Well, if the source book is mostly illustrations and little text then there isn't much point to try to separate the images from a text layer.

If you really want to separate the two there is likely some way to do it, but I'm not sure what the best way would be. Project Gutenberg separates the text from the illustrations in its children's picture book section, e.g. http://www.gutenberg.org/files/19177/19 ... 9177-h.htm But if I were scanning illustrated children's books I'd probably stick with simply the jpg files in a cbr file, since that is closer to the reading experience with the physical book.

PacMama

Post by PacMama » 11 May 2013, 00:35

dtic wrote:Well, if the source book is mostly illustrations and little text then there isn't much point to try to separate the images from a text layer.

If you really want to separate the two there is likely some way to do it, but I'm not sure what the best way would be. Project Gutenberg separates the text from the illustrations in its children's picture book section, e.g. http://www.gutenberg.org/files/19177/19 ... 9177-h.htm But if I were scanning illustrated children's books I'd probably stick with simply the jpg files in a cbr file, since that is closer to the reading experience with the physical book.
Thank you! I think this is where I'm confused- how do I make that work? When I scan images as .jpeg, they don't stack together- each scanned page is its own file.

dtic

Post by dtic » 11 May 2013, 11:20

PacMama wrote:how do I make that work? When I scan images as .jpeg, they don't stack together- each scanned page is its own file.
To make a comic book archive for reading in an app like ComiCat, you put all the images in a .zip file and then rename the file extension from .zip to .cbz (the last z stands for zip). I think the .cbr extension should also work, even though the r originally stood for rar, which is an alternative compression format. The app will then let you read the book.
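If zipping and renaming by hand gets tedious, the same step can be sketched in a few lines of Python using only the standard library. This is a rough sketch, not a tested workflow: the function names natural_key and make_cbz and the folder and file names in the usage line are made up for illustration. It assumes all the page scans for one book sit in one folder as .jpg or .jpeg files.

```python
import re
import zipfile
from pathlib import Path


def natural_key(name):
    """Sort key so that 'page2.jpg' comes before 'page10.jpg'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def make_cbz(image_dir, out_path):
    """Pack every .jpg/.jpeg in image_dir into one .cbz file, in page order."""
    pages = sorted(
        (p for p in Path(image_dir).iterdir()
         if p.suffix.lower() in {".jpg", ".jpeg"}),
        key=lambda p: natural_key(p.name),
    )
    # JPEGs are already compressed, so store them as-is instead of deflating.
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for page in pages:
            archive.write(page, arcname=page.name)
    return [p.name for p in pages]
```

Usage would look something like make_cbz("scans/my_book", "my_book.cbz"). Reader apps may order pages by file name on their own, so zero-padded names such as page001.jpg are probably the safer choice.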
