Another Complaint About eBooks

StevePoling
Posts: 290
Joined: 20 Jun 2009, 12:19
E-book readers owned: SONY PRS-505, Kindle DX
Number of books owned: 9999
Location: Grand Rapids, MI

Another Complaint About eBooks

Post by StevePoling » 20 May 2010, 19:02

Here's a story I hope you won't find too boring. But it introduces a problem I see and would like some feedback about. Yesterday my wife stopped by the neighbor's Estate Sale whereat she picked up an old, old book--a hagiography of Daniel Webster. (Political aside: Forget the Republicans, Democrats, Labor, Liberal, or Conservative parties. The Whigs have my endorsement!) The book was pretty cool and my wife and son suggested I scan it.

Sounded reasonable, but I went online first to see if it had already been scanned. It had. I downloaded ePub and mobi formats and felt pretty cool about it. Then I opened the books on my SONY and they were chock full of typos. The book-design of the eBook (as an eBook) was sucky, too. Obviously, someone at Gutenberg had scanned and OCRed the book in question. But the quality of the ebook, as an ebook, just wasn't there.

This makes me think that DIY book scans are like cassette tapes were in the 1970s. I recorded songs off the air for free, or dubbed them off an LP, but the quality on the hand-lettered cassette just sucked. I figure someone could make a modest sum taking Gutenberg texts and cleaning them up, making them pretty, and then selling the product for a few bucks. A price low enough that folks with a job won't fuss with pirating it. As sales drop off, you drop the price toward zero. Do this just to raise the quality of what's out there.

Does anybody think this is worth bothering about?

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Another Complaint About eBooks

Post by rob » 21 May 2010, 11:56

As I understand it, that's what The Distributed Proofreading project is for. They scan the book, then send it through 6 separate phases of proofreading (3 rounds of proofreading, and then 3 rounds of formatting) before finally submitting to Project Gutenberg. So this may have been one of the earlier ones which just got submitted direct to PG without going through TDP.

If you want, you can always scan it and upload it to TDP where it will become a project. There's a bit of work involved in that -- the copyright page needs to be scanned and emailed to them first for copyright clearance, then the scans have to be OCRd and the images and text FTPd to TDP. From there, a project manager is assigned, who makes sure the images and text are ok, and then the proofreading begins.
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.

Afish
Posts: 34
Joined: 04 Mar 2014, 00:52

Re: Another Complaint About eBooks

Post by Afish » 21 May 2010, 23:22

Looks like a little more than half of the ebooks available on Project Gutenberg have been proofed (18,000 out of the 30,000 available).

daniel_reetz
Posts: 2797
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Another Complaint About eBooks

Post by daniel_reetz » 23 May 2010, 09:54

It's a serious problem. At FOO Camp in Boston a few weekends ago, I learned that many publishers are actually scanning texts (instead of using original sources) and then cleaning up the scans with OCR software, which has spell-checkers that will then change words.

In the words of the guy that was telling this story (I can't recall his name, but I can probably find him): "eBooks are a f*cking pump and dump right now. I won't participate until that changes."

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Another Complaint About eBooks

Post by rob » 23 May 2010, 11:15

So basically it's no better than what I do myself: scan a book, send it through OCR, and hope for the best. Now I feel a little better about not buying ebooks anymore. Why buy an ebook that costs $9.99 when I can go to the store, buy it for $7.99, cut the text block out (it's a paperback, don't be sad!), run it through a ScanSnap, run that through OCR, and be done?

Of course, now that I read what I just wrote, I sound pretty damn cheap!
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.

StevePoling
Posts: 290
Joined: 20 Jun 2009, 12:19
E-book readers owned: SONY PRS-505, Kindle DX
Number of books owned: 9999
Location: Grand Rapids, MI

Re: Another Complaint About eBooks

Post by StevePoling » 26 May 2010, 02:04

In my experience, the quality of the ebook depends upon where I'm buying it. The free ebooks can be pretty grim, which is why I'm not happy with them. The books I've bought from Baen or Amazon are great. Well designed, relatively error-free. These seem to be taken from the author's Word files with relatively few typos. Not significantly more error-prone than an N-th draft of some of my writing.

Conversely, a lot of the "free" books are like the cheesy cassette tapes--worth every penny. Now, I see that some guys are selling public domain titles for a couple bucks at Barnes & Noble and Amazon. Perhaps these are where I should expect some rudimentary ebook design, and the ones I should come down hard on if they're just OCRed titles. Because the texts are basically free, there's no barrier to entry should someone say, "I can do this better." And by the same logic, there's no barrier to keep a rip-off artist from saying, "I can make a quick buck." I don't have an answer, just a desire to encourage the former and discourage the latter.

mutantstrain
Posts: 15
Joined: 22 May 2010, 02:43
Number of books owned: 1200
Location: Lost in Arizona

Re: Another Complaint About eBooks

Post by mutantstrain » 30 May 2010, 06:14

rob wrote:...run it through a ScanSnap, run that through OCR, and I'm done.

Of course, now that I read what I just wrote, I sound pretty damn cheap!
Just a few thoughts...

I just got a ScanSnap from Amazon, and promptly returned it. On a test book ... it was within minutes ... smearing pages and collecting dust. $440? No thanks. Low volume, letters, contracts...? Sure. High-page-count books? No way.

IMHO there will be *ZERO* comparison between a mechanical single line camera and a full size camera. Personally I see no future for the line scanner tech. It's just fraught with too many problems that will not be overcome easily. A point and shoot, on the other hand, is just so much simpler to deal with. The reason is that the single line scanner needs mechanical movement to scan the page. The movement has to be uniform across the page and at a constant rate. If *any* dust gets introduced into the system, the flow breaks down. It's not that it doesn't work ... it's just not what *I* want for my ebooks.

univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: Another Complaint About eBooks

Post by univurshul » 09 Jun 2010, 05:02

StevePoling wrote:...This makes me think that DIY book scans are like cassette tapes were in the 1970s. I recorded songs off the air for free, or dubbed them off an LP, but the quality on the hand-lettered cassette just sucked. I figure someone could make a modest sum taking Gutenberg texts and cleaning them up, making them pretty, and then selling the product for a few bucks. A price low enough that folks with a job won't fuss with pirating it. As sales drop off, you drop the price toward zero. Do this just to raise the quality of what's out there.

Does anybody think this is worth bothering about?
I've been thinking along these same lines, and I'm a serious vinyl collector! But you know, I listen to music most of the time on my iPhone. I take my music everywhere, and a ton of it. I transcribe my records to iTunes, and I finally have all my albums transcribed.

It's nice to have a wall of records. It's nice to have a wall of intriguing books. But when it comes down to it, all this stuff (the vinyl, the tapes, the books), the information they carry and how it's delivered will inevitably change forms until one day it's possible to store and access it all directly from the mind. So if it feels like the dubbed-tape era for books, I can agree with that. But that opens an opportunity for publishers to re-release their books in HD digitally when we have displays and resolution quality that exceed the capacity of the human eye. It will happen, but right now, yeah, pump and dump. I would expect no less from an industry that robs college students for no other reason than to keep printing presses, editors, and publishers grossly profiting on the same regurgitated information every semester.

For me: scan and process so it's possible to read through a book without hating yourself for the hour you killed scanning, processing, etc. Scan with the assumption that one day the only copy you'll have is the ebook.

The price of books is indeed falling. If you've taken a trip around half.com, you'll see that. So many deals raise the question of whether reading has taken a backseat to internet skimming, sound bites, news bits, and referencing from databases.

spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: Another Complaint About eBooks

Post by spamsickle » 09 Jun 2010, 14:20

mutantstrain wrote:I just got a ScanSnap from Amazon, and promptly returned it. On a test book ... it was within minutes ... smearing pages and collecting dust. $440? No thanks. Low volume, letters, contracts...? Sure. High-page-count books? No way.
I love the ScanSnap. I haven't noticed any smearing pages, and the page dust that collects I just brush or blow away. I absolutely prefer it for high-page-number books, low-page-number books, and especially pocket paperbacks. I can optionally OCR it as I scan, though I usually don't bother. When I'm through scanning, I'm basically through processing, unlike the DIY scan which requires more work even to meet my low standards.
mutantstrain wrote:IMHO there will be *ZERO* comparison between a mechanical single line camera and a full size camera. Personally I see no future for the line scanner tech. It's just fraught with too many problems that will not be overcome easily. A point and shoot, on the other hand, is just so much simpler to deal with. The reason is that the single line scanner needs mechanical movement to scan the page. The movement has to be uniform across the page and at a constant rate. If *any* dust gets introduced into the system, the flow breaks down. It's not that it doesn't work ... it's just not what *I* want for my ebooks.
I don't know what you mean that "the flow breaks down" if there's any dust in the system. My experience is that the pages still feed through just fine. If dust is stuck on the scanning element, it's usually brushed off by the next page that feeds through. If it isn't cleared automatically, I may get a streak and want to re-scan, but that's a pretty rare occurrence.

Advantages (for me) to using the ScanSnap:

No keystoning
Pages are always straight and perfectly cropped
Acceptable quality even at its fastest ("Normal") speed for most books
Fast is really pretty fast for a $400 product
No post-processing required
Good compression in PDF output

Disadvantages:

Maximum page width is 8.5 inches
Book must be destroyed
When it breaks, I can't repair it myself (OTOH, the same is true of the cameras in my DIY scanner)

Honestly, for any book that isn't too valuable to destroy, or too large to fit, the ScanSnap is my digitizer of choice. I just fed 2000 onionskin pages through mine yesterday at the highest quality setting, and needed to re-scan fewer than ten.

mutantstrain
Posts: 15
Joined: 22 May 2010, 02:43
Number of books owned: 1200
Location: Lost in Arizona

Re: Another Complaint About eBooks

Post by mutantstrain » 10 Jun 2010, 23:15

Sheesh. I never considered re-scanning pages with the ScanSnap. Duh!! :oops: Well ... I guess I just needed motivation to finish my DIY book scanner!! 8-) The good news is, I finished this past weekend and I have scanned about 9 books so far!! Yippee! When I get my groove I can go as fast as 1000 pages an hour. So that's cool. Obviously doing it the DIY way has its problems too, like pages getting tilted and words getting chopped off, but that's only if I am going too fast and not paying attention.
