
E-book standard format

ceeann1
Posts: 106
Joined: 17 Nov 2010, 20:00
E-book readers owned: Several Palm PDA's
Number of books owned: 700
Location: Albuquerque, New Mexico

Post by ceeann1 » 03 Jan 2011, 20:10

Does this forum endorse any one standard for electronic books?

Personally, I imagine it would be like herding cats to get everyone to agree... but it would be a good thing if we could agree on one format. Knowing, of course, that some of us would prefer a different format, we could maintain a set of format converters as a group, if we had somewhere to start from. The fact that we can change formats is not the problem; rather, the fact that we have no standard format is the issue. I think that having no one standard, even within a particular language, is like inventing the Tower of Babel all over again!! This is one of the fundamental problems with electronic books, in my opinion. I can pick up any book that is titled in English and expect to be able to read it, so long as it is within my ability to read. I cannot pick up an electronic book and read it unless I am willing to invest in the hardware, and possibly the software, to be able to read it (or spend the time decoding it in an unfamiliar language). I strongly believe that this is fundamentally incorrect for a democratic form of governance.

I believe that the more basic the standard, the better! ASCII text would be one possible such standard; another might be the subject of this thread: PDF autoflow, or "poor man OCR", or "OWR" http://www.diybookscanner.org/forum/vie ... ?f=3&t=777 , extracted word-sized data to allow compression. It is not a simple problem, but it is fundamental to what we are doing here. It is fundamental to the thread that popped up just recently from a person asking if they could request a book to be scanned. It is so basic to the reality of how we as a group or a nation choose to share our knowledge and teach each other. I believe we should set that standard if we can, and strive toward it even if we cannot yet attain that goal.

What is your opinion? What would you choose as a standard e-book format possibly for a group goal?

<Edit>
From the first 11 posts
An E-book standard format should:
1. Preserve the format of the book on a page by page basis.
2. Have searchable text with the location of the text available.
3. Have lossless compression to allow e-book format changes of the same quality.
4. Allow for expansion of graphic content in the future.
5. Be as fast as possible to use (speed is an issue).
6. Allow for use by those who are visually handicapped.
7. Allow for formats of differing purposes since there are real reasons for using alternative formats.

I believe that summarizes what we have been saying insofar as we have been speaking of a standard e-book format. I have not presented these in any particular order. I have no doubt inserted my own bias, although I have tried to keep that as minimal as possible. I believe there is more to say. I had not thought of many of these objectives myself, and I think they are really very insightful and important!! The analysis of the first 11 posts is in the 12th post http://www.diybookscanner.org/forum/vie ... t=10#p7643
Last edited by ceeann1 on 04 Jan 2011, 22:00, edited 1 time in total.

Gerard
Posts: 154
Joined: 17 Oct 2010, 07:15
Number of books owned: 0
Location: Berlin (Germany)

Re: E-book standard format

Post by Gerard » 03 Jan 2011, 20:36

html with a JavaScript library
like this http://meyerweb.com/eric/tools/s5/

ceeann1
Posts: 106
Joined: 17 Nov 2010, 20:00
E-book readers owned: Several Palm PDA's
Number of books owned: 700
Location: Albuquerque, New Mexico

Re: E-book standard format

Post by ceeann1 » 03 Jan 2011, 21:13

That's the idea. An open standard with a straightforward application. I would suggest that the vast bulk of Americans are not fluent in Java. I would further state that the huge majority of English-speaking people are not fluent in Java. I do like the idea though. It is good thinking.

It may be that all of us do not need to know how to bind a book. We do need to be able to see/ read the book.

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: E-book standard format

Post by rob » 03 Jan 2011, 21:57

More and more I think that searchable PDF is a good ebook format. Someone else here -- I forget who, or I would link -- gave a good argument for PDFs maintaining the artistic intent of the author and publisher. I don't know any other ebook format that does that.

In the end, I want an image, and underneath, text that can be searched, tagged with the location of the text in the image, and that's what searchable PDF does.
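
As a rough illustration of that structure (a page image with a searchable text layer underneath, where each word carries its location in the image), here is a minimal Python sketch. The names `Word`, `Page`, and `find`, the box convention, and the file name are all hypothetical; this is not how any particular PDF library exposes it, just one way to picture the data.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    # Bounding box in image pixels: (left, top, right, bottom).
    # This convention is an assumption made for the example.
    box: tuple[int, int, int, int]


@dataclass
class Page:
    number: int        # printed page number, kept so citations still work
    image_path: str    # the scanned page image the reader actually sees
    words: list[Word] = field(default_factory=list)  # searchable text layer

    def find(self, query: str) -> list[Word]:
        """Return every word on this page equal to the query, ignoring case."""
        q = query.lower()
        return [w for w in self.words if w.text.lower() == q]


page = Page(number=12, image_path="page_0012.tif", words=[
    Word("Typesetting", (100, 200, 310, 240)),
    Word("is", (320, 200, 350, 240)),
    Word("an", (360, 200, 400, 240)),
    Word("art", (410, 200, 470, 240)),
])

# Each hit says where on the page image the matching word sits.
hits = page.find("ART")
for w in hits:
    print(page.number, w.text, w.box)
```

A viewer built on something like this could highlight each hit's box on top of the page image, which is roughly the behaviour described above.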
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.

dansheffler

Re: E-book standard format

Post by dansheffler » 03 Jan 2011, 22:42

I really like searchable PDFs as well. I agree with the argument that the page layout is desirable. Typesetting is an art form developed over hundreds of years. Even the best-edited ebooks usually look shabby in terms of layout. Aside from the aesthetics of it, though, as a scholar it is very important to me to be able to see footnotes on the page where they occur. The linked footnote systems (which are very hard to come by) are clumsy at best. In addition, while the scholarly world may eventually catch up to citing things by location, right now I need to cite by page number. Again, systems which incorporate the original print page number without the page layout are usually very clumsy.

I have never understood the desirability of reflow. Someone went to a whole lot of work to make the page readable. Why ruin it? I understand that larger text is desirable, but the solution to that is simply really gigantic screens (jk).

As a teacher, I find PDFs slightly preferable to DjVu, since everyone knows how to handle a PDF, while there are many students who don't have DjVu reading software. It is very easy right now for me to send out a chapter of reading and have the entire class be on the same page (literally). Students without e-readers (nearly all) can bring print-outs to class.

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: E-book standard format

Post by rob » 03 Jan 2011, 23:46

Just one other comment: I think reflowable layouts will eventually die. They are here for now because (a) most ebook readers aren't large enough right now to handle a real page, and (b) web pages used to have to fit on small screens.

We no longer have small screens for web pages, and web page writing has become closer to desktop publishing than coding raw HTML. How small can you make your browser while viewing this page, while keeping all of the elements visible?

Sure, there can be several different editions of a book, all with a different layout, but that still doesn't change the underlying text which, remember, should be searchable and can be put on a separate layer. There shouldn't be a need, though, to reformat a book on the fly. Your ebook reader should be able to handle an appropriate edition.

There is certainly a place for reflowable text, but that is at the publisher's level, not the reader's level.

Anonymous1

Re: E-book standard format

Post by Anonymous1 » 04 Jan 2011, 00:59

I wish more people used DjVu. I get a huge compression boost from it (as I archive decaying books), but the only downside is that people think Adobe is better. Just because :x

The searchable PDF/DjVu :P option is good. I used it for a while, until I realized how much time I saved by turning off OCR in djvubind. I absolutely abhor books where pages with images and blank pages are not considered when numbering. It really throws everything off.

I suggest overthrowing the whole PDF monopoly and introducing a new eBook format which does everything that is desired of one. Just to be annoying and not conform to higher standards (just like Internet Explorer users).

I'm just ranting now...

strider1551
Posts: 126
Joined: 01 Mar 2010, 11:39
Number of books owned: 0
Location: Ohio, USA

Re: E-book standard format

Post by strider1551 » 04 Jan 2011, 07:21

Well I think everyone could guess what format I would vote for...

I agree with recent comments that a format that preserves the layout is preferred.
ceeann1 wrote:Knowing of course there would be those of us that would prefer to use a different format we could just have a set of format changers as a group if we had somewhere to start from.
I'm reading this requirement as "it should be easy, or at least possible, to convert a standard format ebook to any other format of ebook." If we agree to this requirement, I recommend that the format require lossless compression so that a secondary format can be made with the same quality of input.

Edit: And yes, you can go from DjVu to something else. "ddjvu" can output to pdf, tiff, pbm, pgm, ppm, pnm, or rle.
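
For anyone who wants to script that, a conversion could look roughly like the Python sketch below. It assumes DjVuLibre's `ddjvu` is installed and on the PATH, and that the output type is chosen with its `-format=` option; the helper names `ddjvu_command` and `convert` and the file names are made up for illustration.

```python
import shutil
import subprocess

# Output formats mentioned above for ddjvu.
FORMATS = {"pdf", "tiff", "pbm", "pgm", "ppm", "pnm", "rle"}


def ddjvu_command(src: str, dst: str, fmt: str = "pdf") -> list[str]:
    """Build the ddjvu argument list for converting src to dst."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported output format: {fmt}")
    return ["ddjvu", f"-format={fmt}", src, dst]


def convert(src: str, dst: str, fmt: str = "pdf") -> None:
    """Run ddjvu, failing clearly if it is not installed."""
    if shutil.which("ddjvu") is None:
        raise FileNotFoundError("ddjvu not found on PATH")
    # check=True raises CalledProcessError if ddjvu exits with an error.
    subprocess.run(ddjvu_command(src, dst, fmt), check=True)


# Show the command that would be run, without needing ddjvu installed.
print(" ".join(ddjvu_command("book.djvu", "book.pdf")))
```

Calling `convert("book.djvu", "book.tiff", "tiff")` would target TIFF instead, assuming the input file exists.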

daniel_reetz
Posts: 2776
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: E-book standard format

Post by daniel_reetz » 04 Jan 2011, 09:34

I fully agree with Rob. Screens are not going to have less resolution in the future, but books are going to contain more graphical content. Reflow, as we currently understand it, is a technical hack to overcome the wide variety of screen sizes and aspect ratios out there.

JonEP
Posts: 81
Joined: 19 Apr 2010, 15:09

Re: E-book standard format

Post by JonEP » 04 Jan 2011, 15:50

Regarding @rob's comment:
rob wrote:Someone else here -- I forget who, or I would link -- gave a good argument for PDFs maintaining the artistic intent of the author and publisher.
Indeed, a book is in fact not the product merely of the author, but also of the publisher, editor, typeface designer, book designer, and others -- a whole team who (ideally) work together to produce the entire look and feel of the book. When ebooks convert books into "text" that can be reflowed to fit the needs of the e-reader and its user, they strip away much of that work, and the outcome is a much less satisfying product. The typefaces and page design on a Kindle (not to mention contrast ratio, etc.) are dishearteningly unattractive. This is one of the reasons that I hope future versions of scantailor or similar applications might find a way to preserve page formatting and more accurately preserve the look of the actual book that we are digitizing.

One thing I've noticed recently as I've begun using publisher-released e-books in concert with their actual physical books is that the text in the official PDF versions (those released by publishers, with different ISBNs etc.) often has been subtly "re-flowed", resulting in a line or two of text appearing on a different page in the electronic version as compared with the original. This is a bummer for me as an academic, as I'd like to be able to use page numbers accurately when I'm footnoting, and it's getting more complicated as different versions start piling up.
