Search found 6 matches

by Mondotofu
17 Sep 2010, 21:27
Forum: Chat
Topic: What's broken about eBooks? What would un-break them?
Replies: 43
Views: 42848

Re: What's broken about eBooks? What would un-break them?

If (as Wikipedia http://en.wikipedia.org/wiki/Extensible_Metadata_Platform says) XMP is most commonly serialized and stored using a subset of the W3C Resource Description Framework (RDF), which is in turn expressed in XML, then this could be relevant to the Semantic Web, as RDF http://www.w3.org/...
by Mondotofu
16 Sep 2010, 23:56
Forum: Chat
Topic: What's broken about eBooks? What would un-break them?
Replies: 43
Views: 42848

Re: What's broken about eBooks? What would un-break them?

Well, I hate replying to myself, but I meant to add that there's an ambitious project to build a Semantic Web ( http://en.wikipedia.org/wiki/Semantic_Web ) that would help us understand the context and content of digital content. An important prerequisite to the Semantic Web is to make digital content and...
by Mondotofu
16 Sep 2010, 23:31
Forum: Chat
Topic: What's broken about eBooks? What would un-break them?
Replies: 43
Views: 42848

Re: What's broken about eBooks? What would un-break them?

Metadata! To archive and distribute books, good metadata will become more essential. Agreed. Metadata will help us sift through book collections. Some electronic documents are rife with metadata, for instance, emails. There's the usual stuff: sender, recipients, subject, and body, but emails can co...
by Mondotofu
14 Sep 2010, 23:21
Forum: Chat
Topic: What's broken about eBooks? What would un-break them?
Replies: 43
Views: 42848

Re: What's broken about eBooks? What would un-break them?

@ univershul -- Not all PDFs are generated the same way. With Apple and now with some desktop Linuxes like Ubuntu, you can Print to PDF. It just means that you get an image of the content -- in most cases it's another rendering of the content to a bitmap and then to a page which can be far away from it...
by Mondotofu
11 Sep 2010, 22:44
Forum: Show and Tell / Book Projects
Topic: DIY scanner and Scan Tailor processed books on Google Books
Replies: 16
Views: 75245

Tesseract without compilation

There are three decent (depending on your needs and skills) options for open source OCR right now: Tesseract, Ocropus, and Cuneiform. Tesseract - http://code.google.com/p/tesseract-ocr/ - the development version has to be built from source in order to get page layout analysis. See http://code.goo...
by Mondotofu
11 Sep 2010, 16:41
Forum: Introductions and connections
Topic: Post something about yourself here (The Hello Thread)
Replies: 441
Views: 657915

Re: Post something about yourself here (The Hello Thread)

Hello Thread -- Thirty years ago I had a student job at the Georgia Newspapers Project ( http://www.libs.uga.edu/gnp/ ) where I prepared old and crumbling newspapers for the microfilm cameras by taping up tears and flattening folds and creases with a hot iron. I wondered why they couldn't be OCR-ed,...