Usually, when I have a scanned book with text (and perhaps some black-and-white diagrams or illustrations), I use scantailor to split pages etc. and then use djvubind (together with tesseract or cuneiform) to produce a small, good-looking and searchable djvu file.
However, with magazines or books that contain lots of pictures (some of which reach the page margin), this does not produce what I want. Here are the problems I run into:
- scantailor sometimes has trouble recognizing the content if there are pictures which reach the page margin, or if there are printed hints or annotations in the page margins. Then I have to adjust everything manually.
- I tried scantailor in mixed mode to separate text and images, but then I have to define most pictures manually, since the picture detection algorithm does not seem very mature.
- But how do I proceed afterwards? djvubind would "destroy" the nice color or grayscale pictures.
So what's the best way (detailed steps on Linux) to process magazines or books with lots of pictures to get a searchable pdf or djvu in which the pictures keep good color or grayscale quality and the file size is not too big?
Best way to proceed with scanned magazines etc
Moderator: peterZ
Re: Best way to proceed with scanned magazines etc
I just realized that djvubind can also handle tif files produced in mixed mode while preserving their colors. Nevertheless, what's your favourite way to process magazines or books with lots of pictures?
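
For reference, the binding step I mean looks roughly like the sketch below. It is only a minimal outline, not a recommended workflow: it assumes the scantailor output pages are tif files in a single directory (I use `out` here, which is just my assumption about the layout), that djvubind is on the PATH, and that the OCR engine is chosen in djvubind's own configuration. The helper names `find_pages`, `build_command` and `bind` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def find_pages(out_dir):
    """Return the TIFF pages in out_dir, sorted by file name."""
    pages = [p for p in Path(out_dir).iterdir()
             if p.suffix.lower() in (".tif", ".tiff")]
    return sorted(pages, key=lambda p: p.name)


def build_command(out_dir):
    """Build the djvubind call for a directory of page images.

    Only the plain "djvubind <directory>" form is used here; any extra
    options should be checked against `djvubind --help` first.
    """
    return ["djvubind", str(out_dir)]


def bind(out_dir):
    """Run djvubind on out_dir, failing early if something is missing."""
    if not find_pages(out_dir):
        raise FileNotFoundError(f"no TIFF pages found in {out_dir}")
    if shutil.which("djvubind") is None:
        raise RuntimeError("djvubind is not installed or not on PATH")
    # check=True raises CalledProcessError if djvubind exits with an error.
    subprocess.run(build_command(out_dir), check=True)
```

Calling `bind("out")` from the project directory would then hand the mixed-mode pages to djvubind. Checking for the pages and the executable first is only there so that a typo in the directory name fails with a clear message.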