grisard wrote: There is a (positive) review of this scanner on this blog: http://www.documentsnap.com/scansnap-sv600-review. But I am not so sure whether this is an impartial opinion because the website http://www.documentsnap.com in general rather promotes Fujitsu ScanSnap scanners over any other producers (the site perhaps might be sponsored by Fujitsu).
I think the http://www.documentsnap.com website can safely be assumed to be a Fujitsu marketing site; the 'snap' in the URL is a giveaway, as their scanners usually have 'snap' in their names.
grisard wrote: There is also a 3 page test scan from a book on this blog (direct link to the test scan: http://cache.documentsnap.com/files/ear ... e-ages.pdf). Looking at this test scan leads me to assume that this scanner does not offer any dewarping.
This scanner does seem to have an inherent limitation compared with both flatbed scanners and the typical camera-based scanners on the forum. On a flatbed scanner a book can be pressed fairly firmly against the scanner glass to minimise warping, and on camera-based scanners a sheet of glass or plexiglass is usually pressed against the open book to flatten the pages. This scanner, when used as shown with two fingers holding the pages down, lacks an effective means of flattening pages when scanning books.
But the scans obtained could be post-processed in the same way as other camera-based scans: the starting image quality is high, although dewarping might be more challenging.
grisard wrote: The OCR result is questionable: word recognition seems to be o.k. but when you copy the hidden text from this pdf into Word, there is a paragraph break after each single word, i.e. each word has its own line! So from this roughly 4 pages of original text you get 21 pages in Word. I think this OCR result is completely useless.
I selected a small amount of text and pasted it into Word, and there was a paragraph break at the end of each *line* of the text in the image. For some reason this seems to be common when extracting text from PDFs, whichever software was used to create the file. Looking at the PDF document properties (File | Properties...), the software used is in fact licensed from Adobe.
When copying text from PDFs there is a reasonably simple solution: the MS Word 'Find and Replace' facility can be used to replace the [usually hidden] unwanted paragraph marks (entered as ^p in the 'Find what' box) with spaces, so that the text flows normally between the margins. It should be possible to record a macro and assign it to a keyboard shortcut if you do this regularly. It is, however, still necessary to reinsert paragraph breaks where they are present in the original image, which could be tedious for a long document.
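For anyone who would rather do the same clean-up outside Word, here is a rough Python sketch of the idea. It is only an illustration, not something the scanner software provides: the function name unwrap_lines is made up, and it assumes the copied text marks real paragraph breaks with a blank line. If the PDF text has no blank lines at all, everything ends up as one paragraph and the breaks still have to be put back by hand, as noted above. Words hyphenated across line ends are not rejoined.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines into flowing paragraphs.

    Assumption (hypothetical helper, not part of any scanner software):
    a blank line marks a genuine paragraph break; every other line break
    is an unwanted one and is replaced by a single space.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split on blank lines (optionally containing whitespace).
    paragraphs = re.split(r"\n\s*\n", text)

    joined = []
    for para in paragraphs:
        # split() with no arguments breaks on any run of whitespace,
        # including the unwanted line breaks, and drops empty pieces.
        words = para.split()
        if words:
            joined.append(" ".join(words))

    # Keep one blank line between the surviving paragraphs.
    return "\n\n".join(joined)


if __name__ == "__main__":
    sample = "each\nword\non\nits\nown\nline\n\nsecond\nparagraph\n"
    print(unwrap_lines(sample))
```

The same function copes with both cases discussed here, a break after every line and a break after every word, since it simply treats any single line break as a space.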