Hello, does anyone know of a trick or method for easy proofreading of OCR text, possibly with ImageMagick or some other image manipulation tool? The option in Adobe Acrobat X (Mac) to review OCR suspects one by one is slow.
This is my manual process for proofreading (I am looking for a tool to automate it).
Page-by-page process:
1 Make the PDF page double width, so I can view the "text" and the "image of text" side by side.
(Note: none of the OCR engines I've tested actually use Acrobat's "Layer" feature (e.g. a text layer and an image layer); instead, both text and image are in the same single layer.)
2 Using the Adobe Acrobat X plugin Enfocus PitStop, select the page contents and change FILL to ON (the text is invisible/hidden/transparent because the font fill is OFF).
(This gives a glassy two-layer look, but they are not really layers.)
3 Unselect the contents, then move the image/bitmap to the right, next to the text.
Then I edit the OCR'd text as follows:
1 Run the spellchecker.
2 Remove scannos (common scanning errors, e.g. [ for J, 3 for S, etc.).
3 Fix text that was misread because of my underlining in the book.
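The scanno pass in step 2 is the easiest part to script. Below is a minimal sketch, assuming the OCR text has already been exported to plain text. The SCANNOS table and the suggest_fixes function are hypothetical names made up for illustration; only the two substitutions mentioned above ([ for J, 3 for S) are in the table. It only proposes corrections for review, since a token such as "3am" may be correct as it stands.

```python
import re

# Hypothetical confusion table: character seen in the OCR output -> likely intended character.
# Only "[" -> "J" and "3" -> "S" come from the post; extend it for your own scans.
SCANNOS = {"[": "J", "3": "S"}

# A token is any run of non-whitespace characters.
TOKEN = re.compile(r"\S+")


def suggest_fixes(line):
    """Return (original_token, suggested_token) pairs for suspicious tokens."""
    suggestions = []
    for token in TOKEN.findall(line):
        has_letter = any(ch.isalpha() for ch in token)
        has_scanno = any(ch in SCANNOS for ch in token)
        # Skip tokens with no letters at all, so plain numbers such as "1933" are left alone.
        if not (has_letter and has_scanno):
            continue
        fixed = "".join(SCANNOS.get(ch, ch) for ch in token)
        suggestions.append((token, fixed))
    return suggestions


if __name__ == "__main__":
    for original, proposed in suggest_fixes("[ohn met 3am in 1933"):
        print(f"{original!r} -> {proposed!r}")
```

Each suggestion still needs a human check against the page image; the script only narrows down where to look.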
NOTE: this method is for the searchable image (EXACT) option in the Adobe Acrobat X (Mac) OCR engine, or for ABBYY or Readiris.
It does not work for the AAX ClearScan option, as ClearScan is a different OCR method (it creates a new Type 3 font, so editing is crazy, crazy territory, because fonts in PDF land are not a markup language; they are blobs on a page).
4 Then I either delete the image or return the page to its original state (i.e. undo steps 3 and 2).
5 Return the page to its original size.
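For anyone trying to automate the page steps (1, 3 and 5), the geometry itself is simple. The sketch below only does the arithmetic, assuming the page box is given as (x0, y0, x1, y1) in PDF points; it does not touch a PDF file, and the function names are hypothetical. Applying the numbers would still need a PDF tool or library of your choice.

```python
def side_by_side_layout(page_box):
    """Given a page box (x0, y0, x1, y1), return the double-width box
    and the (dx, dy) offset that moves the image into the new right half."""
    x0, y0, x1, y1 = page_box
    width = x1 - x0
    widened_box = (x0, y0, x1 + width, y1)  # step 1: double the page width
    image_shift = (width, 0.0)              # step 3: slide the image one page width right
    return widened_box, image_shift


def restore_layout(widened_box):
    """Step 5: halve the width again to get back the original page box."""
    x0, y0, x1, y1 = widened_box
    width = (x1 - x0) / 2
    return (x0, y0, x0 + width, y1)


if __name__ == "__main__":
    letter = (0, 0, 612, 792)  # US Letter in points, used only as an example
    widened, shift = side_by_side_layout(letter)
    print(widened, shift)
    print(restore_layout(widened))
```

The text stays where it is in the left half, so only the image needs the offset.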
Proofreading OCR text (text under the image)
Moderator: peterZ
Re: Proofreading OCR text (text under the image)
Hi,
My first thought was: have a closer look at fire-text.
I saw that Firefox plugin a while ago, and perhaps it is partly what you are looking for. If I understood the description correctly, it lets you keep the (OCR'd) text and (scanned) image files in two directories, load both into your browser, and then edit the text files while comparing text and image.
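The two-directory arrangement can be sketched roughly as follows. This is not the plugin's code, just an illustration of the idea, and it assumes that a text file and its page image share a base name (e.g. page001.txt and page001.png). The function name pair_pages and the list of image suffixes are my own assumptions.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the scanner produced.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def pair_pages(text_dir, image_dir):
    """Return (text_path, image_path) pairs whose file names share a stem,
    sorted by text file name. Text files without an image are skipped."""
    images = {
        p.stem: p
        for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    pairs = []
    for text_path in sorted(Path(text_dir).glob("*.txt")):
        image_path = images.get(text_path.stem)
        if image_path is not None:
            pairs.append((text_path, image_path))
    return pairs
```

Each pair can then be opened side by side in whatever viewer and editor you prefer.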
Cheers,
Marcus.
Re: Proofreading OCR text (text under the image)
Thank you, Marcus.
The text and image are in the same file (PDF).
The text is underneath the image.
I use PDF to retain all the formatting, and ultimately I want a PDF in which to read my book.
---
I will check the link out, as I do use Firefox. Thank you.