
ABBYY 12 - why are end-of-line dashes an odd character?

glenleslie
Posts: 17
Joined: 13 Aug 2012, 09:08
E-book readers owned: Kindle - multiple platforms
Number of books owned: 1000
Country: United States


Post by glenleslie » 13 Mar 2021, 15:16

I've noticed over many scans that dashes at the end of lines turn into an odd character:

¬


Character 172 (the "not sign"; strictly it sits in the Latin-1 range rather than 7-bit ASCII) is substituted for dashes ... Is there a way to tell ABBYY to always use a specific character to replace what it thinks it found?

Obviously it's something of a perfectionist's problem. The only time you see this character is if you use a PDF reader to reflow the text or if you export the project to a text format. I wondered whether anyone knows a quick way to address this.

Attachment: orwell_review_ocr_issue.jpg (273.4 KiB)

cday
Posts: 300
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: ABBYY 12 - why are end-of-line dashes an odd character?

Post by cday » 13 Mar 2021, 16:03

I vaguely recognise the problem but I spend my time in Linux now (since Windows 10... ;)) and rarely have a need to fire up my Windows 7 computer with FineReader on it.

I fully understand perfectionism! I think the symbol is a specialised typographic symbol, probably indicating that the text should be a single word without a space if the text is re-flowed to fit between different margins. I haven't been able to find a name for it, though.

I presume those symbols are present in the final output file when it is viewed?

Does FineReader possibly have a 'find and replace' facility you could use to remove them? I would have to dig out the PDF guides to answer that. Or are there perhaps some configuration options?

One would think that your issue must be a common one, as it is not at all evident why any normal user would want those symbols. Possibly there is an answer online, or maybe ABBYY has a forum of some kind.
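
If nothing turns up inside FineReader itself, one possible workaround is to clean the exported text afterwards. The following is only a minimal Python sketch, not an ABBYY feature: it assumes the export has been read in as text, and that the ¬ character (code 172) appears only as the end-of-line marker described above. The function name `clean_line_end_markers` and its `rejoin` option are made up for illustration.

```python
import re

NOT_SIGN = "\u00ac"  # the "¬" character (code 172) seen in the exported text


def clean_line_end_markers(text: str, rejoin: bool = True) -> str:
    """Handle "¬" markers left where a word was split at a line end.

    rejoin=True  : drop the marker and the following line break, so the two
                   halves of the word are joined ("exam¬\\nple" -> "example").
    rejoin=False : keep the line breaks and swap every marker for a plain hyphen.
    """
    if rejoin:
        # Marker, optional trailing spaces or tabs, then an LF or CRLF line break.
        return re.sub(NOT_SIGN + r"[ \t]*\r?\n", "", text)
    return text.replace(NOT_SIGN, "-")
```

For example, `clean_line_end_markers("exam¬\nple text")` gives `"example text"`. One caveat: if the marker is also used where a genuinely hyphenated word (such as "well-known") was split across lines, rejoining would lose that hyphen, so the result still needs a quick check.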

BruceG
Posts: 72
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: ABBYY 12 - why are end-of-line dashes an odd character?

Post by BruceG » 13 Mar 2021, 23:07

This is something I also have noticed. It is a symbol that I have not been able to reproduce.

I just looked at two recently scanned documents. In a newspaper from the 1940s the hyphen was recognized as in your document; in a magazine from 2000 it was recognized as a plain -. What difference does this make to the output file? I often save magazines both with text on top of the image and with text beneath the image (as the text is edited). Newspapers are only saved with text under the image (as the text is not edited - I only have one life).

Both were saved with text on top to check what happened. A paragraph from each of the ABBYY files was copied and pasted into Word to see whether it matched the PDF output.
What I found was that both symbols in ABBYY produced a - hyphen in the output PDF and when copied and pasted into Word. When the formatting changed in Word, the hyphen disappeared if it was no longer at the end of a line.

One problem I would like to find an answer to is how to fix margins that are not straight.

BillGill
Posts: 122
Joined: 18 Dec 2016, 17:13
E-book readers owned: Calibre, FBReader
Number of books owned: 7000
Country: USA

Re: ABBYY 12 - why are end-of-line dashes an odd character?

Post by BillGill » 14 Mar 2021, 10:27

I have FineReader 14. I have it output the text file to Word 16 and those marks show up, but only if I have Word set to show hidden characters: carriage returns, page breaks, etc. When I am through proofing the text I convert it to the EPUB format. At that point those characters, whatever they are, disappear, so they aren't a problem for me.

Bill


Re: ABBYY 12 - why are end-of-line dashes an odd character?

Post by BillGill » 14 Mar 2021, 10:30

Now I think of what I should have added.

My biggest problem is that em dashes are detected as simple dashes. I have to watch for those all the way through.

Bill
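
Watching for those by eye could be helped along with a small script run over the exported text. This is only a rough Python sketch under assumptions of my own: that a misread em dash usually shows up either as a hyphen with a space on both sides or as two or more hyphens in a row. The name `find_dash_candidates` is hypothetical, and a dash that was recognized with no spaces around it cannot be told apart from an ordinary hyphenated word this way.

```python
import re

# Heuristic patterns (assumptions, not anything FineReader documents):
#   - a single hyphen with whitespace on both sides, e.g. "paused - then"
#   - two or more hyphens in a row, e.g. "Wait--what?"
CANDIDATE = re.compile(r"(?<=\s)-(?=\s)|-{2,}")


def find_dash_candidates(text):
    """Return (line_number, line) pairs for lines that may hide an em dash."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if CANDIDATE.search(line):
            hits.append((number, line))
    return hits
```

It only lists the lines to look at; whether each one should really become an em dash is still a proofreading decision.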
