scanning & processing 78 RPM record labels

Post by iam2sam » 03 Jan 2019, 17:36

Hello. In mid-2017, I built Jonathan Duerig's Archivist Quill Scanner. At the time I was very focused on digitizing a high school yearbook for a 50-year reunion. I was on a strict deadline, and I sort of forced my way through the project, but the results were very good. I moved on to other (non-scanning) projects in the interim, but I am about to begin scanning again.

I have a collection of about 1,000 78 RPM records that I inherited. I intend to digitize some of those. I first want to check with the collections at archive.org - no point in spending my time if there is a high-quality digital copy of the identical recording already available for download. I want to use the book scanner to help me check for existing copies by first scanning and (hopefully) digitizing the label information, so that I can produce a formatted list to compare to what is listed at archive.org.

The discs are all black in color, and fairly reflective. Most of the labels are dark in color, with lighter type. Obviously, I am really only interested in capturing the circular label area in the center of the disc. There is some very small "circular" type around the label circumference on some of the labels, but I am only interested in processing the "linear" content. I'm not quite certain how unique these characteristics are in terms of scanning projects.

I'm looking for advice on what obstacles I might encounter in the scanning effort, and any suggestions on post-processing software, or steps that might save me time. Thanks!

Re: scanning & processing 78 RPM record labels

Post by cday » 05 Jan 2019, 16:43

No-one else has replied yet so perhaps I can give some thoughts as a start:

o The Archivist Quill Scanner, which you already have, has two features intended for book scanning that are not really relevant to your project. First, it is designed to flatten the pages of a book, but the records you need to scan are inherently flat. Second, it automatically places two pages into position in one operation, but you need to place each record individually by hand. It might still be the way to go, but a simple copy stand could also be used if it turns out to be a better solution overall.

o The center hole in the records could be used to quickly and accurately locate the records in a consistent position, if you can insert some kind of locating pin of about the correct diameter into the scanner bed. That would probably be possible without significantly damaging your Archivist Quill Scanner, although possibly more easily done on a copy stand as the locating pin can extend a little above the record. You might also consider easy ways of rotating each record so that the label text is about horizontal, the accuracy required depending on the actual post-processing needs.

o As you only need to image the center label, and reflections from the black surface of the records might be an issue, you could possibly arrange some kind of mask sheet, with a hole matching the label diameter, between the camera and the record being imaged. Alternatively, and possibly more easily, the label could be isolated using a software mask in an image editor, and batch processing the images should be easy enough.

o There will also be plenty of scope for preprocessing your record images to enhance them and, if needed, convert white text to black.

o If you don't already have OCR software to extract editable text from the label image, the Abbyy Screenshot Reader should be sufficient and is very inexpensive. You should be able to just step through the image files you create, select the area to be converted to editable text, and then paste the result into a table in a word processor. That may depend on how much of the label text you need to archive. Once in a table, the text can then easily be converted into a consistent typeface and point size, if necessary.

Alternatively, if you can touch type, might it actually be easier overall to read the record labels and type the text to be archived directly into a table?
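
For anyone who wants to script the software mask and the white-to-black conversion rather than do them in an image editor, here is a rough sketch of the idea in plain Python. It is only an illustration under some assumptions: the image is treated as a grid of 0-255 grayscale values, the spindle hole position and label radius in pixels are assumed to be known and constant (for example because a locating pin fixes the record position), and the function name isolate_label is made up for this example. In practice you would load and save real images with an image library and apply the same logic there.

```python
def isolate_label(pixels, center, radius, invert=True, fill=255):
    """Keep only the circular label area of a grayscale image.

    pixels: list of rows, each a list of 0-255 grayscale values.
    center: (cx, cy) of the spindle hole in pixel coordinates
            (assumed known, e.g. fixed by a locating pin).
    radius: label radius in pixels (assumed constant across a batch).
    invert: if True, turn light-on-dark label text into dark-on-light.
    fill:   value written outside the label (255 = white).
    """
    cx, cy = center
    r2 = radius * radius
    out = []
    for y, row in enumerate(pixels):
        new_row = []
        for x, value in enumerate(row):
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= r2
            if not inside:
                new_row.append(fill)         # blank out the reflective vinyl
            elif invert:
                new_row.append(255 - value)  # light text becomes dark text
            else:
                new_row.append(value)
        out.append(new_row)
    return out


# Tiny 5x5 example: a dark label (40) with one light "text" pixel (220)
# at the center, masked to a radius of 1 pixel around (2, 2).
image = [[40] * 5 for _ in range(5)]
image[2][2] = 220
result = isolate_label(image, center=(2, 2), radius=1)
```

Blanking everything outside the label circle also removes most of the reflections from the vinyl, which should make life easier for whatever OCR step follows.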

Re: scanning & processing 78 RPM record labels

Post by dpc » 05 Jan 2019, 19:54

cday is right. It's worth determining if you can type the info you need from the label rather than scan the entire lot. If you were going to save the images of the record label for some reason (e.g. to display onscreen in a jukebox app) then it would probably be worth it to do the image mask, color clean-up/conversion, and OCR, but I think all of that is overkill for something that you could read and type in under 10 seconds per disc. At that rate, 1000 records is under three hours of actual typing, so you should be able to do the lot in a day.

