finishing project, keep old files

joseph73
Posts: 17
Joined: 10 Jan 2013, 22:02
Number of books owned: 1000
Country: USA

finishing project, keep old files

Post by joseph73 » 30 Jan 2016, 23:48

Hi,
When I'm finally finished processing all the images (jpegs) and have created one PDF file, do you still keep the old TIFF and original JPEG images?
They're taking up a ton of storage space. Would I ever need them again, having the original full quality files, say if a better OCR program came out?
I'd hate to have to capture them all over again. I'm thinking of archiving them to BD-Rs, but that would take effort.

cday
Posts: 224
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: finishing project, keep old files

Post by cday » 31 Jan 2016, 08:17

joseph73 wrote:When I'm finally finished processing all the images (jpegs) and have created one PDF file, do you still keep the old TIFF and original JPEG images?
They're taking up a ton of storage space. Would I ever need them again, having the original full quality files, say if a better OCR program came out?
As you say, better OCR software can be expected in the future. You might also wish to use Adobe ClearType, or any similar future technology, to produce output that uses vector text rather than bitmap text, potentially giving both better image quality and a smaller file size. To use either, or any other future technology, you would clearly need the original images to process.

Good quality JPEG images can reasonably be expected to provide suitable input for any future processing. For colour or greyscale images, JPEG files with suitable compression are normally much smaller than TIFF files; for black and white images (1-bit depth files, rather than black and white pages saved in colour or greyscale), TIFF files with suitable compression can be very small. So retaining suitably compressed JPEG files should provide satisfactory input for any foreseeable future need and also be reasonably economical on space, especially as the cost of storage continues to decline rapidly.

As the JPEG format is, for technical reasons, less suited to images with sharp edges such as text than to photographs, the file size of a page of text compressed to a level that still provides a good quality image might be of the order of a megabyte or two. The actual size depends on the complexity of the page and on a subjective assessment of the quality of the image displayed. Often original JPEG files can be recompressed using a lower quality setting than was used when the file was created, with little or no visible loss of image quality, resulting in a useful reduction in file size. However, JPEG is a 'lossy' format, so it is better to compress the original file than to recompress a file that has already been repeatedly compressed.

If you wish to consider retaining the JPEG files and space is a consideration, it might pay to open sample files in an image editor and then experiment with compressing them at lower, possibly substantially lower, 'Quality' settings and then compare the resulting images on the screen. As screen quality is likely to increase in the future and the cost of memory is declining it would probably be best not to be too aggressive, though. If recompression at a lower quality setting produces a worthwhile reduction in file size, all the images for a book, or multiple books, can easily be batch compressed using any of a number of freeware image editing programs. But always test with copies of your files.
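
The experiment described above can be sketched in a few lines of Python. This is only a rough illustration, assuming the third-party Pillow library is installed; the function names, the list of quality settings and the temporary output folder are all made up for the example. It writes recompressed copies of one sample JPEG at several quality settings and reports the size of each, so the copies can then be compared by eye on screen.

```python
import os
import tempfile


def percent_saved(original_bytes, new_bytes):
    """Return the size reduction as a percentage of the original size."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (original_bytes - new_bytes) / original_bytes


def trial_recompress(src_path, qualities=(90, 80, 70, 60)):
    """Save recompressed COPIES of one JPEG at several quality settings.

    Returns a list of (quality, output_path, size_in_bytes, percent_saved).
    The source file is only read, never overwritten.
    """
    # Pillow is a third-party library (an assumption of this sketch); it is
    # imported here so that percent_saved() can be used without it.
    from PIL import Image

    original_bytes = os.path.getsize(src_path)
    out_dir = tempfile.mkdtemp(prefix="jpeg_trial_")  # hypothetical scratch folder
    results = []
    with Image.open(src_path) as img:
        for quality in qualities:
            out_path = os.path.join(out_dir, f"q{quality}.jpg")
            # Decoding and re-encoding is lossy: this adds one more
            # generation of JPEG compression to the copy.
            img.save(out_path, "JPEG", quality=quality, optimize=True)
            new_bytes = os.path.getsize(out_path)
            results.append(
                (quality, out_path, new_bytes, percent_saved(original_bytes, new_bytes))
            )
    return results
```

Calling `trial_recompress("sample_page_copy.jpg")` (a hypothetical file name) returns one entry per quality setting. The numbers only show the space saved; whether the quality is still acceptable has to be judged by viewing the output files side by side with the original.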

Re: finishing project, keep old files

Post by joseph73 » 31 Jan 2016, 16:50

I looked at recompressing the JPEGs but think that might defeat the purpose of having them. These images were taken with a high-resolution camera (think something like a Nikon D810, though not as high as the 50 MP Canon), around 40 megapixels. The files are large, even as JPEGs. I considered compressing the files with JPEG 2000 but cannot find a good way to batch process them. Using Adobe Acrobat Pro, you can combine the original images into another archive PDF and specify JPEG 2000 compression with no downsizing. This is very slow even on a fast machine. It might be what I end up doing. The file is about 40% of the size of the JPEGs and there is little image quality loss. In the end I might just end up with a Blu-ray burner and a bunch of discs, hoping that the discs last for 4-5 years. It's about 4 TB worth of files, so it's going to be a lot of discs and time; each JPEG is anywhere from about 4 to 10 MB. Why so big? These books have large pages, images, and graphics with lots of tiny detail.
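
For a rough sense of scale, the figures in this post can be turned into a disc count. The sketch below is back-of-envelope arithmetic only. It assumes single-layer BD-R discs at their nominal 25 GB capacity, decimal units (1 TB = 10^12 bytes), and that the 40% JPEG 2000 figure would hold across the whole archive; real discs hold slightly less once file system overhead is counted. The function name is made up for the example.

```python
TB = 10**12  # decimal terabyte
GB = 10**9   # decimal gigabyte
MB = 10**6   # decimal megabyte


def discs_needed(total_bytes, disc_bytes):
    """Number of discs needed to hold total_bytes (integer ceiling division)."""
    return -(-total_bytes // disc_bytes)


archive_bytes = 4 * TB   # "about 4 TB worth of files" (from the post)
bd_r_bytes = 25 * GB     # assumed: nominal single-layer BD-R capacity

# Original JPEGs burned as-is
print(discs_needed(archive_bytes, bd_r_bytes))             # 160

# If the whole archive shrank to about 40% of its size (assumption)
print(discs_needed(archive_bytes * 40 // 100, bd_r_bytes))  # 64

# Approximate number of image files at 10 MB and at 4 MB each
print(archive_bytes // (10 * MB), archive_bytes // (4 * MB))  # 400000 1000000
```

So on these assumptions the original JPEGs would fill about 160 single-layer discs, or about 64 at the 40% figure.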

Re: finishing project, keep old files

Post by cday » 31 Jan 2016, 18:48

If your PDF files with JPEG 2000 compression are around 40% of the size of the JPEGs from which they were created, that is more of a reduction than I would expect from what I have read in the past and from my own very limited tests. So if there is no visible loss of quality, it might still be worth at least doing some tests with recompressing the original JPEGs at a slightly lower quality setting. I wouldn't be surprised if you could at least halve the file size, but the only way to be sure is to run some tests.

I should have mentioned originally that to some extent at least your PDF files could be repurposed in the future using the images in the existing files: if necessary those images can be easily extracted as JPEGs for reuse, and some operations such as OCR'ing with newer software could likely be done directly on the existing PDF files, although one or two current applications (including I believe Adobe Acrobat) for some reason decline to OCR a file that is already searchable. A possible limitation, however, is that depending on how the PDF files were created, the images in the file might be inferior to the original JPEGs from which the file was created, if higher compression, or even down-sampling, was used. The interface in software used to create PDF files often doesn't give a very clear indication of the exact process that will be used.

Re: finishing project, keep old files

Post by joseph73 » 31 Jan 2016, 20:08

I'm using lossy compression for the JPEG 2000, about 80%. There is some very, very minor softening of the edges. At very high res it becomes pixelated, whereas the original doesn't. It's nice to have one big PDF file. In the optimize setting you can control the downsizing, amount of compression, etc. Using the normal cleartype text?? you cannot. I'm probably going to just buy a few hundred LTH BD-Rs and burn the original JPEGs to disc when I'm sitting at the computer anyway. Even if the discs last only a couple of years, that will be enough. I will check the quality of the discs before buying a lot. By that time the M-Discs will have come down in price. My thinking is that two unreliable storage systems are better than one. When a hard drive fails, it takes everything with it. When optical discs fail, they fail piecemeal.

Re: finishing project, keep old files

Post by cday » 01 Feb 2016, 07:40

joseph73 wrote:I'm using lossy compression for the jpeg2000, about 80%. There is some very very minor softening of the edges. At very high res it becomes pixelated whereas the original isn't. It's nice to have one big pdf file. In the optimize setting you can control the downsizing, amount of compression, etc. ... I'm probably going to just buy a few hundred LTH bd-r's and burn them to disc as original jpegs when I'm sitting at the computer anyway...
Burning the original jpegs to discs is certainly the simplest backup solution.
joseph73 wrote:Using the normal cleartype text?? you cannot.
Whoops, I meant Adobe 'ClearScan' when I wrote in my first post about possible future processing options that someone might want to keep open. If you are not familiar with ClearScan it is something you could possibly explore, with its potential to provide both high quality searchable vector text and greatly reduced file sizes, at least in favourable circumstances, although of course it would take time and you would need to proceed with caution. Given the file sizes of your images, you would probably need to create the new output PDF in sections and then combine them later.

Re: finishing project, keep old files

Post by joseph73 » 01 Feb 2016, 20:27

I've used the ClearScan text, or whatever they call it now. It is nice, looks great, and does reduce file size. The Adobe OCR layer isn't as accurate as some other programs. Say in ten years an incredible program comes out that needs original JPEG quality files.
