Instructions / Documentation for BSW

Discussion about Steve DeVore's Book Scan Wizard, a power-user package to automate scan processing.

Moderator: peterZ

JonEP
Posts: 81
Joined: 19 Apr 2010, 15:09

Instructions / Documentation for BSW

Post by JonEP »

BSW looks like it will be incredibly useful. However, the interface seems a bit daunting to me. As it has now undergone several revisions, and has numerous features and commands, and as there are numerous threads on the forum dedicated to various questions and issues, I am wondering if there exists a good starting point for the average end user to try to figure out what BSW is and how to use it. So far, I don't see a wiki or documentation available (or have I just missed it?). Is the forum the best bet, currently, for trying to figure out how to make use of BSW?

Thanks.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Instructions / Documentation for BSW

Post by daniel_reetz »

I agree that this is a need. Someone (perhaps me) really needs to make a BSW introduction video. Steve has made quite an effort toward support and documentation, but an intro video would still help a lot. Here's his main post on this topic: http://www.diybookscanner.org/forum/vie ... ?f=3&t=839

Now that I'm back in the US, I plan to spend some time on this. Here's hoping that works out. :)
steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN

Re: Instructions / Documentation for BSW

Post by steve1066d »

It makes sense for me to do the video. I think I'll have some time on Friday to do it.
Steve Devore
BookScanWizard, a flexible book post-processor.
ibr4him
Posts: 102
Joined: 18 Oct 2010, 10:36

Re: Instructions / Documentation for BSW

Post by ibr4him »

+1.

I tried several times but can't understand a thing! A video would be a great help.
steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN

Re: Instructions / Documentation for BSW

Post by steve1066d »

I ended up being busy with other things over the weekend, but a video is on my list to do.

In the meantime, you could try following the example on the wiki. It might be enough to get you started.
seasalt

Re: Instructions / Documentation for BSW

Post by seasalt »

Hello - I am new to this forum, new to book scanning, new to Acrobat, not very technical, and very new to posting in forums, so apologies ahead if I use the wrong language, place, or style.

I do love books though, so I am very appreciative of any help anyone wishes to send my way.

I have read both apps sections on the 2 tools (BSW and ST) plus the wiki, and have questions. I am not sure which area is best, so I put this under instructions as I could not find the answers anywhere. I have got ST working but am struggling with BSW. I would like to use BSW as many of my images require the same processing before I OCR and recreate the PDF.

So I am delighted to find 2 tools that can help simple people like me. I have about 500 non-fiction books to scan in, and I want to create SMALL-sized (less than 10 MB), magnificent, searchable, bookmarked PDFs, not average ones. The IMPRINT of the book is important to me: page numbering, table of contents, etc.

workspace:
Macbook intel - 10.6X Snow L
Adobe Acrobat Prof 10.x and ABBYY Express for MAC for post processing. Also new, Graphic Converter
Source of book page is 1 of these 3 options:
- flat bed scanner, 300 dpi and can scan to TIFF, PNG, JPEG, JPEG2000 or pdf
- djvu extracted to PDF or TIFF document (using djvulibre/extract tool)
- (adobe digital editions) (ADE) and use print function to produce screen image PDF, which then I extract to TIFF

I am stumped as I don't get some basics: COMPRESSION (what sets it / how to), IMAGE TYPES (text layers vs. images, and the different types), and the best workflow sequence.

specific questions (tell me if these should be in separate posts)

1- I cannot get BSW to install on my MacBook. I can use the Java version but I am not online often. I have downloaded it successfully and now I have a bunch of scripts. install.txt tells me to do stuff with Java, but I found that Java comes with SL 10.6.x (thanks to anoymous1's post directing me to http://javatester.org/version.html to determine the version). I also have the Firefox Java plugin.
1.1 Can you direct me to step by step instructions please to install - and show me how I test if I have these in my system
per install.txt
Java Advanced Imaging (JAI) 1.1.3 http://java.sun.com/products/java-media ... 1_1_3.html
JAI ImageIO 1.1: http://download.java.net/media/jai-imag ... elease/1.1

how do I run "java -Xmx1024 -jar BookScanWizard.jar"
-- in terminal window? what do i type?

1.2 what do I set in the parameter override DPI
1.3 what do I put into field destination DPI (is it 300dpi for text and 600dpi for illustrations+text content)


2- I don't understand how to get a "lossless" compression PDF and how compression works. I understand ST uses uncompressed TIFF. I don't know what BSW exports out to.
2.1) I don't know when compression is important, e.g. in the final step or the entire way through the process? (E.g. is it better to scan at high quality (600dpi with illustrations, 300dpi if text, black and white) with all compression parameters turned off, to TIFF, and then set the compression in the "create PDF" tool?)
2.2) Is there more than 1 type of compression, e.g. compression at the image and compression at the output document (e.g. PDF), or am I just plain muddled?
2.3) What are the best settings for lossless compression in Adobe Acrobat Professional 10.x?
2.4) Is there a better tool for Mac 10.6.x users to create compressed PDFs? (I annotate a lot in my work, so I still prefer PDF to DjVu.)
2.5) When people say "uncompressed TIFF", does that mean TIFF or TIF? (I have both options in my scanner.)
e) Lossless means no loss of quality and small in size, correct?
2.6) compression types that appear in the different apps
e.g. in BSW its G4 and deflate
e.g. in Acrobat it appears to be in the optimized options tab: colour/grayscale, monochrome, and a size slider
e.g. in djvu export to TIFF has 3 options "force bitonal G4 compression - Allow lossy JPEG compression and then set jpeg quality number x - allow deflate compression"
2.7) If I scanned to PDF rather than to a TIFF image (as it is quicker in scanning mechanics, e.g. the scanner does not finish its routine and is ready for the next page turn), is there any loss in quality?
2.8) how does colour - greyscale - black white affect "image quality" and/or compression

3) IMAGE TYPE
3.1 I could not see anything in ST or BSW that speaks to layering. I understood that unless we layer the text and the image separately, the compression is impacted. Am I mistaken? Or does layering have nothing to do with image type?

3.2 don't understand what is advantage or disadvantage of using TIFF v PNG
I get that ST uses TIFF and why (per the post), but with flatbed scanners TIFF scans come out fuzzy as they scan 1 directional, so for big books I scan to JPEG at 300dpi and then extract to TIFF.
-- does this cause any loss in quality?

3.3) I don't understand what the JPEG2000 image type is. It's in Acrobat under the optimised PDF option (colour/grayscale), and it is also in my scanner. But it is not in BSW or ST.

3.4) for covers and back of book what is the suggested image settings for a beautiful looking high gloss colour image?


4 IN terms of workflow
4.1 what is "keystone"? BSW mentions it. I have no idea what it is.
4.2 - is it best to not load cover/back into ST or BSW and process these images separately as they are different to rest of book
4.3 is it best to do the image detailed work first for all the content (e.g. 300-400 pages) e.g. lighten boxes with text in them, so the text can be read in ocr
4.3 - or is it best to split 2 page/rotate/crop first, then do the detailed image work
4.4 - what is best step and tool to scrub out handwritten notes? (I was thinking if I use ST content border and cropped at that, then most of the handwritten notes would be gone - as an alternative to scrubbing)
4.5 - in ST I could not see how to "sharpen the text". I saw make the text fatter or thinner but nothing about sharpening text
4.6 - is it best to add cover last (as colour and different resolution to rest of pages) - ST did not do a clean cut on edges - or probably me as the border function did not make sense to me. I find cropping much easier.
4.7 - is the only way to get correct page numbers and table of contents in the "created PDF" to set them in WORD first? or is there some other ideas out there??
4.8 I could not find an option in ST or BSW to change the background colour, e.g. in old books it's a bit yellow

5 software problems - in my testing for OCR quality
5.1 - If I create huge files for OCR, ABBYY FineReader Express for Mac keeps crashing, e.g. at 600dpi for 300 pages. But if I put it at 300dpi, it processes. (Is this a known issue in the book scanning community?)
5.2 - Is anyone aware of a link comparing OCR quality in Acrobat, PDFMaker, and ABBYY FineReader?

--- hopefully such a long post is ok ---
Thank you BIG TIME in advance for any help
cheers
steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN

Re: Instructions / Documentation for BSW

Post by steve1066d »

You should still be able to use the webstart version even if you are offline. I'm not exactly sure where it installs on a Mac, but on a PC, it creates a shortcut in the Programs list. That's going to be easier than installing it manually. The manual version is meant for those who want to run things from the command line.

The override DPI is used to specify the DPI of your source document. It's equivalent to setting the DPI in ScanTailor. However, there are other, automatic ways of setting the DPI in BSW (like using the focal length of the camera, or scanning a barcode page first). If you are using one of those, you don't need to specify the DPI. If you are using a scanner, just enter whatever DPI your scanner was set to.

The destination DPI depends on what you are doing with the scan. If you are keeping the output as grayscale or color, my opinion is that 300 DPI is sufficient. If you are converting to bitonal (black and white), you may find 450 or 600 will give you better quality. There's a tradeoff between file size and quality. (A 600 DPI image is 4 times as big as a 300 DPI image.)
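The "4 times as big" figure follows from pixel count growing with the square of the DPI. Here is a minimal sketch of that arithmetic. The US Letter page size (8.5 x 11 inches) and the bit depths (1, 8, and 24 bits per pixel) are assumptions for illustration, and the sizes are raw pixel data only, before any compression or file headers.

```python
# Sketch: how pixel count and uncompressed size scale with DPI.
# Assumption (not from BSW itself): a US Letter page, 8.5 x 11 inches.

def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Return (width_px, height_px) for a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

def uncompressed_bytes(dpi, bits_per_pixel):
    """Raw image size in bytes, ignoring file headers."""
    width_px, height_px = page_pixels(dpi)
    return width_px * height_px * bits_per_pixel // 8

for dpi in (300, 450, 600):
    width_px, height_px = page_pixels(dpi)
    print(f"{dpi} DPI: {width_px} x {height_px} px, "
          f"bitonal {uncompressed_bytes(dpi, 1) / 1e6:.1f} MB, "
          f"gray {uncompressed_bytes(dpi, 8) / 1e6:.1f} MB, "
          f"color {uncompressed_bytes(dpi, 24) / 1e6:.1f} MB")
```

Doubling the DPI doubles both the width and the height in pixels, which is where the factor of 4 comes from.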

Compression is tough to understand because there are tradeoffs and parameters to adjust, so it isn't a simple topic to begin with. The reason we do compression is to reduce the file size.

There are two kinds of compression: lossless and lossy. Lossless will only modestly reduce the file size (say by 30%). Lossy can reduce things much more, but it can introduce changes (called artifacts) in the compressed image. If a reasonable compression value is chosen, the changes shouldn't be very noticeable. If you are dealing with bitonal text or line drawing images, lossless compression works quite well, and creates a small, lossless file. For color or grayscale, uncompressed images are quite large. (A letter-size color page at 600 DPI is around 100 megs uncompressed.)

Normally, you only want to bother with compression at the output, as you generally don't care if an intermediate file is large (those will just get deleted when you are finished). Also, you want to avoid lossy-compressing things multiple times, as you will lose a bit of quality each time you do that (think of making a copy of a copy of a VHS tape).

When you create a PDF, you can choose the compression you want.
I don't use Adobe Acrobat, so I can't really say what to use. If you give me the options, I can help you choose.
If you have access to Adobe Acrobat, I think that is the best tool to use. It can read any format that BSW can create, and gives you many options. If you have specific questions on using it, create a new question in the software forum.

TIFF and TIF are the same format. The format is officially TIFF, but old MS-DOS filenames could only have 3-letter extensions, which led to the .tif name.

Lossless means no loss in quality and smaller in size. Lossy compression will compress much smaller, but you lose some quality. I think you are better off with lossy compression unless you are dealing with bitonal images.

Yep... there's different compressions, and different apps can handle different ones...
Here's a quick take:
G4: a good lossless compression but only works for bitonal images.
deflate: a lossless compression that uses the same compression that .ZIP uses.
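To make "lossless" concrete, here is a small sketch using Python's standard zlib module, which implements the same deflate algorithm. The page data is synthetic (an assumption for illustration, not a real scan), so the compression ratio it prints will be far better than what a real page gives. The point is only that the decompressed bytes come back identical to the original.

```python
import zlib

# Synthetic "bitonal page": mostly white (0xFF) bytes with a few black runs.
# Each row is 320 bytes, i.e. 2560 pixels at 1 bit per pixel.
# Real scans are noisier, so real ratios will be lower.
row_blank = b"\xff" * 320
row_text = (b"\xff" * 10 + b"\x00" * 2) * 26 + b"\xff" * 8
page = (row_blank * 20 + row_text * 12) * 100

compressed = zlib.compress(page, 9)      # 9 = strongest deflate setting
restored = zlib.decompress(compressed)

print(f"original:   {len(page)} bytes")
print(f"compressed: {len(compressed)} bytes")
print("bit-for-bit identical:", restored == page)
```

A lossy format such as JPEG gives no such guarantee: decoding returns an approximation of the original pixels.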

2.7: The scanner is probably using lossy JPEG compression when creating the PDF directly, so yes, I'd expect some loss of quality.
2.8: I think I've answered that above. If you have further questions, let me know.

3.1: Adobe Acrobat does some fancy stuff where it breaks apart the background from the text and handles each separately. That way, the background can be compressed more heavily (using lossy compression) without affecting the text. ST and BSW don't layer data. DjVu and Adobe Acrobat have options to use layers.

3.2 There's many options with tiff files (which means it can have different compression types, etc). PNG files are simpler. If a program handles PNG, then it can handle pretty much any PNG file. However, there's so many options with tiff files, that it is common to have tiff files that can't be read by certain programs. As far as which to use.. it doesn't really matter, as long as the software you are using can read them. I'm not sure why your scanner behaves differently on tiff vs. png and jpeg.
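If a program refuses to open a TIFF, it can help to check which compression the file uses. The sketch below reads the Compression tag (tag number 259 in the TIFF specification) from the first image directory of a classic TIFF file. The function name `tiff_compression` is made up for this example, it is not part of BSW or ST, and it does not handle BigTIFF or multi-page details.

```python
import struct

# Values of the TIFF "Compression" tag (tag number 259) that are common in scans.
COMPRESSION_NAMES = {
    1: "uncompressed",
    3: "CCITT Group 3 fax",
    4: "CCITT Group 4 fax (G4)",
    5: "LZW",
    7: "JPEG",
    8: "Deflate",
    32773: "PackBits",
    32946: "Deflate (older tag value)",
}

def tiff_compression(data):
    """Return the compression name of the first image in a classic TIFF file."""
    if data[:2] == b"II":
        order = "<"   # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        order = ">"   # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled)")
    (entry_count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i   # each directory entry is 12 bytes
        tag, field_type, count = struct.unpack(order + "HHI", data[start:start + 8])
        if tag == 259:
            # Compression is a SHORT stored at the start of the 4-byte value field.
            (value,) = struct.unpack(order + "H", data[start + 8:start + 10])
            return COMPRESSION_NAMES.get(value, f"unknown ({value})")
    return "uncompressed"   # the TIFF spec's default when the tag is absent
```

To use it, read a file in binary mode and pass the bytes in, for example `tiff_compression(open(path, "rb").read())`, where `path` is whatever TIFF you want to inspect.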

3.3: BSW can handle JPEG2000 images. It reads JPEG2000 files normally, though writing them is done using the SaveImage operation instead of the default .tiff output. JPEG2000 has the advantages that it provides better compression than JPEG and offers a lossless mode. It can also compress to a target size, so if you wanted your pages to each be 100K, JPEG2000 is the best way of doing that. The disadvantage is that JPEG2000 isn't universally supported. Browsers don't read JPEG2000 images natively.

3.4: Standard lossy compression works fine.

So in summary:
Only bother with compression on the final creation of the PDF. Use lossy compression for grayscale or color images.

Hope this helps..
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Instructions / Documentation for BSW

Post by Misty »

steve1066d wrote:
Yep... there's different compressions, and different apps can handle different ones...
Here's a quick take:
G4: a good lossless compression but only works for bitonal images.
deflate: a lossless compression that uses the same compression that .ZIP uses.

In Acrobat, I'd recommend using the JBIG2 lossless or lossy option rather than G4. It's a much more efficient compression, which will ensure much smaller filesizes. Lossless is perfectly lossless (and so identical to G4 in terms of the image you get), while lossy tries to identify similar characters and replaces them. It reduces the filesize even further, but you do run a risk of it making a few mistakes and replacing some similar-looking characters that aren't supposed to be identical. You can find the settings for that in the "convert to PDF" section of Acrobat's preferences.

If I'm remembering right, Acrobat X now defaults to JBIG2 lossless for monochrome compression instead of G4 like in older versions. If you import TIFFs that are monochrome, like a TIFF G4 from BSW, it will use the compression settings from the "convert to PDF" option's "TIFF" section.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
seasalt

Re: Instructions / Documentation for BSW

Post by seasalt »

Thank you Misty and Steve - VERY helpful. I will try the Java version offline tonight.
ibr4him
Posts: 102
Joined: 18 Oct 2010, 10:36

Re: Instructions / Documentation for BSW

Post by ibr4him »

Was the tutorial video ever made? ;D