Making the Select Content Box fixed... or disabled?

univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

voided post

Post by univurshul » 25 Aug 2010, 21:17

duplicate post errors here...maybe an issue with my internet browser.
Last edited by Anonymous on 25 Aug 2010, 23:35, edited 4 times in total.

univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: Making the Select Content Box fixed... or disabled?

Post by univurshul » 25 Aug 2010, 21:22

JonEP wrote:I too wish it were possible to "set fixed content area" on the "select content area" dialogue. If one could set a fixed content area, and then go back and manually adjust those pages that need fixing, it would be so much easier for me to use Scan Tailor....In any case, please count me in as someone who would appreciate a GUI addition to allow this feature.
Jonathan
NB. The main culprit that seems to require endless adjustment: chapters that start 1/3 of the way down the page, or solitary section titles in the middle of a blank page... Very frustrating! (But thank you for your work.)
I'm claiming the errors in the content selection box are mainly driven by the following: 1) subject-matter captured around the book; 2) horizontal reflections on the opposing glass pane (for angled platens); 3) how the user enters in DPIs upon project creation; 4) exposure/focus settings on the camera.

The only thing I can see being of benefit is a batch crop utility that helps the algorithms with splitting and content selection at later stages. I've personally noticed that if you don't have a very clean, black cradle, you're liable to get errors in page-split and content selection. I also ran tests on images that have large borders of the opposing glass pane reflections; if you scan an entire book without filling the optics you'll also run the risk of splitting and selection errors in Scan Tailor. And the reason some of us don't pack the book page to the end of the optics is because we're using less-than-stellar cameras that have innately blurred optics near their perimeters....And this is the only reason why I think an early batch crop option will be helpful. A general 3rd party app will suffice too; it doesn't have to be built into Scan Tailor.
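For anyone who wants to try the early batch crop idea, here is a minimal sketch of the box arithmetic only, in Python. It is not part of Scan Tailor; the function name `crop_box`, the 5% margin, and the file names and frame sizes are all made up for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def crop_box(width: int, height: int, margin_frac: float) -> Box:
    """Return a box that trims the same fraction off every edge of a frame."""
    if not 0 <= margin_frac < 0.5:
        raise ValueError("margin_frac must be in [0, 0.5)")
    dx = round(width * margin_frac)   # pixels trimmed from left and right
    dy = round(height * margin_frac)  # pixels trimmed from top and bottom
    return Box(dx, dy, width - dx, height - dy)


# Hypothetical 4000x3000 frames from one camera, trimming 5% per edge
# to drop the cradle and the blurrier lens perimeter before Scan Tailor.
frames = {"left_0001.jpg": (4000, 3000), "left_0002.jpg": (4000, 3000)}
boxes = {name: crop_box(w, h, 0.05) for name, (w, h) in frames.items()}
```

The (left, top, right, bottom) order matches what an image library such as Pillow expects for a crop box, so the same tuple could be handed to whatever tool does the actual cropping.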

I personally think if you lock the selection area, it will take the same amount of manual time to get back to the sweet-spot; plus we'd be abandoning the intuitive feature the architects intended for this sensor technology, which as a whole works very well, provided you know how to shoot and adjust the images to give these algorithms what they look for.

spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: Making the Select Content Box fixed... or disabled?

Post by spamsickle » 26 Aug 2010, 03:22

univurshul wrote: I'm claiming the errors in the content selection box are mainly driven by the following: 1) subject-matter captured around the book; 2) horizontal reflections on the opposing glass pane (for angled platens); 3) how the user enters in DPIs upon project creation; 4) exposure/focus settings on the camera.
It isn't necessarily even content selection errors which are of concern, though. It's the "pan and scan" philosophy of re-editing the page with selected content rather than preserving the formatting chosen by the original editorial team.

For instance, as a general rule, pages at the beginning of a chapter will have the content at the bottom of the page, while pages at the end of a chapter will have the content at the top of the page.

Now that I have a scanner which is consistent enough to successfully make use of a fixed content selection box, I'm more inclined to look into changing the source code to implement one.

JonEP
Posts: 81
Joined: 19 Apr 2010, 15:09

Re: Making the Select Content Box fixed... or disabled?

Post by JonEP » 26 Aug 2010, 09:33

Thanks for the responses. Regarding these issues:
univurshul wrote: I'm claiming the errors in the content selection box are mainly driven by the following: 1) subject-matter captured around the book; 2) horizontal reflections on the opposing glass pane (for angled platens); 3) how the user enters in DPIs upon project creation; 4) exposure/focus settings on the camera.
1) I think you are referring to the possibility that ST will take elements of the cradle, behind the book, to be part of the book? I've now covered mine with black cloth (placemats--a great use for them!), and no longer have that issue. ST is pretty successfully cropping the pages at the edge of the book, sometimes even at the edge of the page.

2) Reflections: I haven't found this to be an issue. The only serious reflection issue I have is that my images generally include a bit of the glass from the opposing page, but that actually seems to help ST accurately assign a dividing line down the center of the platen's V.

3) DPIs: DPIs remain an issue but mostly because a slight variance will alter the resulting page size; usually this means slightly different page sizes for left and right pages, which I have been adjusting using Adobe Acrobat's crop tool. This is an added step that it would be nice to eliminate, but I don't think setting a fixed crop area would solve the problem or make it worse.
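To make the DPI point concrete: the physical page size is just the pixel dimensions divided by the DPI, so a small difference in the DPI entered for the left and right cameras shows up as a different page size. A rough sketch in Python, with invented pixel counts and DPI values (not measurements from any real rig):

```python
def page_size_inches(width_px: int, height_px: int, dpi: float) -> tuple:
    """Physical size implied by a pixel size at a given DPI."""
    return width_px / dpi, height_px / dpi


# Same pixel dimensions, slightly different DPI entered per side (hypothetical).
left = page_size_inches(1800, 2700, 300)
right = page_size_inches(1800, 2700, 290)

# Difference in implied page width between the two sides, in inches.
width_gap = right[0] - left[0]
print(left, right, round(width_gap, 2))
```

With these example numbers the two sides differ by roughly a fifth of an inch in width, which is the kind of mismatch that then has to be cropped away afterwards.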

4) Exposure and focus: I haven't had issues with this, and also don't think it affects the question of whether or not it is optimal to have the ability to set a pre-determined content box.

To be a bit clearer about my desire to be able to pre-set the area of the active page: my book scanner is quite good at taking very uniform images of every page of a book (although of course the left and right images differ). This means that the 'active page' -- the part of the image that I would like ST to pass along to the final output -- remains constant throughout all of the photographs. However, ST necessarily goes through all images and detects the active page based on whether or not there is any text or image on it. Because books almost always have half-empty pages or pages with just a line or two of text at the beginning or ending of chapters, or where there is a section title, etc., ST is identifying active page sizes that are quite small. I spend most of my time with ST going through and un-doing ST's identification of the active page, resetting it to the normal page size. Another problem, ST sometimes does not identify faint text near the edges of the page, such as picture subtitles, and it is necessary to go in and pull the boundaries of the active page over to compensate for this. I'd like to just tell ST that "in all these pictures, the active page is this part of the image." Then I'm happy, indeed delighted, for ST to go in and figure out, within that area, where the photos are, where the text is, and transform it to black and white or greyscale to output the final image.

Finally, regarding the question of the book's initial formatting: agreed! I'd like to preserve the layout of the text on the page as it is in the original book, so that my experience of reading the book is as close to the original as possible (this is why I'm interested in PDFs of the books, rather than just OCR'd text that is then distributed over an ebook based solely on the ebook reader's page size...).

Again, these are my personal desires, but perhaps I am not alone in holding them?

univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: Making the Select Content Box fixed... or disabled?

Post by univurshul » 26 Aug 2010, 16:22

JonEP wrote: 2) Reflections: I haven't found this to be an issue. The only serious reflection issue I have is that my images generally include a bit of the glass from the opposing page, but that actually seems to help ST accurately assign a dividing line down the center of the platen's V.
--That's exactly what I meant.

JonEP
Posts: 81
Joined: 19 Apr 2010, 15:09

Re: Making the Select Content Box fixed... or disabled?

Post by JonEP » 27 Aug 2010, 14:15

Got it.

You know, GUI-wise, the only tweak that is really necessary is when you are on the first screen of "Select Content" and have the option to choose "Auto" or "Manual", if you were to be able to choose "Manual", then pull the content box to the page dimensions (ie., the portion of the photo you want to keep), and if there were then a dialogue "apply to: this page, all pages, all following pages" like there is for the other functions (deskewing, splitting, fix orientation all have this ability), then the feature would be there. Of course, I have no idea what's going on behind the scenes... :D

Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Making the Select Content Box fixed... or disabled?

Post by Misty » 27 Aug 2010, 14:31

"Apply to all odd" and "apply to all even" could be handy too, if the cradle has slight variations in camera positioning on both sides.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
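
The "apply to" scopes suggested in the last two posts (this page, all pages, all following pages, all odd, all even) could be sketched roughly like this in Python. This is only an illustration of the idea, not Scan Tailor's actual code; the function name `pages_in_scope`, the scope strings, and the choice to include the current page in "following" are assumptions.

```python
def pages_in_scope(scope: str, current: int, total: int) -> list:
    """Return the 1-based page numbers a manual content box would be applied to."""
    if not 1 <= current <= total:
        raise ValueError("current must be between 1 and total")
    all_pages = range(1, total + 1)
    if scope == "this":
        return [current]
    if scope == "all":
        return list(all_pages)
    if scope == "following":
        # Assumption: the current page is included along with the later ones.
        return list(range(current, total + 1))
    if scope == "odd":
        return [p for p in all_pages if p % 2 == 1]
    if scope == "even":
        return [p for p in all_pages if p % 2 == 0]
    raise ValueError(f"unknown scope: {scope!r}")


# Example: a 6-page project with page 3 currently open.
for scope in ("this", "all", "following", "odd", "even"):
    print(scope, pages_in_scope(scope, current=3, total=6))
```

On a two-camera rig, odd and even would map to the two cameras only if the images are interleaved left/right in the project.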

univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: Making the Select Content Box fixed... or disabled?

Post by univurshul » 28 Aug 2010, 03:12

JonEP wrote:Got it.

You know, GUI-wise, the only tweak that is really necessary is when you are on the first screen of "Select Content" and have the option to choose "Auto" or "Manual", if you were to be able to choose "Manual", then pull the content box to the page dimensions (ie., the portion of the photo you want to keep), and if there were then a dialogue "apply to: this page, all pages, all following pages" like there is for the other functions (deskewing, splitting, fix orientation all have this ability), then the feature would be there. Of course, I have no idea what's going on behind the scenes... :D
Misty wrote:"Apply to all odd" and "apply to all even" could be handy too, if the cradle has slight variations in camera positioning on both sides.
--Yes, for the widest and lengthiest selection boxes in the sort field, makes sense, and does sound faster than what I've been envisioning. It will obviously add a useful tool to the diorama. Turn up the horsepower on these stubborn books that require hundreds of content edits. There are always exceptions, and on a few books, I do wish for all the visual data to be included (most of the time I couldn't care less if page numbers and horizontal print lines are cut out).

I still am leaning toward a cleaner page-split on the dual camera rigs, which will mean better content selection. But I've learned that most of the errors in splitting can be directly addressed by how you shoot the book and how the book appears in the image.
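
One way to read the "widest and lengthiest selection box" idea is to take the boxes already auto-detected on each page and use their union as the fixed box, separately for odd and even pages. A minimal sketch in Python under that assumption; the `Box` type, `union_box`, and the pixel values are invented for illustration and are not taken from Scan Tailor.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def union_box(boxes: Iterable[Box]) -> Box:
    """Smallest box that contains every box in the input."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("need at least one box")
    return Box(
        min(b.left for b in boxes),
        min(b.top for b in boxes),
        max(b.right for b in boxes),
        max(b.bottom for b in boxes),
    )


# Hypothetical auto-detected content boxes; page 1 starts a chapter low on
# the page and pages 3 and 4 end early, so their boxes are short.
detected = {
    1: Box(120, 900, 1500, 2300),
    2: Box(140, 150, 1480, 2310),
    3: Box(118, 160, 1510, 700),
    4: Box(135, 155, 1490, 1200),
}
odd_box = union_box(b for p, b in detected.items() if p % 2 == 1)
even_box = union_box(b for p, b in detected.items() if p % 2 == 0)
```

A union only grows to cover what was detected somewhere, so faint text that no page picked up would still be missed; a hand-drawn box would not have that limitation.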

rlh3
Posts: 1
Joined: 04 Mar 2014, 00:52

Re: Making the Select Content Box fixed... or disabled?

Post by rlh3 » 31 Aug 2010, 23:46

I would like to add a vote for this feature. The primary use of my scanner is for copying books that I am using in my research (I am a historian.) Consequently I need the page numbers, headers and all of that sort of thing. What I would like to produce would be page images (in PDF format) that look like Interlibrary loan images or journal articles. Even having the slightly blurred image of the edges of the text block would be acceptable. Basically I would rather have too much information on the page rather than too little.

Just my $.02

Moonboy242
Posts: 56
Joined: 22 Aug 2010, 18:09
E-book readers owned: iPad, Netbook
Number of books owned: 1000

Re: Making the Select Content Box fixed... or disabled?

Post by Moonboy242 » 11 Sep 2010, 19:58

I'd like to start my thread necro by saying "thank you" Tulon for your work. I think we would all be dead in the water if we had to process scans using just about any other means... you've made it free and very effective.

I had some initial problems with platen reflections being selected by ST as part of the actual content and a few confused page splits. My solutions:

1. Using a platen with a sharper angle @ ninety degrees to create a well defined spine / page split.
2. Ensuring that my camera was perpendicular to the subject media, and that it was elevated enough to sit as close as possible to the plate of the opposite page. This reduces reflections and further defines the page split on the platen.
3. This one is controversial... I use a split platen so I took a black permanent marker and neatly colored the edge (Not the edge that faces the camera, but the thin edge between the two plates) of the platen so as to create a black line between my pages. Scan Tailor seems to cue in on that line nicely... although I could have just gotten lucky. Time will tell.
4. I use "high contrast" foam sheets as padding on my cradles. Black works well. I'm using a bright navy blue that seems to clash with the color of nearly every book I've scanned. Scan Tailor seems to cue in on this contrast very well.
5. Three words: Dee-pee-aye.

It is also very important to remember that the more refined and cleanly constructed your scanning build the more accurate your scan photos and the less OCR you will have to do. Even more important is the need to take the time to watch the Vimeo Scan Tailor tutorial. There are tricks in there that will help you pare your workflow to maximum efficiency.

Hopefully these ideas will help with some of the OCR issues we experience. Would it be nice to have a fixed select content box? Sure, but I'd rather have an effective dewarping algorithm first. ;)
iPad: Over it. Android FTW.
