A task for volunteers: handling out of memory situations

Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK
Post by Tulon »

From time to time people ask me how a new developer could contribute. So, here is a good task for you to consider.

As you may know, Scan Tailor has a built-in crash reporter (a Windows-only feature). When ST crashes, you get a dialog offering to submit a crash report with one click. On average, I get about one a day. FYI: Scan Tailor is downloaded over 100 times a day. Now, it takes me 5 to 10 minutes to extract useful information (that is, a stack trace) from a crash report, and who knows how much more if I decide to reply. Fortunately, most people don't bother to leave a contact email.

Now, it turns out that like 95% of those crashes are out of memory situations. People load 1200 DPI color scans which take 500MB just to load and god knows how much to process. A 64-bit build of Scan Tailor would help these people, as we are running out of not (just) memory, but of 32-bit address space. I can't make a 64-bit build myself, as I am running 32-bit Windows. Volunteers are welcome to produce such a build, however that's not what this post is about.
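As a rough check of the 500MB figure, the arithmetic can be sketched as follows. The A4 page size and the 4 bytes per pixel are assumptions made for illustration, and `imageBytes` is a hypothetical helper, not part of Scan Tailor.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: bytes needed to hold one uncompressed scan in memory.
// Assumes pixels are stored contiguously, without row padding.
std::uint64_t imageBytes(double widthInches, double heightInches,
                         int dpi, int bytesPerPixel)
{
    assert(dpi > 0 && bytesPerPixel > 0);
    // Round to whole pixels so floating-point noise cannot change the count.
    const std::uint64_t widthPx =
        static_cast<std::uint64_t>(std::llround(widthInches * dpi));
    const std::uint64_t heightPx =
        static_cast<std::uint64_t>(std::llround(heightInches * dpi));
    return widthPx * heightPx * static_cast<std::uint64_t>(bytesPerPixel);
}
```

For an A4 page (8.27 x 11.69 inches) at 1200 DPI and 4 bytes per pixel, `imageBytes(8.27, 11.69, 1200, 4)` works out to 9924 x 14028 pixels, or roughly 557 million bytes for a single copy of the image. Every intermediate copy made during processing costs as much again.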

What needs to be done is handling those out of memory situations. By handling I mean telling the user about it and offering to save the project. The bad news is that's not exactly easy, as Scan Tailor is multithreaded with, I believe, 3 non-GUI threads, where it's impossible to communicate with the user directly. Still, it's one of the easiest things for a new developer to try. If anyone is interested, I can provide more technical details.

And as usual, don't wait for me to do it myself.
Last edited by Tulon on 20 Nov 2010, 14:51, edited 1 time in total.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
StevePoling
Posts: 290
Joined: 20 Jun 2009, 12:19
E-book readers owned: SONY PRS-505, Kindle DX
Number of books owned: 9999
Location: Grand Rapids, MI
Post by StevePoling »

Tulon, I just got scantailor to compile on my MacBook. And I just installed 64-bit Win7 on a bootcamp partition. It remains to install Visual Studio 2010 (or 2008, i'm not sure which i want) on that partition. Then I'll be able to compile scantailor for you. Or does your windoze port use another compiler? I haven't checked. Give me a few nudges in the right direction and i'll use the 64-bit scantailor compile as a learning exercise.

(Kudos to Rob who helped during the OSX compile.)
Post by Tulon »

Scan Tailor supports both Visual Studio (tested with VS2008 only) and MinGW. Because I think Visual Studio Express Edition (the free one) doesn't support 64-bit builds, you'll have to use MinGW. I believe it's been a while since anyone tried to build with that, so you might run into a few issues, which shouldn't be hard to fix though. Just follow the instructions in ${SOURCE_FOLDER}/packaging/windows/readme.en.txt

With MinGW you wouldn't get the built-in crash reporter because it's unsupported by Google Breakpad, but on the plus side you'll get a slightly faster executable compared to Visual Studio.

64-bit MinGW can be found here: http://mingw-w64.sourceforge.net/
Moonboy242
Posts: 56
Joined: 22 Aug 2010, 18:09
E-book readers owned: iPad, Netbook
Number of books owned: 1000

Post by Moonboy242 »

Tulon, thank you for your work on Scan Tailor. Without it bookscanning wouldn't be worth the effort to me!

The crashes I have experienced in Scan Tailor have always been out of memory issues, and my solutions have been pretty easy for me:

1. When I scan large books or image heavy books I ALWAYS save, close, and then restart Scan Tailor to ensure my memory buffer has been cleared after each major operation: Select Content, Margins, and Output. A save and restart also provides a restart point should I experience a crash.

2. When I scan image-heavy books, I process each chapter individually in Scan Tailor through output, and then use PDF Split and Merge (http://www.pdfsam.org/mediawiki/index.p ... =Main_Page) to combine the PDF files into one document. Alternatively, I have also elected to give each PDF file a number followed by the name of the chapter it contains, and leave them all in a folder named after the book I have scanned.

Example: File/Hellbirds/000 Introduction.pdf, File/Hellbirds/001 Table of Contents, etc.

To me, taking a few extra safety steps on my own is no problem to keep having the luxury of an efficient freeware application like Scan Tailor.
iPad: Over it. Android FTW.
Post by Tulon »

It may seem like not a big deal, but receiving those crash reports every day is significantly more annoying than receiving spam. I just ignore those that don't have a contact email (that's like 90%), but it's still annoying to get them at all.
I would be happy if a volunteer steps in and implements out-of-memory handling. Otherwise, sooner or later I am going to disable the crash reporter. Fortunately, it's implemented in such a way that it asks the server for permission to send the crash report, so it's very easy to disable crash reports from certain or even all versions of ST.
Post by StevePoling »

spam just says your email address has gotten into the wrong hands, bug reports go right to your ego, saying "you're not as perfect as you thought yourself." of course they'll be a bigger pain.

what software needs is another function that generates "attaboy" reports when the software does something particularly helpful for the user just to tell the author s/he's appreciated.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States
Post by daniel_reetz »

I'm going to throw this request up on the blog to see if we can get you some developer help; I'll also ask a few personal friends if they want to contribute.
Post by Tulon »

I didn't mind getting crash reports back when I was actively developing Scan Tailor and when many of them were actually relevant, that is not just out of memory situations.
BTW, developers getting crash dumps by email is just not how things are done in serious projects. They use crash processing servers, like Socorro, that extract stack traces automatically, analyse them, aggregate them, and produce goodies like top 10 crash locations.
I would certainly want to have one of those. The reason I don't is the lack of a suitable machine to act as a server and my unwillingness to administer it. Too bad SourceForge doesn't provide it as an integrated solution.
Post by Tulon »

A member of this forum requested the technical details, so here we go.

By handling out-of-memory situations I mean catching std::bad_alloc exceptions and informing the user about them rather than crashing. We are not talking about memory leaks here - I am not aware of any in Scan Tailor. We are talking about situations where more than 3GB of memory is required, which is the maximum you are going to get in a 32-bit environment. The rest of the 4GB 32-bit address space is reserved by the OS.

First, the good news. Most of Scan Tailor's code is exception-safe. It means it was coded in such a way that exceptions don't leave the objects they pass through in an inconsistent state. This makes it possible to recover from an exception. There are two notable exceptions (no pun intended) though:
1. The GUI code is of course not exception-safe. It never is. On the other hand, this code never allocates large memory chunks, so it's very unlikely for an out-of-memory situation to happen there. If you catch exceptions in code that's not exception-safe, it's a hit and miss game. The program may or may not survive that.
2. When an out-of-memory situation happens inside the QImage class (part of Qt), it doesn't throw an exception, instead turning the image into a null image. That's very mean if you ask me. This means we would need to insert explicit checks every time we construct a QImage. That's currently done in some places but not in all of them. Fortunately, QImage is rarely used as is. Most of the processing inside ST works with either the imageproc::BinaryImage or imageproc::GrayImage classes, which don't have this problem, even though GrayImage is a wrapper around QImage.
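The explicit check from point 2 could look something like the sketch below. It relies only on the image type having an isNull() member, which QImage does. `throwIfNull`, `makeChecked` and `FakeImage` are hypothetical names, and `FakeImage` is a stand-in so that the sketch compiles without Qt.

```cpp
#include <cassert>
#include <new>

// Throws std::bad_alloc if the image is null, otherwise passes it through.
// Works with any type that has isNull(), which QImage does.
template <typename Image>
const Image& throwIfNull(const Image& image)
{
    if (image.isNull()) {
        throw std::bad_alloc();
    }
    return image;
}

// Stand-in for QImage (an assumption for this sketch, not Qt code).
// It reports itself as null when its pixel buffer could not be allocated.
struct FakeImage {
    int width;
    int height;
    bool allocationFailed;

    bool isNull() const { return allocationFailed || width <= 0 || height <= 0; }
};

// Mimics a call site: construct an image, then verify it before use.
FakeImage makeChecked(int width, int height, bool simulateFailure)
{
    // A request for an empty image would also come back null,
    // so the check is only meaningful for positive sizes.
    assert(width > 0 && height > 0);
    FakeImage image = { width, height, simulateFailure };
    return throwIfNull(image);
}
```

Turning the null image into a std::bad_alloc right where the image is constructed means the rest of the code only has to deal with one kind of failure.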

That was still the good news, if you are wondering. The bad news is that ST is a multi-threaded application, making handling out-of-memory situations difficult. It's not a problem to catch the std::bad_alloc exception in a background thread. The real question is what to do next. You can't inform the user from a non-GUI thread and you can't just eat the exception, because the task that failed probably has someone in another thread expecting its completion.

Now would be a good time to enumerate Scan Tailor's threads:
1. The GUI thread. Deals with the user interface only.
2. Main processing thread. Does stuff like page splitting, content box detection, output, etc.
3. Auxiliary processing thread. Does stuff like delayed high quality image transformations, interactive despeckling, on-demand loading of debugging images.
4. Thumbnail thread. Loads, scales and caches thumbnails.

How does inter-thread communication work? It's done by sending commands rather than messages back and forth. Imagine sending a black box with a button on it saying "push me". The receiver doesn't have to figure out what to do with it - it just has to push the button. The black box then does its stuff, and if it wishes to report back, it spews another black box with a button that is to be delivered back. Why send such black boxes anywhere when you could press the button where you are? Two possible reasons:
1. The black box may be doing its stuff for a long time, and you want to do other things meanwhile.
2. The black box may want to do stuff that's only possible in a specific environment. For example, interacting with the GUI is only possible in the GUI thread.
Black boxes are typically represented by the AbstractCommand family of classes.
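The black box idea can be sketched in plain C++ as follows. This is an illustration only: `Task`, `Reply`, `runTasks` and `runReplies` are hypothetical names, std::function stands in for the AbstractCommand classes, and both queues are drained on a single thread, so the locking and wake-up logic that real inter-thread delivery needs is left out.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// A reply is a box whose button is pushed in the GUI thread.
typedef std::function<void()> Reply;
// A task is a box whose button is pushed in a worker thread.
// Pushing it may produce a reply box; an empty Reply means "nothing to report".
typedef std::function<Reply()> Task;

// What a worker thread would do with its inbox: push every button
// and route any resulting box to the other side.
void runTasks(std::queue<Task>& inbox, std::queue<Reply>& outbox)
{
    while (!inbox.empty()) {
        Task task = std::move(inbox.front());
        inbox.pop();
        assert(static_cast<bool>(task));  // an empty box has no button to push
        Reply reply = task();
        if (reply) {
            outbox.push(std::move(reply));
        }
    }
}

// What the GUI thread would do with the boxes sent back to it.
void runReplies(std::queue<Reply>& inbox)
{
    while (!inbox.empty()) {
        Reply reply = std::move(inbox.front());
        inbox.pop();
        reply();
    }
}
```

Neither function knows what the boxes contain. The sender decides what the work is, and the receiver only decides when and in which thread the button gets pushed.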

So, where to catch exceptions in each of those threads?
1. In the GUI thread, override QApplication::notify() in the Application class and catch it there. From there you can just show an error dialog.
2. Fortunately, the main processing thread already has a notion of a failed task. To see it in action, create a project and then move its files to another directory. Exceptions are to be caught in WorkerThread::Dispatcher::processTask(). Having caught an exception, send an error reply to the main thread. There is a private class LoadFileTask::ErrorResult that should be factored out, made public and used for generic error reporting from the main processing thread.
3. The auxiliary processing thread is represented by the BackgroundExecutor class. It doesn't have a notion of a failed task, though it should be easy to introduce, as its reply mechanism is almost identical to that of WorkerThread mentioned above. Why do we have two classes for doing essentially the same thing? Well, WorkerThread is more specialized than BackgroundExecutor. It supports task cancellation, for instance. BackgroundExecutor is more generic, was written later, and the benefits of moving the main processing thread to it weren't worth the effort.
Exceptions are to be caught in BackgroundExecutor::Dispatcher::customEvent(). Again, we would create a generic ErrorReply class, subclassing AbstractCommand0<void> and doing some GUI notification in its function call operator.
4. The thumbnail thread has status codes. We could add an OUT_OF_MEMORY status. Exceptions would be caught in ThumbnailPixmapCache::Impl::backgroundProcessing() around loadSaveThumbnail(). The GUI notification would be done in ThumbnailBase::handleLoadResult().
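To make the plan for the background threads more concrete, here is a minimal sketch of both approaches: answering with an error reply (points 2 and 3) and returning a status code (point 4). None of this is Scan Tailor code. `processTask` here is a free function, and `ErrorSink`, `LoadStatus` and `loadGuarded` are hypothetical names; std::function again stands in for the AbstractCommand classes.

```cpp
#include <cassert>
#include <functional>
#include <new>
#include <string>
#include <vector>

typedef std::function<void()> Reply;   // executed later in the GUI thread
typedef std::function<Reply()> Task;   // executed in a background thread

// Hypothetical stand-in for whatever the GUI does with an error,
// for example showing a dialog that offers to save the project.
struct ErrorSink {
    std::vector<std::string> messages;
};

// Background-thread side: run the task, and if it runs out of memory,
// answer with an error reply instead of letting the exception escape.
Reply processTask(const Task& task, ErrorSink& sink)
{
    assert(static_cast<bool>(task));
    try {
        return task();
    } catch (const std::bad_alloc&) {
        // Only a reference is captured, so building the reply itself needs
        // very little memory. The message is created later, when the reply
        // is executed on the GUI side.
        return [&sink]() {
            sink.messages.push_back("Out of memory. Consider saving the project.");
        };
    }
}

// Thumbnail-style alternative: report the failure through a status code.
enum LoadStatus { LOADED, LOAD_FAILED, OUT_OF_MEMORY };

LoadStatus loadGuarded(const std::function<bool()>& load)
{
    try {
        return load() ? LOADED : LOAD_FAILED;
    } catch (const std::bad_alloc&) {
        return OUT_OF_MEMORY;
    }
}
```

One thing to watch for in a real implementation is that the error path must not need much memory itself, since it runs right after an allocation has failed. Keeping the reply object small, or creating it ahead of time, is one way to deal with that.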


Having written this, the task sounds more complex than it did initially. Still, it's manageable and worth doing.
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Post by spamsickle »

Tulon, thanks for posting these details. They're very helpful in understanding the overall structure of your application, and provide useful keywords for tracking down the details. I'll definitely take a look at this to see if it's something I can handle -- which shouldn't discourage anyone else from tackling it first.