1. It misses part of the content when it sits far from the main text body. This is common on pages where the page number is at the bottom and the text takes up only a few lines at the top. Typically, this happens at the ends of chapters.
2. It includes in the content things that are neither text nor images. This happens to me a lot when I scan books that I underlined while reading them; and even with clean books, the occasional dot, line, or spot near the text gets detected as part of the content.
I have two ways of dealing with this. One of them helps ST a lot, can be done very quickly, and overcomes the underlining problem (but not the other problems) almost completely. The other is painfully slow and tedious:
1. The fast workaround is miraculous when it comes to selecting the content of dirty (i.e., underlined) pages. It basically consists of feeding ST clean files that are identical to the actual files we want ST to output, letting it work with them, and only switching back to the original files at the output stage. This is how I do it:
a. I scan the files (in grayscale, obviously) and load them into ABBYY FineReader (the software I use for this). There I deskew them (it does this better than ST, in my opinion) and save the files. Then I apply an aggressive levels adjustment to them. This way virtually all the dirt is gone, but the text boundaries remain clear enough for ST to select the content correctly (see the sketch after step c for a way to batch this).
b. I now save the "leveled" files and go through stages 1-5 in ST with them. The files must have exactly the same names as the files saved in step a; this is what allows relinking later. After every step in ST, everything (especially content selection and margins) must be set to manual. This is what lets us apply the work done on the clean files to the original files.
c. Once the content has been selected, the margins set, and only the output stage remains, I relink the project (Tools > Relinking...) to the original deskewed files saved in step a and simply let ST process the output stage.
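For anyone who prefers to batch the leveling instead of clicking through FineReader, here is a rough sketch of the idea in Python with Pillow. The folder names and the black/white points are my own assumptions for illustration; the thresholds in particular need tuning per book:

```python
# Sketch of step (a)'s "aggressive levels" pass, assuming grayscale
# scans in ./deskewed and Pillow installed. Thresholds are hypothetical.
import os
from PIL import Image

SRC = "deskewed"   # deskewed originals saved in step (a)
DST = "leveled"    # cleaned copies ST will work with in stages 1-5
BLACK, WHITE = 60, 140  # pixels <= BLACK go black, >= WHITE go white

os.makedirs(DST, exist_ok=True)

def levels(value, black=BLACK, white=WHITE):
    """Linear levels stretch: crush light dirt to white, keep text black."""
    if value <= black:
        return 0
    if value >= white:
        return 255
    return int((value - black) * 255 / (white - black))

for name in os.listdir(SRC):
    img = Image.open(os.path.join(SRC, name)).convert("L")
    # Keeping the exact same filename is what makes relinking in step (c) work.
    img.point(levels).save(os.path.join(DST, name))
```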
2. The slow and tedious way is simply editing the images with image-processing software. I use GIMP:
a. For parts that ST ignores, I just draw lines running from the text that is being detected correctly to the part of the page that ST misses, then overwrite the files. After this, when ST runs auto content detection, it includes the part it originally missed. For example:
[attached screenshot: the example page, before and after drawing the lines]
On this page, only the text in the upper part was being detected; after drawing the lines, the whole content is detected. I do this instead of just dragging the content selection margins because I like how accurately ST selects content, and I know I won't be that accurate by hand. Besides, it doesn't matter that the files have lines drawn on them, because these are the disposable files I talked about earlier.
b. For issue 2 (stuff that is neither text nor images being incorrectly detected as content), I just erase the offending bits in GIMP, save the files, and let ST autodetect again. (The sketch below shows both of these edits done programmatically.)
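Here is what the two GIMP edits boil down to, as a minimal Pillow sketch. The filename and all coordinates are hypothetical; in practice I pick them by eye in GIMP, which is exactly why this method is slow:

```python
from PIL import Image, ImageDraw

img = Image.open("page_042.tif").convert("L")
draw = ImageDraw.Draw(img)

# (a) Bridge detected text to an ignored region (e.g., a lone page number)
# with a black line so auto content detection treats them as one block.
draw.line([(300, 1800), (300, 2350)], fill=0, width=4)

# (b) Erase a stray spot by painting a white rectangle over it.
draw.rectangle([(120, 500), (160, 540)], fill=255)

img.save("page_042.tif")
```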
I'd like these last two features included in ST, in the form of the ability to include or exclude selected areas from automatic content selection. The former would work more like "try to autodetect in this area too" rather than the "include exactly this" that margin dragging does. This is more or less what ST already does with images in mixed mode, so it shouldn't be hard to implement, and it would speed up the content selection stage a lot.
So what do you think? Would you like this in ST? It would be the icing on the cake, IMO.