Beware of Otsu’s Thresholding for Tissue Segmentation in Digital Pathology

Why do Tissue Segmentation?

Histopathology whole slide images can be multi-gigapixel in resolution – billions of pixels. This is the main reason that building deep learning pipelines for pathology can be so challenging.

One cannot simply feed entire images into deep learning models (I can’t think of any standard architecture which can handle a 100k × 100k pixel input), so a common workflow is to tile the image into patches of a fixed size – typically around 256 × 256 pixels. This can create thousands upon thousands of image patches – and not all of these patches are as useful as others. Looking at the example slide below (left), we can see that if we just naively grid over the entire slide and extract patches this way, then we will have many patches which are simply background – uninformative for any model we might want to train.

It makes far more sense to tile as we’ve done on the right hand side: only from the actual tissue areas, where we are going to be doing our analysis. This means all of our input data is informative, and we can reduce the amount of input into our models.
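As a minimal sketch of mask-guided tiling (the `tile_image` function and its `tissue_frac` parameter are hypothetical names; a real pipeline would read regions from a WSI library such as OpenSlide rather than an in-memory array):

```python
import numpy as np

def tile_image(image, mask, patch_size=256, tissue_frac=0.5):
    """Yield (row, col) coordinates of patches whose tissue fraction
    (measured against a binary tissue mask) exceeds a threshold,
    skipping background-only patches."""
    h, w = image.shape[:2]
    for r in range(0, h - patch_size + 1, patch_size):
        for c in range(0, w - patch_size + 1, patch_size):
            # Fraction of mask pixels inside this patch that are tissue
            frac = mask[r:r + patch_size, c:c + patch_size].mean()
            if frac >= tissue_frac:
                yield r, c

# Toy example: a 512x512 "slide" whose left half is tissue
img = np.zeros((512, 512, 3), dtype=np.uint8)
mask = np.zeros((512, 512), dtype=bool)
mask[:, :256] = True
coords = list(tile_image(img, mask))   # only the two left-half patches survive
```

The `tissue_frac` cutoff is a tuning knob: raise it to keep only patches that are mostly tissue, lower it to keep patches that merely touch the tissue boundary.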

Note that these tiles are for illustration – in an actual use case, a 256×256 patch would cover so little tissue compared to the whole slide that it would just appear as a spot!

This is why tissue segmentation is quite a standard preprocessing task for digital pathology deep learning workflows. There are a few methods, but they broadly fall into the category of either deep learning based approaches or traditional computer vision (CV) approaches (i.e., no ML). The deep learning approach is pretty straightforward (at a high level, anyway) – get a pathologist to trace the outline of the tissue areas on some training set of slides, train a deep learning model to recognise tissue, then use that model to produce tissue segmentations for the rest of your slides.

Use Traditional Computer Vision for Segmentation

I am not a big fan of this deep learning approach – this just creates more work for the pathologists you’re working with (or yourself, if you make these annotations manually), and there are CV methods which work much better, and are actually probably more common in workflows.

The traditional CV method involves using Otsu’s Method. This is a method that converts grayscale images into black and white images by finding an optimal threshold – thus binarising the image. In the tissue segmentation sense, this would mean turning the areas of the image where there is tissue white, and where there is just background black, creating a ’tissue mask’. Otsu’s method works by assuming that the image histogram is bimodal (two main peaks) and searches for the threshold that best splits these two classes. The algorithm tests all possible thresholds and selects the one that maximizes the between-class variance (equivalently, minimizes within-class variance). This yields a data-driven threshold that adapts to the image’s intensity distribution without needing manual tuning.

But note – Otsu’s method is usually applied to grayscale images, whereas your traditional digital pathology images are obviously not that, so we need a way to take our slides and get a grayscale representation.

In the literature, you’ll find a split between two methods of doing this. One involves computing gradient magnitudes and one involves utilising the HSV color structure of the images. The main point of this article is to show that we should really only choose to do the latter method. If you want to see how I do this in my own research pipeline, please see this article here.

Gradient Magnitudes

So what is a gradient magnitude, and why is it used for tissue segmentation?

In image processing, the gradient of an image measures how quickly pixel intensity changes across the image. If you think of an image as a surface where brightness corresponds to height, the gradient at any point tells you the direction and steepness of the slope at that point. In practice, we compute this by looking at how much the intensity changes in both the horizontal (x) and vertical (y) directions. These partial derivatives are typically estimated using small convolution filters – the most common being the Sobel operator, which applies a 3×3 kernel across the image to approximate the derivatives in each direction.

Once we have the horizontal gradient Gx and the vertical gradient Gy, the gradient magnitude at each pixel is simply the Euclidean combination of the two: sqrt(Gx² + Gy²). The result is a new grayscale image where bright pixels correspond to areas of rapid intensity change (edges) and dark pixels correspond to areas where intensity is relatively uniform (flat regions).
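A minimal sketch of this computation, assuming SciPy’s `ndimage` module for the Sobel filters:

```python
import numpy as np
from scipy.ndimage import sobel

def gradient_magnitude(gray):
    gx = sobel(gray, axis=1)      # horizontal intensity change (Gx)
    gy = sobel(gray, axis=0)      # vertical intensity change (Gy)
    return np.hypot(gx, gy)       # sqrt(Gx^2 + Gy^2) at every pixel

# A flat region gives zero gradient; a step edge gives a strong one
img = np.zeros((10, 10))
img[:, 5:] = 1.0
gm = gradient_magnitude(img)      # bright only along the vertical edge
```

Pixels deep inside the flat halves come out exactly zero, while the column where the intensity steps from 0 to 1 lights up – the edge map the method relies on.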

The logic for using this in tissue segmentation is intuitive enough: tissue regions tend to have texture, structure, and staining variation, all of which produce strong gradients. Background, being mostly uniform white, produces very low gradient values. So in principle, you threshold the gradient magnitude image and you get tissue versus background.

The problem is what the histogram of this gradient magnitude image actually looks like.

Gradient Magnitudes Produce Unimodal Intensity Distributions – This is Risky

Think about what a typical histopathology slide contains. There is a huge expanse of background with very little intensity variation – this produces an enormous number of pixels with near-zero gradient magnitude. Then there is the tissue, which produces a wide spread of gradient values depending on local texture, staining intensity, cell density, and so on. These tissue-related gradient values don’t cluster neatly into a single peak. Instead, they tend to smear out into a long, fat tail extending from the dominant background peak. This is a textbook unimodal distribution – one massive peak with a tail, not two distinct peaks. And this also happens to be the exact scenario where Otsu’s method runs into trouble – remember, the method assumes a bimodal distribution (and, moreover, assumes that each peak is roughly Gaussian).

My supervisor, Prof. Paul Rosin, wrote a paper called “Unimodal Thresholding” (Pattern Recognition, 2001) which discusses exactly this issue. Otsu’s method assumes bimodality: it searches for the best split between two classes, and when the histogram doesn’t actually contain two well-separated peaks, the threshold it selects can be unreliable. He demonstrated this with synthetic experiments, showing that when the secondary class (in our case, the tissue-related gradient values) constitutes a very small proportion of the total histogram, Otsu’s threshold effectively collapses – it either sets the threshold too low, capturing too much noise, or converges to an arbitrary value that doesn’t meaningfully separate the two populations.

This maps directly onto our tissue segmentation problem. On a whole slide image, especially one with a relatively small tissue area compared to the total slide, the gradient magnitude histogram will be dominated by near-zero background values. The tissue gradients will form a tail, not a peak. Otsu is the wrong tool for this job.

Now, to be fair, this doesn’t mean Otsu on gradient magnitudes will always fail catastrophically. If you have a slide that is mostly tissue with relatively little background, you might get a more bimodal-looking histogram and Otsu will cope. But that’s relying on luck rather than methodology, and in any large-scale computational pathology pipeline, you will inevitably encounter slides where the tissue-to-background ratio is small – and those are precisely the slides where your tissue mask will silently fail.
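You can see this failure on synthetic data. Below is a sketch using scikit-image’s `threshold_otsu`; the mixture parameters are made up to mimic a background-dominated gradient-magnitude histogram (a huge near-zero peak plus a small, smeared-out tail):

```python
import numpy as np
from skimage.filters import threshold_otsu

rng = np.random.default_rng(0)

# 95% background: near-zero "gradient magnitudes" from flat white areas
background = np.abs(rng.normal(0.0, 2.0, 95_000))
# 5% tissue: a broad, smeared-out tail of gradient values
tissue = rng.gamma(shape=2.0, scale=20.0, size=5_000)

values = np.concatenate([background, tissue])
t = threshold_otsu(values)

# The threshold lands well beyond where the background actually ends,
# so a large fraction of the (faint) tissue values fall below it
lost = (tissue < t).mean()
```

On a run like this, the selected threshold sits far above the largest background value, deep inside the tissue tail – meaning a substantial fraction of genuine tissue pixels would be silently classified as background.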

Using HSV Images instead

The alternative approach is to convert the RGB image into HSV (Hue, Saturation, Value) colour space and apply Otsu’s method to the saturation channel.

Why does this work so much better? Because the saturation channel naturally produces the bimodal distribution that Otsu actually needs. The background of a histopathology slide is white (or very close to it), and white has near-zero saturation – it is essentially colourless. Tissue stained with H&E, on the other hand, is distinctly coloured: the haematoxylin produces blues and purples, the eosin produces pinks and reds. These stained regions have meaningfully high saturation values.

So when you look at the saturation histogram, you get exactly what Otsu wants: one peak near zero (background) and another peak at higher saturation values (tissue). Two classes, two peaks, bimodal histogram. The algorithm’s assumptions are met, and it can find a clean threshold between them.

This is more robust across the range of challenging cases that gradient magnitudes struggle with. Adipose tissue, which is white and would produce low gradient magnitudes (potentially being missed as background), still picks up enough staining at cell boundaries to register in the saturation channel – and you can always apply some morphological closing to address this. Lightly stained tissue that might not produce dramatic intensity gradients still has colour. Pen marks and other artefacts, while they can introduce additional modes, are at least not undermining the fundamental assumption of the method in the way that the unimodal gradient histogram does.
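The morphological closing mentioned above can be sketched like this (again assuming scikit-image; the structuring-element radius is an arbitrary choice you would tune to the hole sizes you see):

```python
import numpy as np
from skimage.morphology import binary_closing, disk

# A tissue mask with a small hole (e.g. an unstained adipose vacuole)
mask = np.ones((32, 32), dtype=bool)
mask[14:18, 14:18] = False

# Closing with a disk larger than the hole fills it in,
# while leaving the rest of the mask untouched
closed = binary_closing(mask, disk(3))
```

Holes larger than the structuring element survive closing, so the radius effectively sets the largest gap you are willing to paper over.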

Again, that is not to say saturation thresholding is perfect. There are edge cases: unstained tissue, tissue that has been significantly faded, or regions where the mounting medium has introduced colour artefacts. But the point is that the underlying statistical assumption – bimodality – is far more reliably met when working with saturation than with gradient magnitudes. You are choosing a representation that plays to Otsu’s strengths.

Takeaways

Preprocessing in computational pathology doesn’t get a lot of attention. Tissue mask generation is treated as a solved problem, something you set up once and never think about again. But the choice of what image representation you feed into Otsu’s method matters a great deal, and it matters for well-understood, decades-old mathematical reasons.

If you are using gradient magnitudes with Otsu thresholding, I would encourage you to actually look at your histograms across a representative set of your slides. You may find they are far more unimodal than you assumed, and that your tissue masks are quietly failing on the slides where getting them right matters most – the difficult ones with small tissue areas, light staining, or unusual backgrounds.

Converting to HSV and thresholding the saturation channel is a simple change that better aligns your data with the assumptions of the algorithm you’re applying to it. It’s not glamorous, but getting your tissue masks right means every downstream step – your tiling, your feature extraction, your model training – is built on a more solid foundation.

If you enjoy this content and want to see more, please do subscribe!
