Thresholding segmentation and histogram techniques.
In this lecture, we are going to talk about some pixel-wise intensity transformations, and we will also discuss histograms of intensities and what we can use them for.
Using some threshold value $\tau$ we can construct a binary image from the input
\[J(x, y) = \begin{cases} 1, & \text{if } I(x, y) > \tau \\ 0, & \text{otherwise} \end{cases}\]We can use this for a simple foreground-background segmentation, i.e., a grouping of pixels in the image.
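A minimal sketch of this transformation, assuming the image is already a grayscale numpy array; the tiny $2\times2$ image and the value of $\tau$ are illustrative choices, not from the lecture:

```python
import numpy as np

def threshold(I, tau):
    """Binarize: J(x, y) = 1 if I(x, y) > tau, else 0."""
    return (I > tau).astype(np.uint8)

# Toy 2x2 "image" with two bright and two dark pixels.
I = np.array([[200, 30], [180, 10]], dtype=np.uint8)
J = threshold(I, tau=100)
```

The comparison `I > tau` is applied pixel-wise, so the whole binarization is one vectorized expression.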
All bright pixels now have the value one and are grouped together. Similarly, all dark pixels become zero.
Here we picked a proper $\tau$, applied this transformation to the image, and got this binarized image.
However, in the example, some rice grains still merge with the background; the threshold we picked doesn’t seem to actually select the individual rice grains very well.
Segmentation algorithm => Group pixels together according to the threshold.
If the scene has an illumination that varies across the scene, then this simple approach can cause problems.
But we can fix it => Adaptive thresholding
The thresholding transformation is non-linear: if you add some intensity offset to the image and then threshold, you generally get a different result than if you threshold first and then add the offset.
Adaptive thresholding: The idea is to pick a threshold that depends on the local image content.
Now we start to construct a non-linear filter: let the threshold depend on the intensities in an $N\times M$ neighborhood around the target pixel.
\[\tau = \text{mean}_{N\times M}(I)+C\]$C$ is there to make sure that the threshold is high enough, and this $C$ is the same for all pixels.
We choose a specific mean filter size, $15\times15$ pixels, and $C=20$, and then we get this result.
Now we are able to see the rice grains clearly.
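A minimal sketch of adaptive thresholding as described above. The toy gradient image, the small $3\times3$ window (instead of the lecture's $15\times15$), and the border handling (clipped neighborhoods) are illustrative assumptions:

```python
import numpy as np

def adaptive_threshold(I, n=3, m=3, C=20):
    """tau(x, y) = mean of the n x m neighborhood around (x, y), plus C.
    Border pixels use the available (clipped) neighborhood."""
    I = I.astype(float)
    H, W = I.shape
    J = np.zeros((H, W), dtype=np.uint8)
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - n // 2), min(H, y + n // 2 + 1)
            x0, x1 = max(0, x - m // 2), min(W, x + m // 2 + 1)
            tau = I[y0:y1, x0:x1].mean() + C  # local threshold
            J[y, x] = 1 if I[y, x] > tau else 0
    return J

# Illumination gradient 0..90 across columns, plus one bright "grain".
grad = np.tile(np.arange(0, 100, 10, dtype=float), (10, 1))
grad[5, 5] += 80
mask = adaptive_threshold(grad, n=3, m=3, C=20)
```

A single global threshold would either swallow the bright columns or miss the grain; the local mean adapts to the gradient, so only the grain survives.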
It’s a non-linear filter because we are operating on the pixels in a non-linear manner. And it can be seen as shift-variant in the sense that the effective threshold $\tau$ differs between locations: we perform the same operation everywhere, but the operation now depends on the neighboring pixels.
But what if I try to apply the thresholding approach to a more complicated image?
I just convert the image to grayscale, pick some threshold, and get this result.
Now it is considerably more difficult to use thresholding; you can’t really use this result.
The problem here is that the intensity distribution of this image is no longer bimodal. The distribution for the previous example was bimodal, which allowed us to pick a proper threshold that separates the two groups of pixels.
This way of performing segmentation only works in certain cases where we have a simple distribution of intensities. => This naive form of segmentation only works for simple images (with a bimodal intensity distribution)
Simple color-based segmentation
We can also use this approach on a color image.
*Step 1 => Convert image from RGB to HSV.*
I am interested in segmenting the red object. I could try this on the red channel, but it doesn’t work. Instead, I pick a range of Hue values, which represent the color.
Step 2 => Choose a range of Hue values $[H_{\min}, H_{\max}]$ to represent the relevant color (red).
Step 3 => Set all pixels outside this range to zero and all inside to 1.
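The three steps above can be sketched as follows. The per-pixel `colorsys` conversion and the hue range values are illustrative assumptions; note also that red wraps around hue $0$ in HSV, so a real red segmentation may need two hue ranges:

```python
import colorsys
import numpy as np

def hue_mask(rgb, h_min, h_max):
    """Set pixels to 1 if their hue (in [0, 1), via colorsys) lies in
    [h_min, h_max], and to 0 otherwise."""
    H, W, _ = rgb.shape
    mask = np.zeros((H, W), dtype=np.uint8)
    for y in range(H):
        for x in range(W):
            r, g, b = rgb[y, x] / 255.0          # step 1: RGB -> HSV
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            mask[y, x] = 1 if h_min <= h <= h_max else 0  # steps 2-3
    return mask

# Toy 1x2 image: one pure red pixel, one pure blue pixel.
img = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
m = hue_mask(img, 0.0, 0.05)   # narrow range around red (hue 0)
```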
To do some statistics on the intensity of an image, we will first define what we mean by an intensity histogram.
Intensity histogram: Count the number of pixels having a specific intensity level $v\in [0, L-1]$ (assuming $L$ intensity quantization levels).
\[H_I(v) = \#\{(x, y) : I(x, y) = v \}, \quad v\in[0, L-1]\]Here the range $v\in[0, L-1]$ is for simplicity; in practice, it should be the set of allowable intensity values.
Basically, the histogram just counts pixels, here denoted by $(x, y)$. We don’t care about the coordinates, only about the number of pixels whose intensity value is $v$.
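A sketch of this counting, assuming a numpy image with $L = 256$ quantization levels; `np.bincount` performs exactly this per-level count:

```python
import numpy as np

def intensity_histogram(I, L=256):
    """H(v) = number of pixels (x, y) with I(x, y) == v, for v in [0, L-1]."""
    return np.bincount(I.ravel(), minlength=L)

# Toy image: two pixels at level 0, one at 1, one at 255.
I = np.array([[0, 0], [1, 255]], dtype=np.uint8)
H = intensity_histogram(I)
```

The histogram entries always sum to the total number of pixels, since every pixel is counted exactly once.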
We can take an image and then, form the histogram of intensities for all possible intensities. => Example
In the coins example, we have two peaks. The first peak most likely corresponds to the pixels representing the background. The second peak is at light gray, and this is most likely the coins that we observe.
There is greater variation in the colors, so we have some pixels at most places along the intensity axis.
Now recall the thresholding on the rice-grain image: the rice grains look a little like the coins, and when we compute the histogram of the rice-grain image, we also get a bimodal distribution. There are two modes in the distribution.
But if we look at the more complex example, then we can see that the intensities distribution is much more complicated so it’s hard to find a threshold that separate out the meaningful regions or objects in the image.
If we want to find the coins, we probably need to choose a threshold somewhere in the middle of the intensity range, and then we might do a good job at segmenting the coins.
Pick a threshold value $\tau = 126$; using this, we can do a segmentation, and we can see that we get a pretty good result out of it.
The distribution of intensities (gray scale) or colors in an image reveals information about its content.
Bimodal distribution: Find an optimal threshold that separates the two peaks.
This gives a good threshold-based foreground-background segmentation.
Let’s just see what happens with the intensity histograms if we apply an intensity transform like the Gamma-correction or exponential and the like.
Firstly, we have some image with the original histogram for the image.
Then, we perform a transformation that squares all pixel values, and we see the resulting shape. In particular, the range of the intensities is now on the order of $10^4$, whereas the original range is from $0$ to $255$. So of course we have stretched the allowed intensity range, but it also means that we effectively stretch out the intensities. => We see some holes: white lines in the histogram at values where there are no observations, i.e., no pixels take those values. When we do this transformation, we create such holes.
Here is another example with a $\log$ transform. It is very clear that at the lower range of intensities, we now get a lot of holes. Remember that the $\log$ transformation stretches the low-intensity part and compresses the high-intensity part => just like the image shown below. What we also see now is that, simply because we have a fixed number of discrete quantization levels, we get these holes (white spaces): there are no pixels with a value in between, because such values cannot be represented with the quantization levels of the original image.
The last one doesn’t change the actual shape much; it only changes the scale of the intensity axis (the x-axis).
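A tiny numeric illustration (not from the lecture) of why such transforms create holes: squaring maps consecutive input levels to outputs with growing gaps, so some output levels can never occur:

```python
import numpy as np

# Toy image with intensity levels 0..15.
I = np.arange(0, 16, dtype=int)

# Squaring spreads consecutive levels apart: 3 -> 9 but 4 -> 16,
# so output levels 10..15 can never occur ("holes" in the histogram).
S = I ** 2
occupied = set(S.tolist())
holes = [v for v in range(10, 16) if v not in occupied]
```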
Model choice: The intensity at any pixel in an image is a random variable.
We can normalize the histogram with respect to the number of pixels in our image $I$. If $I$ has a spatial resolution of $N\times M$ pixels, then $P_I(v) = \frac{H_I(v)}{NM}$ is the probability that intensity $v$ is found in the image $I$.
$\frac{\text{count}}{\text{area}}=\text{probability}$
$P_I(v)$ is a discrete probability mass function (PMF) with $0\leq P_I(v)\leq 1$ and $\sum^{L-1}_{v=0}P_I(v)=1$.
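A minimal sketch of the normalization, assuming a toy $2\times2$ image with $L=4$ intensity levels (the CDF computed here is defined just below):

```python
import numpy as np

I = np.array([[0, 0], [1, 3]])           # toy image, L = 4 levels
H = np.bincount(I.ravel(), minlength=4)  # histogram H_I(v)
P = H / I.size                           # normalized histogram = PMF P_I(v)
F = np.cumsum(P)                         # discrete CDF F_I(x)
```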
The discrete cumulative distribution function (CDF) is
\[F_I(x) = P_I(v\leq x) = \sum_{v\leq x}P_I(v)\]
How to select an optimal threshold?
Consider a histogram with two modes as being separated into two classes; then find the optimal threshold $k^*$ by maximizing the between-class variance
\[\sigma^2_B = \omega_0(\mu_0 - \mu_T)^2 + \omega_1(\mu_1 - \mu_T)^2 = \omega_0\omega_1(\mu_1 - \mu_0)^2\] \[\begin{align} k^* &= \arg\max_{0\leq k\leq L-1 }\sigma_B^2(k)\\ \mu_T &= \mu(L-1) \Leftarrow \text{total mean}\\ \mu(k) &= \sum_{i=0}^k i\,p(i) \Leftarrow \text{cumulative mean up to level } k\\ \omega(k) &= \sum^k_{i=0}p(i)\Leftarrow \text{CDF}\\ \sigma^2_B(k) &= \frac{[\mu_T\omega(k)-\mu(k)]^2}{\omega(k)[1-\omega(k)]} \end{align}\]Here $\omega_0 = \omega(k)$, $\omega_1 = 1-\omega(k)$, and the class means are $\mu_0 = \mu(k)/\omega(k)$ and $\mu_1 = (\mu_T-\mu(k))/(1-\omega(k))$; substituting these into $\omega_0\omega_1(\mu_1-\mu_0)^2$ gives the last expression.
Histogram matching between the histograms of two images
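Before turning to histogram matching, here is a minimal sketch of the threshold selection above (Otsu's method); the toy bimodal histogram with peaks at levels 2 and 7 is an illustrative assumption:

```python
import numpy as np

def otsu_threshold(hist):
    """Pick k maximizing the between-class variance sigma_B^2(k)."""
    p = hist / hist.sum()                   # normalized histogram p(i)
    omega = np.cumsum(p)                    # omega(k) = sum_{i<=k} p(i)
    mu = np.cumsum(np.arange(len(p)) * p)   # mu(k) = sum_{i<=k} i p(i)
    mu_T = mu[-1]                           # total mean
    # Clamp the denominator to avoid 0/0 at the ends (omega = 0 or 1),
    # where the numerator is also 0.
    denom = np.maximum(omega * (1.0 - omega), 1e-12)
    sigma_B2 = (mu_T * omega - mu) ** 2 / denom
    return int(np.argmax(sigma_B2))

# Toy bimodal histogram over L = 10 levels: peaks at levels 2 and 7.
hist = np.array([0, 5, 10, 5, 0, 0, 5, 10, 5, 0])
k_star = otsu_threshold(hist)
```

The maximizing $k^*$ falls in the valley between the two peaks, as intended.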
We consider intensities in an image as a random variable (RV). The normalized histogram gives the distribution of these intensities.
In the following, we will consider the limiting case of an image with a continuous range of intensities. That is, we have a continuous probability density function (PDF) over intensities.
Question: Given two intensity probability distributions, can we derive an intensity transformation that, when applied to one image, transforms its CDF (or PDF) to match the CDF (or PDF) of the other image? => Put differently: if an intensity transformation is applied to an image, can one derive the new intensity probability function? That is, if we have an image with a specific PDF and apply some transformation to its intensities, can we figure out the PDF of the new, transformed image? => YES
Let $R$ be a real RV and $g: \mathbb{R} \to \mathbb{R}$ (or $[a, b] \to [c, d]$) a function that maps $R$ to the RV $S = g(R)$.
In the following, I will use $r$ to denote intensities in the original image and $s$ to denote intensities in the transformed image.
Can we compute the CDF $F_S$ from $F_R$?
Step 1. Start from the CDF of $S$
\[F_S(s) = \int^s_{-\infty}p_S(x')dx' = P(S\leq s) = P(S\in (-\infty, s])\]Step 2. Since $S = g(R)$, we have
\[S\in (-\infty, s] \Leftrightarrow g(R) \in (-\infty, s] \Leftrightarrow R\in g^{-1}((-\infty, s])\]Step 3. So we can write
\[F_S(s) = P(S\in (-\infty, s]) = P(R\in g^{-1}((-\infty, s]))\]Hence, we need to focus on computing $g^{-1}((-\infty, s])$!
The simplest case is to assume that $g$ has an inverse, which implies that it is strictly monotonic (either increasing or decreasing).
Simplest simple case: $g$ is strictly increasing, then
\[g^{-1}\left((-\infty, s]\right) = (-\infty, g^{-1}(s)]\]Thus, the relation between the CDFs is
\[F_S(s) = P(R \in (-\infty, g^{-1}(s)]) = F_R(g^{-1}(s))\]Let's set $r = g^{-1}(s) \Leftrightarrow s = g(r)$, and then we get
\[F_S(s) = F_R(g^{-1}(s)) \Leftrightarrow F_S(g(r)) = F_R(r)\]If $F_S$ is strictly increasing then we can retrieve $g$ by
\[g(r) = F_S^{-1}(F_R(r))\]Explanation
Let’s consider this scenario that we have an input image.
It has a histogram, and from the histogram we can compute the CDF of this image, $F_R$.
The formula tells us the following: pick any pixel in the image and look up the intensity value at that pixel. Let’s assume that it is $r$.
First, we need to look up the value of the CDF of the original image at $r$ => $F_R(r)$.
Then, we need to do an inverse lookup in the CDF $F_S$ of the result image that we want. => This gives the new intensity value that we should write into the output image, and this value is computed by $g(r) = F_S^{-1}\left(F_R(r)\right)$
The consequence is that we can change from the distribution given by CDF $F_R$ to one with CDF $F_S$.
We can use this to match intensity histograms of two images. => Histogram Matching
Application
We can take one image as the input and compute its CDF. Then, we take another image whose distribution we would like, and compute its CDF as well. Finally, we apply the strategy to the original image: we construct a version of it that has the same CDF as the latter image.
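A minimal sketch of this strategy for discrete images. The discrete inverse $F_S^{-1}$ is taken here as the smallest $s$ with $F_S(s) \geq F_R(r)$ (a standard choice when $F_S$ is not strictly increasing), and the toy images are illustrative:

```python
import numpy as np

def match_histograms(I, ref, L=256):
    """Map intensities of I so its CDF approximates that of ref:
    g(r) = F_S^{-1}(F_R(r)) as a lookup table over all L levels."""
    F_R = np.cumsum(np.bincount(I.ravel(), minlength=L)) / I.size
    F_S = np.cumsum(np.bincount(ref.ravel(), minlength=L)) / ref.size
    # Smallest s with F_S(s) >= F_R(r), for every level r.
    g = np.searchsorted(F_S, F_R)
    g = np.clip(g, 0, L - 1).astype(I.dtype)
    return g[I]

dark = np.array([[10, 20]], dtype=np.uint8)
bright = np.array([[100, 200]], dtype=np.uint8)
matched = match_histograms(dark, bright)
```

Matching an image to itself leaves the occupied intensity levels unchanged, which is a quick sanity check for the discrete inverse.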
Contrast stretching
We can see that there are a lot of pixels in the dark part of the intensity range. Now, the question is: can we do something to improve the contrast of this image, stretching its dynamic range so that we can more easily see details in the tire region?
One way to view this is that we would like to map this intensity distribution towards a uniform probability distribution, so that all intensities are equally likely.
If we do this, then we effectively redistribute the intensities of the individual pixels so that they take up, or use, the complete range of available intensities for our quantization.
We are looking for a remapping to a new PDF with $p_S(s) = \frac{1}{L}$. => Uniform distribution $p_S: [0, L] \to \mathbb{R}$
How to find a mapping $g$ such that $S = g(R)$ has a flat PDF?
I will go about deriving histogram equalization in a slightly different way than for histogram matching. But it will turn out that we are actually performing histogram matching.
Question. We have $p_S(s) = \frac{1}{L}$ and $p_R(r)$, and want to find $g(r)$.
I pull a mathematical white rabbit out of the hat and write up this differential equation (we use the notation $g'(r) = dg/dr$)
\[g'(r) = \frac{p_R(r)}{p_S(g(r))}\]The expression above holds because we know that $F_S(g(r)) = F_R(r)$. Taking the derivative with respect to $r$ on both sides gives $p_S(g(r))\,g'(r) = p_R(r)$, which rearranges to the equation above.
And since $p_S(s) = 1/L$, we have $g'(r) = L\,p_R(r)$.
Solve this differential equation by integration
\[g(r) = \int^r_0 g'(\rho)d\rho = L\int^r_0p_R(\rho)d\rho = LF_R(r)\]Actually, we have just shown that histogram equalization is histogram matching of CDF $F_R(r)$ to CDF $F_S(s) = s/L$ (i.e. the CDF of the uniform distribution $p_S(s) = 1/L$).
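A minimal sketch of discrete histogram equalization based on $g(r) = L\,F_R(r)$; note the common practical substitution of $(L-1)$ for $L$ (an assumption here, so that outputs stay in $[0, L-1]$), and the toy image is illustrative:

```python
import numpy as np

def equalize(I, L=256):
    """Discrete version of g(r) = L * F_R(r), using (L - 1) and rounding
    so the outputs remain valid intensity levels."""
    F_R = np.cumsum(np.bincount(I.ravel(), minlength=L)) / I.size
    g = np.round((L - 1) * F_R).astype(I.dtype)  # lookup table r -> s
    return g[I]

# Toy image with all intensities squashed into the dark end (levels 0..3).
I = np.array([[0, 1], [2, 3]], dtype=np.uint8)
E = equalize(I)
```

The four dark levels get spread across the full $[0, 255]$ range, which is exactly the contrast-stretching effect described above.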
Perfect histogram equalization is in general not possible, due to quantization and missing intensity ranges.
In practice, not all CDFs are strictly increasing! It is possible to have constant plateaus if a range of intensities does not occur in the image.
How do we perform histogram equalization of color images?
Step 1. Convert RGB image to HSV.
Step 2. Perform histogram equalization on the V channel. The V channel is the intensity (value) of the pixel and has nothing to do with color.
Step 3. Convert processed image from HSV back to RGB
We can see that histogram equalization doesn’t do anything good for this region of bright pixels; it looks a little worse than in the original image. But it definitely brightens up the dark part. We can handle this problem with a variant. => Adaptive Histogram Equalization / Matching
A non-linear filter