Segmentation Basics
Disclaimer: All the material below is based on the course Medical Image Analysis, delivered by the University of Copenhagen.
Today’s Learning Objectives
- Define what segmentation means
- Describe what kinds of segmentation methods exist
- Explain how connected component decomposition works
- Explain how dilation and erosion operations can be used
First, I will explain what segmentation means, and then I will familiarize you with some existing simple segmentation methods. Despite being very simple, these methods are used very often, usually not as standalone solutions but as auxiliary methods. I will also show you a couple of useful tools: one of them is connected component decomposition, and the other is morphological operations. And I will show how we use them in practice, and how you can augment medical image segmentation with these methods.
Topics of Interest
This chart is reconstructed from my own memory of one of the medical imaging conferences, where the organizers presented a breakdown of all submissions into different topics. Of course, the submissions could be categorized differently, but as far as I remember, the breakdown is approximately like this. You can see that segmentation occupies the majority of submissions presented at conferences, so it is the topic that attracts the most interest in the community.
Reconstruction
When 3D images of the human body are required, the patient is typically positioned inside a tubular scanner, and the detector spins around the body, acquiring a series of 2D image projections. These images are stacked one after another into a 3D volume, which at first does not look like a 3D volume of the human body. It looks like the image on the left, which is a kind of spectrum of the image; in CT, it is called a sinogram. There is an algorithm which can reconstruct the true 3D volume from this sinogram, and there is a theory which says that from a perfectly accurate sinogram it is always possible to reconstruct a perfectly accurate 3D volume.
However, in real life, we never have a perfectly smooth and complete spectrum or sinogram. The reason is very simple: we cannot acquire images at every possible angle as the detector spins around the body, and we have to make some sacrifices. We acquire images only at certain steps, and due to this lower number of acquired images, we speed up the acquisition time and reduce the radiation dose delivered to the patient, for example when we acquire CT images. At the same time, we make our CT scanners less expensive.
The problem we face is: what is the minimum number of images needed to reconstruct the underlying anatomy? It turns out that currently we cannot record high-quality motion images, which we may sometimes need, for example, to record how the heart moves or how a human breathes. This leaves us with two options: either we get a poor-quality image with motion, or we cannot capture motion at all and acquire static images, which can be affected by motion artifacts. So researchers started to think: what if we acquire fewer images, start from a poor-quality reconstruction, and, knowing something about human anatomy, develop algorithms to improve the image reconstruction? Maybe we can use the existing scanners and see if we can get good images using computational methods. Here is an example from work by Nvidia, where they take only 10% of the 2D projection images generated by an MR scanner and apply deep learning to generate a reconstruction (center above). You can see that, using such techniques, this can work well.
Synthesis
The next topic of interest is image synthesis. The idea of image synthesis is to generate artificial image modalities from other modalities of the same patient.
Let's say we have a CT image of a patient. An image synthesis algorithm tries to generate an MR image of the same patient in the same position as in the CT image. Why is this useful?
Let's say we have a diagnostic toolbox that requires both CT and MR images to diagnose a disease, and we install this toolbox in a place where there is no MRI scanner. In such a place, we can either wait until an MRI scanner gets installed before we start using the toolbox, or we can apply synthesis to generate artificial MR images and use the toolbox right away. This is one way synthesis can be used.
Another case where synthesis is useful is detection of abnormalities. Let's say we have a CT image of a patient. As you can see from the illustration (above), some structures of the brain, especially soft tissues, are very poorly visible in CT. This means that if there is a small tumor, it is possible that we will not even see it in the CT image, while the same tumor will be very well visible in MR. If we have both CT and MR images of the same patient, we can take the CT image and synthesize an MR image from it. The newly generated MR will have no tumor, because the tumor is almost invisible in CT, so the algorithm does not actually know where the tumor is; it will generate a more or less clean image of the head. When we subtract the artificially synthesized MR from the true MR, the tumor may be highlighted, because there is no tumor in the synthesized MR image but there is one in the true one.
Image-guided interventions
Another large topic of interest is image-guided procedures. Usually, such procedures require a combination of many tools, including image segmentation and image registration. The key reason we separate this topic of interest is that these works are centered on a specific procedure; they are very targeted, and the protocol is the procedure itself. One good example of an image-guided intervention is the insertion of medical screws into the patient's vertebrae.
Why do we even need this procedure? Some patients have a very severe form of scoliosis or spine damage. In order to fix the spine curvature, it might be necessary to insert screws into the patient's spine and attach a rod to the screws so that the spine is pulled and straightened according to the anatomical requirements of the human body. Inserting screws is a procedure that actively straightens the spine and allows the patient to function properly. When we do this insertion, we need to be very careful that the screws do not go into the wrong place. First of all, the screws should not break the vertebra: if we break the vertebra, instead of helping the patient we may injure him more severely than he already was. Another problem is that the screws should not go into the spinal cord; we have to be very careful here, otherwise the outcome might be devastating. Also, screws that are not inserted properly may loosen, so the procedure would have to be repeated. To answer all these questions, we need to analyze images and see how the patient's anatomy is organized, in order to position these screws perfectly through the pedicles of the vertebrae.
There are many other applications of image-guided interventions, for example, epilepsy treatment, femoral head replacement, and various surgeries.
Computer-aided diagnosis
One of the topics that has been gaining a lot of interest recently is computer-aided diagnosis. One example is a paper on skin image analysis by Stanford, published in Nature. The idea is to take photos of the skin and pass them through a neural network to try to predict different skin diseases. Other applications include prediction of diabetes and prediction of lung pathologies from plain X-rays.
Registration
During our course, you will have several lectures on image registration, because without knowing what image registration is and how it works, it is impossible to work in the field of medical image analysis.
What image registration tries to do is find a transformation from one image to another. As you can see in this example, we take the image located at the top left corner and slowly, step by step, deform it until it matches the image located at the bottom right corner. The practical utility of image registration is significant. One of the ways we can use registration is to monitor disease progression.
Let's say a patient has been imaged and the doctor prescribed a treatment. After some time, say several months, the doctor would like to follow up on how the disease is managed by the treatment. Let's say the disease affects some structures in the human body: maybe it is a tumor which should shrink if the treatment is chemotherapy or radiation therapy, or maybe it is a degenerative disease of the brain, like dementia, which results in degradation of certain brain structures. What we would like to do is quantitatively assess how much the tumor or the brain structures change. We take the image of the patient acquired at the time of diagnosis, annotate the structure there, and deform it towards the patient image acquired at the follow-up time. The amount of change tells us how fast the disease is progressing and whether the treatment has been effective. Another way to use image registration is to segment structures of interest through registration.
Let's say we have a patient whose internal structure has been segmented manually and very carefully, and we are interested in segmenting the same structure in a new image. We could do it manually, taking an instrument like a paint brush in some toolbox and outlining the structure, which is very time-consuming and might be imprecise. The other way is to take the already annotated image of the other patient, deform it towards our new image, and simultaneously deform the annotated structures. We then use this deformation to transfer the segmentation onto the new image.
Segmentation Applications
After familiarizing you with the main topics of interest in the field, let me go directly to image segmentation.
- Process of partitioning an image into distinct regions
- Why do we need segmentation in computer vision or medical imaging?
It is the topic that receives the most attention from researchers, the topic that is used the most and contributes the most to different applications in the field.
The definition of segmentation is basically partitioning of the image into meaningful regions. If you think about it, such partitioning is extremely valuable, and not only in medical imaging. Every time we analyze an image and try to understand what is depicted in it, we perform segmentation, at least implicitly.
Let's say here is an image of a dog. To understand that this is a dog, we can apply an algorithm that will segment, that is, contour, the dog in the image, and this contouring can be used to find out where the object is located, what is in the background, and so on.
Another application, closer to industry, is self-driving cars. The field of self-driving cars is developing rapidly, and not much time will pass before they become a major way of transportation. The way the algorithm inside the car understands where to turn and how to interact usually requires segmentation: the algorithm needs to know what happens around it. Where are the other cars? Where are the traffic lights? Where are the pedestrians? Where are obstacles like shops or buildings? To answer this, we need to take an image of the surroundings and partition it into meaningful regions.
Automation of manual work
One of the ways we can use segmentation is for automation of manual work.
In medical imaging, it is sometimes needed to segment organs to understand something about human anatomy. One area where this is done constantly, and where it takes a lot of human time, is radiation therapy planning.
Why is segmentation needed there?
Here is an example of two radiation therapy plans, one for brain radiation therapy and one for liver radiation therapy. What do we do when we deliver radiation to a patient? We do not irradiate only the tumor; radiation does not differentiate between the tumor and healthy structures.
If we shoot a radiation beam, it will pass through the human body and irradiate everything on its way. How can we be sure that we kill the tumor cells but do not kill, or only minimally damage, the surrounding tissues?
The way to do it is to use several beams shooting from different directions, in such a way that all of these beams are centered on the tumor, so that the radiation delivered to the surrounding structures is more evenly distributed. We get a high dose delivered to the tumor and not such a high dose delivered to the surrounding structures.
When we do this, we still want to be sure that no critical structure is damaged. This can be very difficult to achieve when a tumor is located very close to critical structures, for example in the brain or near the spinal cord.
Does it mean that we can simply position beams equidistantly around the human body and shoot, as long as we deliver enough dose to the tumor, without caring how much dose is distributed to the surrounding structures?
Well actually, no. It turns out that different human organs have different radiation sensitivity. Irradiating the spinal cord is far worse than irradiating the same volume of, say, the ear. All of these relationships between doses and organs have been carefully evaluated during the last couple of decades by oncologists, who established a set of rules saying, for example, that 30% of the liver should receive a dose below a certain level, or that the maximum dose delivered to the spinal cord should be below such and such level.
But how can we be sure that these constraints are satisfied?
The way to do it is to segment all organs which are at risk of receiving a dose, segment the tumor in the image, and plan the treatment before actually delivering the dose: simulate the plan and see how much dose each organ receives. With the segmentation, we can measure the volume, in cubic millimeters, of each organ and the dose delivered to it. This segmentation of organs is performed continuously in the medical field, every day, and it takes hours to do manually. Many companies, like Philips, focus on developing solutions for segmentation of organs for radiation therapy planning.
Computer-aided diagnosis
Another area of interest is computer-aided diagnosis.
In the previous section, radiation therapy was about treatment planning. Here, we do not yet know if the patient is sick or not, but we can use segmentation to estimate it. For example, cardiomegaly, an enlargement of the heart, can be detected on X-ray images. To do that, doctors measure two lines, as you can see in the center of the image: the height and the width of the heart on the X-ray, and from these measurements they can say whether the patient has a normal heart. However, instead of measuring these two lines, it is better, and might even be more precise, to segment the whole heart and estimate its extent directly from the segmentation instead of measuring its height and width. And here, we need segmentation.
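As a toy sketch of how such a measurement could be automated, assuming hypothetical binary masks for the heart and thorax (for example, produced by some segmentation model), one could compute the cardiothoracic ratio, heart width over thorax width, directly from the masks:

```python
import numpy as np

def mask_width(mask: np.ndarray) -> int:
    """Horizontal extent (in pixels) of a binary mask."""
    cols = np.where(mask.any(axis=0))[0]
    return int(cols[-1] - cols[0] + 1)

# Hypothetical masks standing in for real segmentation outputs.
heart = np.zeros((256, 256), dtype=bool)
heart[120:200, 90:170] = True
thorax = np.zeros((256, 256), dtype=bool)
thorax[40:230, 30:230] = True

# Cardiothoracic ratio: values above roughly 0.5 are a common
# red flag for an enlarged heart on a frontal chest X-ray.
ctr = mask_width(heart) / mask_width(thorax)
print(f"CTR = {ctr:.2f}")
```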
Another example is detection of vertebral compression fractures. When we age, our bones weaken, and some vertebrae can collapse. In some cases, we need to perform a treatment, maybe strengthen the vertebra, if the patient is about to develop a serious complication, for example, if the vertebra is about to break or is too weak.
The question is: how do we estimate the state of the vertebra?
What the doctors do is take a 3D image of the patient and measure some metrics: they measure the heights of the vertebral body at the front, in the middle, and at the back. There are tables which say what the heights should be, and what the proportions of these heights with respect to each other should be, for a healthy vertebra and for a pathological vertebra.
Instead of measuring these distances manually, we can apply an algorithm that measures them automatically. Such algorithms can actually perform better than humans because, instead of taking a few points in the image, they can segment the whole vertebra and measure the degree of compression more comprehensively, not just from one distance.
Another application of computer-aided diagnosis is, for example, writing automated reports. Let's say we have a lung field image, and a doctor, or maybe a machine, annotated pathologies on this image, for example, regions which are suspicious for cancer, or other opacities, pneumothorax, pneumonia, or COVID. If we would like to generate an automated report from this, what we need is a segmentation of the lungs, in order to say where exactly the pathology is located. From the segmentation, we can say that the pathology is located at the lower lobe of the right lung, or at the upper lobe of the left lung. And for this, we need segmentation.
Image-guided procedures
Here is a screenshot from software for planning dental implants.
When a patient comes for a dental implant, especially if the implant requires insertion of screws into the patient's jaw, a 3D image of the patient's jaw is required. The doctor then sees how the patient's jaw looks and where the implant needs to be inserted.
He faces a couple of challenges. One of them is that he needs to design an implant which will fit the patient's jaw perfectly. One way to do it is to have a collection of teeth, basically an image collection of different teeth, take the average shape of a specific tooth from the population, put it artificially into the jaw, and adjust its shape towards the patient. Unfortunately, this procedure is very lengthy, because an average tooth will never fit a specific patient very well; our teeth are very different.
A much better strategy is to use the patient's own anatomy to generate the implant. We can use the symmetry of the patient's jaw: it is known that a specific tooth will be very similar to the same tooth on the other side of the jaw. So we can apply a segmentation algorithm, extract the same tooth from the other side of the jaw, reflect it, position it, and then start to adjust. This usually requires far less time.
Another thing to take into account is that when we insert screws into the jaw, we should not injure the nerves; otherwise, the patient will have serious complications, or his jaw muscles may get paralyzed. Here we can use segmentation to understand where the nerve runs.
Motion analysis
Another area of significant interest is motion analysis.
It is often the case that when we image moving structures like the heart or the lungs, we would like to know how exactly they move. We want not only their static appearance but also their dynamic appearance. There are image modalities which allow us to do that; for example, we can use MR imaging, which captures motion.
When we acquire such an image, it might be very beneficial to understand the anatomy and the motion patterns, and we use segmentation for that. For example, here on the right there is a segmentation of the heart; we then use this heart segmentation to model how blood travels inside the heart, in order to understand whether there is any problem, for example, whether any vessel is weakened. For this, we need segmentation.
Segmentation Methods
A lot of different segmentation methods already exist, and segmentation has been developed for many, many years now. The most common way to separate the methods is into two categories:
- Supervised -> requires an annotated database
- Unsupervised -> uses common sense, without manually annotated data
Thresholding
This technique is covered in the course Signal and Image Processing; check it there in detail if you need to recall it.
- Histogram of intensity distribution
- One or multiple thresholds are used to classify image pixels into partitions (a minimal sketch follows below)
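Here is a minimal sketch of the idea in Python, assuming a grayscale image stored as a NumPy array; the threshold is picked by hand, for example from inspecting the valley in the histogram:

```python
import numpy as np

# Synthetic grayscale image: dark background plus a brighter object.
rng = np.random.default_rng(0)
image = rng.normal(50, 10, size=(128, 128))
image[40:90, 40:90] += 100

# Inspect the intensity histogram to choose a threshold in the valley
# between the two modes (here the valley sits near 100).
hist, bin_edges = np.histogram(image, bins=64)

threshold = 100.0
mask = image > threshold   # True = object, False = background
```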
Thresholding: Gaussian mixture model
- Usually, you need to know in advance how many gaussians you need
- A good initial guess can help significantly
Gaussian Mixture Model (GMM)
A Gaussian Mixture Model is a probabilistic model that represents a mixture of multiple Gaussian (normal) distributions. In image segmentation, GMM is used to model the distribution of pixel intensities in an image. Instead of assuming a single global threshold like in simple thresholding, GMM assumes that pixel intensities in an image come from a mixture of several Gaussian distributions.
Why use GMM?
- Complex Distributions: GMM is more robust when dealing with complex, multi-modal intensity distributions in an image. It can model situations where the pixel intensities don't follow a simple bimodal distribution.
- Flexibility: It allows for a more flexible and probabilistic approach to modeling image data, which can be particularly useful when objects in an image have varying intensity characteristics.
- Adaptive Thresholding: GMM can adapt to local variations in intensity, making it useful for images with non-uniform lighting conditions.
- Soft Segmentation: GMM can provide soft (probabilistic) segmentations, where each pixel is assigned a probability of belonging to different classes. This can be useful in cases where hard segmentation may be ambiguous.
In summary, Gaussian Mixture Models are used when the intensity distribution in an image is not well-suited for a simple global thresholding approach. GMM provides a more flexible and probabilistic way to model the pixel intensity distribution, which can lead to better segmentation results, especially in cases with complex or non-uniform intensity distributions. However, GMM is computationally more expensive than thresholding, and the choice between these techniques depends on the specific characteristics of the image and the requirements of the segmentation task.
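As an illustration, here is a minimal sketch of intensity-based GMM segmentation, assuming scikit-learn is available; the number of components is fixed in advance, as noted above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic image whose intensities come from two Gaussian populations.
rng = np.random.default_rng(0)
image = rng.normal(60, 8, size=(64, 64))
image[16:48, 16:48] = rng.normal(140, 12, size=(32, 32))

# Fit the mixture to the pixel intensities (one sample per pixel).
pixels = image.reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)

hard = gmm.predict(pixels).reshape(image.shape)   # hard segmentation
soft = gmm.predict_proba(pixels)                  # soft (per-class) probabilities
```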
- Expectation Maximization (EM) algorithm
- Iterative improvement of log-likelihood function
The Expectation-Maximization (EM) algorithm is a statistical iterative optimization technique used for finding maximum likelihood estimates of parameters in probabilistic models, especially when dealing with incomplete or missing data. The EM algorithm consists of two main steps, the Expectation (E-step) and the Maximization (M-step), which are iteratively performed until convergence. Here’s an overview of how the EM algorithm works (a minimal code sketch follows the steps):
- Initialization: Start with an initial guess for the parameters of the model.
- Expectation (E-step):
- Compute the expected value (expectation) of the log-likelihood of the data with respect to the current estimate of the parameters. This step involves estimating the values of hidden or unobserved variables, given the observed data and the current parameter estimates. It is called the “E-step” because it computes the expected values of these hidden variables.
- These expected values are also called the “posterior probabilities” or “responsibilities” and represent the probability that each data point belongs to a particular component of the mixture model.
- Maximization (M-step):
- Update the model parameters to maximize the expected log-likelihood computed in the E-step. This involves finding new parameter estimates that make the observed data more likely given the estimated hidden variables.
- In many cases, this step involves solving optimization problems to find the best-fitting parameters.
- Iteration:
- Repeat the E-step and M-step iteratively until the algorithm converges. Convergence typically occurs when the change in the estimated parameters between iterations becomes small.
- Final Estimates:
- Once the EM algorithm converges, the final estimates of the model parameters are obtained.
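To make the two steps concrete, here is a minimal, illustrative EM loop for a one-dimensional Gaussian mixture; this is a sketch only, with naive initialization and no numerical safeguards:

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=100, tol=1e-6):
    """Fit a 1D Gaussian mixture with EM. Returns weights, means, variances."""
    rng = np.random.default_rng(0)
    mu = rng.choice(x, size=k, replace=False)   # naive initialization
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i).
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        ll = np.log(dens.sum(axis=1)).sum()     # current log-likelihood
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        if ll - prev_ll < tol:                  # likelihood stopped improving
            break
        prev_ll = ll
    return pi, mu, var

# Toy data: two intensity populations, as in a bimodal histogram.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(3, 1, 500), rng.normal(10, 2, 500)])
weights, means, variances = em_gmm_1d(x)
```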
Region growing
What do we do with thresholding? We do not take the neighboring pixels into account at all. Region growing is based on the idea that if two neighboring pixels have similar intensities, they probably belong to the same object.
What region growing does is simply add pixels one by one and merge them together in order to get homogeneous regions. Despite being very simple, it can be very useful and relatively accurate in some specific situations.
Let's say we have an example with lung fields. We can place some seeds, see the regions which are marked in green, and slowly add the surrounding pixels of the lungs into the mask.
Neighboring Pixels
- You will need seed pixels that are known to belong to the object (objects)
- Add neighboring pixels $p_t$ as long as $P(p_{t-1}, p_t)=1$
- Iterate until no further pixels can be added
- Move to new seed pixels if there are any
The basic idea of region growing is connectivity. In 2D, there are two types of connectivity,
- 4-connectivity
- 8-connectivity
We can extend this into distance-based connectivity: in the same way, pixels are connected if the distance between them is less than, for example, 2 millimeters.
In this way, we will add more pixels into our neighborhood.
The original region growing is extremely simple. We have seed points, and we have a rule which says that if two neighboring pixels have an intensity difference of less than $x$, they belong to the same class. What we do is slowly, iteratively expand our seed pixels until we cannot expand any longer.
Let’s look at this toy example.
- $P(x, y) = 1$ if $\mid x-y\mid \leq 3$
- We do not need to have both seeds; the result will be the same
At first, we have a pixel with intensity $1$, marked as dark gray, and a pixel with intensity $7$, marked as light gray. The expansion rule is that if neighboring pixels have an intensity difference of at most $3$, they are connected together.
In the first iteration, we add all neighbors of the pixel with intensity $7$ whose difference to it is at most $3$, and we also add some neighbors to the pixel with intensity $1$. We continue this process until we cannot add any more pixels, and we get a quite logical separation into a darker object at the left of the image and a brighter object at the right.
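A minimal sketch of this procedure in Python, using 4-connectivity and the same predicate as the toy example (neighboring pixels are merged if their intensities differ by at most 3):

```python
import numpy as np
from collections import deque

def region_grow(image, seed, max_diff=3):
    """Breadth-first region growing from a single seed pixel."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connectivity
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and abs(int(image[ny, nx]) - int(image[y, x])) <= max_diff:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

# Toy image: darker values on the left, brighter on the right.
image = np.array([[1, 2, 7, 8],
                  [1, 3, 7, 9],
                  [2, 2, 8, 8]])
dark = region_grow(image, seed=(0, 0))   # grows over the left object only
```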
Region growing works well only in very specific circumstances.
For example, segmentation of lungs: when we place some seeds inside the lungs and apply region growing, the results are pretty decent.
Of course, region growing is imperfect. It cannot work well for complex anatomy.
For example, here is an attempt at segmenting lung vessels and vertebrae using region growing. Due to the simplicity of the algorithm, it usually cannot produce smooth, anatomically relevant borders. You can see it suffers from leaks, and it also has a problem with noise: if there is a noisy pixel, the region cannot grow past it.
Problems:
- Leaks
- Noise
Split/merge
Another way of doing region growing is to approach it from the opposite direction, instead of adding pixels to our seeds.
What we can try to do is slowly split the image into its most dissimilar pieces.
For example, we label our whole image as one object. In the first iteration, we try to separate this image into four squares and check how similar the average intensities of the individual squares are. If the average intensities are similar, we keep the squares as the same object; if they are too different, we separate them into pieces. Then we go into each of the squares and try to separate it further, and we continue until we cannot separate any longer.
- First, the whole image is one initial region
- In each iteration:
- Merge neighboring regions that are too similar: $P(R, \hat{R}) = 1$
- Split regions that are too different $P(R,\hat{R})=0$
- Stop when no split or merge can be performed
$P(R, \hat{R})$ can be based on the standard deviation of the pixel intensities.
For simplicity, split/merge happens according to the image quadtree:
- Four pixels in a $2\times 2$ square belong to a split node of a higher level
- Four squares belong to a split node of a higher level, etc.
In such a way, we get a tree of potential separations into different connected and disconnected regions.
Here is an example of how this algorithm works.
On the left is the result of the split-only approach: we start by labeling the whole image as one object and then keep separating the image into squares. However, the problem with this idea is that once regions are separated into rectangles, they can never merge back together. Our rectangular boxes never pass perfectly along the borders of different structures, so it might be necessary to go back some levels and merge some of the smaller regions together.
We call this split-and-merge region growing: we not only separate the image into smaller pieces but can also merge some of the smaller pieces back together if needed.
And you can see that the separation with split-and-merge region growing is better than the one with split-only region growing.
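Here is a minimal sketch of the split step, recursively descending the quadtree while a block's standard deviation exceeds a homogeneity threshold; the merge step, which would re-join neighboring leaves that are similar enough, is omitted for brevity:

```python
import numpy as np

def quadtree_split(image, y0, x0, size, max_std, leaves):
    """Recursively split a square block until it is homogeneous enough."""
    block = image[y0:y0 + size, x0:x0 + size]
    if size == 1 or block.std() <= max_std:
        leaves.append((y0, x0, size))        # keep as one region
        return
    half = size // 2
    for dy in (0, half):                     # descend into the four quadrants
        for dx in (0, half):
            quadtree_split(image, y0 + dy, x0 + dx, half, max_std, leaves)

# Usage on a power-of-two-sized image.
rng = np.random.default_rng(0)
image = rng.normal(50, 5, size=(64, 64))
image[:32, :32] += 100                       # one clearly different quadrant
leaves = []
quadtree_split(image, 0, 0, 64, max_std=20.0, leaves=leaves)
```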
You do not have to use thresholding or region growing on raw image intensities.
The thing about region growing and thresholding is that these days we do not use them on raw image intensities, because organs are much more complicated than homogeneous regions of a specific intensity. What we can do instead is apply more sophisticated algorithms, for example, deep learning, to enhance the organs we are interested in.
For example, here is an enhancement of the kidney. Once we have enhanced the kidney, we can apply thresholding. It is much easier to separate the background from the object in the enhanced image than to separate the kidney in the raw image.
This is how thresholding is currently used: as an auxiliary method to analyze the results of more sophisticated algorithms.
Other similar segmentation methods
- Watersheds
- Level sets
- K-means
- Various thresholding methods (Otsu, Huang, etc.); Otsu is sketched below
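For instance, Otsu's method picks the threshold that maximizes the between-class intensity variance; a minimal sketch using scikit-image, assuming it is installed:

```python
import numpy as np
from skimage.filters import threshold_otsu

# Synthetic bimodal image: dark background, brighter square.
rng = np.random.default_rng(0)
image = rng.normal(50, 10, size=(128, 128))
image[32:96, 32:96] += 80

t = threshold_otsu(image)   # data-driven threshold, no manual tuning
mask = image > t
```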
Example
Let's consider the problem of lung field segmentation.
Let's say we have a 3D image of the human body, and our aim is to segment the lung fields from this image. The lung fields are darker than the other structures around them. Considering this fact, we can use thresholding to separate the lungs: we take a histogram and threshold at the first valley, which separates air from soft tissues.
Here is what we get. It is not perfect, but it is already somewhat decent. Let's see if we can improve it using some simple logical ideas.
Here is a 3D rendering of our segmentation. Although it looked very nice on the 2D cross section we saw, when we visualize the thresholded mask in 3D, we see that apart from the lung fields, which are full of air, we have also segmented the air outside the human body, which does not belong to it.
So the first thing we need to think about is: how do we remove this air outside the human body?
Must-have tools
Connected component decomposition
The way to do it is by using a tool called connected component decomposition.
If you look again at our lung field segmentation example, you can see that the air outside the human body forms one very large connected object: all the outside air is connected to itself, and its size is quite significant. At the same time, the lungs are also of significant size. When you look more closely into the image, you find that thresholding selected not only the air outside the body and the lung fields, but also air in the stomach and air around some image artifacts.
The way to solve this is to use the anatomical knowledge we have prior to addressing the problem. What we know is that the air outside the body has a very large volume; it is expected to be the largest object we see in the image. The next largest object is the lungs, the second largest accumulation of air. So what we can do is find the second largest object, and hopefully this object will be the lung fields.
To do that, we need an algorithm which is called connected component decomposition.
What the algorithm does is assign the same label to all pixels in the mask that have value $1$ and are connected to each other, using 8- or 4-connectivity in 2D, or 26- or 6-connectivity in 3D.
Separating segmentation results into regions:
- Removes noise from segmentation
- Separates segmentation into regions that can be further analysed
We have a binary image (0-background)
Connected component:
- All pixels have intensity value 1
- There is a path between any two pixels using 4-connectivity or 8-connectivity
- No more pixels can be added (a code sketch follows)
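A minimal sketch with SciPy, mirroring the lung example above: label the components, sort them by size, and keep the second largest, assuming the largest one is the outside air:

```python
import numpy as np
from scipy import ndimage

# Toy binary mask standing in for the thresholded "air" voxels.
mask = np.zeros((12, 12), dtype=bool)
mask[:, :3] = True            # large component: air outside the body
mask[5:9, 6:10] = True        # smaller component: the "lungs"

labels, n = ndimage.label(mask)          # default: 4-connectivity in 2D
sizes = np.bincount(labels.ravel())
sizes[0] = 0                             # label 0 is the background, ignore it
order = np.argsort(sizes)[::-1]          # component labels, largest first
lungs = labels == order[1]               # keep the second-largest component
```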
Here is a little bit of algorithmic depth on connected component decomposition.
It will be skipped as it is not listed in the learning outcomes.
Morphological dilation/erosion
The result is probably not that satisfying yet, because the lung fields are not only air: the lungs also contain vessels, and these vessels will not be segmented by thresholding. However, we would still like to add these vessels into our mask so that the segmentation is complete.
The other problem is that in some cases, especially in X-ray images, the air in the stomach comes very close to the lung fields, and sometimes it may become connected to them, so our lung field mask can leak into the stomach. We do not want this to happen.
Here is an example of the vessels inside the lung fields, and also an example of such a leak. We do not want either to be present in our mask, and the way to fix it is to apply morphological operations.
There are two types of operations, one is morphological erosion and the other one is morphological dilation.
Erosion removes one layer of pixels from our mask, so the mask shrinks by one layer: every mask pixel which has at least one neighbor that is not in the mask is removed.
Dilation is the opposite: it adds one layer of pixels to our mask. Every non-mask pixel which has a mask pixel as a neighbor in the original mask is included into the mask.
Note that when we apply erosion and then dilation (or the other way around), we do not necessarily return to the original mask.
Informally:
- Dilation - expanding of binary mask
- Erosion - shrinking of binary mask
Dilation/erosion is useful:
- To remove all internal noise pixels in segmentation mask
- To remove all boundary artifacts
- To smooth the boundaries
Algorithms
So, let’s look at this example.
We have an original mask which looks like this. There is a hole inside with value $0$. Let's see what happens if we apply dilation to this mask.
All pixels which had value $0$ in our original image $I$ and had at least one neighbor with value $1$ in their 4-neighborhood turn into $1$ in the new mask. => We expand our original mask a little bit.
Now we apply erosion: for every pixel which has label $1$ and at least one neighbor with label $0$ (we also consider the image border to have label $0$), we turn this pixel into $0$. All the pixels at the border of the mask turn into $0$, because each has at least one neighbor with label $0$. But the pixel in the middle does not turn back into $0$: it was $0$ at the very beginning, but after the dilation all its neighbors became $1$, so it can no longer turn into $0$.
So after applying dilation and then erosion, we get almost the same mask, except for the one noisy pixel in the middle, which these operations turned into $1$.
Let’s see what happens if we look at the opposite situation.
First we apply erosion: all pixels which have at least one neighbor with label $0$ are removed, including the one close to the border. After this, we only have two pixels left with label $1$. Then we apply dilation. The result shows that, compared to the original mask, the part at the bottom that we did not need has been removed.
In this process, we also lost some pixels, and for such a small mask this is a significant loss. But you can imagine that in real life our masks are much larger, so losing some pixels at the borders is not that critical in comparison to removing the noise and the leaks which we do not want to have.
5 times dilation and then 5 times erosion => we add all the vessels into our lung field mask, and the borders of the mask get smoothed; some inhomogeneity of the border disappears.
So we not only included the vessels into our mask, which is what we wanted from the beginning, but we also got an opportunity to remove noise at the borders.
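A minimal sketch with SciPy of the "dilate 5 times, then erode 5 times" trick (morphological closing), which fills interior holes such as vessels:

```python
import numpy as np
from scipy import ndimage

# Toy lung mask with a small interior hole standing in for a vessel.
mask = np.zeros((20, 20), dtype=bool)
mask[4:16, 4:16] = True
mask[9:11, 9:11] = False                 # the "vessel" hole

# 5 dilations followed by 5 erosions fill the hole and smooth the border.
# Note: pixels outside the image count as background, so a mask touching
# the image border may shrink slightly, as discussed above.
closed = ndimage.binary_erosion(
    ndimage.binary_dilation(mask, iterations=5),
    iterations=5,
)
# SciPy also provides this as a single call:
closed_too = ndimage.binary_closing(mask, iterations=5)
```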