
Sunday, September 29, 2013

Neural Network

When we were young, we were taught the basics. It could be either in arithmetic or just plain understanding of the objects around us. Apple is red, while banana is yellow. One plus one equals two, and two plus two equals four. These understandings, for most people, are deeply learned after a number of examples and trials. Although some (or very few) are quick in learning, we all need a set of examples before our brain can finally interpret a given idea. The question of how the nervous system works has been a topic of debate in neuroscience since the nineteenth century.

This type of learning was modeled in 1943 by McCulloch and Pitts, particularly how a neuron responds to, collects, and passes information in the central nervous system. This became the first artificial neural network. An illustration of a neuron is shown in Figure 1.

Figure 1. Artificial Model of a Neuron

The basic idea is that a number of inputs (x1, x2, x3) are connected to a neuron via edges, each of which has a corresponding weight (w1, w2, w3). The neuron collects the weighted inputs from the other neurons, sums them up, and passes the sum to the activation function g. The weights are first initialized and a desired output is set; this is the learning process of the network. Based on the deviation of the network's output from the desired output, the weights of the inputs are adjusted until the deviations are minimal and the network is able to classify the inputs. Determining the weights is thus an iterative process that requires prior knowledge of the desired output. In this activity, we will also try to determine the effect of varying the number of iterations and the learning rate of the network.
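To make the idea concrete, here is a minimal sketch of a single neuron with an error-driven weight update, written in Python/numpy rather than the Scilab ANN toolbox used later in this activity. The input values, desired output, learning rate, and number of cycles are made up for illustration.

```python
import numpy as np

def g(a):
    return 1.0 / (1.0 + np.exp(-a))        # sigmoid activation function

x = np.array([0.2, 0.7, 0.1])              # inputs x1, x2, x3 (hypothetical values)
w = np.random.uniform(-0.5, 0.5, size=3)   # weights w1, w2, w3, randomly initialized
desired = 1.0                               # desired output for this example
rate = 0.5                                  # learning rate

for _ in range(100):                        # training cycles (iterations)
    y = g(np.dot(w, x))                     # weighted sum passed to the activation function
    error = desired - y                     # deviation from the desired output
    w += rate * error * y * (1 - y) * x     # adjust the weights to reduce the deviation

print(y)    # approaches the desired output as the number of cycles increases
```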

In the last activity, we worked on pattern recognition, which relies on a set of features that distinguish the different classes to be segregated. This process is an example of linear discriminant analysis. In that activity, I classified five different types of Philippine peso coins: the 5-centavo, 10-centavo, 25-centavo, 1-peso, and 5-peso coins. Each coin has a different area and color. However, for some of them the colors are relatively close to one another, especially when other factors such as fading and scratches on the coins are present.

Unlike pattern recognition, where the program performs the same method on all input subjects, a neural network learns the pattern and then uses it to classify the next input. A connection of many neurons makes up a neural network. It typically consists of an input layer, a hidden layer, and an output layer, as shown in Figure 2.

Figure 2. Neural Network
The input layer receives the features, say size and color, which are then fed to the hidden layer, where they are acted upon and passed on to the output layer. In supervised learning, a desired output is set and error back-propagation is performed: the difference between the output layer and the desired output is computed and, together with the activation functions in each layer, the error derivatives with respect to the weights are computed until an epoch is attained. This is the point at which all the data sets have been fed to the network and the modification of the weights is performed.
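The back-propagation step can likewise be sketched for a tiny input-hidden-output network. This is again a numpy illustration, not the Scilab ANN toolbox call used in the activity; the feature values, layer sizes, and targets below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))          # 50 samples, 2 features (e.g. size, color)
T = (X[:, 0] + X[:, 1] > 1.0).reshape(-1, 1).astype(float)   # desired outputs

W1 = rng.uniform(-0.5, 0.5, size=(2, 4))     # input  -> hidden weights
W2 = rng.uniform(-0.5, 0.5, size=(4, 1))     # hidden -> output weights
rate = 0.5                                    # learning rate

def g(a):                                     # sigmoid activation in both layers
    return 1.0 / (1.0 + np.exp(-a))

for cycle in range(500):                      # training cycles; one full pass = one epoch
    H = g(X @ W1)                             # hidden layer activations
    Y = g(H @ W2)                             # network output
    dY = (Y - T) * Y * (1 - Y)                # error derivative at the output layer
    dH = (dY @ W2.T) * H * (1 - H)            # error propagated back to the hidden layer
    W2 -= rate * H.T @ dY                     # weights modified after each epoch
    W1 -= rate * X.T @ dH

print(np.mean((Y > 0.5) == T))                # fraction of samples classified correctly
```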

The ANN (artificial neural network) toolbox present in Scilab was used in this activity. This toolbox is straightforward. I had thought about coding the error propagation myself, just like what we did in AP 156, but thanks to this toolbox, the activity was made relatively simpler and easier. I think the crucial thing here is to know and understand how a neural network works.

The image below is the plot of the predictions made by my neural network program for a set of 50 training data points, 10 for each class.
Figure 3. Result of the algorithm for the test data, ten for each class. The red points correspond to the desired output and the blue points correspond to the output of the network.

Notice that the result of the program is most accurate in classifying the 5 peso coin. 

The ANN toolbox gives the user the luxury of changing parameters such as the number of training cycles and the learning rate. I tried to determine the effect of these two parameters, and I got the following plot.
Figure 4. % Error as a function of increasing training cycle
The y-axis corresponds to the percent deviation of the result of the program for a single class. In this study, I only used the 5-centavo coin class. As you increase the number of training cycles, the result of the ANN becomes more accurate, which is intuitive since you feed your neuron with more and more examples. Isn't it very similar to the human brain? :) The more we are exposed to a certain thing, the more we get acquainted and familiarized with it.

Now, how about the learning rate? Is it intuitive that the higher the learning rate, the more accurate the result should be? Well, apparently, that is not always the case, as suggested by the following plot.

Figure 5. % error for increasing values of learning rate [0.5 10]

There is a certain range where increasing the learning rate decreases the accuracy of the result, and that is at about 4. Beyond this, increasing the learning rate makes the result more accurate and closer to the correct values. This result is quite interesting! :)


And that's it! We're down to the last activity. Now, we'll proceed to the project. My project is about the topography of a granular collapse, which is my current research topic. I'm glad Dr. Soriano has allowed us to do a project that has something to do with our own research. So yeah, lucky me!

For this activity though, I give myself a grade of 11/10 for investigating the effects of different learning rates and training cycles. :)


Tuesday, September 24, 2013

Pattern recognition

And we're back to blogging! The last topic of this blog was image compression using PCA. This time, we will be discussing pattern recognition. We have tried a number of image segmentation techniques in the past. Pattern recognition is a bit similar to these, only this time, we need some training sets before the program can detect the pattern. For this activity, I used Philippine coins as subjects, particularly the 5-peso, 1-peso, 25-centavo, 10-centavo, and 5-centavo coins. There are 5 classes to be distinguished. For each class, I used ten training images. Representative images of these sets are presented in Figure 1.

 Figure 1. Training sets arranged according to different classes

From the images above, one can immediately think that size can be a distinguishing feature. However, there is a possibility that the 10-centavo coin could be near the sizes of the 25- and 5-centavo coins. Thus, we need another distinguishing feature. Since all classes are circular, the eccentricity can no longer be used, so I'm left with color as an alternative feature. The problem with this is that the 10- and 5-centavo coins, and the 5-peso and 25-centavo coins, are quite similar in color. Moreover, if we consider the color of newly released coins from the bank, we observe that they are relatively shinier and more colorful than the old ones. Despite these potential problems, I still tried using these features to classify my objects.

I applied the following steps to each training image of every class.
1. Convert the image to binary.
2. Apply morphological operations to remove unwanted blobs and isolate the main blob (that of the coin).
3. Filter the resulting image by size until only the coin remains.
4. Obtain the size (pixel area) of the blob.
5. Compute the mean and standard deviation of the sizes for each class.
The results of these steps for each class are shown in Figure 2.

Figure 2. Blobs of the coins 
For the color feature, I just computed the mean of the normalized pixel values of the red, green, and blue channels. I could choose any of the R, G, and B channels, plus the size feature, giving at most 4 features to consider. I only used three (size, G, and B) to be able to fully visualize the spread of the data in a 3-dimensional plot. Examples of my 3-dimensional scatter plots of the data are shown below:
Figure 3. 3D scatter plot of all classes. Red, blue, black, green, and dark blue points represent 0.05, 0.10, 0.25, 1.00, and 5.00 respectively.
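Roughly, the per-coin feature extraction described above looks like the following sketch, written with scipy/numpy instead of the Scilab morphological and blob functions I actually used. The threshold value and the assumption that the coin is darker than the background are illustrative only.

```python
import numpy as np
from scipy import ndimage

def coin_features(img, threshold=0.5):
    # `img` is assumed to be an RGB coin image as a float array in [0, 1]
    gray = img.mean(axis=2)
    binary = gray < threshold                               # coin darker than background (assumption)
    binary = ndimage.binary_opening(binary, iterations=3)   # remove small unwanted blobs
    labels, n = ndimage.label(binary)
    sizes = ndimage.sum(binary, labels, range(1, n + 1))
    coin = labels == (np.argmax(sizes) + 1)                 # keep only the largest blob (the coin)
    area = coin.sum()                                       # size feature
    rgb = img[coin]                                         # pixels inside the coin
    I = rgb.sum(axis=1, keepdims=True)                      # per-pixel brightness R + G + B
    ncc = rgb / np.where(I == 0, 1, I)                      # normalized r, g, b values
    return area, ncc.mean(axis=0)                           # size plus mean normalized color
```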

It can be observed from the plot that the 0.05 class (red points) is relatively more scattered than the rest. This is due to the difference in size: sometimes the result of the thresholding and morphological operations is not enough to isolate the coin alone, because of differences in light exposure. When I captured an image of the individual coins, I made sure that the tripod was fixed and that no shadow was cast in the image, since a shadow would introduce a bias in the size of the coin. To accomplish this, I made sure that all direct light sources were blocked and that the majority of the light illuminating the coin came from secondary sources. I also thought of capturing all the coins in a single image and just cropping them one by one, but I realized that this is more tedious to do, especially since I have 5 classes with 10 training samples each, giving a total of 50 samples.

Sure enough, the size of the coins gives the most accurate result. There are more intersections between classes in the RGB feature space. A 2D image of the feature space is shown in the image below.

Figure 4. 2D scatter plot of features size vs blue, green and red channels.

Based on the plot, the size and the color of the 5- and 10-centavo coins are relatively near each other. Only the 1-peso coin has a color relatively different from the rest of the classes, except in the green channel.

After getting the scatter plot of the data and computing the mean values of the training sets, I performed a random test to check whether my program is effective enough to classify a coin according to its class. Of the 52 test images I have, 47 were classified correctly. The problem with the rest of the coins, which were misclassified, is the darkness/brightness of the image: the isolation of the coin itself is where the problem lies, so the size feature was miscalculated in these cases. Moreover, it so happened that the objects are of relatively the same color, making it difficult for the program to distinguish the classes.
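The post does not spell out the decision rule, so the sketch below assumes the simplest one: a minimum-distance classifier that assigns a test coin to the class whose mean feature vector (size, g, b) is nearest. All the numbers are hypothetical, and in practice each feature would be scaled so that the size does not dominate the distance.

```python
import numpy as np

class_means = {                 # hypothetical per-class means from the training sets
    "5 centavo":  np.array([2100, 0.35, 0.30]),
    "10 centavo": np.array([2600, 0.36, 0.29]),
    "25 centavo": np.array([3400, 0.38, 0.27]),
    "1 peso":     np.array([5200, 0.33, 0.33]),
    "5 peso":     np.array([6800, 0.37, 0.28]),
}

def classify(features):
    # features = (size, g, b) of the test coin, scaled the same way as the training data
    return min(class_means, key=lambda c: np.linalg.norm(features - class_means[c]))

print(classify(np.array([5150, 0.33, 0.32])))   # -> "1 peso" for this made-up point
```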

In order to improve this experiment, it would be better if the coins were captured all at once, so that the lighting and camera settings are guaranteed to be the same for all, thereby removing the bias that could be imposed on the images.


For this activity, I give myself a grade of 10/10 for being able to perform pattern recognition. It took me quite a long time to decide what objects to use. At first I was thinking of using leaves and flowers, but I didn't have enough images for the training and test sets.

Thursday, September 5, 2013

Principal Component Analysis as a tool for Image Compression

It's blogging time again! :) Now that we have discussed image file formats and Fourier analysis, we are ready to tackle a more advanced method that is closely related to the Fourier transform. We have learned from the past activities that the Fourier transform is very essential not only in the world of signal processing, but also in image processing.

Now, let's tackle PCA, or Principal Component Analysis. Basically, PCA to me is like a sister of the Fourier transform: both are orthogonal transforms that express a signal as a linear superposition of subsignals or components. Suppose a signal is represented in the xy-coordinates shown in Figure 1. The information is represented by 2-dimensional coordinates and the signal follows a straight line. If I introduce another coordinate system, say the ij-coordinates, defined by rotating the axes with respect to the x-coordinate so that one axis lies along the signal, I can compress the information into one dimension, since the signal has zero values along the j-coordinate.
Figure 1. PCA: Decomposing the number of representation of a certain information

Now, extending our analysis to an N-dimensional space, information can be expressed in a smaller number of components or dimensions through a number of rotations or changes of coordinates. This is what PCA does: it finds the principal components of a given signal.
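As a small illustration of this rotation idea, the numpy sketch below builds nearly collinear 2D data (made up for this example) and recovers the rotated coordinates from the eigenvectors of the covariance matrix; almost all of the variance ends up in the first coordinate.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
data = np.column_stack([x, 0.5 * x + 0.01 * rng.normal(size=200)])  # points nearly on a line

centered = data - data.mean(axis=0)
cov = np.cov(centered.T)                        # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)          # principal directions

order = np.argsort(eigvals)[::-1]               # largest variance first
rotated = centered @ eigvecs[:, order]          # coordinates along the principal axes

print(eigvals[order] / eigvals.sum())           # ~[1.0, 0.0]: one dimension suffices
```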

I find it really helpful that the AP187 and AP186 class are in sync. This topic has already been discussed in AP187, so it was easier for me to understand the concepts when it comes to compressing images.
Essentially, we can think of an image as a signal, only that it is 2-dimensional. According to Dr. Soriano, an image of a face needs a minimum of about 100 pixels for its features to be comprehended or distinguished. Each block of pixels (say 10x10) stores a large array of information that represents part of the whole image. Using PCA, we can reduce this information with minimal loss in the resolution or quality of the image. This is how JPEG images are compressed, which is why JPEG is an example of lossy compression. PCA uses weighted basis eigenfunctions (or in this case, eigenimages) to reconstruct the original image.

Let's now try to reconstruct an image using PCA. I just got back from Cagayan de Oro last weekend, where I had a one-of-a-kind adventure with my high school barkada. The picture of us below was taken at a coffee shop while we were chilling out around the city proper.

Figure 2. Image to be reconstructed

The size of my image is 336x448 pixels. The first step in reconstructing this image is dividing it into 10x10-pixel blocks (the image is trimmed to 330x440 pixels so that it divides evenly), creating a total of 1452 sub-blocks. Each block is then reshaped into a single column, giving 1452 sets of 100x1 values. We call this matrix x. We can now decompose x using pca() to get the principal components and the eigenvalues. The eigenvectors (principal factors) are stored in the second matrix returned, facpr, while the projections of the data onto them are stored in the third matrix, comprinc. Since comprinc already stores the weighted coefficients, there is no need to normalize it; we can just take its dot product with the eigenvectors given by facpr. The result gives us the reconstructed image information. After getting this, we are ready to reshape the information back into its 330x440-pixel matrix form.

Since the goal of PCA here is to compress the image by decreasing the information stored in a number of dimensions, we can just get the correct number of principal components needed to reconstruct the image up to 99% accuracy instead of using all principal components. This can be obtained by getting the cumulative sum of the second row of the matrix lambda that is returned by pca(). My sample picture needs the first M = 16 principal components to reconstruct the image with at least 99% of the information.
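Put together, the block-based reconstruction can be sketched as follows in numpy, with an eigendecomposition standing in for Scilab's pca(); the 10x10 block size and the 99% cutoff follow the description above, and the image sides are assumed to be multiples of 10.

```python
import numpy as np

def pca_reconstruct(gray, var_kept=0.99):
    h, w = gray.shape
    # cut the image into 10x10 blocks and flatten each block into a 100-element row
    blocks = (gray.reshape(h // 10, 10, w // 10, 10)
                  .transpose(0, 2, 1, 3)
                  .reshape(-1, 100))
    mean = blocks.mean(axis=0)
    centered = blocks - mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # keep the first M components carrying at least `var_kept` of the variance
    M = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), var_kept) + 1
    coeffs = centered @ eigvecs[:, :M]              # analogous to comprinc
    approx = coeffs @ eigvecs[:, :M].T + mean       # reconstructed blocks
    # reassemble the blocks into the full image
    image = (approx.reshape(h // 10, w // 10, 10, 10)
                   .transpose(0, 2, 1, 3)
                   .reshape(h, w))
    return image, M
```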

 Figure 3. Reconstructed grayscale images with varying number of components (M) used

Notice from the figure above that the reconstruction appears successful even when only the first principal component is used. As you increase M, the number of principal components used increases. At M = 16, we were able to reconstruct the image without any visible sign of information loss.

Another method I used relies only on the eigenvectors provided by the facpr matrix: I calculated the coefficients by taking the dot product of the signals with facpr, the same way I took the dot product of the principal components with the desired signal to get the coefficients of the principal components back in our Applied Physics 187 activity. I then took the superposition of the principal components multiplied by their corresponding coefficients. The result is the information of the reconstructed image.

I also wondered how the pictures in Figure 3 would look if I reconstructed them into their respective RGB equivalents, so I tried reconstructing each channel (R, G, and B) of the original image.




Figure 4. Reconstructed RGB image using comprinc as eigenvalues at different number of principal components

You can hardly see any difference between the reconstructions for M = 16, 50, and 100. This is why PCA is a very effective way of compressing images.


Figure 5. Comparison of reconstructed images using different methods

Figure 5 shows a comparison of the reconstructed images using comprinc together with facpr, and using facpr only. There is no apparent difference between the two.


For this activity, I give myself a grade of 11/10 for being able to reconstruct the images in two ways.

Wednesday, September 4, 2013

Playing musical notes with Scilab

Hey! Are you fond of music? If yes, then you made the right choice in viewing this blog. :) Now, in case you're not so familiar with musical notes, you can search Google for the different notations available. Tables of the correct frequency for each pitch are available on the web. You can visit these websites [1] [2] [3] to start learning. Be sure to come back to this page and be fascinated with how powerful image processing is! :)

For the past weeks, I've been blogging about different image processing techniques we have learned in our Applied Physics 186 class. We learned about image enhancement using both histogram and Fourier analysis, morphological filtering, and of course, (color) image segmentation. Now that we are equipped with enough knowledge and experience on these different techniques, it's time to integrate them all in a more challenging activity: playing musical notes.

Given a musical score found on the web, the challenge for us is to play the notes using Scilab. We have to use all the image processing techniques we have learned to segment the image and let the computer read the corresponding notes dictated by the processed image. For this activity, I chose the traditional version of Auld Lang Syne. I've always been fascinated by this piece. I wanted to play Pachelbel's Canon, but I figured it's better to start with something simpler.
Figure 1. Auld Lang Syne notes

I started out by cropping this image, keeping just the notes and disregarding the lyrics at the bottom. Using imcomplement(), I obtained the inverse of the image so that the notes are valued 1 and the background is valued 0. My first instinct for removing the unwanted lines (the staff) was to apply a Fourier transform and create a mask, but I figured that applying morphological operations is much easier. I created a vertical structuring element 3 pixels long and applied the opening operation to the image to remove the single-pixel-thick horizontal lines. The result is shown below:
Figure 2. Result after cropping, inverting the image and applying the opening operation with a vertical structuring element
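For reference, the staff-line removal can be sketched with scipy's morphology routines instead of the Scilab functions I used; `score` is assumed to be the inverted binary score image (notes = 1, background = 0).

```python
import numpy as np
from scipy import ndimage

def remove_staff_lines(score):
    selem = np.ones((3, 1), dtype=bool)          # vertical structuring element, 3 pixels tall
    # opening = erosion followed by dilation: 1-pixel-thick horizontal lines are eroded
    # away, while blobs at least 3 pixels tall (note heads, stems) survive and are restored
    return ndimage.binary_opening(score, structure=selem)
```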


In order to determine the correct notes, I also isolated the circular-shaped blobs found at the bottom of each note. I determined the centroid of each of these blobs using AnalyzeBlobs(). I also cropped the notes from each row and aligned them all into a single row so that the notes are read by the computer from left to right. The image is shown in the following figure.

Figure 3. Notes arranged from left to right in the correct order

This time, I can assign the range of possible locations of the vertical coordinate of the centroid for a particular note. The challenge in this method lies in the presence of whole and half notes, which do not form solid blobs in the image. I also need to isolate those that are not quarter notes. I isolated these notes by getting the correct sizes of the blobs and filtering them using FilterBySize().
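Turning the detected note heads into sound can be sketched as below. The row ranges and frequencies are hypothetical, since the mapping from the centroid's vertical position to a pitch depends on where the staff sits in my cropped image.

```python
import numpy as np

PITCH_BANDS = [                      # (row_min, row_max, frequency in Hz) - hypothetical
    (10, 14, 523.25),                # C5
    (15, 19, 493.88),                # B4
    (20, 24, 440.00),                # A4
    (25, 29, 392.00),                # G4
    (30, 34, 349.23),                # F4
]

def frequency_of(centroid_row):
    for lo, hi, freq in PITCH_BANDS:
        if lo <= centroid_row <= hi:
            return freq
    return 0.0                       # rest / unrecognized position

def tone(freq, duration, fs=8000):
    t = np.arange(int(fs * duration)) / fs
    return np.sin(2 * np.pi * freq * t)    # a vector of samples, like what Scilab's sound() plays

# notes as (centroid row, duration in seconds), read from left to right
notes = [(30, 0.5), (25, 0.5), (25, 1.0)]
waveform = np.concatenate([tone(frequency_of(r), d) for r, d in notes])
```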

Now my next problem lies in the dots beside the notes. To detect the additional time they indicate, I searched all the blobs first and computed their corresponding areas. The histogram is then plotted to check the distribution of the blob sizes.
Figure 4. Size Histogram of the isolated blobs

The ones below 20 square pixels correspond to the dots, while those greater than 150 square pixels are probably the eighth notes. Those that belong to the 100-120 square pixel range correspond to the quarter notes. The rest of the blobs correspond to the half notes.
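A small helper following those size ranges might look like this; the cutoffs are the ones quoted above and would need tuning for a different score.

```python
def note_type(area):
    if area < 20:
        return "dot"           # adds to the duration of the preceding note
    if area > 150:
        return "eighth note"
    if 100 <= area <= 120:
        return "quarter note"
    return "half note"
```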

Knowing this, the code is now ready to read the musical keys and play "Auld Lang Syne".


As you may well observe, it follows the traditional Auld Lang Syne. For this activity, I give myself a grade of 10/10 for being able to play musical notes using Scilab.



Tuesday, August 20, 2013

Application of Binary Operations 1: Blob Analysis

Here we go again. Time really flies. We're now down to our 11th activity! It seems like yesterday that I was exploring the syntax and functions used in Scilab. This time, we are more equipped with the knowledge and programming skills to deal with more challenging problems. Our topic this time is blob analysis, which integrates all the lessons we have learned in the past. In segmenting an image, we often find a suitable region of interest (ROI) and apply thresholding or color image segmentation. Sometimes, the image we have shows an overlap between the graylevel histograms of the ROI and the background [1]. In cases like this, the binarized image has to be further cleaned using morphological operations such as opening or closing to separate the region of interest from the unwanted parts that were segmented. Now, you may be wondering what the opening and closing operations are. If you have read my previous blog post on morphological operations, you must be knowledgeable about erosion and dilation. Opening is often done to preserve foreground regions that are similar to the chosen structuring element [2]. It is performed as an erosion followed by a dilation. Thus, opening is not as destructive as erosion, and is used if the main goal is to remove unwanted dirt in the segmented image.

The aim is to apply binary operations to analyze the blobs in a given cell sample. The sample cell image is shown in the figure below.

Figure 1. Normal cell sample (left) and the image subject to be analyzed (right)

The first task is to divide the cell sample image into a number of 256x256-pixel subimages, and to get from the histogram the correct threshold that would separate the cells from the background. I generated the histogram for the whole image using Scilab; this is shown in the next figure.

Figure 2. Histogram of the Cell Sample and the thresholded image

The image shows a bimodal histogram, which indicates that the greatest number of counts corresponds to the background. From this, we can extract the correct threshold to use if we aim to binarize the image and separate the regions of interest (the cells) from the gray background. For the whole image, I used 222 as the threshold. I also tried using 215 as the threshold value and compared the results with the first.

Figure 3. Result of thresholding the image at 222 and 215

Notice that the resulting thresholded images show some irregular shapes. Thus, we use morphological operations to "clean" the image so that what is left are regularly sized circular shapes. I used the opening operation since my aim here is to remove the dirt outside the regions of interest. The structuring element that I used is a circle with a radius of 11 pixels. The result of applying morphological operations and filtering by size is shown in the following image.

Figure 4. Filtered image using Morphological Operation (Opening)

The use of two different thresholds significantly affects the resulting image after applying the opening operator. Since we want to determine the average size of each cell and obtain a standard deviation that would give us the best estimate of the normal cell size, I obtained the pixel area of each blob in the image and plotted the histogram of all the areas. This is shown in the following figure.
Figure 5. Histogram of the area computed for each blob in the filtered image


In this histogram, we can obtain the typical pixel area of the blobs. Those greater than 600 correspond to the areas of overlapping cells. We can thus exclude these and filter by size using the correct area interval. Using the information from the above histogram, I was able to separate the normal cells that are not interconnected. From these, I calculated the best estimate of their size and obtained the same value for the 222 and 215 thresholds. The best estimate, the same for both, is 474.03 ± 50.31 square pixels. The pictures below are the resulting individual blobs for each threshold used.
Figure 6. Filtered individual blobs using the histogram of the area count
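The area measurement and the best-estimate computation can be sketched with scipy as follows. The lower bound of the accepted area range is an assumption, since only the 600-square-pixel upper cutoff is quoted above; `binary` is assumed to be the opened (cleaned) binary subimage.

```python
import numpy as np
from scipy import ndimage

def cell_size_estimate(binary, area_range=(400, 600)):
    labels, n = ndimage.label(binary)                        # tag each connected blob
    areas = ndimage.sum(binary, labels, range(1, n + 1))     # pixel area of every blob
    lo, hi = area_range                                       # exclude overlapping cells
    isolated = areas[(areas >= lo) & (areas <= hi)]
    return isolated.mean(), isolated.std()                    # best estimate and its spread

# mean, std = cell_size_estimate(binary)    # e.g. ~474 ± 50 square pixels in this activity
```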

We can now use the best estimate we obtained to separate the abnormally large cancer cells in the next image.
Figure 7. Applying the best estimate for a normal sized cell

Finally, using a larger structuring element and inspecting the histogram of the area counts of all connected blobs, I was able to separate the abnormally sized cells. The image is shown below:
Figure 8. Abnormally large cells separated from the rest of the normal cells


In this activity, I give myself a grade of 10/10 for doing all that is required and for being able to determine the effect of changing the threshold and the radius of the structuring element.





[1] Maricor Soriano. Activity 11 Manual: Application of Binary Operations 1: Blob Analysis


Sunday, August 11, 2013

Morphological Operations

I must say that this activity is quite engaging since it actually requires us to draw by hand. To give you the gist, we were assigned to predict and observe the output of performing morphological operations on certain images, both by hand and numerically. These two methods are then compared, and the results of our imaginations (the hand-drawn images) are verified.

Morphological operations are often used as pre- or post-processing tools in image processing. They are applied to binary images and are used for thinning, filtering, or pruning. They are also used to get a representation of the shape of an object or region, such as boundaries, skeletons, convex hulls, and the like.

The two principal morphological operations are erosion and dilation. Erosion basically shrinks an object into smaller dimensions by eroding components at its boundaries. Dilation, on the other hand, performs the opposite: it expands the object by filling in holes and connecting disjoint parts. The extent to which the objects are shrunk or expanded depends on the structuring element used. Structuring elements can be of any shape.
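A quick way to see the two operations in action is the scipy sketch below, using a 5x5 square and a cross structuring element similar to the shapes used in this activity.

```python
import numpy as np
from scipy import ndimage

square = np.zeros((9, 9), dtype=bool)
square[2:7, 2:7] = True                              # a 5x5 square of ones

cross = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=bool)            # cross structuring element

eroded = ndimage.binary_erosion(square, structure=cross)    # shrinks the square
dilated = ndimage.binary_dilation(square, structure=cross)  # expands the square

print(eroded.sum(), dilated.sum())   # fewer, then more, pixels than the original 25
```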

Mathematically, the dilation of A by B is defined as [1]:

A ⊕ B = { z | (B̂)_z ∩ A ≠ ∅ }

where B is the structuring element and (B̂)_z is its reflection translated by z. It is illustrated by the following figure:

Figure 1. Dilation of A by B obtained from [1]
Meanwhile, erosion is mathematically defined by the following operation [1]:

A ⊖ B = { z | (B)_z ⊆ A }

where B is again the structuring element. The illustration of erosion is shown in the figure below.
Figure 2. Erosion of A by B obtained from [1]
Notice from Figures 1 and 2 that the effects of the structuring element B on A are to expand and to shrink it, respectively. Given different figures (A) and structuring elements (B), we are tasked to perform the morphological operations erosion and dilation. A picture of my hand-drawn results is shown below:




Figure 3. Hand-drawn result of the morphological operation erosion and dilation

Note that the blank areas (in the erosion part) correspond to the absence of the object itself. This means that it is possible for the object to be annihilated by the structuring element when performing erosion.

Let's now compare these with the results of the morphological operations performed using Scilab. The erosion and dilation results for the 5x5 square are shown in the following figure, where the structuring elements are the blue-colored ones.


Figure 4. Inverted result of the morphological operation performed on a 5x5 square and a cross using Scilab


Since in Scilab a value equal to 1 is white and 0 is black, the result shown in Figure 4 is inverted for easier matching with my hand-drawn predictions. Additional results are shown in Figure 5.

Figure 5. Inverted result of the morphological operation applied on a hollow square and a triangle using Scilab

Comparing my hand-drawn predictions to the actual results, we can observe a few mistakes. I apologize for those mistakes; I must have been really in a hurry when I did them. For this activity, I give myself a grade of 10/10 for doing all the required tasks.


[1] Maricor Soriano. Morphological Operations Activity Manual
[2] Morphological Operations on Binary images from http://users.utcluj.ro/~raluca/ip_2013/ipl_07e.pdf











Enhancement in the Frequency Domain

It's been so long since the last time I posted my results here. I figured it's rather more difficult to write about a certain topic that is long overdue because the idea and the hype about it are not that fresh anymore. Anyway, the idea here is to enhance an image using the frequency domain. If we are given an image with unwanted repetitive patterns and we want to remove them, the straightforward thing to do is to create filter masks that block the unwanted frequencies. The key points behind this come from the convolution theorem, and they are the following [1]:
1. The Fourier transform of a convolution of two functions in space is just the product of their Fourier transforms. That is, FT{f ∗ g} = FT{f} · FT{g}.
2. The convolution of a Dirac delta function and a particular function is the replication of that function at the location of the Dirac delta. That is, f(x) ∗ δ(x − x₀) = f(x − x₀).
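Both points are easy to check numerically. The numpy sketch below compares a circular convolution computed directly with the product of the FFTs, and shows a delta function shifting (replicating) a signal; the signals are made up.

```python
import numpy as np

f = np.random.default_rng(2).normal(size=64)
g = np.zeros(64)
g[:5] = 1.0                                          # an arbitrary second function

# point 1: inverse FFT of the product of the transforms equals the circular convolution
conv = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))
direct = np.array([np.sum(f * np.roll(g[::-1], k + 1)) for k in range(64)])
print(np.allclose(conv, direct))

# point 2: convolving with a delta at x0 = 10 replicates f at that location
delta = np.zeros(64)
delta[10] = 1.0
shifted = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(delta)))
print(np.allclose(shifted, np.roll(f, 10)))
```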

So to start with, I obtained the Fourier transform of an image with two points separated and symmetric about the y-axis, and the result is a sinusoid, as shown in the following figure.
Figure 1. FT of two dots (one pixel each) separated and symmetrical to the y-axis

Suppose we increase the number of pixels and create circular patterns of a certain radius r. If the Fourier transform is applied, we get the following results for different radii.
Figure 2. Increasing radius and their corresponding FT

As you increase the radius, the size of the Airy pattern formed decreases. The FT of the circle with radius 1 appears almost the same as that of Figure 1, only the shading shows some curvature at the sides. This is because the resulting image is the product of the Airy pattern and the sinusoid; the FT of the two dots is just the sinusoid alone.
As for the FT of two squares symmetric about the center, the result should be the product of a sinc function and the sinusoid. The increase in the width of the squares results in a decrease in the size of the pattern formed.
Figure 3. Increasing width of the square and their corresponding FT

Meanwhile, the same trend is observed when the squares are replaced with Gaussian patterns. The effect of increasing the variance is shown in Figure 4. As with the symmetric squares and circles, the FT of two symmetric Gaussians is a combination of the FT of the Gaussian and a sinusoid. Since the FT of a Gaussian is also a Gaussian with different parameters, the result shows another Gaussian pattern modulated by a sinusoid. An increase in the variance results in a larger Gaussian in the image, and consequently a smaller corresponding FT.

Figure 4. FT of two symmetric gaussian about the center with increasing variance


To evaluate point 2 of the convolution theorem, I created 10 dots placed randomly on an array of zeros. These represent the Dirac delta functions. I also created star patterns of different sizes. Then, I took the convolution of the two functions and obtained the following result.
Figure 5. top: pattern convolved with the randomly placed dirac delta and their (bottom) corresponding results

As stated in point 2 above, the pattern was replicated at the locations of all the Dirac delta functions. For the case of the last column in Figure 5, the pattern was too big, so the result is overlapping copies of the same pattern.

Now, let's go to the actual application of these concepts. Given an image of the craters of the moon from the NASA Lunar Orbiter, the goal is to remove the horizontal lines observed in the image. 

Figure 6. Image of the crater of the moon
The only instruction we were given was "Remove the horizontal lines in the image by filtering in the Fourier Domain", so this part of the activity is more challenging than the rest. The first thought that entered my mind when I saw the horizontal lines in the image was the Fourier transform of a number of dots symmetric about the center: repetitive horizontal lines show up as bright points along the vertical axis of the FT, so I masked out that axis (sparing the central frequencies) and took the inverse transform.
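A sketch of that filtering in numpy is shown below; `img` is assumed to be the grayscale crater image, and the mask widths are illustrative only.

```python
import numpy as np

def remove_horizontal_lines(img, half_width=1, keep_dc=3):
    F = np.fft.fftshift(np.fft.fft2(img))
    rows, cols = F.shape
    cy, cx = rows // 2, cols // 2
    mask = np.ones((rows, cols))
    mask[:, cx - half_width:cx + half_width + 1] = 0.0   # block the vertical frequency axis
    mask[cy - keep_dc:cy + keep_dc + 1,
         cx - keep_dc:cx + keep_dc + 1] = 1.0            # but keep the DC term (overall brightness)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```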



Figure 7. Enhanced image of the crater of the moon


Another task is to remove the noise in the following image of a house.

Figure 8. Subject image

In this image, no pattern can be easily observed. Thus, we have to look at its Fourier transform. I created a mask by thresholding the FT of the image and converting it to binary. The result is the following image.

Figure 9. Resulting enhanced image
I couldn't have done this last part without the help of Eric, who guided me in performing this image enhancement. Kudos to Eric for patiently helping me! :)

I give myself a grade of 10/10 in this activity for being able to do all the assigned tasks.



Saturday, August 10, 2013

Color Image Segmentation

Finally! The topic that I have been aching to learn since I started doing my research problem. Haha. The truth is, I've already tried learning this on my own since the summer of 2013. I had to present meaningful results to my advisers at the time, and I found out that color segmentation is one of the topics taught in AP186. I was actually ranting about why it has to be taken during the fifth year and not earlier. Well, that of course, is the selfish side of me talking. I knew there was a reason why it had to be taught in our last year.

My research is on the dynamics of granular collapse. The final configurations and the topology of the flow of granular particles are my main concerns. For that reason, I have to find the final position of all the particles resulting from the collapse of a granular column. In order for me to fully visualize and understand their dynamics, I had the grains color-coded and layered into three, both vertically and horizontally. An example of my raw data (compressed) is shown below:


Figure 1. Raw shots from my experiment on granular collapse
As can be observed from the pictures above, the red and yellow grains appear more prominent than the blue ones. My problem lies in the fact that the blue grains camouflage with the background. When I did the experiment, my initial plan was to make the background white or green. However, I have another experiment involving starch (colored white) which uses the same setup, so a white background was not a good option. My other experiment also involves the use of yellow-green grains, so a green background was not a good option either. I had to settle for black because it was the only color available at the time, too.

Anyway, enough with my rants and stories. Let's get back to the main topic of this blog post -- color segmentation. From the root word alone, segment, defined by thefreedictionary.com as "to divide or become divided into segments", image segmentation is the partitioning of a digital image into smaller regions and the separation of a certain region of interest (ROI). There are a number of processes by which one can segment an image. The simplest example is the thresholding method, where the desired regions of an image are characterized by a particular range of graylevel values. A cut-off graylevel value is chosen, and pixel values not belonging to the chosen range are set to 0, otherwise 1. Thus, thresholding converts a grayscale image to binary. This process is considered the simplest because it is very straightforward to perform, especially if the image involved has a bimodal histogram in which the foreground and the background can be easily separated.

If, however, we are faced with images that include shading variations (for example, 3D objects), it is best to perform other methods that can separate the pure chromaticity information from the brightness. In an RGB image, each pixel has a corresponding value for red, green, and blue. If we then let I be equal to the sum of the values R, G, and B, then the normalized chromaticity coordinates (NCC) are:

r = R / (R + G + B),  g = G / (R + G + B),  b = B / (R + G + B)

Thus, r + g + b = 1, so that the normalized blue coordinate would just be b = 1 - r - g.
We can therefore represent the chromaticity using only two coordinates, r and g. We note that the brightness information is stored in the I value. Since we are now dealing with the normalized chromaticity coordinates, we were able to reduce the color information from three dimensions (RGB) to two (r and g), with the brightness stored separately in I. The normalized chromaticity space is shown in the following figure, where the x-axis and the y-axis correspond to the r and g values, respectively.
Figure 2. Normalized chromaticity space
From the chromaticity space, a red-colored pixel is therefore at the lower right corner with values (1, 0), and the value (0, 0) corresponds to a pure blue pixel since b = 1 - 0 - 0 = 1. One can also notice that a white pixel appears at the coordinate (0.33, 0.33), where r, g, and b all have the same value.

We now proceed to the discussion of the different methods of segmentation based on color -- parametric and nonparametric. In performing these methods, one begins by choosing and cropping a region of interest (ROI) from the image. In parametric segmentation, the color histogram of the chosen ROI is normalized to obtain the probability distribution function of the color. This PDF then serves as the basis in determining whether a certain pixel in the image belongs to the region of interest. A joint probability p(r)p(g) for the red and green coordinates corresponds to the likelihood of a pixel belonging to the ROI, where p(r) is given by the following equation:

p(r) = 1 / (σ_r √(2π)) · exp( −(r − μ_r)² / (2σ_r²) )

In the above equation, we assume a Gaussian distribution independently along the r and g values. The mean μ_r and standard deviation σ_r are therefore calculated first from the chosen ROI, for both the red and green values; the same equation is computed for the green values. This method was performed on the image below and the result is displayed along with it.
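A compact sketch of the parametric segmentation in numpy is given below; `img` is the RGB image and `roi` the cropped patch, both assumed to be float arrays in [0, 1].

```python
import numpy as np

def ncc(rgb):
    I = rgb.sum(axis=-1, keepdims=True)            # per-pixel brightness R + G + B
    return rgb / np.where(I == 0, 1, I)            # normalized r, g, b

def parametric_segment(img, roi):
    roi_ncc, img_ncc = ncc(roi), ncc(img)
    out = np.ones(img.shape[:2])
    for ch in (0, 1):                              # r and g channels only
        mu = roi_ncc[..., ch].mean()
        sigma = roi_ncc[..., ch].std() + 1e-8
        x = img_ncc[..., ch]
        p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
        out *= p                                   # joint probability p(r)p(g)
    return out                                     # high where pixels match the ROI color
```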

Figure 3. A copy of the original image obtained from [2]

Figure 4. Parametric segmentation of the image from figure 3


Since the probability distribution function of the color depends on the chosen ROI, it is safe to assume that the precision of the segmentation depends on the number of pixels present in the ROI. If we increase the number of pixels in the ROI, there is a greater chance that a wider variety of shades will be present in the ROI and the standard deviation will be higher. Consequently, the probability of a random pixel belonging to the ROI would also increase. If, however, the chosen ROI is uniform even as the number of pixels is increased (that is, if there is no increase in the standard deviation), no difference should be observed. This is best explained by the following results.

Figure 5. Effect of reducing the number of pixels included in the ROI

Notice that the decrease in the ROI resulted in the loss of some of the segmented regions that appear when the bigger ROI is used. However, with the smaller ROI, parts of the concerned sections (the red-colored paint, bottom image) are not completely filled compared with the result from the bigger ROI.

Another method of segmenting a colored image is by obtaining the 2D histogram of the region of interest; from it, each pixel is tagged according to whether or not it belongs to the obtained histogram. The 2D histogram of the red ROI is shown below.
Figure 6. 2D histogram of the region of interest
We then employ the method of histogram backprojection, wherein each pixel location in the original image is given a value corresponding to its histogram value. Suppose a blue-colored pixel falls at (0, 0) in the chromaticity space; then its corresponding value would be 0, since at (0, 0) the 2D histogram of the red ROI is 0. The result of the nonparametric segmentation is shown in the following figure.
Figure 7. Nonparametric segmentation of the image from figure 3
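Histogram backprojection can be sketched in the same numpy style as the parametric case; the number of bins is a free choice, and `img` and `roi` are again assumed to be float RGB arrays in [0, 1].

```python
import numpy as np

def ncc(rgb):
    I = rgb.sum(axis=-1, keepdims=True)
    return rgb / np.where(I == 0, 1, I)

def backproject(img, roi, bins=32):
    roi_ncc, img_ncc = ncc(roi), ncc(img)
    # 2D histogram of the ROI over the (r, g) chromaticity plane
    hist, r_edges, g_edges = np.histogram2d(
        roi_ncc[..., 0].ravel(), roi_ncc[..., 1].ravel(),
        bins=bins, range=[[0, 1], [0, 1]])
    hist /= hist.max()                              # normalize to [0, 1]
    # look up each image pixel's (r, g) bin and assign the histogram value there
    r_idx = np.clip(np.digitize(img_ncc[..., 0], r_edges) - 1, 0, bins - 1)
    g_idx = np.clip(np.digitize(img_ncc[..., 1], g_edges) - 1, 0, bins - 1)
    return hist[r_idx, g_idx]                       # high where the pixel's color is in the ROI
```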

The effect of reducing the number of pixels of the ROI on the nonparametric segmentation is also shown in the next figure.

Figure 8. Effect of reducing the number of pixels included in the ROI for nonparametric (reduce patch size)

It is intuitive that if we decrease the patch size that we use as the ROI, some shades will also be removed from the histogram. This is apparent in Figure 8.

If we now compare the result of the nonparametric to the parametric segmentation, we can easily say that the nonparametric segmentation shows a more accurate result. 

In conclusion, I would like to say that the key in segmentation is to find the right size and area for the ROI. I've learned this because I have used different sizes of ROI in my research. An ROI that is too big would include other unnecessary sections, while a very small ROI would exclude parts that are supposed to be included.

For this activity, I give myself a grade of 12/10 for exploring the parameters that could further affect the result.


[1] Maricor Soriano, Color Image Segmentation Activity Manual
[2] Color Oil paints in opened paint buckets from http://www.flashcoo.com/cartoon/colorful_objects_and_designs/Colorful_oil_paints_Opened_paint_bucket.html on August 10, 2013