
Sunday, September 29, 2013

Neural Network

When we were young, we were taught the basics, whether in arithmetic or in plain understanding of the objects around us. An apple is red, while a banana is yellow. One plus one equals two, and two plus two equals four. For most people, these ideas are deeply learned only after a number of examples and trials. Although some (or very few) are quick learners, we all need a set of examples before our brain can finally interpret a given idea. How the nervous system accomplishes this has been a topic of debate in neuroscience since the nineteenth century.

This type of learning was modeled in 1943 by McCulloch and Pitts, who described how a neuron responds to, collects, and passes information in the central nervous system. This became the first artificial neural network. An illustration of a neuron is shown in Figure 1.

Figure 1. Artificial Model of a Neuron

The basic idea is that a number of inputs (x1, x2, x3) are connected to a neuron via edges, each of which has a corresponding weight (w1, w2, w3). The neuron collects the weighted inputs, sums them up, and passes the sum to the activation function, g. The weights are initialized and a desired output is set; this begins the learning process of the network. Based on the deviation of the network's output from the desired output, the weights are adjusted until the deviations are minimal and the network is able to classify the inputs. Determining the weights is thus an iterative process that requires prior knowledge of the output. In this activity, we will also try to determine the effect of varying the number of iterations and the learning rate of the network.
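As a concrete (and hedged) illustration of this loop, here is a minimal single-neuron sketch in Python/NumPy rather than Scilab; the OR task, learning rate, and iteration count are illustrative choices of mine, not the values used in the activity.

```python
import numpy as np

def sigmoid(s):
    # activation function g
    return 1.0 / (1.0 + np.exp(-s))

# Training data: learn the logical OR of two inputs; column 0 is a bias input.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], float)
d = np.array([0.0, 1.0, 1.0, 1.0])       # desired outputs

rng = np.random.default_rng(0)
w = rng.uniform(-0.5, 0.5, 3)            # weight initialization (w1, w2, w3)
lr = 1.0                                 # learning rate

for _ in range(5000):                    # iterations (training cycles)
    y = sigmoid(X @ w)                   # weighted sum passed through g
    err = d - y                          # deviation from the desired output
    w += lr * X.T @ (err * y * (1 - y))  # adjust weights to shrink the deviation

print(np.round(sigmoid(X @ w)))          # -> [0. 1. 1. 1.]
```

After enough cycles the rounded outputs match the desired outputs, which is exactly the "deviation becomes minimal" condition described above.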

In the last activity, we worked on pattern recognition, which relies on a set of features that distinguish the different classes to be segregated. That process is an example of linear discriminant analysis. In that activity, I classified five different types of Philippine peso coins: the 5-centavo, 10-centavo, 25-centavo, 1-peso, and 5-peso coins. Each coin has a different area and color. However, some of them have relatively similar color gradients, especially when other factors such as fading and scratches are present.

Unlike pattern recognition, where the program performs the same method on all input subjects, a neural network learns the pattern and then uses it to classify the next input. A network of many connected neurons makes up a neural network. It typically consists of an input layer, a hidden layer, and an output layer, as shown in Figure 2.

Figure 2. Neural Network
The input layer receives the features, say size and color, which are then fed to the hidden layer, where they are acted upon and passed on to the output layer. In supervised learning, a desired output is set and error back-propagation is performed: the difference between the output layer and the desired output is computed, and together with the activation functions in each layer, the error derivatives with respect to the weights are computed until an epoch is attained. An epoch is the point at which all the data sets have been entered into the network; it is then that the modification of the weights is performed.
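The input-hidden-output structure and the back-propagation loop described above can be sketched as follows. This is a NumPy toy, not the Scilab ANN toolbox code used in the activity; the XOR task, layer sizes, learning rate, and cycle count are my own illustrative choices.

```python
import numpy as np

def g(s):
    # sigmoid activation, used in both layers
    return 1.0 / (1.0 + np.exp(-s))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
d = np.array([[0], [1], [1], [0]], float)           # desired output: XOR

rng = np.random.default_rng(1)
Xb = np.hstack([X, np.ones((4, 1))])                # append a bias input
W1 = rng.uniform(-1, 1, (3, 4))                     # input -> hidden (4 units)
W2 = rng.uniform(-1, 1, (5, 1))                     # hidden (+bias) -> output

def forward(W1, W2):
    h = g(Xb @ W1)                                  # hidden layer activations
    hb = np.hstack([h, np.ones((4, 1))])            # bias for the output layer
    return h, hb, g(hb @ W2)

mse_start = np.mean((forward(W1, W2)[2] - d) ** 2)

lr = 0.8
for _ in range(30000):                              # training cycles (epochs)
    h, hb, y = forward(W1, W2)
    # error back-propagation: error derivatives w.r.t. the weights
    delta2 = (d - y) * y * (1 - y)
    delta1 = (delta2 @ W2[:-1].T) * h * (1 - h)
    W2 += lr * hb.T @ delta2
    W1 += lr * Xb.T @ delta1

y = forward(W1, W2)[2]
print(mse_start, np.mean((y - d) ** 2))             # mean squared error, before/after
```

The weight update at each epoch uses the derivative of the error with respect to each weight, propagated backward from the output layer through the activation functions, which is the procedure the paragraph above describes.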

The ANN (artificial neural network) toolbox present in Scilab was used in this activity. This toolbox is straightforward. I had thought about coding the error propagation myself, just like what we did in AP 156, but thanks to this toolbox, the activity was made relatively simpler and easier. I think the crucial thing here is to know and understand how a neural network works.

The image below is the plot of the predictions performed by my neural network program for a set of 50 training data points, 10 for each class.
Figure 3. Result of the algorithm for the test data, ten for each class. The red points correspond to the desired output and the blue points correspond to the output of the network.

Notice that the result of the program is most accurate in classifying the 5 peso coin. 

The ANN toolbox gives the user the luxury of changing parameters such as the training cycle and the learning rate. I tried to determine the effect of these two parameters, and I got the following plot.
Figure 4. % Error as a function of increasing training cycle
The y-axis corresponds to the percent deviation of the program's result for a single class. In this study, I only used the 5-centavo class. As you increase the number of training cycles, the result of the ANN becomes more accurate, which is intuitive since you feed your neuron more and more examples. Isn't it very similar to the human brain? :) The more we are exposed to a certain thing, the more acquainted and familiar we become with it.

Now, how about the learning rate? Is it intuitive that the higher the learning rate, the more accurate the result should be? Well, apparently, that is not always the case, as suggested by the following plot.

Figure 5. % error for increasing values of learning rate [0.5 10]

There is a certain range where increasing the learning rate decreases the accuracy of the result, up to about 4. Beyond this, increasing the learning rate makes the result more accurate and closer to the correct values. This result is quite interesting! :)


And that's it! We're down to the last activity. Now, we'll proceed to the project. My project is about the topography of a granular collapse, which is my current research topic. I'm glad Dr. Soriano has allowed us to do a project related to our own research. So yeah, lucky me!

For this activity though, I give myself a grade of 11/10 for investigating the effects of different learning rate and training cycle. :) 


Tuesday, September 24, 2013

Pattern recognition

And we're back to blogging! The last topic of this blog was image compression using PCA. This time, we will be discussing pattern recognition. We have tried a number of image segmentation techniques in the past. Pattern recognition is a bit similar to these, only this time we need some training sets before the program can detect the pattern. For this activity, I used Philippine coins as subjects, particularly the 5-peso, 1-peso, 25-centavo, 10-centavo, and 5-centavo coins. There are 5 classes to be distinguished. For each class, I used ten training sets. Representative images of these sets are presented in Figure 1.

 Figure 1. Training sets arranged according to different classes

From the images above, one can immediately think that size can be a distinguishing feature. However, there is a possibility that the 10-centavo coin could be near the sizes of the 25- and 5-centavo coins. Thus, we need another distinguishing feature. Since all classes are circular, eccentricity cannot be used, so I'm left with color as an alternative feature. The problem with this is that the 10- and 5-centavo coins, as well as the 5-peso and 25-centavo coins, are close in color. Moreover, newly released coins from the bank are relatively shinier and more colorful than old ones. Regardless of these potential problems, I still tried using these features to classify my objects.

I applied the following steps to each of my training sets for every class.
1. Convert the image to binary.
2. Apply morphological operations to remove unwanted blobs and isolate the main blob (that of the coin).
3. Filter the resulting image by size, until only the coin remains.
4. Obtain the size of the blob.
5. Compute the mean and standard deviation of the sizes for each class.
The results of these steps for each class are shown in Figure 2.

Figure 2. Blobs of the coins 
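The numbered steps above can be sketched on a synthetic "coin" (a bright disc on a dark background). This is a NumPy illustration, not the Scilab code from the activity: the threshold value, image size, and colors are made up, and the morphological cleanup (steps 2-3) is skipped because the synthetic blob is already clean.

```python
import numpy as np

def coin_features(rgb):
    """Return (area, mean R, mean G, mean B) of the coin blob."""
    gray = rgb.mean(axis=2)
    mask = gray > 0.5                  # step 1: binarize (illustrative threshold)
    # steps 2-3 (morphological cleanup, size filtering) are unnecessary here
    # because the synthetic image contains a single clean blob
    area = int(mask.sum())             # step 4: blob size in pixels
    mean_rgb = rgb[mask].mean(axis=0)  # color feature: mean normalized RGB
    return area, *mean_rgb

# Build a 200x200 test image with a disc of radius 50 at the center.
yy, xx = np.mgrid[:200, :200]
disc = (yy - 100) ** 2 + (xx - 100) ** 2 <= 50 ** 2
img = np.zeros((200, 200, 3))
img[disc] = [0.8, 0.7, 0.3]            # a yellowish "coin"

area, r, g, b = coin_features(img)
print(area)                            # close to pi * 50^2, about 7854
print(r, g, b)
```

The measured area and mean channel values are exactly the kind of per-coin feature vector used below for classification.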
For the color feature, I computed the mean of the normalized pixel values of the red, green, and blue channels. I could choose any of the R, G, and B channels, plus the size feature, giving at most 4 features to consider. I only used three (size, G, and B) so that I could fully visualize the spread of the data in a 3-dimensional plot. Examples of my 3-dimensional scatter plots of the data are shown below:
Figure 3. 3D scatter plot of all classes. Red, blue, black, green, and dark blue represent the 0.05, 0.10, 0.25, 1.00, and 5.00 classes, respectively.

It can be observed from the plot that the 0.05 class (red points) is relatively more scattered than the rest. This is due to differences in size: sometimes, thresholding and morphological operations are not enough to isolate the coin alone because of differences in light exposure. When I captured images of the individual coins, I made sure that the tripod was fixed and that no shadow was cast in the image, since a shadow would introduce a bias in the measured size of the coin. To accomplish this, I made sure that all direct light sources were blocked and that the majority of the light illuminating the coin came from secondary sources. I also thought of capturing all the coins in a single image and just cropping them one by one, but I realized this would be more tedious, especially since I have 5 classes with 10 training samples each, for a total of 50 samples.

Sure enough, the size of the coins gives the most accurate result. There are more intersections between classes in the RGB feature space. A 2D image of the feature space is shown below.

Figure 4. 2D scatter plot of features size vs blue, green and red channels.

Based on the plot, the sizes and colors of the 5- and 10-centavo coins are relatively near each other. Only the 1-peso coin has a color relatively different from the rest of the classes, except in the green channel.

After getting the scatter plot of the data and computing the mean values of the training sets, I performed a random test of my program to check whether it is effective enough to classify a coin according to its class. Of the 52 test sets I have, 47 were classified correctly. The misclassified coins were due to the darkness or brightness of the image: the isolation of the coin itself is where the problem lies, so the size feature was miscalculated in these cases. Moreover, some of the objects are of relatively the same color, making it difficult for the program to distinguish the classes.
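The post does not spell out the decision rule, so the sketch below assumes a common choice for this kind of training-mean-based setup: assign a test coin to the class whose mean feature vector is nearest. The class labels follow the post, but the numeric means are made-up illustrations, not my measured training values.

```python
import numpy as np

# Illustrative (size, G, B) class means -- NOT the measured training means.
class_means = {
    "5c":  np.array([1200, 0.55, 0.40]),
    "10c": np.array([1500, 0.52, 0.42]),
    "25c": np.array([2000, 0.50, 0.35]),
    "1p":  np.array([3000, 0.60, 0.55]),
    "5p":  np.array([4000, 0.48, 0.30]),
}

def classify(features):
    """Assign a feature vector to the class with the nearest mean."""
    return min(class_means,
               key=lambda c: np.linalg.norm(features - class_means[c]))

print(classify(np.array([2950, 0.58, 0.53])))   # -> 1p
```

Note that with raw, unscaled features the size term dominates the Euclidean distance, consistent with size being the most discriminative feature here; in a more careful setup the features would be standardized first.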

To improve this experiment, it would be better to capture all the coins at once, so that the lighting and camera settings are guaranteed to be the same for all, removing the bias that could be imposed on the images.


For this activity, I give myself a grade of 10/10 for being able to perform pattern recognition. It took me quite a long time to decide what objects to use. At first I was thinking of using leaves and flowers, but I didn't have enough images for the training and test sets.

Thursday, September 5, 2013

Principal Component Analysis as a tool for Image Compression

It's blogging time again! :) Now that we have discussed image file formats and Fourier analysis, we are ready to tackle a more advanced method closely related to Fourier transforms. We have learned from past activities that the Fourier transform is essential not only in the world of signal processing, but also in image processing.

Now, let's tackle PCA, or Principal Component Analysis. Basically, PCA to me is like a sister of the Fourier transform: both are orthogonal transforms that express a signal as a linear superposition of subsignals or components. Suppose a signal is represented in xy-coordinates as shown in Figure 1. The information is represented by 2-dimensional coordinates, and the signal follows a straight line. If I introduce another coordinate system, say ij-coordinates, defined by a rotation with respect to the x-axis, I can compress the information into 1 dimension, since the signal has zero values along the j-coordinate.
Figure 1. PCA: Decomposing the number of representation of a certain information

Now, extending this analysis to an N-dimensional space, information can be expressed in a smaller number of components or dimensions through a series of rotations or changes of coordinates. This is what PCA does: it finds the principal components of a given signal.
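This rotation idea can be checked numerically: points lying on a line in xy-coordinates need only one coordinate after rotating into the principal axes. A small NumPy sketch (the line y = 2x and the number of points are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
pts = np.column_stack([x, 2 * x])        # a signal lying along the line y = 2x

centered = pts - pts.mean(axis=0)
cov = centered.T @ centered / len(pts)   # covariance matrix of the data
eigvals, eigvecs = np.linalg.eigh(cov)   # principal axes (ascending eigenvalues)

proj = centered @ eigvecs                # rotate into the new ij-coordinates
print(eigvals)                           # one eigenvalue is numerically zero
print(np.abs(proj[:, 0]).max())          # the minor coordinate vanishes
```

All the variance lives in one rotated coordinate, so the second coordinate carries no information and can be dropped, which is exactly the compression described above.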

I find it really helpful that the AP187 and AP186 classes are in sync. This topic was already discussed in AP187, so it was easier for me to understand the concepts when it came to compressing images.
Essentially, we can think of an image as a signal, only 2-dimensional. According to Dr. Soriano, an image of a face needs a minimum of about 100 pixels for its features to be comprehensible or distinguishable. Each block of pixels (say 10x10) stores a large array of information representing part of the whole image. Using PCA, we can reduce this with minimal loss in the resolution or quality of the image. This is how JPEG images are compressed, which is why JPEG is an example of lossy compression. PCA uses weighted basis eigenfunctions (or in this case, eigenimages) to reconstruct the original image.

Let's now try to reconstruct an image using PCA. I just got back from Cagayan de Oro last weekend, where I had a one-of-a-kind adventure with my high school friends. The picture of us below was taken at a coffee shop while we were chilling out in the city proper.

Figure 2. Image to be reconstructed

My original image is 336x448 pixels, which I reduced to 330x440 so that it divides evenly into 10x10 blocks. The first step in reconstructing the image is dividing it into 10x10-pixel blocks, creating a total of 1452 sub-blocks. Each block is then flattened into a single column, giving 1452 sets of 100x1 values. We call this matrix x. We can now decompose x using pca() to get the principal axes (eigenvectors) and the principal components. These are stored in the second (facpr) and third (comprinc) matrices, respectively, returned by pca(). The comprinc matrix stores the principal component scores themselves, so there is no need to normalize anything: we can just take the dot product of these scores and the eigenvectors given by facpr. The result gives us the reconstructed image information. After getting this, we are ready to assign the information back to its 330x440-pixel matrix form.

Since the goal of PCA here is to compress the image by decreasing the number of dimensions in which the information is stored, we can use just enough principal components to reconstruct the image to 99% accuracy instead of using all of them. This number can be obtained from the cumulative sum of the second row of the matrix lambda returned by pca(). My sample picture needs the first M = 16 principal components to reconstruct the image with at least 99% of the information.
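As a hedged sketch of this whole procedure, here is a NumPy version (not Scilab's pca()) run on a synthetic 330x440 image instead of my photo; the block size and the 99% criterion match the post.

```python
import numpy as np

h, w, b = 330, 440, 10
yy, xx = np.mgrid[:h, :w]
img = np.sin(xx / 25.0) * np.cos(yy / 40.0)          # synthetic stand-in "photo"

# Cut into 10x10 blocks and flatten each into a 100-vector (1452 x 100).
blocks = img.reshape(h // b, b, w // b, b).swapaxes(1, 2).reshape(-1, b * b)

mean = blocks.mean(axis=0)
centered = blocks - mean
eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending

# Smallest M whose cumulative eigenvalue sum reaches 99% of the total.
M = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.99) + 1)

scores = centered @ eigvecs[:, :M]                   # coefficients per block
recon_blocks = scores @ eigvecs[:, :M].T + mean      # back to 100-vectors
recon = recon_blocks.reshape(h // b, w // b, b, b).swapaxes(1, 2).reshape(h, w)

print(M, np.abs(recon - img).max())
```

By construction, keeping 99% of the eigenvalue sum bounds the total squared reconstruction error at 1% of the blocks' total variance, which is the sense in which the image is reconstructed "with at least 99% of the information."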

 Figure 3. Reconstructed grayscale images with varying number of components (M) used

Notice from the figure above that the reconstruction appears successful even using only the first principal component. As M increases, more principal components are used. At M = 16, we were able to reconstruct the image without any visible sign of information loss.

Another method I used relies only on the eigenvectors provided by the facpr matrix. I calculated the coefficients by taking the dot product of the signals with facpr, the same way we took the dot product of the principal components with the desired signal to get the coefficients back in our Applied Physics 187 activity. I then took the superposition of the principal components multiplied by their corresponding coefficients. The result is the reconstructed image information.

I also wondered how the pictures in Figure 3 would look if I reconstructed them in color. I tried reconstructing each channel (R, G, and B) of the original image separately.




Figure 4. Reconstructed RGB images using comprinc at different numbers of principal components

You can hardly see any difference among the reconstructions for M = 16, 50, and 100. This is why PCA is a very effective way of compressing images.


Figure 5. Comparison of reconstructed images using different methods

Figure 5 shows a comparison of the reconstructed images: using both comprinc and facpr versus using facpr only. There is no apparent difference between the two.


For this activity, I give myself a grade of 11/10 for being able to reconstruct the images in two ways.

Wednesday, September 4, 2013

Playing musical notes with Scilab

Hey! Are you fond of music? If yes, then you made the right choice in viewing this blog. :) In case you're not so familiar with musical notes, you can search Google for the different notations available. Tables of the correct frequency for each pitch are available on the web. You can visit these websites [1] [2] [3] to start learning. Be sure to come back to this page and be fascinated with how powerful image processing is! :)

For the past weeks, I've been blogging about different image processing techniques we have learned in our Applied Physics 186 class. We learned about image enhancement using both histogram and Fourier analysis, morphological filtering, and of course, (color) image segmentation. Now that we are equipped with enough knowledge and experience on these different techniques, it's time to integrate them all in a more challenging activity: playing musical notes.

Given a musical score found on the web, the challenge is to play the notes using Scilab. We have to use all the image processing techniques we have learned to segment the image and let the computer read the notes dictated by the processed image. For this activity, I chose the traditional version of Auld Lang Syne. I've always been fascinated by this music. I wanted to play Pachelbel's Canon, but I figured it's better to start with something simpler.
Figure 1. Auld Lang Syne notes

I started by cropping the image, keeping just the notes and disregarding the lyrics at the bottom. Using imcomplement(), I obtained the inverse of the image so that the notes are valued 1 and the background 0. My first instinct for removing the unwanted staff lines was to apply a Fourier transform and create a mask, but I figured out that applying morphological operations is much easier. I created a vertical structuring element 3 pixels long and applied an opening operation to the image to remove the single-pixel horizontal lines. The result is shown below:
Figure 2. Result after cropping, inverting the image and applying the opening operation with a vertical structuring element
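The opening step can be sketched in plain NumPy on a tiny synthetic "score" (the activity used Scilab's morphology tools; the image here is made up): eroding and then dilating with a 3-pixel vertical structuring element erases horizontal lines that are one pixel thick, while thicker note heads survive.

```python
import numpy as np

def erode_v3(img):
    """Erosion with a vertical 3x1 structuring element."""
    p = np.pad(img, ((1, 1), (0, 0)))
    return p[:-2] & p[1:-1] & p[2:]

def dilate_v3(img):
    """Dilation with a vertical 3x1 structuring element."""
    p = np.pad(img, ((1, 1), (0, 0)))
    return p[:-2] | p[1:-1] | p[2:]

def open_v3(img):
    """Morphological opening = erosion followed by dilation."""
    return dilate_v3(erode_v3(img))

# Synthetic image: a 1-pixel staff line across row 5 and a 4x4 "note head".
img = np.zeros((12, 20), dtype=bool)
img[5, :] = True          # staff line
img[3:7, 8:12] = True     # note head overlapping the line

opened = open_v3(img)
print(opened[5, 0])       # staff-line pixel away from the note: removed
print(opened[4, 9])       # note-head pixel: kept
```

The erosion deletes any pixel without both vertical neighbors set (killing the 1-pixel line), and the dilation restores the note head to its original extent.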


To determine the correct notes, I also isolated the circular blobs found at the bottom of each note and determined the centroid of each using AnalyzeBlobs(). I also cropped the notes from each row and aligned them all into a single row so that the computer reads the notes from left to right. The image is shown in the following figure.

Figure 3. Notes arranged from left to right in the correct order

I can now assign the range of possible vertical positions of the centroid for each particular note. The challenge in this method lies in the whole and half notes, which do not form solid blobs in the image. I needed to isolate these non-quarter notes, which I did by measuring the blob sizes and filtering them using FilterBySize().

My next problem lies in the dots beside the notes, which extend a note's duration. To detect them, I first found all the blobs and computed their corresponding areas. The histogram was then plotted to check the distribution of blob sizes.
Figure 4. Size Histogram of the isolated blobs

Blobs below 20 square pixels correspond to the dots, while those greater than 150 square pixels are likely eighth notes. Those in the 100-120 square-pixel range correspond to the quarter notes. The rest of the blobs correspond to the half notes.

Having determined this, the code is ready to read the musical keys and play "Auld Lang Syne".
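As a hedged sketch of the playing step (in Python/NumPy rather than Scilab's sound(); the note list and tempo below are illustrative, not the output of my program): each detected note maps to its standard equal-temperament frequency, and a sine wave of that frequency is synthesized for the note's duration.

```python
import numpy as np

# Standard equal-temperament frequencies (Hz) for one octave.
FREQ = {"C4": 261.63, "D4": 293.66, "E4": 329.63, "F4": 349.23,
        "G4": 392.00, "A4": 440.00, "B4": 493.88, "C5": 523.25}

FS = 44100  # samples per second

def tone(note, duration):
    """Sine-wave samples for one note; a dotted note just gets 1.5x duration."""
    n = round(FS * duration)
    t = np.arange(n) / FS
    return np.sin(2 * np.pi * FREQ[note] * t)

# An illustrative five-note phrase (quarter note = 0.4 s); the actual
# sequence comes from the centroids and blob sizes detected above.
melody = [("C4", 0.4), ("F4", 0.6), ("F4", 0.2), ("F4", 0.4), ("A4", 0.4)]
samples = np.concatenate([tone(n, d) for n, d in melody])
print(len(samples) / FS)   # total duration in seconds -> 2.0
```

The resulting sample array is what would be handed to the audio playback routine; in Scilab that is sound(samples, FS).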


As you may well observe, it follows the traditional Auld Lang Syne. For this activity, I give myself a grade of 10/10 for being able to play musical notes using Scilab.