Friday, December 18, 2015

Mapping of school-level frequency of disaster occurrence and NAT Scores using Neural Networks

Our task in this activity is to play around with neural networks and use them for any fitting or pattern-recognition application. Since I work part-time as a research assistant in a university-based project on interdisciplinary research about the complexity of Philippine public schools, I have with me an almost complete dataset of public schools in the Philippines, together with their yearly counts of disaster experiences, evacuation-center usage, dropouts, graduations, and even NAT (National Achievement Test) scores. The main plan was to develop a model that describes the resilience of public schools to natural disasters. Intuitively, one would think that if a school always experiences flooding or any other type of disaster, the resulting class disruptions would lower the school's overall performance. This hypothesis, however, is not readily observed if one only looks at the national level: national-level data is noisy and shows practically zero variance.

My task in the project was to clean the data and look for correlations between variables in the dataset. For a correlation between variables to exist, it is important to first check the corresponding variances. While national-level analyses show no relationship between disaster experience and test scores, sub-national levels give a more meaningful result; at the national level, these relationships are often masked by other, noisier data.
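For illustration, here is a minimal MATLAB sketch of that kind of sanity check; the vectors disasterFreq and natScore are hypothetical names for the per-school values, not the project's actual variables:

% Hypothetical per-school vectors: disasterFreq (yearly disaster counts)
% and natScore (NAT scores). Names are illustrative only.
fprintf('Variance of disaster frequency: %.2f\n', var(disasterFreq));
fprintf('Variance of NAT scores: %.2f\n', var(natScore));
% With practically zero variance in either variable, a correlation
% coefficient is meaningless, hence checking the variances first.
R = corrcoef(disasterFreq, natScore);   % 2x2 correlation matrix
fprintf('Correlation: %.2f\n', R(1,2));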

In this work, I will look at the relationship between the frequency of typhoons and of disasters overall that a school experiences, as well as the number of times the school is used as an evacuation center, and the school's performance in the NAT. In particular, I will focus on data from the province of Leyte.

Will the neural network be able to successfully predict the probable performance of a school given the disaster variables?

I used the existing Neural Network Fitting tool in MATLAB. Three training algorithms are available. For each algorithm, I varied the number of hidden neurons and observed the error distribution.

Training algorithms:

A. Levenberg-Marquardt

Here, I used the default number of hidden neurons, which is 10.
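For reference, here is a minimal sketch of this setup in script form, assuming the disaster variables are stored in a matrix x (one row per variable, one column per school) and the NAT scores in a row vector t; these names are mine, not the project's:

% x: disaster variables (rows = variables, columns = schools)
% t: NAT scores (1 x number of schools)
net = fitnet(10, 'trainlm');   % 10 hidden neurons, Levenberg-Marquardt
[net, tr] = train(net, x, t);  % automatic training/validation/test split
y = net(x);                    % network predictions
e = t - y;                     % prediction errors
ploterrhist(e);                % error histogram (cf. Figure A1, left)
plotregression(t, y);          % regression plot (cf. Figure A1, right)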


Figure A1. Left: error histogram. Right: regression analysis for training, validation, and testing.
The error distribution is Gaussian in shape with a mean at about -2.

I then changed the number of hidden neurons to 100 and got the following error histogram.
Figure A2. Error histogram for hidden neurons = 100. 

The mean is still around 2.3, but an outlier appears at around 109.9, which increases the variance of the distribution.
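In script form, the change is a single argument to fitnet, again assuming the same hypothetical x and t as above:

net = fitnet(100, 'trainlm');  % 100 hidden neurons instead of the default 10
[net, tr] = train(net, x, t);
ploterrhist(t - net(x));       % error histogram (cf. Figure A2)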


B. Scaled Conjugate Gradient
Figure B1. Left: error histogram. Right: regression analysis for training, validation, and testing.

Figure B2. Error histogram for hidden neurons = 100, with the network retrained multiple times.

Training multiple times generates different results because of different initial weights and random data splits. As can be seen from Figure B2, repeatedly retraining the network does not always lead to better results.
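Here is a sketch of how one might retrain the scaled conjugate gradient network several times and keep the best run, again with the hypothetical x and t; keeping the lowest-MSE network is my own choice, not something the fitting tool does for you:

net = fitnet(100, 'trainscg');      % scaled conjugate gradient
bestMSE = Inf;
for k = 1:5
    net = init(net);                % fresh random initial weights
    [net, tr] = train(net, x, t);   % a new random data split each run
    mseK = perform(net, t, net(x)); % mean squared error over all samples
    if mseK < bestMSE
        bestMSE = mseK;
        bestNet = net;
    end
end
ploterrhist(t - bestNet(x));        % error histogram of the best run (cf. Figure B2)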

It should be noted that these training algorithms differ in terms of training speed and memory usage. The number of hidden neurons is sometimes increased to improve the performance of the network. However, when the data consists of a simpler, smaller set of points, it is probably better to use a smaller number of hidden neurons. Based on what I've read, the number of hidden neurons should roughly be between the size of the input layer and the size of the output layer [1].
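One way to read that rule of thumb in code, taking the midpoint between the input and output sizes (my interpretation, not a prescription from [1]):

nInputs = size(x, 1);                 % number of disaster variables
nOutputs = size(t, 1);                % one output: the NAT score
nHidden = max(1, round((nInputs + nOutputs) / 2));  % midpoint as a rough guide
net = fitnet(nHidden, 'trainlm');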

As can be observed from the regression plots and error histograms, the neural network can indeed predict, with a moderate level of accuracy, the likely test score of a school given the frequency of disasters it experienced.


[1] Jeff Heaton, Introduction to Neural Networks with Java, Heaton Research.



