My task in the project was to clean the data and look for correlations between variables in the dataset. Before looking for a correlation between two variables, it is important to first check their variances, since a variable that barely varies cannot correlate with anything. While national-level analyses do not show a relationship between disaster experience and test scores, sub-national analyses give more meaningful results. At the national level, these relationships are often masked by noise in the rest of the data.
In this work, I look at how the frequency of typhoons experienced, the frequency of disasters overall, and the number of times a school was used as an evacuation center relate to the school's performance in the NAT. In particular, I focus on data from the province of Leyte.
Will a neural network be able to successfully predict the probable performance of a school given the disaster variables?
I used the existing neural network fitting tool in Matlab, which offers three training algorithms. For each algorithm, I varied the number of hidden neurons and observed the error distribution.
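To make the variance check concrete, here is a minimal Python sketch of a Pearson correlation that refuses to report a value when either variable has zero variance. The function name `pearson_r` and the sample numbers are my own illustration, not values from the actual dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns None when either sequence has zero variance, because the
    correlation is undefined in that case.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        return None  # a constant variable cannot correlate with anything
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values (hypothetical, not from the real data).
typhoon_counts = [1, 2, 3, 4, 5]
nat_scores = [80, 75, 70, 65, 60]

r = pearson_r(typhoon_counts, nat_scores)          # perfectly linear, negative
r_const = pearson_r([3, 3, 3, 3, 3], nat_scores)   # zero variance in x
print(r, r_const)
```

If a disaster variable is nearly constant within a region, a check like this flags it before any correlation is interpreted.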
Training algorithms:
A. Levenberg-Marquardt
Here, I used the default number of hidden neurons, which is 10.
Figure A1. Left: Error histogram. Right: Regression analysis for training, validation and tests.
The error distribution is Gaussian in shape with a mean at about -2.
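For readers who want to reproduce this kind of plot outside Matlab, the sketch below bins the errors, taken as targets minus outputs, and reports their mean. The function name `error_histogram`, the equal-width binning, and the sample scores are assumptions for illustration only.

```python
def error_histogram(targets, outputs, n_bins=20):
    """Bin the errors (targets - outputs) into n_bins equal-width bins."""
    errors = [t - o for t, o in zip(targets, outputs)]
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all errors are equal
    counts = [0] * n_bins
    for e in errors:
        # The largest error would land one past the end, so clamp it.
        idx = min(int((e - lo) / width), n_bins - 1)
        counts[idx] += 1
    mean_error = sum(errors) / len(errors)
    return errors, counts, mean_error

# Hypothetical test scores and network outputs, for illustration only.
targets = [70, 65, 80, 55]
outputs = [72, 66, 83, 57]

errors, counts, mean_error = error_histogram(targets, outputs, n_bins=4)
print(errors, counts, mean_error)
```

A negative mean error under this convention means the network tends to predict scores above the actual ones.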
I then changed the number of hidden neurons to 100 and got the following error histogram.
Figure A2. Error histogram for hidden neurons = 100.
The mean is still around 2.3, but an outlier is observed at around 109.9, thereby increasing the variance in the distribution.
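The effect of a single outlier on the variance can be checked with a few lines of Python. Only the outlier value 109.9 comes from the histogram above; the other error values are made up to cluster near 2.3.

```python
import statistics

# Made-up errors clustered near 2.3 (hypothetical, for illustration only).
errors = [1.8, 2.1, 2.5, 2.3, 2.6, 2.4]
with_outlier = errors + [109.9]  # the outlier value reported in the text

var_clean = statistics.pvariance(errors)
var_outlier = statistics.pvariance(with_outlier)

# The median barely moves, which is why the bulk of the histogram looks unchanged.
median_clean = statistics.median(errors)
median_outlier = statistics.median(with_outlier)

print(var_clean, var_outlier)
print(median_clean, median_outlier)
```

One extreme point is enough to inflate the variance by orders of magnitude while the center of the distribution stays almost where it was.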
Figure C1. Left: Error histogram. Right: Regression analysis for training, validation and tests.
Figure C2. Error histogram for hidden neurons = 100, with the network retrained multiple times.
Training multiple times generates different results because of different initial conditions and sampling. As Figure C2 shows, repeatedly training the network does not always lead to better results.
It should be noted that these learning algorithms differ in training speed and in the amount of memory they use. The number of hidden neurons is sometimes increased to improve the performance of the network. However, when the data is simple and consists of a small number of points, it is probably better to use fewer hidden neurons. From what I have read, the number of hidden neurons should be roughly between the size of the input and the size of the output [1].
As the regression plots and error histograms show, the neural network can indeed predict, with moderate accuracy, the likely test score of a school given the frequency of disasters it has experienced.
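The sampling part of this variability can be sketched in Python. The example assumes a random 70/15/15 division into training, validation, and test sets; the function name `divide_random` and the seeds are hypothetical, and this is not the code Matlab runs internally.

```python
import random

def divide_random(n_samples, seed, train=0.70, val=0.15):
    """Randomly split sample indices into training, validation and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )

# Two "retrainings" with different seeds see different subsets of the data.
split_a = divide_random(100, seed=1)
split_b = divide_random(100, seed=2)
print(len(split_a[0]), len(split_a[1]), len(split_a[2]))
print(split_a[0][:5], split_b[0][:5])
```

Fixing the seed makes a run repeatable, which helps when comparing hidden-neuron settings on equal footing.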
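As a quick check of this rule of thumb, the sketch below assumes three inputs (typhoon frequency, overall disaster frequency, and evacuation-center use) and one output (the test score). These counts are my reading of the setup, and the helper name `hidden_neuron_range` is hypothetical.

```python
def hidden_neuron_range(n_inputs, n_outputs):
    """Range suggested by the rule of thumb: between output and input size."""
    low, high = sorted((n_inputs, n_outputs))
    return low, high

# Assumed network shape: 3 disaster variables in, 1 test score out.
low, high = hidden_neuron_range(n_inputs=3, n_outputs=1)

# Compare the two settings tried in this post against the suggested range.
within = {n: low <= n <= high for n in (10, 100)}
print((low, high), within)
```

Under these assumptions both 10 and 100 hidden neurons lie above the suggested range, though the rule is only a starting point and not a guarantee.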
[1] Jeff Heaton. Introduction to Neural Networks for Java.