the xor classification problem
Given two binary inputs, the network should return 1 if the inputs are not equal, and 0 if they are equal.
Making a neural network learn the XOR function is a classic AI problem (there are dozens of articles explaining this exact problem). XOR and XNOR are harder to learn than the other basic logic gates because they are not linearly separable: no single straight line through the input space can separate the inputs that produce 1 from the inputs that produce 0.
interactive simulation
This is a neural network that uses a truth table as training data. As it trains, it converges on weights that accurately implement the desired Boolean function.
A | B | f | NN |
---|---|---|---|
0 | 0 | . . . | |
0 | 1 | . . . | |
1 | 0 | . . . | |
1 | 1 | . . . | |
Click on the outputs in the 'f' column to change them.
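The truth table maps directly onto training data. A minimal sketch in JavaScript (the variable names are my own, not from the original implementation):

```javascript
// The truth table encoded as training examples: two binary inputs
// per row, plus the target output from the clickable 'f' column
// (set to XOR here).
const trainingData = [
  { inputs: [0, 0], target: 0 },
  { inputs: [0, 1], target: 1 },
  { inputs: [1, 0], target: 1 },
  { inputs: [1, 1], target: 0 },
];
```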
After clicking start, the network will output values that (should) closely match the desired 'f' column. The accuracy of the neural network is measured by a cost function, which is minimized over many iterations.
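The post doesn't say which cost function it uses; mean squared error is a common choice and makes a concrete sketch of what "minimized" means here:

```javascript
// Mean squared error over the four truth-table rows: the value
// the training loop tries to drive toward zero.
function meanSquaredError(predictions, targets) {
  let sum = 0;
  for (let i = 0; i < targets.length; i++) {
    const diff = predictions[i] - targets[i];
    sum += diff * diff;
  }
  return sum / targets.length;
}

// A network that reproduces XOR's output column exactly has zero cost:
meanSquaredError([0, 1, 1, 0], [0, 1, 1, 0]); // → 0
```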
a basic neural network
A neural network consists of many nodes, which are all interconnected. Here's the neural network structure that I used above:
[Network structure diagram, via researchgate.com]
The output of each node is a weighted sum of its inputs plus a bias value, passed through an activation function:

a = σ(Σwx + b)

where a is the activation, σ is the activation function (such as the sigmoid), w is the weight, x is the input, and b is the bias.
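A single node's output can be sketched like this, using the sigmoid (mentioned later in the post) as the activation function; `nodeOutput` is a hypothetical name, not from the original code:

```javascript
// Squashes any real number into (0, 1).
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// One node: weighted sum of inputs, plus a bias, through the activation.
function nodeOutput(weights, inputs, bias) {
  let z = bias;
  for (let i = 0; i < inputs.length; i++) {
    z += weights[i] * inputs[i];
  }
  return sigmoid(z);
}

// With zero weights and zero bias the sum is 0, and sigmoid(0) = 0.5:
nodeOutput([0, 0], [1, 1], 0); // → 0.5
```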
With only 3 neurons, the network already has 6 weights and 3 biases. These values are initially randomized, so the network produces very inaccurate results to begin with. Adjusting all of these values slightly, using a process called backpropagation, allows the network to find weights and biases that minimize its cost (a measure of how wrong its predictions are). By repeating this process thousands of times, the network learns to make better predictions.
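Each of those repetitions ends with the same update rule: nudge every parameter a small step against its gradient. A minimal sketch (backpropagation is what produces the gradients; `gradientStep` is a hypothetical name):

```javascript
// One gradient-descent step: move each parameter against its
// gradient, scaled by a small learning rate. Repeated thousands
// of times, this is what drives the cost down.
function gradientStep(params, gradients, learningRate) {
  return params.map((p, i) => p - learningRate * gradients[i]);
}

// A weight with a positive gradient (cost rises as it grows)
// gets pushed down slightly:
gradientStep([1.0], [2.0], 0.25); // → [0.5]
```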
problems i ran into
reliability
I struggled to make the network reliable: sometimes it would be completely wrong no matter how many iterations I trained it for. Apparently this is because the cost function has multiple local minima that the network can get stuck in without being able to improve further. I solved this by initializing the weights randomly with values from 0.5 to 1, instead of 0 to 1. Doing this greatly increased the chances that the network would end up in a good minimum. (is that cheating?)
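That fix amounts to a one-line change in the initializer (`randomWeight` is a hypothetical helper name, not from the original code):

```javascript
// Uniform random value in [min, max) — Math.random() covers [0, 1).
function randomWeight(min, max) {
  return min + Math.random() * (max - min);
}

// Initialize the 6 weights in [0.5, 1) instead of [0, 1); shifting
// the starting point changes which minimum gradient descent reaches.
const weights = Array.from({ length: 6 }, () => randomWeight(0.5, 1));
```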
The first implementation I made required 10 to 20 thousand iterations before becoming accurate. I read here that using the tanh activation function rather than the sigmoid function lets the network learn faster.
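One way to see why tanh speeds up learning: its derivative at zero is four times larger than the sigmoid's, so the gradients (and therefore the weight updates) start out bigger:

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Derivatives of the two activation functions.
const sigmoidPrime = (z) => sigmoid(z) * (1 - sigmoid(z));
const tanhPrime = (z) => 1 - Math.tanh(z) ** 2;

sigmoidPrime(0); // → 0.25
tanhPrime(0);    // → 1
```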
I also used web workers to put all of the computation on a separate thread and increased the number of hidden nodes from 2 to 3.
After making those changes, my network can reach an acceptable accuracy within 2 to 4 thousand iterations.
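Putting the pieces together, here is a minimal 2-3-1 network trained on the XOR table with tanh hidden units and plain gradient descent. This is a sketch of the approach described above, not the author's actual implementation; the names, learning rate, iteration count, and fixed starting weights are my own choices:

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Fixed starting weights chosen from the [0.5, 1) range the post
// recommends, so the run is deterministic.
let w1 = [[0.6, 0.9], [0.55, 0.8], [0.7, 0.95]]; // input -> hidden
let b1 = [0.5, 0.6, 0.7];
let w2 = [0.85, 0.65, 0.9];                      // hidden -> output
let b2 = 0.5;

const data = [
  { x: [0, 0], t: 0 },
  { x: [0, 1], t: 1 },
  { x: [1, 0], t: 1 },
  { x: [1, 1], t: 0 },
];

function forward(x) {
  const h = w1.map((w, j) => Math.tanh(w[0] * x[0] + w[1] * x[1] + b1[j]));
  const out = sigmoid(h.reduce((s, hj, j) => s + w2[j] * hj, b2));
  return { h, out };
}

// One pass over the table: forward, backpropagate, update; returns
// the mean squared error measured before the updates.
function trainEpoch(lr) {
  let cost = 0;
  for (const { x, t } of data) {
    const { h, out } = forward(x);
    cost += (out - t) ** 2;
    // Chain rule from the cost back through the output sigmoid...
    const dOut = 2 * (out - t) * out * (1 - out);
    for (let j = 0; j < 3; j++) {
      // ...and through each tanh hidden unit (using w2 pre-update).
      const dHidden = dOut * w2[j] * (1 - h[j] ** 2);
      w2[j] -= lr * dOut * h[j];
      w1[j][0] -= lr * dHidden * x[0];
      w1[j][1] -= lr * dHidden * x[1];
      b1[j] -= lr * dHidden;
    }
    b2 -= lr * dOut;
  }
  return cost / data.length;
}

let cost;
for (let i = 0; i < 5000; i++) cost = trainEpoch(0.5);
```

From this starting point the cost typically falls well below the 0.25 you would get by always predicting 0.5. Running the training loop off the main thread, as the post does with web workers, keeps the page responsive while these thousands of epochs run.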
thoughts
Throughout this project I learned some things: