Neural Networks are an approach to artificial intelligence that was first proposed in 1944. Modeled loosely on the human brain, Neural Networks consist of a multitude of simple processing nodes (called neurons) that are highly interconnected and send data through these network connections to estimate a target variable.
In this article, I will discuss the structure and training of simple neural networks (specifically Multilayer Perceptrons, aka "vanilla neural networks"), as well as demonstrate an example neural network created by the Alteryx Neural Network Tool.
Question: Why do zombies only date intelligent women?
Answer: They just love a woman with brains.
How Neural Networks Work
Generally speaking, neural network models consist of thousands of neurons (nodes) that are densely connected. In most neural network models, neurons are organized into layers: an input layer, which contains a neuron for each of the provided predictor variables; one or more hidden layers; and an output layer. The hidden layers of a neural network effectively transform the inputs into something that the output layer can interpret. The output layer returns either a category label (classification) or an estimated value (regression).
At each neuron, all incoming values are added together and then processed with an activation function (e.g., a sigmoid function), which determines whether or not the neuron is “activated”. Often, a bias is also added to this sum before the activation function is applied. The bias is similar to an intercept term in a regression model.
A threshold value is then used to determine if a neuron will “fire”, which means activating all of the outgoing connections to the next layer. Neural Networks can be Feedforward, meaning all data is only passed forward, or Recurrent, which can include cycles or loops through the layers of the network. Multilayer Perceptrons are only Feedforward.
Each connection between neurons is given a (positive or negative) numeric weight. For each connection, this weight is multiplied by the numeric value of the sending neuron, and the result is passed on to the connected neuron in the next layer when the sending neuron meets or exceeds the given threshold. If the threshold value is not met, the neuron will not be activated, and data from that neuron will not be passed on to any neurons in the next layer.
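The calculation at a single neuron can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the code behind any particular tool: the input values, weights, bias, and the threshold of 0.5 are all made-up numbers chosen for the example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the incoming values, plus a bias, passed through a sigmoid."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


# Hypothetical values from three neurons in the previous layer.
inputs = [0.5, -1.0, 2.0]
# Hypothetical connection weights (positive or negative) and bias.
weights = [0.4, 0.3, -0.2]
bias = 0.1

activation = neuron_output(inputs, weights, bias)

# Assumed threshold for this sketch: the neuron "fires" at 0.5 or above.
threshold = 0.5
fires = activation >= threshold

print(round(activation, 4), fires)
```

Here the weighted sum is negative (-0.4), so the sigmoid returns a value below 0.5 and this neuron does not fire.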
In a classification model, the neuron in the Output Layer with the highest value will determine the model’s estimated label for each record. For a regression model, the neural network returns a value through a single output node.
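To make the flow from inputs to a predicted label concrete, here is a minimal sketch of a forward pass through a network with one hidden layer. The layer sizes, weights, biases, and the labels "A", "B", and "C" are hypothetical, and the output layer is left as a plain weighted sum for simplicity; a real model would learn its weights during training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases, activation):
    """Compute the output of every neuron in one layer.

    weights holds one row per neuron in this layer, with one weight
    per incoming connection.
    """
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def predict_label(features, hidden_w, hidden_b, output_w, output_b, labels):
    """Pass one record forward and pick the label of the largest output neuron."""
    hidden = layer(features, hidden_w, hidden_b, sigmoid)
    scores = layer(hidden, output_w, output_b, lambda z: z)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 3 output neurons.
hidden_w = [[1.0, -1.0], [0.5, 0.5]]
hidden_b = [0.0, 0.0]
output_w = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
output_b = [0.0, 0.0, 0.0]

label, scores = predict_label(
    [1.0, 1.0], hidden_w, hidden_b, output_w, output_b, ["A", "B", "C"]
)
print(label, scores)
```

For a regression model, the same forward pass would end in a single output neuron, and its value would be returned directly instead of being compared against other outputs.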
If this quick overview isn’t making sense, or if you’d like to know a little more about the math, there is a really great video from 3Blue1Brown called “But what *is* a Neural Network?” that I recommend watching.
How a Neural Network is Trained
Multilayer Perceptron Neural Networks are typically trained with a method known as Backpropagation, which involves adjusting the weights of the neurons in the neural network by calculating the gradient of the cost (loss) function.
To start training a neural network, all of the initial weights and thresholds are randomly generated. The training data is then fed through the input layer and passes through the model until it arrives at the output layer. At the output layer, a cost function is calculated to estimate how the model performed in estimating the known target variable. The output of the cost function is minimized when the network confidently estimates the correct value and increases with misclassifications. The goal of the training algorithm is to minimize the value of the cost function.
Weights and thresholds in the neural network are then adjusted to minimize the cost function (this is the calculus part) until the model converges on a local minimum. The process is repeated and weights and thresholds continue to be adjusted based on the training data and a cost function until all data with the same labels result in similar values.
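The training loop described above can be sketched on the smallest possible case: a single sigmoid neuron trained with gradient descent. This is a deliberate simplification. Backpropagation applies the same chain-rule idea through hidden layers as well, which this sketch does not have. The toy dataset (a logical OR), the learning rate, the number of passes, and the random seed are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def cost(weights, bias, records):
    """Mean cross-entropy between the predictions and the known targets."""
    total = 0.0
    for features, target in records:
        p = predict(weights, bias, features)
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(records)


def train(records, learning_rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    # Start from small random weights and a random bias.
    weights = [rng.uniform(-0.5, 0.5) for _ in records[0][0]]
    bias = rng.uniform(-0.5, 0.5)
    history = [cost(weights, bias, records)]
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, target in records:
            # For a sigmoid output with cross-entropy cost, the gradient
            # with respect to the weighted sum is (prediction - target).
            error = predict(weights, bias, features) - target
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        n = len(records)
        # Step each parameter against its gradient to lower the cost.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
        history.append(cost(weights, bias, records))
    return weights, bias, history


# Toy training data: the target is 1 when either input is 1.
records = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
weights, bias, history = train(records)
predictions = [round(predict(weights, bias, f)) for f, _ in records]
print(history[0], history[-1], predictions)
```

The cost recorded in history falls as the weights are adjusted, which is the behavior the training algorithm is aiming for.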
Part 2 and Part 3 of 3Blue1Brown’s series on Neural Networks cover Training and Backpropagation, respectively, and might be helpful to you if you would like to know more. There is also an excellent open source textbook, Neural Networks and Deep Learning by Michael Nielsen, available online.
Neural Networks in the Alteryx Neural Network Tool
The Alteryx Neural Network Tool uses the R package nnet, which generates a feed-forward neural network with a single hidden layer. Feed-forward refers to the direction in which data can be passed between layers: a feed-forward model can only pass data “downstream”. The single hidden layer is also an important aspect of this particular implementation. Although users can adjust the number of neurons included in the hidden layer, the number of hidden layers cannot be increased. These neural networks still fall under the classification of Multilayer Perceptrons.
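Because the architecture is fixed at one hidden layer, the size of the model is driven almost entirely by the number of hidden neurons. The sketch below counts the weights in such a network, assuming every neuron is fully connected to the previous layer, each hidden and output neuron has one bias weight, and there are no connections that skip the hidden layer. The function name and the example sizes are hypothetical.

```python
def count_weights(n_inputs, n_hidden, n_outputs):
    """Count trainable weights in a fully connected, single-hidden-layer network."""
    # Each hidden neuron has one weight per input plus one bias.
    input_to_hidden = (n_inputs + 1) * n_hidden
    # Each output neuron has one weight per hidden neuron plus one bias.
    hidden_to_output = (n_hidden + 1) * n_outputs
    return input_to_hidden + hidden_to_output


# Example: 4 predictors, 3 hidden neurons (an arbitrary choice), 3 classes.
total = count_weights(4, 3, 3)
print(total)  # 27
```

Adding one more hidden neuron in this example would add eight weights: five on the input side and three on the output side.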
The structure of a neural network trained in Alteryx might look something like this:
The light green lines (AYX Wasabi) are negative weights, and the dark green lines (AYX Green Apple) are positive weights. Line thickness is used to depict the relative weight assigned to each connection.
The predictor variables (if you haven’t guessed yet, this neural network was trained with the infamous Iris dataset) are included in the first layer.
The bias layers (AYX Hot Sauce) depicted at the top of the diagram apply constant values to the neurons (similar to intercept terms in a regression model). The activation function of the hidden layer will be sigmoid, and the activation function of the output layer will depend on the target field. Binary classifications will use a logistic activation function, multinomial classifications will use the softmax function, and regressions (models with a continuous target variable) will apply a linear activation function.
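The three output activation functions mentioned above are short enough to write out. This is a plain-Python sketch of the standard definitions, not code taken from nnet or Alteryx, and the example scores are made up.

```python
import math


def logistic(z):
    """Binary classification: map a score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Multinomial classification: map scores to probabilities that sum to 1."""
    # Subtracting the largest score keeps exp() from overflowing
    # without changing the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(z):
    """Regression: pass the score through unchanged."""
    return z


# Hypothetical raw scores from three output neurons.
probabilities = softmax([2.0, 1.0, 0.1])
print(logistic(0.0), probabilities, linear(3.2))
```

The softmax output keeps the ordering of the raw scores, so the class with the largest score also receives the largest probability.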
The cost function used to train the model (using the backpropagation method) will also depend on the target variable. For a classification model (the target variable is categorical), the cost function is an entropy measure called Cross-Entropy. For a regression model (where the target variable is continuous), the cost function used to train the neural networks is the residual sum of squares.
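Both cost functions are simple to compute by hand. The sketch below uses the textbook definitions with made-up predictions; it is meant to show how each cost behaves, not to reproduce the exact calculation performed inside nnet.

```python
import math


def cross_entropy(probabilities, true_index):
    """Classification cost for one record: small when the correct class
    receives a high predicted probability."""
    return -math.log(probabilities[true_index])


def residual_sum_of_squares(actual, predicted):
    """Regression cost: sum of squared differences between targets and estimates."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Hypothetical predictions for a record whose true class is the first one.
confident = cross_entropy([0.9, 0.05, 0.05], 0)
unsure = cross_entropy([0.4, 0.3, 0.3], 0)

# Hypothetical regression targets and estimates.
rss = residual_sum_of_squares([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

print(confident, unsure, rss)
```

The confident, correct prediction produces a lower cross-entropy than the unsure one, and the residual sum of squares grows with the size of each miss.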
Each record (observation) in a data set is passed through this network. Depending on the values of the predictor variables, different neurons in each layer will be activated, resulting in an estimation for the record in the output layer.
Hopefully, this has cracked open the hard casing around Neural Networks and allowed you to digest the good, fleshy bits.
A geographer by training and a data geek at heart, Sydney joined the Alteryx team as a Customer Support Engineer in 2017. She strongly believes that data and knowledge are most valuable when they can be clearly communicated and understood. She currently manages a team of data scientists that bring new innovations to the Alteryx Platform.