Rectifier Activation Function
A rectifier activation function (also referred to as a Rectified Linear Unit or ReLU) is defined as:

f(x) = max(0, x)
Compared to the sigmoid and similar saturating activation functions, rectified linear units allow faster and more effective training of deep neural architectures on large, complex datasets. The function maps any negative input to zero and is a straight line with slope 1 for positive inputs, as the example below illustrates:
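For illustration, here is a minimal NumPy sketch of the rectifier (the function name relu is our choice, not part of any particular library), applied to a few sample inputs so the clipping of negative values is visible:

    import numpy as np

    def relu(x):
        # Element-wise rectifier: negative inputs map to 0, positive inputs pass through unchanged
        return np.maximum(0, x)

    print(relu(np.array([-3.0, -0.5, 0.0, 0.5, 3.0])))   # the three non-positive inputs come out as 0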
The derivative of the Rectifier function is:

f'(x) = 1 for x > 0 and f'(x) = 0 for x < 0 (the derivative is undefined at x = 0; in practice it is usually taken to be 0).
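A matching sketch of this piecewise derivative (again, relu_grad is our name; the value at exactly x = 0 is set to 0 by convention):

    import numpy as np

    def relu_grad(x):
        # 1 where the input is positive, 0 elsewhere (including at exactly 0)
        return (np.asarray(x) > 0).astype(float)

    print(relu_grad([-2.0, -0.1, 0.0, 0.1, 2.0]))   # [0. 0. 0. 1. 1.]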
Common positive comments about ReLU activation functions include:
Faster convergence of stochastic gradient descent, due to their linear, non-saturating form.
Simpler, faster computation, since the linear form involves no exponential operations (a rough comparison sketch follows this list).
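As a rough illustration of the second point, the sketch below times one pass of each activation over a large random array (the array size is illustrative, and the resulting timings depend on hardware and on how NumPy is built):

    import time
    import numpy as np

    x = np.random.randn(5_000_000)

    start = time.perf_counter()
    r = np.maximum(0, x)                 # rectifier: a single element-wise max, no exponentials
    relu_time = time.perf_counter() - start

    start = time.perf_counter()
    s = 1.0 / (1.0 + np.exp(-x))         # sigmoid: one exponential per element
    sigmoid_time = time.perf_counter() - start

    print(f"relu: {relu_time:.4f} s   sigmoid: {sigmoid_time:.4f} s")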
Common negative comments include:
They can be fragile and 'die'. If the learning rate is set too high, a large gradient flowing through a ReLU neuron can cause its weights to update in such a way that the neuron never activates again: its output is then zero for every input, so no gradient flows back through it and its weights stop changing, as the sketch below illustrates.
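To make the failure mode concrete, here is a toy NumPy sketch of a single 'dead' unit (all names and values are invented for illustration): its bias has been knocked so far negative that the pre-activation is below zero for every input, so the ReLU output and its local gradient are both zero everywhere and gradient descent has nothing left to update the weights with.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))            # toy inputs
    w = np.array([-5.0, -5.0, -5.0])         # weights after an oversized update
    b = -100.0                               # bias pushed far negative

    z = X @ w + b                            # pre-activations: negative for every row
    a = np.maximum(0, z)                     # ReLU outputs: all zero
    grad_mask = (z > 0).astype(float)        # local ReLU gradient: all zero

    print("fraction of inputs that activate:", (a > 0).mean())    # 0.0
    print("fraction of nonzero gradients:   ", grad_mask.mean())  # 0.0
    # With zero gradient everywhere, w and b never change again: the neuron is 'dead'.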