Exponential Linear Unit (ELU) is a popular activation function that speeds up learning and often leads to more accurate results. This article introduces ELU and shows how it compares to other popular activation functions. It also includes an interactive example and usage examples for PyTorch and TensorFlow.
Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter introduced ELU in November 2015. At the time, ELU networks outperformed ReLU-based networks on CIFAR-100. ELUs remain popular among machine learning engineers and are well studied by now.
What is ELU?
ELU is an activation function based on ReLU with an extra alpha constant (α) that controls the shape of the curve, and the value it saturates to, for negative inputs.
Play with the interactive example below to see how α influences the negative part of the function.
[Interactive chart: ELU activation curve with an adjustable alpha constant (α), default α = 1]
ELU calculation
$$\mathrm{ELU}(x) = \begin{cases} x & \text{if } x \ge 0 \\ \alpha\,(e^{x} - 1) & \text{if } x < 0 \end{cases}$$
For positive inputs, ELU is the identity: the output equals the input. For negative inputs, the output curve smoothly saturates toward −α; the larger the alpha constant, the more negative the output for negative inputs can get.
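To make the piecewise definition concrete, here is a minimal NumPy sketch of the formula above (the function name and sample values are only for illustration):

```python
import numpy as np

def elu(x, alpha=1.0):
    """Piecewise ELU: identity for x >= 0, alpha * (exp(x) - 1) for x < 0."""
    return np.where(x >= 0, x, alpha * np.expm1(x))  # expm1(x) = e**x - 1

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(elu(x, alpha=1.0))  # ≈ [-0.993, -0.632, 0., 1., 5.]  (negative side saturates toward -1)
print(elu(x, alpha=2.0))  # ≈ [-1.987, -1.264, 0., 1., 5.]  (negative side saturates toward -2)
```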
ELU vs ReLU
ELU and ReLU are among the most popular activation functions in use. Here are the advantages and disadvantages of using ELU compared to ReLU and other popular activation functions.
Advantages of ELU
Tends to converge faster than ReLU (because mean ELU activations are closer to zero)
Better generalization performance than ReLU
Fully continuous
Fully differentiable (for the default α = 1; see the derivative below)
Does not have a vanishing gradients problem
Does not have an exploding gradients problem
Does not have a dead ReLU problem
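The gradient-related points above are easiest to see from ELU's derivative, which stays strictly positive for negative inputs instead of dropping to exactly zero as ReLU's does:

$$\mathrm{ELU}'(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ \alpha\,e^{x} = \mathrm{ELU}(x) + \alpha & \text{if } x < 0 \end{cases}$$

For α = 1 the two branches also meet at x = 0 (both equal 1), which is why ELU is smooth where ReLU has a kink.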
Disadvantages of ELU
Slower to compute (because of the exponential for negative input values)
ELU is slower to compute than ReLU, but it compensates for this with faster convergence during training. At inference time, however, ELU remains slower than ReLU.
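As a quick usage sketch (the layer sizes and dummy batch below are only for illustration), ELU is available in PyTorch as torch.nn.ELU, with the α constant exposed through the alpha argument:

```python
import torch
import torch.nn as nn

# ELU as a drop-in replacement for ReLU in a small illustrative model.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ELU(alpha=1.0),    # alpha controls the negative-side saturation
    nn.Linear(64, 10),
)

x = torch.randn(32, 128)  # dummy batch of 32 inputs
out = model(x)
print(out.shape)          # torch.Size([32, 10])
```

In TensorFlow/Keras the equivalent is tf.keras.layers.ELU(alpha=1.0), or passing activation='elu' to a layer.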