Deep learning refers to multi-layer neural networks trained with supervised learning via stochastic gradient descent and backpropagation.
Arno gave the following tips for developing a robust deep learning model:
- Use H2O to distribute the algorithm across multiple nodes and get quick results on very large datasets; a configuration sketch using H2O's Python API follows this list.
- Automatically standardize the data with feature scaling so that each feature has mean 0 and standard deviation 1. This helps ensure that each feature contributes the proper amount to the final model, regardless of its original units and distribution.
- Automatically initialize the weights with a uniform distribution over +/- sqrt(6 / (#units + #units_previous_layer)), i.e., the Xavier/Glorot scheme (sketched numerically after this list).
- Use an adaptive learning rate that is automatically set for each parameter based on its training history. For more information, read Zeiler's 2012 paper "ADADELTA: An Adaptive Learning Rate Method."
- Use regularization: penalize non-zero weights (L1) and strongly penalize large weights (L2) to create a simpler model and reduce the risk of overfitting.
- Drop out randomly selected neurons during training to prevent complex co-adaptations. For more information, read Geoff Hinton's 2012 paper "Improving neural networks by preventing co-adaptation of feature detectors."
- Use grid search and checkpointing to scan many hyperparameter combinations, then continue training the most promising models from their checkpoints (see the grid-search sketch below).
- Use more layers to model more complex functions, especially highly non-linear ones.
- Use more neurons per layer to detect finer structure in the data.
- Set H2O's balance_classes option to true for imbalanced classes.
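
Most of these tips map directly onto parameters of H2O's deep learning estimator. Here is a minimal sketch using H2O's Python API, assuming a CSV training set with a categorical response column named "label" (the file path, column names, and specific parameter values are illustrative, not from the talk):

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()  # connect to (or start) an H2O cluster; add nodes to distribute work
train = h2o.import_file("train.csv")        # placeholder path
train["label"] = train["label"].asfactor()  # treat response as categorical
predictors = [c for c in train.columns if c != "label"]

model = H2ODeepLearningEstimator(
    standardize=True,                        # scale features to mean 0, std dev 1
    initial_weight_distribution="UniformAdaptive",  # +/- sqrt(6/(n_in + n_out))
    adaptive_rate=True,                      # ADADELTA-style per-parameter rates
    l1=1e-5,                                 # penalize non-zero weights
    l2=1e-5,                                 # strongly penalize large weights
    activation="RectifierWithDropout",
    input_dropout_ratio=0.1,                 # drop random input features
    hidden_dropout_ratios=[0.5, 0.5],        # drop random hidden neurons
    hidden=[200, 200],                       # layers x neurons; grow for harder problems
    balance_classes=True,                    # resample to balance class counts
    epochs=10,
)
model.train(x=predictors, y="label", training_frame=train)
```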
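
The weight-initialization rule can also be written out directly. A short NumPy sketch of the uniform +/- sqrt(6 / (fan_in + fan_out)) bound (the function name and seed are illustrative):

```python
import numpy as np

def uniform_glorot(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Sample weights uniformly from +/- sqrt(6 / (fan_in + fan_out))."""
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

# Weights between a 128-unit layer and a 64-unit layer all fall
# within +/- sqrt(6/192) ~= 0.177.
W = uniform_glorot(128, 64)
print(W.min(), W.max())
```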
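
Finally, a sketch of the grid-search-plus-checkpointing workflow: run many short, cheap training jobs over a small hyperparameter grid, then restart the most promising model from its checkpoint for more epochs (the grid values and sort metric are illustrative, and reuse the `predictors`/`train` names from the sketch above):

```python
from h2o.grid.grid_search import H2OGridSearch

hyper_params = {"hidden": [[50], [200], [200, 200]],
                "l1": [0, 1e-5]}
grid = H2OGridSearch(H2ODeepLearningEstimator(epochs=1, adaptive_rate=True),
                     hyper_params=hyper_params)
grid.train(x=predictors, y="label", training_frame=train)

# Pick the best short run, then keep training it from its checkpoint.
best = grid.get_grid(sort_by="logloss", decreasing=False).models[0]
continued = H2ODeepLearningEstimator(
    checkpoint=best.model_id,
    hidden=best.actual_params["hidden"],  # restarts must keep the same network structure
    epochs=20,
)
continued.train(x=predictors, y="label", training_frame=train)
```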
You can find Arno's slides here.