A machine learning engineer trained a deep neural network for a complex classification problem. He realized that the model took so long to train with minimum changes to the weights. The activation function used in the hidden layer is sigmoid. What could he do to overcome this problem? (Select THREE.)