GCP Professional Machine Learning Engineer (ml-engineer-pro): Full Certification Question
You designed a 5-billion-parameter language model in TensorFlow Keras that used autotuned tf.data to load the data into memory. You created a distributed training job in Vertex AI with tf.distribute.MirroredStrategy, and set the large_model_v100 machine for the primary instance. The training job fails with the following error: “The replica 0 ran out of memory with a non-zero status of 9.” You want to fix this error without vertically increasing the memory of the replicas. What should you do?
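A back-of-the-envelope sketch of the scale involved: tf.distribute.MirroredStrategy keeps a full copy of every model variable on each device it trains on, so the footprint below is paid once per device, before activations or the in-memory dataset are counted. The float32 precision, the Adam-style optimizer with two slot variables per weight, and the helper name per_replica_bytes are assumptions for illustration only; the question does not state them.

```python
# Rough per-device memory estimate for a model whose variables are fully
# mirrored on every device, as tf.distribute.MirroredStrategy does.
# ASSUMPTIONS (not from the question): float32 weights (4 bytes each),
# an Adam-style optimizer keeping two slot variables per weight, and one
# gradient buffer per weight. Activations and tf.data buffers are ignored.

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def per_replica_bytes(num_params: int,
                      bytes_per_param: int = BYTES_PER_FLOAT32,
                      optimizer_slots: int = 2,
                      include_gradients: bool = True) -> int:
    """Hypothetical helper: bytes one device needs for weights, gradients and optimizer slots."""
    copies = 1 + optimizer_slots + (1 if include_gradients else 0)
    return num_params * bytes_per_param * copies


if __name__ == "__main__":
    n = 5_000_000_000  # 5 billion parameters, from the question
    weights_only = per_replica_bytes(n, optimizer_slots=0, include_gradients=False)
    full = per_replica_bytes(n)
    print(f"weights only: {weights_only / GIB:.1f} GiB")
    print(f"weights + gradients + optimizer slots: {full / GIB:.1f} GiB")
```

Run as a script, it prints about 18.6 GiB for the weights alone and about 74.5 GiB once gradients and optimizer slots are included. For comparison, a V100 GPU has 16 GB or 32 GB of memory.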