You are a machine learning engineer tasked with optimizing the training of a data-intensive computer vision model on Amazon SageMaker. You want to maximize parallelism and fully utilize the available hardware during distributed training. Which configuration should you adopt to train most efficiently?