

Question 1


Full Certification Question

At Vertex Insights, you're building a managed Hadoop-based data lake in Google Cloud. Your data processing pipeline consists of a series of Hadoop jobs that run one after another. To separate storage and compute, you've configured the system to use the Cloud Storage connector for reading inputs, writing outputs, and handling intermediate data. However, one specific Hadoop job runs significantly slower on Cloud Dataproc than on your on-premises bare-metal Hadoop setup (which uses 8-core nodes with 100 GB of RAM). After analysis, you determine that the job is highly disk I/O intensive. You need to fix this performance issue. What should you do?