You're using Kubeflow Pipelines to build an end-to-end PyTorch-based MLOps pipeline that involves reading data from BigQuery, data processing, feature engineering, model training, evaluation, and model deployment to Cloud Storage. You're developing code for different versions of the feature engineering and model training steps, running each version in Vertex AI Pipelines. However, each pipeline run takes over an hour, slowing down your development process and potentially increasing costs. What's the best approach to speed up execution while avoiding additional costs?