{ "query": "Which Auto Loader option keys correctly fill these blanks to read CSV, persist schema, checkpoint reliably, and enable automatic schema evolution? spark.readStream.format("cloudFiles").option("_____","csv").option("_____","dbfs:/etl/schema/foo/").load(srcPath).writeStream.option("_____","dbfs:/etl/checkpoints/foo/").option("_____","true").table(targetTable)", "processedQuery": "After each weekly stock reconciliation at AeroMart, a CSV report is dropped into cloud object storage. You must design a Databricks Auto Loader stream that ingests the files incrementally and automatically evolves the table as new columns appear. For the following snippet, which option keys correctly complete the blanks so the stream runs reliably and the table schema updates automatically? spark.readStream.format(\"cloudfiles\").option(\"_____\",\"csv\").option(\"_____\",\"dbfs:/etl/schema/aeromart/\").load(src_path).writeStream.option(\"_____\",\"dbfs:/etl/checkpoints/aeromart/\").option(\"_____\",\"true\").table(target_table)?", "type": "multiple-choice", "options": [ { "text": "format, checkpointLocation, schemaLocation, overwrite", "explanation": "Uses a generic format key instead of cloudFiles and overwrite does not handle schema evolution", "correct": false, "selected": false }, { "text": "autoloader.format, cloudFiles.schemaLocation, checkpointLocation, mergeSchema", "explanation": "There is no autoloader.format key; the correct key is cloudFiles.format", "correct": false, "selected": false }, { "text": "cloudfiles.format, checkpointlocation, cloudfiles.schemalocation, overwrite", "explanation": "Misorders schema vs. checkpoint keys and overwrite does not evolve schema", "correct": false, "selected": false }, { "text": "cloudFiles.format, cloudFiles.schemaLocation, checkpointLocation, mergeSchema", "explanation": "Correct Auto Loader and Delta keys to set format, store schema, checkpoint, and enable schema evolution", "correct": true, "selected": false }, { "text": "cloudFiles.format, cloudFiles.schemaLocation, checkpointLocation, ignoreChanges", "explanation": "ignoreChanges is unrelated to schema evolution and does not merge new columns", "correct": false, "selected": false } ], "answer": "cloudFiles.format, cloudFiles.schemaLocation, checkpointLocation, mergeSchema is correct because Auto Loader requires cloudFiles-specific options to declare the source format and persist the evolving schema, while Delta schema evolution is enabled by the write option mergeSchema. The second blank must be cloudFiles.schemaLocation to store the evolving inference schema, the third must be checkpointLocation for reliable streaming progress, and the final option mergeSchema must be true to allow new columns to be merged on write. The option format, checkpointLocation, schemaLocation, overwrite is wrong because format is not scoped under cloudFiles for Auto Loader and overwrite does not handle schema evolution. The option autoloader.format, cloudFiles.schemaLocation, checkpointLocation, mergeSchema is invalid since autoloader.format is not a recognized key; cloudFiles.format is required. The option cloudfiles.format, checkpointlocation, cloudfiles.schemalocation, overwrite misplaces the schema and checkpoint keys for the given blanks and still uses overwrite, which does not merge new columns. The option cloudFiles.format, cloudFiles.schemaLocation, checkpointLocation, ignoreChanges is incorrect because ignoreChanges is not for schema evolution; it does not add columns. 
Exam tips: Remember that Auto Loader keys are namespaced under cloudFiles. Always configure both cloudFiles.schemaLocation (on the read side) and checkpointLocation (on the write side) for reliability. For Delta schema evolution in streaming writes, set mergeSchema to true; do not confuse it with overwrite or unrelated options such as ignoreChanges.
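For reference, a minimal completed version of the snippet, using the question's illustrative paths and names (src_path and target_table come from the question itself):
(spark.readStream.format(\"cloudFiles\")
   .option(\"cloudFiles.format\", \"csv\")                                # declare the source file format
   .option(\"cloudFiles.schemaLocation\", \"dbfs:/etl/schema/aeromart/\")  # persist the evolving inferred schema
   .load(src_path)
   .writeStream
   .option(\"checkpointLocation\", \"dbfs:/etl/checkpoints/aeromart/\")    # track streaming progress reliably
   .option(\"mergeSchema\", \"true\")                                      # merge new columns on write
   .table(target_table))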
https://docs.databricks.com/en/ingestion/auto-loader/options.html
https://docs.databricks.com/en/delta/schema-evolution.html
", "answerCode": "4", "domain": "Development and Ingestion", "source": "assistant", "originalQuery": "At the end of the inventory process a file gets uploaded to the cloud object storage, you are asked to build a process to ingest data which of the following method can be used to ingest the data incrementally, schema of the file is expected to change overtime ingestion process should be able to handle these changes automatically. Below is the auto loader to command to load the data, fill in the blanks for successful execution of below code. spark.readStream .format(\u001ccloudfiles\u001c) .option(\u00001c_______\u00001c,\u001dcsv) .option(\u00001c_______\u00001c, \u0018dbfs:/location/checkpoint/\u0019) .load(data_source) .writeStream .option(\u00001c_______\u00001c,\u0019 dbfs:/location/checkpoint/\u0019) .option(\u00001c_______\u00001c, \u001ctrue\u001c) .table(table_name))", "originalOptions": "A. format, checkpointlocation, schemalocation, overwrite