{ "query": "In a Databricks Auto Loader CSV pipeline, which read/write options should fill these blanks to persist schema evolution and maintain streaming state? spark.readStream.format("cloudfiles").option("cloudfiles.format", "csv").option("_______", "dbfs:/bronze/inv_check_v5/").load(src_path).writeStream.option("_______", "dbfs:/bronze/inv_ckpt_v5/").option("mergeSchema", "true").table(foo)", "processedQuery": "At a retail analytics startup named Apex Retail Labs, CSV inventory snapshots from a nightly run land in cloud object storage. You must configure an incremental Auto Loader pipeline in Databricks that automatically evolves the schema as new columns appear. Using the following stream code, which pair of options correctly completes the blanks so the job stores schema history and maintains streaming state? spark.readStream .format(\"cloudfiles\") .option(\"cloudfiles.format\", \"csv\") .option(\"_______\", \"dbfs:/bronze/inv_check_v3/\") .load(incoming_path) .writeStream .option(\"_______\", \"dbfs:/bronze/inv_ckpt_v3/\") .option(\"mergeSchema\", \"true\") .table(target_table)?", "type": "multiple-choice", "options": [ { "text": "checkpointlocation and schemalocation", "explanation": "This reverses responsibilities and uses a non-Auto Loader schema key on read, so schema evolution and state tracking will not work correctly", "correct": false, "selected": false }, { "text": "cloudfiles.inferColumnTypes and checkpointlocation", "explanation": "inferColumnTypes only influences type inference and does not persist evolving schemas to a location", "correct": false, "selected": false }, { "text": "cloudfiles.schemalocation and checkpointlocation", "explanation": "cloudfiles.schemalocation on read persists schema history for Auto Loader; checkpointlocation on write maintains streaming offsets and progress", "correct": true, "selected": false }, { "text": "cloudfiles.schemahints and checkpointlocation", "explanation": "schemahints provides field guidance but does not store or evolve schemas over time", "correct": false, "selected": false }, { "text": "cloudfiles.schemalocation and cloudfiles.checkpointlocation", "explanation": "There is no cloudfiles.checkpointlocation; checkpointlocation is a writeStream option, not an Auto Loader read option", "correct": false, "selected": false } ], "answer": "cloudfiles.schemalocation and checkpointlocation is correct because Auto Loader requires cloudfiles.schemalocation on the read side to persist the inferred schema and support evolution, while writeStream checkpointlocation tracks streaming state (offsets, commits, and progress) for reliable recovery. mergeSchema enables applying the evolved schema to the target table, but without schemalocation the evolution metadata will not be stored, and without checkpointlocation the stream cannot maintain state. The option checkpointlocation and schemalocation is wrong because it swaps the roles and uses a non-Auto Loader schema key on read, breaking schema persistence and state tracking. The option cloudfiles.inferColumnTypes and checkpointlocation is wrong because inferColumnTypes affects CSV type inference only and does not persist evolving schemas. The option cloudfiles.schemahints and checkpointlocation is wrong because schemahints provides hints for inference but does not manage schema evolution storage. 
The option cloudFiles.schemaLocation and cloudFiles.checkpointLocation is wrong because there is no cloudFiles.checkpointLocation; checkpointLocation belongs to writeStream, not to the Auto Loader read options. Exam tips: watch for the cloudFiles prefix on Auto Loader read options, especially cloudFiles.schemaLocation for schema evolution. Remember that checkpointLocation always lives on writeStream. Be wary of distractors such as cloudFiles.checkpointLocation or a missing cloudFiles prefix, and distinguish schema guidance options (schemaHints, inferColumnTypes) from schema persistence (schemaLocation).
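For reference, a minimal completed sketch of the pipeline with both blanks filled in; the dbfs:/bronze/ paths, incoming_path, and target_table are illustrative placeholders from the scenario, not required names:

# Read side: cloudFiles.schemaLocation persists the inferred schema so Auto Loader can evolve it
(spark.readStream
    .format(\"cloudFiles\")
    .option(\"cloudFiles.format\", \"csv\")
    .option(\"cloudFiles.schemaLocation\", \"dbfs:/bronze/inv_check_v3/\")
    .load(incoming_path)
    .writeStream
    # Write side: checkpointLocation stores offsets, commits, and progress for recovery
    .option(\"checkpointLocation\", \"dbfs:/bronze/inv_ckpt_v3/\")
    .option(\"mergeSchema\", \"true\")  # apply newly evolved columns to the target table
    .table(target_table))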
https://docs.databricks.com/en/ingestion/auto-loader/options.html
https://docs.databricks.com/en/ingestion/auto-loader/schema.html
https://docs.databricks.com/en/structured-streaming/streaming-checkpoints.html
", "answerCode": "3", "domain": "Data Processing & Transformations", "source": "generated", "originalQuery": "At the end of the inventory process a file gets uploaded to the cloud object storage, you are asked to build a process to ingest data which of the following method can be used to ingest the data incrementally, the schema of the file is expected to change overtime ingestion process should be able to handle these changes automatically. Below is the auto loader command to load the data, fill in the blanks for successful execution of the below code. spark.readStream .format(\u0001cloudfiles\u0001) .option(\u0001cloudfiles.format\u0001,\u0001csv) .option(\u0001_______\u0001, \u0001dbfs:/location/checkpoint/\u0001) .load(data_source) .writeStream .option(\u0001_______\u0001,\u0001 dbfs:/location/checkpoint/\u0001) .option(\u0001mergeSchema\u0001, \u0001true\u0001) .table(table_name))", "originalOptions": "A. checkpointlocation, cloudfiles.schemalocation