A company is using a fleet of Amazon EC2 instances to ingest data from on-premises data sources. The data is in JSON format, and ingestion rates can be as high as 1 MB/s. When an EC2 instance is rebooted, the data in flight is lost. The company's data science team wants to query ingested data in near real time. Which solution provides near-real-time data querying that is scalable and has minimal data loss?