A company uses anomaly detection to identify faulty servers in its fleet. It ingests real-time structured data describing each server, including (but not limited to) “Temperature” and “Usage”, through Kinesis Data Streams. The data is sent to Kinesis Data Analytics for live anomaly detection. If a faulty server is found, the application raises an alarm through Kinesis Data Streams, Lambda, and SNS. At the same time, it sends the data to Kinesis Data Firehose, which delivers it to S3 for storage. The system proved reliable enough that the company no longer needed employees to watch the servers every day, so it hired only a small team to deal with a server when it fails.

The company recently set up a new data center and connected the new servers' sensors to the same architecture. One month later, it discovered that 25 servers had failed without anyone noticing. A team was assigned to investigate, and its first step was to check the data in S3, which held the servers' features. The stored data showed that the faulty servers had sent information about their health, but no action had been taken.

What is the most probable solution to this problem?
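To make the alarm path in the scenario concrete, here is a minimal sketch of the Lambda step that reads the analytics output from a Kinesis stream and publishes to SNS. It is an illustration under stated assumptions, not the company's implementation: it assumes each output record is JSON with a hypothetical `server_id` field and an `ANOMALY_SCORE` field (the column name produced by the Kinesis Data Analytics `RANDOM_CUT_FOREST` SQL function, if that is what the application uses), a hypothetical fixed threshold, and a hypothetical `ALARM_TOPIC_ARN` environment variable. Note that whether an alarm fires depends entirely on how the score compares to that threshold.

```python
import base64
import json
import os
from typing import Any, Callable, Dict, List

# Hypothetical threshold: the scenario does not say how "faulty" is decided.
ANOMALY_THRESHOLD = float(os.environ.get("ANOMALY_THRESHOLD", "3.0"))


def _default_publisher(subject: str, message: str) -> None:
    """Publish one alarm to SNS; boto3 is imported lazily so the logic can run offline."""
    import boto3  # bundled with the AWS Lambda Python runtime

    boto3.client("sns").publish(
        TopicArn=os.environ["ALARM_TOPIC_ARN"],  # hypothetical environment variable
        Subject=subject,
        Message=message,
    )


def find_faulty(event: Dict[str, Any], threshold: float = ANOMALY_THRESHOLD) -> List[Dict[str, Any]]:
    """Decode the base64 Kinesis payloads and keep records whose score exceeds the threshold."""
    faulty = []
    for record in event.get("Records", []):
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # ANOMALY_SCORE is an assumed field name; missing scores count as normal.
        if float(payload.get("ANOMALY_SCORE", 0.0)) > threshold:
            faulty.append(payload)
    return faulty


def handler(event, context, publish: Callable[[str, str], None] = _default_publisher):
    """Lambda entry point: send one SNS alarm per anomalous server record."""
    faulty = find_faulty(event)
    for item in faulty:
        publish(f"Possible faulty server: {item.get('server_id', 'unknown')}", json.dumps(item))
    return {"alarms_sent": len(faulty)}
```

Injecting `publish` keeps the decision logic testable without AWS credentials; in Lambda, the runtime calls `handler(event, context)` and the SNS publisher is used by default.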