Anomaly Detection with AWS SageMaker, CloudTrail, and GuardDuty
AWS has published a fantastic workshop-style tutorial on leveraging SageMaker to develop a machine learning model that you can then use to detect anomalies in your AWS environment.
You build the model by training the IP Insights algorithm on CloudTrail logs. The idea is that once trained on a large enough dataset and armed with a threshold value, your model can classify any event as anomalous or not.
Because GuardDuty alerts can largely be thought of as anomalous events, the workshop uses them to test your model's inference.
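To make the thresholding idea concrete, here is a minimal sketch in Python. It assumes the model returns one numeric score per event and that a lower score means a less expected pairing; if your scores run the other way, flip the comparison. The function names, scores, and threshold are illustrative and not taken from the workshop.

```python
from typing import List


def flag_anomalies(scores: List[float], threshold: float) -> List[bool]:
    """Return True for each event whose score falls below the threshold.

    Assumption: lower score = less expected, hence more anomalous.
    """
    return [score < threshold for score in scores]


def detection_rate(scores: List[float], threshold: float) -> float:
    """Fraction of events flagged as anomalous (0.0 for an empty list)."""
    flags = flag_anomalies(scores, threshold)
    return sum(flags) / len(flags) if flags else 0.0
```

If the scores come from events behind GuardDuty alerts, `detection_rate` gives a rough sense of how many of them the chosen threshold would catch.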
While working my way through the tutorial, I noticed that the ingest pipelines for GuardDuty assumed that the alert data would be sitting in S3 as a JSON file.
Close enough, but not quite. Alerts are sometimes also saved as jsonl.gzip objects.
To handle this case, I slightly modified the GuardDuty ingest pipeline code.
The modified code is given below for anyone running into a similar issue:
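As a rough illustration of the kind of change involved, the following is a minimal sketch, not the workshop's pipeline code. It assumes the object's bytes have already been fetched from S3, detects gzip by its two-byte magic number, and falls back to line-by-line parsing when the content is JSON Lines. The helper name `load_findings` is hypothetical.

```python
import gzip
import json
from typing import Any, List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def load_findings(raw: bytes) -> List[Any]:
    """Parse findings from raw object bytes (hypothetical helper).

    Accepts plain JSON or JSON Lines, either one optionally gzip-compressed.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Several JSON documents in one file: treat it as JSON Lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A single document: wrap it so callers always get a list.
    return parsed if isinstance(parsed, list) else [parsed]
```

Checking the content rather than the object key means the same function works whether or not the file extension reflects the compression.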
Needless to say, massive thanks to our friends in Seattle for putting together this workshop and being kind enough to open-source the associated code.
The original repository and any associated copyrights and trademarks belong to Amazon Web Services. The license for the work can be found here.
— Arsh