CloudWatch2S3

AWS CloudWatch Logs is a very useful service, but it does have its limitations. Retaining logs for a long period of time can get expensive. Logs cannot be easily searched across multiple streams. Logs are hard to export, and integration requires AWS-specific code. Sometimes it makes more sense to store logs as text files in S3, but that’s not always possible because some AWS services, like Lambda, write logs directly to CloudWatch Logs.

One way around the CloudWatch Logs limitations is to export logs to S3, where data can be stored and processed long-term at a lower price. Logs can be exported once or automatically as they come in. Setting up an automatic pipeline to export the logs is not a one-click process, but luckily Amazon detailed all the steps in a recent blog post titled “Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis”.

Amazon’s blog post has a lot of great information about the topic and the solution. In short, they create a Kinesis Stream writing to S3. CloudWatch Logs subscriptions that export logs to the new stream are created either manually with a script or in response to CloudTrail events about new log groups. This architecture is stable and scalable, but the implementation has a few drawbacks:

  • Writes compressed CloudWatch JSON files to S3.
  • Setup is still a little manual, requiring you to create a bucket, edit permissions, modify and upload source code, and run a script to initialize.
  • Requires CloudTrail.
  • Configuration requires editing source code.
  • Has a minor bug limiting initial subscription to 50 log groups.

That is why I created CloudWatch2S3 – a single CloudFormation template that sets everything up in one go while still leaving room for tweaking with parameters.

The architecture is mostly the same as Amazon’s but adds a subscription timer to remove the hard requirement on CloudTrail, and post-processing to optionally write raw log files to S3 instead of compressed CloudWatch JSON files.

[Figure: CloudWatch2S3 architecture]
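
To make the subscription timer idea more concrete, here is a rough sketch of what such a periodic job could do with the AWS CLI: list every log group and point a subscription filter at one destination. This is not the code inside the template. The function name, the filter name and the empty filter pattern are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: subscribe every log group to a single destination.
# The function name and the default filter name are hypothetical, not taken
# from the CloudWatch2S3 template.

subscribe_all_log_groups() {
  local destination_arn="$1"
  local filter_name="${2:-CloudWatch2S3}"
  local group

  # The AWS CLI follows pagination tokens by default, so this is not capped
  # at a single page of results.
  aws logs describe-log-groups \
      --query 'logGroups[].logGroupName' --output text |
    tr '\t' '\n' |
    while IFS= read -r group; do
      [ -n "$group" ] || continue
      # An empty filter pattern forwards every log event.
      aws logs put-subscription-filter \
          --log-group-name "$group" \
          --filter-name "$filter_name" \
          --filter-pattern "" \
          --destination-arn "$destination_arn"
    done
}

# Usage (placeholder ARN):
#   subscribe_all_log_groups "arn:aws:logs:us-east-1:123456789012:destination:example"
```

If the destination is a Kinesis stream in the same account rather than a CloudWatch Logs destination, `put-subscription-filter` also needs a `--role-arn` that CloudWatch Logs can assume.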

Setup is simple. There is just one CloudFormation template and the default parameters should suit most users.

  1. Download the CloudFormation template
  2. Open AWS Console
  3. Go to CloudFormation page
  4. Click “Create stack”
  5. Under “Specify template” choose “Upload a template file”, choose the file downloaded in step 1, and click “Next”
  6. Under “Stack name” choose a name like “CloudWatch2S3”
  7. If you have a high volume of logs, consider increasing Kinesis Shard Count
  8. Review other parameters and click “Next”
  9. Add tags if needed and click “Next”
  10. Check “I acknowledge that AWS CloudFormation might create IAM resources” and click “Create stack”
  11. Wait for the stack to finish
  12. Go to “Outputs” tab and note the bucket where logs will be written
  13. That’s it!
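
If you prefer the command line to the console, the steps above can be approximated with the AWS CLI. This is a minimal sketch: the local file name `CloudWatch2S3.template` is an assumption about what you saved in step 1, and the function names are made up for this example. `CAPABILITY_IAM` corresponds to the IAM acknowledgement checkbox in step 10.

```shell
#!/usr/bin/env bash
# Sketch only: CLI equivalent of the console steps above.
# "CloudWatch2S3.template" is an assumed local file name for the downloaded template.

deploy_cloudwatch2s3() {
  local template_file="${1:-CloudWatch2S3.template}"
  local stack_name="${2:-CloudWatch2S3}"

  # Creates the stack, or updates it if it already exists, and waits for it to finish.
  aws cloudformation deploy \
      --template-file "$template_file" \
      --stack-name "$stack_name" \
      --capabilities CAPABILITY_IAM
}

show_stack_outputs() {
  # Prints the stack outputs, including the bucket where logs will be written.
  aws cloudformation describe-stacks \
      --stack-name "${1:-CloudWatch2S3}" \
      --query 'Stacks[0].Outputs' --output table
}

# Usage:
#   deploy_cloudwatch2s3 && show_stack_outputs
```

Parameters you want to change from their defaults can be passed with `--parameter-overrides Name=Value`, using the parameter names shown in the template.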

Another feature is the ability to export logs from multiple accounts to the same bucket. To set this up you need to set the AllowedAccounts parameter to a comma-separated list of AWS account identifiers allowed to export logs. Once the stack is created, go to the “Outputs” tab and copy the value of LogDestination. Then simply deploy the CloudWatch2S3-additional-account.template to the other accounts while setting LogDestination to the value previously copied.
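
The multi-account flow could look roughly like this from the command line. `AllowedAccounts`, `LogDestination` and `CloudWatch2S3-additional-account.template` come from the description above. The main template file name, the stack names, the function names and the need for `CAPABILITY_IAM` on the additional template are assumptions for this sketch.

```shell
#!/usr/bin/env bash
# Sketch only: export logs from several accounts into one bucket.
# Stack names and the main template file name are assumptions.

# Run in the central account. $1 is a comma-separated list of account IDs.
deploy_central_account() {
  aws cloudformation deploy \
      --template-file CloudWatch2S3.template \
      --stack-name CloudWatch2S3 \
      --capabilities CAPABILITY_IAM \
      --parameter-overrides "AllowedAccounts=$1"
}

# Run in the central account. Prints the LogDestination output value.
get_log_destination() {
  aws cloudformation describe-stacks \
      --stack-name CloudWatch2S3 \
      --query "Stacks[0].Outputs[?OutputKey=='LogDestination'].OutputValue" \
      --output text
}

# Run in each additional account with that account's credentials.
# $1 is the value printed by get_log_destination.
deploy_additional_account() {
  aws cloudformation deploy \
      --template-file CloudWatch2S3-additional-account.template \
      --stack-name CloudWatch2S3-additional \
      --capabilities CAPABILITY_IAM \
      --parameter-overrides "LogDestination=$1"
}

# Usage (placeholder account IDs):
#   deploy_central_account "111111111111,222222222222"
#   destination="$(get_log_destination)"
#   deploy_additional_account "$destination"   # after switching credentials
```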

For troubleshooting and more technical details, see https://github.com/CloudSnorkel/CloudWatch2S3/blob/master/README.md.

If you are exporting logs to S3 to save money, don’t forget to also change the retention settings in CloudWatch so old logs are automatically purged and your bill actually goes down.
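
Changing retention by hand gets tedious with many log groups, so here is one way it might be scripted. This is a sketch with a made-up function name. The 30-day default is only an example and has to be one of the retention values CloudWatch Logs accepts.

```shell
#!/usr/bin/env bash
# Sketch only: apply the same retention period to every log group in the
# current account and region. The function name is hypothetical.

set_retention_for_all_groups() {
  local days="${1:-30}"
  local group

  aws logs describe-log-groups \
      --query 'logGroups[].logGroupName' --output text |
    tr '\t' '\n' |
    while IFS= read -r group; do
      [ -n "$group" ] || continue
      # Events older than $days days are deleted automatically from now on.
      aws logs put-retention-policy \
          --log-group-name "$group" \
          --retention-in-days "$days"
    done
}

# Usage:
#   set_retention_for_all_groups 14
```

Make sure the export pipeline is working before shortening retention, because purged logs cannot be recovered from CloudWatch.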

Why Kubernetes?

Kubernetes (or k8s) allows you to effortlessly deploy containers to a group of servers, automatically choosing servers with sufficient computing resources and connecting required network and storage resources. It lets developers run containers and allocate resources without too much hassle. It’s an open-source version of Borg, which was considered one of Google’s competitive advantages.

There are many features and reasons to use Kubernetes, but for me the following stood out and made me a happy user.

  1. It works on-premises and on most clouds like AWS, Google Cloud and Azure.
  2. Automatic image garbage collection makes sure you never run out of disk space on your cluster nodes, even if you push hundreds of versions a day.
  3. Scaling your container horizontally is as easy as:
    kubectl scale deployment nginx-deployment --replicas=10
  4. Rolling your containers over to a new version, one by one, is as easy as:
    kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1
  5. Easy-to-define health checks: liveness probes automatically restart your container, and readiness probes delay a rolling update if the new version doesn’t work.
  6. Automatic service discovery using the built-in DNS server. Just define a service and reference it by name from any container in your cluster. You can even define a service as a load balancer, which will automatically create an ELB on AWS, a load balancer on GCE, etc.
  7. Labels on everything help organize resources. Services use label selectors to choose which containers get traffic directed at them. This makes it easy to create blue/green deployments, for example.
  8. Secrets make it easier to handle keys and configuration separately from code and containers.
  9. Resource monitoring is available out of the box and can be used for auto-scaling.
  10. Horizontal auto scaling can be based on resource monitoring or custom metrics.
  11. You can always use the API to create custom behaviors. For example, you can automatically create Route53 record sets based on a service label.
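
To show how a few of these pieces fit together, here is a small sketch of a manifest with liveness and readiness probes, labels, and a load balancer service that selects pods by label. The probe paths, timings, replica count and the function name are illustrative assumptions, not recommendations.

```shell
#!/usr/bin/env bash
# Sketch only: prints an example manifest. All values are illustrative.

print_nginx_manifest() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx        # pin a specific tag in practice
          ports:
            - containerPort: 80
          livenessProbe:      # failing this restarts the container
            httpGet:
              path: /
              port: 80
            periodSeconds: 10
          readinessProbe:     # failing this keeps traffic away and stalls a rollout
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer          # asks the cloud provider for an external load balancer
  selector:
    app: nginx                # label selector: pods with this label get the traffic
  ports:
    - port: 80
      targetPort: 80
EOF
}

# Usage:
#   print_nginx_manifest | kubectl apply -f -
#   kubectl rollout status deployment/nginx-deployment
#   kubectl autoscale deployment nginx-deployment --min=2 --max=10 --cpu-percent=80
```

Other containers in the same namespace can then reach the service by its DNS name, `nginx`.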

Is there anything I missed? What’s your favorite Kubernetes feature?