AWS CloudWatch Logs is a very useful service, but it does have its limitations. Retaining logs for a long period of time can get expensive. Logs cannot be easily searched across multiple streams. Logs are hard to export, and integration requires AWS-specific code. Sometimes it makes more sense to store logs as text files in S3. That’s not always possible with AWS services like Lambda that write logs directly to CloudWatch Logs.
Amazon’s blog post has a lot of great information about the topic and a solution. In short, they create a Kinesis stream that writes to S3. CloudWatch Logs subscriptions that export logs to the new stream are created either manually with a script or in response to CloudTrail events about new log streams. This architecture is stable and scalable, but the implementation has a few drawbacks:
Writes compressed CloudWatch JSON files to S3.
Setup is still a little manual, requiring you to create a bucket, edit permissions, modify and upload source code, and run a script to initialize.
Configuration requires editing source code.
Has a minor bug limiting initial subscription to 50 log streams.
That is why I created CloudWatch2S3 – a single CloudFormation template that sets everything up in one go while still leaving room for tweaking with parameters.
The architecture is mostly the same as Amazon’s but adds a subscription timer to remove the hard requirement on CloudTrail, and post-processing to optionally write raw log files to S3 instead of compressed CloudWatch JSON files.
Setup is simple. There is just one CloudFormation template, and the default parameters should be good for most users.
Under “Specify template” choose “Upload a template file”, choose the file downloaded in step 1, and click “Next”
Under “Stack name” choose a name like “CloudWatch2S3”
If you have a high volume of logs, consider increasing Kinesis Shard Count
Review other parameters and click “Next”
Add tags if needed and click “Next”
Check “I acknowledge that AWS CloudFormation might create IAM resources” and click “Create stack”
Wait for the stack to finish
Go to “Outputs” tab and note the bucket where logs will be written
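For repeatable deployments, the console steps above can also be scripted. Here is a minimal sketch using boto3; the parameter name and template file name are assumptions, so check the template itself for the real ones:

```python
def stack_arguments(template_body, shard_count=1):
    """Build create_stack arguments mirroring the console steps above."""
    return {
        'StackName': 'CloudWatch2S3',
        'TemplateBody': template_body,
        'Parameters': [
            # 'ShardCount' is an assumed parameter name; see the template
            # for the actual Kinesis shard count parameter.
            {'ParameterKey': 'ShardCount', 'ParameterValue': str(shard_count)},
        ],
        # Equivalent of checking "I acknowledge that AWS CloudFormation
        # might create IAM resources".
        'Capabilities': ['CAPABILITY_IAM'],
    }

# To actually create the stack (requires boto3 and AWS credentials):
#   import boto3
#   with open('CloudWatch2S3.template') as f:
#       boto3.client('cloudformation').create_stack(**stack_arguments(f.read()))
```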
Another feature is the ability to export logs from multiple accounts to the same bucket. To set this up, you need to set the AllowedAccounts parameter to a comma-separated list of AWS account identifiers allowed to export logs. Once the stack is created, go to the “Outputs” tab and copy the value of LogDestination. Then simply deploy the CloudWatch2S3-additional-account.template to the other accounts, setting LogDestination to the value previously copied.
Kubernetes (or k8s) allows you to effortlessly deploy containers to a group of servers, automatically choosing servers with sufficient computing resources and connecting required network and storage resources. It lets developers run containers and allocate resources without too much hassle. It’s an open-source version of Borg, which was considered one of Google’s competitive advantages.
There are many features and reasons to use Kubernetes, but for me the following stood out and made me a happy user.
Rolling your containers over to a new version, one at a time, is as easy as:
kubectl set image deployment/nginx-deployment nginx=nginx:1.91
Health checks are easy to define: liveness probes automatically restart your container, and readiness probes delay a rolling update if the new version doesn’t work.
Automatic service discovery using the built-in DNS server. Just define a service and reference it by name from any container in your cluster. You can even define a service as a load balancer, which will automatically create an ELB on AWS, a load balancer on GCE, etc.
Labels on everything help organize resources. Services use label selectors to choose which containers get traffic directed at them. This makes it easy, for example, to create blue/green deployments.
Secrets make it easier to handle keys and configuration separately from code and containers.
Today I will tell you about the massive success that is whypy3.com. With hundreds of users a day (on the best day, when it reached page two of Hacker News, and hundreds actually being 103), it has been a tremendous success in the lucrative Python code snippet market. By presenting small snippets of code displaying cool features of Python 3, I was able to single-handedly convert millions (1e-6 millions to be exact) of Python 2 users to true Python 3 believers.
It all started when I saw a tweet about a cool Python 3 feature I hadn’t seen before. This amazing feature automatically resolves any exception in your code by suppressing it. Who needs pesky exceptions anyway? Alternatively, you can use it to cleanly ignore expected exceptions instead of the usual except: pass.
from contextlib import suppress
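A minimal usage sketch, cleaning up a file that may not exist:

```python
from contextlib import suppress
import os

# Without suppress, os.remove would raise FileNotFoundError here;
# with it, the expected exception is silently ignored.
with suppress(FileNotFoundError):
    os.remove('does-not-exist.tmp')
print('still running')  # → still running
```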
There are obviously way better and bigger reasons to finally make the move to Python 3. But what if you could be lured in by some cool cheap tricks? That’s exactly why I created whypy3.com. It’s a tool that we Python 3 lovers can use to slowly wear down an insistent boss or colleague. It’s also a fun way for me to share all my favorite Python 3 features so I don’t forget them.
I was initially going to do the usual static S3 website with CloudFront/CloudFlare. But I also wanted it to be easy for other people to contribute snippets. The obvious choice was GitHub, and since I’m already using GitHub, why not give GitHub Pages a try? Getting it up and running was a breeze. To make it easier to contribute without editing HTML, I decided to use the full-blown Jekyll setup. I had to fight a little with Jekyll to get syntax highlighting working, but overall it took no time to get a solid-looking site up and running.
After posting to Hacker News, I even got a few pull requests for more snippets. To this day, I still get some Twitter interactions here and there. I don’t expect this to become a huge project with actual millions of users, but at the end of the day this was pretty fun, I learned some new technologies, and I probably convinced someone to at least start thinking about moving to Python 3.
While debugging any issue that arises on Windows, my go-to trick is blaming the anti-virus or firewall. It almost always works. As important as these security solutions are, they can be so disruptive at times. For developers this usually comes in the form of a false positive. One day, out of the blue, a user emails you and blames you for trying to infect their computer with Virus.Generic.Not.Really.LOL.Sue.Me.1234775. This happened so many times with NSIS that someone created a false positive list on our wiki.
There are a lot of reasons why this happens and a lot of ways to lower the chances of it happening, but at the end of the day, chances are it’s going to happen. It even happened to Chrome and Windows itself.
So I created False Positive Watch. It’s a simple free service that periodically scans your files using VirusTotal and sends you an email if any of your files are erroneously detected as malware. You can then notify the anti-virus vendor so they can fix the false positive before it affects too many of your customers.
I use it to get notifications about NSIS and other projects, but you can use it for your projects too for free. All you need is to supply your email address (for notifications) and upload the file (I delete it from my server after sending it to VirusTotal). In the future I’m going to add an option to just supply the hash instead of the entire file so you can use it with big files or avoid uploading the file if it’s too private.
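In the meantime, computing a file’s hash locally is simple. A sketch (VirusTotal accepts SHA-256 lookups, among other hashes):

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest, reading in chunks to handle big files."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```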
I usually ended up creating my own image containing both Python and Node with:
RUN curl -sL https://deb.nodesource.com/setup_8.x | bash -
RUN apt-get install -y nodejs
# ... rest of my stuff
There are two problems with this approach.
It’s slow. Installing Node takes a while and doing it for every non-cached build is time consuming.
You lose the Docker way of just pulling a nice prepared image. If Node changes their deployment method, the Dockerfile has to be updated. It’s much simpler to just docker pull node:8
The obvious solution is going to Docker Hub and looking for an image that already contains both. There are a bunch of those but they all look sketchy and very old. I don’t feel like I can trust them to have the latest security updates, or any updates at all. When a new version of Python comes out, I can’t trust those images to get new tags with the new version which means I’d have to go looking for a new image.
So I did what any sensible person would do. I created my own (obligatory link to XKCD #927 here). But instead of creating and pushing a one-off image, I used Travis CI to update the images daily. This was actually a pretty fun exercise that let me learn more about the Docker Python API, Docker Hub, and Travis CI. I tried to make it as easily extensible as possible so anyone can submit a PR for a new combo like Node and Ruby, Python and Ruby, or Python and Java.
The end result allows you to use:
docker run --rm combos/python_node:3_6 python3 -c "print('hello world')"
docker run --rm combos/python_node:3_6 node -e "console.log('hello world')"
You can rest assured you will always get the latest version of Python 3 and the latest version of Node 6. The image is updated daily. And since the build process is completely transparent on Travis CI, you should be able to trust that there is no funny business in the image.
Django 1.10 added a new style of middleware with a different interface and a new setting called MIDDLEWARE instead of MIDDLEWARE_CLASSES. Creating a class that supports both is easy enough with MiddlewareMixin, but that only works with Django 1.10 and above. What if you want to create middleware that can work with all versions of Django so it can be easily shared?
Writing a compatible middleware is not too hard. The trick is having a fallback for when the import fails on earlier versions of Django. I couldn’t find a full example anywhere and it took me a few attempts to get it just right, so I thought I’d share my results to save you some time.
from django.conf import settings
from django.core.exceptions import MiddlewareNotUsed
from django.shortcuts import redirect

try:
    from django.utils.deprecation import MiddlewareMixin
except ImportError:
    # Django < 1.10 has no MiddlewareMixin; fall back to a plain class
    MiddlewareMixin = object


class CompatibleMiddleware(MiddlewareMixin):
    def __init__(self, *args, **kwargs):
        if getattr(settings, 'DISABLE_MIDDLEWARE', False):
            raise MiddlewareNotUsed('DISABLE_MIDDLEWARE is set')
        super(CompatibleMiddleware, self).__init__(*args, **kwargs)

    def process_request(self, request):
        if request.path == '/':
            return redirect('hello')  # illustrative target; adjust for your URLs

    def process_response(self, request, response):
        return response
CompatibleMiddleware can now be used in both MIDDLEWARE and MIDDLEWARE_CLASSES. It should also work with any version of Django, so it’s easier to share.
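On the settings side, the class is listed the same way under either setting. A sketch, where the dotted path is a placeholder for wherever CompatibleMiddleware lives in your project:

```python
# settings.py sketch; 'myapp.middleware.CompatibleMiddleware' is a
# placeholder module path.
_SHARED_MIDDLEWARE = [
    'myapp.middleware.CompatibleMiddleware',
]

MIDDLEWARE = _SHARED_MIDDLEWARE          # read by Django >= 1.10
MIDDLEWARE_CLASSES = _SHARED_MIDDLEWARE  # read by Django < 1.10
```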
I had a problem where HDFS would fill up really fast on my small test cluster. Using hdfs dfs -du I was able to track it down to the MapReduce staging directory under /user/root/.staging. For some reason, it wasn’t always deleting some old job directories. I wasn’t sure why this kept happening on multiple clusters, but I had to come up with a quick workaround. I created a small Python script that lists all staging directories and removes any of them not belonging to a currently running job. The script runs from cron and I can now use my cluster without worrying it’s going to run out of space.
This script is pretty slow and it’s probably possible to make it way faster with Snakebite or even some Java code. That being said, for daily or even hourly clean-up, this script is good enough.
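For reference, here is a minimal sketch of the approach, assuming the hdfs and mapred CLI tools are on PATH and that staging directories are named after their job IDs:

```python
import re
import subprocess

def hdfs_ls(path):
    """Return the paths listed by `hdfs dfs -ls` (last column of each row)."""
    out = subprocess.check_output(['hdfs', 'dfs', '-ls', path]).decode()
    return [line.split()[-1]
            for line in out.splitlines()
            if line.startswith(('d', '-'))]

def job_id_from_staging_dir(path):
    """Staging dirs are named after their job, e.g. .../job_1500000000000_0001."""
    match = re.search(r'(job_\d+_\d+)/?$', path)
    return match.group(1) if match else None

def running_job_ids():
    """Collect the job IDs of currently running jobs from `mapred job -list`."""
    out = subprocess.check_output(['mapred', 'job', '-list']).decode()
    return set(re.findall(r'job_\d+_\d+', out))

def cleanup(staging_root='/user/root/.staging'):
    """Remove staging directories that don't belong to a running job."""
    running = running_job_ids()
    for path in hdfs_ls(staging_root):
        job_id = job_id_from_staging_dir(path)
        if job_id and job_id not in running:
            subprocess.check_call(
                ['hdfs', 'dfs', '-rm', '-r', '-skipTrash', path])

# cleanup()  # uncomment to run, e.g. from cron
```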
Sometimes you get stuck with a broken dump, or no dump at all. You know what you’re looking for, but WinDBG just keeps refusing to load symbols as you beg for mercy from the all-knowing deities of Debugging Tools for Windows. You know exactly which PDB you need, but it just won’t load. The only thing you do know is that you don’t want to go digging for that specific version of your product in the bug report and build a whole setup for it just so you can get the PDB. For those special times, some WinDBG coercion goes a long way.
To download the PDB, create a comma-separated manifest file with three columns per row: the requested PDB name, its GUID plus age (33 characters in total), and the number 1. Finally, call symchk and pass the path to the manifest file with the /im command-line switch. Use the /v switch to get the download path of the PDB.
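The rows can also be generated with a small helper. A sketch, where the 33-character check follows the single-digit-age format described above:

```python
def manifest_line(pdb_name, guid_plus_age):
    """Build one symchk manifest row: PDB name, GUID+age, and the number 1."""
    if len(guid_plus_age) != 33:
        raise ValueError('expected 32 GUID hex characters plus a 1-digit age')
    return '%s,%s,1' % (pdb_name, guid_plus_age)
```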
To demonstrate I’ll use everyone’s favorite debugging sample process.
C:\>echo calc.pdb,E95BB5E08CE640A09C3DBF3DFA3ABCB42,1 > manifest
C:\>symchk /v /im manifest
SYMSRV: Get File Path: /download/symbols/calc.pdb/E95BB5E08CE640A09C3DBF3DFA3ABCB42/calc.pdb
DBGHELP: C:\ProgramData\dbg\sym\calc.pdb\E95BB5E08CE640A09C3DBF3DFA3ABCB42\calc.pdb - OK
To force-load the PDB, you need to update the symbol path, turn SYMOPT_LOAD_ANYTHING on, and use the .reload command with /f to force and /i to ignore any so-called mismatches.
kd> .symopt+ 0x40
kd> .sympath C:\ProgramData\dbg\sym\calc.pdb\E95BB5E08CE640A09C3DBF3DFA3ABCB42
kd> .reload /f /i calc.exe=0x00400000
You should now have access to all the data in the PDB file and stack traces should start making sense.
An open-source OS makes debugging applications so much easier. Instead of firing up IDA and going through opcodes, you can simply read the code and sometimes even find comments. However, searching through millions of lines of code can be a daunting task. Operating systems usually have huge codebases, and even the simple task of looking for one function can take a good few minutes. After reading that function, you usually want to search for the functions it calls, or the functions that call it, to better understand the flow. Those extra searches take time too. A good IDE would solve this issue, but it requires downloading and indexing the massive source code first.
LXR was created for this exact reason. It allows hosting a fully indexed copy of the source code. It even makes it easy to publish an index of multiple versions of the source code. Want to compare a certain function between two versions of the Linux kernel? No problem. Want to know which functions use a certain function? Easy. LXR is awesome and fast.
Setting up LXR on your own, however, does take some time and effort. That is why I was happy to find AndroidXref.com while trying to hunt down a bug in one of my Android applications. It indexes both Android and patched Linux kernel sources for all major versions of Android. It is an invaluable resource every Android developer should know.
I originally had a question about this topic open on StackOverflow with AndroidXref as the accepted answer. It was recently deleted, probably because it didn’t have anything to do with C operator precedence. This is my AndroidXref.SEO++.