Stateless Password Manager Usability

June 19, 2020 kichik 5 Comments

Every once in a while, the concept of a simple password manager that needs no storage and no state comes back around. The details differ but the basic premise is always the same. Instead of saving your passwords and encrypting them with a key derived from a master password, these password managers generate passwords on the fly by hashing a master password with the website name. To get your password back, you simply need to remember your master password and the exact name you used for any specific website.
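To make the premise concrete, here is a minimal sketch of how such a generator might work. It is not the algorithm of any particular password manager; the function name, the use of PBKDF2, the iteration count, and the output encoding are all assumptions chosen for illustration.

```python
import base64
import hashlib


def derive_password(master_password: str, site_name: str, length: int = 20) -> str:
    """Derive a site password from the master password and the site name."""
    # The site name acts as the salt, so every site name yields a different password.
    # The iteration count is an arbitrary illustrative value.
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        master_password.encode("utf-8"),
        site_name.encode("utf-8"),
        100_000,
    )
    # Encode the raw bytes as printable characters and trim to the requested length.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]
```

Calling `derive_password("my master password", "github")` always returns the same string, while `"github2"` or `"github:alice"` returns a completely different one. Nothing is stored, which is exactly why the site name has to be remembered and typed identically every time.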

It’s an intriguing technical idea but it sacrifices security and usability. I won’t touch on the security issues here as there are far more qualified people than me that have already addressed this topic. Instead I will focus on the significant usability concerns that would send any user looking for an alternative within days if not hours.

There is no indication if you have used this password manager for a particular website. This may be considered a privacy feature, but can make migrating passwords from different managers more difficult.
Saving multiple passwords for a single website is cumbersome. Since your only input is the website name, you have to include the username in the website name if you want to save multiple passwords for a single website. But what happens if you didn’t plan ahead and saved your first password without the user name? You now have to change the password.
Some websites have weird password requirements. If the default password generation scheme doesn’t fit exactly, you’re out of luck. This can be solved by adding the password rules to the website name, but then you have to remember the rules and type them every time you need your password.
You can’t change a password without changing the website name. Periodic password changes are still required by a lot of websites and even strong passwords can leak by human error. This leaves the user having to remember not just the website name but also the password iteration. Is it github1, github2 or github53 now?
It is impossible to change your master password without changing all the passwords for all websites you’ve used with the password manager. The master password is directly used to create all those passwords and when it changes, all passwords must change too. To make matters worse, you don’t have a list of websites you’ve used with this password manager. This essentially means you have to remember and try multiple master passwords until you get the right one.
Any security update or bug fix that alters the password generation algorithm will require all passwords to be changed. Standard password managers can simply rebuild their database but since there is no database here and the master password directly affects everything, all passwords must be changed.

All these issues combined mean you have to change your passwords way more often than usual, have to plan ahead a lot, and be very consistent or risk losing your passwords. It requires far more attention than I would be willing to pay just to get a cool stateless solution. At the end of the day, this solution is just not user-friendly.

Sanitized RDS Snapshots

June 15, 2020 kichik

Testing on production data is very useful to root out real-life bugs, take user behavior into account, and measure real performance of your system. But testing on production databases is dangerous. You don’t want the extra load and you don’t want the potential of data loss. So you make a copy of your production database and, before you know it, two years have passed, the data is stale, and the schema has been manually modified beyond recognition. This is why I created RDS-sanitized-snapshots. It periodically takes a snapshot, sanitizes it to remove data the developers shouldn’t access like credit card numbers, and then optionally shares it with other AWS accounts.

As usual it’s one CloudFormation template that can be deployed in one step. The template is generated using Python and troposphere.

There are many examples around the web that do parts of this. I wanted to create a complete solution that doesn’t require managing access keys and can be used without any servers. Since all of the operations take a long time and Lambda has a 15-minute time limit, I decided it’s time to play with Step Functions. Step Functions let you create a state machine that is capable of executing Lambda functions and Fargate tasks for each step. Defining retry and wait logic is also built-in so there is no need for long-running Lambda functions or EC2 instances. It even shows you the state in a nice graph.

To create a sanitized snapshot we need to:

Create a temporary copy of the production database so we don’t affect the actual data or the performance of the production system. We do this by taking a snapshot of the production database or finding the latest available snapshot and creating a temporary database from that.
Run configured SQL queries to sanitize the temporary database. This can delete passwords, remove PII, etc. Since database operations can take a long time, we can’t do this in Lambda due to its 15-minute limit. So instead we create a Fargate task that connects to the temporary database and executes the queries.
Take a snapshot of the temporary database after it has been sanitized. Since this process is meant to be executed periodically, the snapshot name needs to be unique.
Share snapshot with QA and development accounts.
Clean-up temporary snapshots and databases.
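The unique snapshot name mentioned in the third step can be as simple as a prefix plus a timestamp. The sketch below is only an illustration; the function name and the `sanitized` prefix are hypothetical and not taken from the project.

```python
from datetime import datetime, timezone


def unique_snapshot_id(prefix: str, now: datetime) -> str:
    """Build a snapshot identifier that differs on every scheduled run."""
    # RDS identifiers allow letters, digits and hyphens, so the timestamp
    # is formatted without colons or spaces.
    return f"{prefix}-{now.strftime('%Y-%m-%d-%H-%M-%S')}"


print(unique_snapshot_id("sanitized", datetime.now(timezone.utc)))
```

A timestamp also makes it easy to tell at a glance how old a shared snapshot is.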

If the database is encrypted we might also need to re-encrypt it with a key that can be shared with the other accounts. For that purpose we have a KMS key id option that adds another step of copying the snapshot over with a new key. There is no way to modify the key of an existing database or snapshot besides when copying the snapshot over to a new snapshot. Sharing the key is not covered by this solution.

The step function handles all the waiting by calling the Lambda handler to check if the resource is ready. If it is ready, we can move on to the next step. If it’s not ready, the handler throws a specific NotReady exception and the step function retries in 60 seconds. The default retry parameters are a maximum of 3 retries with each wait twice as long as the previous one. Since this is not a real failure but an expected one, we can increase the number of retries and remove the backoff logic that doubles the waiting time.

{
  "States": {
    "WaitForSnapshot": {
      "Type": "Task",
      "Resource": "${HandlerFunction.Arn}",
      "Parameters": {
        "state_name": "WaitForSnapshot"
      },
      "Next": "CreateTempDatabase",
      "Retry": [
        {
          "ErrorEquals": [
            "NotReady"
          ],
          "IntervalSeconds": 60,
          "MaxAttempts": 300,
          "BackoffRate": 1
        }
      ]
    }
  }
}
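On the Lambda side, the handler only needs to raise an exception whose name matches the `ErrorEquals` entry. The following is a rough sketch of that pattern rather than the project's actual handler: `wait_for_snapshot` and the `get_status` callable are hypothetical names, and `get_status` stands in for whatever call looks up the snapshot status.

```python
class NotReady(Exception):
    """Raised while a resource is still being created."""


def wait_for_snapshot(get_status, snapshot_id):
    """Return normally once the snapshot is available, otherwise raise NotReady.

    get_status is a hypothetical callable that returns the snapshot status
    string, for example a thin wrapper around an RDS describe call.
    """
    status = get_status(snapshot_id)
    if status != "available":
        # The state machine's Retry block lists "NotReady" in ErrorEquals,
        # so raising it triggers another attempt after IntervalSeconds.
        raise NotReady(f"snapshot {snapshot_id} is {status}")
    return {"snapshot_id": snapshot_id, "status": status}
```

Keeping the check side-effect free means it is safe for the state machine to call it hundreds of times.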

One complication with RDS is networking. Since databases are not accessed using AWS API (and RDS Data API only supports Aurora), the Fargate task needs to run in the same network as the temporary database. We can theoretically create the temporary database in the same VPC, subnet and security group as the production database. But that would require modifying the security group of the production database and can pose a potential security risk or data loss risk. It’s better to keep the temporary and production databases separate to avoid even the remote possibility of something going wrong by accident.

Another oddity I’ve learned from this is that Fargate tasks with no route to the internet can’t use Docker images from Docker Hub. I would have expected the image pulling to be separate from the execution of the task itself like it was with AWS Batch, but that’s not the case. This is why the Fargate task is created with a public facing IP. I tried using Amazon Linux Docker image from ECR, but even that requires an internet route or VPC Endpoint.

All the source code is available on GitHub. You can open an issue or comment here if you have any questions.

Lovage

April 11, 2020 kichik 2 Comments

I have been playing with serverless solutions lately. It started with a Django project that was dealing with customer AWS credentials both in background and foreground tasks. I wanted to keep those tasks compartmentalized for security and I wanted them to scale easily. Celery is the common solution for this, but setting it up in my environment was not straightforward. This was as good an excuse as any to try AWS Lambda. I gave Serverless Framework a try because it was the most versatile framework I could find with proper Python support.

It worked well for a long time. But over time I noticed the following repeating issues.

It requires Node.js which complicated development and CI environments. This is the reason I originally created docker combo images of Python and Node.js.
Packaging Python dependencies is slow and error prone. Every deployment operation downloaded all the dependencies again, compressed them again, and uploaded them again. On Windows, Mac, and some Linux variants (if you have binary dependencies) it requires Docker and even after multiple PRs it was still slow and randomly broke every few releases.
There was no easy way to directly call Lambda functions after they were deployed. I had to deal with the AWS API, naming, arguments marshaling, and exception handling myself.

To solve these issues, I created Lovage. The pandemic gave me the time I needed to refine and release it.

No Node.js

Lovage is a stand-alone Python library. It has no external dependencies which should make it easy to use anywhere Python 3 can be used. It also does away with the Node.js choice of keeping intermediate files in the source folder. No huge node_modules folders, no code zip files in .serverless, and no dependency caches.

Lambda Layers

Instead of uploading all of the project’s dependencies every time as part of the source code zip, Lovage uploads it just once as a separate zip file and creates a Lambda Layer from it. Layers can be attached to any Lambda function and are meant to easily share code or data between different functions.

Since dependencies change much less frequently than the source code itself, Lovage uploads the dependencies much less frequently and thus saves compression and upload time. Dependencies are usually bigger than the source code so this makes a significant difference in deployment time.

But why stop there? Lovage gets rid of the need for Docker too. Docker is used to get an environment close enough to the execution environment of Lambda so that pip downloads the right dependencies, especially when binaries are involved. Why emulate when we can use the real thing?

Lovage creates a special Lambda function that uses pip to download your project’s dependencies, package them up, and upload them to S3 where they can be used as a layer. That function is then used as a custom resource in CloudFormation to automatically create the dependencies zip file and create a layer from it. Nothing happens locally and the upload is as fast as possible given that it stays in one region of the AWS network.

Here is a stripped down CloudFormation template showing this method (full function code):

Resources:
  RequirementsLayer:
    Type: AWS::Lambda::LayerVersion
    Properties:
      Content:
        S3Bucket:
          Fn::Sub: ${RequirementsPackage.Bucket}
        S3Key:
          Fn::Sub: ${RequirementsPackage.Key}
  RequirementsPackage:
    Type: Custom::RequirementsLayerPackage
    Properties:
      Requirements:
        - requests
        - pytest
      ServiceToken: !Sub ${RequirementsPackager.Arn}
  RequirementsPackager:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.7
      Handler: index.handler
      Code:
        ZipFile: |
          import os
          import zipfile

          import boto3
          import cfnresponse

          def handler(event, context):
            if event["RequestType"] in ["Create", "Update"]:
              requirements = event["ResourceProperties"]["Requirements"]
              # Requirements arrives as a list, so join it into pip arguments
              packages = " ".join(requirements)
              os.system(f"pip install -t /tmp/python --progress-bar off {packages}")
              with zipfile.ZipFile("/tmp/python.zip", "w") as z:
                for root, folders, files in os.walk("/tmp/python"):
                  for f in files:
                    local_path = os.path.join(root, f)
                    zip_path = os.path.relpath(local_path, "/tmp")
                    z.write(local_path, zip_path, zipfile.ZIP_DEFLATED)
              boto3.client("s3").upload_file("/tmp/python.zip", "lovage-bucket", "reqs.zip")
              cfnresponse.send(event, context, cfnresponse.SUCCESS, {"Bucket": "lovage-bucket", "Key": "reqs.zip"}, "reqs")

This is by far my favorite part of Lovage and why I really wanted to create this library in the first place. I think it’s much cleaner and faster than the current solutions. This is especially true considering almost every project I have uses boto3 and that alone is around 45MB uncompressed and 6MB compressed. Compressing and uploading it every single time makes fast iteration harder.

“RPC”

Most serverless solutions I’ve seen focus on HTTP APIs. Serverless Framework does have support for scheduling and events, but still no easy way to call the function yourself with some parameters. Lovage functions are defined in your code with a special decorator, just like Celery. You can then invoke them with any parameters and Lovage will take care of everything, including passing back any exceptions.

import lovage

app = lovage.Lovage()

@app.task
def hello(x):
  return f"hello {x} world!"

if __name__ == "__main__":
  print(hello.invoke("lovage"))
  hello.invoke_async("async")

The implementation is all very standard. Arguments are marshaled with pickle, encoded as base85, and stuffed in JSON. Same goes for return values and exceptions.
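As an illustration of that marshaling scheme, here is a minimal sketch using only the standard library. It is not Lovage's actual code; the `pack` and `unpack` names and the `payload` key are assumptions made up for the example.

```python
import base64
import json
import pickle


def pack(value) -> str:
    """Pickle a value, base85-encode it, and wrap it in a JSON document."""
    encoded = base64.b85encode(pickle.dumps(value)).decode("ascii")
    return json.dumps({"payload": encoded})


def unpack(document: str):
    """Reverse of pack(). Only unpickle data from a source you trust."""
    encoded = json.loads(document)["payload"]
    return pickle.loads(base64.b85decode(encoded))
```

The same pair of functions can carry arguments, return values, and exception objects, since all of them can be pickled. Pickle should only ever be used between endpoints you control, which is the case when both sides are your own code.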

Summary

Lovage deploys Python functions to AWS Lambda that can be easily invoked just like any other function. It does away with Docker and Node.js. It saves you development time by offloading dependency installation to Lambda and stores dependencies in Lambda layers to reduce repetition.

I hope you find this library useful! If you want more details on the layer and custom resource to implement in other frameworks, let me know.

CloudWatch2S3

March 13, 2019 / March 16, 2019 kichik

AWS CloudWatch Logs is a very useful service but it does have its limitations. Retaining logs for a long period of time can get expensive. Logs cannot be easily searched across multiple streams. Logs are hard to export and integration requires AWS specific code. Sometimes it makes more sense to store logs as text files in S3. That’s not always possible with some AWS services like Lambda that write logs directly to CloudWatch Logs.

One option to get around CloudWatch Logs limitations is exporting logs to S3 where data can be stored and processed longer-term for a cheaper price. Logs can be exported one-time or automatically as they come in. Setting up an automatic pipeline to export the logs is not a one-click process, but luckily Amazon detailed all the steps in a recent blog post titled Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis.

Amazon’s blog post has a lot of great information about the topic and the solution. In short, they create a Kinesis Stream writing to S3. CloudWatch Logs subscriptions to export logs to the new stream are created either manually with a script or in response to CloudTrail events about new log streams. This architecture is stable and scalable, but the implementation has a few drawbacks:

Writes compressed CloudWatch JSON files to S3.
Setup is still a little manual, requiring you to create a bucket, edit permissions, modify and upload source code, and run a script to initialize.
Requires CloudTrail.
Configuration requires editing source code.
Has a minor bug limiting initial subscription to 50 log streams.

That is why I created CloudWatch2S3 – a single CloudFormation template that sets everything up in one go while still leaving room for tweaking with parameters.

The architecture is mostly the same as Amazon’s but adds a subscription timer to remove the hard requirement on CloudTrail, and post-processing to optionally write raw log files to S3 instead of compressed CloudWatch JSON files.
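The post-processing step boils down to unwrapping the compressed CloudWatch JSON and keeping only the log lines. The sketch below shows the general idea under the assumption that each record is a gzip-compressed subscription payload with a `logEvents` array; the `extract_messages` name is hypothetical and this is not the template's actual processing function.

```python
import gzip
import json


def extract_messages(record: bytes) -> list:
    """Turn one compressed CloudWatch Logs record into raw log lines."""
    payload = json.loads(gzip.decompress(record))
    # Control messages only test the subscription and carry no log data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return [event["message"] for event in payload["logEvents"]]


# A hand-made sample record for illustration
sample = gzip.compress(json.dumps({
    "messageType": "DATA_MESSAGE",
    "logGroup": "/aws/lambda/example",
    "logStream": "stream-1",
    "logEvents": [{"id": "1", "timestamp": 0, "message": "hello"}],
}).encode("utf-8"))
print(extract_messages(sample))
```

Writing the extracted lines to S3 instead of the wrapped JSON is what makes the resulting files readable with ordinary text tools.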

[Figure: CloudWatch2S3 architecture diagram]

Setup is simple. There is just one CloudFormation template and the default parameters should be good for most.

Download the CloudFormation template
Open AWS Console
Go to CloudFormation page
Click “Create stack”
Under “Specify template” choose “Upload a template file”, choose the file downloaded in step 1, and click “Next”
Under “Stack name” choose a name like “CloudWatch2S3”
If you have a high volume of logs, consider increasing Kinesis Shard Count
Review other parameters and click “Next”
Add tags if needed and click “Next”
Check “I acknowledge that AWS CloudFormation might create IAM resources” and click “Create stack”
Wait for the stack to finish
Go to “Outputs” tab and note the bucket where logs will be written
That’s it!

Another feature is the ability to export logs from multiple accounts to the same bucket. To set this up you need to set the AllowedAccounts parameter to a comma-separated list of AWS account identifiers allowed to export logs. Once the stack is created, go to the “Outputs” tab and copy the value of LogDestination. Then simply deploy the CloudWatch2S3-additional-account.template to the other accounts while setting LogDestination to the value previously copied.

For troubleshooting and more technical details, see https://github.com/CloudSnorkel/CloudWatch2S3/blob/master/README.md.

If you are exporting logs to S3 to save money, don’t forget to also change the retention settings in CloudWatch so old logs are automatically purged and your bill actually goes down.

Why Kubernetes?

April 9, 2018 kichik

Kubernetes (or k8s) allows you to effortlessly deploy containers to a group of servers, automatically choosing servers with sufficient computing resources and connecting required network and storage resources. It lets developers run containers and allocate resources without too much hassle. It’s an open-source version of Borg which was considered one of Google’s competitive advantages.

There are many features and reasons to use Kubernetes, but for me the following stood out and made me a happy user.

It works on-premise and on most clouds like AWS, Google Cloud and Azure.
Automatic image garbage collection makes sure you never run out of disk space on your cluster nodes because you push hundreds of versions a day.
Kubernetes has automatic image garbage collection built-in. So useful and considerate. https://t.co/YgLBklGEd4
— Amir Szekely (@virtuajack) June 1, 2016

Scaling your container horizontally is as easy as:

kubectl scale deployment nginx-deployment --replicas=10

Rolling over your container to a new version one by one is as easy as:
```
kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1
```
Health checks are easy to define. Liveness probes automatically restart your container, and readiness probes delay a rolling update if the new version doesn’t work.
Automatic service discovery using a built-in DNS server. Just define a service and reference it by name from any container in your cluster. You can even define a service as a load balancer which will automatically create an ELB on AWS, a Load Balancer on GCE, etc.
Labels on everything can help organize resources. Services use label selectors to choose which containers get traffic directed at them. This makes it easy, for example, to create blue/green deployments.
Secrets make it easier to handle keys and configuration separately from code and containers.
Resource monitoring is available out of the box and can be used for auto-scaling.
Horizontal auto scaling can be based on resource monitoring or custom metrics.
You can always use the API to create custom behaviors. For example, you can automatically create Route53 record sets based on a service label.

Is there anything I missed? What’s your favorite Kubernetes feature?

Python 3 is Awesome!

September 4, 2017 / August 31, 2017 kichik

Today I will tell you about the massive success that is whypy3.com. With hundreds of users a day (on the best day when it reached page two of Hacker News and hundreds actually being 103), it has been a tremendous success in the lucrative Python code snippet market. By presenting small snippets of code displaying cool features of Python 3, I was able to single-handedly convert millions (1e-6 millions to be exact) of Python 2 users to true Python 3 believers.

It all started when I saw a tweet about a cool Python 3 feature I hadn’t seen before. This amazing feature automatically resolves any exception in your code by suppressing it. Who needs pesky exceptions anyway? Alternatively, you can use it to cleanly ignore expected exceptions instead of the usual except: pass.

from contextlib import suppress

with suppress(MyExc):
    code

# replaces

try:
    code
except MyExc:
    pass
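For a version that can be run as-is, here is a small sketch. The `parse` helper is a made-up example and not one of the snippets from the site.

```python
from contextlib import suppress


def parse(text):
    """Return text as an int, or None when it is not a valid number."""
    # A ValueError raised by int() is swallowed and execution continues
    # after the with block. Any other exception type still propagates.
    with suppress(ValueError):
        return int(text)
    return None


print(parse("42"))
print(parse("not a number"))
```

Only the exception types passed to `suppress` are ignored, so a `TypeError` from `parse(None)` would still surface as usual.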

There are obviously way better and bigger reasons to finally make that move to Python 3. But what if you can be lured in by some cool cheap tricks? And that’s exactly why I created whypy3.com. It’s a tool that we Python 3 lovers can use to try and slowly wear down an insistent boss or colleague. It’s also a fun way for me to share all my favorite Python 3 features so I don’t forget them.

I was initially going to do the usual static S3 website with CloudFront/CloudFlare. But I also wanted it to be easy for other people to contribute snippets. The obvious choice was GitHub, and since I’m already using GitHub, why not give GitHub Pages a try? Getting it up and running was a breeze. To make it easier to contribute without editing HTML, I decided to use the full blown Jekyll setup. I had to fight a little bit with Jekyll to get highlighting working, but overall it took no time to get a solid looking site up and running.

After posting to Hacker News, I even got a few pull requests for more snippets. To this day, I still get some Twitter interactions here and there. I don’t expect this to become a huge project with actual millions of users, but at the end of the day this was pretty fun, I learned some new technologies, and I probably convinced someone to at least start thinking about moving to Python 3.

Do you use Python 3? Please share your favorite feature!

Hypervisor Hunt

August 30, 2017 / August 31, 2017 kichik

After getting burnt by Hyper-V, I decided to go for the tried and true and installed VMware Player on Windows 10. I had to install Ubuntu again on the new virtual machine, but it was a breeze thanks to VMware’s automated installation process. Everything that was missing in Hyper-V was there. I was able to use my laptop’s real resolution, networking over Wi-Fi was done automatically, audio magically started working, and even performance was noticeably better.

After a few weeks of heavy usage, I started noticing some problems with VMware. Every once in a while the guest OS would freeze for about a second. I didn’t pay too much attention to it at first, but it slowly started to wear on me. I eventually realized it always happens when I use tab completion in the shell and the real cause was playing sounds. It’s still progress over Hyper-V’s inability to play any audio, but it was not exactly a pleasant experience.

The other, far more severe issue, was general lack of performance. It just didn’t feel like I was running Ubuntu on hardware, or even close to it. I experienced constant lags while typing, alt+tab took about a second to show up, compiling code was weirdly slow, video playback was unusable, and everything was just generally sluggish and unresponsive. Overall it was usable, but far from ideal.

Today I finally broke down and decided to give yet another hypervisor a shot. Next up came VirtualBox. I didn’t have high expectations, but VMware was starting to slow me down so I had to try something. Installation was even easier since VirtualBox can just use VMware images. Then came the pleasant surprise. Straight out of the box performance was noticeably better. Windows moved without lagging, alt+tab reaction was instantaneous, and sound playback just worked. Once I installed the guest additions and enabled video acceleration, video playback started functioning too. I still can’t play 4K videos, but at least my laptop doesn’t crawl to a halt on every video ad.

As a cherry on the top, VirtualBox was also able to properly set the resolution on the guest OS at boot time. In VMware, I had to leave and enter full screen once after login for the real resolution to stick. Switching inputs between guest and host in VirtualBox is also easier. It requires just one key (right ctrl) as opposed to two with VMware (left ctrl+alt).

I realize these results depend on many things like hardware, drivers, host/guest versions, etc. I bet I could also solve some of these issues if I put some research into it. But for running Ubuntu 17.04 desktop on my Windows 10 Dell XPS 13 with the least hassle, VirtualBox is the clear winner. Let me know if you had a different experience or know how to make it run even smoother.

Things They Don’t Tell You About Hyper-V

August 3, 2017 / July 28, 2017 kichik 1 Comment

I really wanted to like Hyper-V. It’s fully integrated into Windows and runs bare metal, so I was expecting stellar performance and a smooth experience. I was going to run a Linux box for some projects, get to work with Docker for Windows, and do it all with good power management, smooth transitions and without sacrificing performance.

And then reality hit.

Hyper-V doesn’t support resolutions higher than 1920×1080 with Linux guests. And even that is only adjustable by editing grub configuration which requires a reboot. The viewer allows zooming, but not in full screen mode. With a laptop resolution of 3200×1800, that leaves me with a half empty screen or a small window on the desktop.
Networking support is mostly manual, especially when Wi-Fi is involved. You have to drop into PowerShell to manually configure vSwitch with NAT. Need DHCP? Nope, can’t have it. Go install a third party application.
Audio is not supported for Linux guests. Just like with the resolution issue, you’re forced to use a remote X server or xrdp. Both are a pain to set up and didn’t provide acceptable performance for me.
To top it all off, you can’t use any other virtualization solution when Hyper-V is enabled. Do you want both Docker for Windows and a normal Linux desktop VM experience? Too bad… VMware allows you to virtualize VT-x/EPT so you can run a hypervisor inside your guest. Hyper-V doesn’t.

It seems like Hyper-V is just not there yet. It might work well for Windows guests or Linux server guests, but for a Linux desktop guest it’s just not enough.

False Positive Watch

July 31, 2017 / July 27, 2017 kichik

While debugging any issue that arises on Windows, my go-to trick is blaming the anti-virus or firewall. It almost always works. As important as these security solutions are, they can be so disruptive at times. For developers this usually comes in the form of a false positive. One day, out of the blue, a user emails you and blames you for trying to infect their computer with Virus.Generic.Not.Really.LOL.Sue.Me.1234775. This happened so many times with NSIS that someone created a false positive list on our wiki.

There are a lot of reasons why this happens and a lot of ways to lower the chances of it happening, but at the end of the day, chances are it’s going to happen. It even happened to Chrome and Windows itself.

So I created False Positive Watch. It’s a simple free service that periodically scans your files using VirusTotal and sends you an email if any of your files are erroneously detected as malware. You can then notify the anti-virus vendor so they can fix the false positive before it affects too many of your customers.

I use it to get notifications about NSIS and other projects, but you can use it for your projects too for free. All you need is to supply your email address (for notifications) and upload the file (I delete it from my server after sending it to VirusTotal). In the future I’m going to add an option to just supply the hash instead of the entire file so you can use it with big files or avoid uploading the file if it’s too private.
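Computing a hash locally is straightforward even for big files. The sketch below assumes SHA-256 is the hash of interest and uses a hypothetical `file_sha256` helper; it is not part of the service.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Only the resulting hex string would need to leave your machine, which addresses both the size and the privacy concern.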

Docker Combo Images

July 27, 2017 / October 15, 2022 kichik 1 Comment

I’ve been working with Docker a lot for the past year and it’s pretty great. It especially shines when combined with Kubernetes. As the projects grew more and more complex, a common issue I kept encountering was running both Python and JavaScript code in the same container. Certain Django plugins require Node to run, Serverless requires both Python and Node, and sometimes you just need some Python tools on top of Node to build.

I usually ended up creating my own image containing both Python and Node with:

FROM python:3

RUN curl -sL https://deb.nodesource.com/setup_8.x | bash -
RUN apt-get install -y nodejs

# ... rest of my stuff

There are two problems with this approach.

It’s slow. Installing Node takes a while and doing it for every non-cached build is time consuming.
You lose the Docker way of just pulling a nice prepared image. If Node changes their deployment method, the Dockerfile has to be updated. It’s much simpler to just docker pull node:8

The obvious solution is going to Docker Hub and looking for an image that already contains both. There are a bunch of those but they all look sketchy and very old. I don’t feel like I can trust them to have the latest security updates, or any updates at all. When a new version of Python comes out, I can’t trust those images to get new tags with the new version which means I’d have to go looking for a new image.

So I did what any sensible person would do. I created my own (obligatory link to XKCD #927 here). But instead of creating and pushing a one-off image, I used Travis.ci to update the images daily (update 2022: GitHub Actions). This was actually a pretty fun exercise that allowed me to learn more about Docker Python API, Docker Hub and Travis.ci. I tried to make it as easily extensible as possible so anyone can submit a PR for a new combo like Node and Ruby, or Python and Ruby, or Python and Java, etc.

The end result allows you to use:

docker run --rm combos/python_node:3_6 python3 -c "print('hello world')"
docker run --rm combos/python_node:3_6 node -e "console.log('hello world')"

You can rest assured you will always get the latest version of Python 3 and the latest version of Node 6. The image is updated daily. And since the build process is completely transparent on Travis.ci you should be able to trust that there is no funny business in the image.

Images: https://hub.docker.com/r/combos/
Source code: https://github.com/kichik/docker-combo
Build server: https://github.com/kichik/docker-combo/actions

kichik's blog

Helpful infrastructure software tips and tools