matduggan.com

I, like the entire internet, has enjoyed watching the journey of 37Signals from cloud to managed datacenter. For those unfamiliar, it's worth a read here. This has spawned endless debates about whether the cloud is worth it or should we all be buying hardware again, which is always fun. I enjoy having the same debates every 5 years just like every person who works in tech. However mentioned in their migration documentation was a reference to an internal tool called "MRSK" which they used to manage their infrastructure. You can find their site for it here.

When I read this, my immediate thought was "oh god no". I have complicated emotions about creating custom in-house tooling unless it directly benefits your customers (which can include internal customers) enough that the inevitable burden of maintenance over the years is worth it. It's often easier to yeet out software than it is to keep it running and design around its limitations, especially in the deployment space. My fear is often this software is the baby of one engineer, adopted by other teams, that engineer leaves and now the entire business is on a custom stack nobody can hire for.

All that said, 37Signals has open-sourced MRSK and I tried it out. It was better than expected (clearly someone has put love into it) and the underlying concepts work. However if the argument is that this is an alternative to a cloud provider, I would expect to hit fewer sharper edges. This reeks of internal tool made by a few passionate people who assumed nobody would run it any differently than they do. Currently its hard to recommend to anyone outside of maybe "single developers who work with no one else and don't mind running into all the sharp corners".

How it works

The process to run it is pretty simple. Set up a server wherever (I'll use digital ocean) and configure it to start with an SSH key. You need to select Ubuntu (which is a tiny bummer and would have preferred Debian but whatever) and then you are off to the races.

Then select a public SSH key you already have in the account.

Setting up MRSK

On your computer run gem install mrsk if you have ruby or alias mrsk='docker run --rm -it -v $HOME/.ssh:/root/.ssh -v /var/run/docker.sock:/var/run/docker.sock -v ${PWD}/:/workdir ghcr.io/mrsked/mrsk' if you want to do it as a Docker container. I did the second option, sticking that line in my .zshrc file.

Once installed you run mrsk init which generates all you need.

The following is the configuration file that is generated and gives you an idea of how this all works.

# Name of your application. Used to uniquely configure containers.
service: my-app

# Name of the container image.
image: user/my-app

# Deploy to these servers.
servers:
  - 192.168.0.1

# Credentials for your image host.
registry:
  # Specify the registry server, if you're not using Docker Hub
  # server: registry.digitalocean.com / ghcr.io / ...
  username: my-user

  # Always use an access token rather than real password when possible.
  password:
    - MRSK_REGISTRY_PASSWORD

# Inject ENV variables into containers (secrets come from .env).
# env:
#   clear:
#     DB_HOST: 192.168.0.2
#   secret:
#     - RAILS_MASTER_KEY

# Call a broadcast command on deploys.
# audit_broadcast_cmd:
#   bin/broadcast_to_bc

# Use a different ssh user than root
# ssh:
#   user: app

# Configure builder setup.
# builder:
#   args:
#     RUBY_VERSION: 3.2.0
#   secrets:
#     - GITHUB_TOKEN
#   remote:
#     arch: amd64
#     host: ssh://[email protected]

# Use accessory services (secrets come from .env).
# accessories:
#   db:
#     image: mysql:8.0
#     host: 192.168.0.2
#     port: 3306
#     env:
#       clear:
#         MYSQL_ROOT_HOST: '%'
#       secret:
#         - MYSQL_ROOT_PASSWORD
#     files:
#       - config/mysql/production.cnf:/etc/mysql/my.cnf
#       - db/production.sql.erb:/docker-entrypoint-initdb.d/setup.sql
#     directories:
#       - data:/var/lib/mysql
#   redis:
#     image: redis:7.0
#     host: 192.168.0.2
#     port: 6379
#     directories:
#       - data:/data

# Configure custom arguments for Traefik
# traefik:
#   args:
#     accesslog: true
#     accesslog.format: json

# Configure a custom healthcheck (default is /up on port 3000)
# healthcheck:
#   path: /healthz
#   port: 4000

Good to go?

Well not 100%. On first run I get this:

❯ mrsk deploy
Acquiring the deploy lock
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
  ERROR (RuntimeError): Can't use commit hash as version, no git repository found in /workdir

Apparently the directory you work in needs to be a git repo. Fine, easy fix. Then I got a perplexing SSH error.

❯ mrsk deploy
Acquiring the deploy lock
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
  INFO [39265e18] Running /usr/bin/env mkdir mrsk_lock && echo "TG9ja2VkIGJ5OiAgYXQgMjAyMy0wNS0wOVQwOToyNzoxNloKVmVyc2lvbjog
SEVBRApNZXNzYWdlOiBBdXRvbWF0aWMgZGVwbG95IGxvY2s=
" > mrsk_lock/details on 206.81.22.60
  ERROR (Net::SSH::AuthenticationFailed): Authentication failed for user [email protected]

❯ ssh [email protected]
Welcome to Ubuntu 22.10 (GNU/Linux 5.19.0-23-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Tue May  9 09:26:40 UTC 2023

  System load:  0.0               Users logged in:       0
  Usage of /:   6.7% of 24.06GB   IPv4 address for eth0: 206.81.22.60
  Memory usage: 19%               IPv4 address for eth0: 10.19.0.5
  Swap usage:   0%                IPv4 address for eth1: 10.114.0.2
  Processes:    98

0 updates can be applied immediately.

New release '23.04' available.
Run 'do-release-upgrade' to upgrade to it.


Last login: Tue May  9 09:26:41 2023 from 188.177.18.83
root@ubuntu-s-1vcpu-1gb-fra1-01:~#

So Ruby SSH Authentication failed even though I had the host configured in the SSH config and the standard SSH login worked without issue. Then a bad thought occurs to me. "Does it care....what the key is called? Nobody would make a tool that relies on SSH and assume it's id_rsa right?"

❯ mrsk deploy
Acquiring the deploy lock
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
  INFO [6c25e218] Running /usr/bin/env mkdir mrsk_lock && echo "TG9ja2VkIGJ5OiAgYXQgMjAyMy0wNS0wOVQwOTo1Mjo0NloKVmVyc2lvbjog
SEVBRApNZXNzYWdlOiBBdXRvbWF0aWMgZGVwbG95IGxvY2s=
" > mrsk_lock/details on 142.93.110.241
Enter passphrase for /root/.ssh/id_rsa:

Moving past the bad SSH

Then I get this error:

❯ mrsk deploy
Acquiring the deploy lock
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
  INFO [3b53d161] Running /usr/bin/env mkdir mrsk_lock && echo "TG9ja2VkIGJ5OiAgYXQgMjAyMy0wNS0wOVQwOTo1ODoyOVoKVmVyc2lvbjog
SEVBRApNZXNzYWdlOiBBdXRvbWF0aWMgZGVwbG95IGxvY2s=
" > mrsk_lock/details on 142.93.110.241
Enter passphrase for /root/.ssh/id_rsa:
  INFO [3b53d161] Finished in 6.094 seconds with exit status 0 (successful).
Log into image registry...
  INFO [2522df8b] Running docker login -u [REDACTED] -p [REDACTED] on localhost
  INFO [2522df8b] Finished in 1.209 seconds with exit status 0 (successful).
  INFO [2e872232] Running docker login -u [REDACTED] -p [REDACTED] on 142.93.110.241
  Finished all in 1.3 seconds
Releasing the deploy lock
  INFO [2264c2db] Running /usr/bin/env rm mrsk_lock/details && rm -r mrsk_lock on 142.93.110.241
  INFO [2264c2db] Finished in 0.064 seconds with exit status 0 (successful).
  ERROR (SSHKit::Command::Failed): docker exit status: 127
docker stdout: Nothing written
docker stderr: bash: line 1: docker: command not found

docker command not found? I thought MRSK set it up.

From the GitHub:

This will:

    Connect to the servers over SSH (using root by default, authenticated by your ssh key)
    Install Docker on any server that might be missing it (using apt-get): root access is needed via ssh for this.
    Log into the registry both locally and remotely
    Build the image using the standard Dockerfile in the root of the application.
    Push the image to the registry.
    Pull the image from the registry onto the servers.
    Ensure Traefik is running and accepting traffic on port 80.
    Ensure your app responds with 200 OK to GET /up.
    Start a new container with the version of the app that matches the current git version hash.
    Stop the old container running the previous version of the app.
    Prune unused images and stopped containers to ensure servers don't fill up.

However:

root@ubuntu-s-1vcpu-1gb-fra1-01:~# which docker
root@ubuntu-s-1vcpu-1gb-fra1-01:~#

Fine I guess I'll install Docker. Not feeling like this is saving a lot of time vs rsyncing a Docker Compose file over.

sudo apt update
sudo apt upgrade -y
sudo apt install -y docker.io curl git
sudo usermod -a -G docker ubuntu

Now we have Docker on the machine.

Did it work after that?

Yeah so my basic Flask app needed to have a new route added to it, but once I saw that you need to configure a route at /up and did that, worked fine. The traffic is successfully paused during deployment and rerun once the application is healthy again. Overall once I got it running it worked much as intended.

I also tried accessories, which is their term for necessary internal services like mysql. These are more like standard Docker compose commands but they're nice to be able to include. Again, it feels a little retro to say "please install mysql on the mysql box" and just hope that box doesn't go down, but it's totally serviceable. I didn't encounter anything interesting with the accessory testing.

Impressions

MRSK is an interesting tool. I think, if the community adopts it and irons out the edge cases, it'll be a good building-block technology for people not interested in running infrastructure. Comparing it to Kubernetes is madness, in the same way I wouldn't compare a go-kart I made in my garage to semi-truck.

That isn't to hate on MRSK, I think it's a good idea to solve for people with less complicated concerns. However part of the reasons more complicated tools are complicated is because they cover more edgecases and automate more failure scenarios. MRSK doesn't cover for those, so it gets to be more simple, but as you grow more of those concerns shift back to you.

It's the difference between managing 5 hosts with Ansible and 1500. 5 is easy and scales well, 1500 becomes a nightmare. MRSK in its current state should be seen as a bridge technology unless your team expends the effort to customize it for your workflow and add in the gaps in monitoring.

If it were me and I was starting a company today, I'd probably invest the effort in something like GKE Autopilot where GCP manages almost all the node elements and I worry exclusively about what my app is doing. But I have a background in k8s so I understand I'm an edge case. If you are looking to start a company or a project and want to keep it strictly cloud-agnostic, MRSK does do it.

What I would love to see added to MRSK to work-proof it more:

Adding support for 1Password/secret manager for the SSH key component so it isn't a key on your local machine
Adding support for multiple users with different keys on the box managed inside of some secret configuration so you can tell what user did what deployment and rotation of keys is part of deployment as needed (you can set a user per config file but that isn't really granular enough to scale)
Fixing the issue where the ssh_config doesn't seem to be respected
Providing an example project in the documentation of what exactly you need to hit msrk deploy and have a functional project up and running
Let folks know that having the configuration file inside of a git repo is a requirement
Ideally integrating some concept of autoscaling group into the configuration with some lookup concept back to the config file (which you can do with an template but would be nice to build in)
Do these servers update themselves? What happens if Docker crashes? Can I pass resource limits to the service and not just accessories? A lot of missing pieces there.
mrsk details is a great way to quickly see the health status, but you obviously need to do more to monitor whether your app is functional or not. That's more on you than the MRSK team.

Should you use MRSK today

If you are a single developer who runs a web application, ideally a rails application, and you are provisioning your servers one by one with Terraform or whatever, where static IP addresses (internal or external) are something you can get and don't change often, this is a good tool for you. I wouldn't recommend using the accessories functionality, I think you'll probably want to use a hosted database service if possible. However it did work, so I mean just consider how critical uptime is to you when you roll this out.

However if you are on a team, I don't know if I can recommend this at the current juncture. Certainly not run from a laptop. If you integrate this into a CI/CD system where the users don't have access to the SSH key and you can lock that down such that it stops being a problem, it's more workable. However as (seemingly) envisioned this tool doesn't really scale to multiple employees unless you have another system swapping the deployment root SSH key at a regular interval and distributing that to end users.

You also need to do a lot of work around upgrades, health monitoring of the actual VMs, writing some sort of replacement system if the VM dies and you need to put another one in its place. What is the feedback loop back to this static config file to populate IP addresses, automating rollbacks if something fails, monitoring deployments to ensure they're not left in a bad state, staggering the rollout (which MRSK does support). There's a lot here that comes in the box with conventional tooling that you need to write here.

If you want to use it today

Here's the minimum I would recommend.

I'd use something like the 1Password SSH agent so you can at least distribute keys across the servers without having to manually add them to each laptop: https://developer.1password.com/docs/ssh/agent/
I'd set up a bastion server (which is supported by MSRK and did work in my testing). This is a cheap box that means you don't need to allow your application and database servers to be exposed directly to the internet. There is a decent tutorial on how to make one here: https://zanderwork.com/blog/jump-host/
Ideally do this all from within a CI/CD stack so that you are running it from one central location and can more easily centralize the secret storage.