Why are passwords a user's problem?

In light of GoTo admitting their breach was worse than initially reported, I have found myself both discussing passwords with people more than ever before and directing a metric ton of business towards 1Password. However, it has raised an obvious question for me: why are users involved with passwords at all? Why is this still something I have to talk to my grandparents about?

Let us discuss your password storage system again

All the major browsers have password managers that sync across devices. These stores are (as far as I can tell) reasonably secure. Access to the device would reveal them, but excluding physical access to an unlocked computer they seem fine. There is a common API, the Credential Management API (docs here), that allows a website to query the browser's password store for a login; the spec even allows for federated logins and for different (or shared) logins across subdomains. This makes for a truly effortless login experience without needing users to do anything. These browsers already sync across mobile and desktop behind a master-password concept and can generate passwords on request.

If the browser can generate a password, store a password, sync a password and return the password when asked, why am I telling people to download another tool that does the exact same thing? A tool made by people who didn't make the browser, most of whom haven't been independently vetted by anybody.

Surely it can't be that easy

So when doing some searching about the Credential Management API, one of the sites you run across a lot is this demo site: https://credential-management-sample.appspot.com/. It lets you register an account, log out and then see the login auto-filled by the browser when you return. The concept seems to work as expected on Chrome.

Bummer

Alright, so it doesn't work on Firefox and Safari, but honestly, neither do 10% of the websites I go to. Covering 88% of users worldwide still isn't bad, so I'm not willing to throw the idea out entirely.

Diving into how the process works, again, it seems pretty straightforward.

const signin = document.querySelector('#signin');
signin.addEventListener('click', (e) => {
  // Feature-detect the Credential Management API before calling it.
  if (window.PasswordCredential || window.FederatedCredential) {
    navigator.credentials
      .get({
        password: true, // ask for stored password credentials
        federated: {
          providers: ['https://accounts.google.com'], // accept federated credentials from these providers
        },
        mediation: 'optional', // let the browser sign the user in silently when it can
      })
      .then((c) => {
        if (c) {
          // Route the credential to the matching sign-in path.
          switch (c.type) {
            case 'password':
              return sendRequest(c);
            case 'federated':
              return gSignIn(c);
          }
        }
        // No stored credential was found; resolve with nothing and fall through.
        return Promise.resolve();
      })
      .then((profile) => {
        if (profile) {
          updateUI(profile);
        } else {
          // Nothing to auto-fill, so send the user to the normal sign-in page.
          location.href = '/signin';
        }
      })
      .catch((error) => {
        location.href = '/signin';
      });
  }
});

If the user has a login then get it. It supports federated logins or passwords and falls back to redirecting to the sign-in page if you cannot locate a login. I tried the samples available here and they seemed to mostly be plug and play. In fact in my testing this seemed to be a far superior user experience to using traditional password managers with browser extensions.

Also remember that even for browsers that don't support it, I'm just falling back to the normal password storage system. So for websites that support it, the experience is magical on Chrome and the same as using a password manager with every other browser. It doesn't cost anything, it isn't complicated and it is a better experience.

I know someone out there is gearing up 

Are Password Managers Better?

One common theme when you search for this stuff is an often-repeated opinion that browser password managers are trash and dedicated password managers are better. Looking more into how they work, this seems to come with some pretty big asterisks. Most password managers seem to use some JavaScript from their CDN to insert their interface into the form fields.

This is a little nerve-racking, because websites could interact with that element, but the communication channel the password manager uses is also a potential source of problems. Communication to a local HTTP target seems to make sense, but this can be a source of problems (and has been in the past). Example, Example, Example.

So at a minimum, you'd need the tool you choose to meet these requirements in order to reach or exceed the level of security of the browser's built-in manager.

  • The add-on runs in a sandboxed background page
  • Communication between the password manager and the page isn't happening in the DOM
  • Any password element would need to be an iframe or something else that stops the site from interacting with the content
  • CSP is set up flawlessly
  • Communication between the extension and anything outside of the extension is secure and involves some verification step
  • Code validation in pretty much every direction: is the browser unmodified, is the server process valid, is the extension good, etc.

This isn't even getting to the actual meat of the encryption on the secrets or the security of the syncing. We're just talking about whether the thing that interacts with the secrets and jams them into the page is itself trustworthy.

To make a product that does this and does it well and consistently across releases isn't an easy problem. Monitoring for regressions and breaches would be critical, disclosures would be super important to end users and you would need to get your stack vetted by an outside firm kind of a lot. I trust the developers of my web browser in part because I have to and because, over the years, Mozilla has been pretty good to me. The entire browser stack is under constant attack because it has effectively become the new OS we all run.

Well these companies are really good at all that

Are they? Frankly, in my research I wasn't really blown away by the amount of technical auditing most of these companies seem to do, or at least produce any evidence of. The only exceptions to this were 1Password and Bitwarden.

1Password

I love that they have a whitepaper available here but nobody finished writing it.

No rush I guess

However, they do have actual independent audits of their software: recent audits done by reputable firms and available for review. You can see all of these here. For the record, this should be on every single one of these companies' websites for public review.

Keeper Password Manager

I found what they call a whitepaper but it's 17 pages and basically says "We're ISO certified". That's great I guess, but not the level of detail I would expect at all. You can read it here. This doesn't mean you are doing things correctly, just that you have generated enough documentation to get ISO certified.

Not only do we implement the most secure levels of encryption, we also adhere to very strict internal practices that are continually audited by third parties to help ensure that we continue to develop secure software and provide the world’s most secure cybersecurity platform.

Great, can I read these audits?

Dropbox Password

Nothing seems to exist discussing this product's technical merits at all. I don't know how it works. I can look into it more if someone can point me to something else, but it seems to be an encrypted file that lives in your Dropbox folder, secured with a key generated by Dropbox and returned to you upon enrollment.

Dashlane

I found a great security assessment from 2016 that seemed to suggest the service was doing pretty well. You can get that here. I wasn't able to find one more recent. Reading their whitepaper here they actually do go into a lot of detail and explain more about how the service works, which is great and I commend them for that.

It's not sufficient though. I'm glad you understand how the process should work but I have no idea if that is still happening or if this is more of an aspirational document. I often understand the ideal way software should work but the real skill of the thing is getting it to work that way.

Bitwarden

They absolutely kill it in this department. Everything about them is out in the open like it should be. However sometimes they discover issues, which is good for the project but underscores what I was talking about above. It is hard to write a service that attempts to handle your most sensitive data and inject that data into random websites.

These products introduce a lot of complexity and failure points into the secret management game. All of them, with the exception of 1Password, seem to really bet the farm on the solo Master/Primary Password concept. This is great if your user picks a good password, but statistically this idea seems super flawed to me. This is a password they're going to enter all the time; won't they pick a crap one? Even with 100,000 key-derivation iterations on that password, it's pretty dangerous.
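To put rough numbers on why that worries me, here is a back-of-the-envelope sketch. Every figure in it is an assumption, not a benchmark: an offline attacker holding the vault, a key-derivation function at roughly 100,000 iterations, a guessing rig managing about 10,000 attempts per second against it, and a human-chosen password that falls inside a pool of about a billion likely candidates.

# Back-of-the-envelope only; every number below is an assumption, not a measurement.
guesses_per_second = 10_000          # assumed offline guess rate against a ~100k-iteration KDF
human_candidates = 1_000_000_000     # assumed pool of likely human-chosen passwords

seconds_to_exhaust = human_candidates / guesses_per_second
print(f"Human-chosen password: ~{seconds_to_exhaust / 3600:.0f} hours to exhaust the pool")

# Compare with a browser-generated random password: 16 characters from ~70 symbols.
random_candidates = 70 ** 16
years = random_candidates / guesses_per_second / (3600 * 24 * 365)
print(f"Random 16-character password: ~{years:.1e} years")

The point isn't the exact figures; it's that iteration counts only multiply the attacker's work, while the size of the password pool is what actually decides the outcome.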

Plus if you are going to rely on the concept of "well if the Master/Primary Password is good then the account is secure" then we're certainly not justifying the extra work here. It's as good as the Firefox password manager and not as good as the Safari password manager. Download Firefox and set a good Primary Password.

Can we be honest with each other?

I want you to go to this website and I want you to type in your parents' password. You know the one, the one they use for everything? The one that's been shouted through the halls and texted/emailed/written on so many post-it notes that any concept of security has long since left the building.

That's the password they're gonna use to secure the vault. They shouldn't, but they're gonna. Now I want you to continue on this trust exercise with me. If someone got read/write access to a random cross-section of your coworkers' computers, are passwords really the thing that is gonna destroy your life? Not an errant PDF, an Excel document of customer data or an unsecured AWS API key?

I get it, security stuff is fun to read. "How many super computers will it take to break in" feels very sci-fi.

Well but my family/coworkers/lovers all share passwords

I'm not saying there is zero value to a product where there is a concept of sharing and organizing passwords with a nice UI, but there's also no default universal way of doing it. If all the password managers made a spec that they held to that allowed for secure bidirectional sharing between these services, I'd say "yeah the cost/benefit is likely worth it". However chances are if we're in a rush and sharing passwords, I'm going to send you the password through an insecure system anyway.

Plus the concept of sharing introduces ANOTHER huge layer of possible problems. Permission mistakes, associating the secret with the wrong user, and users copying the secret into their personal vault and never seeing updates to the shared secret are all weird issues I've seen at workplaces. To add insult to injury, the process of getting someone added to a shared folder they need is often so time-consuming that people will just bypass it and copy/paste the secret anyway.

Also let's be honest among ourselves here. Creating one shared login for a bunch of employees to use was always a bad idea. We all knew it was a bad idea and you knew it while you were doing it. Somewhere in the back of your mind you were like "boy it'll suck if someone decides to quit and steals these".

I think we can all agree on this

I know, "users will do it anyway". Sure but you don't have to make it institutional policy. The argument of "well users are gonna share passwords so we should pay a service to allow them to do it easier" doesn't make a lot of sense. I also know sometimes you can't avoid it, but for those values, if they're that sensitive, it might not make sense to share them across all employees in a department. Might make more sense to set them up with a local tool like pass.

Browsers don't prompt the user to make a Master/Primary Password

That is true, and perhaps the biggest point in the category of "you should use a password manager". The way the different browsers do this is weird. Chrome effectively uses the user's login as the key; on Windows it calls a Windows API that encrypts the SQLite database and decrypts it when the user logs in. On the Mac there is a login keychain entry with a random value that seems to serve the same function. If the user is logged in, the SQLite database is accessible. If they aren't, it isn't.

On Firefox there is a Primary Password you can set that effectively works like most of the password managers we saw. Unlike those password managers, this isn't synced, so you would set a different Primary Password for every Firefox device. The Firefox account still controls what syncs to what; this just ensures that anyone who takes the database of usernames and passwords would need this key to decrypt it.

So for Chrome, if your user is logged in, the entire password database is available; on macOS the decryption key is reachable through the login keychain. On Firefox the values are encrypted in the file, and a Primary Password adds additional security and stops a random user from reading them through the browser. There is a great write-up of how local browser password stores work here.

There are more steps than Chrome, but it allows for a Master Password

Is that a sufficient level of security?

Honestly? Yeah I think so. The browser prompts the user to generate a secure value, stores the value, syncs the value securely and then, for 88% of the users on the web, the site can use a well-documented API to automatically fill in that value in the future. I'd love to see Chrome add a few more security levels, some concept of Primary Password so that I can lock the local password storage to something that isn't just me being logged into my user account.

However, we're also rapidly reaching a point where the common wisdom is that everything important needs 2FA. So if we're already going to treat authentication as a tiered approach, I think a pretty good argument could be made that it is safer for a user to store their passwords in the browser store (understanding that the password was always something a malicious actor with access to their user account could grab through keyloggers, clipboard theft, etc.) and keep 2FA on a phone, as compared to what a lot of people do, which is keep the 2FA and the password inside the same third-party password manager.

TOTPs are just password x2

When you scan that QR code, you are getting back a string that looks something like this:

otpauth://totp/example:[email protected]?algorithm=SHA1&digits=6&issuer=mywebsite&period=30&secret=CelwNEjn3l7SWIW5SCJT

This, combined with the current time, gets you your 6-digit code. The value of this approach is twofold: it checks whether the user possesses another source of authentication, and it introduces a secret which we know is randomly generated and effectively serves as a second password. This secret isn't exposed to normal users as a string, so we don't need to worry about that.

If I have the secret value I can make the same code. If we remove the second device component like we do with password managers, what we're saying is "TOTP is just another random password". If we had a truly random password to begin with, I'm not adding much to the security model by adding in the 2FA but sticking it in the same place.
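To make that concrete, here is a minimal sketch of the TOTP derivation itself (standard RFC 6238/RFC 4226 with SHA-1, a 30-second period and 6 digits; the base32 secret below is a made-up placeholder, not the one from the URL above). Anyone holding the secret and a clock produces the same code.

import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, period: int = 30, digits: int = 6) -> str:
    """Derive the current TOTP code from a base32-encoded shared secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(time.time()) // period                 # number of periods since the epoch
    msg = struct.pack(">Q", counter)                     # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                           # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Placeholder secret purely for illustration.
print(totp("JBSWY3DPEHPK3PXP"))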

What if they break into my phone

On iOS, even without a Primary Password set in Firefox, it prompts for Face ID authentication before allowing someone access to the list of stored passwords. So that's already a pretty intense level of security. Add in a Primary Password and we've reached the same level of security as 1Password. Chrome is the same story.

It's the same level of security with Android. Attempt to open the saved passwords, get a PIN or biometric check depending on the phone. That's pretty good! Extra worried about it? Use a TOTP app that requires biometrics before it reveals the code. Here is one for iOS.

Even if someone steals your phone and attempts to break into your accounts, there are some non-trivial security measures in their way with the standard browser password storage combined with a free TOTP app that checks identity.

I use my password manager for more than passwords

Sure, and so do I, but that doesn't really matter to my point. The common wisdom that all users would benefit from the use of a dedicated password manager is iffy at best. We've now seen a commonly recommended one become so catastrophically breached that anything stored there now needs to be considered leaked. This isn't the first credential leak or the 10th or the 100th; there is now just a constant, never-ending parade of password leaks and cracks.

So if that is true and a single password cannot ever truly serve as the single step of authentication for important resources, then we're always going to be relying on another factor. Therefore the value a normal user gets out of a password manager vs the browser they're already using is minimal. With passkeys and the Credential Management API, the era of exposing the user to the actual values being used in the authentication step is coming to a close anyway. Keys synced by the browser vendor will become the default authentication step for users.

In the light of that reality, it doesn't really make sense to bother users with the additional work and hassle of running a new program to manage secrets.

Summary of my rant

  • Normal users don't need to worry about password managers and would be better served by using the passwords the browser generates and investing that effort into adding 2FA using a code app on their phone or a YubiKey.
  • In the face of new APIs and standards, the process of attempting to manage secrets with an external manager will become exceedingly challenging. It is going to be much, much easier to pick one browser and commit to it everywhere vs attempting to use a tool to inject all these secrets.
  • With the frequency of breaches, we've already accepted that passwords are, at best, part of a complete auth story. The best solution we have right now is "2 passwords".
  • Many of the tools users rely on to manage all their secrets aren't frequently audited or if they are, any security assessment of their stack isn't being published.
  • For more technical users looking to store a lot of secrets for work, using something like pass will likely fulfill that need with a smaller, less complicated and less error-prone technical implementation. It does less, so less stuff can fail.
  • If you are going to use a password manager, there are only two options: 1Password and Bitwarden. 1Password is the only one that doesn't rely exclusively on the user-supplied password, so if you are dealing with very important secrets this is the right option.
  • It is better to tell users "shared credentials are terrible and please only use them if you absolutely have no choice at all" than to set up a giant business-wide tool of shared credentials which are never rotated.

My hope is that with passkeys and the Credential Management API this isn't a forever problem. Users won't be able to export private keys, so nobody is going to be sharing accounts. The Credential Management UI and flow is so easy for developers and users that it becomes the obvious choice for any new service. My suspicion is we'll still be telling users to set up 2FA well after its practical lifespan has ended, but all we're doing is replicating the same flow as the browser password storage.

Like it or not you are gonna start to rely on the browser password manager a lot soon, so might as well get started now.

Wanna send me angry messages? What else is the internet good for! https://c.im/@matdevdug


Upgrading Kubernetes - A Practical Guide

One common question I see on Mastodon and Reddit is "I've inherited a cluster, how do I safely upgrade it". It's surprising that this still isn't a better understood process given the widespread adoption of k8s, but I've had to take over legacy clusters a few times and figured I would write up some of the tips and tricks I've found over the years to make the process easier.

A very common theme to these questions is "the version of Kubernetes is very old, what do I do". Often this question is asked with shame, but don't feel bad. K8s is better at the long-term maintenance story as compared to a few years ago, but is still a massive amount of work to keep upgraded and patched. Organizations start to fall behind almost immediately and teams are hesitant to touch a working cluster to run the upgrades.

NOTE: A lot of this doesn't apply if you are using hosted Kubernetes. In that case, the upgrade process is documented through the provider and is quite a bit less complicated.

How often do I need to upgrade Kubernetes?

This is something people new to Kubernetes seem to miss a lot, so I figured I would touch on it. Unlike a lot of legacy infrastructure projects, k8s moves very quickly in terms of versions. Upgrading can't be treated like switching to a new Linux distro LTS release, you need to plan to do it all the time.

To be fair to the Kubernetes team they've done a lot to help make this process less horrible. They have a support policy of N-2, meaning that the 3 most recent minor versions receive security and bug fixes. So you have time to get a cluster stood up and start the process of planning upgrades, but it needs to be in your initial cluster design document. You cannot wait until you are almost EOL to start thinking "how are we going to upgrade". Every release gets patched for 14 months, which seems like a lot but chances are you aren't going to be installing the absolute latest release.

Current support timeline

So the answer to "how often do you need to be rolling out upgrades to Kubernetes" is often. They are targeting 3 releases a year, down from the previous 4 releases a year. You can read the project's release goals here. However, in order to vet k8s releases for your org, you'll likely need to manage several different versions at the same time in different environments. I typically try to let a minor version "bake" for at least 2 weeks in a dev environment, and the same for stage/sandbox or whatever you call the next step. Prod version upgrades should ideally have a month of good data behind them suggesting the org won't run into problems.

My staggered layout

  1. Dev cluster should be as close to bleeding edge as possible. A lot of this has to do with establishing SLAs for the dev environment, but the internal communication should look something like "we upgrade dev often during such and such a time and rely on it to surface early problems". My experience is you'll often hit some sort of serious issue almost immediately when you try to do this, which is good. You have time to fix it and know the maximum version you can safely upgrade to as of the day of testing.
  2. Staging is typically a minor release behind dev. "Doesn't this mean you can get into a situation where you have incompatible YAMLs?" It can but it is common practice at this point to use per-environment YAMLs. Typically folks are much more cost-aware in dev environments and so some of the resource requests/limits are going to change. If you are looking to implement per-environment configuration check out Kustomize.
  3. Production I try to keep as close to staging as possible. I want to keep my developers lives as easy as possible, so I don't want to split the versions endlessly. My experience with Kubernetes patch releases has been they're pretty conservative with changes and I rarely encounter problems. My release cadence for patches on the same minor version is two weeks in staging and then out to production.
  4. IMPORTANT. Don't upgrade the minor version until it hits patch .2 AT LEAST. What does this mean?

Right now the latest version of Kubernetes is 1.26.0. I don't consider this release ready for a dev release until it hits 1.26.2. Then I start the timer on rolling from dev -> stage -> production. By the time I get the dev upgrade done and roll to staging, we're likely at the .3 release (depending on the time of year).

That's too slow. Maybe, but I've been burned quite a few times in the past by jumping too early. It's nearly impossible for the k8s team to account for every use case and guard against every regression, and by the time we hit .2 there tends to be wide enough testing that most issues have been discovered. A lot of people wait until .5, which is very slow (but also the safest path).

In practice this workflow looks like this:

  • Put in the calendar when releases reach EOL which can be found here.
  • Keep track of the upcoming releases and put them in the calendar as well. You can see that whole list in their repo here.
  • You also need to do this with patch releases, which typically come out monthly.
  • If you prefer to keep track of this in RSS, good news! If you add .atom to the end of the release URL, you can add it to a reader. Example: https://github.com/kubernetes/kubernetes/releases.atom. This makes it pretty easy to keep a list of all releases. You can also just subscribe in GitHub, but I find the RSS method to be a bit easier (plus super simple to script; a minimal sketch follows this list).
  • As new releases come out, roll latest to dev once it hits .2. I typically do this as a new cluster, leaving the old cluster there in case of serious problems. Then I'll cut over deployments to the new cluster and monitor for issues. In case of massive problems, switch back to the old cluster and start the documentation process for what went wrong.
  • When I bump the dev environment, I then circle around and bump the stage environment to one minor release below that. I don't typically do a new cluster for stage (although you certainly can). There's a lot of debate in the k8s community over "should you upgrade existing vs make new". I do it for dev because I would rather upgrade often with fewer checks and have the option to fall back.
  • Finally we bump prod. This I rarely will make a new cluster. This is a matter of personal choice and there are good arguments for starting fresh often, but I like to maintain the history in etcd and I find with proper planning a rolling upgrade is safe.
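Here is the minimal sketch referenced above for watching those releases, using feedparser (the same library the Mastodon bot later in this post relies on). The feed URL is the .atom example from the list; everything else is just a convenience wrapper.

import feedparser

RELEASES_FEED = "https://github.com/kubernetes/kubernetes/releases.atom"

def latest_releases(limit: int = 10) -> list[str]:
    """Return the titles of the most recent Kubernetes releases (typically the tag, e.g. a v1.26.x entry)."""
    feed = feedparser.parse(RELEASES_FEED)
    return [entry.title for entry in feed.entries[:limit]]

if __name__ == "__main__":
    for title in latest_releases():
        print(title)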

This feels like a giant pain in the ass.

I know. Thankfully cloud providers tend to maintain their own versions, which buys you a lot more time, and that is typically how people are going to be running Kubernetes anyway. But I know a lot of people like to run their own clusters end to end, or just need to for various reasons. It is, however, a pain to do this all the time.

Is there an LTS version?

So there was a Kubernetes working group set up to discuss this and their conclusion was it didn't make sense to do. I don't agree with this assessment but it has been discussed.

My dream for Kubernetes would be to add a 2 year LTS version and say "at the end of two years there isn't a path to upgrade". I make a new cluster with the LTS version, push new patches as they come out and then at the end of two years know I need to make a new cluster with the new LTS version. Maybe the community comes up with some happy path to upgrade, but logistically it would be easier to plan a new cluster every 2 years vs a somewhat constant pace of pushing out and testing upgrades.

How do I upgrade Kubernetes?

  1. See if you can upgrade safely against API paths. I use Pluto. This will check to see if you are calling deprecated or removed API paths in your configuration or helm charts. Run Pluto against local files with: pluto detect-files -d. You can also check Helm with: pluto detect-helm -owide. Adding all of this to CI is also pretty trivial and something I recommend for people managing many clusters.

2. Check your Helm releases for upgrades. Since typically things like the CNI and other dependencies like CoreDNS are installed with Helm, this is often the fastest way to make sure you are running the latest version (check patch notes to ensure they support the version you are targeting). I use Nova for this.

3. Get a snapshot of etcd. You'll want to make sure you have a copy of the data in your production cluster in the case of a loss of all master nodes. You should be doing this anyway.

4. Start the upgrade process. The steps to do this are outlined here.

If you are using managed Kubernetes

This process is much easier. Follow 1 + 2, set a pod disruption budget to allow for node upgrades and then follow the upgrade steps of your managed provider.

I messed up and waited too long, what do I do?

Don't feel bad, it happens ALL the time. Kubernetes is often set up by a team that is passionate about it, then that team is disbanded and maintenance becomes a secondary concern. Folks who inherit working clusters are (understandably) hesitant to break something that is working.

With k8s you need to go from minor -> minor in order, not jumping releases. So you need to basically (slowly) bump versions as you go. If you don't want to do that, your other option is to make a new cluster and migrate to it. I find for solo operators or small teams the upgrade path is typically easier but more time consuming.

The big things you need to anticipate are as follows:

  • Ingress. You need to really understand how traffic is coming into the cluster and through what systems.
  • Service mesh. Are you using one, what does it do and what version is it set at? Istio can be a BEAR to upgrade, so if you can switch to Linkerd you'll likely be much happier in the long term. However understanding what controls access to what namespaces and pods is critical to a happy upgrade.
  • CSI drivers. Do you have them, do they need to be upgraded, what are they doing?
  • CNI. Which one are you using, is it still supported, what is involved in upgrading it.
  • Certificates. By default they expire after a year. You get fresh ones with every upgrade, but you can also trigger a manual refresh whenever you like with kubeadm certs renew. If you are running an old cluster, PLEASE check the expiration dates of your client certificates right now with: kubeadm certs check-expiration.
  • Do you have stateful deployments? Are they storing something, where are they storing it and how do you manage them? This would be databases, Redis, message queues, applications that hold state. These are often the hardest to move or interact with during an upgrade. You can review the options for moving those here. The biggest thing is to set a pod disruption budget so that there is some minimum available during the upgrade process, as shown here (a minimal sketch follows this list).
  • Are you upgrading etcd? Etcd supports restoring from snapshots that are taken from an etcd process of the same major.minor version, so be aware if you are going to be jumping more than a patch. Restoring might not be an option.
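For the pod disruption budget mentioned above, here is a minimal sketch using the official Kubernetes Python client; it's the equivalent of a few lines of YAML. The Deployment label app=my-db and the namespace are hypothetical, and it assumes a client version that exposes the policy/v1 API.

from kubernetes import client, config

def ensure_pdb() -> None:
    """Create a PodDisruptionBudget so at least one replica stays up while nodes are drained."""
    config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
    pdb = client.V1PodDisruptionBudget(
        metadata=client.V1ObjectMeta(name="my-db-pdb", namespace="default"),
        spec=client.V1PodDisruptionBudgetSpec(
            min_available=1,  # keep at least one pod available during the upgrade
            selector=client.V1LabelSelector(match_labels={"app": "my-db"}),
        ),
    )
    client.PolicyV1Api().create_namespaced_pod_disruption_budget(namespace="default", body=pdb)

if __name__ == "__main__":
    ensure_pdb()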

Otherwise follow the steps above along with the official guide and you should be ok. The good news is once you bite the bullet and do it once up to a current version, maintenance is easier. The bad news is the initial EOL -> Supported path is soul-sucking and incredibly nerve-racking. I'm sorry.

I'm running a version older than 1.21 (January 2023)

So you need to do all the steps shown above to check that you can upgrade, but my guiding rule is if the version is more than 2 EOL versions ago, it's often easier to make a new cluster. You CAN still upgrade, but typically this means nodes have been running for a long time and are likely due for OS upgrades anyway. You'll likely have a more positive experience standing up a new cluster and slowly migrating over.

You'll start with fresh certificates, helm charts, node OS versions and everything else. Switching over at the load balancer level shouldn't be too bad and it can be a good opportunity to review permissions and access controls to ensure you are following the best procedures.

I hate that advice

I know. It's not my favorite thing to tell people. I'm sorry. I don't make the rules.

Note on Node OS choices

A common trend I see in organizations is to select whatever Linux distro they use for VMs as their Node OS: Debian, Ubuntu, Rocky, etc. I don't recommend this. You shouldn't think of Nodes as VMs that you SSH into on a regular basis and do things in. They're just platforms to run k8s on. I've had a lot of success with Flatcar Linux here. Upgrading the nodes is as easy as rebooting, and you can easily define things like SSH with a nice configuration system shown here.

With a Node OS I would much rather get security updates more quickly and know that I have to reboot the node on a regular basis, as opposed to keeping track of traditional package upgrades and the EOL for different Linux distros, then tracking whether reboots are required. Often folks will combine Flatcar Linux with Rancher Kubernetes Engine for a super simple and reliable k8s standup process. You can see more about that here. This is a GREAT option if you are making a new cluster and want to make your life as easy as possible in the future. Check out those docs here.

If you are going to use a traditional OS, check out kured. It allows you to monitor the reboot-required flag at /var/run/reboot-required and schedule automatic cordoning, draining and uncordoning of the node. It also ensures only one node is touched at a time. This is the thing almost everyone forgets to do with Kubernetes: maintain the Node itself.

Conclusion

I hope this was helpful. The process of keeping Kubernetes upgraded is less terrible the more often you do it, but the key thing is to give each minor release as much time to bake in your environments as you can. If you stay on a regular schedule, the process of upgrading clusters is pretty painless and idiot-proof as long as you do some checking.

If you are reading this and think "I really want to run my own cluster but this seems like a giant nightmare" I strongly recommend checking out Rancher Kubernetes Engine with Flatcar Linux. It's tooling designed to be idiot-proof and can be easily run by a single operator or a pair. If you want to stick with kubeadm it is doable, but requires more work.

Stuck? Think I missed something obvious? Hit me up here: https://c.im/@matdevdug


Make a Mastodon Bot on AWS Free Tier

With the recent exodus from Twitter due to Elon being a deranged sociopath, many folks have found themselves moving over to Mastodon. I won't go into Mastodon except to say I've moved over there as well (@matdevdug@c.im) and have really enjoyed myself. It's a super nice community and I have a lot of hope for the ActivityPub model.

However, when I got on Mastodon I found a lot of abandoned bot accounts. These accounts, for folks who don't know, tend to do things like scrape RSS feeds and pump that information into Twitter so you can have everything in one pane of glass. Finding a derelict Ars Technica bot, I figured why not take this opportunity to make a bot of my own. While this would be very easy to do with SQLite, I wanted it to be an AWS Lambda so it wouldn't rely on some Raspberry Pi being functional (or me remembering that it was running on some instance and then accidentally terminating it because I love to delete servers).

Criteria for the project

  • Pretty idiot-proof
  • Runs entirely within the free tier of AWS
  • Set and forget

Step 1 - DynamoDB

I've never used DynamoDB before, so I figured this could be a fun challenge. I'm still not entirely sure I used it correctly. To be honest, I ran into more problems than I was expecting given its reputation as an idiot-proof database.

You can see the simple table structure I made here.

Some things to keep in mind. Because of how DynamoDB stores numbers, the type of the number is Decimal, not int or float. This can cause some strange errors when attempting to store and retrieve ID values. You can read the conversation about it here. I ended up storing the ID as a string, which is probably not optimal for performance but did make the error go away.

When using DynamoDB, it is vital not to use scan. Query is what I ended up using for all my requests, since then I get to make lookups on my secondary tables with the key. The difference in speed during load testing, when I generated a lot of fake URLs, was pretty dramatic: hundreds of milliseconds vs tens of seconds.

Source
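To illustrate the gap, here is a minimal boto3 sketch (the table name rss_entries and the key entry_id are hypothetical, not the bot's actual schema): the query goes straight to the partition key, while the scan reads every item and filters afterwards.

import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("rss_entries")  # hypothetical table name

# Query: hits the partition key directly; note the ID stored as a string to
# sidestep DynamoDB's Decimal handling of numbers.
fast = table.query(KeyConditionExpression=Key("entry_id").eq("12345"))

# Scan: walks the entire table and filters afterwards; avoid on anything non-trivial.
slow = table.scan(FilterExpression=Attr("entry_id").eq("12345"))

print(fast["Count"], slow["Count"])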

Now that I've spent some time playing around with DynamoDB, I do see the appeal. The free tier is surprisingly generous. I've allocated 5 provisioned Read and Write units, but honestly the bot needs a tiny fraction of that.

Step 2 - Write the Lambda

You can see my Python lambda here.

NOTE: This is not production-grade Python. This is hobby-level Python. Were this a work project I would have changed some things about its design. Before you ping me about collisions: I calculate that, with that big a range of random IDs to pull from, it would take roughly 6,000 years of work to have a 1% probability of at least one collision. So please, for the love of all things holy, don't ping me. Wanna use UUIDs? Go for it.
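If you just want the shape of the thing before reading the real source linked above, here is a heavily trimmed, hypothetical sketch of such a handler. The feed URL, the rss_entries table and the entry_id key are placeholders, and error handling is omitted.

import os

import boto3
import feedparser
from boto3.dynamodb.conditions import Key
from dotenv import load_dotenv
from mastodon import Mastodon

load_dotenv()  # pulls CLIENT_ID / CLIENT_SECRET / ACCESS_TOKEN from the bundled .env

rss_feed = "https://example.com/feed.xml"  # placeholder: the feed you want to mirror
table = boto3.resource("dynamodb").Table("rss_entries")  # placeholder table
mastodon = Mastodon(
    client_id=os.environ["CLIENT_ID"],
    client_secret=os.environ["CLIENT_SECRET"],
    access_token=os.environ["ACCESS_TOKEN"],
    api_base_url="https://c.im",
)

def lambda_handler(event, context):
    """Post any feed entries we haven't seen before, then remember them."""
    for entry in feedparser.parse(rss_feed).entries:
        seen = table.query(KeyConditionExpression=Key("entry_id").eq(entry.id))
        if seen["Count"]:
            continue  # already posted this one
        mastodon.status_post(f"{entry.title}\n{entry.link}")
        table.put_item(Item={"entry_id": entry.id})
    return {"statusCode": 200}

It mirrors the three dependencies installed in the packaging steps below: feedparser, Mastodon.py and python-dotenv.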

For those who haven't deployed to AWS Lambda before, it's pretty easy.

  • Make sure you have Python 3.9 installed (since AWS doesn't support 3.10)
  • Copy that snippet to a directory and call it lambda_function.py
  • Change the rss_feed = to be whatever feed you want to make a bot of.
  • run python3.9 -m venv venv
  • run source venv/bin/activate
  • Then you need to install the dependencies:
    - pip install --target ./package feedparser
    - pip install --target ./package Mastodon.py
    - pip install --target ./package python-dotenv
  • You'll want to cd into the package directory and then run zip -r ../my-deployment-package.zip . to bundle the dependencies together.
  • Finally take the actual python file you want to run and copy it into the zip directory. zip my-deployment-package.zip lambda_function.py

You can also use serverless or AWS SAM to do all of this, but I find the ZIP file is pretty idiot-proof. Then you just upload it through the AWS web interface, but hold off on doing that. Now that we have the Python environment set up we can generate the credentials.

Step 3 - Mastodon Credentials

Now, back in the Python virtual environment we made before, in the same directory:

  1. Run source venv/bin/activate
  2. Start the Python 3.9 REPL
  3. Run from mastodon import Mastodon
  4. Run: Mastodon.create_app('your-app-name', scopes=['read', 'write'], api_base_url="https://c.im") (note I'm using c.im but you can use any server you normally use)
  5. Follow the steps outlined here.
  6. You'll get back three values by the end: CLIENT_ID and CLIENT_SECRET from when you registered the bot with the server, and finally an ACCESS_TOKEN after you make an account for the bot and pass in its email/password. (A scripted sketch of this flow follows below.)

7. Copy these values to a .env file in the same directory as the lambda_function.py file from before.

CLIENT_ID=cff45dc4cdae1bd4342079c83155ce0a001a030739aa49ab45038cd2dd739ce
CLIENT_SECRET=d228d1b0571f880c0dc865522855a07a3f31f1dbd95ad81d34163e99fee
ACCESS_TOKEN=Ihisuhdiuhdsifh-OIJosdfgojsdu-RUhVgx6zCows
Example of the .env file alongside lambda_function.py

8. Run: zip my-deployment-package.zip .env to copy the secret into the zip directory.

You can also store them as environment variables in the Lambda, but I prefer to manage them like this. Make sure the .env file is not committed to your git repo.
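If you would rather script steps 3 through 6 than type them into the REPL, a minimal sketch of the same flow looks roughly like this (it assumes Mastodon.py's create_app/log_in behavior and uses made-up bot credentials):

from mastodon import Mastodon

# Register the app with the server; returns (client_id, client_secret).
client_id, client_secret = Mastodon.create_app(
    "your-app-name",
    scopes=["read", "write"],
    api_base_url="https://c.im",
)

# Log in as the bot account you created to obtain the access token.
api = Mastodon(client_id=client_id, client_secret=client_secret, api_base_url="https://c.im")
access_token = api.log_in("bot-account@example.com", "bot-password", scopes=["read", "write"])

# Write the three values into the .env file the Lambda expects.
with open(".env", "w") as env:
    env.write(f"CLIENT_ID={client_id}\n")
    env.write(f"CLIENT_SECRET={client_secret}\n")
    env.write(f"ACCESS_TOKEN={access_token}\n")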

Step 4 - Deploy

  1. Make a new AWS Lambda function with whatever name and ensure it has the ability to access our DynamoDB table. You can get instructions on how to do that here.
  2. Upload the ZIP by just uploading it through the web interface. It's 2 MB total so should be fine.
  3. Set up an EventBridge cron job to trigger the lambda by following the instructions here.
  4. Watch as your Lambda triggers on a regular interval.

Step 5 - Cleanup

  1. Inside of the Mastodon bot account there are a few things you'll want to check. First, you want to make sure that the following two options are selected under "Profile".

2. You'll probably want to add an alert for failures under CloudWatch Alarms. AWS has docs on how to do that here.

Conclusion

Hopefully this is a fun way of adding a simple bot to Mastodon. I've had a lot of fun interacting with the Mastodon.py library. You can see the bot I ended up making here.

If you run into problems please let me know: https://c.im/@matdevdug


TIL: Fix missing certificates in Python 3.6 and up on macOS

Imagine my surprise when, writing a simple Python script on my Mac, I suddenly got SSL errors on every urllib request over HTTPS. I checked the site certificate; it looked good. I even confirmed in the Apple help documentation that the CAs for this certificate were included (in this case the Amazon certificates). I was really baffled about what to do until I stumbled across this.

This package includes its own private copy of OpenSSL 1.1.1. The trust certificates in system and user keychains managed by the Keychain Access application and the security command line utility are not used as defaults by the Python ssl module. A sample command script is included in /Applications/Python 3.11 to install a curated bundle of default root certificates from the third-party certifi package (https://pypi.org/project/certifi/). Double-click on Install Certificates to run it.
Link

Apparently, starting with Python 3.6, the macOS installers stopped relying on Apple's OpenSSL and started bundling their own copy, which ships without certificates. The way this manifests is:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>

The fix is to run the following from the terminal (replace 3.10 with whatever version you are running):

/Applications/Python\ 3.10/Install\ Certificates.command

This will install the certifi package, which has all the Mozilla certificates. This solved the problem and hopefully will help you in the future. It's a really weird choice by the Mac Python team, since it basically breaks HTTPS in Python out of the box.
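If you would rather not (or can't) run that installer script, a per-script workaround is to build an SSL context on top of certifi's bundle yourself; a minimal sketch:

import ssl
import urllib.request

import certifi  # pip install certifi

# Use certifi's Mozilla CA bundle instead of the empty default store shipped
# with the python.org macOS installer.
context = ssl.create_default_context(cafile=certifi.where())

with urllib.request.urlopen("https://example.com", context=context) as response:
    print(response.status)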


You can finally delete Docker Desktop

Podman Desktop is here and it works great. When Docker changed their license to the following, it was widely understood that its time as the default local developer tool was coming to an end.

Docker Desktop remains free for small businesses (fewer than 250 employees AND less than $10 million in annual revenue), personal use, education, and non-commercial open source projects.
Hope you never get acquired I guess

Podman, already in many respects the superior product, didn't initially have a one-to-one replacement for Docker Desktop, the commonly used local development engine. Now it does, and it works amazingly well. It works with your existing Dockerfiles, has all the Kubernetes functionality and even allows you to use multiple container engines (like Docker) at the same time.

I'm shocked how good it is for a pre-1.0 release, but for anyone out there installing Docker Desktop at work, stop and use this instead. Download it here.


Passkeys as a tool for user retention

With the release of iOS 16 and macOS Ventura, we are now in the age of passkeys. This is happening through WebAuthn, a specification written by the W3C and FIDO with the involvement of all of the major vendors such as Google, Mozilla, etc. The basic premise is familiar to anyone who has used SSH in their career: you log in through the distribution of public keys, keeping the private key on the device.

Like all security initiatives, someone took the problem of "users sometimes reuse passwords" and decided the correct solution is to involve biometric security when I log into my coffee shop website to put my order in. I disagree with the basic premise of the standard on a personal level. I think it probably would have been fine to implement an API where the service simply requested a random password and then stored it in the existing browser password sync. It would have solved the first problem of password reuse, allowed for export and wouldn't have required a total change in how all authentication works. However, I am not the kind of person who writes these whitepapers full of math, so I defer to the experts.

How WebAuthn works is that the server receives a public key and a randomly generated ID from the user. The corresponding private key is distributed and stored through the client vendor's sync process, meaning it is available on different devices as long as those devices exist within the same vendor ecosystem. This stuck out to me as a fascinating transition for passwords and one with long-term implications for user retention across the different platforms.

Imagine being Apple or Google and getting to tell users "if you abandon this platform, you are going to lose every login you have". This is a tremendous threat, and since all platforms will effectively be able to make it at the same time, not a legal one. Let's get into the details of what WebAuthn is and how it works, then walk through how this provides tremendous value to platform holders as a way of locking in users.

WebAuthn Properties

WebAuthn is a web-based API that allows web servers, called Relying Parties, to communicate with authenticators on a user's device. I'll refer to these as RPs from now on. To get started, the server creates new credentials by calling navigator.credentials.create() on the client.

const credential = await navigator.credentials.create({
    publicKey: publicKeyCredentialCreationOptions
});

This object has a number of properties (a sketch of a complete options payload follows the list).

  • challenge: a buffer of random bytes generated by the server to prevent replay attacks.
  • rp: basically the website; needs to be a subset of the domain currently in the browser.
  • user: information about the user. It's suggested not to use PII, but even if you use name and displayName it doesn't appear that this is ever relayed to the RP (source).
  • pubKeyCredParams: which public-key algorithms are acceptable to the server.
  • authenticatorSelection: do you want anything to be allowed to act as an authenticator, or do you want only a cross-platform authenticator like a YubiKey.
  • timeout: self-documenting.
  • attestation: information from the authenticator that could be used to track users.
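As a rough illustration of how those properties fit together, here is what the server-side payload can look like before it is serialized, sent to the client and handed to navigator.credentials.create(). Every value here is hypothetical, and a real server would generate the challenge and user ID fresh for each registration.

import base64
import os

def b64url(data: bytes) -> str:
    """base64url-encode binary fields for transport; the client turns them back into buffers."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# Hypothetical registration options for an RP at example.com.
public_key_credential_creation_options = {
    "challenge": b64url(os.urandom(32)),       # random bytes to prevent replay
    "rp": {"id": "example.com", "name": "Example"},
    "user": {
        "id": b64url(os.urandom(16)),          # random handle, deliberately not PII
        "name": "user@example.com",
        "displayName": "Example User",
    },
    "pubKeyCredParams": [
        {"type": "public-key", "alg": -7},     # ES256
        {"type": "public-key", "alg": -257},   # RS256
    ],
    "authenticatorSelection": {"authenticatorAttachment": "platform"},
    "timeout": 60000,                          # milliseconds
    "attestation": "none",                     # the default tier described below
}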

Attestation

What you are getting back as the service is the attestation statement, which is a somewhat vague concept. It's a signed data object that includes info about the public key and some other pieces of information. You can see the generic object below

Source

This part is kind of interesting. There are actually 4 tiers of "information you get back about the user".

  • none: you don't want anything; this is the default.
  • indirect: the client is allowed to decide how to obtain such a statement. The client may replace an authenticator-generated statement with one generated by an Anonymization CA.
  • direct: you want the statement
  • enterprise: you want the statement that may include uniquely identifying information. The authenticator needs to be configured to say "this rp ID is allowed to request this information", so presumably this will be for devices enrolled in some sort of MDM.

You get back an attestationObject at the end of all of this, which basically allows you to parse metadata about the registration event as well as the public key, along with the fmt of the attestation, which is effectively the type of authenticator you used. Finally, you have the statement itself.

When a user wants to sign in, the process works pretty simply. We call navigator.credentials.get() on the client, which basically says: go get me the credentials I specify in allowCredentials. You can say what the ID is and how to get the credentials (through USB, Bluetooth, NFC, etc.).

For further code samples and a great walkthrough I recommend: https://webauthn.guide/

The key to how this all works is that the private key is synced by the vendor to the different devices (allowing for portability), and the spec also allows for phone delegation. So, for instance, if you are a Windows Chrome user and want to sign in using passkeys, you can, as long as you still have the Apple device that you originally created the passkey on.

Good diagram if you are using the Okta service, but still valuable either way

Passkey cross-device flow

  1. The vendor supplies a "Passkey from nearby devices" option when the user is logging in
  2. The web site displays a QR code
  3. The device which contains the passkey points its camera and starts an auth flow
  4. The two devices perform a Bluetooth handshake to ensure they are actually near each other and agree on a server to use as an exchange post
  5. The device with the passkey is used to perform the actual auth process.
  6. Now, in theory, the service should offer some sort of "would you like to make a new passkey on this device" prompt.

At no point did you transfer the private key anywhere. That entire sync process is controlled by the vendor, meaning your only option for portable authentication is going to be a roaming authenticator (aka a YubiKey).

It's important to note here that a lot of assumptions have been made by developers around the requirement that private keys be stored locally. This isn't necessarily the case. Authenticators have the option of not storing their private keys locally at all. You have the option of instead storing the private keys with the RP, encrypted under a symmetric key held by the authenticator. This elasticity in the spec comes up a lot, with many decisions deferred to the vendors.

Why Does This Matter?

WebAuthn has taken a strong "decentralized" design philosophy, which makes sense until you realize that the inability to export private keys isn't really true. Basically, by submitting an attestation, the vendor is saying "these devices' private keys cannot be stolen". You can see the initial conversation on GitHub here. It's making the problem someone else's issue.

By saying, in essence, that portability of private keys is not a concern of the spec and leaving it entirely in the hands of vendors, we have created one of the greatest opportunities for user lock-in in recent history. It is now on the individual services and the vendors to allow users to seamlessly sign in using different devices. The degree to which platform owners want to allow these devices to work with each other is entirely within their own control.

We can see that vendors understand this to some extent. Apple has announced that once passkeys are supported in the OS, device-bound keys will no longer be supported. You will not have the option of not using iCloud (or Chrome Sync services). Administrators will likely not love the idea of critical keys being synced to devices possibly beyond their control (although the enterprise attestation provides some protection against this). So already in these early days we see a lot of work being done to ensure a good user experience, but at the cost of increased vendor buy-in.

Scenario

You are a non-technical user who uses their iPhone in the normal way. When presented with a login, you let the default of "use passkeys" ride, without doing anything special. You lose your phone, but don't own any other Apple products. By default, iCloud Keychain only works on iOS, iPadOS and macOS. In order to seamlessly log into any service that you registered through your phone with passkeys, you have to purchase another Apple product.

If you attempt to switch to Android, which also supports passkeys, it is between you and the RP how access to your original account will be restored. Since allowing users to reset their passkeys through an email link eliminates a lot of the value of said service, I'm not exactly sure how this is going to be handled. Presumably there will need to be some other out-of-band login.

Also remember that from the RPs side, what you are getting is almost no information about the user. You aren't even (confidently) getting what kind of authenticator. This is great from a GDPR perspective, not having to hold email addresses and names and all sorts of other PII in your database (and does greatly eliminate many of the security concerns around databases). However if I am a web developer who goes "all-in" with this platform, it's hard to see how at some point I'm not going to fall back to "email this user a link to perform some action" and require the email address for account registration.

On the RP side they'll need to: verify this is the right user (hopefully they got some other ID in the registration flow), remove the passkey and then have the user enroll again. This isn't a terrible flow, but it is quite a bit more complicated than the story today: log in again through SSO or reset a password by sending a reset link to email.

What about Chrome?

Chrome supports the standard, but not in a cross-platform way.

Passkeys are synchronized across devices that are part of the same ecosystem. For example, if a user creates a passkey on Android, it is available on all Android devices as long as the user is signed in to the same Google account. However, the same passkey is not available on iOS, macOS or Windows, even if you are using the same browser, like Chrome.
Source

Ultimately it is the hardware you are generating the key on that matters for vendor syncing, not the browser or OS.

The result of this is going to be pretty basic: as the years go on, the inertia required to switch platforms will increase as more services are added as passkeys. There exists no way currently that I'm able to find that would allow you to either add a secondary device to the exchange process or bulk transfer out of Vendor A and into Vendor B. Instead, any user who wants to switch services will need to go through the process of re-enrolling in each service with a new user ID, presumably hoping that email was captured as part of the sign-up flow so that the website or app can tie the two values together.

There is a proposed spec that would allow for dual enrollment, but only from the start, meaning you would need to have your iOS authenticator and then attach your Chromebook authenticator to it from the beginning. There is no way to go back through and re-sync all logins with the new device, and you would need constant access to both devices to complete the gesture element of the auth.

Yubico proposal

You can read that proposal here.

Yubico has an interesting idea here based on ARKG, or Asynchronous Remote Key Generation. The basic idea is that you have a primary authenticator and a backup authenticator with no data transfer between the two. The proposed flow looks as follows:

  • The backup device generates a private-public key pair and transfers the public key to the primary authenticator
  • This is used by the primary authenticator to derive new public keys on behalf of the backup device
  • The primary then generates a new pair for each backup device registered and sends this on to the RP along with its primary key.
  • If the primary disappears, the backup device can request the cred from the RP and use it to derive the key used. In order to retrieve the cred associated with a user, there needs to be some sort of identifier outside of the spec's user ID, which is a random value not surfaced to the user.

For more math information on the spec itself check this paper out.

ARKG functionality. ARKG allows arbitrary public keys pk′ to be derived from an original pk, with the corresponding sk′ being calculated at a later time, requiring the private key sk for the key pair (sk, pk) and the credential cred.

Definition 3.1 (ARKG). The remote key generation and recovery scheme ARKG = (Setup, KGen, DerivePK, DeriveSK, Check) consists of the following algorithms:

  • Setup(1^λ) generates and outputs public parameters pp = ((G, g, q), MAC, KDF1, KDF2) of the scheme for the security parameter λ ∈ N.
  • KGen(pp), on input pp, computes and returns a private-public key pair (sk, pk).
  • DerivePK(pp, pk, aux) probabilistically returns a new public key pk′ together with the link cred between pk and pk′, for the inputs pp, pk and auxiliary data aux. The input aux is always required but may be empty.
  • DeriveSK(pp, sk, cred) computes and outputs either the new private key sk′, corresponding to the public key pk′ using cred, or ⊥ on error.
  • Check(pp, sk′, pk′), on input (sk′, pk′), returns 1 if (sk′, pk′) forms a valid private-public key pair, where sk′ is the private key corresponding to the public key pk′, and 0 otherwise.

Correctness. An ARKG scheme is correct if, for all λ ∈ N and pp ← Setup(1^λ), Pr[Check(pp, sk′, pk′) = 1] = 1 whenever (sk, pk) ← KGen(pp), (pk′, cred) ← DerivePK(pp, pk, ·), and sk′ ← DeriveSK(pp, sk, cred).
Look at all those fun characters.
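To make that a bit more concrete, here is a toy sketch in Node of the flow those algorithms describe, loosely following the discrete-log style construction the parameters above suggest. The group parameters are deliberately tiny and completely insecure, and the function names are mine rather than anything from the spec; this is only meant to show how pk′ and sk′ end up matching.

// Toy ARKG sketch. Insecure parameters, illustration only.
const crypto = require('crypto');

// Toy group: subgroup of order q = 11 in Z*_23, generated by g = 4.
const p = 23n, q = 11n, g = 4n;

const modpow = (base, exp, mod) => {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
};

const randScalar = () =>
  (BigInt('0x' + crypto.randomBytes(8).toString('hex')) % (q - 1n)) + 1n;
const kdf = (label, x) =>
  BigInt('0x' + crypto.createHash('sha256').update(label + x.toString()).digest('hex'));
const mac = (key, msg) =>
  crypto.createHmac('sha256', key.toString()).update(msg).digest('hex');

// KGen: the backup authenticator creates its long-term pair once and
// hands only pk to the primary authenticator.
const kgen = () => {
  const sk = randScalar();
  return { sk, pk: modpow(g, sk, p) };
};

// DerivePK: the primary derives a fresh public key plus a cred for the RP,
// using nothing but the backup's public key.
const derivePK = (pk) => {
  const e = randScalar();
  const E = modpow(g, e, p);          // ephemeral public value
  const shared = modpow(pk, e, p);    // DH shared secret with the backup
  const ck = kdf('ck', shared) % q;   // scalar tweak ("credential key")
  const mk = kdf('mk', shared);       // MAC key
  const pkPrime = (pk * modpow(g, ck, p)) % p;
  return { pkPrime, cred: { E, tag: mac(mk, E.toString()) } };
};

// DeriveSK: much later, the backup recovers the matching private key from
// the cred it fetched from the RP.
const deriveSK = (sk, cred) => {
  const shared = modpow(cred.E, sk, p);
  if (mac(kdf('mk', shared), cred.E.toString()) !== cred.tag) throw new Error('bad cred');
  return (sk + (kdf('ck', shared) % q)) % q;
};

// Check: g^sk' should equal pk'.
const { sk, pk } = kgen();
const { pkPrime, cred } = derivePK(pk);
console.log(modpow(g, deriveSK(sk, cred), p) === pkPrime); // true

The important property is visible in the last line: the primary derived pkPrime without ever seeing sk, and the backup later recovered the matching skPrime using only its own sk and the cred stored at the RP.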

Challenges

The WebAuthn spec presents a massive leap forward for security. There's no disputing that. Not only does it greatly reduce the amount of personal information flowing around the auth flow, it also breaks the reliance on email addresses or phone numbers as sources of truth. The backbone of the protocol is a well-understood handshake process that has been used and vetted for years.

However the spec still has a lot of usability challenges that need to be addressed, especially as adoption speeds up.

Here are the ones I see in no particular order:

  • Users and administrators will need to understand and accept that credentials are backed up and synced across unknown devices employing varying levels of biometric security.
  • Core to the concept of WebAuthn is the idea of unlinkability. That means different keys must be used for every new account at the RP. Transferring or combining accounts is a problem for the RP, which will require some planning on the part of service providers.
  • In order to use this feature, services like iCloud Sync have to be turned on, and the security of that vendor account becomes the primary security of the entire collection of credentials. This is great for consumers, less great for other systems.
  • There currently exists no concept of delegation. Imagine I need to provide you with some subset of access which I can delegate, grant and revoke permissions, etc. There is an interesting paper on the idea of delegation which you can find here.
  • Consumers should be aware of the level of platform buy-in they're committing to. Users acclimated to Chromebooks and Chrome on Windows being mostly interchangeable should be made aware that this login is now tied to a class of hardware.
  • We need some sort of "vendor exchange" process. This will be a challenge since part of the spec is that the authenticator can include information about itself (if the RP asks for it), so there's no reason to think a service that registered an account against one type of authenticator will accept another one. Presumably, since the default is to share no attestation information, an exchange would mostly work across different vendors.
  • The vendor sync needs to extend outside of OEMs. If I use iCloud passkeys for years and then enroll in 1Password, there's no reason why I shouldn't be able to move everything to that platform. I understand not exposing the keys to users directly (although I have some platform-ownership questions there, like "isn't it my passkey"), but some sort of certified exchange is a requirement, and one that should have been taken care of before the standard was launched.

Conclusion

This is a giant leap forward in security for average users. It is also, as currently implemented, one of the most effective platform lock-ins I've ever seen. Forget "green text" vs "blue text": as the years go on and users rely more and more on passkeys for logins (which they should), switching platforms entirely will go from "a few days of work" to potentially needing to reach out and attempt to undo every single one of these logins and re-enroll. For folks who keep their original devices, or at least a device in the ecosystem, this is mostly just time-consuming.

For users who don't, which will be a non-trivial percentage (why would a non-technical user keep their iPhone around and not sell it if they have a brand new Android?), this is going to be an immense time commitment. This all assumes a high degree of adoption of the standard, but I have trouble imagining web developers won't want to use it. It is simply a better, more secure system that shifts a lot of the security burden off of them.


Stop Optimizing for Tutorials

Can't believe that didn't work as advertised. 

It always starts the same way. A demo is shown where a previously complex problem set is reduced down to running one Magic Tool. It is often endorsed by one of the Big Tech Companies or maintained by them. Some members of the organization start calling for the immediate adoption of this technology.  Overnight this new tool becomes a shorthand for "solution to all of our problems".

By the time it gets to the C-level, this tool is surrounded by tons of free press. You'll see articles like "X technology is deprecated, it's time to get onboard with Magic Tool". Those of you new to working in tech probably assume some due-diligence was done to see if the Magic Tool was good or secure or actually a good fit for that particular business. You would be incorrect.

Soon an internal initiative is launched. Some executive decides to make it an important project and teams are told they have to adopt the Magic Tool. This is typically where the cracks first start to show, when teams who are relatively apathetic towards the Magic Tool start to use it. "This doesn't do all the things our current stuff does and accounting for the edge cases is a lot more difficult than we were told."

Sometimes the team attempting to roll out this new tech can solve for the edge cases. As time drags on though, the new Magic Tool starts to look as complicated and full of sharp edges as their current tooling. At the end of all this work and discussion, organizations are lucky to see a 2-5% gain in productivity and nothing like the quantum leap forward implied by the advocates for Magic Tool.

Why is this such a common pattern?

We are obsessed with the idea that writing software shouldn't be this hard. None of us really know what we're doing when we start. Your credentials for this job are that you learned, at school or at home, how to make small things by yourself. Birdhouses of Java, maybe a few potato clocks of Python. Then you pass a series of interviews where people ask you random questions mostly unrelated to your job. "How would you make a birdhouse that holds 100,000 birds?" Finally you start working and are asked to construct a fighter jet with 800 other people. The documentation you get when you start describes how to make an F-150 pickup.

So when we see a tutorial or demo which seems to just do the right thing all the time, it lights a fire in people. Yes, that's how this was always supposed to be. The issue wasn't me or my coworkers, the problem was our technology was wrong. The clouds part, the stars align, everything makes sense again. However, that's almost always wrong. Large technology changes that touch many teams and systems are never going to be easy and they're never going to just work. To be clear, the technology might still be superior, but easy is a different beast.

I'll present three examples of good technologies whose user experience for long-term users is worse because of a desire to keep onboarding as simple as possible, where the enthusiasm for adoption ultimately hurts the daily usage of these tools: Docker, Kubernetes and Node. All are technologies wildly loved in the community whose features are compelling, but which hide the true complexity of adoption behind fast and simple tutorials.

To be clear I'm not blaming the maintainers of this software. What I'm trying to suggest is we need to do a better job of frontloading complexity, letting people know the scope and depth of a thing before we attempt to sell it to the industry at large. Often we get into these hype bubbles where this technology is presented as "inevitable", only to have intense disappointment when the reality sinks in.

Your app shouldn't be root inside of a container

Nowhere, I think, is this more clear than in how we have failed the container community on basic security. Job after job, meetup after meetup, I encounter folks who are simply unaware that the default model for how we build containers is the wrong way to do it. Applications in containers should not be root by default.

But what does root in a container even mean? It's actually not as simple as it sounds. A lot of work has happened behind the scenes to preserve the simple user experience and hide this complexity from you. Let's talk about how Docker went at this problem and how Podman attempts to fix it.

However you need to internalize this. If you are running a stock configured container with a root user and someone escapes that container, you are in serious trouble. Kubernetes or no Kubernetes.

In the off-chance you aren't familiar with how containers work.

Docker

When Docker launched and started to become popular, a lot of eyebrows were raised at it running as root on the host. The explanation made sense though, for two reasons: mounting directories without restriction inside of the container and binding to ports below 1024 on Linux. For those that don't know, any port below 1024 on Linux is a privileged port. Running the daemon as root is no longer a requirement of Docker, but the same restrictions still apply to port binding and directory mounting, along with host networking and a bevy of other features.
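If you want to see the privileged-port restriction for yourself, a tiny Node server is enough. Run as a regular user on a stock Linux host (this is just my own illustration, nothing Docker-specific), it fails on port 80 but works fine on anything above 1023:

// Run as an unprivileged user: binding to 80 fails with EACCES,
// while a port above 1023 (say 8080) works fine.
const http = require('http');
const server = http.createServer((req, res) => res.end('ok'));
server.on('error', (err) => console.error(`port 80: ${err.code}`)); // EACCES
server.listen(80);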

This caused a lot of debate. It seemed crazy to effectively eliminate the security of Linux users for the sake of ease of deployment and testing, but community pressure forced the issue through. We were told this didn't really matter because "root in containers wasn't really root". It didn't make a ton of sense at first but we all went with it. Hello to kernel namespaces.

Namespaces in Linux are, confusingly, nothing like namespaces in Kubernetes. They're an isolation feature, meaning a process running in one namespace cannot see or access a process running in another namespace. There are user namespaces, which allow a user to be root inside a namespace without that power carrying over outside of it; PID namespaces, meaning there can be multiple PID 1s running isolated from one another; network namespaces, each with their own networking stack; mount namespaces, which allow mounting and unmounting without changing the host; IPC namespaces for distinct and isolated POSIX message queues; and finally UTS namespaces for different host and domain names.

The sales pitch makes sense. You can have multiple roots, running a ton of processes that are isolated from each other in distinct networking stacks that change mounts and send messages inside of a namespace without impacting other namespaces. The amount of resources and the priority of those resources are controlled by cgroups, which is how cpu and memory limits work for containers.

Sounds great. What's the issue?

Namespaces and cgroups are not perfect security and certainly not something you can bet the farm on without a lot of additional context. For instance, running a container with --privileged strips away most of the isolation you were counting on. This is common in CI/CD stacks when you need to run Docker in Docker. Docker also doesn't use all of the namespace options, most importantly supporting but not mandating user namespaces to remap root inside the container to another user on the host.

To be fair, they've gone to great lengths to try and keep the Docker container experience as simple as possible while attempting to improve security. By default Docker drops some Linux capabilities, and you can expand that list as needed. Here is a list of all of the Linux capabilities. Finally, here are the ones that Docker uses. You can see here a security team trying desperately to get ahead of the problem and doing a good job.

The problem is that none of this was communicated to people during the adoption process. It's not really mentioned in tutorials, not emphasized during demos, and it requires an understanding of Linux to really dig into this problem space. If Docker had frontloaded this requirement into the tooling, pushing you to do the right thing from the beginning, we'd have a much more secure space overall. That would have included the following (a combined example follows the list):

  • Warning users loudly when they run the --privileged flag
  • By default creating a non-root user (or user namespace) for containers and one common directory for that user to write to
  • Having --cap-drop=all be the default and having users add back each capability they need
  • Running --security-opt=no-new-privileges as the default
  • Adding resource definitions by default as outlined here: https://docs.docker.com/engine/reference/run/#runtime-constraints-on-resources
  • Helping people write AppArmor profiles or similar security tooling, as shown here
  • Turning user namespaces on by default if you are root. This allows the container's root user to be mapped to a non-uid-0 user outside the container and reduces the value of escaping the container.
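Taken together, a locked-down docker run along those lines might look something like the sketch below. The image name, user ID and limits are placeholders; adjust the capabilities and resources to whatever your workload actually needs. (User-namespace remapping is a daemon-level setting rather than a run flag, so it isn't shown here.)

docker run \
  --user 1000:1000 \
  --cap-drop=ALL \
  --security-opt=no-new-privileges \
  --memory=256m \
  --cpus=0.5 \
  my-app:latest

If the app genuinely needs a capability (binding a low port, for example), add just that one back with --cap-add rather than reaching for --privileged.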

Instead we've all adopted containers as a fundamental building block of our technologies and mostly did it wrong. Podman solves a lot of these problems btw and should be what you use from now on. By starting fresh they skipped a lot of the issues with Docker and have a much more sustainable and well-designed approach to containers and security in general. Hats off to that team.

How do I tell if my containers are correctly made?

Docker made a great tool to help with this, which you can find here. Called Docker Bench for Security, it allows you to scan containers, see if they're built securely and get recommendations. I'm a big fan and have learned a lot about the correct ways to use Docker from using it.

Jeff aged 25 after attempting to brute-force Kubernetes adoption

Kubernetes

I was introduced to Kubernetes years ago as "the system Google uses". That turns out not to actually be true (Kubernetes was inspired by Google's internal tooling but is a separate system, not what Google runs internally), yet it still represented a giant leap forward in terms of application design. Overnight, organizations started to adopt it aggressively, often not truly understanding what it was or the problem it tried to solve. Like all tools, Kubernetes is good at some problem areas and terrible at others.

I personally love k8s and enjoy working with it. But many of the early pitches promised things that, in retrospect, were insane. K8s is neither easy to understand nor easy to implement, with many sharp edges and corner cases which you run into with surprising frequency. Minikube demos are not a realistic representation of what the final product looks like.

Here's the truth: k8s is a bad choice for most businesses.

The demos for Kubernetes certainly look amazing, right? You take your containers, you shove them into the cloud, it does the right thing, traffic goes to the correct place, it's all magical. Things scale up, they scale down, you aren't locked into one specific vendor or solution. There's an API for everything so you can manage every element from one single control plane. When you first start using k8s it seems magical, this overlay on top of your cloud provider that just takes care of things.

I like how routers, storage and LBs are just "other"

The Reality

The reality is people should have been a lot clearer when introducing k8s as to which problems it is trying to solve and which ones it isn't. K8s is designed for applications that are totally ephemeral: they don't rely on local storage, they receive requests, process them, and can be terminated with very little thought. While there are (now) options for storage, they're generally a bad idea to use due to how storage interacts with availability zones in cloud providers. You can do things like batch jobs and guaranteed runs inside of k8s, but that's a different design pattern from the typical application.

K8s is also not bulletproof. You'll often run into issues with DNS resolution at scale, since the internal networking is highly dependent on it. It comes wide open by default, allowing you to create namespaces that do not isolate network traffic (a common mistake I hear about all the time). Implementing a service mesh is mandatory, as is some sort of tracing solution, as you attempt to find which route is causing problems. Control plane problems are rare but catastrophic when they happen. Securing API endpoints and traffic in general requires diligence.

You also need to do a ton of research before you even start with k8s. Which CNI is best for you, how external DNS is going to work, how you are going to load balance traffic, where SSL termination happens, how you are going to upgrade the nodes actually running these containers; monitoring, graphs, alerting and logs all need to be mapped out and planned. Are your containers secure? Do you understand the dependencies between your applications?

In order to run k8s at scale across a few hundred developers, a few people in your org need to become extremely familiar with how it works. They will be the technical troubleshooting resource as application teams attempt to work backwards to the problem space. These people will need to be available for questions all the time. Remember you are taking:

  • An entire cloud environment with IAM, VPCs, security policies, Linux servers, buckets, message queues and everything else
  • And adding what is effectively an entirely new environment on top of that. These two systems still interact and can cause problems for each other, but they have very little visibility from one into the other. This can cause infuriatingly hard-to-diagnose problems.

So in order to do this correctly, you need people who understand both problem spaces and can assist with sorting where each step is happening.

So when should I use Kubernetes?

  • You are spending a ton of money on containers that are being underused or adding additional capacity to try and keep uptime during failures. K8s is really amazing at resource management and handling node or pod failure.
  • You are deeply committed to microservices. K8s meshes well with typical microservices, allowing you to bake in monitoring and tracing along with clear relationships between microservices through service meshes.
  • Multicloud is a requirement. Be super careful about making this a requirement. It's often a giant mistake.
  • Vendor lock-in is a serious concern. Only applicable if you are a Fortune 500. Otherwise don't waste brain cycles on this.
  • Uptime is critical above all. The complexity pays off in uptime.
  • You are trying to find a bridge technology between your datacenter and the cloud

Otherwise use literally one of a million other options. Starting a new company? Use Lightsail. Too big for Lightsail? Try ECS. You can grow to an extremely large size on these much simpler technologies. If you have maxed out Lightsail, good news, you are now big enough to hire a team. If you decide to go to k8s understand and map out exactly what you want from it. Don't assume turning it on and pushing out some yaml is good enough.

Planning for k8s correctly

The best guide I've seen and one I've used a few times is the NSA k8s hardening guide. It walks through all the common issues and concerns with using k8s at scale, offering good common-sense recommendations at each step. If you are an organization considering using k8s or currently using it and are unsure whether you have it correctly set up, walking through this guide and discussing the points it brings up should at least allow you to approach the project with the correct mindset.

Node.js

Node by itself I have no issues with. I have no particular dog in that race of "should we be writing the backend and frontend in the same language". No, my issue with Node is really with NPM. After working with teams relying on NPM extensively to provide both private, internal packages and to build applications using external third-party dependencies, I'm convinced it is unsafe at any speed.

NPM: Easy over Safe

For a long time, application development has followed a pretty stable path in terms of dependencies. You try to write as much as possible with the standard library of your given programming language, only pulling in third-party dependencies when they save a tremendous amount of time or take care of something normal people shouldn't be hand-rolling (like hashing passwords or managing database connections).

Typically you moved these dependencies into your project for the rest of the project's life, so during code review there was often a debate over "does it make sense to import x lines of code forever" in order to save time or add this new functionality. Often the debate hinged on maintenance burdens, where writing functionality using only the standard library made it relatively easy to upgrade between versions.

However dependencies, especially complicated or big dependencies, introduce a lot of possible long term issues. They themselves often have dependencies, or take advantage of more esoteric aspects of a language in order to solve for problems that your project often doesn't have (since the library has a wider use case than just you).

Here's an example. I was working on a PHP application with someone else and I needed to do some junk with S3. I (of course) added the official AWS SDK for PHP, thinking this was the correct way to do things. A more experienced developer stopped me, pointing out correctly that I was adding a ton of new code to the repo forever and maybe there was a better way to do this.

curl -s https://api.github.com/repos/aws/aws-sdk-php | jq '.size' | numfmt --to=iec --from-unit=1024
190M
That's a lot of MB for just using S3

As it turns out, this was relatively easy to do without loading the entire SDK into our codebase forever and introducing a lot of new dependencies. However, this back and forth consumes time and often requires folks to become much more familiar with the services their applications consume. Client libraries for SaaS often hide this complexity behind very simple methods that allow you to quickly drop in an API key and keep going.

Rise of Node

Node emerged onto the scene with a very different promise compared to most server-side languages. What if, instead of frontend and backend teams working on different applications, forever waiting for new API endpoints to get added, both were maintained by the same group? Driven by these potential cost and effort savings, Node was introduced to the world and has (mostly) lived up to this core promise.

I remain impressed by the concurrency model for Node, which I feel I need to say in order to stop people from accusing me of attacking the platform simply because it is Javascript. I've worked with teams that have exclusively lived inside of Node for years, many of them have extremely positive things to say. There is certainly a part of me that is jealous of their ability to become deep experts on Javascript and using that skillset across a wide variety of tooling.

The real advantage of Node though was the speed by which teams could build complex functionality even with limited backend knowledge. The low barrier to entry to NPM meant anyone was able to very quickly take some classes and make them available to the entire internet. At first this seemed fine, with developers enjoying how easy it was to have reproducible builds across different workspaces and machines with NPM.

As time went on these endless lists of dependencies started to introduce real maintenance burdens well in excess of their value. By emphasizing speed over safety NPM (and by extension Node) has now become dangerous to recommend for anyone to use in handling customer data in virtually any capacity.

NPM Problems

On my first large Node application, I was shocked when I measured the ratio of third-party code to code in the actual application. We had almost a million lines of external code against our several thousand lines of application code. Graphing out these dependencies was like attempting to catch a money-laundering operation. Unlike, say, the PHP applications I had worked on in the past, it was impossible to even begin to vet this many external vendors. Since they're all required, we were open to security issues from any of them.

Let's walk through how I attempted to fix the problem.

  • Obviously I started with the basics like npm audit, which works but is purely reactive. It relies on the package being maintained, someone discovering the issue and someone reporting it. In a Python world where I am using maybe six big, common dependencies, that makes sense. However in NPM land, when I have thousands, many of which are maintained by one person, this isn't practical.
  • Fine, can we vet each package before we add it? Even assuming I had the technical ability to assess each package against some sort of "known-good" parameter, the scale of the task would require a thousand people working full-time. Look at the dependency graph for just NPM itself which, remember, is often called by the root user in many tutorials.
Mother of god. Link
  • Alright, so that's an impossible task. Maybe we can at least mitigate the problem by sticking with packages that are scoped. At least then we know it's from the right org and isn't the result of a typo. Turns out almost none of the most-downloaded packages from NPM are scoped. React, lodash, express, the list goes on and on. You can see the top NPM packages here.
  • Fine, is there a way I can tell if the NPM package at least requires 2FA to publish? As far as I can tell, you cannot. Package maintainers can enforce 2FA, but I can't check whether it's there.
  • Now NPM has actually taken some great steps here to ensure the package hasn't been tampered with, using ECDSA signatures. You can see an example at https://registry.npmjs.org/underscore/1.13.6 under the "signatures": [{ "keyid": "SHA256:{{SHA256_PUBLIC_KEY}}", "sig": "a312b9c3cb4a1b693e8ebac5ee1ca9cc01f2661c14391917dcb111517f72370809..." }] section (a quick way to pull that block programmatically is sketched after this list).
  • But I still have the same underlying issue. While I can verify that the person publishing the package signed it (which is great), I'm still importing thousands of them. Plus signing the package isn't required so while many of the big packages have it, many don't.
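For completeness, here is the sketch mentioned above: pulling that signatures block out of the registry's version metadata with nothing but the built-in fetch (assumes Node 18+; the package and version are just the example from the link above).

fetch('https://registry.npmjs.org/underscore/1.13.6')
  .then((res) => res.json())
  .then((meta) => {
    // The signatures live under dist.signatures as [{ keyid, sig }, ...]
    console.log(meta.dist.signatures);
  });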

In Practice

What this means is that Node apps start to "rot" at an extremely aggressive rate. While they're fast to write and stand up, they require constant monitoring for security issues across dependency trees spanning thousands of packages. You also constantly run into version conflicts between dependencies (which NPM resolves by nesting duplicate copies). In order to keep a Node application patched, you need to be willing to override some package dependencies to get a secure version, without knowing whether the higher-level dependency will actually function with the new version or not (a sketch of the override itself follows the image below).

Example of a common pattern in Node application dependencies

Source
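For what it's worth, the override mentioned above is mechanically simple, assuming npm 8.3 or newer (which added the overrides field); the package names below are placeholders, not a claim about any particular project:

{
  "dependencies": {
    "some-web-framework": "^4.2.0"
  },
  "overrides": {
    "minimist": "^1.2.8"
  }
}

npm will then install that version everywhere it appears in the tree, which is exactly the part you have to re-test, since the parent packages never agreed to it.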

You are installing the same thing over and over with slightly different versions. You need to constantly monitor the entire dependency graph and check for declared issues, then hope against hope that everyone is checking everything for you on the internet and reporting like they should. Finally you need to hope that none of these maintainers get their stuff hijacked or simply give up.

Sustainable model of Node

I'm not sure what can be done now to fix this problem. I suspect a great option might be to force packages to "flatten" the dependency space, only allowing for top-level dependencies and not allowing for nested dependencies. However at this point the train has likely left the station.

If you are looking to write safe, long-term Node code, I think your only option is to be extremely aggressive about not allowing packages into your codebase and trying to use the Node Core Modules as outlined here as much as possible. I want to praise the NPM team for switching to a more sustainable signing system and hope it gains widespread adoption (which it seems on track to). Otherwise you basically need to hope and pray that nobody in the thousands of people you rely on decides to blow up your app today.
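As a small illustration of the core-modules-first approach, a lot of the usual suspects have built-in equivalents on a recent Node (18+ assumed); the mappings in the comments are the common cases I reach for, not an exhaustive list:

// "uuid"-style packages -> node:crypto
const { randomUUID, createHash } = require('node:crypto');
const { parseArgs } = require('node:util');

const requestId = randomUUID();

// One-off hashing packages -> node:crypto
const digest = createHash('sha256').update('hello').digest('hex');

// Simple flag parsing ("minimist"/"yargs" for basic cases) -> node:util parseArgs (Node 18.3+)
const { values } = parseArgs({
  args: ['--verbose', '--name', 'demo'],
  options: { verbose: { type: 'boolean' }, name: { type: 'string' } },
});

console.log(requestId, digest.slice(0, 8), values);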

However, in the desire to demonstrate how fast it is to start writing Node by using NPM, we have doomed Node by ensuring that long-term maintenance is nearly impossible. If you manually bump packages with known security issues and tests start failing, it can be a massive task to untangle the ball of yarn and figure out what specifically changed.

Conclusion

All of these technologies mostly deliver real, measurable value. They are often advancements over what came before. However organizations were often misled by tutorials and demos that didn't accurately portray the realities of running this technology, instead attempting to encourage adoption by presenting the most positive light possible. I don't blame enthusiasts for this, but I think as a field we should be more honest when showing new technology to peers.

As for the maintainers of these technologies, it ultimately hurts you and your reputation when you don't frontload the complexity into either the tooling or the documentation. I'm going to hit those sharp edges eventually, so why not prepare me for it from the beginning? If at some point I'm going to need to rewrite all of my containers, why not help me do it correctly from the start? If you know I'm going to need to constantly patch my NPM dependencies, maybe add a more robust chain of dependencies so I can audit what is coming in and from where.

Ultimately we help each other by being more honest. I want you to use my tool and get the maximum value out of it, even if the maximum value for you is not to use my tool.


The Book of CP-System

The Book of CP-System

I adore classic arcade machines. They're interesting to look at, fun to play and designed for a different era of hardware and software design. While I've certainly spent some time poking around some of the FPGA code for classic arcade cores, I'm still pretty much a novice when it comes to the low-level technical specifications of these machines. The brand I have the fondest memories of is the old Capcom machines.

Fabien Sanglard has written an amazing book on the internals of those early Capcom classics like Street Fighter 2. It goes into extreme detail on the legendary CPS-1 board; you get more information than I've ever seen before on how it worked, how arcades of the era worked and every gritty detail in between. It's a joy to read with amazing art and diagrams.

Come on, you are curious about how this all works. 

The book is available on a pay-what-you-want basis at his website. His system for generating the book is also pretty clever, with the source code available here.

There's also a great post about the PS2. Both were a lot of fun to read. It's always a blast to learn more about these machines that I spent hundreds of hours with as a kid. Link to Playstation 2 Architecture


TIL About a comparison table of Google internal tools to external products

I've never worked at Google, but I've been impressed with a lot of the data manipulation and storage tools inside of GCP like BigQuery and Cloud Spanner. If you ever worked at Google, this might be a great way to find a replacement for an internal tool you loved. For the rest of us, it's a fascinating look at the internal technology stack of Google.

You can find the comparison table of services here.


Battle of the Text Editors

Battle of the Text Editors

I have always liked text editors. There is something promising about them, a blank canvas that you can quickly mess up with your random commented out lines. I went from BBEdit, my favorite Mac app of all time (seriously I love this app so much I bought a tshirt with their logo on it which you can get here) to Vim. I still use BBEdit all the time, mostly as a clipboard style storage location, notes or just outlining tasks. Vim + tmux has been my go-to now for 5+ years.

Now I know this is a contentious topic. If you like using a full IDE (Integrated Development Environment), know that I am not coming for you. It's not my preference but it is entirely valid and right. People interested in an ideological fight over this issue should remember you are asking people to spend 8 hours a day using this thing. There isn't a right or wrong answer to "what tool should you be using"; it's more like picking a chair or a mattress. That said, I'm not qualified to really speak on what is the best IDE (although according to people I work with, it's VS Code with plugins by a mile).

However with the rise of Rust there has been an explosion of new entries into the text editor space, with a lot of cool and novel ideas. I thought it would be fun to download some and try them out for a few hours each to get some actual work done. These won't be incredibly detailed reviews, but if there is interest I'm more than happy to go back and do a deeper dive into any individual one.

Should I use any of these to pay my bills?

That's up to you, but I will outline a few common recommendations below that I've used extensively for mission critical work along with why I like them. If you are just here looking for what should be your daily workhorse, it would be safer to pick from those. However a lot of the projects I'll discuss later are certainly safe to use, just might not have all the bells and whistles.

Here are some of the tools I've used for at least a few months in a row as my most-used applications along with any tips or add-ons I liked.

If you are ok with a learning curve

  • Vim. I understand Neovim is better, I still use standard Vim. I have not encountered any issues in 5+ years of daily use that would prompt me to change. However if you are just starting go with Neovim. At some point I'm sure I'll find some issue that causes me to switch.  

    Vim plugins: My former coworkers at Braintree maintain the best Vim dotfiles I have found; I've used them at every job since, and you can grab them here.

    Tutorial: I learned with Vim Golf, which was actually pretty fun, but I've also heard a lot of good things about Vimified. Either should be fine, and Vim also has a built-in tutorial which is more than enough to get you started. If you install Vim from Homebrew or from a Linux package, just run vimtutor in your terminal and go through it; you'll have what you need to get going.
  • Tmux. You typically want to leave Vim running in something like Tmux just so you can keep multiple things running at the same time and switch between them. My workflow looks something like: a dev box I am SSH'd into, Vim open and then maybe docs or a man page. I use this for my Tmux config.

You just need something right now

What about Emacs?

I don't really know it well enough to speak to it. I'm open to running through a tutorial but I just never worked at a place where its use was widespread enough to justify sitting down and learning it. If there's a good tutorial though hit me up on Twitter and I'm glad to run through it. Link

I'm here for the new stuff, not this Lifehacker 2011 list.

Look, sometimes people who shouldn't be reading the drivel I write stumble on these posts and send me comments on Twitter. Disclaimer over; from here on we're here for the new and the weird.

Amp

Doesn't get more simple than that. Not loving the old copyright date.

Amp is a text editor with a focus on "batteries included". After installing it, you open it with: amp [dir | file1 file2 ...]. On the first day of using it, the big value is a great file finder. Hit Space after opening it and Amp indexes all the files in the path, allowing you to very quickly search among them.

What takes a second to get used to is that this isn't a fuzzy file finder. Instead of typing full words, you use parts of the path, separated by spaces. Here's what it looks like in practice:

I will say accuracy is spot on

This takes a second to get used to, but very quickly I felt like I was searching with more accuracy than ever. Weirdly, by default the results list didn't visually scale, so even if there were more results I didn't see them. You can change this in the config file documented here. I feel like 5 results is a crazy low default.

Typical text movement uses the , and m keys for scrolling up and down. The h, j, k, l movement commands are there, along with w and b for word movement. Most of your interactions, though, will be through the jump mode, which is weird as hell.

What it seems to be doing is going through your file and changing the beginning of each word into a unique two-letter code, so you can very quickly jump to words. Now this would make sense if it prefixed the words with the characters, but on my installation it replaced the first two characters. This makes files pretty much impossible to work on.

Unfortunately, even though I really enjoyed the file search, this was the end of my experiment with Amp. It isn't clear whether the tool is still being updated and since so much of the mobility relies on Jump, this messing up the first two characters of every word makes my workflow impossible. I didn't see anything obviously wrong in my installation and I did install from source so I'm not sure what else could be done on my end to resolve this.

I'm really disappointed. I liked what I saw of Amp and I feel like there are some good ideas there. I love the focus on file management and searching. I'll keep an eye on the repo and revisit this if something new comes up.

Survived: .5 hours of my production work

Helix

Helix is maybe the first Rust application I've used where I had to look and see "wait this isn't Node right?" In case you think I'm being ridiculous:

Damn giving me that NPM energy

However upon starting Helix you instantly see why it's including every Rust crate. This is a robust text editor, aimed squarely at the Vims/Emacs of the world. The basic concept is one of multiple cursors, so you are attempting to remove one of the steps from Vim. Overall the goal is a batteries included text editor that handles syntax in a slightly different way.

Syntax Tree

One feature touted in Helix is an integration with Tree-sitter. The idea is to get an actual syntax-tree representation of the structure of your code. You've seen this before, but it's not in Vim and is still an experimental feature in Neovim. It includes built-in support for a lot of languages and in my testing works well, allowing you to search through the tree and not just the text. Basically you can now select nodes vs raw text.

Language Server

Similar to Neovim (and available on Vim 8 through a plugin), there is language server support. I typically lump this more into IDE features but it is a nice touch, especially for folks who have gotten used to it. I found myself using it quite a bit, which is unusual since I don't have it installed in my Vim setup. It turns out I might have been wrong about language servers all these years. You can see the languages supported here.

Enough of the features, how does it work?

I walked into using Helix expecting not to like it. I was completely wrong. Everything about it is a complete delight to use in a way that is so effortless that I wondered in the back of my mind "have I actually used this product before and forgotten?" You start it with hx on the command line and access all commands with :. On first startup, I was already up and running, editing work files quickly.

Minute to minute it does feel a lot like Vim, which for me is a huge perk but might not be for you. The basic flow is reversed though, so instead of action -> object it's object -> action. To delete a word in Vim is dw, delete word. In Helix it is wd. There is a fair amount of muscle memory for me around this, so it isn't an instant change. However the payoff is pretty massive.

Multiple Cursors

You can obviously manipulate text on a global scale with Vim. Search for a word, then edit the next match with cgn and iterate through the rest of the matches with the . command. For more complex situations you can run macros over the entire file. While this functionality exists, Helix with its multiple cursors makes it much easier to pick up. Go to the beginning of the text, hit %, then s, and you have multiple cursors to manipulate text.

Trying to help all the time

Hitting g in a file brings up all the goto commands, which is really nice when you are starting.

You'll also notice the language server commands called out here with d. This allows you to jump to the definition as determined by LSP. By default this didn't work with Terraform, but it provided a good chance to try out adding one. There was a slight issue with hcl not being included in the GitHub list of supported languages but it was on the list of languages in their docs. GitHub link here.

Anyway it looks like it uses the standard Terraform language server, so nbd. For some reason the language docs don't link to the language server they mean, but I'm glad to go through and make a PR for that later. We install the Terraform language server, run :config-open and write out a new language entry. Except maybe we don't need to?

Here is where it gets a little bit confusing. My initial read of the documentation here suggested that I need to write a new config to get the language server to work. However that actually isn't the case. I just happened to run hx --health and bam, Terraform is set up. I confirmed yes it does in fact just work with the default config. The Helix team explains I don't need to do anything here but I must have missed it.

Anyway, if it's on the list, just install the language server and then confirm with hx --health that it has been picked up.

Sure enough it just works

In terms of theme, I actually like the purple, but that's extremely customizable. Find those docs here. However, they also include most of the normal terminal themes out of the box, meaning you can try out whatever moves you with :themes. Here are the ones that come with it:

Just type what you want and the theme instantly switches:

Large Files

Sometimes I'm a bad Vim user. I will admit that I, from time to time, will open giant log files and attempt to search them in Vim instead of doing what I should do and using a tool built for searching large bodies of text. Typically Vim will chug while doing this, which is totally understandable, but I was curious how Helix would hold up.

Sample file is: 207M Sep  4 21:19 giant_file.txt and opening it in Vim does cause noticeable slow-downs. Helix opens instantly, like zero visible change to me from a tiny file. Making an even larger file, 879M Sep  4 21:22 huge_file.txt, I decide to test again. BEFORE YOU PING ME ON TWITTER: I know this is because I have a lot of plugins. Helix also comes with a lot of stuff, so I don't consider this to be that insane of a test. Don't agree? Run your own tests.

With the 800M file Vim absolutely chugs, taking forever to show the first line of text. Helix is once again instant. Everything works really quickly and if I didn't know the file was huge, it wouldn't occur to me. I'm super impressed.

Survived: 20 hours of my production work but really could be endless

I didn't walk into this expecting to find a Vim replacement, but Helix is really compelling. It's fast, the batteries-included mentality shines since the editor comes ready to get to work, and everything is discoverable in a way that just isn't true with Vim. An expert might know how to do everything Helix does in Vim, but there's no way a new user would be able to find that information as easily in Vim as they can in Helix.

Progress on the editor is steady and impressive, but even in the current state I didn't find a lot to dislike. Performance was extremely stable testing on MacOS Intel and M1 along with Linux ARM and x86. If you are interested in the terminal editor life but want something with a lot more of the modern features built in, Helix is a world class product. The team behind it should be super proud.

If there is interest I would be happy to do a longer in-depth thing about Helix. Just let me know on Twitter.

Kibi

How much text editor can you get into fewer than 1024 lines of code? More than you might think! Kibi is an interesting project, inspired by kilo. It's a pretty basic text editor, but with some nice features. Think of it more like nano as compared to Helix. It aspires to do a bit less, but it really excels at what it sets out to do.

Most of the main functionality is visible on the page, but you can set syntax highlighting manually by following the steps here. It is unfortunately missing a few features I would love, things like displaying the tree of a directory if you pass it a directory. Instead passing it a directory throws an IO error, which is understandable.

If you are just starting out and looking for a simple text editor, this one is really easy to recommend. For people who work with one type of file a lot especially, it's very simple to add syntax highlighting. My hope is that at some point a community collection of common syntax highlighting gets added, along with some file-path support. Unfortunately, without those things there isn't that much for me to test.

Some of the shortcuts didn't work for me either. CTRL-D duplicated but CTRL-R didn't seem to remove the line. The one feature I REALLY liked was CTRL-E, which allows you to execute a shell command and then inputs the value into the line with the cursor. This is really great for a lot of sysadmin tasks where you are switching between tmux panes, putting the output of a command into the file.

This little tool would really get to the next level with just a few adjustments. Maybe add CTRL+arrow keys for jumping to the next word, some common highlighting options. However it's not bad for a nano replacement, which is pretty high praise considering the legacy of that app.

Survived: 1 hour of my production work but not a totally fair comparison

Lapce

Now before you get into it with me, this one is a BIT of a stretch. However, because it supports modal editing I'm going to count it. I was mostly curious what a GUI Rust text editor looks like.

My first impression was....mixed.

I like the terminal at the bottom, but there didn't seem to be window resizing and it didn't support Terraform (which is fine). The most baffling part was that after I selected this directory as my project, I couldn't figure out how to get out of it. I know it's ironic for a Vim user to complain about not being able to escape a project, but I was a little lost. Where is the file explorer?

Blue icon connects you to an SSH host

It's actually the file-name drop-down centered at the top that lets you navigate to other projects. Use : and you can access all sorts of commands within the editor.

Once you see it, it makes sense but it was a little baffling at first. 

This would be one of the many UI things I ran into which I thought was unusual. The app doesn't follow most of the MacOS conventions, a strange choice I see a lot with new desktop applications using Rust frameworks. Settings isn't where it should be, under the App name -> Preferences.

Instead it is under a gear icon in the top right corner.

Settings has everything you need but again not very MacOS, not really following normal layout. It's all here though.

Once I opened a codebase that the editor supported, it seemed to work fine. To be honest, it's roughly the same experience as opening any IDE. It feels very snappy and I think there's a lot of promise here, but it doesn't feel particularly Vim-like.

The primary value of Lapce, from what I could tell, is its great SSH integration: just click the blue cloud icon, enter username@hostname and it works great. It also does really well with big projects and lots of text, not slowing down when processing them. However, since I don't really work in big codebases like that, the SSH integration is nice but I'm not really able to put it through its paces.

However if you do work with a lot of big files and find other IDEs sluggish, it's probably worth trying. Once you get a handle on the UI it all works pretty well.

Survived: 8 hours

I think if I was more of an IDE person and willing to put in the time Lapce could work fine for my workflow. I just didn't see anything that really spoke to me that would justify the time invested.

Conclusion

It's really an exciting time to be a programmer for a lot of reasons. We have more mechanical keyboards, more monitors and more text editors. But in all seriousness I love that people are out there making this stuff and I think all of them should be proud. It's really exciting to be able to offer people new options in this space instead of repeating the five or so dominant platforms.

If you are looking for something new to try right now, Helix is really impressive and I think hard to beat.

I strongly encourage people to download these, try them out, or PLEASE let me know if there is a good one I missed. I'm more than happy to do a review of another text editor at pretty much any time. You can find me on Twitter.