
Why Can't My Mom Email Me?

An investigation into Proton encrypted email.

Suddenly Silence

I'm a big user of email, preferring long chains to messaging apps for a lot of my friends and contacts. It's nice that it isn't tied to a single device or platform, and since I own my domain, I can move it from service to service whenever I want and the sender doesn't have to learn some new address. However, in the last two months I suddenly stopped getting emails from a chunk of my friends and even my mom.

What I was getting instead were PGP-encrypted emails with completely blank bodies.

When I inspected the message, it was clearly an encrypted email, which Fastmail doesn't support. They have a whole blog post on why they don't here: https://www.fastmail.com/blog/why-we-dont-offer-pgp/ but up to this point I hadn't really cared one way or the other since nobody sends me encrypted emails.

Now I knew that Proton would send encrypted emails to other Proton email addresses, but obviously this isn't a Proton-hosted address, which Proton can tell pretty easily with DNS. Then it got even stranger: I tried my work email and got the same result.

Checking the raw message, there it is: Proton has encrypted this email. Now this address is hosted on Google Workspace, so at this point I'm just baffled. Can Proton users not send emails to people on Google Workspace addresses? That can't possibly be right? My friends and mom using Proton would have noticed that their emails seem to always disappear into the ether for the majority of the people they email.

I opened a ticket with Fastmail hoping they'd seen this problem before, but no luck. Then I opened a ticket with Proton but haven't heard back as of this writing.

How Proton Seems To Work

So the reason why so many people I know are moving to Proton is they seem to be the only game in town that has cracked sending encrypted emails in the least annoying way possible. Their encryption uses asymmetric PGP key pairs, with lookups of other users' public keys happening on their key server. This, in conjunction with their Key Transparency technology that compares the client's lookups against the server-side record, allows for easy encrypted message exchanges with a high degree of safety, at least according to them.

There seem to be three classes of keys at Proton.

  • User keys: encrypt account-specific stuff like contacts. Not shared.
  • Address keys: for encrypting messages and data.
  • Other keys: part of a key tree that leads back to the address key as the primary external key for people to use.

So it makes sense that Proton can look up address keys for users on their system. But where are my keys coming from? In their Proton Key Transparency whitepaper they have this little snippet on page 10:

For External Addresses, the server may return email encryption keys that it
found in the Web Key Directory (WKD) [6] (since email is hosted elsewhere).
The server may also return data encryption keys, used e.g. for Proton Drive.
The former should have an absence proof in KT, and the latter should have an
inclusion proof.
For Non-Proton Addresses, the server may also return keys that it found in the
WKD. This way clients can automatically encrypt emails to it. These keys won’t
be in ProtonKT, thus KT should return an absence proof.

What The Hell Is WKD?

WKD, or OpenPGP Web Key Directory, is an IETF draft by Werner Koch. It describes a way to look up OpenPGP keys by email address over HTTPS, and it also allows the key owner and the mail provider to publish and revoke keys. The whole thing is very clever, an interesting way to get around the annoying parts of PGP encryption of email. You can read it here: https://www.ietf.org/archive/id/draft-koch-openpgp-webkey-service-16.txt
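
To give a sense of what a lookup looks like in practice, here's a rough sketch using standard GnuPG. The address is hypothetical, and the exact well-known URL depends on whether the direct or advanced method from the draft is used:

# Ask GnuPG to discover a key via WKD only, skipping keyservers
gpg --auto-key-locate clear,wkd,nodefault --locate-keys someone@example.org

# Under the hood this fetches a well-known HTTPS path derived from the address,
# something like:
#   https://openpgpkey.example.org/.well-known/openpgpkey/example.org/hu/<hashed-local-part>?l=someone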

It outlines an enrollment process by which I would signal to a WKD service that I have a key I want published. The only problem is I never did that, or at least I certainly can't remember doing it. I'm certainly not hosting a page with any key verification stuff.

There seems to be a way to set a CNAME record to point towards keys.openpgp.org where I do have a key set, but that isn't set up on my domain.

nslookup openpgpkey.matduggan.com
Server:		2a01:4f8:c2c:123f::1
Address:	2a01:4f8:c2c:123f::1#53

Non-authoritative answer:
*** Can't find openpgpkey.matduggan.com: No answer

Source here: https://keys.openpgp.org/about/usage
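
For what it's worth, my reading of that usage page is that the opt-in is a single DNS record delegating WKD lookups to keys.openpgp.org, which you can then verify resolves. Roughly (record shown is hypothetical, per their docs):

# Hypothetical opt-in: point the openpgpkey subdomain at their WKD endpoint...
#   openpgpkey.matduggan.com.  CNAME  wkd.keys.openpgp.org.
# ...and confirm the record exists (mine doesn't, as the nslookup above shows)
dig +short CNAME openpgpkey.matduggan.com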

I can't seem to find why Proton thinks they can use this key BUT I can confirm this is the key they're encrypting the emails with.

What?

So it seems if your email address returns a key from keys.openpgp.org then Proton will encrypt the message with your public key from there, even though (as far as I can tell) I haven't opted into them using this service. I also can't seem to figure out a way to signal to them they shouldn't do it.

Alright, so what happens if I just remove my key from keys.openpgp.org? The process is pretty simple: just go to https://keys.openpgp.org/manage and follow the instructions in the email. It seems to work more or less instantly.

Alright looks like we figured it out!

Proton Seriously What The Hell?

I'm at a little bit of a loss here. I totally understand sending me encrypted emails if I've gone through the steps to set the CNAME that indicates I want that, but it doesn't seem like that's how the service works. As far as I can tell, the act of uploading an OpenPGP-compatible key seems to be enough to trigger their service to send everything as an end-to-end encrypted message.

I'll update this with whatever I hear back from Proton but in the meantime if you stumble across this post after getting blank emails from people for months, you'll at least be able to fix it.

Is there some flag I've accidentally set somewhere that tells Proton to send me encrypted emails? Let me know at: https://c.im/@matdevdug


Why Don't I Like Git More?


I've been working with git full-time for around a decade now. I use it every day, relying primarily on the command-line version. I've read a book, watched talks, practiced with it and in general use it effectively to get my job done. I even have a custom collection of hooks I install in new repos to help me stay on the happy path. I should like it, based on mere exposure effect alone. I don't.

I don't feel like I can always "control" what git is going to do, with commands sometimes resulting in unexpected behavior that is consistent with the way git works but doesn't track with how I think it should work. Instead, I need to keep a lot in my mind to get it to do what I want. "Alright, I want to move unstaged edits to a new branch. If the branch doesn't exist, I want to use checkout, but if it does exist I need to stash, checkout and then stash pop." "Now if the problem is that I made changes on the wrong branch, I want stash apply and not stash pop." "I need to bring in some cross-repo dependencies. Do I want submodules or subtree?"
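
As a concrete example, here's that "move my unstaged edits somewhere else" decision tree spelled out (branch names hypothetical):

# The target branch doesn't exist yet: checkout -b carries unstaged edits with it
git checkout -b new-feature

# The target branch already exists: stash, switch, then pop
git stash
git checkout existing-feature
git stash pop

# The changes landed on the wrong branch and I want to keep the stash around
# as a safety net: apply instead of pop
git stash apply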

I need to always deeply understand the difference between reset, revert, checkout, clone, pull, fetch and cherry-pick when I'm working, even though some of those words mean the same thing in English. You need to remember that push and pull aren't opposites despite the name. When it comes to merging, you need to think through the logic of when you want rebase vs merge vs merge --squash. What is the direction of the merge? Shit, I accidentally deleted a file a while ago. I need to remember git rev-list -n 1 HEAD -- filename. Maybe I deleted a file and immediately realized it but accidentally committed it. git reset --hard HEAD~1 will fix my mistake, but I need to remember what specifically --hard does when you use it and make sure it's the right flag to pass.
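
For the record, those file-recovery incantations look roughly like this (paths hypothetical):

# Find the last commit that touched the now-deleted file...
git rev-list -n 1 HEAD -- path/to/file

# ...then restore the file from the commit *before* the one that deleted it
git checkout <that-commit>^ -- path/to/file

# Or: I committed the deletion seconds ago and want the commit gone entirely.
# Remember that --hard also throws away any other uncommitted work.
git reset --hard HEAD~1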

Nobody is saying this is impossible and clearly git works for millions of people around the world, but can we be honest for a second and acknowledge that this is massive overkill for the workflow I use at almost every job which looks as follows:

  • Make a branch
  • Push branch to remote
  • Do work on branch and then make a Pull Request
  • Merge PR, typically with a squash and merge cause it is easier to read
  • Let CI/CD do its thing.
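
Spelled out with git and GitHub's gh CLI (names hypothetical), that whole workflow is roughly:

git checkout -b fix/login-timeout      # make a branch
git push -u origin fix/login-timeout   # push branch to remote
# ...do the work, commit as you go...
gh pr create --fill                    # open the Pull Request
gh pr merge --squash --delete-branch   # squash and merge, then CI/CD does its thing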

I've never emailed a patch or restored a repo from my local copy. I don't spend weeks working offline only to attempt to merge a giant branch. We don't let repos get larger than 1-2 GB because then they become difficult to work with when I just need to change like three files and make a PR. None of the typical workflow benefits from the complexity of git.

More specifically: that workflow doesn't even work offline, since it relies on Pull Requests, a merge control that isn't part of git at all. Most of that distributed history gets thrown away when I do a squash. I don't gain anything from my local disk being cluttered up with out-of-date repos I need to update before I start working anyway.

Now someone saying "I don't like how git works" is sort of like complaining about PHP in terms of being a new and novel perspective. Let me lay out what I think would be the perfect VCS and explore if I can get anywhere close to there with anything on the market.

Gitlite

What do I think a VCS needs (and doesn't need) to replace git for 95% of use cases?

  • Dump the decentralized model. I work with tons of repos, everyone works with tons of repos, and I need to hit the server all the time to do my work anyway. The complexity of decentralization doesn't pay off, and I'd rather lose it in exchange for what's in the next item. If GitHub is down today I can't deploy anyway, so I might as well embrace the server requirement as a perk.
  • Move a lot of the work server-side and on-demand. Say I want to search for something in a repo. Instead of copying everything locally, running the search there and just accepting that it might be out of date, run it on the server and tell me which files match. Then let me ask for just those files on-demand instead of copying everything.
  • I want big repos and I don't want to copy the entire thing to my disk. Just give me the stuff I want when I request it and then leave the rest of it up there. Why am I constantly pulling down hundreds of files when I work with like 3 of them.
  • Pull Request as a first-class citizen. We have the concept of branches and we've all adopted the idea of checks that a branch must pass before it can be merged. Let's make that a part of the CLI flow. How great would it be to be able to, inside the same tool, ask the server to "dry-run" a PR check and see if my branch passes? Imagine taking the functionality of the gh CLI and not making it platform-specific, à la kubectl with different hosted Kubernetes providers.
  • Endorsing and simplifying the idea of cross-repo dependencies. submodules don't work the way anybody wants them to. subtree does but taking work and pushing it back to the upstream dependency is confusing and painful to explain to people. Instead I want something like: https://gitmodules.com/
    • My server keeps it in sync with the remote server if I'm pulling from a remote server but I can pin the version in my repo.
    • My changes in my repo go to the remote dependency if I have permission
    • If there are conflicts they are resolved through a PR.
  • Build in better visualization tools. Let me kick out to a browser or whatever to more graphically explore what I'm looking at here. A lot of people use the CLI + a GUI tool to do this with git and it seems like something we could roll into one step.
  • Easier centralized commit message and other etiquette enforcement. Yes, I can distribute a bunch of git hooks, but it would be nice if when you cloned the repo you got all the checks that make sure you are doing things the right way, instead of wasting a bunch of time only to get caught by the CI linter or commit message format checker. I'd also love some prompts like "hey this branch is getting pretty big" or "every commit must be of a type fix/feat/docs/style/test/ci" (a rough sketch of such a hook is below this list).
  • Read replica concept. I'd love to be able to point my CI/CD systems at a read replica box and preserve my primary VCS box for actual users. Primary server fires a webhook that triggers a build with a tag, hits the read replica which knows to pull from the primary if it doesn't have that tag. It would be even more amazing if we could do some sort of primary/secondary model where I can set both in the config, and if the primary (cloud provider) is down I can keep pushing stuff up to somewhere that is backed up.
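
Here's the kind of commit-message hook I mean, as a minimal sketch; the accepted types are just the ones mentioned above, and a real rollout would distribute this on clone:

#!/bin/sh
# commit-msg hook: reject messages that don't start with a conventional type.
# Git passes the path of the file containing the commit message as $1.
msg_file="$1"
if ! grep -qE '^(feat|fix|docs|style|test|ci)(\(.+\))?: ' "$msg_file"; then
  echo "Commit message must start with a type, e.g. 'feat: add login retry'" >&2
  exit 1
fi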

So I tried out a few competitors to see "is there any system moving more towards this direction".

SVN in 2024

My first introduction to version control was SVN (Subversion), which was pitched to me as "don't try to make a branch until you've worked here a year". However getting SVN to work as a newbie was extremely easy because it doesn't do much. Add, delete, copy, move, mkdir, status, diff, update, commit, log, revert, update -r, co -r were pretty much all the commands you needed to get rolling. Subversion has a very simple mental model of how it works which also assists with getting you started. It's effectively "we copied stuff to a file server and back to your laptop when you ask us to".

I have to say though, svn is a much nicer experience than I remember. A lot of the rough edges seem to have been sanded down and I didn't hit any of the old issues I used to. Huge props to the Subversion team for delivering great work.

Subversion Basics

The basic operation is that your Subversion client commits all your files as a single atomic transaction to the central server. Whenever that happens, it creates a new version of the whole project, called a revision. This isn't a hash, it's just a number starting at zero, so there's no confusion as a new user about what is "newer" or "older". These are global numbers, not tied to a file, so a revision is the state of the whole world. Each individual file can be in one of 4 states:

  • Unchanged locally + current remote: leave it alone
  • Locally changed + current remote: to publish the change you need to commit it, an update will do nothing
  • Unchanged locally + out of date remotely: svn update will merge the latest copy into your working copy
  • Locally changed + out of date remotely: svn commit won't work, svn update will try to resolve the problem but if it can't then the user will need to figure out what to do.

It's nearly impossible to "break" SVN because pushing up doesn't mean you are pulling down. This means different files and directories can be set to different revisions, but only when you run svn update does the whole world true itself up to the latest revision.

Working with SVN looks as follows:

  • Ensure you are on the network
  • Run svn update to get your working copy up to latest
  • Make the changes you need, remembering not to use OS tooling to move or delete files and instead use svn copy and svn move so it knows about the changes.
  • Run svn diff to review that the change is actually what you meant to do
  • Run svn update again, resolve conflicts with svn resolve
  • Feeling good? Hit svn commit and you are done.

Why did SVN get dumped then? One word: branches.

SVN Branches

In SVN a branch is really just a directory that you stick into where you are working. Typically you do it as a remote copy and then start working with it, so it looks more like you are copying the URL path to a new URL path. But to users they just look like normal directories in the repository that you've made. Before Subversion added merge tracking in 1.5, merging a branch required a master's degree and a steady hand, but with it svn merge became a bit easier to live with.

Practically, you run svn merge from trunk to keep your branch in sync, and then when you are ready to go, you run svn merge --reintegrate to fold the branch back into trunk. Then you can delete the branch, but its URL will always work if you need to read its log later. This was particularly nice with ticket systems where the branch URL was just the ticket number, and you don't need to clutter things up forever with random directories.
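
In command form, the branch lifecycle looks roughly like this (repository URLs hypothetical):

# A branch is just a cheap server-side copy of trunk
svn copy https://svn.example.com/repo/trunk \
         https://svn.example.com/repo/branches/TICKET-123 \
         -m "Create branch for TICKET-123"

# While working on the branch, periodically pull in the latest trunk changes
svn merge https://svn.example.com/repo/trunk

# When finished, from a trunk working copy, fold the branch back in and commit
svn merge --reintegrate https://svn.example.com/repo/branches/TICKET-123
svn commit -m "Reintegrate TICKET-123"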

In short a lot of the things that used to be wrong with svn branches aren't anymore.

What's wrong with it?

So SVN, in my experience, breaks down when it comes to automation. You need to build it all yourself. While you do have nuanced access control over different parts of a repo, in practice this wasn't often valuable. What you don't have is the ability to block someone from merging in a branch without some sort of additional controls or checks. It also can place a lot of burden on the SVN server, since nobody seems to ever upgrade them even when you add a lot more employees.

Also the UIs are dated and the entire tooling ecosystem has started to rot from users leaving. I don't know if I could realistically recommend someone jump from git to svn right now, but I do think it has a lot of good ideas that move us closer to what I want. It would just need a tremendous amount of UI/UX investment on the web side to get it to where I would like using it over git. But I think if someone was interested in that work, the fundamental "bones" of Subversion are good.

Sapling

One thing I've heard from every former Meta engineer I've worked with is how much they miss their VCS. Sapling is that team letting us play around with a lot of those pieces, adapted for a more GitHub-centric world. I've been using it for my own personal stuff for a few months and have really come to fall in love with it. It feels like Sapling is specifically designed to be easy to understand, which is a delightful change.

A lot of the stuff is the same. You clone with sl clone, you check the status with sl status and you commit with sl commit. The differences that immediately stick out are the concept of stacks and the concept of the smartlog. Stacks are "collections of commits", and the idea is that from the command line I can issue PRs for those changes with sl pr submit, with each commit becoming its own GitHub PR. That view (obviously) is cluttered and annoying in the GitHub UI, so there's another tool, ReviewStack, that helps you see the changes correctly.

None of this makes a lot of sense unless I show you what I'm talking about. I made a new repo and I'm adding files to it. First I check the status:

❯ sl st
? Dockerfile
? all_functions.py
? get-roles.sh
? gunicorn.sh
? main.py
? requirements.in
? requirements.txt

Then I add the files:

sl add .
adding Dockerfile
adding all_functions.py
adding get-roles.sh
adding gunicorn.sh
adding main.py
adding requirements.in
adding requirements.txt

If I want a nicer web UI running locally, I run sl web and get this:

So I added all those files in as one Initial Commit. Great, let's add some more.

❯ sl
@  5a23c603a  4 seconds ago  mathew.duggan
│  feat: adding the exceptions handler
│
o  2652cf416  17 seconds ago  mathew.duggan
│  feat: adding auth
│
o  2f5b8ee0c  9 minutes ago  mathew.duggan
   Initial Commit

Now if I want to navigate this stack, I can just use sl prev (and sl next) to move back and forth through it:

sl prev 1
0 files updated, 0 files merged, 1 files removed, 0 files unresolved
[2f5b8e] Initial Commit

And that is also represented in my sl output

❯ sl
o  5a23c603a  108 seconds ago  mathew.duggan
│  feat: adding the exceptions handler
│
o  2652cf416  2 minutes ago  mathew.duggan
│  feat: adding auth
│
@  2f5b8ee0c  11 minutes ago  mathew.duggan
   Initial Commit

This also shows up in my local web UI

Finally the flow ends with sl pr to create Pull Requests. They are GitHub Pull Requests but they don't look like normal GitHub pull requests and you don't want to review them the same way. The tool you want to use for this is ReviewStack.

I stole their GIF because it does a good job

Why I like it

Sapling lines up with what I expect a VCS to do. It's easier to see what is going on, it's designed to work with a large team and it surfaces the information I want in a way that makes more sense. The commands make more sense to me and I've never found myself unable to do something I needed to do.

More specifically I like throwing away the idea of branches. What I have is a collection of commits that fork off from the main line of development, but I don't have a distinct thing I want named that I'm asking you to add. I want to take the main line of work and add a stack of commits to it and then I want someone to look at that collection of commits and make sure it makes sense and then run automated checks against it. The "branch" concept doesn't do anything for me and ends up being something I delete anyway.

I also like that it's much easier to undo work. This is something git makes really difficult, and uncommit, unamend, unhide, and undo in Sapling just work better for me and always seem to result in the behavior I expected. Losing the staging area and focusing on easy-to-use commands is a more logical design.
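
A quick sketch of what I mean, using the commands Sapling actually ships:

# Pull the last commit's changes back into the working copy, uncommitted
sl uncommit

# Roll back the most recent amend
sl unamend

# Or just undo whatever the last sl command did to the repo
sl undo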

Why you shouldn't switch

If I love Sapling so much, what's the problem? To get Sapling to the place I actually want it to be, I need more of the Meta special sauce running. Sapling works pretty well on top of GitHub, but these seem to be the pieces needed to get all the goodness of the full system:

  • On-demand historical file fetching (remotefilelog, 2013)
  • File system monitor for faster working copy status (watchman, 2014)
  • In-repo sparse profile to shrink working copy (2015)
  • Limit references to exchange (selective pull, 2016)
  • On-demand historical tree fetching (2017)
  • Incremental updates to working copy state (treestate, 2017)
  • New server infrastructure for push throughput and faster indexes (Mononoke, 2017)
  • Virtualized working copy for on-demand currently checked out file or tree fetching (EdenFS, 2018)
  • Faster commit graph algorithms (segmented changelog, 2020)
  • On-demand commit fetching (2021)

I'd love to try all of this together (and since there is source code for a lot of it, I am working on trying to get it started) but so far I don't think I've been able to see the full Sapling experience. All these pieces together would provide a really interesting argument for transitioning to Sapling but without them I'm really tacking a lot of custom workflow on top of GitHub. I think I could pitch migrating wholesale from GitHub to something else, but Meta would need to release more of these pieces in an easier to consume fashion.

Scalar

Alright, so until Facebook decides to release the entire package end to end, Sapling exists as a great stack on top of GitHub but not something I could (realistically) see migrating a team to. Can I make git work more the way I want to? Or at least can I make it less of a pain to manage all the individual files?

Microsoft has a tool that does this, VFS for Git, but it's Windows only so that does nothing for me. However they also offer a cross-platform tool called Scalar that is designed to "enable working with large repos at scale". It was originally a Microsoft technology and was eventually moved to git proper, so maybe it'll do what I want.

What scalar does is effectively set all the most modern git options for working with a large repo. So this is the built-in file-system monitor, multi-pack index, commit graphs, scheduled background maintenance, partial cloning, and clone mode sparse-checkout.

So what are these things?

  • The file system monitor is FSMonitor, a daemon that tracks changes to files and directories from the OS and adds them to a queue. That means git status doesn't need to query every file in the repo to find changes.
  • The multi-pack index, which lets the pack directory hold many pack files while still giving git a single index to search across them, instead of relying on one giant pack.
  • Commit graphs which from the docs:
    • " The commit-graph file stores the commit graph structure along with some extra metadata to speed up graph walks. By listing commit OIDs in lexicographic order, we can identify an integer position for each commit and refer to the parents of a commit using those integer positions. We use binary search to find initial commits and then use the integer positions for fast lookups during the walk."
  • Finally, clone mode sparse-checkout. This allows people to limit their working directory to specific files.
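
Trying it is genuinely one command plus an opt-in to the paths you care about; a rough sketch (URL and paths hypothetical):

# Clone with the scaled-repo defaults (partial clone, sparse-checkout, FSMonitor,
# commit-graph, background maintenance) switched on
scalar clone https://github.com/example/big-monorepo

# Scalar puts the working copy in a src/ directory; opt into just the paths you work on
cd big-monorepo/src
git sparse-checkout set services/billing services/auth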

The purpose of this tool is to create an easy-mode for dealing with large monorepos, with an eye towards monorepos that are actually a collection of microservices. Ok but does it do what I want?

Why I like it

Well, it's already built into git, which is great, and it is incredibly easy to use and get started with. It also does some of what I want. Taking a bunch of existing repos and creating one giant monorepo, the performance was surprisingly good. The sparse-checkout means I get to designate what I care about and what I don't, and it also solves the problem of "what if I have a giant directory of binaries that I don't want people to worry about" since it follows the same pattern matching as .gitignore.

Now what it doesn't do is radically change what git is. You could grow a repo much, much larger with these defaults set, but it's still handling a lot of things locally and requiring me to do the work. However, I will say it makes a lot of my complaints go away. Combined with the gh CLI tool for PRs, I can cobble together a reasonably good workflow that I really like.

So while this is definitely the pattern I'm going to be adopting from now on (monorepo full of microservices where I manage scale with scalar), I think it represents how far you can modify git as an existing platform. This is the best possible option today but it still doesn't get me to where I want to be. It is closer though.

You can try it out yourself: https://git-scm.com/docs/scalar

Conclusion

So where does this leave us? Honestly, I could write another 5000 words on this stuff. It feels like as a field we get maddeningly close to cracking this code and then give up because we hit upon a solution that is mostly good enough. As workflows have continued to evolve, we haven't come back to touch this third rail of application design.

Why? I think the people not satisfied with git are told that it's a result of them not understanding it. It creates a feeling that if you aren't clicking with the tool, then the deficiency is with you and not with the tool. I also think programmers love decentralized designs because they encourage the (somewhat) false hope of portability. Yes, I am entirely reliant on GitHub Actions, Pull Requests, GitHub access control, SSO, secrets and releases, but in a pinch I could move the actual repo itself to a different provider.

Hopefully someone decides to take another run at this problem. I don't feel like we're close to done and it seems like, from playing around with all these, that there is a lot of low-hanging optimization fruit that anyone could grab. I think the primary blocker would be you'd need to leave git behind and migrate to a totally different structure, which might be too much for us. I'll keep hoping it's not though.

Corrections/suggestions: https://c.im/@matdevdug


IAM Is The Worst

Imagine your job was to clean a giant office building. You go from floor to floor, opening doors, collecting trash, getting a vacuum out of the cleaning closet and putting it back. It's a normal job and part of that job is someone gives you a key. The key opens every door everywhere. Everyone understands the key is powerful, but they also understand you need to do your job.

Then your management hears about someone stealing janitor keys. So they take away your universal key and they say "you need to tell Suzie, our security engineer, which keys you need at which time". But the keys don't just unlock one door, some unlock a lot of doors and some desk drawers, some open the vault (imagine this is the Die Hard building), some don't open any doors but instead turn on the coffee machine. Obviously the keys have titles, but the titles mean nothing. Do you need the "executive_floor/admin" key or the "executive_floor/viewer" key?

But you are a good employee and understand that security is a part of the job. So you dutifully request the keys you think you need, try to do your job, open a new ticket when the key doesn't open a door you want, try it again, it still doesn't open the door you want so then there's another key. Soon your keyring is massive, just a clanging sound as you walk down the hallway. It mostly works, but a lot of the keys open stuff you don't need, which makes you think maybe this entire thing was pointless.

The company is growing and we need new janitors, but they don't want to give all the new janitors your key ring. So they roll out a new system which says "now the keys can only open doors that we have written down for this key, even if the key says executive_floor/admin". The problem is people move offices all the time, so even if the list of doors that key opened was accurate when it was issued, it's not true tomorrow. The Security team and HR share a list, but the list sometimes drifts, or maybe someone moves offices without telling the right people.

Soon nobody is really 100% sure what you can or cannot open, including you. Sure someone can audit it and figure it out, but the risk of removing access means you cannot do your job and the office doesn't get cleaned. So practically speaking the longer someone works as a janitor the more doors they can open until eventually they have the same level of access as your original master key even if that wasn't the intent.

That's IAM (Identity and access management) in cloud providers today.

Stare Into Madness

AWS IAM Approval Flow
GCP IAM Approval Flow

Honestly I don't even know why I'm complaining. Of course it's entirely reasonable to expect anyone working in a cloud environment to understand the dozen+ ways that they may or may not have access to a particular resource. Maybe they have permissions at a folder level, or an org level, but that permission is gated by specific resources.

Maybe they don't even have access but the tool they're using to interact with the resource has permission to do it, so they can do it but only as long as they are SSH'd into host01, not if they try to do it through some cloud shell. Possibly they had access to it before, but now they don't since they moved teams. Perhaps the members of this team were previously part of some existing group but new employees aren't added to that group, so some parts of the team can access X but others cannot. Or they actually have the correct permissions to the resource, but the resource is located in another account and they don't have the right permission to traverse the networking link between the two VPCs.

Meanwhile someone is staring at these flowcharts trying to figure out what in hell is even happening here. As someone who has had to do this multiple times in my life, let me tell you the real-world workflow that ends up happening.

  • Developer wants to launch a new service using new cloud products. They put in a ticket for me to give them access to the correct "roles" to do this.
  • I need to look at two elements of it: the permissions the person needs in order to see if the thing is working, and the permissions the service needs in order to complete the task it is trying to complete.
  • So I go through my giant list of roles and try to cobble together something that I think based on the names will do what I want. Do you feel like a roles/datastore.viewer or more of a roles/datastore.keyVisualizerViewer? To run backups is roles/datastore.backupsAdmin sufficient or do I need to add roles/datastore.backupSchedulesAdmin in there as well?
  • They try it and it doesn't work. Reopen the ticket with "I still get authorizationerror:foo". I switch that role with a different role, try it again. Run it through the simulator, it seems to work, but they report a new different error because actually in order to use service A you need to also have a role in service B. Go into bathroom, scream into the void and return to your terminal.
  • We end up cobbling together a custom role that includes all the permissions this application needs, plus a remaining 90% of permissions it will never ever use that just sit there as a possible security hole.
  • Because /* permissions are the work of Satan, I need to scope it to specific instances of that resource and just hope nobody ever adds a SQS queue without... checking the permissions, I guess. In theory we should catch it in the non-prod environments, but there's always the chance that someone messes up something at a higher level of permissions that does something in non-prod and doesn't exist in prod, so we'll just kinda cross our fingers there.

GCP Makes It Worse

So that's effectively the AWS story, which is terrible but at least it's possible to cobble together something that works and you can audit. Google looked at this and said "what if we could express how much we hate Infrastructure teams as a service?" Expensive coffee robots were engaged, colorful furniture was sat on and the brightest minds of our generation came up with a system so punishing you'd think you did something to offend them personally.

Google looked at AWS and said "this is a tire fire" as corporations put non-prod and prod environments in the same accounts and then tried to divide them by conditionals. So they came up with a folder structure:

GCP Resource Hierarchy

The problem is that this design encourages unsafe practices by promoting "groups should be set at the folder level with one of the default basic roles". It makes sense logically at first that you are a viewer, editor or owner. But as GCP adds more services this model breaks down quickly because each one of these encompasses thousands upon thousands of permissions. So additional IAM predefined roles were layered on.

People were encouraged to move away from the basic roles and towards the predefined roles. There are ServiceAgent roles designated for service accounts, aka the permissions your actual application has, and then everything else. Then there are 1687 other roles for you to pick from to assign to your groups of users.

The problem is none of this is actually best practice. Even when assigning users "small roles", we're still not following the principle of least privilege. Also, the roles don't remain static. As new services come online, permissions are added to roles.

The above is an automated process that pulls all the roles down via the gcloud CLI and updates them to the latest definitions. Roles are in a constant state of flux, with daily changes. It gets even more complicated though.

You also need to check the launch stage of a role.

Custom roles include a launch stage as part of the role's metadata. The most common launch stages for custom roles are ALPHA, BETA, and GA. These launch stages are informational; they help you keep track of whether each role is ready for widespread use. Another common launch stage is DISABLED. This launch stage lets you disable a custom role.
We recommend that you use launch stages to convey the following information about the role:
EAP or ALPHA: The role is still being developed or tested, or it includes permissions for Google Cloud services or features that are not yet public. It is not ready for widespread use.
BETA: The role has been tested on a limited basis, or it includes permissions for Google Cloud services or features that are not generally available.
GA: The role has been widely tested, and all of its permissions are for Google Cloud services or features that are generally available.
DEPRECATED: The role is no longer in use.

Who Cares?

Why would anyone care if Google is constantly changing roles? Well, it matters because in GCP, to make a custom role you cannot combine predefined roles. Instead you need to go down to the permission level, list out all of the things those roles can do, feed that list of permissions into the definition of your custom role, and push that up to GCP.
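
In gcloud terms it looks roughly like this (project, role and permission names are illustrative):

# Inspect what a predefined role actually grants...
gcloud iam roles describe roles/datastore.viewer

# ...then define your own project-level custom role with an explicit permission list
gcloud iam roles create customDatastoreReader --project=my-project \
  --title="Datastore read-only for app X" \
  --permissions=datastore.entities.get,datastore.entities.list,datastore.databases.get \
  --stage=GA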

In order to follow best practices this is what you have to do. Otherwise you will always be left with users that have a ton of unused permissions along with the fear of a security breach allowing someone to execute commands in your GCP account through an applications service account that cause way more damage than the actual application justifies.

So you get to build automated tooling which either queries the predefined roles for changes over time and rolls those into your custom roles, so that you can assign a user or group one specific role that lets them do everything they need. Or you can assign these same folks multiple of the 1600+ predefined roles, accept that they have permissions they don't need, and also just internalize that day to day you don't know how much the scope of those permissions has changed.

The Obvious Solution

Why am I ranting about this? Because the solution is so blindingly obvious I don't understand why we're not already doing it. It's a solution I've had to build, myself, multiple times and at this point I am furious that this keeps being my responsibility as I funnel hundreds of thousands of dollars to cloud providers.

What is this obvious solution? You, an application developer, need to launch a new service. I give you a service account that lets you do almost everything inside of that account along with a viewer account for your user that lets you go into the web console and see everything. You churn away happily, writing code that uses all those new great services. Meanwhile, we're tracking all the permissions your application and you are using.

At some time interval, 30 or 90 or whatever days, my tool looks at the permissions your application has used over the last 90 days and says "remove the global permissions and scope it to these". I don't need to ask you what you need, because I can see it. In the same vein I do the same thing with your user or group permissions. You don't need viewer everywhere because I can see what you've looked at.

Both GCP and AWS support this and have all this functionality baked in. GCP has the role recommendations which tracks exactly what I'm talking about and recommends lowering the role. AWS tracks the exact same information and can be used to do the exact same thing.
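
GCP even exposes this through the recommender API, so checking what could be scoped down is roughly one command (project ID hypothetical):

# List the IAM recommender's "this role can be reduced" suggestions for a project
gcloud recommender recommendations list \
  --project=my-project \
  --location=global \
  --recommender=google.iam.policy.Recommender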

What if the user needs different permissions in a hurry?

This is not actually that hard to account for and, again, is something I and countless others have been forced to build over and over. You can issue expiring permissions in either cloud, where a user requests a role be temporarily granted to them and then it disappears in 4 hours. I've seen every version of these, from Slack bots to websites, but they're all the same thing. If the user is in X group they're allowed to request Y temporary permissions, or if the user is on-call (as determined by an API call to the on-call provider) they get more powers. Either design works fine.
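
On GCP the expiring grant itself can just be an IAM condition on the binding, something like this (member, role and date hypothetical):

# Grant a role that stops working on its own once the timestamp passes
gcloud projects add-iam-policy-binding my-project \
  --member=user:oncall@example.com \
  --role=roles/cloudsql.admin \
  --condition='expression=request.time < timestamp("2024-06-01T00:00:00Z"),title=temp-oncall-access'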

That seems like a giant security hole

Compared to what? Team A guessing what Team B needs even though they don't ever do the work that Team B does? Some security team receiving a request for permissions and trying to figure out if the request "makes sense" or not? At least this approach is based on actual data and not throwing darts at a list of IAM roles and seeing what "feels right".

Conclusion

IAM started out as an easy idea that as more and more services were launched, started to become nightmarish to organize. It's too hard to do the right thing now and it's even harder to do the right thing in GCP compared to AWS. The solution is not complicated. We have all the tools, all the data, we understand how they fit together. We just need one of the providers to be brave enough to say "obviously we messed up and this legacy system you all built your access control on is bad and broken". It'll be horrible, we'll all grumble and moan but in the end it'll be a better world for us all.

Feedback: https://c.im/@matdevdug


State Of The Blog

Just a quick opportunity to check in with you all and say thanks!

Don't worry, nothing is changing. I just wanted to write this as a quick thank you to all of you for checking in with my little site. I also wanted to address a few questions I've gotten now that we've hit a certain threshold of traffic so I can direct people here in the future with questions. If I miss something hit me up on Mastodon here: https://c.im/@matdevdug

20k Average Weekly Visitors!

One of the milestones I never thought I'd hit was 10,000 average weekly visitors but we have blown past that without me noticing. Here are the last 30 days stats for those interested in such things. This obviously has a giant spike throwing the data off but if you look at a typical 30 day period we're just at 20,000 a week average.

I'm glad so many of you have found my little corner of the internet. It's been a pleasure to interact with (almost) all of you and for the vanishingly small percentage that have been unpleasant, we've had words.

Cost

Thanks to Cloudflare, running this site has not been expensive. We're still at about $6 a month to run. I'm running on the Hetzner ARM CAX11 instance class and have been really impressed with performance. Typically folks go with ARM-class instances for cost, but this thing has been a beast in terms of workload with zero ARM-specific issues I can point to. This mirrors my experience with AWS ARM instances, but in case you were considering doing the same thing, you can easily scale with even the cheapest instance.

Monetization

I've gotten a few offers to monetize the site, mostly either running ads (which for this audience would be a giant waste of time and ruin the visual of the site) or by running "promoted posts". After thinking it over I decided I don't want to earn any money on the site. It's fun to write, hopefully people enjoy reading it and I'm lucky enough to be at a point in my life where $10 a month is not a sum of money I miss.

If that ever changes in the future, I'll be sure to mark the posts as endorsed or paid for in some way so that nobody feels duped. But I haven't been interested so far.

Software

This is a Debian server that's initialized with cloud-init to set up a non-root user, install Docker and Docker compose, pull the Ghost images along with Nginx and then attach the volume. I also pull the Cloudflare certificate and insert that inside the Nginx container so I can have a long-running SSL certificate and let them handle the edge certificate rotation.

Previously I used Caddy in front of Ghost but did run into a few times when under load it seemed to struggle and required a restart. In general I had more problems than I expected with Caddy, which doesn't make it a bad webserver, but it is difficult to compete with the completely idiot-proof nature of Nginx. Plus since I'm not handling user-facing SSL certificates, the built-in SSL certificate functionality ended up not being much of a perk.

Ghost

As a platform Ghost has been pretty good with one exception, which is email newsletters. I'll touch on those later. It's held up well under load, with frequent updates. I don't use most of the admin panel which is really geared towards building an email newsletter business. I'm not interested in doing that, so a lot of the functionality is wasted on me.

However it is quite good at the basic stuff, which is you write text in their editor, hit publish and it goes out to the world. Most of the Dashboard stuff is pointless to me, with links to YouTubers and optimizing for SEO which I haven't done at all. Most of the new features they add have nothing to do with me and in retrospect I might have been better off with a static site.

In general though, if you are interested in starting a blog with a focus on building a newsletter-based business, Ghost is great. For a site like mine it works well enough with some optimizations.

Email Newsletter

You may have noticed that the Subscribe button has disappeared. While I appreciate that people liked getting an email with the posts in it, the cost of sending emails exceeded the cost of hosting the rest of the platform by a lot. Ghost relies on Mailgun for sending newsletters and while it can use SMTP for transactional emails, the cost of Mailgun exceeds the value of what I get out of sending posts as newsletters. (If I post multiple times a month, we'd be looking at $90 a month for emails alone, which is too rich for me.) I also don't love having a database full of people's names and email addresses in it, since the best way to prevent a data leak is to not have the data to begin with.

If anyone in the future complains I'll likely set this up: https://github.com/ItzNotABug/ghosler so I can use the much cheaper SMTP options. But so far the response to removing it has been small. For those reading this, I would probably disable it from the get-go if I started a new site, OR set it up with SMTP from launch. Mailgun is too expensive for what it provides, which was a pretty underwhelming user experience full of nested menus. (Insert my rant that transactional email API services are a scam business based on the false assertion that sender reputation is impossible to build from scratch, despite me having seen it done multiple times with IPv6 addresses.)

However folks seem to be using RSS successfully, which is great. Some homegrown clients aren't intelligently checking whether there are new entries or not, simply grabbing the entire RSS feed with every check. It's not a crisis by any means, but if that is you, maybe add a check like "has the pubDate of the latest entry changed since my last fetch, and if not, don't pull down the entire feed". But in general I strongly prefer RSS because the cost per user is extremely small and there are no personal data concerns around it. You are in control of your own feeds.

It does suck that less technical people still seem to struggle to find a functional RSS reader. It's still an unsolved problem as far as I can tell. I have many I like and recommend, but I constantly hear how hard it is to get "set up". If that's you, maybe check out Feedrabbit: https://feedrabbit.com/ to get RSS to emails.

Downsides

  • I don't love the Ghost team dumping sqlite for MySQL 8, especially because there are no real scaling options here. I'm not sure what the perk is of moving away from sqlite towards MySQL 8 if we're never going to be able to support multiple instances hosting the same site.
  • A lot of the technical work lately seems more in the direction of the headless CMS route, which is fine but does nothing for me.
  • Editor bugs. I get a lot of them. Markdown will stop working then resume working with a new update. Sometimes commands like /image will trigger the behavior I expect and sometimes it won't. The whole thing is a bit mysterious.

Nginx

Nothing to report here, just the official Nginx docker image with the following config. I have Authenticated Origin Pull set up so that I know all my traffic is coming from Cloudflare.

There's a couple of things happening here.

  • We have the proxy_cache setup so that Nginx can assist with any massive spikes in traffic.
  • This config attempts to both force SSL connections with the Strict-Transport-Security "max-age=63072000; includeSubdomains"; and also cache the SSL session parameters.
  • This is effectively my all-purpose Nginx configuration that I use for a lot of different things. There are a few sub-optimal things here (I don't think you need to do ssl_ciphers if you remove TLSv1) but in general this has been a pretty battle-tested config.
map $sent_http_content_type $expires {
    default                    off;
    text/css                   max;
    application/javascript     max;
    ~image/                    max;
}

server {
    listen 80;
    listen [::]:80;
    server_name matduggan.com www.matduggan.com;
    return 302 https://$server_name$request_uri;
}
proxy_cache_path /tmp/cache  levels=1:2    keys_zone=STATIC:10m inactive=24h  max_size=1g;
client_max_body_size 100M;

server {

    # SSL configuration

    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    charset UTF-8;
    gzip on;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 5m;
    ssl_prefer_server_ciphers on;
    ssl_ciphers ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_buffer_size 4k;
    add_header Strict-Transport-Security "max-age=63072000; includeSubdomains";
    ssl_certificate         /etc/ssl/cert.pem;
    ssl_certificate_key     /etc/ssl/key.pem;
    ssl_client_certificate /etc/ssl/cloudflare.crt;
    ssl_verify_client on;
    expires $expires;


    server_name matduggan.com www.matduggan.com;

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        add_header X-Cache-Status $upstream_cache_status;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        proxy_buffering        on;
        proxy_cache            STATIC;
        proxy_cache_valid      200  1d;
        proxy_cache_use_stale  error timeout invalid_header updating
                               http_500 http_502 http_503 http_504;
        proxy_pass http://127.0.0.1:8080;
        proxy_redirect off;
    }

    location ~ ^/(ghost|p)/ {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_pass http://127.0.0.1:8080;
    }

}

Considering Starting a Site?

You should! If you run across this and need some help, let me know at https://c.im/@matdevdug. Glad to provide pointers.

I think that addresses all the questions I typically get. If I missed something let me know.


K8s Service Meshes: The Bill Comes Due

K8s Service Meshes: The Bill Comes Due

When you start using Kubernetes one of the first suggestions you'll get is to install a service mesh. This, of course, on top of the 900 other things you need to install. For those unaware, everything in k8s is open to everything else by default when you start, and traffic isn't encrypted between services. Since encrypting traffic between services and controlling which services can talk to which requires something like JWTs and client certificates, teams aren't typically eager to take on this work even though it's increasingly a requirement of any stack.

Infrastructure teams can usually implement a feature faster than every app team in a company, so this tends to get solved by them. Service meshes exploded in popularity as it became clear they were easy ways to implement enforced encryption and granular service to service access control. You also get better monitoring and some cool features like circuit breaking and request retries for "free". As the scale of deployments grew with k8s and started to bridge multiple cloud providers or a cloud provider and a datacenter, this went from "nice to have" to an operational requirement.

What is a service mesh?

Service-to-service communication before and after service mesh implementation

Service meshes let you do a few things easily

  • Easy metrics on all service to service requests since it has a proxy that knows success/failure/RTT/number of requests
  • Knowledge that all requests are encrypted with automated rotation
  • Option to ensure only encrypted requests are accepted so you can have k8s in the same VPC as other things without needing to do firewall rules
  • Easy to set up network isolation at a route/service/namespace level (great for k8s hosting platform or customer isolation)
  • Automatic retries, global timeout limits, circuit breaking and all the features of a more robustly designed application without the work
  • Reduces change failure rate. With a proxy sitting there holding and retrying requests, small blips don't register anymore to the client. Now they shouldn't anyway if you set up k8s correctly, but it's another level of redundancy.

This adds up to a lot of value for places that adopt them with a minimum amount of work since they're sidecars injected into existing apps. For the most part they "just work" and don't require a lot of knowledge to keep working.
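
The "minimum amount of work" part is real; with Linkerd, for example, meshing an existing workload is roughly this (namespace and deployment names hypothetical):

# Inject the sidecar proxy into an existing deployment's manifest and re-apply it
kubectl get deploy webapp -n prod -o yaml \
  | linkerd inject - \
  | kubectl apply -f -

# Sanity-check that the data plane proxies are healthy afterwards
linkerd check --proxy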

However, it's 2024 and stuff that used to be free isn't anymore. The free money train from VCs has ended and the bill has come due. Increasingly, this requirement for deploying production applications to k8s is going to come with a tax that you need to account for when budgeting for your k8s migration and determining whether it is worth it. Since December 2023 the service mesh landscape has changed substantially and it's a good time for a quick overview of what is going on.

NOTE: Before people jump down my throat, I'm not saying these teams shouldn't get paid. If your tool provides real benefits to businesses it isn't unreasonable to ask them to materially contribute to it. I just want people to be up to speed on what the state of the service mesh industry is and be able to plan accordingly.

Linkerd

My personal favorite of the service meshes, Linkerd is the most idiot-proof of the designs. It consists of a control plane and a data plane, with a monitoring option included.

Recently Linkerd has announced a change to their release process, which I think is a novel approach to the problem of "getting paid for your work". For those unaware, Linkerd has always maintained a "stable" and an "edge" version of their software, along with an enterprise product. As of Linkerd 2.15.0, they will no longer publish stable releases. Instead the concept of a stable release will be bundled into their Buoyant Enterprise for Linkerd option. You can read the blog post here.

It's important to note that unlike some products, Linkerd doesn't just take a specific release of Edge and make it Enterprise. There are features that make it to Edge that never reach Enterprise, and Stable is not a static target either (there are patch releases to the Stable branch as well), so these are effectively three different products. That means you can't do the workaround of locking your org to specific Edge releases that match up with Stable/Enterprise.

Pricing

Update: Linkerd changed their pricing to per-pod. You can see it here: https://buoyant.io/pricing. I'll leave the below for legacy purposes but the new pricing addresses my concerns.

Buoyant has selected the surprisingly high price of $2000 a cluster per month. The reason this is surprising to me is the model for k8s is increasingly moving towards more clusters with less in a cluster, vs the older monolithic cluster where the entire company lives in one. This pricing works against that goal and removes some of the value of the service mesh concept.

If the idea of the Linkerd team is that orgs are going to stick with fewer, larger clusters, then it makes less sense to me to go with Linkerd. With a ton of clusters, I don't want to think about IP address ranges or any of the east to west networking designs, but if I just have like 2-3 clusters that are entirely independent of each other, then I can get a similar experience to Linkerd with relatively basic firewall rules, k8s network policies and some minor changes to an app to encrypt connections. There's still value to Linkerd, but the per-cluster pricing when I was clearly fine hosting the entire thing myself before is strange.

$2000 a month for a site license makes sense to me to get access to enterprise. $2000 a month per cluster when Buoyant isn't providing me with dashboards or metrics on their side seems like they picked an arbitrary number out of thin air. There's zero additional cost for them per cluster added, it's just profit. It feels weird and bad. If I'm hosting and deploying everything and the only support you are providing me is letting me post to the forum, where do you come up with the calculation that I owe you per cluster regardless of size?

Now you can continue to use Linkerd, but you need to switch to Edge. In my experience testing it, Edge is fine. It's mostly production ready, but there are sometimes features which you'll start using and then they'll disappear. I don't think it'll matter for most orgs most of the time, since you aren't likely constantly rolling out service mesh upgrades. You'll pick a version of Edge, test it, deploy it and then wait until you are forced to upgrade or you see a feature you like.

You also can't just buy a license; you need to schedule a call with them to buy one, with discounts available before March 21st, 2024. I don't know about you, but the idea of needing both to buy a license and to have a call to buy a license is equally disheartening. Maybe just let me buy it with the corporate card or work with the cloud providers to let me pay you through them.

Cilium

Cilium is the new cool kid on the block when it comes to service meshes. It eliminates the sidecar container, removing a major source of failure in the service mesh design. You still get encryption, load balancing, etc but since it uses eBPF and is injected right into the kernel you remove that entire element of the stack.


You also get a LOT with Cilium. It is its own CNI, which in my testing has amazing performance. It works with all the major cloud providers, and it gives you incredibly precise network security and observability. You can also replace Kube-proxy with Cilium. Here is how it works in a normal k8s cluster with Kube-proxy:

Effectively Kube-proxy works with the OS filtering layer (typically iptables) to allow network communication to your pods. This is a bit simplified but you get the idea.

With the BPF Kube-proxy replacement we remove a lot of pieces in that design.
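To make that concrete, here's a rough sketch of turning on the replacement with the Cilium Helm chart. The value names (kubeProxyReplacement, k8sServiceHost, k8sServicePort) are how I remember Cilium's docs, so double-check them against the version you're deploying; the API server host is a placeholder.

helm repo add cilium https://helm.cilium.io/
helm repo update

# install Cilium as the CNI with the eBPF kube-proxy replacement enabled,
# pointing the agents directly at the API server since kube-proxy is gone
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=<api-server-host> \
  --set k8sServicePort=6443

# confirm the replacement is actually active
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement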

This is only a tiny fraction of what Cilium does. It has developed a reputation for excellence, where if you fully adopt the stack you can replace almost all of the cloud-provider-specific pieces of k8s with a generic stack that works across providers at lower cost and high performance.

The UI for seeing service relationships in Cilium is world-class

A Wild Cisco Appears

Cisco acquired Isovalent in December of 2023, apparently to get involved in the eBPF space and likely to augment their acquisition of Splunk. Cilium provides the metrics and traces as well as generating great flow logs, and Splunk ingests them for you. If you are on Linkerd and considering moving over to Cilium to avoid paying, you should be aware that with Cisco having purchased them, the bill is inevitable.

You will eventually be expected to pay and my guess based on years of ordering Cisco licenses and hardware is you'll be expected to pay a lot. So factor that in when considering Cilium or migrating to Cilium. I'll go out on a limb here and predict that Cilium is priced as a premium multi-cloud product with a requirement of the enterprise license for many of the features before the end of 2024. I will also predict that Linkerd ends up as the cheapest option on the table by the end of 2024 for most orgs.

Take how expensive Splunk is and extrapolate that into a service mesh license and I suspect you'll be in the ballpark.

Istio

The overall architecture of an Istio-based application.

Istio, my least favorite service mesh. Conceptually Istio and Linkerd share many of the same ideas. Both platforms use a two-part architecture now: a control plane and a data plane. The control plane manages the data plane by issuing configuration updates to the proxies in the data plane. The control plane also provides security features such as mTLS encryption and authentication.

Istio uses Envoy proxies vs rolling their own like Linkerd and tends to cover more possible scenarios than Linkerd. Here's a feature comparison:

source

Istio's primary differences are that it supports VMs, runs its own Ingress Controller and is 10x more complex to set up than any other option. Istio has become infamous among k8s infrastructure staff as being the cause of more problems than any other part of the stack. Now many of these can be solved with minor modifications to the configuration (there is absolutely nothing structurally wrong with Istio), but since a service mesh failure can mean "the entire cluster dies", it's tricky.

The reality is Istio is free and open source, but you pay in other ways. Istio has so many components and custom resources that can interact with each other in surprising and terrifying ways that you need someone in your team who is an Istio expert. Otherwise any attempt to create a self-service ecosystem will result in lots of downtime and tears. You are going to spend a lot of time in Istio tracking down performance problems, weird network connectivity issues or just strange reverse proxy behavior.

Some of the earlier performance complaints about Envoy as the sidecar have been addressed, but I still hear of problems when organizations scale up to a certain number of requests per second (though less often than I used to). The cost of Istio, to me, exceeds the value of a service mesh most of the time. Especially since Linkerd has caught up with most of the traffic management stuff like circuit breaking.

Consul Connect

The next service mesh we'll talk about is Consul Connect. If Istio is the most complicated to set up and Linkerd is the easiest but with the fewest knobs to turn, Consul sits right in the middle. It has a great story when it comes to observability and has performance right there with Linkerd and superior to Istio.

Consul is also very clearly designed to be deployed by large companies, with features around stability and cross-datacenter design that only apply to the biggest orgs. However people who have used it seem to really like it, based on the chats I've had. The ability to use Terraform with Consul with its Consul-Terraform-Sync functionality to get information about services and interact with those services at a networking level is massive, especially for teams managing thousands of nodes or where pods need strict enforced isolation (such as SaaS products where customer app servers can't interact).

Pricing

Consul starts at $0.027 an hour, but in practice your price is gonna be higher than that. It goes up based on how many instances and clusters you are running. It's also not available on GCP, just AWS and Azure. You also don't get support with that, seemingly needing to upgrade your package to ask questions.

I'm pretty down on HashiCorp after the Terraform change, but people have reported a lot of success with Consul, so if you are considering a move, this one makes a lot of sense.

Cloud Provider Service Meshes

GCP has Anthos (based on Istio) as part of their GKE Enterprise offering, which is $0.10/cluster/hour (roughly $73 per cluster per month). It comes with a bunch of other features but in my testing was a much easier way to run Istio. Basically Istio without the annoying parts. AWS App Mesh still uses Envoy but has a pretty different architecture. However it comes with no extra cost, which is nice.

App Mesh

AWS App Mesh is also great for orgs that aren't all-in on k8s. You can bridge systems like ECS and traditional EC2 with it, meaning it's a super flexible tool for hybrid groups or groups where the k8s-only approach isn't a great fit.

Azure uses Open Service Mesh, which is now a deprecated product. Despite that, it's still their recommended solution according to a Google search. Link

Once again the crack team at Azure blows me away with their attention to detail. Azure has a hosted Istio add-on in preview now and presumably they'll end up doing something similar to GKE with Anthos. You can see that here.

What do you need to do

So the era of the free Service Mesh is coming to a close. AWS has decided to use it as an incentive to stay on their platform, Linkerd is charging you, Cilium will charge you At Some Point and Consul is as far from free now as it gets. GKE and Azure seem to be betting on Istio where they move the complexity into their stack, which makes sense. This is a reflection of how valuable these meshes are for observability and resilience as organizations transition to microservices and more specifically split stacks, where you retain your ability to negotiate with your cloud provider by running things in multiple places.

Infrastructure teams will need to carefully pick what horse they want to back moving forward. It's a careful balance between cloud lock-in vs flexibility at the cost of budget or complexity. There aren't any clear-cut winners in the pack, which wasn't true six months ago when the recommendation was just Linkerd or Cilium. If you are locked into either Linkerd or Cilium, the time to start discussing a strategy moving forward is probably today. Either get ready for the bill, commit to running Edge with more internal testing, or brace yourself for a potentially much higher bill in the future.


Python Dependencies Are Fixable

I like Python. I've had a lot of success with it on projects large and small. It is fast enough for most of the things I need and when it isn't, migrating from it to a more performant language isn't challenging. The depth of the standard library has been incredible for stable long-living code. However the one thing I hear often when discussing Python with younger programmers is "well, the dependency management is so bad I wouldn't bother with the language". Lately it seems the narrative is now evolving into "it is so broken that we need a new package system to fix it", which to me is the programming version of Spock dying in the warp core. Let's make absolutely sure we have no other options.

Kirk (William Shatner) bids farewell to Spock (Leonard Nimoy) in the emotional finale of Star Trek II: The Wrath of Khan. (Photo: Paramount/Courtesy Everett Collection)

The problem here isn't one of engineering. We have all the solutions to solve this problem for the majority of users. The issue is an incorrect approach to managing defaults. Pip, like many engineering-led projects, doesn't account for the power of defaults. Engineers tend towards maintaining existing behavior and providing the tooling for people to do it correctly. That's the wrong mentality. Experts who drive the project should be adjusting the default behavior to follow best practices.

Defaults are so important and I think so intimidating to change that this decision has been pushed back for years and years. If we have a better user experience for people and we know this is what they should be using, we should not expect users to discover that best way on their own. You have to make them opt out of the correct flow, not discover and opt in to the right way to do things. Change is scary though and maintainers don't have a massive corporate structure to hide behind. Whatever ire the change generates isn't directed at Faceless Corporation PR, it's aimed directly at the people who made the decision.

Golang taught us this lesson. I work a lot with Golang at work and on some side projects. It is an amazing language to show the power of defaults and the necessity of experts pushing users forward. Golang code at every job looks like code at every other job, which is the direct result of intentional design. Shipping gofmt bundled with the language increased the quality and readability of Golang code everywhere. Decentralizing dependency management became an "of course" moment for people when they tried it. Keeping the language simple in the face of demands for increased complexity has preserved its appeal. The list goes on and on.

PyPA needs to push the ecosystem forward or give up on the project and officially endorse a new approach. Offering people 400 options is destroying confidence in the core language. The design mentality has to change from "it isn't a problem if there is a workaround" to the correct approach, which recognizes that for most users the default is the only option they'll ever try.

Why it isn't that broken

Why do I think that we don't need to start fresh? Here's the workflow I use, which is not unique to me. I start a new Python repo and immediately make a venv with python -m venv venv. Then I activate it with source venv/bin/activate and start doing whatever I want. I write all my code, feel pretty good about it and decide to lock down my dependencies.

I run pip freeze > requirements.in which gives me all the packages I have installed with their versions. It's 2024 so I need more security and confidence than a list of packages with a version number. The easiest way to get that is with package hashes, which is easy to do with pip-tools. pip-compile --generate-hashes requirements.in outputs a requirements.txt with the hashes I want along with the dependencies of the packages.

build==1.0.3 \
    --hash=sha256:538aab1b64f9828977f84bc63ae570b060a8ed1be419e7870b8b4fc5e6ea553b \
    --hash=sha256:589bf99a67df7c9cf07ec0ac0e5e2ea5d4b37ac63301c4986d1acb126aa83f8f
    # via
    #   -r requirements.in
    #   pip-tools
cachetools==5.3.2 \
    --hash=sha256:086ee420196f7b2ab9ca2db2520aca326318b68fe5ba8bc4d49cca91add450f2 \
    --hash=sha256:861f35a13a451f94e301ce2bec7cac63e881232ccce7ed67fab9b5df4d3beaa1
    # via
    #   -r requirements.in
    #   google-auth
certifi==2023.11.17 \
    --hash=sha256:9b469f3a900bf28dc19b8cfbf8019bf47f7fdd1a65a1d4ffb98fc14166beb4d1 \
    --hash=sha256:e036ab49d5b79556f99cfc2d9320b34cfbe5be05c5871b51de9329f0603b0474
    # via
    #   -r requirements.in
    #   aioquic
    #   aioquic-mitmproxy
    #   mitmproxy
    #   requests

Now I know all the packages I have, why I have the packages I have and also the specific hashes of those packages so I don't need to worry about supply chain issues. My Dockerfile is also pretty idiot-proof.

FROM python:3.12-slim

# Create a non-root user
RUN groupadd -r nonroot && useradd -r -g nonroot nonroot
WORKDIR /app

# Copy the lock file first so the dependency layer is cached between builds
COPY requirements.txt .

# The hashes in requirements.txt put pip into hash-checking mode automatically
RUN pip3 install -r requirements.txt

COPY . .

RUN chown -R nonroot:nonroot /app

USER nonroot

ENTRYPOINT ["./gunicorn.sh"]
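
Building and running it is the usual two commands. A quick sketch, where the image name is just a placeholder and any port mapping depends on whatever gunicorn.sh binds:

docker build -t my-python-app .
docker run --rm my-python-app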

Yay, it's running. I can feel pretty confident handing this project over to someone new and having them run into minimal problems getting all of this running. Need to check for updates? Not a big deal.

pip-review
Flask==3.0.2 is available (you have 3.0.0)
Jinja2==3.1.3 is available (you have 3.1.2)
MarkupSafe==2.1.5 is available (you have 2.1.3)
pip==24.0 is available (you have 23.3.1)

Basically if you know the happy path, there are no serious problems here. But you need to know all these steps, which are documented in random places all over the internet. How did we get here and what can be done to fix it?
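
To put that happy path in one place, here's roughly what it looks like end to end. This is a sketch of my workflow, not official PyPA guidance, and flask/gunicorn just stand in for whatever you actually depend on:

# create and activate a per-project venv
python -m venv venv
source venv/bin/activate

# install what you need, then snapshot what's installed
pip install flask gunicorn
pip freeze > requirements.in

# pin everything, transitive dependencies included, with hashes
pip install pip-tools
pip-compile --generate-hashes requirements.in

# install from the lock file; the hashes force pip into hash-checking mode
pip install --require-hashes -r requirements.txt

# later on, see what has updates available
pip install pip-review
pip-review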

Why do people think it is so bad

What is the combination of decisions that got us to this place? Why is the average user's opinion so low? I think it's everything below.

  • Bare pip sucks for daily tasks. We can't declare a minimum version of Python, we don't get any information as to dependency relationships in the file, we don't have a concept of developer dependency vs production dependency, we don't have hashes so we're very open to upstream attacks, it's slow, it's not clear how to check for updates, there's no alert for a critical update, the list goes on and on.
    • What pip-compile does should always be the minimum. It should have been the minimum years and years ago.
    • Where pip shines is the range of scenarios it covers and backwards compatibility. We don't want to throw out all that work if we can avoid it to switch to a new package manager unless the situation is unfixable. To me the situation is extremely fixable, but we need to change defaults.
  • People used Python as a bash replacement. This was a weird period where, similar to Perl, there was an assumption that Python would be installed on any system you were working with and so you could write Python scripts to do things and then package them up as Linux packages. If your script had dependencies, you would also pull those in as Linux packages.
    • To be blunt, this dependency management system never should have been allowed. It caused no end of confusion for everyone and ended up with people using super old packages. If your Python application had dependencies, you should have included them.
    • Starting to write Python in Linux and then running apt-get install requests but then later being told to use pip and remove the package even though packages are how you get software in Linux has thrown off beginners as long as I have been doing this job.
  • The nature of dependencies has changed and how we think of including third-party software has evolved. I was shocked when I started working with NodeJS teams at how aggressively and (frankly) recklessly they would add dependencies to a project. However NPM and Node are designed around that model of lots of external dependencies, and they've adopted a lot of things that people have come to expect.
    • The package.json, package-lock.json and node_modules directory as a consistent design across all projects is huge. It completely eliminated confusion and ensures you can have super-easy project switching along with reproducible environments.
    • Node defaulting to per-project and not global is what Python should have switched over to years ago. Again, this is just what people expect when they're talking about having lots of dependencies.
    • People have a lot more dependencies. When I started in this field, the idea of every project adding a 66 MB dependency with boto would have been unthinkable. Not because of disk space, but because it's just so much code to bring into a project. However now people don't even blink at adding more libraries. pip was designed in a world where requirements.txt files were 10 lines. Now we could easily be talking 200.
    • If we're not going to switch over to per-project dependencies, then at the very least you need to switch to venv as a default. I don't care how you do it. Make a file that sits at the top level of a directory that tells Python we're using a venv. Have it check for the existence of a folder and if it exists use it by default, you gotta have something a bit easier here.
    • The reason this isn't a crisis is that it's effectively a basic .profile fix:
   cd() {
       builtin cd "$@"
       # if the directory we just changed into has a venv, activate it automatically
       if [ -f "venv/bin/activate" ]; then
           source venv/bin/activate
       fi
   }
  • Finally, people think it's bad because Golang and Rust exist. Expectations for how dependency management should work have evolved in the space. Work has been done to expand pip to meet more of those expectations, but we're still pretty far behind.

Where to go from here

Anyone familiar with the Apple ecosystem will know the term "Sherlocking". It's where Apple monitors the ecosystem of third-party apps and will periodically copy one and make it part of the default OS. While unfair at times to those third parties, it's a clever design from Apple's perspective. They can let someone else do all the work of figuring out what users like and don't like, what designs succeed or fail on their platform and then swoop in when there is general consensus.

pip needs to do some Sherlocking. PyPA has already done a ton of hard engineering work. We have the ability to create a more secure, easier-to-debug dependency management system with existing, almost-stock tooling. It doesn't require any fundamental changes to the ecosystem, or the investment of a lot of engineering effort.

What it requires is being confident enough in your work to make a better experience for everyone by enduring a period of some complaints. Or it's time to give up and endorse something like uv. Sitting and waiting for the problem to resolve itself through some abstract concept of community consensus has been a trainwreck. Either make the defaults conform to modern expectations or warn users when they run pip that this is a deprecated project and they should go install whatever else.
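
For reference, here's roughly what the same locked-and-hashed flow looks like with uv, which mirrors the pip/pip-tools interface. These commands are my reading of uv's docs, so verify them against the current release:

# create and activate a venv (uv defaults to .venv)
uv venv
source .venv/bin/activate

# requirements.in lists your direct dependencies, same as before
uv pip compile requirements.in -o requirements.txt --generate-hashes

# install exactly what's in the lock file, nothing more
uv pip sync requirements.txt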

Questions/comments/concerns: https://c.im/@matdevdug


Tech Support Stories Part 2

Since folks seemed to like the first one, I figured I would do another one. These are just interesting stories from my whole time doing IT-type work. Feel free to subscribe via RSS but know that this isn't the only kind of writing I do.

Getting Started

I grew up in a place that could serve as the laboratory calibration standard for small town USA. It had a large courthouse, a diner, one liquor store infamous for serving underage teens and a library. When I turned 12 my dad asked if I wanted to work for free for a local computer shop. My parents, like girlfriends, friends and a spouse would be in the future, were worried about the amount of time I was spending in the basement surrounded by half-broken electronics.

The shop was on the outskirts of town, a steel warehouse structure converted into offices. It was a father and son business, the father running the counter and phones with the son mostly doing on-site visits to businesses. They were deeply religious, members of a religion where church on Sunday was an all-day affair. Despite agreeing to let me work there for free, the son mostly didn't acknowledge that I was there. He seemed content to let me be and focus on his dream of setting up a wireless ISP with large microwave radio links.

Bill was put in charge of training me. He had been a Vietnam veteran who had lost a leg below the knee in the war. His favorite trick was to rotate the fake leg 180 degrees up and then turn his chair around when kids walked in, laughing as they ran away screaming. He had been a radio operator in the war and had spent most of his career working on radio equipment before getting this computer repair job as "something to keep myself busy". I was put to work fixing Windows 98 and later XP desktop computers.

This was my introduction to "Troubleshooting Theory" which Bill had honed over decades of fixing electronics. It effectively boiled down to:

  • Ask users what happened before the failure.
  • Develop a theory of the failure and a test to confirm.
  • Record the steps you have taken so you don't repeat them.
  • Check the machine after every step to ensure you didn't make it worse.
  • Software is unreliable, remove it as a factor whenever possible.
  • Document the fix in your own notes.
  • If you make the problem worse in your testing, walk away for a bit and start from the top. You are probably missing something obvious.

Nothing here is revolutionary but the quiet consistency of his approach is still something I use today. He was someone who believed that there was nothing exceptional in fixing technology but that people are too lazy to read the instruction manual. I started with "my PC is slow" tickets, which are basically "every computer that comes in". Windows 98 had a lot of bizarre behavior that was hard for normal users to understand. This was my first exposure to "the Registry".

The Registry

For those of you blessed to have started your exposure to Windows after the era of the registry, it was a hierarchical database used in older Windows versions that stored the information necessary to configure the system. User profiles, what applications are installed, what icons are for what folders, what hardware is in the system, it was all in this database. This database became the source of truth for everything in the system and also the only way to figure out what the system actually thought the value of something was.

The longer a normal person used a Windows device, the more cluttered this database became. Combine that with the fragmentation that adding and deleting files created on the spinning rust drives, and you would get a constant stream of people attesting that their machine was slower than it was before. The combination of some Registry Editor work to remove entries and de-fragmentation would buy you some time, but effectively there was a ticking clock hanging over every Windows install before you would need to reinstall.

In short order I learned troubleshooting Windows was a waste of time. Even if you knew why 98 was doing something, you rarely could fix it. So I would just run assembly lines of re-installs, backing up all the users files to a file-share and then clicking through the 98 install screen a thousand times. It sounds boring now but I was thrilled by the work, even though copying lots of files off of bogged down Windows 98 machines was painfully, hilariously slow.

Since Bill believed in telling people they were (effectively) stupid and had broken their machines through an inability to understand simple instructions, I took over the delicate act of lying to users. A lot of Windows IT work is lying to people on the phone trying to walk a delicate line. You can't blame the software too much because we still want them to continue buying computers, but at the same time you don't want to tell the truth which was almost always "you did something wrong and broke it". I felt the lying in this case was practically a public service.

As time went on I graduated to hardware repairs, which were fascinating in that era. Things like "getting video to output onto a monitor" or "getting sound to come out of the sound card" were still minor miracles that often didn't work. Hardware failures often showed up as blown capacitors. I lived on Bill's endless supply of cups of noodles, sparkling water bottles and his incredible collection of hot sauce. The man loved hot sauce, buying every random variation he could find. His entire workstation was lined with little bottles of threatening-sounding sauces.

The hardware repairs quickly became my favorite. Bill taught me how to solder and I discovered most problems were pretty easy to fix. Capacitors of this time period were, for whatever reason, constantly exploding. Often even expensive components could be fixed right up by replacing a fan, soldering a new capacitor on or applying thermal paste correctly. Customers loved it because they didn't need to buy totally new components and I loved it because it made me feel like a real expert (even though I wasn't and this was mostly visual diagnosis of problems).

When Windows XP started to be a thing was the first time I felt some level of frustration. XP was so broken when it came out that it effectively put us underwater as a team. After a while I felt like there wasn't much else for me to do in this space. Windows just broke all the time. I wasn't really getting better at fixing them, because there wasn't anything to fix. As Dell took over the PC desktop market in the area, everything from the video card to the sound card was on the motherboard, meaning all repairs boiled down to "replace the motherboard".

That was the end of my Windows career. I sold my PC gear, bought an iBook and from then on I was all-in on Mac. I haven't used Windows in any serious way since.

High School CCNA

While I was in high school, Cisco offered us this unique course. You could attend the Cisco Academy inside of high school, where you would study and eventually sit for your CCNA certification. It was a weird era where everyone understood computers were important to how life was going to work in the future but nobody understood what that meant. Cisco was trying to insert the idea that every organization on earth was going to need someone to configure Cisco routers and switches.

So we went, learned how to write Cisco configurations, recover passwords, reset devices and configure ports. Switches at this point were painfully simple, with concepts like VPNs working but not fully baked ideas. These were 10/100 port switches with PoE and had most of the basic features you would expect. It was a lot of fun to have a class where we would go down there and effectively mess with racks of networking equipment to try and get stuff to work. We'd set up DHCP servers and try to get anything to be able to talk to anything else on our local networks.

We mostly worked with the Cisco Catalyst 1900 which are models I would see well past their end of life in offices throughout my career. This class introduced me to a lot of the basic ideas I still use today. Designing network topology, the OSI model, having VLANs span switches, how routing tables work, IPv4 subnetting, all these concepts were introduced to me here and laid a good foundation for all the work I was to do later. More than the knowledge though, I appreciated the community.

This was the first time I discovered a group of people with the same interests and passions as me. Computer nerd was still very much an insult during this period of time, when admitting you enjoyed this sort of stuff opened you up to mocking. So you kinda didn't mention how much time you spent taking apart NESs from garage sales or you'd invite just a torrent of abuse. However here was a place where we could chat, compare personal projects and troubleshoot. I looked forward to it the 2 days a week I had the class.

To be clear, this was not a rich school. I grew up in a small town in Ohio whose primary industries were agriculture and making the Etch-A-Sketch. Our high school was full of asbestos to the extent that we were warned not to touch the ceiling tiles lest the dust get on us. The math teacher organized a prayer circle around the flagpole every morning as close to violating the Supreme Court ruling on prayer in school as he could get without actually breaking it. But somehow they threw this program together for a few years and I ended up benefiting from it.

The teacher also had contacts with lots of programmers and tech workers, which was the first time I had ever had contact with people in the tech field. They would come into this class and tell us what it was like to be a programmer or a network engineer at this time. It really opened my eyes to what was possible, since people in my life still made fun of the idea of "working with computers". Silicon Valley to people in the Midwest was long-haired hippies playing hacky sack, not doing actual work. These people looked way too tired to be accused of not doing real work.

Mostly though I appreciated our teacher, Mr. Bohnlein. He was an old-school nerd who had been programming since the 70s. He had been a high school teacher for decades but was a very passionate Mac user in his personal life. I remember he was extremely skilled at letting us fail for a long time while still giving us hints towards the correct solution. When it came time to take the test, I sailed through it thanks to his help. The students in the class used to make fun of him for his insistence on buying Apple stock. We all thought the company was going to be dead in the next 5 years. "The iPod is the inferior MP3 player" I remember stating very confidently.

He retired comfortable.

Playboy

One call I would get from time to time was to the Chicago Playboy office. This office was beautiful, high up overlooking the water with a very cool "Mad Men" layout. The writers and editors were up on a second level catwalk, with little "pod" offices that had glass walls. They dressed great and were just very stylish people. I was surprised to discover so many of the photographers were female, but I mostly didn't interact with them.

Playboy was on the top floors

The group I did spend time with was the website team, which unfortunately worked in conventional cubicles facing a giant monitor showing a dashboard of the website's performance and stats. I remember that the carpet was weirdly shaggy and purple, which made finding screws when I dropped them tricky. Often I had to wave a magnet over the ground and hope it sucked up the screw I had lost. The website crew was great to work with, super nice, but the route to their offices involved going by just mountains of Playboy branded stuff.

It was just rack after rack of Playboy clothes, lighters, swimsuits, underwear, water bottles. Every single item you could imagine had that rabbit logo on it. You see it a lot around, but I've never seen it all piled up together. Beyond that was a series of photo studios, with tons of lighting and props. I have no idea if they shot the content for the magazine there (I never saw anyone naked) but it seemed like a lot of the merchandise promo photos were shot there. The photo pipeline was a well-oiled machine, with SD cards almost immediately getting backed up onto multiple locations. They had editing stations right by the photo shooting areas and the entire staff was zero-nonsense.

The repairs were pretty standard stuff, mostly iMac and Mac Pro fixes, but the reason it stood out to me was the weird amount of pornography they tried to give me. Magazines, posters, a book once (like an actual hardcover photo book) which was incredibly nice of the IT guy I worked with, but felt like a strange thing to end a computer repair session with. He would give these to me in a cubicle filled with things made of animals. He had an antler letter opener, wore a ring that looked like it was made out of bone or antler along with a lot of photos of him holding up the heads of dead animals.

The IT field and the gun enthusiast community have a lot of overlap. It makes sense, people who enjoy comparing and shopping for very specific equipment that has long serial-number-type names along with weirdly strong brand allegiances. I had no particularly strong stance on hunting guns, having grown up in a rural area where everyone had a shotgun somewhere in the house. As a kid it was common for every visit to a new house to involve being warned to stay away from the gun cabinet. However hunting stories are a particular kind of boring, often beginning with a journey to somewhere I would never want to go and a lot of details I don't need. "I was debating between bringing the Tikka T3 and the Remington 700 but you know the recoil on the T3x is crazy". "Obviously it's a three-day drive from the airport to the hunting area in nowhere Texas but we passed the time talking about our favorite jerky".

I often spent this time trapped in cubicles or offices thinking about these men suddenly forced to fight these animals hand to hand. Are deer hard to knock out with your fists? Presumably they have a lot of brain protection from all the male jousting. I think it would quickly become the most popular show on television, just middle-aged IT managers trying to choke a white-tailed deer as it runs around an arena. We'd sell big steins of beer and turkey legs, like Medieval Times, for spectators. You and a date would crash the giant glasses together and cheer as people run for their lives from a moose.

Once after a repair session, while waiting for the L, I tripped and some of the stuff in my bag spilled out. This woman on the platform looked down at just a thick stack of porn magazines sliding over the platform and then at me. I still think about what she must have thought about me. It's not just that I had a Playboy, but like 6, as if I was one of the secret sexual deviants you read about on the news. "He looked like a normal person but everywhere he went he had a thick stack of porn."

Shedd Aquarium

One of my favorite jobs in the entire city was the Shedd Aquarium. I would enter around the side by the loading dock, which is also where many of the animals came in. Almost every morning there would be just these giant containers of misc seafood for the animals packed into the loading dock. It was actually really nice to see how high quality it was, like I've eaten dodgier seafood than what they serve the seals at Shedd.

It did make me laugh when you'd see the care and attention that went into the food for the animals and then you'd go by the cafeteria and see kids sucking down chicken nuggets and diet coke. But it was impossible not to be charmed by the intense focus these people had for animals. I used to break some of the rules and spy on the penguins, my favorite animals. There is something endlessly amusing about seeing penguins in non-animal places. Try not to smile at penguins walking down a hallway, it's impossible.

The back area of the aquarium feels like a secret world, with lots of staircases going behind the tanks. Often I would be in a conversation and look through the exhibit, making eye contact with a guest on the other side of the water. It was a very easy place to get lost, often heading down a series of catwalks and down a few stairs to a random door. Even after going there a few times, I appreciated an escort to ensure I didn't head down a random hallway and into an animal area or accidentally emerge in front of a crowd of visitors.

The offices were tucked away up here overlooking the water show

I worked with the graphic design team that was split between the visuals inside the aquarium and their ad campaigns. It was my introduction to concepts like color calibration and large format printing. The team was great and a lot of fun to work with, very passionate about their work. However one part of their workflow threw me off a lot at first. Fonts.

FontExplorer X Pro – spent a lot of time figuring out how this software worked

So I, like many people, had not spent a lot of time thinking about the existence of fonts. In school I wrote papers exclusively in Times New Roman for some reason that was never explained to me. However in design teams, buying and licensing fonts for each employee and project was a big deal. The technology that most places used at the time to manage these fonts was FontExplorer X Pro, which had a server and a client side.

Quickly I learned a lot about fonts, because debugging font issues is one of the stranger corners of technical troubleshooting. First, some Adobe applications hijacked the system font directories, meaning even if you had installed the right font in the user directory it might not appear. Second, fonts themselves were weird. TrueType fonts, the older format and the one a lot of these companies still dealt with, are at their lowest level "a sequence of points on a grid". You combine curves and straight lines into what we call a glyph.

Most of the fonts I worked with had started out with the goal of printing on paper. Now many of those were being repurposed for digital assets as well as printing on paper, which introduced no end of weirdness. Here are just a few of the things I tried to help with:

  • Print and screen use different color models (CMYK vs RGB)
  • DPI for print and PPI for digital aren't the same
  • No screen is the same. The differences between how a digital asset looked on a nice screen vs a cheap screen wasn't trivial, even if we tried to color calibrate both

In general though I liked working with designers. They often knew exactly what they wanted to get out of my technical assistance, providing me with a ton of examples of what was wrong. Their passion for the graphic design work they were doing inside the aquarium and outside was clear with everyone I spoke with. It's rare to find a group of people who truly enjoys their jobs.

My primary task though was managing and backing up the Mac servers onto tape. For those who haven't used tape backups, it's a slow way to back up a lot of data that requires a lot of testing of restores (along with a good system for organizing the tapes so nobody gets confused). I quickly came to despise running large-scale tape backups. The rate of errors discovered when attempting to restore backups as a test was horrifying.

The tape backup was overall a complete fucking disaster. There were two tape drives from IBM and way too often a tape written by one drive wouldn't be readable by the other one. The sticker system used to track the tape backups got messed up when I went on vacation, and when I came back I couldn't make heads or tails of what had happened. Every week I stopped by and basically tried anything I could think of to get the tape backup to work correctly every time.

Then I did something I'm not proud of. The idea of them calling me and telling me all their hard work was gone was keeping me up at night. So without telling them, I tucked an external 3.5" drive with as much storage as I could afford behind the server and started copying everything to both the tapes and the drive. The IT department had vetoed this idea before, but I did it without their permission and basically bet the farm that if the server drives failed and the tape didn't restore, they'd forgive me for making another onsite backup.

I found out years later that their IT found the drive, assumed they had installed it and switched over to backing up on disks in a Drobo since it was much easier to keep running.

United Airlines

Another frequent customer of mine was United Airlines. They had a suburban office which remains the most strangely designed office I've ever been in. There was a pretty normal lobby, with executive offices upstairs, a cafeteria and meeting room on the ground floor and then a nuclear bunker style basement. Most of the offices I went to were in the basement, along cement corridors so long that they had those little carts with the orange flashing lights zooming down there. It sort of felt like you were at the airport. You could actually ask for a ride on the carts and get one, which I only did once but it was extremely fun.

The team that asked for technical support the most was the design team for the in-flight magazine, Hemispheres. They were all-Mac and located in a side room with no windows in this massive basement complex. So you'd go into just this broiling hot little room with Mac Pros humming along and zero airflow. The walls were brown, the carpet was brown, it was one of the least pleasant offices I've ever been in. Despite working for an in-flight magazine, these people were deadly serious about their work and had frequent contact with Apple about suggested improvements to workflows or tooling.

It was, to be frank, a baffling job. The United Airlines IT didn't want anything to do with the Macs, so I was there to do extremely normal things. I'm talking about applying software updates, installing Adobe products, things that anyone is capable of doing without any help. I'd often be asked to wait in a conference room for hours until someone remembered I was there and would ask me to do something. Their internet was so slow I would download the Mac updates at home and bring them into the compound on a hard drive. I've never seen corporate internet this slow in my life.

It wasn't the proudest I've ever been of a job but I was absolutely broke. So I would spend hours watching the progress bar tick by on Mac updates and bill them for it. I tried to do anything to fill the time. I wrapped cables in velcro, refilled printers, reorganized ethernet cables. It was too lucrative for me to walk away but it was also the most bored I've ever been in my life. I once emptied the recycling for everyone just to feel like I had done something that day, only to piss off the janitor. "What, is this your job?" he shouted as I handed him back the recycling bin.

The thing I remember the most was how impossibly hard it was to get paid. You would need to go to the end of this hallway, which had an Accounts Payable window slot with an older woman working there. Then you would physically submit the invoice to her, she would take it and put it in an old-timey wooden invoice tracking system. I'm talking sometimes months from when I submitted the invoice to when I got paid. I would borderline harass this woman, asking her on the way to the bathroom like "hey any chance I could get paid before Christmas? I gotta get the kids presents this year."

I didn't have kids, but I figured it sounded more convincing. I shouldn't have bothered with the lie, she looked at me with zero expression and resumed reading a magazine. At this point I was so poor that I had a budget of $20 a day, so waiting months to get paid by United put me in serious risk of not being able to pay my rent. In the end I learned a super valuable lesson about working for giant corporations. It's a great way to get paid as long as time is no object, but it's a dangerous waiting game to play.

Schools

Colleges hiring me to come out and do specific jobs wasn't uncommon. Setting up a media lab was probably the most common request, where I would show up, set up a bunch of Mac Pros with fiber and an Xserve somewhere to store the files. This was fine work, but it wasn't very exciting and typically involved a lot of unboxing stuff and figuring out how to run fiber. The weirdest niche I found myself in was somehow I became the go-to person for Jewish schools in the Chicago suburbs.

It started with Ida Crown Jewish Academy in Skokie, IL. I went in to fix a bunch of white MacBooks and iMacs and while I was there I showed the teachers how to automate some of their tasks with Automator.

Automator was a drag and drop automation tool that let you effectively write scripts to do certain tasks. I showed them how to automate some of the grading process with CSVs and after that I became the person they always called. Soon after, I started getting calls for all the Jewish schools in the area. To be clear there are not a lot of these schools and they are extremely small.

On average I'd say somewhere around 200-300 students in each school. Also they had pretty intense security, probably the most I'd seen at a high school before or since. To be honest I don't know why they picked me as the Mac IT guy, I don't have any particular feelings about the Jewish faith. The times when the school's staff would ask questions about my faith, they seemed pleased by my complete lack of interest in the topic. As someone who grew up with Christian fundamentalist cults constantly trying to recruit me, I appreciated them dropping it and never mentioning it again.

I loved these jobs because the schools were well organized, the staff knew everyone and they had a list of specific tasks for me when I showed up. Half my life doing independent IT was sitting in waiting rooms until the person who hired me actually came and got me, so this was delightful. I started doing more "teacher automation" work, which was mostly AppleScript or Automator doing the repetitive tasks that these people were staying late to get done.

It wasn't until one of the schools offered me a full-time job that I realized my time in IT was coming to a close. The automation and writing AppleScript was so much more fun than anything I was doing related to Active Directory or printers. It had started to become more clear with the changes Apple was making that they were less and less interested in the creative professional space, which was my bread and butter. This school was super nice, but I knew if I started working here I would be here forever and it was too boring to do forever.

That's when I started transitioning to more traditional Linux sysadmin work. But I still think back fondly of a lot of those trips around Chicago.

Questions/comments/concerns: https://c.im/@matdevdug


Typewriters and WordPerfect

The first and greatest trick of all technology is to make words appear. I will remember forever the feeling of writing my first paper on a typewriter as a kid. The tactile clunk and slight depression of the letters on the page made me feel like I was making something. It transformed my trivial thoughts to something more serious and weighty. I beamed with pride when I would be the only person who would hand in typed documents instead of the cursive of my classmates.

I learned how to type on the school's Brother Charger 11 typewriters, which by the time I got there were one step away from being thrown away. They were some of the last of their kind, manual portable typewriters from before electric typewriters took over the entire market. Our typing teacher was a nun who had learned how to type on them and insisted they be what we tried first. Typewriters were heavy things, with a thunk and a clang going along with almost anything you did.

Despite being used to teach kids to type for years, they were effectively the same as the day they had been purchased. The typewriters sat against the wall in their little attached cases with colors that seemed to exist from the 1950s until the end of the 70s and then we stopped remembering how to mix them. The other kids in my class hated the typewriters since it was easier to just write on loose leaf paper and hand that in, plus the typing tests involved your hands being covered with a cardboard shell to prevent you from looking.

I, like all tech people, decided that instead of fixing my terrible handwriting, I would put in 10x as much work to skip the effort. So I typed everything I could, trying to get out of as many cursive class requirements as possible. As I was doing that, my father was bringing me along to various courthouses and law offices in Ohio when I had snow days or days off school and he didn't want to leave me alone in the house.

These trips were great, mostly because people forgot I was there. I'd watch violent criminal trials, sit in the secretary areas of courthouses eating cookies that were snuck over to me, the whole thing was great. Multiple times I would be sitting on the bench outside of holding cell for prisoners before they would appear in court (often for some procedural thing) and they'd give me advice. I remember one guy who was just covered in tattoos advising me that "stealing cars may look fun and it is fun, but don't crash because the police WILL COME and ask for registration information". 10 year old me would nod sagely and store this information for the future.

It was at one of these courthouses that I was introduced to something mind-blowing. It was a computer running WordPerfect.

WordPerfect?

For a long time the word processor of choice by professionals was WordPerfect. I got to watch the transformation from machine-gun sounding electric typewriters to the glow of CRT monitors. While the business world had switched over pretty quickly, it took a bit longer for government organizations to drop the typewriters and switch. I started learning how to use a word processor with WordPerfect 5.1, which came with an instruction manual big enough to stop a bullet.

For those unaware, WordPerfect introduced some patterns that have persisted throughout time as the best way to do things. It was very reliable software that came with 2 killer features that put the bullet in the head of typewriters: Move and Cancel. Ctrl-F4 let you grab a sentence and then hit enter to move it anywhere else. In an era of dangerous menus, F1 would reliably back you out of any setting in WordPerfect and get you back to where you started without causing damage. Add in some basic file navigation with F5 and you had the beginnings of every text processing tool that came after.

I fell in love with it, eventually getting one of the old courthouse computers in my house to do papers on. We set it up on a giant table next to the front door and I would happily bang away at the thing, churning out papers with the correct date inserted with Shift-F5 (without having to look it up). In many ways this was the most formative concept of how software worked that I would encounter.

WordPerfect was the first software I saw that understood the idea of WYSIWYG. If you changed the margins in the program, the view reflected that change. You weren't limited to one page of text at a time but could quickly wheel through all the text. It didn't have "modes", similar to Vim today, where you needed to pick Create, Edit or Insert. In WordPerfect, if you started typing, it would insert text. It would then push the other text out of the way instead of overwriting it. It clicks as a natural way for text to work on a screen.

Thanks to the magic of emulation, I'm still able to run this software (and in fact am typing this on it right now). It turns out it is just as good as I remember, if not better. If you are interested in how, there is a great write-up here. However as good as the software is, it turns out there is an amazing history of WordPerfect available for free online.

Almost Perfect is the story of WordPerfect's rise and fall from the perspective of someone who was there. I loved reading this and am so grateful that the entire text exists online. It contains some absolute gems like:

One other serious problem was our growing reputation for buggy software. Any complex software program has a number of bugs which evade the testing process. We had ours, and as quickly as we found them, we fixed them. Every couple of months we issued improved software with new release numbers. By the spring of 1983, we had already sent out versions 2.20, 2.21, and 2.23 (2.22 was not good enough to make it out the door). Unfortunately, shipping these new versions with new numbers was taken as evidence by the press and by our dealers that we were shipping bad software. Ironically, our reputation was being destroyed because we were efficient at fixing our bugs.
Our profits were penalized as well. Every time we changed a version number on the outside of the box, dealers wanted to exchange their old software for new. We did not like exchanging their stock, because the costs of remanufacturing the software and shipping it back and forth were steep. This seemed like a waste of money, since the bug fixes were minor and did not affect most users.
Our solution was not to stop releasing the fixes, but to stop changing the version numbers. We changed the date of the software on the diskettes inside the box, but we left the outside of the box the same, a practice known in the industry as slipstreaming. This was a controversial solution, but our bad reputation disappeared. We learned that perception was more important than reality. Our software was no better or worse than it had been before, but in the absence of the new version numbers, it was perceived as being much better.

You can find the entire thing here: http://www.wordplace.com/ap/index.shtml


Fixing Macs Door to Door

When I graduated college in 2008, even our commencement speaker talked about how moving back in with your parents is nothing to be ashamed of. I sat there thinking well that certainly can't be a good sign. Since I had no aspirations and my girlfriend was moving to Chicago, I figured why not follow her. I had been there a few times and there were no jobs in Michigan. We found a cheap apartment near her law school and I started job hunting.

After a few weeks applying to every job on Craigslist, I landed an odd job working for an Apple Authorized Repair Center. The store was in a strip mall in the suburbs of Chicago with a Dollar Store and a Chinese buffet next door. My primary qualifications were that I was willing to work for not a lot of money and I would buy my own tools. My interview was with a deeply Catholic boss who focused on how I had been an altar boy growing up. Like all of my bosses early on, his primary quality was that he was a bad judge of character.

I was hired to do something that I haven't seen anyone else talk about on the Internet and wanted to record before it was lost to time. It was a weird program, a throwback to the pre-Apple Store days of Apple Mac support that was called AppleCare Dispatch. It still appears to exist (https://www.apple.com/support/products/mac/) but I don't know of any AASPs still dispatching employees. It's possible that Apple has subcontracted it out to someone else.

AppleCare Dispatch

Basically if you owned a desktop Mac and lived in certain geographic areas, when you contacted AppleCare to get warranty support they could send someone like me out with a part. Normally they'd do this only for customers who were extremely upset or had a store repair go poorly. I'd get a notice that AppleCare was dispatching a part, we'd get it from FedEx and then I'd fill a backpack full of tools and head out to you on foot.

While we had the same certifications as an Apple Genius, unlike the Genius Bar we weren't trained on any sort of "customer service" element. All we did was Mac hardware repairs all day, with pretty tight expectations of turnaround. So how it worked at the time was basically: if the Apple Store was underwater with in-house repairs, or you asked for at-home service, or the customer was Very Important, we would get sent out. I would head out to you on foot with my CTA card.

That's correct, I didn't own a car. AppleCare didn't pay a lot for each dispatch and my salary of $25,000 a year plus some for each repair didn't go far in Chicago even in the Great Recession. So this job involved me basically taking every form of public transportation in Chicago to every corner of the city. I'd show up at your door within a 2 hour time window, take your desktop Mac apart in your house, swap the part, run the diagnostic and then take the old part with me and mail it back to Apple.

Apple provided a backend web panel which came with a chat client. Your personal Apple ID was linked with the web tool (I think it was called ASX) where you could order parts for repairs as well as open up a chat with the Apple rep there to escalate an issue or ask for additional assistance. The system worked pretty well, with Apple paying reduced rates for each additional part after the first part you ordered. This encouraged us all to get pretty good at specific diagnostics with a minimal number of swaps.

Our relationship to Apple was bizarre. Very few people at Apple even knew the program existed, seemingly only senior AppleCare support people. We could get audited for repair quality, but I don't remember that ever happening. Customer satisfaction was extremely important and basically determined the rate we got paid, so we were almost never late to appointments and typically tried to make the experience as nice as possible. Even Apple Store staff seemed baffled by us on the rare occasions we ran into each other.

There weren't a lot of us working in Chicago around 2008-2010, maybe 20 in total. The community was small and I quickly met most of my peers who worked at other independent retail shops. If our customer satisfaction numbers were high, Apple never really bothered us. They'd provide all the internal PDF repair guides, internal diagnostic tools and that was it.

It is still surprising that Apple turned us loose on strangers without anyone from Apple speaking to us or making us watch a video. Our exam was mostly about not ordering too many parts and proving we could read the PDF guide on how to fix a Mac. A lot of the program was a clear holdover from the pre-iPod Apple, where resources were scarce and oversight minimal. As Apple Retail grew, the relationship with Apple Authorized Service Providers got more adversarial and controlling. But that's a story for another time.

Tools etc

For the first two years I used a Manhattan Portage bag, which looked nice but was honestly a mistake. My shoulder ended up pretty hurt after carrying a heavy messenger bag for 6+ hours a day.

The only screwdrivers I bothered with were Wiha precision screwdrivers. I tried all the major brands and Wiha were consistently the best by a mile. Wiha has a list of screwdrivers by Apple model available here: https://www.wihatools.com/blogs/articles/apple-and-wiha-tools

Macs of this period booted off of FireWire, so that's what I had with me. FireWire 800 LaCie drives were the standard issue drives in the field.


You'd partition it with a series of OS X installers (so you could restore the customer back to whatever version they had before) along with a few bootable installs of OS X. The bootable installs were where you'd run your diagnostic software. The most commonly used tools were as follows:

  • DaisyDisk, the most popular disk space analyzer: a visual breakdown of your disk space as an interactive map so you can find the biggest space wasters and drag them to the trash.
  • DiskWarrior from Alsoft: a utility built around preventing and repairing directory damage.
  • Disk Drill from CleverFiles for data recovery: https://www.cleverfiles.com/pro.html

Remember back when Macs were something you could fix? Crazy times.

9/11 Truther

One of my first calls was for a Mac Pro at a private residence. It was a logic board swap, which means the motherboard of the Mac. I wasn't thrilled, because removing and replacing the Mac Pro logic board was a time-consuming repair that required a lot of light. Instead of a clean workspace with bright lights, I got a guy who would not let me go until I had heard all about how 9/11 was an inside job.

The logic board in question

"Look, you don't really think the towers were blown up by planes do you?" he said as he dug around this giant stack of papers to find...presumably some sort of Apple-related document. I had told him that I had everything I needed, but that I had a tight deadline and needed to start right now. "Sure, but I'll put the video on in the background and you can just listen to it while you work." So while I took a Mac Pro down to the studs and rebuilt it, this poorly narrated video explained how it was the CIA behind 9/11.

His office, or "command center", looked like a set from The X-Files. There were folders and scraps of paper everywhere, along with photos of buildings, planes and random men wearing sunglasses. I think it was supposed to come across as if he were conducting an investigation, but it reminded me more of a neighbor who struggled with hoarding. If there was an organizational system, I couldn't figure it out. Why was this person so willing to dedicate a large portion of his house to "solving a mystery" the rest of us had long since moved on from?

The Mac Pro answered all my questions when it booted up. The desktop was full of videos he had edited of 9/11 truth material along with website assets for where he sold these videos. This guy wasn't just a believer, he produced the stuff. When I finished, we had to run a diagnostic test to basically confirm the thing still worked as well as move the serial number onto the logic board. When it cleared diagnostic I took off, thanking him for his time and wishing him a nice day. He looked devastated and asked if I wanted to go grab a drink at the bar and continue our conversation. I declined, jogging to the L.

The Doctors

One of the rich folks I was sent out to lived in one of those short, super expensive buildings on Lake Shore Drive. For those unfamiliar, these shorter buildings facing the water in Chicago are often divided into a few large houses. Basically you pass through an army of doormen and get shown into an elevator that opens directly into the person's home. That is, if you can get through the doormen.

The staff in rich people's houses want to immediately establish with any contractor coming into the home that they're superior to you. This happened to me constantly: personal assistants, doormen, maids, nannies, etc. Doormen in particular liked to make a big deal of demonstrating that they could stop me from going up. This one stuck out because he made me take the freight elevator, letting me know "the main elevator is for people who live here and people who work here". I muttered that I was also working there; he rolled his eyes and called me an asshole.

On another visit to a different building, a doorman physically threatened "to throw me down" if I tried to get on the elevator. The reason was that all contractors had to have insurance registered with the building before they did work there, even though I wasn't exactly... removing wires from the wall. The owner came down and explained that I wasn't going to do any work, I was just "a friend visiting". I felt bad for the doorman in that moment, in his dumb hat and ill-fitting jacket, with his brittle authority shattered.

So I took the freight elevator up and got let into what I would come to see as "the rich person's template home". My visits to rich people's houses were always a little disappointing, as they are often just a collection of nice items strewn around. The husband showed me into the library, a beautiful room full of books, with what I assumed were prints of paintings in nice frames leaning against the bookshelves. There was an iMac with a dead hard drive, which is an easy repair.

The process for fixing a hard drive was "boot to DiskWarrior, attempt to fix the disk, have it fail, swap the drive". Even if DiskWarrior fixed the directory and the Mac booted, I would still swap the drive (why not, it's what I was paid to do), but at least then I didn't have to have the talk. That's where I would basically sit someone down and tell them their data was gone. "What about my taxes?!?" I would shake my head sadly. Thankfully this time the drive was still functional, so I could copy the data over with a SATA to USB adapter.

As I reinstalled OS X, I walked around the room and looked at the books. I realized they were old, really old, and the paintings on the floor were not prints. There were sketches by Picasso and other names I had heard in passing while going through art museums. When he came back in, I asked about all the art. "Oh sure, my dad's, his big collection, I'm going to hang it up once we get settled." He, like his wife, didn't really acknowledge my presence unless I directed a question right at him. I started to google some of the books, my eyes getting wide. There were millions of dollars' worth of things in this room gathering dust. He never made eye contact with me during this exchange and quickly left the room.

This seems strange but was really common among these clients. I truly think many of the C-level type people whose houses I went to didn't really even see me. I had people turn the lights off in rooms I was in, forget I was there and leave (while arming the security system). For whatever reason I instantly became part of the furniture. When I went to the kitchen for a drink of water, the maid let me know that they had lived there for coming up on five years.

This was surprising to me because the apartment looked like they had moved in two weeks ago. There were still boxes on the floor, a TV sitting on the windowsill and what I would come to understand was a "prop fridge". It had bottled water, a single bottle of expensive champagne, juices, fruit and often some sort of energy drink. No leftovers; everything got swapped out and replaced before it went bad. "They're always at work," she explained, grabbing her bag and offering to let me out before she locked up. They were both specialist doctors and this was apparently where they recharged their batteries.

After the first AppleCare Dispatch visit they would call me back for years to fix random problems. I don't think either of them ever learned my name.

HARPO Studio

I was once called to fix a "high profile" computer at HARPO Studios in Chicago. This was where they filmed the Oprah Winfrey Show, which I obviously knew existed but had never watched. Often these celebrity calls went to me, likely because I didn't care about celebrities and didn't particularly want the jobs. I was directed to park across the street and told that even though the signs said "no parking", they had a "deal with the city".

This repair was suspicious, and I got the sense that someone had name-dropped Oprah to get it done. AppleCare rarely sent me multiple parts unless the case was unusual or the person had been escalated through the system. If you emailed Steve Jobs back in the day and his staff approved a repair, a special code got attached to the serial number that allowed us to order effectively unlimited parts against it. With the rare "celebrity" case, we would often find AppleCare did the same thing, throwing parts at us to make the problem go away.

The back area of HARPO was busy, with what seemed like an almost exclusively female staff. "Look, it's important that if you see Oprah, you act normal. Please don't ask her for an autograph or a photo." I nodded, only somewhat paying attention, because never in a million years would I do that. This office felt like the set of The West Wing, with people constantly walking and talking along with a lot of hand motions. My guide led me to a back office with a desk on one side and a long table full of papers and folders. The woman told me to "fix the iMac" and left the room.

Not the exact office but you get the gist

I swapped the iMac hard drive and screen, along with the memory and Wi-Fi card, then dove under the desk the second Oprah walked in. The woman and Oprah had a conversation about scheduling someone at a farm, or how shooting at a farm was going, and then she was gone. When I popped my head up, the woman looked at me and said, "Can you believe you got to meet Oprah?" She had a big smile, like she had given me the chance of a lifetime.

The bummer about the aluminum iMac repairs is you have to take the entire screen off to get anything done. This meant I couldn't just run away and hide my shame after effectively diving under a table to escape Oprah, a woman who I am certain couldn't have cared less what came out of my mouth. I could have said "I love to eat cheese sometimes" and she would have nodded and left the room.

"NO TOOLS NEEDED: How to replace your 27 inch iMac screen glass monitor" (YouTube)

So you have to pop the glass off (with suction cups, not your bare hands like a psycho, as shown above), then unscrew and remove the LCD, and only then do you get access to the actual components. Any dust that got on the LCD would stick and annoy people, so you had to keep it as clean as possible while moving quickly to get the swap done. The nightmare was breaking the thick cables that connected the screen to the logic board, something I did once, which required a late-night trip to an electronics repair guy who got me sorted out with a soldering iron.

The back-alley electronics repair guy is the dark secret of the Dispatch world. If you messed up a part, pulled a cable or broke a connector, Apple could ask you to pay for that part. The Apple list prices for parts were hilariously inflated. Logic boards were like $700-$900, and each stick of RAM was like $90 for something you could buy from Crucial for $25. This could destroy your pay for that month, so you'd end up going to Al, who ran basically a "solder Apple stuff back together" business in his garage. He wore overalls and talked a lot about old airplanes, which you'd need to endure in order to get the part fixed. Then I'd get the part swapped in and just pray the thing would turn on long enough for me to get off the property. Ironically his repairs often lasted longer than the official Apple refurbished parts.

After I hid under the desk deliberately, I lied for years afterwards, telling people I didn't have time to say hi. In reality my mind completely blanked when she walked in. I stayed under the desk because I was nervous that everyone was going to look at me to be like "I loved when you did X" and my brain couldn't form a single memory of anything Oprah had ever done. I remembered Tom Cruise jumping on a couch but I couldn't recall if this was a good thing or a bad thing when it happened.

Oh and the car that I parked in the area the city didn't enforce? It had a parking ticket, which was great because I had borrowed the car. Most of the payment from my brush with celebrity went to the ticket and a tank of gas.

Brownstone Moms

One of the most common calls I got was to rich people's houses in Lincoln Park, Streeterville, Old Town and a few other wealthy neighborhoods. They often lived in distinctive brownstone houses with small yards, a "public" entrance in the front, a family entrance on the side and a staff entrance through the back or in the basement.

These houses were owned by some of the richest people in Chicago. The houses themselves were beautiful, but they didn't operate like normal houses. Mostly they were run by the wives, who often had their own personal assistants. It was an endless sea of contractors coming in and out, coordinated by the mom and sometimes the nanny.

Once I was there, they'd pay me to do whatever random technical tasks existed outside of the initial repair. I typically didn't mind, since I was pretty fast at the repair itself and the other stuff was easy, mostly setting up printers or routers. The sense I got was that if a household made the AppleCare folks' lives a living hell, I would get sent out to make the problem disappear. These people often had extremely high expectations of customer service, which could be difficult at times.

There was a whole ecosystem of these small companies I started to run into more and more. They seemed to specialize in catering to rich people, providing tutoring services, in-house chefs, drivers, security and every other service under the sun. One of the AV installation companies and I worked together off the books after-hours to set up Apple TVs and Mac Minis as the digital media hubs in a lot of these houses. They'd pay me to set up 200 iPods as party favors or wire an iPad into every room.

Often I'd show up only to tell them their hard drive was dead and everything was gone. This was just how things worked before iCloud Photos: nobody kept backups and things were constantly lost forever. Here they would often threaten or plead with me, sometimes insinuating they "knew people" at Apple or could get me fired. Joke's on you, I don't even know anyone at Apple, was often what ran through my head. Threats quickly lost their power once you realized nobody at any point had asked your name or any information about you. It's hard to threaten an anonymous person.

The golden rule that every single one of these assistants warned me about was not to bother the husband when he got home. Typically these CEO types would come in, say a few words to their kids and then retreat to their own area of the house. These were often TV rooms or home theaters, elaborate set pieces with $100,000+ of AV equipment, treated like the secret lair of the house. To be clear, none of these men ever cared at all that I was there. They didn't seem to care that anybody was there, often barely acknowledging their wives even though an immense amount of work had gone into preparing for their return.

As smartphones became more of a thing, the number of "please spy on my teen" requests exploded. These varied from installing basically spyware on their kids laptops to attempting to install early MDM software on the kids iPhones. I was always uncomfortable with these jobs, in large part because the teens were extremely mean to me. One girl waited until her mom left the room to casually turn to me and say "I will pay you $500 to lie to my mom and say you set this up".

I was offended that this 15-year-old thought she could buy me, in large part because she was correct. I took the $500 and told the mom the tracking software was all set up. She nodded and told me she would check that it was working and "call me back if it wasn't". I knew she was never going to check, so that part didn't spook me. I just hoped the kid didn't get kidnapped or something and land me on the evening news. But I was also a little short on rent that month, so what can you do.

Tip for anyone reading this looking to get into this rich person Mac business

So the short answer is Time Machine is how you get paid month after month. Nobody checks Time Machine or pays attention to the "days since" notification. I wrote an AppleScript back in the day to alert you to Time Machine failures through email, but there is an app now that does the same thing: https://tmnotifier.com/
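
I no longer have that AppleScript, but a rough modern equivalent is easy to sketch. The version below is Python meant to run from cron or launchd; it leans on macOS's tmutil, assumes the backup folder name starts with a YYYY-MM-DD date, and assumes you have an SMTP relay and addresses to hand (smtp.example.com and you@example.com are placeholders). Treat it as a sketch of the idea, not a drop-in tool.

    #!/usr/bin/env python3
    # Sketch: email an alert when the latest Time Machine backup is too old.
    import subprocess
    import smtplib
    from datetime import datetime
    from email.message import EmailMessage

    MAX_AGE_DAYS = 3
    SMTP_HOST = "smtp.example.com"   # placeholder relay
    ALERT_ADDR = "you@example.com"   # placeholder recipient

    def latest_backup_age_days() -> int:
        # `tmutil latestbackup` prints the path of the newest backup folder,
        # which (on the systems I've seen) is named like 2024-01-31-142305.
        out = subprocess.run(["tmutil", "latestbackup"],
                             capture_output=True, text=True, check=True).stdout.strip()
        stamp = out.rstrip("/").split("/")[-1].split(".")[0]
        last = datetime.strptime(stamp[:10], "%Y-%m-%d")
        return (datetime.now() - last).days

    def send_alert(age: int) -> None:
        msg = EmailMessage()
        msg["Subject"] = f"Time Machine backup is {age} days old"
        msg["From"] = ALERT_ADDR
        msg["To"] = ALERT_ADDR
        msg.set_content("Schedule a visit: the backup drive probably needs attention.")
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(msg)

    if __name__ == "__main__":
        if (age := latest_backup_age_days()) > MAX_AGE_DAYS:
            send_alert(age)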

Basically, when the backups fail, you schedule a visit and fix the problem. When they start to run out of space, you buy a new, bigger drive. Then you back up the Time Machine drive to some sort of encrypted external location, so that when the drive (inevitably) gets stolen you can restore the files. The reason they keep paying you is that you'll get a call at some point to come to the house at a weird hour and recover a PDF or a school assignment. That one call is how you get permanent standing appointments.

Nobody will ever ask you how it works, so just find the system you like best and do that. I preferred local Time Machine over something like remote backup only because you'll be sitting there until the entire restore is done, and nothing beats local. Executives will often fill the "family computer" with confidential corporate documents they needed printed off, so be careful with these backups. Encrypt, encrypt, encrypt, then encrypt again. Don't bother explaining how the encryption works; just design the system with the assumption that someone will at some point put a CSV full of social security numbers onto this fun family iMac covered in dinosaur stickers.
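
If you want the "encrypt it before it leaves the house" part spelled out, here is a minimal sketch using the Python cryptography package on a pre-made backup.tar archive. The package choice, the file names and the read-everything-into-memory approach are all assumptions made for illustration (a real Time Machine bundle is far too big for this); any modern encryption tool gets you to the same place. The point is only that the key stays with you, never with the off-site copy.

    # Sketch: encrypt an archive before copying it off-site.
    # Requires: pip install cryptography
    from pathlib import Path
    from cryptography.fernet import Fernet

    KEY_FILE = Path("backup.key")    # keep this with you, never with the backup
    PLAIN = Path("backup.tar")       # hypothetical pre-made archive
    CIPHER = Path("backup.tar.enc")  # the only thing that leaves the house

    def load_or_create_key() -> bytes:
        if KEY_FILE.exists():
            return KEY_FILE.read_bytes()
        key = Fernet.generate_key()
        KEY_FILE.write_bytes(key)
        return key

    def encrypt_backup() -> None:
        f = Fernet(load_or_create_key())
        CIPHER.write_bytes(f.encrypt(PLAIN.read_bytes()))

    def restore_backup() -> None:
        f = Fernet(load_or_create_key())
        PLAIN.write_bytes(f.decrypt(CIPHER.read_bytes()))

    if __name__ == "__main__":
        encrypt_backup()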

Robbed for Broken Parts

A common customer for repairs would be schools, who would work with Apple to open a case for 100 laptops or 20 iMacs at a time. I liked these "mass repair" days, typically because the IT department for Chicago Public Schools would set us up with a nice clean area to work and I could just listen to a podcast and swap hard drives or replace top cases. However this mass repair was in one of Chicago's rougher neighborhoods.

Personal safety was a common topic among the dispatch folks when we would get together for a pizza and a beer. Everyone had bizarre stories but I was the only one not working out of my car. The general sense among the community was it was not an "if" but a "when" until you were robbed. Typically my rule was if I started to get nervous I'd "call back to the office" to check if a part had arrived. Often this would calm people down, reminding them that people knew where I was. Everyone had a story of getting followed back to their car and I had been followed back to the train once or twice.

On this trip, though, everything that could go wrong went wrong. My phone, the HTC Dream running v1 of Android, had decided to effectively "stop phoning". It was still on but had decided we were not, in fact, in the middle of a large city; I was instead in a remote forest miles away from a cell tower. I got to the school later than I wanted, showing up at noon. When I tried to push the job to the next day, the staff let me know the janitors were expecting me and would let me out when I finished.

So after replacing a ton of various Mac parts, I walked out with boxes of broken parts in my bag and a bunch more in an iMac box someone had given me. My plan was to head back home, get them checked in and labeled, and then drop them off at a FedEx store. When I got out and realized it was dark, I started to accept that something bad was likely about to happen to me. Live in a city for any amount of time and you'll develop a subconscious odds calculator. The closing line on this one wasn't looking great.

Sure enough, while waiting for the bus, I was approached by a man who made it clear he wanted the boxes. He didn't have a weapon but started going on about "kicking the shit" out of me, and I figured that was good enough for me. He clearly thought there was an iMac in the box and I didn't want to be there when he realized that wasn't true. I handed over my big pile of broken parts and sprinted to the bus that was just pulling up, begging the driver to keep driving. As a CTA bus driver, he had of course witnessed every possible horror a human can inflict on another human and was entirely unfazed by my outburst. "Sit down or get off the bus."

When I got home I opened a chat with an Apple rep, who seemed unsure of what to do. I asked if they wanted me to go to the police and the rep said I could if I wanted to, but after "talking to some people on this side" they would just mark the parts as lost in transit and it wouldn't ding my metrics. I thanked them and didn't think much more of the incident until weeks later, when someone from Apple mailed me a small Apple notebook.

They never directly addressed the incident (truly the notebook might be unrelated) but I always thought the timing was funny. Get robbed, get a notebook. I still have the notebook.

Questions/comments/concerns? Find me on Mastodon: https://c.im/@matdevdug


Tech and the Twilight of Democracy

We live in dangerous times. The average level of peacefulness around the world has dropped for the 9th straight year. The impact of violence on the global economy increased by $1 trillion to a record $17.5 trillion. This is equivalent to 13% of global GDP, approximately $2,200 per person. The graphs seem to be trending in the wrong direction by virtually any metric you can imagine. [Source]

It can be difficult to say exactly which countries are the "most powerful", but I think by most metrics the following would certainly make the list. These leaders of the world paint a dire picture for the future of democratic rule. In no particular order:

  • United States: currently ranked as a Deficient Democracy and facing the very real possibility that the upcoming presidential election will be its last. The current president, despite low crime and good economic numbers, is facing a close race and a hard reelection fight. His challenger, Donald Trump, has promised the following:

“We pledge to you that we will root out the Communists, Marxists, fascists, and the radical-left thugs that live like vermin within the confines of our country, that lie and steal and cheat on elections,” Donald Trump said this past November, in a campaign speech that was ostensibly honoring Veterans Day. “The real threat is not from the radical right; the real threat is from the radical left … The threat from outside forces is far less sinister, dangerous, and grave than the threat from within. Our threat is from within.”

Given his strong polling there is no reason to think the US will not fall from Deficient Democracy to Hybrid Regime or even further.

  • China: in the face of increased economic opportunity and growth, there was a hope that China would become more open. If anything, China has trended in the opposite direction and is considered to be among the least democratic countries in the world.
Over the past 10 years, the Communist Party has moved from collective leadership with the general secretary, considered first among equals on the elite Politburo Standing Committee — a practice established in the “reform and opening” era after the Cultural Revolution — to Xi’s supreme leadership, analysts say.
In 2018, Chinese lawmakers amended the constitution abolishing presidential term limits - paving the way for Xi to rule for life. In a further move to assert his authority, the party pledged to uphold the "Two Establishes,” party-speak for loyalty to him, in a historical resolution passed in 2021.

[Source]

  • EU: Currently the EU stands nearly alone in keeping the development of democracy alive. Even here, though, member states have begun to pass more extreme anti-immigration legislation in an attempt to appease right-leaning voters and keep the more extreme political parties out of office. France recently passed a hard-line anti-immigrant bill designed specifically to keep Le Pen supporters happy [source], and in Germany the desire for a dictator has continued to grow. Across all age groups, between 5 and 7% of those surveyed support a dictatorship with a single strong party and leader for Germany, double the long-term average. [source]
  • India: Having been recently downgraded to a Hybrid Regime, India is currently in the process of an aggressive consolidation of power by the executive with the assistance of both old and new laws.
The Modi government has increasingly employed two kinds of laws to silence its critics—colonial-era sedition laws and the Unlawful Activities Prevention Act (UAPA). Authorities have regularly booked individuals under sedition laws for dissent in the form of posters, social-media posts, slogans, personal communications, and in one case, posting celebratory messages for a Pakistani cricket win. Sedition cases rose by 28 percent between 2010 and 2021. Of the sedition cases filed against citizens for criticizing the government, 96 percent were filed after Modi came to power in 2014. One report estimates that over the course of just one year, ten-thousand tribal activists in a single district were charged with sedition for invoking their land rights.
The Unlawful Activities Prevention Act was amended in 2019 to allow the government to designate individuals as terrorists without a specific link to a terrorist organization. There is no mechanism of judicial redress to challenge this categorization. The law now specifies that it can be used to target individuals committing any act “likely to threaten” or “likely to strike terror in people.” Between 2015 and 2019, there was a 72 percent increase in arrests under the UAPA, with 98 percent of those arrested remaining in jail without bail.

[Source]

  • Russia: There has been a long-standing debate over whether Russia was a full dictatorship or some hybrid model. The invasion of Ukraine seems to have put all those questions to bed.
On 8 December, Andrey Klishas, the Head of the Federation Council Committee on Constitutional Legislation, made a point in an interview with Vedomosti which was already tacitly understood by Russia-watchers, but still shocking to hear.   In answer to a question on why the partial mobilisation decree had not been repealed now the process was completed, he explained to the Kremlin-friendly correspondent there was no need for legislation: ‘There is no greater power than the President’s words.’ So there it is – Russia is by definition a dictatorship. For the unawares reader, Vedomosti was one of Russia’s leading, intelligent and independent newspapers; it fell afoul of the authorities and today is a government propaganda channel.

[Source]

We have no reason at this point to think this trend will slow or reverse itself. It appears that, despite the constant refrain of my childhood that progression towards democracy was an inevitable result of free and open trade, this was another neoliberal fantasy. We live in a world where the most powerful countries are actively trending away from what we would consider to be core democratic values and towards more xenophobic and authoritarian governments.

However I'm not here to lecture, only to lay the foundation. In the face of this data, I thought it could be interesting to discuss some what-ifs, trying to imagine what the future of technology will look like in the face of this strong global anti-democratic trend. What technologies will we all be asked to make and what concessions will be forced upon us?

Disclaimer: I am not an expert on foreign policy, or really anything. Approach these topics not as absolute truths but as discussion points. I will attempt to provide citations and factual basis for my guesses, but as always feel free to disagree. Don't send me threatening messages as sometimes happens when I write things like this. I don't care about you and don't read them.

So let's make some predictions. What kind of world are we heading into? What are the major trends and things to look out for?

The Internet Stops Being Global

The Internet has always been a fractured thing. Far from the dream of perfectly equal traffic being carried across the fastest route between user and service, the real internet is a complicated series of arrangements between the tiers of ISPs and the services that ride those rails. First, what is the internet?

The thing we call the Internet is a big collection of separate, linked systems, each of which is managed as a single domain called an Autonomous System (AS). There are over sixty thousand AS numbers (ASNs) assigned to a wide variety of companies and educational, non-profit and government entities. The AS networks that form the primary transport for the Internet are independently controlled by Internet Service Providers (ISPs). The BGP protocol binds these entities together.

When we talk about ISPs, we're talking about three tiers. Tier 1 networks don't pay anyone to carry their traffic: they can reach the whole internet through peering, they peer on multiple continents and they have direct access to the fiber cables in the ocean. Tier 2 networks buy transit from Tier 1s and peer with other Tier 2s. Tier 3 networks are what hook up end users and businesses, and they connect upstream to a Tier 2.
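
To make the ASN idea a little more concrete, here is a quick Python sketch that maps an IP address back to the autonomous system announcing it. It uses Team Cymru's public IP-to-ASN whois service; the hostname, port and bulk query format are my recollection of that service and may have changed, so treat this as an illustration rather than a reference client.

    # Sketch: look up which AS announces a given IP via Team Cymru's whois service.
    import socket

    def asn_for(ip: str) -> str:
        query = f"begin\nverbose\n{ip}\nend\n".encode()
        with socket.create_connection(("whois.cymru.com", 43), timeout=10) as sock:
            sock.sendall(query)
            chunks = []
            while data := sock.recv(4096):
                chunks.append(data)
        # The reply is a header line followed by one result line per IP, e.g.
        # "15169 | 8.8.8.8 | 8.8.8.0/24 | US | arin | ... | GOOGLE, US"
        return b"".join(chunks).decode().strip().splitlines()[-1]

    if __name__ == "__main__":
        print(asn_for("8.8.8.8"))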


The internet is not as reliable as some people pretend it is. Instead it's a fragile entity that sits well within the regulatory reach of the countries where its pieces reside. As governments become less open, their Internet becomes less open. India regularly shuts down the Internet to stop dissent or to control protests or civil unrest (source), and I would expect that to grow into even more extreme restrictions as time goes on.

The "IT Rules 2011" were adopted in April 2011 as a supplement to the 2000 Information Technology Act (ITA). The new rules require Internet companies to remove within 36 hours of being notified by the authorities any content that is deemed objectionable, particularly if its nature is "defamatory," "hateful", "harmful to minors", or "infringes copyright". Cybercafé owners are required to photograph their customers, follow instructions on how their cafés should be set up so that all computer screens are in plain sight, keep copies of client IDs and their browsing histories for one year, and forward this data to the government each month.

China has effectively made its own Internet and Russia is currently in the process of doing the same thing (source). The US has its infamous Section 702.

Section 702 of the Foreign Intelligence Surveillance Act permits the U.S. government to engage in mass, warrantless surveillance of Americans’ international communications, including phone calls, texts, emails, social media messages, and web browsing. The government claims to be pursuing vaguely defined foreign intelligence “targets,” but its targets need not be spies, terrorists, or criminals. They can be virtually any foreigner abroad: journalists, academic researchers, scientists, or businesspeople.

[source]

As time progresses I would expect restrictions on Internet traffic to increase, not decrease. Much is made of the sanctity of encrypted messages between individuals, but in practice this matters less than people think: even if the message body is encrypted, the metadata around it usually isn't. A graph of who talks to whom, when and how often can be assembled from that metadata alone.
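
As a toy illustration of how much the metadata alone gives away, here is a Python sketch that builds a who-talks-to-whom graph from nothing but the From, To and Cc headers of a hypothetical inbox.mbox file, never touching a message body. The file name and mbox format are assumptions for the example; the same approach works on any traffic or connection log.

    # Sketch: build a relationship graph from email headers alone.
    import mailbox
    from collections import Counter
    from email.utils import getaddresses

    def relationship_graph(path: str) -> Counter:
        edges = Counter()
        for msg in mailbox.mbox(path):
            senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
            receivers = [addr for _, addr in getaddresses(
                msg.get_all("To", []) + msg.get_all("Cc", []))]
            for s in senders:
                for r in receivers:
                    edges[(s, r)] += 1   # edge weight = number of messages
        return edges

    if __name__ == "__main__":
        # Who emails whom most often, without reading a single body.
        for (sender, receiver), count in relationship_graph("inbox.mbox").most_common(10):
            print(f"{sender} -> {receiver}: {count} messages")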

Predictions

  • Expect to see more pressure placed on ISPs and less on tech companies. Google, Apple, Meta and others have shown some willingness to buck governmental pressure. However given the growth in cellular data usage and the shift of consumers from laptops/desktops to mobile, expect to see more restrictions at the mobile cellular network where even simple DNS blocking or tracking is harder to stop.
[source]
  • Widespread surveillance of all Internet traffic will continue to grow and governments will become more willing to turn off or greatly limit Internet access in the face of disruptions or threats. Expect to see even regional governments able to turn off mobile Internet in the face of protests or riots.
  • Look to the war in Gaza as an example of what this might look like.

Shutting off the Internet will be a more common tactic to limit the flow of information out and to disrupt attempts to organize or communicate across members of the opposition.

As of this writing there are 8 ongoing governmental Internet shutdowns and there have been 119 in the last 12 months. I would expect this pace to dramatically increase. [source]

The end result of all of these disruptions will be an increasingly siloed Internet specific to your country. It'll be harder for normal people on the ground in a crisis or governmental crackdown to tell people what is happening and, with the next technology, easier for those forces to make telling what is happening on the ground next to impossible.

LLMs Make Telling the Truth Impossible

Technology was supposed to usher in an age of less-centralized truth. No longer would we be reliant on the journalists of the past. Instead we could get our information directly from the people on the ground without filtering or editorializing. The goal was a more fair version of news that was more honest and less manipulated.

The actual product is far from that. Social media has become a powerful tool for propaganda: algorithms designed to keep users engaged with "relevant" content serve normal people conspiracy theories and propaganda with no filter or ethical check. Russia and China, following updated versions of their old Cold War playbooks, have excelled at this new world of disinformation, making it difficult to tell what is real and what is fake.

In 20 years we'll look back at this period as the almost innocent beginning of this trend. With realistic deepfakes, it will soon be impossible to tell what a leader did or didn't say. Since news organizations in China, Russia and increasingly the US have no real concept of "ethical journalism", answering either to government leaders or to the chase for ratings, it will soon be possible to create entirely false news streams that cater to whatever viewpoint the audience finds appealing at the time.

Predictions

  • Future conflicts will find social media immediately swamped with LLM-backed accounts attempting to create the perception that even a deeply disliked action (a Chinese blockade or invasion of Taiwan, say) is more nuanced. World leaders will find it difficult to tell what voters actually think, and it will be hard to form consensus across political affiliations even on seemingly straightforward issues.
  • Politicians and their supporters will use the possibility of deepfakes to attempt to explain away any video or image of them engaging in nefarious actions. Even if deepfakes aren't widely deployed, the possibility of them will transition us into a post-truth reality. Even if you watch a video of the president giving a speech advocating something truly terrible, supporters will be able to dismiss it without consideration.
  • Technology companies, facing a closed Internet and increasingly hostile financial landscape, will inevitably provide this technology as a service. Expect to see a series of cut-out companies but the underlying technology will be the same.
  • We won't ever find reliable LLM detection technology and there won't be a way to mass filter out this content from social media.
  • Even if you are careful about your consumption of media, it will be very hard to tell truth from fabrication for savvy consumers of information. Even if you are not swayed by the LLM generated content, you will not be able to keep up with the sheer output with conventional fact checking.

Global Warming (and War) Kills the Gadget

We know that Global Warming is going to have a devastating impact on shipping routes around the world. We're already seeing more storms impacting ports that are absolutely critical to the digital logistics chain.

[source]

With the COP28 conference a complete failure and none of the countries previously mentioned interested in addressing Global Warming, expect to see this trend continue unchecked. Without democratic pressures, we would expect to see countries like India, China, the US and others continue to take the most profitable course of action regardless of long-term cost.

The net result will be a widespread disruption in the complicated supply chain that provides the hardware necessary to continue to grow the digital economy. It will be more difficult for datacenters, mobile network providers and individual consumers to get replacement parts for hardware or to upgrade that hardware. Since much of the manufacturing expertise required to make these parts is almost exclusively contained within the impacted zones, setting up alternative factories will be difficult or impossible.

What’s likely incentivizing semiconductor makers more than government dollars are geopolitical changes. Taiwan is potentially a major choke point in any electronics supply chain. Any electronic part, whether for a smart phone, a television, a home computer, or a data center likely includes critical components that came through Taiwan.

“If you look across the Taiwan Strait, you’ve got this 900-pound gorilla called China that is saying 'Taiwan belongs to us, and if you won’t give it to us, we’ll take it at some point,'” Johnson said. “What would happen to the semiconductor industry if TSMCs fabs were destroyed? Disaster.”

Before Chinese President Xi Jinping became president in 2012, Western nations had a relatively healthy trade relationship with China. Since that time, it has become more contentious.

“Before Xi came in power, we had this great trade relationship. And there was the belief that if you treated China like a grown-up partner, they’d start acting like one; that turned out to be a very bad assumption,” Johnson said. “So yeah, the idea of bringing the entire supply chain back to the US? Probably not practical.

"But you want to figure out how to diversify away from China as much as you can. I don’t consider China a reliable business partner anymore.”

[source]

Predictions

  • As relations with China continue to degrade, expect to see tech companies struggle to find replacements for difficult to manufacture parts.
  • Even among countries where relations are good, the decision to ignore Global Warming means we'll see increased severe disruption of maritime shipping with destruction or flooding of vulnerable ports causing massive parts shortages.
  • It'll be harder to replace devices and harder to fix the ones you already have.
  • Expect to see a lot of "right to repair" bills as governments, unable to solve the logistical struggles, will push the issue down to being the responsibility of tech companies who will need to change their designs and manufacturing locations.
  • Also expect to see the same model of a device stay in the field for a lot longer. A cellphone or random IoT device will go from being easy to replace overnight to possibly involving a multi-week or even multi-month delay. Consumers will come to expect that they will be able to keep technology operational for longer.

Tech Companies will be Pressured to Comply

We currently live in a strange middle period where companies can still (mostly) say no to governments. While there are consequences, these are mostly financial or limitations on where the company can sell its products. However, that period appears to be coming to an end. Governments around the world are eyeing Big Tech and moving to apply regulations to those businesses. [source]

More governments arrested users for nonviolent political, social, or religious speech than ever before. Officials suspended internet access in at least 20 countries, and 21 states blocked access to social media platforms. Authorities in at least 45 countries are suspected of obtaining sophisticated spyware or data-extraction technology from private vendors.
[source]
Predictions

  • Expect to see governments step up their expectations of what tech companies are willing to do for them. Being told it is "impossible" to get information out of an encrypted exchange will get less and less traction.
  • Platforms like YouTube will be under immense pressure to either curtail fake video or promote the fake video pushed by the government in question. Bans or slowdowns will be commonplace.
  • Expect requirements to tie social media accounts to government ID, under the guise of protecting underage users, so that what people post can lead to more effective criminal prosecution.

Conclusion

Technology is not immune to changes in political structure. As we trend away from free and open communication across borders and towards more closed borders and war, we should expect to see technology reflect those changes. Hopefully this provides you with some interesting things to consider.

Whether these trends are reversible or not is not for me to say. I have no idea how to make a functional democracy, so fixing it is beyond my skills. I do hope I'm wrong, but I feel my predictions fit within the data I was able to find.

As always I'm open to feedback. The best place to find me is on Mastodon: https://c.im/@matdevdug