matduggan.com

Tech Support Stories Part 2

February 16, 2024

Since folks seemed to like the first one, I figured I would do another one. These are just interesting stories from my whole time doing IT-type work. Feel free to subscribe via RSS but know that this isn't the only kind of writing I do.

Getting Started

I grew up in a place that could serve as the laboratory calibration small town USA. It had a large courthouse, a diner, one liquor store infamous for serving underage teens and a library. When I turned 12 my dad asked if I wanted to work for free for a local computer shop. My parents, like girlfriends, friends and a spouse in the future would be, were worried about the amount of time I was spending in the basement surrounded by half-broken electronics.

The shop was on the outskirts of town, a steel warehouse structure converted into offices. It was a father and son business, the father running the counter and phones with the son mostly doing on-site visits to businesses. They were deeply religious, members of a religion where church on Sunday was an all-day affair. Despite agreeing to let me work there for free, the son mostly didn't acknowledge that I was there. He seemed content to let me be and focus on his dream of setting up a wireless ISP with large microwave radio links.

Bill was put in charge of training me. He had been a Vietnam veteran who had lost a leg below the knee in the war. His favorite trick was to rotate the fake leg 180 degrees up and then turn his chair around when kids walked in, laughing as they ran away screaming. He had been a radio operator in the war and had spent most of his career working on radio equipment before getting this computer repair job as "something to keep myself busy". I was put to work fixing Windows 98 and later XP desktop computers.

This was my introduction to "Troubleshooting Theory" which Bill had honed over decades of fixing electronics. It effectively boiled down to:

Ask users what happened before the failure.
Develop a theory of the failure and a test to confirm.
Record the steps you have taken so you don't repeat them
Check the machine after every step to ensure you didn't make it worse.
Software is unreliable, remove it as a factor whenever possible.
Document the fix in your own notes.
If you make the problem worse in your testing, walk away for a bit and start from the top. You are probably missing something obvious.

Nothing here is revolutionary but the quiet consistency of his approach is still something I use today. He was someone who believed that there was nothing exceptional in fixing technology but that people are too lazy to read the instruction manual. I started with "my PC is slow" tickets, which are basically "every computer that comes in". Windows 98 had a lot of bizarre behavior that was hard for normal users to understand. This was my first exposure to "the Registry".

The Registry

For those of you blessed to have started your exposure to Windows after the era of the registry, it was a hierarchical database used in older Windows versions that stored the information necessary to configure the system. User profiles, what applications are installed, what icons are for what folders, what hardware is in the system, it was all in this database. This database became the source of truth for everything in the system and also the only way to figure out what the system actually thought the value of something was.

The longer a normal person used a Windows device, the more cluttered this database becomes. Combined with adding and deleting files creating fragmentation on the spinning rust drives and you would get a constant stream of people attesting that their machine was slower than it was before. The combination of some Registry Editor work to remove entries and de-fragmentation would buy you some time, but effectively there was a ticking clock hanging over every Windows install before you would need to reinstall.

In short order I learned troubleshooting Windows was a waste of time. Even if you knew why 98 was doing something, you rarely could fix it. So I would just run assembly lines of re-installs, backing up all the users files to a file-share and then clicking through the 98 install screen a thousand times. It sounds boring now but I was thrilled by the work, even though copying lots of files off of bogged down Windows 98 machines was painfully, hilariously slow.

Since Bill believed in telling people they were (effectively) stupid and had broken their machines through an inability to understand simple instructions, I took over the delicate act of lying to users. A lot of Windows IT work is lying to people on the phone trying to walk a delicate line. You can't blame the software too much because we still want them to continue buying computers, but at the same time you don't want to tell the truth which was almost always "you did something wrong and broke it". I felt the lying in this case was practically a public service.

As time went on I graduated to hardware repairs, which was fascinating in that era. Things like "getting video to output onto a monitor" or "getting sound to come out of the sound card" were still minor miracles that often didn't work. Hardware failures were often showing up with blown capacitors. I lived on Bill's endless supply of cups of noodles, sparkling water bottles and his incredibly collection of hot sauce. The man loved hot sauce, buying every random variation he could find. His entire workstation was lined with little bottles of threatening-sounding sauces.

The hardware repairs quickly became my favorite. Bill taught me how to solder and I discovered most problems were pretty easy to fix. Capacitors of this time period were, for whatever reason, constantly exploding. Often even expensive components could be fixed right up by replacing a fan, soldering a new capacitor on or applying thermal paste correctly. Customers loved it because they didn't need to buy totally new components and I loved it because it made me feel like a real expert (even though I wasn't and this was mostly visual diagnosis of problems).

When Windows XP started to be a thing was the first time I felt some level of frustration. XP was so broken when it came out that it effectively put us underwater as a team. After awhile I felt like there wasn't much else for me to do in this space. Windows just broke all the time. I wasn't really getting better at fixing them, because there wasn't anything to fix. As Dell took over the PC desktop market in the area, everything from the videocard to the soundcard were on the motherboard, meaning all repairs boiled down to "replace the motherboard".

That was the end of my Windows career. I sold my PC gear, bought an iBook and from then on I was all-in on Mac. I haven't used Windows in any serious way since.

High School CCNA

While I was in high school, Cisco offered us this unique course. You could attend the Cisco Academy inside of high school, where you would study and eventually sit for your CCNA certification. It was a weird era where everyone understood computers were important to how life was going to work in the future but nobody understood what that meant. Cisco was trying to insert the idea that every organization on earth was going to need someone to configure Cisco routers and switches.

So we went, learned how to write Cisco configurations, recover passwords, reset devices and configure ports. Switches at this point were painfully simple, with concepts like VPNs working but not fully baked ideas. These were 10/100 port switches with PoE and had most of the basic features you would expect. It was a lot of fun to have a class where we would go down there and effectively mess with racks of networking equipment to try and get stuff to work. We'd set up DHCP servers and try to get anything to be able to talk to anything else on our local networks.

We mostly worked with the Cisco Catalyst 1900 which are models I would see well past their end of life in offices throughout my career. This class introduced me to a lot of the basic ideas I still use today. Designing network topology, the OSI model, having VLANs span switches, how routing tables work, IPv4 subnetting, all these concepts were introduced to me here and laid a good foundation for all the work I was to do later. More than the knowledge though, I appreciated the community.

This was the first time I discovered a group of people with the same interests and passions as me. Computer nerd was still very much an insult during this period of time, when admitting you enjoyed this sort of stuff opened you up to mocking. So you kinda didn't mention how much time you spent taking apart NESs from garage sales or you'd invite just a torrent of abuse. However here was a place where we could chat, compare personal projects and troubleshoot. I looked forward to it the 2 days a week I had the class.

To be clear, this was not a rich school. I grew up in a small town in Ohio whose primary industries were agriculture and making the Etch-A-Sketch. Our high school was full of asbestos to the extent that we were warned not to touch the ceiling tiles lest the dust get on us. The math teacher organized a prayer circle around the flagpole every morning as close to violating the Supreme Court ruling on prayer in school as he could get without actually breaking it. But somehow they threw this program together for a few years and I ended up benefiting from it.

The teacher also had contacts with lots of programmers and tech workers, which was the first time I had ever had contact with people in the tech field. They would come into this class and tell us what it was like to be a programmer or a network engineer at this time. It really opened my eyes to what was possible, since people in my life still made fun of the idea of "working with computers". Silicon Valley to people in the Midwest was long-haired hippies playing hacky sack, not doing actual work. These people looked way too tired to be accused of not doing real work.

Mostly though I appreciated our teacher, Mr. Bohnlein. The teacher was an old-school nerd who had been programming since the 70s. He had been a high school teacher for decades but a very passionate Mac user in his personal life. I remember he was extremely skilled at letting us fail for a long time while still giving us hints towards the correct solution. When it came time to take the test, I sailed through it thanks to his help. The students in the class used to make fun of him for his insistence on buying Apple stock. We all thought the company was going to be dead in the next 5 years. "The iPod is the inferior MP3 player" I remember stating very confidently.

He retired comfortable.

Playboy

One call I would get from time to time was to the Chicago Playboy office. This office was beautiful, high up overlooking the water with a very cool "Mad Men" layout. The writers and editors were up on a second level catwalk, with little "pod" offices that had glass walls. They dressed great and were just very stylish people. I was surprised to discover so many of the photographers were female, but I mostly didn't interact with them.

The group I did spend time with was the website team, which unfortunately worked in conventional cubicles facing a giant monitor showing a dashboard of the websites performance and stats. I remember that the carpet was weirdly shaggy and purple, which made finding screws when I dropped them tricky. Often I had to wave a magnet over the ground and hope it sucked up the screw I had lost. The website crew was great to work with, super nice, but the route to their offices involved going by just mountains of Playboy branded stuff.

It was just rack after rack of Playboy clothes, lighters, swimsuits, underwear, water bottles. Every single item you could imagine had that rabbit logo on it. You see it a lot around, but I've never seen it all piled up together. Beyond that was a series of photo studios, with tons of lighting and props. I have no idea if they shot the content for the magazine there (I never saw anyone naked) but it seemed like a lot of the merchandise promo photos were shot there. The photo pipeline was a well-oiled machine, with SD cards almost immediately getting backed up onto multiple locations. They had editing stations right by the photo shooting areas and the entire staff was zero-nonsense.

The repairs were pretty standard stuff, mostly iMac and Mac Pro fixes, but the reason it stood out to me was the weird amount of pornography they tried to give me. Magazines, posters, a book once (like an actual hardcover photo book) which was incredibly nice of the IT guy I worked with, but felt like a strange thing to end a computer repair session with. He would give these to me in a cubicle filled with things made of animals. He had an antler letter opener, wore a ring that looked like it was made out of bone or antler along with a lot of photos of him holding up the heads of dead animals.

The IT field and the gun enthusiast community has a lot of overlap. It makes sense, people who enjoy comparing and shopping for very specific equipment that has long serial number-type names along with weirdly strong brand allegiances. I had no particularly strong stance on hunting guns, having grown up in a rural area where everyone had a shotgun somewhere in the house. As a kid it was common for every visit to a new house to involve being warned to stay away from the gun cabinet. However hunting stories are a particular kind of boring, often beginning with a journey to somewhere I would never want to go and a lot of details I don't need. "I was debating between bringing the Tikka T3 and the Remington 700 but you know the recoil on the T3x is crazy". "Obviously it's a three-day drive from the airport to the hunting area in nowhere Texas but we passed the time talking about our favorite jerky".

I often spent this time trapped in cubicles or offices thinking about these men suddenly forced to fight these animals hand to hand. Are deer hard to knock out with your fists? Presumably they have a lot of brain protection from all the male jousting. I think it would quickly become the most popular show on television, just middle-aged IT managers trying to choke a white-tailed deer as it runs around an arena. We'd sell big steins of beer and turkey legs, like Medieval Times, for spectators. You and a date would crash the giant glasses together and cheer as people run for their lives from a moose.

Once after a repair session, while waiting for the L, I tripped and some of the stuff in my bag spilled out. This woman on the platform looked down at just a thick stack of porn magazines sliding over the platform and then at me. I still think about what she must have thought about me. It's not just that I had a Playboy, but like 6, as if I was one of the secret sexual deviants you read about on the news. "He looked like a normal person but everywhere he went he had a thick stack of porn."

Shedd Aquarium

One of my favorite jobs in the entire city was the Shedd aquarium. I would enter around the side by the loading dock, which is also where many of the animals would come in through. Almost every morning there would be just these giant containers of misc seafood for the animals packed into the loading dock. It was actually really nice to see how high quality it was, like I've eaten dodgier seafood than what they serve the seals at Shedd.

It did make me laugh when you'd see the care and attention that went into the food for the animals and then you'd go by the cafeteria and see kids sucking down chicken nuggets and diet coke. But it was impossible not to be charmed by the intense focus these people had for animals. I used to break some of the rules and spy on the penguins, my favorite animals. There is something endlessly amusing about seeing penguins in non-animal places. Try not to smile at penguins walking down a hallway, it's impossible.

The back area of the aquarium feels like a secret world, with lots of staircases going behind the tanks. Often I would be in a conversation and look through the exhibit, making eye contact with a guest on the other side of the water. It was a very easy place to get lost, often heading down a series of catwalks and down a few stairs to a random door. Even after going there a few times, I appreciated an escort to ensure I didn't head down a random hallway and into an animal area or accidentally emerge in front of a crowd of visitors.

The offices were tucked away up here overlooking the water show

I worked with the graphic design team that was split between the visuals inside the aquarium and their ad campaigns. It was my introduction to concepts like color calibration and large format printing, The team was great and a lot of fun to work with, very passionate about their work. However one part of their workflow threw me off a lot at first. Fonts.

FontExplorer X Pro 7.0.1 - Download for Mac Free — Spent a lot of time figuring out how this software worked

So I, like many people, had not spent a lot of time thinking about the existence of fonts. In school I wrote papers exclusively in Times New Roman for some reason that was never explained to me. However in design teams buying and licensing fonts for each employee and project was a big deal. The technology that most places used at the time to manage these fonts were FontExplorer X Pro, which had a server and client side.

Quickly I learned a lot about fonts because debugging font issues is one of the stranger corners of technical troubleshooting. First some Adobe applications hijacked the system font directories, meaning even if you had injected the right font in the user directory they might not appear. Second fonts themselves were weird. TrueType fonts, which is the older format and the one a lot of these companies still dealt with, are at their lowest level "a sequence of points on a grid". You combine curves and straight lines into what we call a glyph.

Most of the fonts I worked with had started out with the goal of printing on paper. Now many of those were being repurposed for digital assets as well as printing on paper, which introduced no end of weirdness. Here are just a few of the things I tried to help with:

Print and screen use different color modules
DPI for print and PPI for digital aren't the same
No screen is the same. The differences between how a digital asset looked on a nice screen vs a cheap screen wasn't trivial, even if we tried to color calibrate both

In general though I liked working with designers. They often knew exactly what they wanted to get out of my technical assistance, providing me with a ton of examples of what was wrong. Their passion for the graphic design work they were doing inside the aquarium and outside was clear with everyone I spoke with. It's rare to find a group of people who truly enjoys their jobs.

My primary task though was managing and backing up the Mac servers onto tape. For those who haven't used tape backups, it's a slow way to backup a lot of data that requires a lot of testing of backups (along with a good storage system for not confusing people). I quickly came to despise running large-scale tape backups. The rate of errors discovered when attempting to restore backups as a test was horrifying.

The tape backup was overall a complete fucking disaster. There were two tape drives from IBM and way too often a tape written by one drive wouldn't be readable by the other one. The sticker system used to track the tape backups got messed up when I went on vacation and when I came back I couldn't make heads or tails over what had happened. Every week I stopped by and basically tried anything I could think of to get the tape backup to work correctly every time.

Then I did something I'm not proud of. The idea of them calling me and telling me all their hard work was gone was keeping me up at night. So without telling them, I stuck an external 3.5 drive with as much storage as I could afford behind the server tucked away and started copying everything to the tapes and the drive. The IT department had vetoed this idea before but I did it without their permission and basically bet the farm that if the server drives failed and the tape didn't restore, they'd forgive me for making another onsite backup.

I found out years later that their IT found the drive, assumed they had installed it and switched over to backing up on disks in a Drobo since it was much easier to keep running.

United Airlines

Another frequent customer of mine was United Airlines. They had a suburban office which remains the strangest designed office I've ever been in. There was a pretty normal lobby, with executive offices upstairs, a cafeteria and meeting room on the ground floor and then a nuclear bunker style basement. Most of the offices I went to were in the basement along cement corridors so long that they had those little carts with the orange flashing lights zooming down there. It sort of felt like you were at the airport. You could actually ask for a ride on the carts and get one, which I only did once but it was extremely fun.

The team that asked for technical support the most was the design team for the in-flight magazine, Hemispheres. They were all-Mac and located in a side room with no windows in this massive basement complex. So you'd go into just this broiling hot little room with Mac Pros humming along and zero airflow. The walls were brown, the carpet was brown, it was one of the least pleasant offices I've ever been in. Despite working for an in-flight magazine, these people were deadly serious about their work and had frequent contact with Apple about suggested improvements to workflows or tooling.

It was, to be frank, a baffling job. The United Airlines IT didn't want anything to do with the Macs, so I was there to do extremely normal things. I'm talking about applying software updates, install Adobe products, things that anyone is capable of doing without any help. I'd often be asked to wait in a conference room for hours until someone remembered I was there and would ask me to do something. Their internet was so slow I would download the Mac updates at home and bring them into the compound on a hard drive. I've never seen corporate internet this slow in my life.

It wasn't the proudest I've ever been of a job but I was absolutely broke. So I would spend hours watching the progress bar tick by on Mac updates and bill them for it. I tried to do anything to fill the time. I wrapped cables in velcro, refilled printers, reorganized ethernet cables. It was too lucrative for me to walk away but it was also the most bored I've ever been in my life. I once emptied the recycling for everyone just to feel like I had done something that day, only to piss off the janitor. "What, is this your job?" he shouted as I handed him back the recycling bin.

The thing I remember the most was how impossibly hard it was to get paid. You would need to go to the end of this hallway, which had an Accounts Payable window slot with an older woman working there. Then you would physically submit the invoice to her, she would take it and put it in an old-timey wooden invoice tracking system. I'm talking sometimes months from when I submitted the invoice to when I got paid. I would borderline harass this woman, asking her on the way to the bathroom like "hey any chance I could get paid before Christmas? I gotta get the kids presents this year."

I didn't have kids, but I figured it sounded more convincing. I shouldn't have bothered with the lie, she looked at me with zero expression and resumed reading a magazine. At this point I was so poor that I had a budget of $20 a day, so waiting months to get paid by United put me in serious risk of not being able to pay my rent. In the end I learned a super valuable lesson about working for giant corporations. It's a great way to get paid as long as time was no object, but it's a dangerous waiting game to play.

Schools

Colleges hiring me to come out and do specific jobs wasn't uncommon. Setting up a media lab was probably the most common request, where I would show up, set up a bunch of Mac Pros with fiber and an Xserve somewhere to store the files. This was fine work, but it wasn't very exciting and typically involved a lot of unboxing stuff and figuring out how to run fiber. The weirdest niche I found myself in was somehow I became the go-to person for Jewish schools in the Chicago suburbs.

It started with Ida Crown Jewish Academy in Skokie, IL. I went in to fix a bunch of white MacBooks and iMacs and while I was there I showed the teachers how to automate some of their tasks with Automator.

Automator was a drag and drop automation tool that let you effectively write scripts to do certain tasks. I showed them how to automate some of the grading process with CSVs and after that I became the person they always called. Soon after, I started getting calls for all the Jewish schools in the area. To be clear there are not a lot of these schools and they are extremely small.

On average I'd say somewhere around 200-300 students in each school. Also they had pretty intense security, probably the most I'd seen at a high school before or since. To be honest I don't know why they picked me as the Mac IT guy, I don't have any particular feelings about the Jewish faith. The times when the schools staff would ask questions about my faith, they seemed pleased by my complete lack of interest in the topic. As someone who grew up with Christian fundamentalist cults constantly trying to recruit me, I appreciated them dropping it and never mentioning it again.

I loved these jobs because the schools were well organized, the staff knew everyone and they had a list of specific tasks for me when I showed up. Half my life doing independent IT was sitting in waiting rooms while the person who hired me to show up actually came and got me, so this was delightful. I started doing more "teacher automation" work, which was mostly AppleScript or Automator doing the repetitive tasks that these people were staying late to get done.

It wasn't until one of the schools offered me a full-time job that I realized my time in IT was coming to a close. The automation and writing AppleScript was so much more fun than anything I was doing related to Active Directory or printers. It had started to become more clear with the changes Apple was making that they were less and less interested in the creative professional space, which was my bread and butter. This school was super nice, but I knew if I started working here I would be here forever and it was too boring to do forever.

That's when I started transitioning to more traditional Linux sysadmin work. But I still think back fondly of a lot of those trips around Chicago.

Questions/comments/concerns: https://c.im/@matdevdug

Typewriters and WordPerfect

February 06, 2024 in Work

The first and greatest trick of all technology is to make words appear. I will remember forever the feeling of writing my first paper on a typewriter as a kid. The tactile clunk and slight depression of the letters on the page made me feel like I was making something. It transformed my trivial thoughts to something more serious and weighty. I beamed with pride when I would be the only person who would hand in typed documents instead of the cursive of my classmates.

I learned how to type on the schools Brother Charger 11 typewriter, which by the time I got there were one step away from being thrown away. It was one of the last of its kind, being a manual portable typewriter before electric typewriters took over the entire market. Our typing teacher was a nun who had learned how to type on them and insisted they be what we tried first. Typewriters were heavy things, with a thunk and a clang going along with almost anything you did.

Despite being used to teach kids to type for years, they were effectively the same as the day they had been purchased. The typewriters sat against the wall in their little attached cases with colors that seemed to exist from the 1950s until the end of the 70s and then we stopped remembering how to mix them. The other kids in my class hated the typewriters since it was easier to just write on loose leaf paper and hand that in, plus the typing tests involved your hands being covered with a cardboard shell to prevent you from looking.

I, like all tech people, decided that instead of fixing my terrible handwriting, I would put in 10x as much work to skip the effort. So I typed everything I could, trying to get out of as many cursive class requirements as possible. As I was doing that, my father was bringing me along to various courthouses and law offices in Ohio when I had snow days or days off school and he didn't want to leave me alone in the house.

These trips were great, mostly because people forgot I was there. I'd watch violent criminal trials, sit in the secretary areas of courthouses eating cookies that were snuck over to me, the whole thing was great. Multiple times I would be sitting on the bench outside of holding cell for prisoners before they would appear in court (often for some procedural thing) and they'd give me advice. I remember one guy who was just covered in tattoos advising me that "stealing cars may look fun and it is fun, but don't crash because the police WILL COME and ask for registration information". 10 year old me would nod sagely and store this information for the future.

It was at one of these courthouses that I was introduced to something mind-blowing. It was a computer running WordPerfect.

WordPerfect?

For a long time the word processor of choice by professionals was WordPerfect. I got to watch the transformation from machine-gun sounding electric typewriters to the glow of CRT monitors. While the business world had switched over pretty quickly, it took a bit longer for government organizations to drop the typewriters and switch. I started learning how to use a word processor with WordPerfect 5.1, which came with an instruction manual big enough to stop a bullet.

For those unaware, WordPerfect introduced some patterns that have persisted throughout time as the best way to do things. It was very reliable software that came with 2 killer features that put the bullet in the head of typewriters: Move and Cancel. Ctrl-F4 let you grab a sentence and then hit enter to move it anywhere else. In an era of dangerous menus, F1 would reliably back you out of any setting in WordPerfect and get you back to where you started without causing damage. Add in some basic file navigation with F5 and you had the beginnings of every text processing tool that came after.

I fell in love with it, eventually getting one of the old courthouse computers in my house to do papers on. We set it up on a giant table next to the front door and I would happily bang away at the thing, churning out papers with the correct date in there (without having to look it up with Shift-F5). In many ways this was the most formative concept of how software worked that I would encounter.

WordPerfect was the first software I saw that understood the idea of WYSIWYG. If you changed the margins in the program, the view reflected that change. You weren't limited to one page of text at a time but could quickly wheel through all the text. It didn't have "modes", similar to Vim today, where you needed to pick Create, Edit or Insert. WordPerfect if you started typing it would insert text. It would then push the other text out of the way instead of overwriting it. It clicks as a natural way for text to work on a screen.

Thanks to the magic of emulation, I'm still able to run this software (and in fact am typing this on it right now). It turns out it is just as good as I remember, if not better. If you are interested in how there is a great write-up here. However as good as the software is, it turns out there is an amazing history of WordPerfect available for free online.

Almost Perfect is the story of WordPerfect's rise and fall from the perspective of someone who was there. I loved reading this and am so grateful that the entire text exists online. It contains some absolute gems like:

One other serious problem was our growing reputation for buggy software. Any complex software program has a number of bugs which evade the testing process. We had ours, and as quickly as we found them, we fixed them. Every couple of months we issued improved software with new release numbers. By the spring of 1983, we had already sent out versions 2.20, 2.21, and 2.23 (2.22 was not good enough to make it out the door). Unfortunately, shipping these new versions with new numbers was taken as evidence by the press and by our dealers that we were shipping bad software. Ironically, our reputation was being destroyed because we were efficient at fixing our bugs.

Our profits were penalized as well. Every time we changed a version number on the outside of the box, dealers wanted to exchange their old software for new. We did not like exchanging their stock, because the costs of remanufacturing the software and shipping it back and forth were steep. This seemed like a waste of money, since the bug fixes were minor and did not affect most users.

Our solution was not to stop releasing the fixes, but to stop changing the version numbers. We changed the date of the software on the diskettes inside the box, but we left the outside of the box the same, a practice known in the industry as slipstreaming. This was a controversial solution, but our bad reputation disappeared. We learned that perception was more important than reality. Our software was no better or worse than it had been before, but in the absence of the new version numbers, it was perceived as being much better.

You can find the entire thing here: http://www.wordplace.com/ap/index.shtml

Fixing Macs Door to Door

January 05, 2024 in Apple

When I graduated college in 2008, even our commencement speaker talked about how moving back in with your parents is nothing to be ashamed of. I sat there thinking well that certainly can't be a good sign. Since I had no aspirations and my girlfriend was moving to Chicago, I figured why not follow her. I had been there a few times and there were no jobs in Michigan. We found a cheap apartment near her law school and I started job hunting.

After a few weeks applying to every job on Craigslist, I landed an odd job working for an Apple Authorized Repair Center. The store was in a strip mall in the suburbs of Chicago with a Dollar Store and a Chinese buffet next door. My primary qualifications were that I was willing to work for not a lot of money and I would buy my own tools. My interview was with a deeply Catholic boss who focused on how I had been an alter boy growing up. Like all of my bosses early on, his primary quality was he was a bad judge of character.

I was hired to do something that I haven't seen anyone else talk about on the Internet and wanted to record before it was lost to time. It was a weird program, a throwback to the pre-Apple Store days of Apple Mac support that was called AppleCare Dispatch. It still appears to exist (https://www.apple.com/support/products/mac/) but I don't know of any AASPs still dispatching employees. It's possible that Apple has subcontracted it out to someone else.

AppleCare Dispatch

Basically if you owned a desktop Mac and lived in certain geographic areas, when you contacted AppleCare to get warranty support they could send someone like me out with a part. Normally they'd do this only for customers who were extremely upset or had a store repair go poorly. I'd get a notice that AppleCare was dispatching a part, we'd get it from FedEx and then I'd fill a backpack full of tools and head out to you on foot.

While we had the same certifications as an Apple Genius, unlike the Genius Bar we weren't trained on any sort of "customer service" element. All we did was Mac hardware repairs all day, with pretty tight expectations of turnaround. So how it worked at the time was basically if the Apple Store was underwater with in-house repairs, or you asked for at-home or the customer was Very Important, we would get sent out. I would head out to you on foot with my CTA card.

That's correct, I didn't own a car. AppleCare didn't pay a lot for each dispatch and my salary of $25,000 a year plus some for each repair didn't go far in Chicago even in the Great Recession. So this job involved me basically taking every form of public transportation in Chicago to every corner of the city. I'd show up at your door within a 2 hour time window, take your desktop Mac apart in your house, swap the part, run the diagnostic and then take the old part with me and mail it back to Apple.

Apple provided a backend web panel which came with a chat client. Your personal Apple ID was linked with the web tool (I think it was called ASX) where you could order parts for repairs as well as open up a chat with the Apple rep there to escalate an issue or ask for additional assistance. The system worked pretty well, with Apple paying reduced rates for each additional part after the first part you ordered. This encouraged us all to get pretty good at specific diagnostics with a minimal number of swaps.

Our relationship to Apple was bizarre. Very few people at Apple even knew the program existed, seemingly only senior AppleCare support people. We could get audited for repair quality, but I don't remember that ever happening. Customer satisfaction was extremely important and basically determined the rate we got paid, so we were almost never late to appointments and typically tried to make the experience as nice as possible. Even Apple Store staff seemed baffled by us on the rare occasions we ran into each other.

There weren't a lot of us working in Chicago around 2008-2010, maybe 20 in total. The community was small and I quickly met most of my peers who worked at other independent retail shops. If our customer satisfaction numbers were high, Apple never really bothered us. They'd provide all the internal PDF repair guides, internal diagnostic tools and that was it.

It is still surprising that Apple turned us loose onto strangers without anyone from Apple speaking to us or making us watch a video. Our exam was mostly about not ordering too many parts and ensuring we could read the PDF guide of how to fix a Mac. A lot of the program was a clear holdover from the pre-iPod Apple, where resources were scarce and oversight minimal. As Apple Retail grew, the relationship to Apple Authorized Service Providers got more adversarial and controlling. But that's a story for another time.

Tools etc

For the first two years I used a Manhattan Portage bag, which looked nice but was honestly a mistake. My shoulder ended up pretty hurt after carrying a heavy messenger bag for 6+ hours a day.

The only screwdrivers I bothered with was Wiha precision screwdrivers. I tried all of them and Wiha were consistently the best by a mile. Wiha has a list of screwdrivers by Apple model available here: https://www.wihatools.com/blogs/articles/apple-and-wiha-tools

Macs of this period booted off of FireWire, so that's what I had with me. FireWire 800 LaCie drives were the standard issue drives in the field.

LaCie Rugged Triple 2TB 2TB 5400rpm FireWire 800, USB 3.0 Orange, Sølv (LAC9000448) | Dustin.dk

You'd partition it to have a series of OS X Installers on there (so you could restore the customer back to what they had before) along with a few bootable installs of OS X. These were where you'd run your diagnostic software. The most commonly used ones were as follows:

https://www.cleverfiles.com/pro.html

Mac Repair North York — Remember back when Macs were something you could fix? Crazy times

9/11 Truther

One of my first calls was for a Mac Pro at a private residence. It was a logic board, which means the motherboard of the Mac. I wasn't thrilled, because removing and replacing the Mac Pro logic board was a time-consuming repair that required a lot of light. Instead of a clean workspace with bright lights I got a guy who would not let me go until I had watched how 9/11 was an inside job.

Mac Pro 4.1 2009 Motherboard Logic Board | Kaufen auf Ricardo — The logic board in question

"Look, you don't really think the towers were blown up by planes do you?" he said as he dug around this giant stack of papers to find...presumably some sort of Apple-related document. I had told him that I had everything I needed, but that I had a tight deadline and needed to start right now. "Sure, but I'll put the video on in the background and you can just listen to it while you work." So while I took a Mac Pro down to the studs and rebuilt it, this poorly narrated video explained how it was the CIA behind 9/11.

His office or, "command center", looked like a set of the X-Files. There were folders and scraps of paper everywhere along with photos of buildings, planes, random men wearing sunglasses. I think it was supposed to come across as if he was doing an investigation, but it reminded me more of a neighbor who struggled with hoarding. If there was an organizational system, I couldn't figure it out. Why was this person so willing to dedicate an large portion of their house to "solving a mystery" the rest of us had long since moved on from?

The Mac Pro answered all my questions when it booted up. The desktop was full of videos he had edited of 9/11 truth material along with website assets for where he sold these videos. This guy wasn't just a believer, he produced the stuff. When I finished, we had to run a diagnostic test to basically confirm the thing still worked as well as move the serial number onto the logic board. When it cleared diagnostic I took off, thanking him for his time and wishing him a nice day. He looked devastated and asked if I wanted to go grab a drink at the bar and continue our conversation. I declined, jogging to the L.

The Doctors

One of the rich folks I was sent out to lived in one of those short, super expensive buildings on Lake Shore Drive. For those unfamiliar, these shorter buildings facing the water in Chicago are often divided into a few large houses. Basically you pass through an army of doorman, get shown into an elevator that opens into the persons house. That was, if you could get through the doormen.

The staff in rich peoples houses want to immediately establish with any contractor coming into the home that they're superior to you. This happened to me constantly, from personal assistants to doormen, maids, nannies, etc. Doormen in particular liked to make a big deal of demonstrating that they could stop me from going up. This one stuck out because he made me take the freight elevator, letting me know "the main elevator is for people who live here and people who work here". I muttered about how I was also working there and he rolled his eyes and called me an asshole.

On another visit to a different building I had a doorman physically threaten "to throw me down" if I tried to get on the elevator. The reason was all contractors had to have insurance registered with the building before they did work there, even though I wasn't.....removing wires from the wall. The owner came down and explained that I wasn't going to do any work, I was just "a friend visiting". I felt bad for the doorman in that moment, in a dumb hat and ill-fitting jacket with his brittle authority shattered.

So I took the freight elevator up, getting let into what I would come to see as "the rich persons template home". My time going into rich peoples houses were always disappointing, as they are often a collection of nice items sort of strewn around. I was shown by the husband into the library, a beautiful room full of books with what I (assumed) were prints of paintings in nice frames leaning against the bookshelves. There was an iMac with a dead hard drive, which is an easy repair.

The process for fixing a hard drive was "boot to DiskWarrior, attempt to fix disk, have it fail, swap the drive". Even if DiskWarrior fixed the Mac and it booted, I would still swap the drive (why not and it's what I was paid to do) but then I didn't have to have the talk. This is where I would need to basically sit someone down and tell them their data was gone. "What about my taxes?!?" I would shake my head sadly. Thankfully this time the drive was still functional so I could copy the data over with a SATA to USB adapter.

As I reinstalled OS X, I walked around the room and looked at the books. I realized they were old, really old and the paintings on the floor were not prints. There were sketches by Picasso, other names I had heard in passing through going through art museums. When he came back in, I asked why there was a lot of art. "Oh sure, my dads, his big collection, I'm going to hang it up once we get settled." He, like his wife, didn't really acknowledge my presence unless I directed a question right at him. I started to google some of the books, my eyes getting wide. There was millions of dollars in this room gathering dust. He never made eye contact with me during this period and quickly left the room.

This seems strange but was really common among these clients. I truly think for many of the C-level type people whose house I went to, they didn't really even see me. I had people turn the lights off in rooms I was in, forget I was there and leave (while arming the security system). For whatever reason I instantly became part of the furniture. When I went to the kitchen for a drink of water, the maid let me know that they have lived there for coming up on 5 years.

This was surprising to me because the apartment looked like they had moved in two weeks ago. There were still boxes on the floor, a tv sitting on the windowsill and what I would come to understand was a "prop fridge". It had bottled water, a single bottle of expensive champagne, juices, fruit and often some sort of energy drink. No leftovers, everything gets swapped out before it goes bad and gets replaced. "They're always at work" she explained, grabbing her bag and offering to let me out before she locked up. They were both specialist doctors and this was apparently where they recharged their batteries.

After the first AppleCare Dispatch visit they would call me back for years to fix random problems. I don't think either of them ever learned my name.

HARPO Studio

I was once called to fix a "high profile" computer at HARPO studios in Chicago. This was where they filmed the Oprah Winfrey Show, which I obviously knew of the existence of but had never watched. Often these celebrity calls went to me, likely because I didn't care and didn't particularly want them. I was directed to park across the street and told even though the signs said "no parking" that they had a "deal with the city".

This repair was suspicious and I got the sense that someone had name dropped Oprah to maybe get it done. AppleCare rarely sent me multiple parts unless the case was unusual or the person had gotten escalated through the system. If you emailed Steve Jobs back in the day and his staff approved a repair, it attached a special code to the serial number that allowed us to order effectively unlimited items against the serial number. However with the rare "celebrity" case, we would often find AppleCare did the same thing, throwing parts at us to make the problem go away.

The back area of HARPO was busy, with what seemed like an almost exclusively female staff. "Look it's important that if you see Oprah, you act normally, please don't ask her for an autograph or a photo". I nodded, only somewhat paying attention because never in a million years would I do that. This office felt like the set of The West Wing, with people constantly walking and talking along with a lot of hand motions. My guide led me to a back office with a desk on one side and a long table full of papers and folders. The woman told me to "fix the iMac" and left the room.

harpo-studio-chicago-office-sterling-bay-joshpabstphoto-(9) — Architecture Photography | Commercial Real Estate Photographer — Not the exact office but you get the jist

I swapped the iMac hard drive and screen, along with the memory and wifi then dived under the desk the second Oprah walked in. The woman and Oprah had a conversation about scheduling someone at a farm, or how shooting at a farm was going and then she was gone. When I popped my head up, the woman looked at me and was like "can you believe you got to meet Oprah?" She had a big smile, like she had given me the chance of a lifetime.

The bummer about the aluminum iMac repairs is you have to take the entire screen off to get anything done. This meant I couldn't just run away and hide my shame after effectively diving under a table to escape Oprah, a woman who I am certain couldn't have cared less what came out of my mouth. I could have said "I love to eat cheese sometimes" and she would have nodded and left the room.

NO TOOLS NEEDED How to replace your 27 inch iMac screen glass monitor - YouTube

So you have to pop the glass off (with suction cups, not your hands like a psycho as shown above), then unscrew and remove the LCD and then finally you get access to the actual components. Any dust that got on the LCD would stick and annoy people, so you had to try and keep it as clean as possible while moving quickly to get the swap done. The nightmare was breaking the thick cables that connected the screen to the logic board, something I did once and required a late night trip to an electronics repair guy who got me sorted out with a soldering iron.

The back alley electronics repair guy is the dark secret of the Dispatch world. If you messed up a part, pulled a cable or broke a connector, Apple could ask you to pay for that part. The Apple list price for parts were hilariously overpriced. Logic boards were like $700-$900, each stick of RAM was like $90 for ones you could buy on crucial for $25. This could destroy your pay for that month, so you'd end up going to Al, who ran basically a "solder Apple stuff back together" business in his garage. He wore overalls and talked a lot about old airplanes, which you'd need to endure in order to get the part fixed. Then I'd try to get the part swapped and just pray that the thing would turn on long enough for you to get off the property. Ironically his parts often lasted longer than the official Apple refurbished parts.

After I hid under the desk deliberately, I lied for years afterwards, telling people I didn't have time to say hi. In reality my mind completely blanked when she walked in. I stayed under the desk because I was nervous that everyone was going to look at me to be like "I loved when you did X" and my brain couldn't form a single memory of anything Oprah had ever done. I remembered Tom Cruise jumping on a couch but I couldn't recall if this was a good thing or a bad thing when it happened.

Oh and the car that I parked in the area the city didn't enforce? It had a parking ticket, which was great because I had borrowed the car. Most of the payment from my brush with celebrity went to the ticket and a tank of gas.

Brownstone Moms

One of the most common calls I got was to rich peoples houses in Lincoln Park, Streeterville, Old Town and a few other wealthy neighborhoods. They often live in distinct brownstone houses with small yards with a "public" entrance in the front, a family entrance on the side and then a staff entrance through the back or in the basement.

These houses were owned by some of the richest people in Chicago. The houses themselves were beautiful, but they don't operate like normal houses. Mostly they were run by the wives, who often had their own personal assistants. It was an endless sea of contractors coming in and out, coordinated by the mom and sometimes the nanny.

Once I was there, they'd pay me to do whatever random technical tasks existed outside of the initial repair. I typically didn't mind since I was pretty fast at the initial repair and the other stuff was easy, mostly setting up printers or routers. The sense I got was if the household made the AppleCare folks life a living hell, I would get sent out to make the problem disappear. These people often had extremely high expectations of customer service, which could be difficult at times.

There was a whole ecosystem of these small companies I started to run into more and more. They seemed to specialize in catering to rich people, providing tutoring services, in-house chefs, drivers, security and every other service under the sun. One of the AV installation companies and I worked together off the books after-hours to set up Apple TVs and Mac Minis as the digital media hubs in a lot of these houses. They'd pay me to set up 200 iPods as party favors or wire an iPad into every room.

Often I'd show up only to tell them their hard drive was dead and everything was gone. This was just how things worked before iCloud Photos, nobody kept backups and everything was constantly lost forever. Here they would often threaten or plead with me, sometimes insinuating they "knew people" at Apple or could get me fired. Jokes on you people, I don't even know people at Apple was often what ran through my head. Threats quickly lost their power when you realized nobody at any point had asked your name or any information about yourself. It's hard to threaten an anonymous person.

The golden rule that every single one of these assistants warned me about was not to bother the husband when he gets home. Typically these CEO-types would come in, say a few words to their kids and then retreat to their own area of the house. These were often TV rooms or home theaters, elaborate set pieces with $100,000+ of AV equipment in there that was treated like it was a secret lair of the house. To be clear, none of these men ever cared at all that I was there. They didn't seem to care that anybody was there, often barely acknowledging their wives even though an immense amount of work had gone into preparing for his return.

As smartphones became more of a thing, the number of "please spy on my teen" requests exploded. These varied from installing basically spyware on their kids laptops to attempting to install early MDM software on the kids iPhones. I was always uncomfortable with these jobs, in large part because the teens were extremely mean to me. One girl waited until her mom left the room to casually turn to me and say "I will pay you $500 to lie to my mom and say you set this up".

I was offended that this 15 year old thought she could buy me, in large part because she was correct. I took the $500 and told the mom the tracking software was all set up. She nodded and told me she would check that it was working and "call me back if it wasn't". I knew she was never going to check, so that part didn't spook me. I just hoped the kid didn't get kidnapped or something and I would end up on the evening news. But I was also a little short that month for rent so what can you do.

Tip for anyone reading this looking to get into this rich person Mac business

So the short answer is Time Machine is how you get paid month after month. Nobody checks Time Machine or pays attention to the "days since" notification. I wrote an AppleScript back in the day to alert you to Time Machine failures through email, but there is an app now that does the same thing: https://tmnotifier.com/

Basically when the backups fail, you schedule a visit and fix the problem. When they start to run out of space, you buy a new bigger drive. Then you backup the Time Machine to some sort of encrypted external location so when the drive (inevitably) gets stolen you can restore the files. The reason they keep paying you is you'll get a call at some point to come to the house at a weird hour and recover a PDF or a school assignment. That one call is how you get permanent standing appointments.

Nobody will ever ask you how it works, so just find the system you like best and do that. I preferred local Time Machine over something like remote backup only because you'll be sitting there until the entire restore is done and nothing beats local. Executives will often fill the "family computer" with secret corporate documents they needed printed off, so be careful with these backups. Encrypt, encrypt, encrypt then encrypt again. Don't bother explaining how the encryption works, just design the system with the assumption that someone will at some point put a CSV with your social security number onto this fun family iMac covered in dinosaur stickers.

Robbed for Broken Parts

A common customer for repairs would be schools, who would work with Apple to open a case for 100 laptops or 20 iMacs at a time. I liked these "mass repair" days, typically because the IT department for Chicago Public Schools would set us up with a nice clean area to work and I could just listen to a podcast and swap hard drives or replace top cases. However this mass repair was in one of Chicago's rougher neighborhoods.

Personal safety was a common topic among the dispatch folks when we would get together for a pizza and a beer. Everyone had bizarre stories but I was the only one not working out of my car. The general sense among the community was it was not an "if" but a "when" until you were robbed. Typically my rule was if I started to get nervous I'd "call back to the office" to check if a part had arrived. Often this would calm people down, reminding them that people knew where I was. Everyone had a story of getting followed back to their car and I had been followed back to the train once or twice.

On this trip though everything went wrong that could go wrong. My phone, the HTC Dream running v1 of Android had decided to effectively "stop phoning". It was still on but decided we were not, in fact, in the middle of a large city. I was instead in a remote forest miles away from a cell tower. I got to the school later than I wanted to be there, showing up at noon. When I tried to push it and come back the next day the staff let me know the janitors knew I would be there and would let me out.

So after replacing a ton of various Mac parts I walked out with boxes of broken parts in my bag and a bunch in an iMac box that someone had given me. My plan was I would head back home, get them checked in and labeled and then drop them off at a FedEx store. When I got out and realized it was dark, I started to accept something bad was likely about to happen to me. Live in a city for any amount of time and you'll start to develop a subconscious odds calculator. The closing line on this wasn't looking great.

Sure enough while waiting for the bus, I was approached by a man who made it clear he wanted the boxes. He didn't have a weapon but started to go on about "kicking the shit" out of me and I figured that was good enough for me. He clearly thought there was an iMac in the box and I didn't want to be here when he realized that wasn't true. I handed over my big pile of broken parts and sprinted to the bus that was just pulling up, begging the driver to keep driving. As a CTA bus driver, he had of course witnessed every possible horror a human can inflict on another human and was entirely unphased by my outburst. "Sit down or get off the bus".

When I got home I opened a chat with as Apple rep who seemed unsure of what to do. I asked if they wanted me to go to the police and the rep said if I wanted to I could, but after "talking to some people on this side" they would just mark the parts as lost in transit and it wouldn't knock my metrics. I thanked them and didn't think much more of the incident until weeks later when someone from Apple mailed me a small Apple notebook.

They never directly addressed the incident (truly the notebook might be unrelated) but I always thought the timing was funny. Get robbed, get a notebook. I still have the notebook.

Questions/comments/concerns? Find me on Mastodon: https://c.im/@matdevdug

Tech and the Twilight of Democracy

December 22, 2023

We live in dangerous times. The average level of peacefulness around the world has dropped for the 9th straight year. The impact of violence on the global economy increased by $1 trillion to a record $17.5 trillion. This is equivalent to 13% of global GDP, approximately $2,200 per person. The graphs seem to be trending in the wrong direction by virtually any metric you can imagine. [Source]

It can be difficult to point to who are the "most powerful countries", but I think by most metrics the following countries would certainly fall into that list. These leaders of the world paint a dire picture for the future of democratic rule. In no particular order:

United States: currently ranked as a Deficient Democracy and a country who is facing the very real possibility of the upcoming presidential election being its last. The current president, despite low crime and good economic numbers, is facing a close race and a hard reelection. His challenger, Donald Trump, has promised the following:

“We pledge to you that we will root out the Communists, Marxists, fascists, and the radical-left thugs that live like vermin within the confines of our country, that lie and steal and cheat on elections,” Donald Trump said this past November, in a campaign speech that was ostensibly honoring Veterans Day. “The real threat is not from the radical right; the real threat is from the radical left … The threat from outside forces is far less sinister, dangerous, and grave than the threat from within. Our threat is from within.”

Given his strong polling there is no reason to think the US will not fall from Deficient Democracy to Hybrid Regime or even further.

China: in the face of increased economic opportunity and growth, there was a hope that China would grow to become more open. If anything, China trends in a very different direction. China is considered to be amongst the least democratic countries in the world.

Over the past 10 years, the Communist Party has moved from collective leadership with the general secretary, considered first among equals on the elite Politburo Standing Committee — a practice established in the “reform and opening” era after the Cultural Revolution — to Xi’s supreme leadership, analysts say.

In 2018, Chinese lawmakers amended the constitution abolishing presidential term limits - paving the way for Xi to rule for life. In a further move to assert his authority, the party pledged to uphold the "Two Establishes,” party-speak for loyalty to him, in a historical resolution passed in 2021.

[Source]

EU: Currently they stand alone as keeping the development of democracy alive. However even here they have begun to pass more extreme anti-immigration legislation as an attempt to appease right-leaning voters and keep the more extreme political parties out of office. France recently passed a hard-line anti-immigrant bill designed specifically to keep Le Pen supporters happy [source] and in Germany the desire for a dictator has continued to grow. Currently, across all age groups, between 5-7% of those surveyed support a dictatorship with a single strong party and leader for Germany. This result is double the long-term average. [source]
India: Having been recently downgraded to a Hybrid Regime, India is currently in the process of an aggressive consolidation of power by the executive with the assistance of both old and new laws.

The Modi government has increasingly employed two kinds of laws to silence its critics—colonial-era sedition laws and the Unlawful Activities Prevention Act (UAPA). Authorities have regularly booked individuals under sedition laws for dissent in the form of posters, social-media posts, slogans, personal communications, and in one case, posting celebratory messages for a Pakistani cricket win. Sedition cases rose by 28 percent between 2010 and 2021. Of the sedition cases filed against citizens for criticizing the government, 96 percent were filed after Modi came to power in 2014. One report estimates that over the course of just one year, ten-thousand tribal activists in a single district were charged with sedition for invoking their land rights.

The Unlawful Activities Prevention Act was amended in 2019 to allow the government to designate individuals as terrorists without a specific link to a terrorist organization. There is no mechanism of judicial redress to challenge this categorization. The law now specifies that it can be used to target individuals committing any act “likely to threaten” or “likely to strike terror in people.” Between 2015 and 2019, there was a 72 percent increase in arrests under the UAPA, with 98 percent of those arrested remaining in jail without bail.

[Source]

Russia: There has been a long-standing debate over whether Russia was a full dictatorship or some hybrid model. The invasion of Ukraine seems to have put all those questions to bed.

On 8 December, Andrey Klishas, the Head of the Federation Council Committee on Constitutional Legislation, made a point in an interview with Vedomosti which was already tacitly understood by Russia-watchers, but still shocking to hear. In answer to a question on why the partial mobilisation decree had not been repealed now the process was completed, he explained to the Kremlin-friendly correspondent there was no need for legislation: ‘There is no greater power than the President’s words.’ So there it is – Russia is by definition a dictatorship. For the unawares reader, Vedomosti was one of Russia’s leading, intelligent and independent newspapers; it fell afoul of the authorities and today is a government propaganda channel.

[Source]

We have no reason at this point to think this trend will slow or reverse itself. It appears that, despite the constant refrain of my childhood that progression towards democracy was an inevitable result of free and open trade, this was another neoliberal fantasy. We live in a world where the most powerful countries are actively trending away from what we would consider to be core democratic values and towards more xenophobic and authoritarian governments.

However I'm not here to lecture, only to lay the foundation. In the face of this data, I thought it could be interesting to discuss some what-ifs, trying to imagine what the future of technology will look like in the face of this strong global anti-democratic trend. What technologies will we all be asked to make and what concessions will be forced upon us?

Disclaimer: I am not an expert on foreign policy, or really anything. Approach these topics not as absolute truths but as discussion points. I will attempt to provide citations and factual basis for my guesses, but as always feel free to disagree. Don't send me threatening messages as sometimes happens when I write things like this. I don't care about you and don't read them.

So let's make some predictions. What kind of world are we heading into? What are the major trends and things to look out for.

The Internet Stops Being Global

The Internet has always been a fractured thing. Far from the dream of perfectly equal traffic being carried across the fastest route between user and service, the real internet is a complicated series of arrangements between the tiers of ISPs and the services that ride those rails. First, what is the internet?

The thing we call the Internet is a big collection of separate, linked systems, each of which is managed as a single domain called an Autonomous Systems (AS). There are over sixty thousand AS numbers (ASNs) assigned to a wide variety of companies, educational, non-profit and government entities. The AS networks that form the primary transport for the Internet are independently controlled by Internet Service Providers (ISPs). The BGP protocol binds these entities together.

When we talk about ISPs, we're talking about 3 tiers. Tier 1 are defined by not paying to have their traffic delivered through similar-sized networks, can deliver to the whole internet, peer on multiple continents and have direct access to a fiber cable in the ocean. Tier 2 provide paid transit through Tier 1 and through peering with other Tier 2. Tier 3 is what hooks up end users and businesses and connects to a Tier 2.

The internet is not as reliable as some people pretend it is. Instead its a very fragile entity well within the governmental scope of the countries where the pieces reside. As governments become less open, their Internet becomes less open. India regularly shuts down the Internet to stop dissent or to control protests or any civil unrest (source) and I would expect that to grow into even more extreme regulations as time goes on.

The "IT Rules 2011" were adopted in April 2011 as a supplement to the 2000 Information Technology Act (ITA). The new rules require Internet companies to remove within 36 hours of being notified by the authorities any content that is deemed objectionable, particularly if its nature is "defamatory," "hateful", "harmful to minors", or "infringes copyright". Cybercafé owners are required to photograph their customers, follow instructions on how their cafés should be set up so that all computer screens are in plain sight, keep copies of client IDs and their browsing histories for one year, and forward this data to the government each month.

China has effectively made its own Internet and Russia is currently in the process of doing the same thing (source). The US has its infamous Section 702.

Section 702 of the Foreign Intelligence Surveillance Act permits the U.S. government to engage in mass, warrantless surveillance of Americans’ international communications, including phone calls, texts, emails, social media messages, and web browsing. The government claims to be pursuing vaguely defined foreign intelligence “targets,” but its targets need not be spies, terrorists, or criminals. They can be virtually any foreigner abroad: journalists, academic researchers, scientists, or businesspeople.

[source]

As time progresses I would expect to see the restrictions on Internet traffic to increase, not decrease. Much is made of the sanctity of encrypted messages between individuals, but in practice this is less critical since even if the message body is itself encrypted, the metadata often isn't. The reality is even if the individual messages between people are encrypted, a graph of relationships is still possible through all the additional information around the message.

Predictions

Expect to see more pressure placed on ISPs and less on tech companies. Google, Apple, Meta and others have shown some willingness to buck governmental pressure. However given the growth in cellular data usage and the shift of consumers from laptops/desktops to mobile, expect to see more restrictions at the mobile cellular network where even simple DNS blocking or tracking is harder to stop.

Widespread surveillance of all Internet traffic will continue to grow and governments will become more willing to turn off or greatly limit Internet access in the face of disruptions or threats. Expect to see even regional governments able to turn off mobile Internet in the face of protests or riots.
Look to the war in Gaza as an example of what this might look like.

Palestine-Israel Conflict Impacts Internet Access

Shutting off the Internet will be a more common tactic to limit the flow of information out and to disrupt attempts to organize or communicate across members of the opposition.

As of me writing this there are 8 ongoing governmental Internet shutdowns and 119 in the last 12 months. I would expect this pace to dramatically increase.[source]

The end result of all of these disruptions will be an increasingly siloed Internet specific to your country. It'll be harder for normal people on the ground in a crisis or governmental crackdown to tell people what is happening and, with the next technology, easier for those forces to make telling what is happening on the ground next to impossible.

LLMs Make Telling the Truth Impossible

Technology was supposed to usher in an age of less-centralized truth. No longer would we be reliant on the journalists of the past. Instead we could get our information directly from the people on the ground without filtering or editorializing. The goal was a more fair version of news that was more honest and less manipulated.

The actual product is far from that. Social media has become a powerful tool for propaganda, with algorithms designed to keep users engaged with content they find relevant allowing normal people access to conspiracy theories and propaganda with no filters or ethics. Russia and China, following a new version of their old Cold War playbooks, have excelled at this new world of disinformation, making it difficult to tell what is real and what is fake.

In 20 years we'll look back at this period as being the almost innocent beginning of this trend. With realistic deepfakes, it will soon be impossible to tell what a leader did or didn't say. Since China, Russia and increasingly the US have no concept of "ethical journalism" and either answer to government leaders or a desire for more ratings, it will soon be possible to create entirely false news streams that cater to whatever viewpoint your audience finds appealing at that time.

Predictions

Future conflicts will find social media immediately swamped with LLM backed accounts attempting to create the perception that even a deeply disliked action (a Chinese blockage or invasion of Taiwan) is more nuanced. World leaders will find it difficult to tell what voters actually think and it will be hard to form consensus across political affiliations even on seemingly straight-forward issues.
Politicians and their supporters will use the possibility of deepfakes to attempt to explain away any video or image of them engaging in nefarious actions. Even if deepfakes aren't widely deployed, the possibility of them will transition us into a post-truth reality. Even if you watch a video of the president giving a speech advocating something truly terrible, supporters will be able to dismiss it without consideration.
Technology companies, facing a closed Internet and increasingly hostile financial landscape, will inevitably provide this technology as a service. Expect to see a series of cut-out companies but the underlying technology will be the same.
We won't ever find reliable LLM detection technology and there won't be a way to mass filter out this content from social media.
Even if you are careful about your consumption of media, it will be very hard to tell truth from fabrication for savvy consumers of information. Even if you are not swayed by the LLM generated content, you will not be able to keep up with the sheer output with conventional fact checking.

Global Warming (and War) Kills the Gadget

We know that Global Warming is going to have a devastating impact on shipping routes around the world. We're already seeing more storms impacting ports that are absolutely critical to the digital logistics chain.

With the COP28 conference a complete failure and none of the countries previously mentioned interested in addressing Global Warming, expect to see this trend continue unchecked. Without democratic pressures, we would expect to see countries like India, China, the US and others continue to take the most profitable course of action regardless of long-term cost.

The net result will be a widespread disruption in the complicated supply chain that provides the hardware necessary to continue to grow the digital economy. It will be more difficult for datacenters, mobile network providers and individual consumers to get replacement parts for hardware or to upgrade that hardware. Since much of the manufacturing expertise required to make these parts is almost exclusively contained within the impacted zones, setting up alternative factories will be difficult or impossible.

What’s likely incentivizing semiconductor makers more than government dollars are geopolitical changes. Taiwan is potentially a major choke point in any electronics supply chain. Any electronic part, whether for a smart phone, a television, a home computer, or a data center likely includes critical components that came through Taiwan.

“If you look across the Taiwan Strait, you’ve got this 900-pound gorilla called China that is saying 'Taiwan belongs to us, and if you won’t give it to us, we’ll take it at some point,'” Johnson said. “What would happen to the semiconductor industry if TSMCs fabs were destroyed? Disaster.”

Before Chinese President Xi Jinping became president in 2012, Western nations had a relatively healthy trade relationship with China. Since that time, it has become more contentious.

“Before Xi came in power, we had this great trade relationship. And there was the belief that if you treated China like a grown-up partner, they’d start acting like one; that turned out to be a very bad assumption,” Johnson said. “So yeah, the idea of bringing the entire supply chain back to the US? Probably not practical.

"But you want to figure out how to diversify away from China as much as you can. I don’t consider China a reliable business partner anymore.”

[source]

Predictions

As relations with China continue to degrade, expect to see tech companies struggle to find replacements for difficult to manufacture parts.
Even among countries where relations are good, the decision to ignore Global Warming means we'll see increased severe disruption of maritime shipping with destruction or flooding of vulnerable ports causing massive parts shortages.
It'll be harder to replace devices and harder to fix the ones you already have
Expect to see a lot of "right to repair" bills as governments, unable to solve the logistical struggles, will push the issue down to being the responsibility of tech companies who will need to change their designs and manufacturing locations.
Also expect to see the same model of something in the field for a lot longer. A cellphone or random IoT device will go from being easy to replace overnight to possibly involving a multi-week or even several month delay. Consumers will come to expect that they will be able to keep technology operational for longer.

Tech Companies will be Pressured to Comply

We currently live in a strange middle period where companies can still (mostly) say no to governments. While there are consequences, these are mostly financial or limitations on where the company can sell their products. However that period appears to be coming to an end. Governments around the world are looking at Big Tech and looking to apply regulations to those businesses. [source]

More governments arrested users for nonviolent political, social, or religious speech than ever before. Officials suspended internet access in at least 20 countries, and 21 states blocked access to social media platforms. Authorities in at least 45 countries are suspected of obtaining sophisticated spyware or data-extraction technology from private vendors.

Expect to see governments step up their expectations of what Tech is willing to do for them. Being told it is "impossible" to get information out of an encrypted exchange will get less and less traction.
Platforms like YouTube will be under immense pressure to either curtail fake video or promote face video promoted by the government in question. Bans or slowdowns will be common-place.
Getting users to provide more government ID under the guise of protecting underage users so that social media accounts can result in more effective criminal prosecution will become common.

Conclusion

Technology is not immune to changes in political structure. As we trend away from free and open communication across borders and towards more closed borders and war, we should expect to see technology reflect those changes. Hopefully this provides you with some interesting things to consider.

Whether these trends are reversible or not is not for me to say. I have no idea how to make a functional democracy, so fixing it is beyond my skills. I do hope I'm wrong, but I feel my predictions fit within the data I was able to find.

As always I'm open to feedback. The best place to find me is on Mastodon: https://c.im/@matdevdug

Why Kubernetes needs an LTS

December 04, 2023

There is no denying that containers have taken over the mindset of most modern teams. With containers, comes the need to have orchestration to run those containers and currently there is no real alternative to Kubernetes. Love it or hate it, it has become the standard platform we have largely adopted as an industry. If you exceed the size of docker-compose, k8s is the next step in that journey.

Despite the complexity and some of the hiccups around deploying, most organizations that use k8s that I've worked with seem to have positive feelings about it. It is reliable and the depth and width of the community support means you are never the first to encounter a problem. However Kubernetes is not a slow-moving target by infrastructure standards.

Kubernetes follows an N-2 support policy (meaning that the 3 most recent minor versions receive security and bug fixes) along with a 15-week release cycle. This results in a release being supported for 14 months (12 months of support and 2 months of upgrade period). If we compare that to Debian, the OS project a lot of organizations base their support cycles on, we can see the immediate difference.

Red Hat, whose entire existence is based on organizations not being able to upgrade often, shows you at what cadence some orgs can roll out large changes.

Now if Kubernetes adopted this cycle across OSS and cloud providers, I would say "there is solid evidence that it can be done and these clusters can be kept up to date". However cloud providers don't hold their customers to these extremely tight time windows. GCP, who has access to many of the Kubernetes maintainers and works extremely closely with the project, doesn't hold customers to anywhere near these timelines.

Neither does AWS or Azure. The reality is that nobody expects companies to keep pace with that cadence of releases because the tooling to do so doesn't really exist. Validating that a cluster can be upgraded and that it is safe to do so requires the use of third-party tooling or to have a pretty good understanding of what APIs are getting deprecated when. Add in time for validating in staging environments along with the sheer time involved in babysitting a Kubernetes cluster upgrade and a clear problem emerges.

What does upgrading a k8s cluster even look like?

For those unaware of what a manual upgrade looks like, this is the rough checklist.

Check all third-party extensions such as network and storage plugins
Update etcd (all instances)
Update kube-apiserver (all control plane hosts)
Update kube-controller-manager
Update kube-scheduler
Update the cloud controller manager, if you use one
Update kubectl
Drain every node and either replace the node or upgrade the node and then readd and monitor to ensure it continues to work
Run kubectl convert as required on manifests

None of this is rocket science and all of it can be automated, but it still requires someone to effectively be super on top of these releases. Most importantly it is not substantially easier than making a new cluster. If upgrading is, at best, slightly easier than making a new cluster and often quite a bit harder, teams can get stuck unsure what is the correct course of action. However given the aggressive pace of releases, spinning up a new cluster for every new version and migrating services over to it can be really logistically challenging.

Consider that you don't want to be on the .0 of a k8s release, typically .2. You lose a fair amount of your 14 month window waiting for that criteria. Then you spin up the new cluster and start migrating services over to it. For most teams this involves a fair amount of duplication and wasted resources, since you will likely have double the number of nodes running for at least some period in there. CI/CD pipelines need to get modified, docs need to get changed, DNS entries have to get swapped.

None of this is impossible stuff, or even terribly difficult stuff, but it is critical and even with automation the risk of one of these steps failing silently is high risk enough that few folks I know would fire and forget. Instead clusters seem to be in a state of constant falling behind unless the teams are empowered to make keeping up with upgrades a key value they bring to the org.

My experience with this has been extremely bad, often joining teams where a cluster has been left to languish for too long and now we're running into concerns over whether it can be safely upgraded at all. Typically my first three months running an old cluster is telling leadership I need to blow our budget out a bit to spin up a new cluster and cut over to it namespace by namespace. It's not the most gentle onboarding process.

Proposed LTS

I'm not suggesting that the k8s maintainers attempt to keep versions around forever. Their pace of innovation and adding new features is a key reason the platform has thrived. What I'm suggesting is a dead-end LTS with no upgrade path out of it. GKE allowed customers to be on 1.24 for 584 days and 1.26 for 572 days. Azure has a more generous LTS date of 2 years from the GA date and EKS from AWS is sitting at around 800 days that a version is supported from launch to end of LTS.

These are more in line with the pace of upgrades that organizations can safely plan for. I would propose an LTS release with a 24 months of support from GA and an understanding that the Kubernetes team can't offer an upgrade to the next LTS. The proposed workflow for operations teams would be clusters that live for 24 months and then organizations need to migrate off of them and create a new cluster.

This workflow makes sense for a lot of reasons. First creating fresh new nodes at regular intervals is best practice, allowing organizations to upgrade the underlying linux OS and hypervisor upgrades. While you should obviously be upgrading more often than every 2 years, this would be a good check-in point. It also means teams take a look at the entire stack, starting with a fresh ETCD, new versions of Ingress controllers, all the critical parts that organizations might be loathe to poke unless absolutely necessary.

I also suspect that the community would come in and offer a ton of guidance on how to upgrade from LTS to LTS, since this is a good injection point for either a commercial product or an OSS tool to assist with the process. But this wouldn't bind the maintainers to such a project, which I think is critical both for pace of innovation and just complexity. K8s is a complicated collection of software with a lot of moving pieces and testing it as-is already reaches a scale most people won't need to think about in their entire careers. I don't think its fair to put this on that same group of maintainers.

LTS WG

The k8s team is reviving the LTS workgroup, which was disbanded previously. I'm cautiously optimistic that this group will have more success and I hope that they can do something to make a happier middle ground between hosted platform and OSS stack. I haven't seen much from that group yet (the mailing list is empty: https://groups.google.com/a/kubernetes.io/g/wg-lts) and the Slack seems pretty dead as well. However I'll attempt to follow along with them as they discuss the suggestion and update if there is any movement.

I really hope the team seriously considers something like this. It would be a massive benefit to operators of k8s around the world to not have to be in a state of constantly upgrading existing clusters. It would simplify the third-party ecosystem as well, allowing for easier validation against a known-stable target that will be around for a little while. It also encourages better workflows from cluster operators, pushing them towards the correct answer of getting in the habit of making new clusters at regular intervals vs keeping clusters around forever.

AI is Already Killing Books

November 24, 2023

I love reading. It is the thing on this earth that brings me the most joy. I attribute no small part of who I am and how I think to the authors who I have encountered in my life. The speed by which LLMs are destroying this ecosystem is a tragedy that we're not going to understand for a generation. We keep talking about it as an optimization, like writing is a factory and books are the products that fly off the line. I think it's a tragedy that will cause people to give up on the idea of writing as a career, closing off a vital avenue for human expression and communication.

Books, like everything else, has evolved in the face of the internet. For a long time publishers were the ultimate gatekeepers and authors tried to eek out an existence by submitting to anyone who would read their stuff. Most books were commercial failures, but some became massive hits. Then eBooks came out and suddenly authors could bypass the publishers and editors of the world to get directly to readers. This was promised to unleash a wave of quality the world had never seen before.

In practice, to put it kindly, eBooks are a mixed success. Some authors benefited greatly from the situation, able to establish strong followings and keep a much higher percentage of their revenue than they would with a conventional publisher. Most released a book, nobody ever read it and that was it. However there was a medium success, where authors could find a niche and generate a pretty reliable stream of income. Not giant numbers, but even 100 copies sold or borrowed under Kindle Unlimited a month spread out across enough titles can let you survive.

AI-written text is quickly filling these niches, since scammers are able to identity lucrative subsections where it might not be worth a year of a persons life to try and write a book this audience will like, but having a machine generate a book and throw it up there is incredibly cheap. I'm seeing them more and more, these free on Kindle Unlimited books with incredibly specific topics that seem tailored towards getting recommended to users in sub-genres.

There is no feeling of betrayal like thinking you are about to read something that another person slaved over, only to discover you've been tricked. They had an idea, maybe even a good idea and instead of putting in the work and actually sitting there crafting something worth my precious hours on this Earth to read, they wasted my time with LLM dribble. Those too formal, politically neutral, long-winded paragraphs stare back at me as the ultimate indictment of how little of a shit the person who "wrote this" cared about my experience reading it. It's like getting served a microwave dinner at a sit down restaurant.

Maybe you don't believe me, or see the problem. Let me at least try to explain why this matters. Why the relationship between author and reader is important and requires mutual respect. Finally why this destruction is going to matter in the decades to come.

TLDR

Since I know a lot of people aren't gonna read the whole thing (which is fine), let me just bulletpoint my responses to anticipated objections addressed later.

LLMs will let people who couldn't write books before do it. That isn't a perk. Part of the reason people invest so many hours into reading is because we know the author invested far more in writing. The sea of unread, maybe great books, was already huge. This is expanding the problem and breaking the relationship of trust between author and reader.
It's not different from spellcheck or grammar check. It is though and you know that. Those tools made complex lookups easier against a large collection of rules, this is generating whole blobs of text. Don't be obtuse.
They let me get my words down with less work. There is a key thing about any creative area but especially in writing that people forget. Good writing kills its darlings. If you don't care enough about a section to write it, then I don't care enough to read it. Save us both time and just cut it.
Your blog is very verbose. I never said I was a good writer.
The market will fix the problem. The book market relies on a vast army of unpaid volunteers to effectively sacrifice their time and wade through a sea of trash to find the gems. Throwing more books at them just means more gems get lost. Like any volunteer effort, the pool of people doesn't grow at the same rate as the problem.
How big of a problem is it? Considering how often I'm seeing them, it feels big, but it is hard to calculate a number. It isn't just me link

Why Does It Matter?

Allow me to veer into my personal background to provide context on why I care. I grew up in small towns across rural Ohio, places where the people who lived there either had no choice but to stay or chose to stay because of the simple lifestyle and absolute consensus on American Christian values. We said the Pledge of Allegiance aggressively, we all went to church on Sunday, gay people didn't exist and the only non-white people we saw were the migrant farm workers who we all pretended didn't exist living in the trailers around farms surrounding the town. As a kid it was fine, children are neither celebrated or hated in this culture, instead we were mostly left alone.

There is a violent edge to these places that people don't see right away. You aren't encouraged to ask a lot of questions about the world around you. We were constantly flooded with religious messaging, at school, home, church, church camp, weekly classes at night or bible studies, movies and television that was specifically encouraged because they had a religious element. Anything outside of this realm was met with a chilly reaction from most adults, if not outright threats of violence. My parents didn't hit me, but I was very much in the minority of my group. More than once we turned the sound up on a videogame or tv to drown out the sobs of a child being struck with a hand or belt while we were at a friends house.

Small town opinion turns on a dime and around 4th grade it turned on me. Everyone knows your status because there aren't a lot of people so I couldn't just go hang out in a new neighborhood. Suddenly I had a lot of alone time, which I filled with reading. These books didn't just fill time, they made me invisible. I had something to do during lunch, recess, whenever. Soon I had consumed everything within the children's section of the library I was interested in reading and graduated to the adult section.

Adult Section

Bryan Main Library | Williams County Public Library — Not a terribly impressive building that I spent a lot of time in.

I was fortunate enough not to grow up today, where this loneliness and anger might have found an online community. They would reinforce my feelings, confirming that I was in the right and everyone else was in the wrong. If they rejected me, I would have wandered until I found another group. The power of the internet is the ability to self-select for your level of depravity.

Instead, wandering the poorly lit stacks of the only library in town, I came across a book that child me couldn't walk pass. A heavy tomb that seemed to contain exactly the sort of cursed knowledge that had been kept from me my entire life. The Book of the Dead.

The Ancient Egyptian Book Of The Dead by University of Texas Press & Faulkner

The version I read was an old hardcover, tucked away in a corner with a title that was too good to pass up. A book about other religions, old religions? From a Muslim country? I knew I couldn't take it home. If anyone saw me with this it would raise a lot of questions I couldn't answer. Instead I struggled through it sitting at the long wooden tables after school and on the weekends, trying to make sense of what was happening.

The text (for those that are curious: https://www.ucl.ac.uk/museums-static/digitalegypt/literature/religious/bdbynumber.html) is dense and hard to read. It took me forever to get through it, missing a lot of the meaning. I would spend days sitting there writing in my little composition notebook, looking up words and trying to parse hard to read sentences. The Book of the Dead is about two hundred "spells" or maybe "chants" would be a better way to describe it that basically take someone through the process of death. From preservation to the afterlife and finally to judgement, the soul was escorted through the process and each part was touched upon.

The part that blew my mind was the Hymn to Osiris

"(1) Hail to thee, Osiris, lord of eternity, king of the gods, thou who hast many names, thou disposer of created things, thou who hast hidden forms in the temples, thou sacred one, thou KA who dwellest in Tattu, thou mighty (2) one in Sekhem, thou lord to whom invocations are made in Anti, thou who art over the offerings in Annu, thou lord who makest inquisition in two-fold right and truth, thou hidden soul, the lord of Qerert, thou who disposest affairs in the city of the White Wall, thou soul of Ra, thou very body of Ra who restest in (3) Suten-henen, thou to whom adorations are made in the region of Nart, thou who makest the soul to rise, thou lord of the Great House in Khemennu, thou mighty of terror in Shas-hetep, thou lord of eternity, thou chief of Abtu, thou who sittest upon thy throne in Ta-tchesert, thou whose name is established in the mouths of (4) men, thou unformed matter of the world, thou god Tum, thou who providest with food the ka's who are with the company of the gods, thou perfect khu among khu's, thou provider of the waters of Nu, thou giver of the wind, thou producer of the wind of the evening from thy nostrils for the satisfaction of thy heart. Thou makest (5) plants to grow at thy desire, thou givest birth to . . . . . . . ; to thee are obedient the stars in the heights, and thou openest the mighty gates. Thou art the lord to whom hymns of praise are sung in the southern heaven, and unto thee are adorations paid in the northern heaven. The never setting stars (6) are before thy face, and they are thy thrones, even as also are those that never rest. An offering cometh to thee by the command of Seb. The company of the gods adoreth thee, the stars of the tuat bow to the earth in adoration before thee, [all] domains pay homage to thee, and the ends of the earth offer entreaty and supplication. When those who are among the holy ones (7) see thee they tremble at thee, and the whole world giveth praise unto thee when it meeteth thy majesty. Thou art a glorious sahu among the sahu's, upon thee hath dignity been conferred, thy dominion is eternal, O thou beautiful Form of the company of the gods; thou gracious one who art beloved by him that (8) seeth thee. Thou settest thy fear in all the world, and through love for thee all proclaim thy name before that of all other gods. Unto thee are offerings made by all mankind, O thou lord to whom commemorations are made, both in heaven and in earth. Many are the shouts of joy that rise to thee at the Uak[*] festival, and cries of delight ascend to thee from the (9) whole world with one voice. Thou art the chief and prince of thy brethren, thou art the prince of the company of the gods, thou stablishest right and truth everywhere, thou placest thy son upon thy throne, thou art the object of praise of thy father Seb, and of the love of thy mother Nut. Thou art exceeding mighty, thou overthrowest those who oppose thee, thou art mighty of hand, and thou slaughterest thine (10) enemy. Thou settest thy fear in thy foe, thou removest his boundaries, thy heart is fixed, and thy feet are watchful. Thou art the heir of Seb and the sovereign of all the earth;

To a child raised in a heavily Christian environment, this isn't just close to biblical writing, it's the same. The whole world praises and worships him with a father and mother and woe to his foes who challenge him? I had assumed all of this was unique to Christianity. I knew there had been other religions but I didn't know they were saying the exact same things.

As important as the text is the surrounding context the academic sources put the text in. An expert walks me through how translations work, the source of the material, how our understanding has changed over time. As a kid drawn in by a cool title, I'm learning a lot about how to intake information. I'm learning real history has citations, explanations, debates, ambiguity. Real academic writing has a style, which when I stumble across the metaphysical Egyptian magic nonsense makes it easy to spot.

The reason this book mattered is the expert human commentary. The words themselves with some basic context wouldn't have meant anything. It's by understand the amount of work that went into this translation, what it means, what it also could mean, that the importance sets in. That's the human element which creates all the value. You aren't reading old words, you are being taken on a guided tour by someone who has lived with this text for a long time and knows it up and down.

I quickly expanded, growing from this historical text to a wide range of topics. I quickly find there is someone there to meet me at every stage of life. When I'm lonely or angry as a teenager I find those authors and stories that speak to that, put those feelings into a context and bigger picture. This isn't a new experience, people have felt this way going back to the very beginning. So much of the value isn't just the words, it's the sense of a relationship between me and the author. When you encounter this in fiction or in historical text, you come to understand as overwhelming as it feels in that second it is part of being a human being. This person experienced it and lived, you will too.

You also get to experience emotions that you may never experience. A Passage to India was a book I enjoyed a lot as a teen, even though it is about the story of two British women touring around India and clashing with the colonial realities of British history. I know nothing about British culture, the 1920s, all of this is as alien to me as anything else. It's fiction but with so much historical backing you still feel like you are seeing something different, something new.

That's a powerful part of why books work. Even if you the author are just imagining those scenarios, real life bleeds in. You can make text that reads like A Farewell to Arms, but you would miss the point if you did. It's more interesting and more powerful because its Hemingway basically recanting his wartime experience through his characters (obviously pumping up the manliness as he goes). It is when writers draw on their personal lives that it hits hardest.

Instead of finding a community that reinforced how alone and sad I was in that moment, I found evidence it didn't matter. People had survived far worse and ultimately turned out to be fine. You can't read about the complex relationship of fear and respect Egyptians had with the Nile, where too little water was dead and too much was also death, then endlessly fixate on your own problems. Humanity is capable of adaptation and the promise is, so are you.

Why AI Threatens Books

As readers get older and they spend a few decades going through books, they discover authors they like and more importantly styles they like. However you also like to see some experimentation in the craft, maybe with some rough edges. To me it's like people who listen to concert recordings instead of the studio album. Maybe it's a little rougher but there is also genius there from time to time. eBooks quickly became where you found the indie gems that would later get snapped up by publishers.

The key difference between physical and eBooks is bookstores and libraries are curated. They'll stock the shelves with things they like and things that will sell. Indie bookstores tend to veer a little more towards things they like, but in general it's not hard to tell the difference between the stack of books the staff loves and the ones they think the general population will buy. However each one had to get read by a person. That is the key difference between music or film and books.

A music reviewer needs to invest between 30-60 minutes to listen to an album. A movie reviewer somewhere between 1-3 hours. An owner of a bookstore in Chicago broke down his experience pretty well:

Average person: 4 books a year if they read at all

Readers (people who consider it a hobby): 30-50 books a year

Super readers: 80 books

80 books is not a lot of books. Adult novels clock in at about 90,000 words, 200-300 words per minute reading speed, 7-8 hours to get through a book. To combat this discrepancy websites like Goodreads were popularized because frankly you cannot invest 8 hours of your life in shitty eBooks very often. At the very least your investment should hopefully scare off others considering doing the same (or at least they can make an informed choice).

The ebook market also started to not be somewhere you wanted to wade in randomly due to the spike in New Age nonsense writing and openly racist or sexist titles. This book below was found by searching the term "war" and going to the second page. As a kid I would have had to send a money order to the KKK to get my hands on a book like this, but now it's in my virtual bookstore next to everything else. Since Amazon, despite their wealth and power, has no interest in policing their content, you are forced to solve the problem through community effort.

https://www.amazon.com/s?k=WAR&i=stripbooks-intl-ship&page=2&crid=2M4G32CTJBZ6V&qid=1700229573&sprefix=war%2Cstripbooks-intl-ship%2C248&ref=sr_pg_2https://www.amazon.com/s?k=WAR&i=stripbooks-intl-ship&page=2&crid=2M4G32CTJBZ6V&qid=1700229573&sprefix=war%2Cstripbooks-intl-ship%2C248&ref=sr_pg_2

The reason why AI books are so devastating to this ecosystem should be obvious, but let's lay it out. It breaks the emotional connection between reader and writer and creates a sense of paranoia. Is this real or fake? In order to discover it, someone else needs to invest a full work day into reading it to figure out. Then you need to join a community with enough trusted reviewers willing to donate their time for free to tell you whether the book is good or bad. Finally you need to hope that you are a member of the right book reading community to discover the review.

So if we were barely surviving the flood of eBooks and missing tons and tons of good books, the last thing we needed was someone to crank up the volume of books shooting out into the marketplace. The chances that one of the sacred reviewers even finds a new authors book decreases, so the community won't find it and the author will see that they have no audience and will either stop writing or will ensure they don't write another book like the first book. The feedback loop, which was already breaking under the load, completely collapses.

Now that AI books exist, the probability that I will ever blind purchase another eBook on Amazon from an unknown author drops to zero. Now more than ever I entirely rely on the reviews of others. Before I might have wandered through the virtual stacks, but no more. I'm not alone in this assessment, friends and family have reported the same feeling, even if they haven't themselves been burned by an AI book they knew about.

AI books solve a problem that didn't exist, which is this presumption by tech people that what we needed was more people writing books. Instead, like so many technical solutions to problems that the architects never took any time to understand, the result doesn't help smaller players. It places all the power back into publishers and the small cadre of super reviewers since they're willing to invest the time to check for at least some low benchmark of quality.

The sad part is this is unstoppable. eBooks are too easy to make with LLMs and no reliable detection systems exist to screen them before they're uploaded to the market. Amazon has no interest in setting realistic limits to how many books users can upload to the Kindle Store, still letting people upload a laughable three books a day. Google Play Store seems to have no limit, same with Apple Books. It's depressing that another market will become so crowded with trash, but nobody in a position to change it seems to care.

The Future

So where does that leave us? Well kind of back to where we started. If you are excellent at marketing and can get the name of your eBook out there, then people can go directly to it. But similar to how the App Store and Play Store are ruined for new app discoverability, it's a lopsided system which favors existing players and stacks the deck against anyone new. Publishers will still be able to get the authors to do the free market research through the eBook market and then snap up proven winners.

Since readers pay the price for this system by investing money and time into fake books, it both increases the amount of terrible out there and further incentives the push down in eBook price. If there are 600,000 "free" eBooks on Kindle Unlimited and you are trying to complete with a book that took a fraction of the time to produce, you are going to struggle to justify more than the $1.99-$2.99 price point. So not only are you selling a year (or years) of your life for the cost of a large soda, the probability of someone organically finding your book went from "bad" to "grain of sand in the ocean".

Even if there are laws, there is no chance they'll be able to make a meaningful difference unless the laws mandate that AI produced text is watermarked in some distinct way that everyone will immediately remove. So what becomes a "hard but possible" dream turns into a "attempting to become a professional athlete" level of statistical improbability. The end result will be fewer people trying so we get less good stories and instead just endlessly retread the writing of the past.

Help Everyone Do Better Security

October 27, 2023

One interesting thing about the contrast between infrastructure and security is the expectation of open-source software. When a common problem arises we all experience, a company will launch a product to solve this problem. In infrastructure, typically the core tool is open-source and free to use, with some value-add services or hosting put behind licensing and paid support contracts. On the security side, the expectation seems to be that the base technology will be open-source but any refinement is not. If I find a great tool to manage SSH certificates, I have to pay for it and I can't see how it works. If I rely on a company to handle my login, I can ask for their security audits (sometimes) but the actual nuts and bolts of "how they solved this problem" is obscured from me.

Instead of "building on the shoulders of giants", it's more like "You've never made a car before. So you make your first car, load it full of passengers, send it down the road until it hits a pothole and detonates." Then someone will wander by and explain how what you did was wrong. People working on their first car to send down the road become scared because they have another example of how to make the car incorrectly, but are not that much closer to a correct one given the nearly endless complexity. They may have many examples of "car" but they don't know if this blueprint is a good car or a bad car (or an old car that was good and is now bad).

In order to be good at security, one has to see good security first. I can understand in the abstract how SSH certificates should work, but to implement it I would have to go through the work of someone with a deep understanding of the problem to grasp the specifics. I may understand in the abstract how OAuth works, but the low level "how do I get this value/store it correctly/validate it correctly" is different. You can tell me until you are blue in the face how to do logins wrong, but I have very few criteria by which I can tell if I am doing it right.

To be clear there is no shortage of PDFs and checklists telling me how my security should look at an abstract level. Good developers will look at those checklists, look at their code, squint and say "yeah I think that makes sense". They don't necessarily have the mindset of "how do I think like someone attempting to break this code", in part because they may have no idea how the code works. Their code presents the user a screen, they receive a token, that token is used for other things and they got an email address in the process. The massive number of moving parts they just used is obscured from them, code they'll never see.

Just to do session cookies correctly, you need to know about and check the following things:

Is the expiration good and are you checking it on the server?
Have you checked that you never send the Cookie header back to the client and break the security model? Can you write a test for this? How time consuming will that test be?
Have you set the Secure flag? Did you set the SameSite flag? Can you use the HttpOnly flag? Did you set it?
Did you scope the domain and path?
Did you write checks to ensure you aren't logging or storing the cookies wrong?

That is so many places to get just one thing wrong.

We have to come up with a better way of throwing flares up in peoples way. More aggressive deprecation, more frequent spec bumps, some way of communicating to people "the way you have done things is legacy and you should look at something else". On the other side we need a way to say "this is a good way to do it" and "that is a bad way to do it" with code I can see. Pen-testing, scanners, these are all fine, but without some concept of "blessed good examples" it can feel like patching a ship in the dark. I closed that hole, but I don't know how many more there are until a tool or attacker finds it.

I'm gonna go through four examples of critical load-bearing security-related tooling or technology that is set up wrong by default or very difficult to do correctly. This is stuff everyone gets nervous when they touch because it doesn't help you set it up right. If we want people to do this stuff right, the spec needs to be more opinionated about right and wrong and we need to show people what right looks like on a code level.

SSH Keys

This entire field of modern programming is build on the back of SSH keys. Starting in 1995 and continuing now with OpenSSH, the protocol uses an asymmetric encryption process with the Diffie-Hellman (DH) key exchange algorithm to form a shared secret key for the SSH connection. SFTP, deploying code from CI/CD systems, accessing servers, using git, all of this happens largely on the back of SSH keys. Now you might be thinking "wait, SSH keys are great".

At a small scale SSH is easy and effortless. ssh-keygen -t rsa, select where to store it and if you want a passphrase. ssh-copy-id username@remoteserverip to move it to the remote box, assuming you set up the remote box with cloud-init or ansible or whatever. At the end of every ssh tutorial there is a paragraph that reads something like the following: "please ensure you rotate, audit and check all SSH keys for permissions". This is where things get impossible.

SSH keys don't help administrators do the right thing. Here's all the things I don't know about the SSH key I would need to know to do it correctly:

When was the key made? Is this a new SSH key or are they reusing a personal one or one from another job? I have no idea.
Was this key secured with a passhrase? Again like such a basic thing, can I ensure all the keys on my server were set up with a passphrase? Like just include some flag on the public key that says "yeah the private key has a passphrase". I understand you could fake it but the massive gain in security for everyone outweighs the possibility that someone manipulates a public key to say "this has a passphrase".
Expiration. I need a value that I can statelessly query to say "is this public key expired or not" and also to check when enrolling public keys "does this key live too long".

This isn't just a "what-if" conversation. I've seen this and I bet you have too, or would if you looked at your servers.

Many keys on servers are unused and represent access that was never properly terminated or shouldn't have been granted. I find across most jobs it's like 10% of the keys that ever get used.
Nobody knows who has the corresponding private keys. We understand the user who made them, but we don't know where they are now.
Alright so we use certificates! Well except they're special to OpenSSH, make auditing SSH key based access impossible since you don't know what keys the server will accept by looking at it and all the granting and revoking tooling is on you to build.

OpenSSH Certificates solves almost all these problems. You get expiration, limiting commands, limit IP address etc. It's a step forward but we're not using them in small and medium orgs due to the complexity of setup and we need to port some of these security concerns down the chain. It's exactly what I was talking about in the beginning. The default experience is terrible because backwards compatibility and for those 1% who know of the existence of SSH Certificates and can operationally support the creation of this mission-critical tooling, they reap the benefits.

So sure if I set up all of the infrastructure to do all the pieces, I can enforce ssh key rotation. I'll check the public key into object storage, sync it with all my servers, check the date the key was entered and remove it after a certain date. But seriously? We can't make a new version of the SSH key with some metadata? The entire internet operates off SSH keys and they're a half-done idea, fixed through the addition of certificates nobody uses cause writing the tooling to handle the user certificate process is a major project where if you break it, you can't get into the box.

This is a crazy state of affairs. We know SSH keys live in infrastructure forever, we know they're used for way too long all over the place and we know the only way to enforce rotation patterns is through the use of expiration. We also know that passphrases are absolutely essential for the use of keys. Effectively to use SSH keys you need to stick a PAM in there to enforce 2FA like libpam-google-authenticator. BTW, talking about "critical infrastructure not getting a ton of time", this is the repo of the package every tutorial recommends. Maybe nothing substantial has happened in 3 years but feels a little unlikely.

Mobile Device Management/Device Scanning/Network MITM Scanning

Nothing screams "security theater" to me like the absolutely excessive MDM that has come to plague major companies. I have had the "joy" of working for 3 large companies that went all-in on this stuff and each time have gotten the pleasure of rip your hair out levels of frustration. I'm not an admin on my laptop, so now someone who has no idea what my job is or what I need to do it gets to decide what software I get to install. All my network traffic gets scanned, so forget privacy on the device. At random intervals my laptop becomes unusable because every file on the device needs to "get scanned" for something.

Now in theory the way this stuff is supposed to work is a back and forth between security, IT and the users. In practice it's a one-way street that once the stupid shit gets bought and turned on, it never gets turned off. All of the organizational incentives are there to keep piling this worthless crap on previously functional machines and then almost dare the employee to get any actual work done. It just doesn't make any sense to take this heavy of a hand with this stuff.

What about stuff exploiting employee devices?

I mean if you have a well-researched paper which shows that this stuff actually makes a difference, I'd love to see it. Mostly it seems from my reading like vendors repeating sales talking points to IT departments until they accept it as gospel truth, mixed with various audits requiring the tooling be on. Also we know from recent security exploits that social engineering against IT Helpdesk is a new strategy that is paying off, so assuming your IT pros will catch the problems that normal users won't is clearly a flawed strategy.

The current design is so user-hostile and so aggressively invasive that there is just no way to think of it other than "my employer thinks I'm an idiot". So often in these companies you are told the strategies to work around stuff. I once worked with a team where everybody used a decommissioned desktop tucked away in a closet connected to an Ethernet port with normal internet access to do actual work. They were SSHing into it from their locked-down work computers because they didn't have to open a ticket every time they needed to do everything and hid the desktops existence from IT.

I'm not blaming the people turning it on

The incentives here are all wrong. There's no reward in security for not turning on the annoying or invasive feature so rank and file people are happy. On the off chance that is the vector by which you are attacked, you will be held responsible for that decision. So why not turn it all on? I totally understand it, especially when we all know every company has a VIP list of people for whom this shit isn't turned on, so the people who make the decisions about this aren't actually bearing the cost of it being on.

"Don't use your work laptop for personal stuff": hey before you hit me up with this gem, save it. I spend too many hours of my life at work to never have the two overlap. I need to write emails, look up stuff, schedule appointments, so just take this horrible know-it-all attitude and throw it away. People use work devices for personal stuff and telling them not to is a waste of oxygen.

JWTs

You have users and you have services. The users need to access the things they are allowed to access, the services need to be able to talk to each other and share information in a way where you know the information wasn't tampered with. It's JSON, but special limited edition JSON. You have a header, which says what it is (a JWT) and the signing algorithm being used.

{
  "alg": "HS256",
  "typ": "JWT"
}

You have a payload with claims. There are predefined (still optional) claims and then public and private claims. So here are some common ones:

"iss" (Issuer) Claim: identifies the principal that issued the JWT
"sub" (Subject) Claim: The "sub" (subject) claim identifies the principal that is the subject of the JWT.

You can see them all here. The diagram below shows the design source

Seems great. What's the problem?

See that middle part where both things need access to the same secret key? That's the problem. The service that makes the JWT and the service that verifies the JWT are both reading and using the same key, so there's nothing stopping me from making my own JWT with new insane permissions on application 2 and having it get verified. That's only the beginning of the issues with JWTs. This isn't called out to people, so when you are dealing with micro-services or multiple APIs where you pass around JWTs, often there is an assumption of security where one doesn't exist.

Asymmetric JWT implementations exist and work well, but so often people do not think about it or realize such an option exists. There is no reason to keep on-boarding people with this default dangerous design assuming they will "figure out" the correct way to do things later. We see this all over the place with JWTs though.

Looking at the alg claim in the header and using it rather than hardcoding the algorithm that your application uses. Easy mistake to make, I've seen it a lot.
Encryption vs signatures. So often with JWTs people think the payload is encrypted. Can we warn them to use JWEs? This is such a common misunderstanding among people starting with JWTs it seems insane to me to not warn people somehow.
Should I use a JWT? Or a JWE? Should I sign AND encrypt the thing where the JWS (the signed version of the JWT) is the encrypted payload of the JWE? Are normal people supposed to make this decision?
Who in the hell said none should be a supported algorithm? Are you drunk? Just don't let me use a bad one. ("Well it is the right decision for my app because the encryption channel means the JWT doesn't matter" "Well then don't check the signature and move on if you don't care.")
several Javascript Object Signing and Encryption (JOSE) libraries fail to validate their inputs correctly when performing elliptic curve key agreement (the "ECDH-ES" algorithm). An attacker that is able to send JWEs of its choosing that use invalid curve points and observe the cleartext outputs resulting from decryption with the invalid curve points can use this vulnerability to recover the recipient's private key. Oh sure that's a problem I can check for. Thanks for the help.
Don't let the super important claims like expiration be optional. Come on folks, why let people pick and choose like that? It's just gonna cause problems. OpenID Connect went through great lengths to improve the security properties of a JWT. For example, the protocol mandates the use of the exp, iss and aud claims. To do it right, I need those claims, so don't make them optional.

Quick, what's the right choice?

HS256 - HMAC using SHA-256 hash algorithm
HS384 - HMAC using SHA-384 hash algorithm
HS512 - HMAC using SHA-512 hash algorithm
ES256 - ECDSA signature algorithm using SHA-256 hash algorithm
ES256K - ECDSA signature algorithm with secp256k1 curve using SHA-256 hash algorithm
ES384 - ECDSA signature algorithm using SHA-384 hash algorithm
ES512 - ECDSA signature algorithm using SHA-512 hash algorithm
RS256 - RSASSA-PKCS1-v1_5 signature algorithm using SHA-256 hash algorithm
RS384 - RSASSA-PKCS1-v1_5 signature algorithm using SHA-384 hash algorithm
RS512 - RSASSA-PKCS1-v1_5 signature algorithm using SHA-512 hash algorithm
PS256 - RSASSA-PSS signature using SHA-256 and MGF1 padding with SHA-256
PS384 - RSASSA-PSS signature using SHA-384 and MGF1 padding with SHA-384
PS512 - RSASSA-PSS signature using SHA-512 and MGF1 padding with SHA-512
EdDSA - Both Ed25519 signature using SHA-512 and Ed448 signature using SHA-3 are supported. Ed25519 and Ed448 provide 128-bit and 224-bit security respectively.

You are holding it wrong. Don't tell me to issue and use x509 certificates. Trying that for micro-services cut years off my life.

But have you tried XML DSIG?

I need to both give something to the user that I can verify that tells me what they're supposed to be able to do and I need some way of having services pass the auth back and forth. So many places have adopted JWTs because JSON = easy to handle. If there is a right (or wrong) algorithm, guide me there. It is fine to say "this is now depreciated". That's a totally normal thing to tell developers and it happens all the time. But please help us all do the right thing.

Alright I am making a very basic application. It will provide many useful features for users around the world. I just need them to be able to log into the thing. I guess username and password right? I want users to have a nice, understood experience.

No you stupid idiot passwords are fundamentally broken

Well you decide to try anyway. You find this helpful cheat sheet.

Use Argon2id with a minimum configuration of 19 MiB of memory, an iteration count of 2, and 1 degree of parallelism.
If Argon2id is not available, use scrypt with a minimum CPU/memory cost parameter of (2^17), a minimum block size of 8 (1024 bytes), and a parallelization parameter of 1.
For legacy systems using bcrypt, use a work factor of 10 or more and with a password limit of 72 bytes.
If FIPS-140 compliance is required, use PBKDF2 with a work factor of 600,000 or more and set with an internal hash function of HMAC-SHA-256.
Consider using a pepper to provide additional defense in depth (though alone, it provides no additional secure characteristics).

None of these mean anything to you but that's fine. It looks pretty straightforward at first.

>>> from argon2 import PasswordHasher
>>> ph = PasswordHasher()
>>> hash = ph.hash("correct horse battery staple")
>>> hash  # doctest: +SKIP
'$argon2id$v=19$m=65536,t=3,p=4$MIIRqgvgQbgj220jfp0MPA$YfwJSVjtjSU0zzV/P3S9nnQ/USre2wvJMjfCIjrTQbg'
>>> ph.verify(hash, "correct horse battery staple")
True
>>> ph.check_needs_rehash(hash)
False
>>> ph.verify(hash, "Tr0ub4dor&3")
Traceback (most recent call last):
  ...
argon2.exceptions.VerifyMismatchError: The password does not match the supplied hash

Got it. But then you see this.

Rather than a simple work factor like other algorithms, Argon2id has three different parameters that can be configured. Argon2id should use one of the following configuration settings as a base minimum which includes the minimum memory size (m), the minimum number of iterations (t) and the degree of parallelism (p).

    m=47104 (46 MiB), t=1, p=1 (Do not use with Argon2i)
    m=19456 (19 MiB), t=2, p=1 (Do not use with Argon2i)
    m=12288 (12 MiB), t=3, p=1
    m=9216 (9 MiB), t=4, p=1
    m=7168 (7 MiB), t=5, p=1

What the fuck does that mean. Do I want more memory and fewer iterations? That doesn't sound right. Then you end up here: https://www.rfc-editor.org/rfc/rfc9106.html which says I should be using argon2.profiles.RFC_9106_HIGH_MEMORY. Ok but it warns me that it requires 2 GiB, which seems like a lot? How does that scale with a lot of users? Does it change? Should I do low_memory?

Alright I'm sufficiently scared off. I'll use something else.

I've heard about passkeys and they seem easy enough. I'll do that.

Alright well that's ok. I got....most of the big ones.

If you have Windows 10 or up, you can use passkeys. To store passkeys, you must set up Windows Hello. Windows Hello doesn’t currently support synchronization or backup, so passkeys are only saved to your computer. If your computer is lost or the operating system is reinstalled, you can’t recover your passkeys.

Nevermind I can't use passkeys. Good to know.

Well if you put the passkeys in 1password then it works

Great so passkeys cost $5 a month per user and they get to pay for the priviledge of using my site. Sounds totally workable.

OpenID Connect/OAuth

Ok so first I need to figure out what kind of this thing I need. I'll just read through all the initial information I need to make this decision.

Now that I've completed a masters degree in login, it's time for me to begin.

Apple

Facebook/Google/Microsoft

So each one of these requires me to create an account, set up their tokens and embed the button. Not a huge deal, but I can never get rid of any of these and if one was to get deactivated, it would be a problem. See when Login with Twitter stopped being a thing people could use. Plus with Google and Microsoft they also offer email services, so presumably a lot of people will be using their email address, then I've gotta create a flow on the backend where I can associate the same user with multiple email addresses. Fine, no big deal.

I'm also loading Javascript from these companies on my page and telling them who my customers are. This is (of course) necessary, but seems overkill for the problem I'm trying to solve. I need to know that the user is who they say they are, but I don't need to know what the user can do inside of their Google account.

I don't really want this data

Here's the default data I get with Login with Facebook after the user goes through a scary authorization page.

id
first_name
last_name
middle_name
name
name_format
picture
short_name
email

I don't need that. Same with Google

BasicProfile.getId()
BasicProfile.getName()
BasicProfile.getGivenName()
BasicProfile.getFamilyName()
BasicProfile.getImageUrl()
BasicProfile.getEmail()

I'm not trying to say this is bad. These are great tools and I think the Google one especially is well made. I just don't want to prompt users to give me access to data if I don't want the data and I especially don't want the data if I have no idea if its the data you intended to give me. Who hasn't hit the "Login with Facebook" button and wondered "what email is this company going to send to". My Microsoft account is back from when I bought an Xbox OG. I have no idea where it sends messages now.

Fine, Magic Links

I don't know how to hash stuff correctly in such a way that I am confident I won't mess it up. Passkeys don't work yet. I can use OpenID Connect but really it is overkill for this use case since I don't want to operate as the user on the third-party and I don't want access to all the users information since I intend to ask them how they want me to contact them. The remaining option is "magic links".

How do we set up magic links securely?

Short lifespan for the password. The one-time password issued will be valid for 5 minutes before it expires
The user's email is specified alongside login tokens to stop URLs being brute-forced
Each login token will be at least 20 digits
The initial request and its response must take place from the same IP address
The initial request and its response must take place in the same browser
Each one-time link can only be used once
Only the last one-time link issued will be accepted. Once the latest one is issued, any others are invalidated.

The fundamental problem here is that email isn't a reliable system of delivery. It's a best-effort system. So if something goes wrong, takes a long time, etc, there isn't much I can really do to troubleshoot that. My advice to the user would be like "I guess you need to try a different email address".

So in order to do this for actual normal people to use, I have to turn off a lot of those security settings. I can't guarantee people don't sign up on their phones and then go to their laptops (so no IP address or browser check). I can't guarantee when they'll get the email (so no 5 minute check). I also don't know the order in which they're gonna get these emails, so it will be super frustrating for people if I send them 3 emails and the second one is actually the most "recent".

I also have no idea how secure this email account is. Effectively I'm just punting on security because it is hard and saying "well this is your problem now".

I could go on and on and on and on

I could write 20,000 words on this topic and still not be at the end. The word miserable barely does justice to how badly this stuff is designed for people to use. Complexity is an unavoidable side effect of flexibility in software. If your thing can do many things, it is harder to use.

We rely on expertise as a species to assist us with areas outside of our normal functions. I don't know anything about medicine, I go to a doctor. I have no idea how one drives a semi truck or flies a plane or digs a mine. Our ability to let people specialize is a key component to our ability to advance. So it is not reasonable to say "if you do anything with security at all you must become an expert in security".

Part of that is you need to use your skill and intelligence to push me along the right path. Don't say "this is the most recommended and this is less recommended and this one is third recommended". Show me what you want people to build and I bet most teams will jump at the chance to say "oh thank God, I can copy and paste a good example".

Corrections/notes/"I think you are stupid": https://c.im/@matdevdug

Can We Make Idiot-Proof Infrastructure pt1?

October 20, 2023

One complaint I hear all the time online and in real life is how complicated infrastructure is. You either commit to a vendor platform like ECS, Lightsail, Elastic Beanstalk or Cloud Run or you go all in with something like Kubernetes. The first are easy to run but lock you in and also sometimes get abandoned by the vendor (looking at you Beanstalk). Kubernetes runs everywhere but it is hard and complicated and has a lot of moving parts.

The assumption seems to be that with containers there should be an easier way to do this. I thought it was an interesting thought experiment. Could I, a random idiot, design a simpler infrastructure? Something you could adopt to any cloud provider without doing a ton of work, that is relatively future proof and that would scale to the point when something more complicated made sense? I have no idea but I thought it could be fun to try.

Fundamentals of Basic Infrastructure

Here are the parameters we're attempting to work within:

It should require minimal maintenance. You are a small crew trying to get a product out the door and you don't want to waste a ton of time.
You cannot assume you will detect problems. You lack the security and monitoring infrastructure to truly "audit" the state of the world and need to assume that you won't be able to detect a breach. Anything you put out there has to start as secure as possible and pretty much fix itself.
Controlling costs is key. You don't have the budget for surprises and massive spikes in CPU usage is likely a problem and not organic growth (or if it is organic growth, you'll want to likely be involved with deciding what to do about it)
The infrastructure should be relatively portable. We're going to try and keep everything movable without too many expensive parts.
Perfect uptime isn't the goal. Restarting containers isn't a hitless operation and while there are ways to queue up requests and replay them, we're gonna try to not bite off that level of complexity with the first draft. We're gonna drop some requests on the floor, but I think we can minimize that number.

Basic Setup

You've got your good idea, you've written some code and you have a private repo in GitHub. Great, now you need to get the thing out onto the internet. Let's start with some good tips before we get anywhere near to the internet itself.

Semantic Versioning is your friend. If you get into the habit now of structuring commits and cutting releases, you are going to reap those benefits down the line. It seems silly right this second when the entirety of the application code fits inside of your head, but soon that won't be the case if you continue to work on it. I really like Release-Please as a tool to cut releases automatically based on commits and let you use the version number to be a meaningful piece of data for you to work off.
Containers are mandatory. Just don't overthink this and commit early. Don't focus on container disk space usage. Disk space is not our largest concern. We want an easy to work with platform with a minimum amount of surface area for attacks. While Distroless isn't actually....without a linux Distro (I'm not entirely clear why that name was chosen), it is a great place to start. If you can get away with using these, this is what you want to do. Link
Be careful about what dependencies you rely on in the early phase. So many jobs I've had there are a few unmaintained packages that are mission critical impossible to remove load-bearing weights around our necks. If you can do it with the standard library great. When you find a dependency on the internet, look at what you need it to do and see "can I just copy paste the 40 lines of code I need from this" vs adding a new dependency forever. Dependency minimization isn't very cool right now but I think especially when starting out it pays off big.
Healthcheck. You need some route on your app that you can hit which provides a good probability that the application is up and functional. /health or whatever, but this is gonna be pretty key to the rest of this works.

Deployment and Orchestration

Alright so you've made the app, you have some way of tracking major/minor etc. Everything works great on your laptop. How do we put it on the internet.

You want a way to take a container and deploy it out to a Linux host
You don't want to patch or maintain the host
You need to know if the deployment has gone wrong
Either the deployment should roll back automatically or fail safe waiting for intervention
The whole thing needs to be as safe as possible.

Is there a lightweight way to do this? Maybe!

Basic Design

Cloudflare -> Autoscaling Group -> 4 instances setup with Cloud init -> Docker Compose with Watchtower -> DBaaS

When we deploy we'll be hitting the IP addresses of the instances on the Watertower HTTP route with curl and telling it to connect to our private container registry and pull down new versions of our application. We shouldn't need to SSH into the boxes ever and when a box dies or needs to be replaced, we can just delete it and run Terraform again to make a new one. SSL will be static long-lived certificates and we should be able to distribute traffic across different cloud providers however we'd like.

Cloudflare as the Glue

I know, a lot of you are rolling your eyes. "This isn't portable at all!" Let me defend my work at bit. We need a WAF, we need SSL, we need DNS, we need a load balancer and we need metrics. I can do all of that with open-source projects, but it's not easy. As I was writing it out, it started to get (actually) quite difficult to do.

Cloudflare is very cheap for what they offer. We aren't using anything here that we couldn't move somewhere else if needed. It scales pretty well, up to 20 origins (which isn't amazing but if you have hit 20 servers serving customer traffic you are ready to move up in complexity). You are free to change the backend CPU as needed (or even experiment with local machines, mix and match datacenter and cloud, etc). You also get a nice dashboard of what is going on without any work. It's a hard value proposition to fight against, especially when almost all of it is free. I also have no ideological dog in the fight of OSS vs SaaS.

Pricing

Up to 2 origin servers: $5 per month

Additional origins, up to 20: $5 per month per origin

First 500k DNS requests are free

$0.50 per every 500k DNS requests after

Compared to ALB pricing, we can see why this is more idiot proof. There we have 4 dimensions to cost: New connections (per second), Active connections (per minute), Processed bytes (GBs per hour), Rule evaluations (per second). The hourly bill is calculated by taking the maximum LCUs consumed across the four dimensions and we're charged on the highest one. Now ALBs can be much cheaper than Cloudflare, but it's harder to control the cost. If one element starts to explode in price, there isn't a lot you can do to bring it back down.

Cloudflare we're looking at $20 a month and then traffic. So if we get 60,000,000 requests a month we're paying $60 a month in DNS and $20 for the load balancer. For ALB it would largely depend on the type of traffic we're getting and how it is distributed.

BUT there are also much cheaper options. For €7 a month on Hetzner, you can get 25 targets and 20 TB of network traffic. € 1/TB for network traffic above that. So for our same cost we could handle a pretty incredible amount of traffic through Hetzner, but it commits us to them and violates the spirit of this thing. I just wanted to mention it in case someone was getting ready to "actually" me.

Also keep in mind we're just in the "trying ideas out" part of the exercise. Let's define a load balancer.

provider "cloudflare" {
  email   = "[email protected]"
  api_key = "your_api_key"
}

resource "cloudflare_load_balancer" "example_lb" {
  name   = "example-load-balancer.example.com"
  zone_id = "0da42c8d2132a9ddaf714f9e7c920711"
  default_pool_ids = [cloudflare_load_balancer_pool.pool1.id, cloudflare_load_balancer_pool.pool2.id]
  fallback_pool_id = cloudflare_load_balancer_pool.pool1.id
  steering_policy = "random"
  session_affinity = "none"
  proxied = true

  # Add other load balancer settings here from https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/load_balancer
  }

Then we need a monitor.

resource "cloudflare_load_balancer_monitor" "example" {
  account_id     = "f037e56e89293a057740de681ac9abbe"
  type           = "http"
  expected_body  = "alive"
  expected_codes = "2xx"
  method         = "GET"
  timeout        = 7
  path           = "/health"
  interval       = 60
  retries        = 2
  description    = "example http load balancer"
  header {
    header = "Host"
    values = ["example.com"]
  }
  allow_insecure   = false
  follow_redirects = true
  probe_zone       = "example.com"
}

Finally we need some pools

resource "cloudflare_load_balancer_pool" "pool1" {
  account_id = "f037e56e89293a057740de681ac9abbe"
  name       = "pool1"
  monitor = cloudflare_load_balancer_monitor.example.id
  origins {
    name    = "server01"
    address = "d9bb:3880:71b0:5fab:e426:8883:5a75:e82e"
    enabled = false
    header {
      header = "Host"
      values = ["server01"]
    }
  }
  origins {
    name    = "server02"
    address = "9726:61db:23a9:41d5:7eb0:649a:87b0:4291"
    header {
      header = "Host"
      values = ["server02"]
    }
  }
  description        = "example load balancer pool 1"
  enabled            = false
  minimum_origins    = 1
  notification_email = "[email protected]"
  load_shedding {
    default_percent = 55
    default_policy  = "random"
  }
  origin_steering {
    policy = "random"
  }
}

resource "cloudflare_load_balancer_pool" "pool2" {
  account_id = "f037e56e89293a057740de681ac9abbe"
  name       = "pool2"
  monitor = cloudflare_load_balancer_monitor.example.id
  origins {
    name    = "server03"
    address = "3601:03b9:88b7:fa50:8163:818c:eceb:bc14"
    enabled = false
    header {
      header = "Host"
      values = ["server03"]
    }
  }
  origins {
    name    = "server04"
    address = "8118:87ef:6b50:099d:fc4a:e66d:a991:5d20"
    header {
      header = "Host"
      values = ["server04"]
    }
  }
  description        = "example load balancer pool 2"
  enabled            = false
  minimum_origins    = 1
  notification_email = "[email protected]"
  load_shedding {
    default_percent = 55
    default_policy  = "random"
  }
  origin_steering {
    policy = "random"
  }
}

The addresses are just placeholders, but you'll need to swap values etc. This gives us a nice basic load balancer. Note that we don't have session affinity turned on, so we'll need to add Redis or something to help with state server-side. The IP addresses we point to will need to be reserved on the cloud provider side, but we can use IPv6 so hopefully should save us a few dollars a month there.

How much uptime is enough uptime

So there are two paths here we have to discuss before we get much further.

Path 1

When we deploy to a server, we make an API call to Cloudflare to mark the origin as not enabled. Then we wait for the connections to drain, deploy the container, bring it back up, wait for it to be healthy and then we mark it enabled again. This is traditionally the way we would need to do things, if we were targeting zero downtime.

Now we can do this. We have places later that we could stick such a script. But this is gonna be brittle. We'd basically need to do something like the following.

Run a GET against https://api.cloudflare.com/client/v4/user/load_balancers/pools
Take the result, look at the IP addresses, figure out which one is the machine in question and then mark it as not enabled IF all other origins were healthy. We wouldn't want to remove multiple machines at the same time. So we'd then need to hit: https://api.cloudflare.com/client/v4/user/load_balancers/pools/{identifier}/health and confirm the health of the pools.
But "health" isn't an instant concept. There is a delay between the concept of when the origin is unhealthly and I'll know about it, depending on how often I check and retries. So this isn't a perfect system, but it should work pretty well as long as I add a bit of jitter to it.

I think this exceeds what I want to do for the first pass. We can do it, but it's not consistent with the uptime discussion we had before. This is brittle and is going to require a lot of babysitting to get right.

Path 2

We rely on the healthchecks to steer traffic and assume that our deployments are going to be pretty fast, so while we might drop some traffic on the floor, a user (with our random distribution and server-side sessions) should be able to reload the page and hopefully get past the problem. It might not scale forever but it does remove a lot of our complexity.

Let's go with Path 2 for now.

Server setup + WAF

Alright so we've got the load balancer, it sits on the internet and takes traffic. Fabulous stuff. How do we set up a server? To do it cross-platform we have to use cloud-init.

The basics are pretty straight forward. We're gonna use latest debian and we're gonna update it and restart. Then we're gonna install Docker Compose and then finally stick a few files in there to run this. This is all pretty easy, but we do have a problem we need to tackle first. We need some way to do a level of secrets management so we can write out Terraform and cloud-init files, keep them in version control but also not have the secrets just kinda live there.

SOPS

So typically for secret management we want to use whatever our cloud provider gives us, but since we don't have something like that, we'll need to do something more basic.

We'll use age for encryption which is a great simple encryption library. You can install it here. We run age-keygen -o key.txt which gives us our secret file. Then we need to set an environmental variable with the path to the key like this: SOPS_AGE_KEY_FILE=/Users/mathew.duggan/key.txt

For those unfamiliar with how SOPS (installed here) works, basically you generate the age key as shown above and then you can encrypt files through a CLI or with Terraform locally. So:

secrets.json
{
   "username": "admin",
   "password": "password"
}

Turns into:

{
	"username": "ENC[AES256_GCM,data:+bGf/sI=,iv:J47szLfZ5wMWr6Ghc94VAABXs2Ec4Hi+e3ohc2HuF/Q=,tag:XIY1jOgDe9SBDMGxFhLwtw==,type:str]",
	"password": "ENC[AES256_GCM,data:RIHz14crqEk=,iv:H3g7/4Bd5vB/6U+Kf+rIR/xBRIGHGoZeN7U1zi5lgsM=,tag:+vD9BXb18rLhpf/sTsvYEA==,type:str]",
	"sops": {
		"kms": null,
		"gcp_kms": null,
		"azure_kv": null,
		"hc_vault": null,
		"age": [
			{
				"recipient": "age1j6dmaunhspfvh78lgnrtr6zkd7whcypcz6jdwypaydc6gaa79vtq5ryvzf",
				"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSA1YlcvdkpGc3pBbVFiUnhP\nYVJnalp0WlREVjlQZkFROGtvcWN2VWxsUUJnCmYvZ1ZPd3NzTjZxNHd6MEVNcmI1\nTTBZdnFaSEFSaXZRK28rc01VZGRxWHMKLS0tIGpZUjZCNDFDUnIvYXRJTDhtcGlu\nT3JJWlN1YlJYeU1ueEQ1cytDbDFXQ00K70mBEowf/AGgiFFNj3ocv0NfbI1IMJX/\nMJHMKtXPYJsoSKJla6Y+cXMXPe7LNNorSnmqvkNF7rgEMvONMNoEiA==\n-----END AGE ENCRYPTED FILE-----\n"
			}
		],
		"lastmodified": "2023-10-19T13:06:42Z",
		"mac": "ENC[AES256_GCM,data:q8R8Zb+PtpBs6TBPu6VJsQXEKLwi2+WtpE3culIy1obUNdfjWaXyBtC/zbWI5eeh2Z4u//2p49G2bMv0jSzMJZnH4TLIzpHxnd6XFjzu4TqObM6FnI3ZW/SSoPwTRxgHqvooMffm3NO5pxoz3FhnJDHwYk+jTK+JoGxyZF5nBe4=,iv:Ey+so87o/kYbvOaSUXs+vyIrEQXEC39vmswdl0L3Gvw=,tag:5mWJTfBgCFjXVuoYBUiDCA==,type:str]",
		"pgp": null,
		"unencrypted_suffix": "_unencrypted",
		"version": "3.8.1"
	}
}

By running this: sops --encrypt --age age1j6dmaunhspfvh78lgnrtr6zkd7whcypcz6jdwypaydc6gaa79vtq5ryvzf secrets.json > secrets.enc.json

So we can use this with Terraform pretty easily. We run export SOPS_AGE_KEY_FILE=/Users/mathew.duggan/key.txt just to ensure everything is set and then the Terraform looks like the following:

terraform {
  required_providers {
    sops = {
      source = "carlpett/sops"
      version = "~> 0.5"
    }
  }
}

data "sops_file" "secret" {
  source_file = "secrets.enc.json"
}

output "root-value-password" {
  # Access the password variable from the map
  value = data.sops_file.secret.data["password"]
  sensitive = true
}

Now you can use SOPS with AWS, GCP, Azure, or use their own secrets system. I present this only as a "we're small and am looking for a way to easily encrypt configuration files".

Cloud init

So now we're to the last part of the server setup. We'll need to define a cloud-init YAML to set up the host and we'll need to define a Docker Compose file to set up the application that is going to handle all the pulling of images from here. Now thankfully we should be able to reuse this stuff for the foreseeable future.

#cloud-config

package_update: true
package_upgrade: true
package_reboot_if_required: true

groups:
    - docker

users:
    - name: admin
      lock_passwd: true
      shell: /bin/bash
      ssh_authorized_keys:
      - ${init_ssh_public_key}
      groups: docker
      sudo: ALL=(ALL) NOPASSWD:ALL

packages:
  - apt-transport-https
  - ca-certificates
  - curl
  - gnupg-agent
  - software-properties-common
  - unattended-upgrades
  - nginx
  
write_files:
  - owner: root:root
    encoding: b64
    path: /etc/ssl/cloudflare.crt
    content: |
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tDQpNSUlHQ2pDQ0EvS2dBd0lCQWdJSVY1RzZsVmJDTG1Fd0RRWUpLb1pJaHZjTkFRRU5CUUF3Z1pBeEN6QUpCZ05WDQpCQVlUQWxWVE1Sa3dGd1lEVlFRS0V4QkRiRzkxWkVac1lYSmxMQ0JKYm1NdU1SUXdFZ1lEVlFRTEV3dFBjbWxuDQphVzRnVUhWc2JERVdNQlFHQTFVRUJ4TU5VMkZ1SUVaeVlXNWphWE5qYnpFVE1CRUdBMVVFQ0JNS1EyRnNhV1p2DQpjbTVwWVRFak1DRUdBMVVFQXhNYWIzSnBaMmx1TFhCMWJHd3VZMnh2ZFdSbWJHRnlaUzV1WlhRd0hoY05NVGt4DQpNREV3TVRnME5UQXdXaGNOTWpreE1UQXhNVGN3TURBd1dqQ0JrREVMTUFrR0ExVUVCaE1DVlZNeEdUQVhCZ05WDQpCQW9URUVOc2IzVmtSbXhoY21Vc0lFbHVZeTR4RkRBU0JnTlZCQXNUQzA5eWFXZHBiaUJRZFd4c01SWXdGQVlEDQpWUVFIRXcxVFlXNGdSbkpoYm1OcGMyTnZNUk13RVFZRFZRUUlFd3BEWVd4cFptOXlibWxoTVNNd0lRWURWUVFEDQpFeHB2Y21sbmFXNHRjSFZzYkM1amJHOTFaR1pzWVhKbExtNWxkRENDQWlJd0RRWUpLb1pJaHZjTkFRRUJCUUFEDQpnZ0lQQURDQ0Fnb0NnZ0lCQU4yeTJ6b2pZZmwwYktmaHAwQUpCRmVWK2pRcWJDdzNzSG12RVB3TG1xRExxeW5JDQo0MnRaWFI1eTkxNFpCOVpyd2JML0s1TzQ2ZXhkL0x1akpuVjJiM2R6Y3g1cnRpUXpzbzB4emxqcWJuYlFUMjBlDQppaHgvV3JGNE9rWkt5ZFp6c2RhSnNXQVB1cGxESDVQN0o4MnEzcmU4OGpRZGdFNWhxanFGWjNjbENHN2x4b0J3DQpoTGFhem0zTkpKbFVmemRrOTdvdVJ2bkZHQXVYZDVjUVZ4OGpZT09lVTYwc1dxbU1lNFFIZE92cHFCOTFiSm9ZDQpRU0tWRmpVZ0hlVHBOOHROcEtKZmI5TEluM3B1bjNiQzlOS05IdFJLTU5YM0tsL3NBUHE3cS9BbG5kdkEyS3czDQpEa3VtMm1IUVVHZHpWSHFjT2dlYTlCR2pMSzJoN1N1WDkzelRXTDAydTc5OWRyNlhrcmFkL1dTaEhjaGZqalJuDQphTDM1bmlKVURyMDJZSnRQZ3hXT2JzcmZPVTYzQjhqdUxVcGhXLzRCT2pqSnlBRzVsOWoxLy9hVUdFaS9zRWU1DQpscVZ2MFA3OFFyeG94UitNTVhpSndRYWI1RkI4VEcvYWM2bVJIZ0Y5Q21rWDkwdWFSaCtPQzA3WGpUZGZTS0dSDQpQcE05aEIyWmhMb2wvbmY4cW1vTGRvRDVIdk9EWnVLdTIrbXVLZVZIWGd3Mi9BNndNN093cmlueFppeUJrNUhoDQpDdmFBREg3UFpwVTZ6L3p2NU5VNUhTdlhpS3RDekZ1RHU0L1pmaTM0UmZIWGVDVWZIQWI0S2ZOUlhKd01zeFVhDQorNFpwU0FYMkc2Um5HVTVtZXVYcFU1L1YrRFFKcC9lNjlYeXlZNlJYRG9NeXdhRUZsSWxYQnFqUlJBMnBBZ01CDQpBQUdqWmpCa01BNEdBMVVkRHdFQi93UUVBd0lCQmpBU0JnTlZIUk1CQWY4RUNEQUdBUUgvQWdFQ01CMEdBMVVkDQpEZ1FXQkJSRFdVc3JhWXVBNFJFemFsZk5Wemphbm4zRjZ6QWZCZ05WSFNNRUdEQVdnQlJEV1VzcmFZdUE0UkV6DQphbGZOVnpqYW5uM0Y2ekFOQmdrcWhraUc5dzBCQVEwRkFBT0NBZ0VBa1ErVDlucWNTbEF1Vy85MERlWW1RT1cxDQpRaHFPb3I1cHNCRUd2eGJOR1YyaGRMSlk4aDZRVXE0OEJDZXZjTUNoZy9MMUNrem5CTkk0MGkzLzZoZURuM0lTDQp6VkV3WEtmMzRwUEZDQUNXVk1aeGJRamtOUlRpSDhpUnVyOUVzYU5RNW9YQ1BKa2h3ZzIrSUZ5b1BBQVlVUm9YDQpWY0k5U0NEVWE0NWNsbVlISi9YWXdWMWljR1ZJOC85YjJKVXFrbG5PVGE1dHVnd0lVaTVzVGZpcE5jSlhIaGd6DQo2QktZRGwwL1VQMGxMS2JzVUVUWGVUR0RpRHB4WllJZ2JjRnJSRERrSEM2QlN2ZFdWRWlINWI5bUgyQk9ONjB6DQowTzBqOEVFS1R3aTlqbmFmVnRaUVhQL0Q4eW9Wb3dkRkRqWGNLa09QRi8xZ0loOXFyRlI2R2RvUFZnQjNTa0xjDQo1dWxCcVphQ0htNTYzanN2V2Iva1hKbmxGeFcrMWJzTzlCREQ2RHdlQmNHZE51cmdtSDYyNXdCWGtzU2REN3kvDQpmYWtrOERhZ2piaktTaFlsUEVGT0FxRWNsaXdqRjQ1ZWFiTDB0MjdNSlY2MU8vakh6SEwzZGtuWGVFNEJEYTJqDQpiQStKYnlKZVVNdFU3S01zeHZ4ODJSbWhxQkVKSkRCQ0ozc2NWcHR2aERNUnJ0cURCVzVKU2h4b0FPY3BGUUdtDQppWVdpY240Nm5QRGpnVFUwYlgxWlBwVHByeVhidmNpVkw1UmtWQnV5WDJudGNPTERQbFpXZ3haQ0JwOTZ4MDdGDQpBbk96S2daazRSelpQTkF4Q1hFUlZ4YWpuL0ZMY09oZ2xWQUtvNUgwYWMrQWl0bFEwaXA1NUQyL21mOG83MnRNDQpmVlE2VnB5akVYZGlJWFdVcS9vPQ0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQ==
  - owner: root:root
    encoding: b64
    path: /etc/ssl/cert.pem
    content: |
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tDQpNSUlFcGpDQ0E0NmdBd0lCQWdJVUgzZXMwaHVaQy8rTUNxQWRyWXEwTE05UFY4QXdEUVlKS29aSWh2Y05BUUVMDQpCUUF3Z1lzeEN6QUpCZ05WQkFZVEFsVlRNUmt3RndZRFZRUUtFeEJEYkc5MVpFWnNZWEpsTENCSmJtTXVNVFF3DQpNZ1lEVlFRTEV5dERiRzkxWkVac1lYSmxJRTl5YVdkcGJpQlRVMHdnUTJWeWRHbG1hV05oZEdVZ1FYVjBhRzl5DQphWFI1TVJZd0ZBWURWUVFIRXcxVFlXNGdSbkpoYm1OcGMyTnZNUk13RVFZRFZRUUlFd3BEWVd4cFptOXlibWxoDQpNQjRYRFRJek1EY3pNVEUzTXprd01Gb1hEVE00TURjeU56RTNNemt3TUZvd1lqRVpNQmNHQTFVRUNoTVFRMnh2DQpkV1JHYkdGeVpTd2dTVzVqTGpFZE1Cc0dBMVVFQ3hNVVEyeHZkV1JHYkdGeVpTQlBjbWxuYVc0Z1EwRXhKakFrDQpCZ05WQkFNVEhVTnNiM1ZrUm14aGNtVWdUM0pwWjJsdUlFTmxjblJwWm1sallYUmxNSUlCSWpBTkJna3Foa2lHDQo5dzBCQVFFRkFBT0NBUThBTUlJQkNnS0NBUUVBdmtmbjB1eVZ3LzlSYlBDbDQ2dzhIeVZnTXZKREtVUWgvQUk0DQpIODRXRGRzM1hTRmxrbmFIK0FQdmJoM0Rsc3M5NEZnRDVGVVRMdENzQzRtSFpZVlNiRzJqeCtJbjJGcTdTSjdUDQp1QlJUbHBXWmNyVEViRjRBa00wRm53NGwwbEdQeFlZRjRaOG5uZm13YUtvNnlwb0Ftd3draXJWWXU3dWE4Mm01DQp3eWoyZHZKcWNkUExxTXdHRFVkYnlYemdwZE9IaXRBVFFoTE56VmtaOEI1L2RyODcweDR3TE8rRkVOOG92QUprDQpaNVZCRndSOEI5WEs4dUtEcmdBZkxYUVM5UVZ3WHpjcmQxQVp6S1RDVnBlMmlwemFiSGN5TUt1WDdpZjRTRGQ1DQpiZ2Ird1hycGY2dkNRWklDa3REdWJFcDdCVzlCNVhIUnlmMnJ2Yms2VEtjZ2xTbGNRUUlEQVFBQm80SUJLRENDDQpBU1F3RGdZRFZSMFBBUUgvQkFRREFnV2dNQjBHQTFVZEpRUVdNQlFHQ0NzR0FRVUZCd01DQmdnckJnRUZCUWNEDQpBVEFNQmdOVkhSTUJBZjhFQWpBQU1CMEdBMVVkRGdRV0JCU3pwcWpFOEJUK0FKYUg2c3VnRmwxajdqend4REFmDQpCZ05WSFNNRUdEQVdnQlFrNkZOWFhYdzBRSWVwNjVUYnV1RVdlUHdwcERCQUJnZ3JCZ0VGQlFjQkFRUTBNREl3DQpNQVlJS3dZQkJRVUhNQUdHSkdoMGRIQTZMeTl2WTNOd0xtTnNiM1ZrWm14aGNtVXVZMjl0TDI5eWFXZHBibDlqDQpZVEFwQmdOVkhSRUVJakFnZ2c4cUxtMWhkR1IxWjJkaGJpNWpiMjJDRFcxaGRHUjFaMmRoYmk1amIyMHdPQVlEDQpWUjBmQkRFd0x6QXRvQ3VnS1lZbmFIUjBjRG92TDJOeWJDNWpiRzkxWkdac1lYSmxMbU52YlM5dmNtbG5hVzVmDQpZMkV1WTNKc01BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQ3VvUG9KV05VZ0xPRXVmendLRlprMHBvL2tNR29qDQoxYTdCSGEzcWtNWGUrN2J4aW1pQTBvYzcyVEhYSm8zVm82bTIwaGRpbDRiSzVPYzZoTGpiUTFOR2ZXNm84MXk2DQpyUXZEaXBXN3JuL3R3V3hPTkpHTFNDZDZFalpqWXpUUW5EdFBSQWQrVnBwV1BuNUtLZHRSNkM2ZjhaMFlqeldjDQp3b3JLdkRuV2E5b0gycEUzZUNSRUZsc1lRUUtVNWxOYUpibm9nRXNaY2ZDa0MvU0JCaTRaN0lIRnJzWnd1YTU5DQorVDIxUWNOd3BKbExLZ2VRZlpLazMzTFc5MFlyYjRhNStMaTljQzZsVC9MRHdTc20ySkVVVm1nbDJOaC8wV2dpDQpBcHFxUjV5dmUwdUI2M0tTdW90Z2hyWlp0cnNhVW1OYytjRjhneHU4Si8rdXFhaWZQWk83NVZtVw0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQ==
  - owner: root:root
    encoding: b64
    path: /etc/ssl/key.pem
    permissions: '0600'
    content: ${private_ssl_key}
  - owner: admin:docker
    path: /home/admin/docker-compose.yaml
    content: |
    version: "3"
    services:
      app:
        image: ghcr.io/<org>/<image>:<tag>
        restart: unless-stopped
        ports:
          - "8000:2368"
        labels:
          - "com.centurylinklabs.watchtower.enable=true"
      watchtower:
        image: containrrr/watchtower
        command: --debug --http-api-update
        restart: unless-stopped
        environment:
          - WATCHTOWER_HTTP_API_TOKEN=${watchtower_token}
        labels:
          - "com.centurylinklabs.watchtower.enable=false"
        ports:
          - "8080:8080"
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
          - /home/admin/.docker/config.json:/config.json
    - owner: www-data:www-data
      path: /etc/nginx/sites-available/default
      content: |
        server {
          listen 443 ssl http2;
          listen [::]:443 ssl http2;
          charset UTF-8;
          ssl_session_timeout 5m;
          ssl_prefer_server_ciphers on;
          ssl_ciphers ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5;
          ssl_protocols TLSv1.2;
          ssl_buffer_size 4k;
          ssl_certificate         /etc/ssl/cert.pem;
          ssl_certificate_key     /etc/ssl/key.pem;
          ssl_client_certificate /etc/ssl/cloudflare.crt;
          ssl_verify_client on;
          
          server_name hostname.com www.hostname.com;
          
          location / {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Host $host;
            proxy_http_version 1.1;
            proxy_buffering        on;
            proxy_pass http://127.0.0.1:8000;
            proxy_redirect off;
            }
            
          location /v1/update {
            proxy_http_version 1.1;
            proxy_buffering on;
            proxy_pass http://127.0.0.1:8080;
            proxy_redirect off;
            }
          }
  
runcmd:
  - curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add -
  - add-apt-repository "deb [arch=$(dpkg --print-architecture)] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
  - apt-get update -y
  - apt-get install -y docker-ce docker-ce-cli containerd.io
  - systemctl start docker
  - systemctl enable docker
  - curl -L "https://github.com/docker/compose/releases/download/2.23.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
  - chmod +x /usr/local/bin/docker-compose
  - su admin -c 'docker login -u ${docker_username} -p ${docker_password} ${docker_repository}'
  - su admin -c 'docker compose -f /home/admin/docker-compose.yml up -d'

Now obviously you'll need to modify this and test it, it took some tweaks to get it working on mine and I'm confident there are improvements we could make. However I think we can use it as a sample reference doc with the understanding it is NOT ready to copy and paste.

So here's the basic flow. We're going to use the SSL certificates Cloudflare gives us as well as inserting their certificate for Authenticated Origin Pulls. This ensures all the traffic coming to our server is from Cloudflare. Now we could be traffic from another Cloudflare customer, a malicious one, but at least this gives us a good starting point to limit the traffic. Plus presumably if there is a malicious customer hitting you, at least you can reach out to Cloudflare and they'll do....something.

Now we put it together with Terraform and we have something we can deploy. We'll do Digital Ocean as our example but the cloud provider part doesn't really matter.

secrets.json

{
   "private_ssl_key": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tDQpNSUlFcGpDQ0E0NmdBd0lCQWdJVUgzZXMwaHVaQy8rTUNxQWRyWXEwTE05UFY4QXdEUVlKS29aSWh2Y05BUUVMDQpCUUF3Z1lzeEN6QUpCZ05WQkFZVEFsVlRNUmt3RndZRFZRUUtFeEJEYkc5MVpFWnNZWEpsTENCSmJtTXVNVFF3DQpNZ1lEVlFRTEV5dERiRzkxWkVac1lYSmxJRTl5YVdkcGJpQlRVMHdnUTJWeWRHbG1hV05oZEdVZ1FYVjBhRzl5DQphWFI1TVJZd0ZBWURWUVFIRXcxVFlXNGdSbkpoYm1OcGMyTnZNUk13RVFZRFZRUUlFd3BEWVd4cFptOXlibWxoDQpNQjRYRFRJek1EY3pNVEUzTXprd01Gb1hEVE00TURjeU56RTNNemt3TUZvd1lqRVpNQmNHQTFVRUNoTVFRMnh2DQpkV1JHYkdGeVpTd2dTVzVqTGpFZE1Cc0dBMVVFQ3hNVVEyeHZkV1JHYkdGeVpTQlBjbWxuYVc0Z1EwRXhKakFrDQpCZ05WQkFNVEhVTnNiM1ZrUm14aGNtVWdUM0pwWjJsdUlFTmxjblJwWm1sallYUmxNSUlCSWpBTkJna3Foa2lHDQo5dzBCQVFFRkFBT0NBUThBTUlJQkNnS0NBUUVBdmtmbjB1eVZ3LzlSYlBDbDQ2dzhIeVZnTXZKREtVUWgvQUk0DQpIODRXRGRzM1hTRmxrbmFIK0FQdmJoM0Rsc3M5NEZnRDVGVVRMdENzQzRtSFpZVlNiRzJqeCtJbjJGcTdTSjdUDQp1QlJUbHBXWmNyVEViRjRBa00wRm53NGwwbEdQeFlZRjRaOG5uZm13YUtvNnlwb0Ftd3draXJWWXU3dWE4Mm01DQp3eWoyZHZKcWNkUExxTXdHRFVkYnlYemdwZE9IaXRBVFFoTE56VmtaOEI1L2RyODcweDR3TE8rRkVOOG92QUprDQpaNVZCRndSOEI5WEs4dUtEcmdBZkxYUVM5UVZ3WHpjcmQxQVp6S1RDVnBlMmlwemFiSGN5TUt1WDdpZjRTRGQ1DQpiZ2Ird1hycGY2dkNRWklDa3REdWJFcDdCVzlCNVhIUnlmMnJ2Yms2VEtjZ2xTbGNRUUlEQVFBQm80SUJLRENDDQpBU1F3RGdZRFZSMFBBUUgvQkFRREFnV2dNQjBHQTFVZEpRUVdNQlFHQ0NzR0FRVUZCd01DQmdnckJnRUZCUWNEDQpBVEFNQmdOVkhSTUJBZjhFQWpBQU1CMEdBMVVkRGdRV0JCU3pwcWpFOEJUK0FKYUg2c3VnRmwxajdqend4REFmDQpCZ05WSFNNRUdEQVdnQlFrNkZOWFhYdzBRSWVwNjVUYnV1RVdlUHdwcERCQUJnZ3JCZ0VGQlFjQkFRUTBNREl3DQpNQVlJS3dZQkJRVUhNQUdHSkdoMGRIQTZMeTl2WTNOd0xtTnNiM1ZrWm14aGNtVXVZMjl0TDI5eWFXZHBibDlqDQpZVEFwQmdOVkhSRUVJakFnZ2c4cUxtMWhkR1IxWjJkaGJpNWpiMjJDRFcxaGRHUjFaMmRoYmk1amIyMHdPQVlEDQpWUjBmQkRFd0x6QXRvQ3VnS1lZbmFIUjBjRG92TDJOeWJDNWpiRzkxWkdac1lYSmxMbU52YlM5dmNtbG5hVzVmDQpZMkV1WTNKc01BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQ3VvUG9KV05VZ0xPRXVmendLRlprMHBvL2tNR29qDQoxYTdCSGEzcWtNWGUrN2J4aW1pQTBvYzcyVEhYSm8zVm82bTIwaGRpbDRiSzVPYzZoTGpiUTFOR2ZXNm84MXk2DQpyUXZEaXBXN3JuL3R3V3hPTkpHTFNDZDZFalpqWXpUUW5EdFBSQWQrVnBwV1BuNUtLZHRSNkM2ZjhaMFlqeldjDQp3b3JLdkRuV2E5b0gycEUzZUNSRUZsc1lRUUtVNWxOYUpibm9nRXNaY2ZDa0MvU0JCaTRaN0lIRnJzWnd1YTU5DQorVDIxUWNOd3BKbExLZ2VRZlpLazMzTFc5MFlyYjRhNStMaTljQzZsVC9MRHdTc20ySkVVVm1nbDJOaC8wV2dpDQpBcHFxUjV5dmUwdUI2M0tTdW90Z2hyWlp0cnNhVW1OYytjRjhneHU4Si8rdXFhaWZQWk83NVZtVw0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQ",
   "watchtower_token": "tx#okr#n+8_wpf%#n9cxr@30vi7wy_@*@69bw+smfic&k^zb8h",
   "docker_username": "username",
   "docker_password": "password",
   "docker_repository": "repository"
}

Base64 encoded private key for SSL along with the watchtower token to access the API and everything else

Terraform file

terraform {
  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      version = "2.30.0"
    }
    sops = {
      source  = "carlpett/sops"
      version = "~> 0.5"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "4.17.0"
    }
  }
}

variable "ssh_public_key" {
  type        = string
  description = "SSH public key"
  default     = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCeogciUcb1roDZWVXaTFrMSqU66qlb4YT2GhDMZQm+cM6kxAgl5GY72Yiuir/Sml8pHMvTRPV5ezg+17gSntnBtIbf3wNwuB0F/21l7vGS2XteY6p557cRHZjSFuc2uPiysnI21FfZCrsEJ7uM3Ebyd/zJ394URcWQm54NtVh/QxuHzfuK9QCbxhlsXXFAfTnrWvLVGQkq/R+fjtKy12o42Y59JIsZT4aORSGujDiagBysGOCXonYqRhs9gmdZPkcKUe3r8j6fZRY2l8/QX3D6zhDZ8x74Gi70ojuvR8oCsWs9tB2sF/XQi806G/s/mbhh6hcj7ALyo5Th+jw7I8rj matdevdug@matdevdug-ThinkPad-X1-Carbon-5th"
}

provider "digitalocean" {
  token = "secret_api_key"
}

data "sops_file" "secret" {
  source_file = "secrets.enc.json"
}

locals {
  virtual_machines = {
    "server01" = { vm_size = "s-4vcpu-8gb", zone = "nyc1" },
    "server02" = { vm_size = "s-4vcpu-8gb", zone = "nyc1" },
    "server03" = { vm_size = "s-4vcpu-8gb", zone = "nyc1" },
    "server04" = { vm_size = "s-4vcpu-8gb", zone = "nyc1" }
  }
}

resource "digitalocean_droplet" "web" {
  for_each = local.virtual_machines
  name     = each.key
  image    = "debian-12-x64"
  size     = each.value.vm_size
  region   = each.value.zone
  user_data = templatefile("${path.module}/cloud-init.yaml", {
    init_ssh_public_key = file(var.ssh_public_key)
    private_ssl_key     = data.sops_file.secret.data["private_ssl_key"]
    watchtower_token    = data.sops_file.secret.data["watchtower_token"]
    docker_username     = data.sops_file.secret.data["docker_username"]
    docker_password     = data.sops_file.secret.data["docker_password"]
    docker_repository   = data.sops_file.secret.data["docker_repository"]
  })
}

resource "digitalocean_reserved_ip" "example" {
  for_each   = digitalocean_droplet.web
  droplet_id = each.value.id
  region     = each.value.region
}

Hooking it all together

So we'll need to go back to the Cloudflare terraform and set the reserved_ips we get from the cloud provider as the IPs for the origins. Then we should be able to go through, set Authenticated Origin Pulls up as well as SSL to "Strict" in the Cloudflare control panel. Finally since we have Watchtower set up, all we need to deploy a new version of the application is to write a simple deploy script that curls each one of our servers IP addresses with the Watchtower HTTP Token set to tell it to pull a new version of our container from our registry and deploy it. Read more about that here.

In my testing (which was somewhat limited), even though the scripts needed tweaks and modifications, the underlying concept actually worked pretty well. I was able to see all my traffic coming through Cloudflare easily, the SSL components all worked and whenever I wanted to upgrade a host it was pretty simple to stop traffic to the host in the web UI, reboot or destroy and run Terraform again and then send traffic to it again.

In terms of encryption while my age solution wasn't perfect I think it'll hold together reasonably well. It is a secret value which you can safely commit to source control and rotate the secret pretty easily whenever you want.

Next Steps

Put the whole thing together in a structured Terraform module so it's more reliable and less prone to random breakage
Write out a bunch of different cloud provider options to make it easier to switch between them
Write a simple CLI to remove an origin from the load balancer before running the deploy and then confirming the origin is healthy before sticking it back in (for the requirement of zero-downtime deployments)
Taking a second pass at the encryption story.

Going through this is a useful exercise in explaining why these infrastructure products are so complicated. They're complicated because its hard to do and has a lot of moving parts. Even with the heavy use of existing tooling, this thing turned out to be more complicated than I expected.

Hopefully this has been an interesting thought experiment. I'm excited to take another pass at this idea and potentially turn it into a more usable product. If this was helpful (or if I missed something based), I'm always open to feedback. Especially if you thought of an optimization! https://c.im/@matdevdug

Terraform Cloud Review

September 13, 2023 in DevOps

If I were told to go off and make a hosted Terraform product, I would probably end up with a list of features that looked something like the following:

Extremely reliable state tracking
Assistance with upgrading between versions of Terraform and providers and letting users know when it looked safe to upgrade and when there might be problems between versions
Consistent running of Terraform with a fresh container image each time, providers and versions cached on the host VM so the experience is as fast as possible
As many linting, formatting and HCL optimizations I can offer, configurable on and off
Investing as much engineering work as I can afford in providing users an experience where, unlike with the free Terraform, if a plan succeeds on Terraform Cloud, the Apply will succeed
Assisting with Workspace creation. Since we want to keep the number of resources low, seeing if we can leverage machine learning to say "we think you should group these resources together as their own workspace" and showing you how to do that
Figure out some way for organizations to interact with the Terraform resources other than just running the Terraform CLI, so users can create richer experiences for their teams through easy automation that feeds back into the global source of truth that is my incredibly reliable state tracking
Try to do whatever I can to encourage more resources in my cloud. Unlimited storage, lots of workspaces, helping people set up workspaces. The more stuff in there the more valuable it is for the org to use (and also more logistically challenging for them to cancel)

This is me would be a product I would feel confident charging a lot of money for. Terraform Cloud is not that product. It has some of these features locked behind the most expensive tiers, but not enough of them to justify the price.

I've written about my feelings around the Terraform license change before. I won't bore you with that again. However since now the safest way to use Terraform is to pay Hashicorp, what does that look like? As someone who has used Terraform for years and Terraform Cloud almost daily for a year, it's a profoundly underwhelming experience.

Currently it is a little-loved product with lots of errors and sharp edges. This is as v0.1 of a version of this as I could imagine, except the pace of development has been glacial. Terraform Cloud is a "good enough" platform that seems to understand that if you could do better, you would. Like a diner at 2 AM on the side of the highway, it's primary selling point is the fact that it is there. That and the license terms you will need to accept soon.

Terraform Cloud - Basic Walkthrough

At a high level Terraform Cloud allows organizations to centralize their Projects and Workspaces and store that state with Hashicorp. It also gives you access to a Registry for you to host your own privacy Terraform modules and use them in your workspaces. The top level options look as follows:

You may be wondering "What does Usage do?" I have no idea, as the web UI has never worked for me even though I appear to have all the permissions one could have. I have seen the following since getting my account:

I'm not sure what access I lack or if the page was intended to work. It's very mysterious in that way.

There is Explorer, which lets you basically see "what versions of things do I use across the different repos". You can't do anything with that information, like I can't say "alright well upgrade these two to the version that everyone else uses". It's also a beta feature and not one that existed when I first started using the platform.

Finally there are the Workspaces, where you spend 99% of your time.

You get some ok stats here. Up in the top left you see "Needs Attention", "Errors", "Running", "Hold" and then "Applied." Even though you may have many Workspaces, you cannot change how many you see here. 20 is the correct number I guess.

Creating a Workspace

Workspaces are either based on a repo, CLI driven or you call the API. You tell it what VCS, what repo, if you want to use the root of the repo or a sub-directory (which is good because soon you'll have too many resources for one workspace for everything). You tell it Auto Apply (which is checked by default) or Manual and when to trigger a run (whenever a change, whenever specific files in a path change or whenever you push a tag). That's it.

You can see all the runs, what their status is and basically what resources have changed or will change. Any plan that you run from your laptop also show up here. Now you don't need to manage your runs here, you can still do local, but then there is absolutely no reason to use this product. Almost all of the features rely on your runs being handled by Hashicorp here inside of a Workspace.

Workspace flow

Workspaces show you when the run was, how long the plan took, what resources are associated with this (10 resources at a time even though you might have thousands. Details links you to the last run, there are tags and run triggers. Run triggers allow you to link workspaces together, so this workspace would dependent on the output of another workspace.

The settings are as follows:

Runs is pretty straight forward. States allow you to inspect the state changes directly. So you can see the full JSON of a resource and roll back to this specific state version. This can be nice for reviewing what specifically changed on each resource, but in my experience you don't get much over looking at the actual code. But if you are in a situation where something has suddenly broken and you need a fast way of saying "what was added and what was removed", this is where you would go.

NOTE: BE SUPER CAREFUL WITH THIS

The state inspector has the potential to show TONS of sensitive data. It's all the data in Terraform in the raw form. Just be aware it exists when you start using the service and take a look to ensure there isn't anything you didn't want there.

Variables are variables and the settings allow you to lock the workspace, apply Sentinel settings, set an SSH key for downloading private modules and finally if you want changes to the VCS to trigger an action here. So for instance, when you merge in a PR you can trigger Terraform Cloud to automatically apply this workspace. Nothing super new here compared to any CI/CD system, but still it is all baked in.

That's it!

No-Code Modules

One selling point I heard a lot about, but haven't actually seen anyone use. The idea is good though, where you write premade modules and push them to your private registry. Then members of your organization can just run them to do things like "stand up a template web application stack". Hashicorp has a tutorial here that I ran though and found it to work pretty much as expected. It isn't anywhere near the level of power that I would want, compared to something like Pulumi, but it is a nice step forward for automating truly constant tasks (like adding domain names to an internal domain or provisioning some SSL certificate for testing).

Dynamic Credentials

You can link Terraform Cloud and Vault, if you use it, so you no longer need to stick long-living credentials inside of the Workspace to access cloud providers. Instead you can leverage Vault to get short-lived credentials that improve the security of the Workspaces. I ran through this and did have problems getting it worked for GCP, but AWS seemed to work well. It requires some setup inside of the actual repository, but it's a nice security improvement vs leaving production credentials in this random web application and hoping you don't mess up the user scoping.

User scoping is controlled primarily through "projects", which basically trickle down to the user level. You make a project, which has workspaces, that have their own variables and then assign that to a team or business unit. That same logic is reflected inside of credentials.

Private Registry

This is one thing Hashicorp nailed. It's very easy to hook up Terraform Cloud to allow your workspaces to access internal modules backed by your private repositories. It supports the same documentation options as public modules, tracks downloads and allows for easy versioning control through git tags. I have nothing but good things to say about this entire thing.

Sharing between organizations is something they lock at the top tier, but this seems like a very niche usecase so I don't consider it to be too big of a problem. However if you are someone looking to produce a private provider or module for your customers to use, I would reach out to Hashicorp and see how they want you to do that.

The primary value for this is just to easily store all of your IaC logic in modules and then rely on the versioning inside of different environments to roll out changes. For instance, we do this for things like upgrading a system. Make the change, publish the new version to the private registry and then slowly roll it out. Then you can monitor the rollout through git grep pretty easily.

Pricing

$0.00014 per hour per resource. So a lot of money when you think "every IAM custom role, every DNS record, every SSL certificate, every single thing in your entire organization". You do get a lot of the nice features at this "standard" tier, but I'm kinda shocked they don't unlock all the enterprise features at this price point. No-code provisioning is only available at the higher levels, as well as Drift detection, Continuous validation (checks between runs to see if anything has changed ) as well as Ephemeral workspaces. The last one is a shame, because it looks like a great feature. Set up your workspace to self-destruct at regular intervals so you can nuke development environments. I'd love to use that but alas.

Problems

Oh the problems. So the runners sometimes get "stuck", which seems to usually happen after someone cancels a job in the web UI. You'll run into an issue, try to cancel a job, fix the problem and rerun the runner only to have it get stuck forever. I've sat there and watched it try to load the modules for 45 minutes. There isn't any way I have seen to tell Terraform Cloud "this runner is broken, go get me another one". Sometimes they get stuck for an unknown reason.

Since you need to make all the plans and applies remotely to get any value out of the service, it can also sometimes cause traffic jams in your org. If you work with Terraform a lot, you know you need to run plans pretty regularly. Since you need to wait for a runner every single time, you can end up wasting a lot of time sitting there waiting for another job to finish. Again I'm not sure what triggers you getting another runner. You can self host, but then I'm truly baffled at what value this tool brings.

Even if that was an option for you and you wanted to do it, its locked behind the highest subscription tier. So I can't even say "add a self-hosted runner just for plans" so I could unstick my team. This seems like an obvious add, along with a lot more runner controls so I could see what was happening and how to avoid getting it jammed up.

Conclusion

I feel bad this is so short, but there just isn't anything else to write. This is a super bare-bones tool that does what it says on the box for a lot of money. It doesn't give you a ton of value over Spacelift or or any of the others. I can't recommend it, it doesn't work particularly well and I haven't enjoyed my time with it. Managing it vs using an S3 bucket is an experience I would describe as "marginally better". It's nice that it handles contention across team mates for me, but so do all the others at a lower price.

I cannot think of a single reason to recommend this over Spacelift, which has better pricing, better tooling and seems to have a better runner system except for the license change. Which was clearly the point of the license change. However for those evaluating options, head elsewhere. This thing isn't worth the money.

We need a different name for non-technical tech conferences

September 06, 2023

I recently returned from Google Cloud Next. Typically I wouldn't go to a vendor conference like this, since they're usually thinly veiled sales meetings wearing the trench-coat of a conference. However I've been to a few GCP events and found them to be technical and well-run, so I rolled the dice and hopped on the 11 hour flight from London to San Francisco.

We all piled into Moscone Center and I was pretty hopeful. There were a lot of engineers from Google and other reputable orgs, the list of talks we had signed up for before showing up sounded good, or at least useful. I figured this could be a good opportunity to get some idea of where GCP was going and perhaps hear about some large customers technical workarounds to known limitations and issues with the platform. Then we got to the keynote.

AI. The only topic discussed and the only thing anybody at the executive level cared about was AI. This would become a theme, a constant refrain among every executive-type I spoke to. AI was going to replace customer service, programmers, marketing, copy writers, seemingly every single person in the company except for the executives. It seemed only the VPs and the janitors were safe. None of the leaders I spoke to afterwards seemed to appreciate my observation that if they spent most of their day in meetings being shown slide decks, wouldn't they be the easiest to replace with a robot? Or maybe their replacement could be a mop with sunglasses leaned against an office chair if no robot was available.

I understand keynotes aren't for engineers, but the sense I got from this was "nothing has happened in GCP anywhere else except for AI". This isn't true, like objectively I know new things have been launched, but it sends a pretty clear message that it's not a priority if nobody at the executive level seems to care about them. This is also a concern because Google famously has institutional ADHD with an inability to maintain long-term focus on slowly incrementing and improving a product. Instead it launches amazing products, years ahead of the competition then, like a child bored with a toy, drops them into the backyard and wanders away. But whatever, let's move on from the keynote.

Over the next few days what I was to experience was an event with some fun moments, mostly devoid of any technical discussion whatsoever. Rarely were talks geared towards technical staff, when technical questions came up during the recorded events they were almost never answered. Most importantly there was no presentation I heard that even remotely touched on long-known missing features of GCP when compared to peers or roadmaps. When I asked technical questions, often Google employees would come up to me after the talk with the answer, which I appreciate. But everyone at home and in the future won't get that experience and miss out on the benefit.

Most talks were the GCP products marketing page turned into slides, with a seemingly mandatory reference to AI in each one. Several presenters joked about "that was my required AI callout", which started funny but as time went on I began to worry...maybe they were actually required to mention AI? There are almost no live demos (pre-recorded which is ok but live is more compelling), zero code shown, mostly a tour of existing things the GCP web console could do along with a few new features. I ended up getting more value from finding the PMs of various products on the floor and subjecting to these poor souls to my many questions.

This isn't just a Google problem. Every engineer I spoke to about this talked about a similar time they got burned going to "not a conference conference". From AWS to Salesforce and Facebook, these organizations pitch people on getting facetime with engineers and concrete answers to questions. Instead they're opportunity to try and pitch you on more products, letting executives feel loved by ensuring they get one-on-one time from senior folks in the parent company. They sound great but mostly it's an opportunity to collect stickers.

We need to stop pretending these types of conferences are technical conferences. They're not. It's an opportunity for non-technical people inside of your organization who interact with your technical SaaS providers to get facetime with employees of that company and ask basic questions in a shame-free environment. That has value and should be something that exists, but you should also make sure engineers don't wander into these things.

Here are the 7 things I think you shouldn't do if you call yourself a tech conference.

7 Deadly Sins of "Tech" Conferences

Discussing internal tools that aren't open source and that I can't see or use. It's great if X corp has worked together with Google to make the perfect solution to a common problem. It doesn't mean shit to me if I can't use it or at least see it and ask questions about it. Don't let it into the slide deck if it has zero value to the community outside of showing that "solving this problem is possible".
Not letting people who work with customers talk about common problems. I know, from talking to Google folks and from lots of talks with other customers, common issues people experience with GCP products. Some are misconfigurations or not understanding what the product is good at and designed to do. If you talk about a service, you need to discuss something about "common pitfalls" or "working around frequently seen issues".
Pretending a sales pitch is a talk. Nothing makes me see red like halfway through a talk, inviting the head of sales onto the stage to pitch me on their product. Jesus Christ, there's a whole section of sales stuff, you gotta leave me alone in the middle of talks.
Not allowing a way for people to get questions into the livestream. Now this isn't true for every conference, but if this is the one time a year people can ask questions of the PM for a major product and see if they intend to fix a problem, let me ask that question. I'll gladly submit it beforehand and let people vote on it, or whatever you want. It can't be a free-for-all but there has to be something.
Skipping all specifics. If you are telling me that X service is going to solve all my problems and you have 45 minutes, don't spend 30 explaining how great it is in the abstract. Show me how it solves those problems in detail. Some of the Google presenters did this and I'm extremely grateful to them, but it should have been standard. I saw the "Google is committed to privacy and safety" generic slides so many times across different presentations that I remembered the stock photo of two women looking at code and started trying to read what she had written. I think it was Javascript.
Blurring the line between presenter and sponsor. Most well-run tech conferences I've been to make it super clear when you are hearing from a sponsor vs when someone is giving an unbiased opinion. A lot of these not-tech tech conferences don't, where it sounds like a Google employee is endorsing a third-party solution who has also sponsored the event. For folks new to this environment, it's misleading. Is Google saying this is the only way they endorse doing x?
Keeping all the real content behind NDAs. Now during Next there were a lot of super useful meetings that happened, but I wasn't in them. I had to learn about them from people at the bar who had signed NDAs and were invited to learn actual information. If you aren't going to talk about roadmap or any technical details or improvements publically, don't bother having the conference. Release a PDF with whatever new sales content you want me to read. The folks who are invited to the real meetings can still go to those. No judgement, you don't want to have those chats publically, but don't pretend you might this year.

One last thing: if you are going to have a big conference with people meeting with your team, figure out some way you want them to communicate with that team. Maybe temporary email addresses or something? Most people won't use them, but it means a lot to people to think they have a way of having some line of communication with the company. If they get weird then just deactivate the temp email. It's weird to tell people "just come find me afterwards". Where?

What are big companies supposed to do?

I understand large companies are loathe to share details unless forced to. I also understand that companies hate letting engineers speak directly to the end users, for fear that the people who make the sausage and the people who consume the sausage might learn something terrible about how its made. That is the cost of holding a tech conference about your products. You have to let these two groups of people interact with each other and ask questions.

Now obviously there are plenty of great conferences based on open-source technology or about more general themes. These tend to be really high quality and I've gone to a ton I love. However there is value, as we all become more and more dependent on cloud providers, to letting me know more about what this platform is moving towards. I need to know what platforms like GCP are working on so I know what is the technology inside the stack on the rise and which are on the decline.

Instead these conferences are for investors and the business community instead of anyone interested in the products. The point of Next was to show the community that Google is serious about AI. Just like the point of the last Google conference was to show investors that Google is serious about AI. I'm confident the next conference on any topic Google has will also be asked to demonstrate their serious committment to AI technology.

You can still have these. Call them something else. Call them "leadership conferences" or "vision conferences". Talk to Marketing and see what words you can slap in there that conveys "you are an important person we want to talk about our products with" that also tells me, a technical peon, that you don't want me there. I'll be overjoyed not to fly 11 hours and you'll be thrilled not to have me asking questions of your engineers. Everybody wins.