
If I Could Make My Own GitHub

My friend and I have a game where we talk about what we'd do if we were rich. Not rich like 'paid off the mortgage' rich. Rich like a man who owns a submarine he's never been inside. Rich like a man whose third wife has a skincare line. Tech-titan rich — the kind of money that buys you a compound in Wyoming and the confidence to wear the same gray t-shirt to congressional testimony.

One of mine, for a long time, has been the dream of making a new forge. I was prompted to write this after reading the good post about Ghostty leaving GitHub, but it's something I've written and talked about for a few years. Given how bad GitHub has become at its core job, it seemed like a fun opportunity to try and write up what my billionaire folly of a forge would look like. This folly would have fewer penile rockets filled with aging celebrities.

What are the problems with modern forges?

GitHub, GitLab and Gitea (those being the 3 I've used the most) are all modeled on effectively the same design. There are differences, but you can tell that GitHub sets the pattern for the industry and then those features are ported over to the other two with varying levels of success. The issue with all of these is that they're designed to bolt on the things you need that git doesn't do.

Git is great at what it is designed to do, but what it is designed to do isn't the way most people are using it. Git is a perfect tool for kernel development. It is a decentralized, distributed version control system that relies on the idea of patches being sent to maintainers over email. You trust those maintainers to maintain their sections and merge in the stuff that makes sense and not merge in the other stuff. It's a pretty high trust environment that places very few restrictions on how online a specific contributor is or what system they are using. If you have a laptop from 2010 that connects to the Internet once a week you can still be a meaningful contributor to a project with these workflows.

However, in most jobs, git is effectively just the way I pull and push from a centralized repository stored in a forge. All the important stuff happens inside the forge, and very little of it happens on my client. Pull Requests are how I enforce the four-eyes principle, GitHub Actions are how I run my tests and linting on those Pull Requests to ensure they are functional and meet my organizational requirements, the user's identity in relation to that forge is how I verify who they are. I track issues with my code through Issues and cut releases for users to download through Releases. There's not a lot of git in this workflow; most of it is layered on top of git.

So here are the primary issues I see with modern forges that I would love to solve.

  1. Stuff happens in the wrong order. You know the PR. Commit 1: 'Feature.' Commit 2: 'fix.' Commit 3: 'fix.' Commit 4: 'actually fix.' Commit 5: 'please.' Commit 6, made at 11:47 PM on a Thursday: 'asdfasdf'. This person has a family. This person has hobbies. This person is, at this moment, crying. You don't want the feedback loop after the commit; you want it before. Let me do an enforced pre-commit hook to run the jobs remotely on the forge and provide the feedback to the user before they push.
  2. PR approval is too boolean. The PR is approved or it's not approved. Real code review, like real life, lives in the middle. 'Sure, fine, we'll deal with it later' is a legitimate human response and should be a legitimate button. Gerrit has a better model for this. If I weakly approve something as a maintainer, let me flag it for later.
  3. PRs are too inflexible. I don't need 4 eyes on every change, especially in a universe where LLMs exist. The global GDP lost annually to senior engineers staring at a four-line PR waiting for someone — anyone — to type 'LGTM' could fund a moon mission. A nice one. With legroom. Let me customize and more easily control this. If the person is a maintainer and the LLM says it's low-risk/no-risk, just let them go.
  4. Stacked PRs are just better. They're easier to review and understand. They have to be first-class citizens, not an add-on through a tool other than your VCS.
  5. A forge shouldn't do everything. Issue tracking, yes. Kanban board, probably not. Wiki? I doubt it. Everything tools always turn into crap. You add features when it's easy to add features and then pay the maintenance price for those features forever regardless of their rate of adoption, because now someone, somewhere uses them and you are locked in.
  6. The standard unit of hosting is too large. Running GitHub Enterprise is a big task. Running GitLab is also a relatively big ask. These are complicated products with a lot of moving pieces. I want smaller individual units of hosting that I can link together to make an organization. It's fine if they're not globally federated and I need to make an account for each Organization, but an Organization should be flexible enough to let me say "these 12 Raspberry Pis are my org". I don't know how they communicate securely; I hire people for that problem.
  7. My local copy of the repo should be a representation of the entire repo, not just the code. I should be able to approve a PR from the same VCS I use to check in the code. I should be able to go through my issues by looking through local files.
  8. On the flip side, since I need to be online all the time to really work with a team, don't make me pay the storage price all the time. I want my VCS and my forge to work together. If I clone a repo, I want a pretty limited history for that repo when I clone. If I start to go back in time, spin up a worker to go fetch that stuff from the VCS when I need it (a sketch of what git can already do along these lines follows this list). I don't need to hammer the forge with giant clone requests on the assumption that I might need to rebuild the forge at any moment with the entire history of the entire project.
  9. Actions need to be signed, SHA'd and usable offline. If I want, I should be able to get tarballs of all my actions, stick them in the repo and tell my system "don't go anywhere for the checkout action, that's right here". If I say "latest", have that work like Dependabot does now where it opens up a PR to put the latest tarballs in my repo. Actions are critical and they should be runnable on my local machine through the same VCS if I want to.
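
To make item 8 concrete, git can already do a rough version of this today with shallow and partial clones, even though no forge treats it as the default contract between client and server. A sketch, with a placeholder repository URL:

# Grab only the latest commit; older history stays on the server.
git clone --depth=1 https://forge.example.com/org/repo.git

# Or keep the full commit graph but skip file contents (blobs),
# fetching them lazily from the remote only when a checkout needs them.
git clone --filter=blob:none https://forge.example.com/org/repo.git

# Going back in time later pulls more history on demand.
git fetch --deepen=100

The forge I'm imagining would make this lazy behavior the default and design its storage around it, instead of leaving it as an opt-in flag most people never learn exists.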

Well Y Does Some Of That

Absolutely. There are a lot of tools that do parts of this. I want someone to take them, put them all together and make them fit. I want JJ as the VCS, I want this as the forge and I want the expectation that I, as a user, could live happily with a Raspberry Pi as a forge for a long time. I want those forges designed around modern realities like object storage, shallow clones and being constantly hammered by LLM bots.

Now in a universe where GitHub was doing a good job, I wouldn't even bother writing this up. GitHub is the default and talking to people about overcoming the default is usually a waste of time. Heinz is the default ketchup, when I order a Coke I don't want a Pepsi, and if I'm going to use a forge in 2026 there would have to be an amazing reason for me not to choose the one that everyone uses. Up until recently other forges have been like sweet potato french fries, which is to say never the thing you actually want.

But we live in a world where the monolithic forge is breaking down and nobody has built the replacement. The people with the money are busy with the rockets. The people with the taste are busy with their day jobs. And the rest of us are opening PRs titled 'asdfasdf' at midnight, waiting for a robot to check them, wondering when the tool we spend our whole working lives inside stopped being built for us.

If I ever get the submarine money, I'll let you know.


You can absolutely have an RSS-dependent website in 2026

I write stuff here. Sometimes the stuff is good. Sometimes it reads like I wrote it at 2 AM after an argument with a YAML file, which is because I did. But one decision I made early on was that I didn't want to offer an email newsletter.

Part of this was simple economics. At one point I did have a Subscribe button up, and enough people clicked it that the cost of actually sending those emails started to resemble a real bill. Sending thousands of emails when you have no ads, no sponsors, and no monetization strategy beyond "I guess people will just... read it?" doesn't make a lot of financial sense.

But the bigger reason — the one I actually care about — is that I didn't want a database full of email addresses sitting under my control if I could possibly avoid it. There's a particular flavor of anxiety that comes with being the custodian of other people's personal data, a low-grade dread not unlike realizing you've been entrusted with someone's elderly cat for two weeks and the cat has a medical condition. I can't lose data I don't have. I never need to lie awake wondering whether some user is reusing their bank password to log into my website just to manage their subscription preferences. The best way I can safeguard user data is by never having any in the first place. It's not a security strategy you'll find in any textbook, but it is airtight.

Now, when I explained this philosophy to people who run similar websites, the reaction was — and I'm being generous here — warm laughter. The kind of laughter you get when you ask if an apartment in Copenhagen is under $1,000,000. Email newsletters are the only way to run a site like this, they said. RSS is dead, they said. You might as well be distributing your writing via carrier pigeon or community bulletin board. One person looked at me the way you'd look at someone who just announced they were going to navigate cross-country using only a paper atlas. Not angry. Just sad.

I'm lucky in that I'm not trying to get anyone to pay me to come here. If I were, the math would probably change. I'd be out there A/B testing subject lines and agonizing over open rates like everyone else, slowly losing pieces of my soul in a spreadsheet. But if your question is simply, "Can I make a hobbyist website that actual humans will find and read without an email newsletter?" — the answer is a resounding yes. And I have the logs to prove it.

Stats

All of this is from Nginx access.log.

===========================================
 Traffic Source Breakdown (Current Log)
===========================================

Total Filtered Requests:  49089

  RSS/Feed Traffic:            24750  (50.42%)
  Homepage Traffic:             1534  (3.12%)
  Other Traffic:               22805  (46.46%)

These logs get rotated daily and don't include the majority of requests that hit the Cloudflare cache before they ever reach my server, so the real numbers are higher. But I think they're reasonably representative of the overall shape of things. About half my traffic is readers hitting /feed or /rss — people who have, of their own free will, pointed an RSS reader at my site and said yes, tell me when this person has opinions again. The other half are arriving via a specific link they stumbled across somewhere in the wild.
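
If you want to reproduce a breakdown like this, it only takes a few lines against the access log. A rough sketch, assuming the default nginx combined log format and that your feed lives under /feed or /rss (adjust the paths and log location for your setup):

LOG=/var/log/nginx/access.log
TOTAL=$(wc -l < "$LOG")
# Requests for the feed endpoints, i.e. RSS readers polling.
FEED=$(grep -cE '"GET /(feed|rss)[/ ?"]' "$LOG")
# Requests for the homepage itself.
HOMEPAGE=$(grep -cE '"GET / HTTP' "$LOG")
echo "total: $TOTAL  feed: $FEED  homepage: $HOMEPAGE  other: $((TOTAL - FEED - HOMEPAGE))"

The percentages are then just each bucket divided by the total.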

If we do a deeper dive into that specific RSS traffic, we learn a few interesting things.

The user-agent breakdown shows the usual suspects — the RSS readers you'd expect, the ones that have been around long enough to have their own Wikipedia articles. There are also some abusers in the metrics. I have no idea what "Daily-AI-Morning" is, but whatever it's doing, it's polling my feed with the frantic energy of someone refreshing a package tracking page on delivery day.

The time distribution, though, is pretty good — spread out across the day in a way that suggests real humans checking their feeds at real human intervals, rather than a single bot hammering me every thirty seconds.

My conclusion is this: if you want to run a website that relies primarily on RSS instead of email newsletters, you absolutely can. The list of RSS readers hasn't dramatically changed in a long time, which is actually reassuring — it means the ecosystem is stable, not dead. The people who use RSS really use RSS. They're not trend-chasers. They're the type who still have a working bookmark toolbar. They are, in the best possible sense, your people.

Effectively, if you make your site RSS-friendly and you test it in NetNewsWire, you will — slowly, quietly, without a single "SUBSCRIBE FOR MORE" pop-up — build a real audience of people who actually want to read what you write. No email database required. No passwords to leak. No giant confusing subscription system.


I Can't See Apple's Vision

Companies, as they grow to become multi-billion-dollar entities, somehow lose their vision. They insert lots of layers of middle management between the people running the company and the people doing the work. They no longer have an inherent feel or a passion about the products. The creative people, who are the ones who care passionately, have to persuade five layers of management to do what they know is the right thing to do. - Steve Jobs

I don't typically write about Apple stuff. It's the most written-about company on earth. Every product launch gets the kind of forensic scrutiny normally reserved for plane crashes and celebrity divorces.

Mostly though, I feel like a line cook at a Denny's talking trash about whether the French Laundry has lost their way. I'm back here microwaving a Grand Slam and opining about Thomas Keller's sauce work. The engineers I know personally at Apple are, on average, much more talented than me. They work harder, they do it for decades without a break, and none of them have ever shipped a feature while still wearing pajama pants at 2 PM. It seems insane for someone of my mediocre talent to critique them.

It also feels a little dog-pile-y. Apple employees know Tahoe sucks. They know it the way you know your haircut is bad — they don't need strangers on the internet confirming it. And to be fair, there's genuinely great work buried inside Tahoe: the clipboard manager, the automation APIs, a much-improved Spotlight. But visually it's gross, and that matters when your entire brand identity is "we're the ones who care about design."

Instead, I want to talk about a bigger problem, and one that I do feel qualified to talk about because I am very guilty of committing this sin myself: I don't see a cohesive vision for macOS and watchOS. That, more than any one bad release, seems far worse to me and dangerous for the company. Since this is already 2,000 words as a draft, I'll save watchOS for another time. I'm verbose but even I have limits.

Now, to be clear, this isn't across every product. iPadOS has a strong vision, and its team has the strength of its convictions to change approaches. The different stabs at solving the windowing problem on the iPad, keeping it an iPad experience while still letting you do multiple things at the same time, are proof of that. iOS has an incredibly strong vision for what the product is and isn't and how the software works with that.

VisionOS and tvOS are less strong, but visionOS is still finding its footing in a brand new world. The Apple TV hardware and software is in a weirdly good position even though nothing has changed about it in what feels like geological time. I've purchased every version of the Apple TV, and with the exception of that black glass remote — the one that felt like it was designed by someone who had never held a remote, or possibly a physical object — everything has been pretty good. I'm still not clear how storage works on the Apple TV and I don't think anybody outside of Apple does either. I'm not even sure Apple knows. But somehow it's fine.

But with watchOS and macOS we have two software stacks that seem to be letting down the great hardware they are installed on. They seem to be evolving in random directions with no clear end goal in mind. I used to be able to see what OS X was aiming for, even if it didn't hit that goal. Now, with two of Apple's platforms, I'm not able to see anything except a desire to come up with something to show as this year's release.

OS X Has An Opinion

When I got my first Mac — an iBook G3 — the experience was like test-driving a Ferrari that someone had fitted with a lawnmower engine. You'd click on the hard drive icon and wait. And wait. And in those few seconds of waiting, you'd think: man, this would be incredible if the hardware could keep up. The software had somewhere it wanted to go. The hardware just couldn't get it there yet.

This trend continued for a long time on OS X, where you'd see Apple really pushing the absolute limits of what it could get away with. After the rock-solid stability of 10.4, Apple took a lot of swings with 10.5, and they didn't all land. The first time you opened the Time Machine UI and the entire thing crawled to a near-crash, you'd think, boy, maybe this wasn't quite ready for prime time. But in all that time there was never really a question that there was a vision for what this was supposed to look like.

10.5 Time Machine

The progression of OS X from the beta onward was this:

  • It's Unix, but you never need to know that. All the power, none of the beard. You get the stability of a server OS without ever having to type sudo into anything.
  • Everything annoying is abstracted away. Drivers? Gone. "Installing" an application? You drag it into a folder. That's it. That's the install. It felt like the computer was meeting you more than halfway — it was practically doing your job for you and then apologizing for not doing it sooner.
  • If it seems like it should work, it works. Double-click a PDF, it opens. Put in a DVD, it plays. Drag an app to the Applications folder and it becomes an application. This sounds obvious now, but in 2003 this was like witchcraft if you were coming from Windows.
  • But it was also serious. It wasn't cluttered with stupid bullshit. It was designed for people who made things — with real font management, color calibration, the works. The OS tried to stay out of your way. Your content was the show; everything else was stagecraft.

OS X tried to accommodate you, not the other way around. When you look at these screenshots I'm always surprised how light the touch is. There isn't a lot of OS here to the user. Almost everything is happening behind the scenes and the stuff you do see is pretty obvious.

The first time I thought "oh man, they've lost the thread" was Notifications. On iOS, Notifications make sense — you've got apps buried in folders three screens deep, so a unified system for surfacing what's happening is genuinely useful. On macOS, this design makes absolutely no sense at all. You can see your applications. They're right there. In the Dock. Which is also right there.

This was the beginning of the feeling of "we aren't sure what we're doing here with the Mac anymore". iOS users like Notifications, so maybe you dorks will too? It consumes a huge amount of screen real estate, and it was never (and still isn't) clear what should and shouldn't be a notification. Even opening up mine right now, it's filled with garbage that doesn't make sense to notify me about. A thing has completed running the thing that I asked it to run? Why would I need to know that?

There is also already a clear way to communicate this information to me. The application icon adds an exclamation point or bounces up and down in the dock. With Notifications you end up with just garbage noise taking up your screen for no reason. Maybe worse, it's not even garbage designed with the Mac in mind. It's just like random crap nobody cares about that looks exactly like iOS Notifications.

The issue with copying everything from iOS is that it's like copying someone's homework — except they go to a different school, in a different country, studying a different subject. It's not just wrong in the way where you tried and failed. It's wrong in a way that makes everyone who encounters it deeply uncomfortable. The teacher doesn't even know where to begin. They just stare at it.

For years afterwards it seemed like the purpose of macOS was just to port iOS features to the Mac years after their launch on iOS. Often these didn't make much sense or hadn't had a lot of effort expended in making them very Mac-y. Like there was clearly a favorite child with iOS, then a sassy middle child with iPadOS and then, like a 1980s sitcom where there was a contract dispute, "another child" you saw every 5th episode run down the stairs in the background with no lines. I'd be at home shouting at my TV, "I knew they didn't kill you off, macOS!"

Pour one out for Apple's most hated product. RIP bud.

Now with Tahoe there's clearly some sort of struggle happening inside of the team. And here's what's maddening — buried inside this visual catastrophe, someone at Apple is doing incredible work. Clipboard management has been table stakes in the third-party ecosystem for years. Apple finally added a version that handles 90% of use cases. It's classic Sherlocking: Apple shows up ten years late to the party, brings a decent bottle of wine, and somehow half the guests leave with them.

Same with Spotlight. Spotlight hasn't gotten a ton of love in years, and suddenly it's really competing with third-party tools. If you're searching for a file, you can filter based on where the file is stored: type the name of a directory, press the Tab key, and then type the name of the file before pressing Enter. This is great! We finally have keyword search for stuff like kind:reminder. Application shortcuts for opening stuff, like ff for Firefox, are nice. Assign a quick key like “se” to Send Email, type it in Spotlight, hit Enter, and compose your message.

This is all classic Apple thinking, which is "how can we make the Mac as good as possible so that you, the user, don't need to download any third-party applications to get a nice experience". You don't need to go get a word processor; you have a word processor and a spreadsheet application and presentation software and a PDF viewer and a clipboard manager and a system launcher and automation APIs, etc. etc. etc. This is a vision that is consistent throughout the entire system's history: how can we help you do the things you need to do more easily?

But the reason I'm stressed, as someone who is pretty invested in the ecosystem, is that the visual stuff is so bad. Not just bad, but negligent: "we didn't test how it was going to look in a bunch of situations, so that's now someone else's problem." Whenever I get a Finder sidebar covering folder contents and have to resize the window every time, or the Dock freaks out and refuses to come back out, it feels like I installed one of those OS X skins for a Linux distro. I buy Apple stuff because it's nice to look at, and this is horrible to look at.

Why is this so big? Why did you cut off the word "Finder" from Force Quit? Everywhere you look there are a million of these papercuts. We have laptop screens with a resolution that would have made people collapse in 2005; why must we waste all of it on UI elements? Also, you can't grab window edges, as shown by the best post ever written here: https://noheger.at/blog/2026/01/11/the-struggle-of-resizing-windows-on-macos-tahoe/

Why is there so much empty space between everything? Why are there six ways to do literally everything? Why did we copy the concept of Control Center from iOS at all if there's very little limit on screen real estate and we could already do this from the menu bar? So we're going to keep the Mac menu bar but we're going to add a full iPad control system and then we're going to use the iPad control system to manage the menu bar.

I will say the "Start Screen Saver" makes me laugh because its a mistake I would make in CSS. The text is too long so the button is giant but we didn't resize the icon so it looks crazy. Now do we need the same text inside the button as outside of it? No, and that leads me to the other banger. It's pretty clear the two white boxes inside of "Scene or Accessory" were supposed to be text, Scene on the top and then Accessory on the bottom, but SwiftUI couldn't do that so they left the placeholder. Somewhere there is a Jira ticket to come back to this that got trashed.

Also, complete aside. Has anyone in the entire fucking world ever run Shazam from a Mac? What scenario are we designing for here? I hear a banger at the coffee shop so I hold my MacBook Pro up over my head like John Cusack in Say Anything, hoping it catches enough audio before my arms give out? "Recognize Music" is in my menu bar, taking up space that could be used for literally anything else, on the off chance I need to identify a song using a device that weighs four pounds and has no microphone worth using in a noisy room. If you are going to copy iPadOS's homework, you need to think about it for 30 seconds.


So my hope is that the improvement camp wins. That the people who built the better Spotlight and the clipboard manager and the automation APIs are the ones who get to set the direction. Because right now it feels like the best work on macOS is being done in spite of the overall vision, not because of it. Like someone's sneaking vegetables into a toddler's mac and cheese. The good stuff is in there — you just have to eat around a lot of neon orange nonsense to find it.

Steve Jobs talked about creative people having to persuade five layers of management to do what they know is right. I don't know how many layers there are now. But I know what it looks like when the creative people are losing that argument, and I know what it looks like when they're winning it. Right now, on macOS, it looks like both are happening at the same time, in the same release, on the same screen. And that's scarier than any one bad design choice.


Hosting a Snowflake Proxy

In the nightmarish world of 2026 it can be difficult to know how to help at all. There are too many horrors happening too quickly to know where one can inject even a small amount of assistance. However, I wanted to quickly post about something I did that was easy, low-impact and hopefully helps a tiny fraction of a fraction of a percent of people.

Snowflake

So I was browsing Mastodon when someone posted a link asking for people to host Snowflake proxies. Snowflake is a lightweight proxy best explained by David Fifield below.

So, in summary, Snowflake is a censorship circumvention system, and what that means is, it's a way of enabling network communication between two endpoints despite the presence of some adversary in the middle, a censor in the middle, who's interfering with the communication. Now, that's kind of an abstract, scientifically useful definition of censorship, but this model is, of course, motivated by real-world considerations, actual censorship people encounter in practice. It's security and privacy, but it's also tied up with human rights and freedom of expression, and that's why we do this work. There are a lot of networks in the world—I won't belabor the point—but there are a lot of networks where, you want to read some news, you want to use some app, you want to participate in some discussion group, and you can't. Or you cannot easily, because there's a censor preventing you from doing so. And to give you an idea of the types of things we see in practice, a censor can do stuff like block IP addresses, it can inject RST packets to tear down TCP connections, it can give you false answers to DNS queries, and these are all very commonly seen in practice.
So there are a lot of different circumvention systems, using a variety of different techniques, what is Snowflake's angle? In a nutshell, Snowflake uses a large network of very lightweight, temporary proxies, which we call snowflakes, and they communicate using WebRTC protocols. So when I say "temporary proxies," what I mean by that is that these proxies are allowed to appear and disappear at any time. So the pool is, kind of, constantly changing, and you don't depend on these proxies to be reliable. And WebRTC is a suite of protocols that are often used for real-time communication on the web. So: audio, video, text chat, online games, a lot of these things use WebRTC.
Now we're equipped to answer the following two questions. And if you are accustomed, if you're used to censorship research, the answers to these two questions will tell you most of what you need to know to understand what Snowflake is doing. And you'll also understand why these are the two critical questions to ask. If you're not so familiar with this research field, I hope to give you a little bit of familiarity with why these are important questions through the course of this talk. The first question is: How does Snowflake resist address-based blocking? Well, the answer there is the pool of temporary proxies. It's large, and by "large" you should think, about 100 thousand, and it's not always the same 100 thousand, which is important. Making proxies very easy to run is part of achieving this large proxy pool. The second question is: How does Snowflake resist content-based blocking? Well, that's WebRTC. Rather than transmit client traffic in the clear, we wrap it in an encrypted WebRTC container.
Now, our team started the Snowflake project in order to innovate in the circumvention space, to explore a different combination of parameters and see how well it works. It turns out, it works quite well. But this is more than a research prototype. This has been in serious deployment for three or more years. We serve actual users on an ongoing basis. We have to care about operations, and things like that. It is a built-in circumvention option in Tor Browser, so in Tor Browser you can just choose "Snowflake" from the menu and you'll be using Snowflake. And at any given time we're serving an average of a few tens of thousands of users.

Effectively it is a lightweight, easy-to-run way to bypass censorship that doesn't require running a VPN and involves almost zero technical knowledge. It's quite the design, and one that kept me shaking my head thinking "man, I never would have thought of this in a million years" as I read more about how it works.

So I have a box sitting on an internet connection where I'm lucky enough to have plenty of excess capacity. I figured "why not share it". I thought I'd post the process here in case people were curious but were worried about how much bandwidth or how many resources it might use.

Running Snowflake

Setting it up on a Debian box took like 5 minutes.

  1. Get the package from here: https://packages.debian.org/sid/snowflake-proxy with: sudo apt install snowflake-proxy
  2. Make sure it is enabled and running
● snowflake-proxy.service - snowflake-proxy
     Loaded: loaded (/usr/lib/systemd/system/snowflake-proxy.service; enabled; preset: enabled)
     Active: active (running) since Thu 2026-03-05 14:40:10 UTC; 2 weeks 4 days ago
 Invocation: 06bacb9a1d164c73a02eaf1873d483d2
       Docs: man:snowflake-proxy
             https://snowflake.torproject.org/
   Main PID: 386999 (snowflake-proxy)
      Tasks: 8 (limit: 4459)
     Memory: 715.6M (peak: 817.3M)
        CPU: 8h 32min 11.845s
     CGroup: /system.slice/snowflake-proxy.service
             └─386999 /usr/bin/snowflake-proxy

That's it.

So this has been running for two weeks and in that two weeks I've served up the following amount of traffic:

Upload: 91.81 GB
Download: 7.87 GB
Total: 99.68 GB

CPU usage is quite low. Memory is slightly higher than I would have thought, but that's likely a function of running for so long. Remember, you can modify the systemd service file to limit memory if you are interested in running this yourself but are concerned about crossing a gig of memory.

[Service]
MemoryMax=512M
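
If you do want that cap, the cleaner way to apply it is a drop-in override rather than editing the packaged unit file directly, so the change survives package upgrades. Something like:

# Opens an editor and saves the snippet above as an override under
# /etc/systemd/system/snowflake-proxy.service.d/
sudo systemctl edit snowflake-proxy
# Restart so the new memory limit takes effect.
sudo systemctl restart snowflake-proxy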

All in all I haven't noticed I'm running this at all. Obviously it's great to run the browser extension to increase the pool of IP addresses and keep them from becoming static and blockable, but if you have a dedicated box with a large amount of bandwidth and are looking for a quick 20-minute project to help out people trying to deal with internet censorship, this seems like a good one to me.


Markdown Ate The World

I have always enjoyed the act of typing words and seeing them come up on screen. While my favorite word processor of all time might be WordPerfect (here), I've used almost all of them. These programs were what sold me on the entire value proposition of computers. They were like typewriters, which I had used in school, except easier in every single way. You could delete things. You could move paragraphs around. It felt like cheating, and I loved it.

As time has gone on, what makes up a "document" in word processing has increased in complexity. This grew as word processors moved on from being proxies for typewriters into something closer to a publishing suite. In the beginning, programs like WordPerfect, WordStar, MultiMate, etc. had flat binary files with proprietary formatting codes embedded in them.

When word processors were just proxies for typewriters, this made a lot of sense. But as Microsoft Word took off in popularity and quickly established itself as the dominant word processor, we saw the rise of the .doc file format. This was an exponential increase in complexity from what came before, which made sense because suddenly word processors were becoming "everything tools" — not just typing, but layout, images, revision tracking, embedded objects, and whatever else Microsoft could cram in there.

The .doc: A Filesystem Inside Your File

At its base the .doc is a Compound File Binary Format, which is effectively just a FAT file system with the file broken into sectors that are chained together with a File Allocation Table.

It's an interesting design. A normal file system would end up with sort of a mess of files to try and contain everything that the .doc has, but if you store all of that inside a simplified file system contained within one file, you can optimize for performance and reduce the overhead that comes with storing separate objects in a flat file. It also optimizes writes, because you don't need to rewrite the entire file when you add an object, and it makes keeping revision history simple. But from the user's perspective, they're "just" dealing with a single file. (Reference)

The .doc exploded and quickly became the default file format for humanity's written output. School papers, office memos, résumés, the Great American Novel your uncle was definitely going to finish — all .doc files. But there was a problem with these files.

They would become corrupted all of the goddamn time.

Remember, these were critical documents traveling from spinning rust drives on machines that crashed constantly compared to modern computers, often copied to floppy disks or later to cheap thumb drives you got from random vendor giveaways at conferences, and then carried to other computers in backpacks and coat pockets. The entire workflow had the structural integrity of a sandwich bag full of soup.

Your hard drive filesystem -> your .doc file (which can get corrupted as a file) -> contains a mini filesystem (which can ALSO get corrupted internally) -> which manages your actual document content

So when Word was saving your critical file, it was actually doing a bunch of different operations. It was:

  • Updating the document stream (your text)
  • Updating the formatting tables
  • Updating the sector allocation tables
  • Updating the directory entries
  • Updating the summary information
  • Flushing everything to disk

These weren't atomic operations, so in an era when computers constantly crashed or had problems it was super easy to end up in a situation where some structures were updated and others weren't. Compare that to a .txt file, where you would either get the old version or a truncated new version. You might lose content, but you almost never ended up with an unreadable file. With .doc, as someone doing helpdesk IT, I constantly dealt with people who had just corrupted, unreadable files.

And here's the part that really twisted the knife: the longer you worked on the same file, the more important that file likely was. But Word didn't clean up after itself. As a .doc accumulated images, tracked changes, and revision history, the internal structure grew more complex and the file got larger. But even when you deleted content from the document, the data wasn't actually removed from the file. It was marked as free space internally but left sitting there, like furniture you moved to the curb that nobody ever picked up.

The file bloated. The internal fragmentation worsened. And the probability of corruption increased in direct proportion to how much you cared about the contents.

Users had to be trained both to save the file often (as AutoRecover wasn't reliable enough) and to periodically "Save As" a new file to force Word to write a clean version from scratch. This was the digital equivalent of being told that your car works fine, you just need to rebuild the engine every 500 miles as routine maintenance.

The end result was that Microsoft Word quickly developed a reputation among technical people as horrible to work with. Not because it was a bad word processor — it was actually quite good at the word processing part — but because when a user showed up at the Help Desk with tears in their eyes, the tools I had to help them were mostly useless.

I could scan the raw file for text patterns, which often pulled out the content, but without formatting it wasn't really a recovered file — it was more like finding your belongings scattered across a field after a tornado. Technically your stuff, but not in any useful arrangement. Sometimes you could rebuild the FAT or try alternative directory entries to recover slightly older versions. But in general, if the .doc encountered a structural error, the thing was toast and your work was gone forever.

This led to a never-ending series of helpdesk sessions where I had to explain to people that yes, I understood they had worked on this file for months, but it was gone and nobody could help them. I became a grief counselor who happened to know about filesystems. Thankfully, people quickly learned to obsessively copy their files to multiple locations with different names — thesis_final.doc, thesis_final_v2.doc, thesis_FINAL_FINAL_REAL.doc — but this required getting burned at least once, which is sort of like saying you learned your car's brakes didn't work by driving into a bus.

Enter the XML Revolution

So around 2007 we see the shift from .doc to .docx, which incorporates a lot of hard lessons learned from the problems of .doc. First, it's just a bundle, specifically a ZIP archive.

my_document.docx (renamed to .zip)
├── [Content_Types].xml
├── _rels/
│   └── .rels
├── word/
│   ├── document.xml        ← the actual text content
│   ├── styles.xml          ← formatting/styles
│   ├── fontTable.xml
│   ├── settings.xml
│   └── media/
│       ├── image1.png      ← embedded images
│       └── image2.jpg
└── docProps/
    ├── app.xml
    └── core.xml            ← metadata

Now in theory, this is great. Your content is human-readable XML. Your images are just image files. If something goes wrong, you can rename the file to .zip, extract it, and at least recover your text by opening document.xml in Notepad. The days of staring at an opaque binary blob and praying were supposed to be over.
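
You don't even need to rename anything; any unzip tool can read the container directly. A crude sketch of the "just give me my words back" move (my_document.docx is a made-up filename, and stripping every tag this way throws away all structure):

# Pull the main document part out of the ZIP container and strip the
# WordprocessingML tags, leaving the raw text behind.
unzip -p my_document.docx word/document.xml | sed -e 's/<[^>]*>/ /g'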

However, in practice, something terrible happened. Microsoft somehow managed to produce the worst XML to ever exist in human history.

Let me lay down the scope of this complexity, because I have never seen anything like it in my life.

Here is the standards website for ECMA-376. Now you know you are in trouble when you see a 4 part download that looks like the following:

  • Part 1 “Fundamentals And Markup Language Reference”, 5th edition, December 2016
  • Part 2 “Open Packaging Conventions”, 5th edition, December 2021
  • Part 3 “Markup Compatibility and Extensibility”, 5th edition, December 2015
  • Part 4 “Transitional Migration Features”, 5th edition, December 2016

If you download Part 1, you are given the following:

Now if you open that PDF, get ready for it. It's a 5,039-page PDF.

I have never conceived of something this complicated. It's also functionally unreadable, and I say this as someone who has, on multiple occasions in his life, read a car repair manual cover to cover because I didn't have anything else to do. I once read the Haynes manual for a 1994 Honda Civic like it was a beach novel. This is not that. This is what happens when a standards committee gets a catering budget and no deadline.

Some light reading before bed

There was an accusation at the time that Microsoft was making OOXML deliberately more complicated than it needed to be — that the goal was to claim it was an "open standard" while making the standard so incomprehensibly vast that it would take a heroic effort for anyone else to implement it. I think this is unquestionably true. LibreOffice has a great blog post on it that includes this striking comparison:

The difference in complexity between the document.xml and content.xml files is striking when you compare their lengths: the content.xml file has 6,802 lines, while the document.xml file has 60,245 lines, compared to a text document of 5,566 lines.

So the ODF format results in an exponentially less complicated XML file than the OOXML format. Either you could do the incredible amount of work to become compatible with this nightmarish specification, or you could effectively find yourself cut out of the entire word processing ecosystem.

Now without question this was done by Microsoft in order to have their cake and eat it too. They would be able to tell regulators and customers that this wasn't a proprietary format and that nobody was locked into the Microsoft Office ecosystem for the production of documents, which had started to become a concern among non-US governments realizing that all of their documents and records were effectively locked into Microsoft. However, the somewhat ironic thing is that it ended up not mattering that much, because soon the only desktop application that would matter was the browser.

Rise of Markdown

The file formats of word processors were their own problem, but more fundamentally, the nature of how people consumed content was changing. Desktop-based applications became less and less important post-2010, and users got increasingly frustrated with the incredibly clunky way of working with Microsoft Word and traditional files: endlessly emailing them back and forth or fighting with file shares.

So while .docx was a superior format from the perspective of "opening the file and it becoming corrupted", it also was fundamentally incompatible with the smartphone era. Even though you could open these files, soon the expectation was that whatever content you wanted people to consume should be viewable through a browser.

As "working for a software company" went from being a niche profession to being something that seemingly everyone you met did, the defacto platform for issues, tracking progress, discussions, etc moved to GitHub. This was where I (and many others) first encountered Markdown and started using it on a regular basis.

John Gruber, co-creator of Markdown, has a great breakdown of "standard" Markdown and then there are specific flavors that have branched off over time. You can see that here. The important part though is: it lets you very quickly generate webpages that work on every browser on the planet with almost no memorization and (for the most part) the same thing works in GitHub, on Slack, in Confluence, etc. You no longer had to ponder whether the person you were sending to had the right license to see the thing you were writing in the correct format.
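
For anyone who has somehow never seen it, the pitch really is "almost no memorization". A few characters of punctuation cover most of what most people ever write, something like:

# A heading starts with a hash
## A smaller heading starts with two

Paragraphs are just text. *Italics*, **bold** and `inline code` use
characters already sitting under your fingers.

- Bulleted lists start with a dash
- [Links look like this](https://example.com)

That's close to the entire learning curve for the version of Markdown most people actually write.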

This combined with the rise of Google Workspace with Google Docs, Slides, etc meant your technical staff were having conversations through Markdown pages and your less technical staff were operating entirely in the cloud. Google was better than Microsoft at the sort of stuff Word had always been used for, which is tracking revisions, handling feedback, sharing securely, etc. It had a small subset of the total features but as we all learned, nobody knew about the more advanced features of Word anyway.

By 2015 the writing was on the wall. Companies stopped giving me an Office license by default, switching them to "you can request a license". This, to anyone who has ever worked for a large company, is the kiss of death. If I cannot be certain that you can successfully open the file I'm working on, there is absolutely no point in writing it inside of that platform. Combine that with the corporate death of email and replacing it with Slack/Teams, the entire workflow died without a lot of fanfare.

Then with the rise of LLMs and their use (perhaps overuse) of Markdown, we've reached peak .md. Markdown is the format of our help docs, many of our websites are generated exclusively from Markdown. It's now the most common format that I write anything in. This was originally written in Markdown inside of Vim.

Why It Won

There are a lot of reasons why I think Markdown ended up winning, in no small part because it solved a real problem in an easy-to-understand way. Writing HTML is miserable and overkill for most tasks; Markdown removed the need to do that, and your output was consumable in a universal and highly performant way that required nothing of your users except access to a web browser.

But I also think it demonstrates an interesting lesson about formats. .doc and .docx along with ODF are pretty highly specialized things designed to handle the complexity of what modern word processing can do. LibreOffice lets you do some pretty incredible things that cover a huge range of possible needs.

Markdown doesn't do most of what those formats do. You can't set margins. You can't do columns. You can't embed a pivot table or track changes or add a watermark that says DRAFT across every page in 45-degree gray Calibri. Markdown doesn't even have a native way to change the font color.

And none of that mattered, because it turns out most writing isn't about any of those things. Most writing is about getting words down in a structure that makes sense, and then getting those words in front of other people. Markdown does that with less friction than anything else ever created. You can learn it in ten minutes, write it in any text editor on any device, read the source file without rendering it, diff it in version control, and convert it to virtually any output format.
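
The "convert it to virtually any output format" part isn't hand-waving either. With a converter like pandoc installed (one option among several; post.md and the output filenames here are made up), a single command turns a Markdown file into a standalone HTML page, and the same source converts just as happily to other formats:

# Markdown in, standalone HTML page out.
pandoc post.md -s -o post.html
# Or, if you need to hand someone the old world back:
pandoc post.md -o post.docx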

The files are plain text. They will outlive every application that currently renders them. They don't belong to any company. They can't become corrupted in any meaningful way — the worst thing that can happen to a Markdown file is you lose some characters, and even then the rest of the file is fine. After decades of nursing .doc files like they were delicate flowers that you had to transport home strapped to your car roof, the idea of a format that simply cannot structurally fail is not just convenient. It's a kind of liberation.

I think about this sometimes when I'm writing in Vim at midnight, just me and a blinking cursor and a plain text file that will still be readable when I'm dead. No filesystem-within-a-filesystem. No sector allocation tables. No 5,039-page specification. Just words, a few hash marks, and never having to think about it again.


Update to the Ghost theme that powers this site

I added a few modifications to the OSS Ghost theme that powers this site. You can get it here: https://gitlab.com/matdevdug/minimal-ghost-theme

  • Added better image caption support.
  • Added the cool Mastodon feature outlined here to attribute posts from your site back to your Mastodon username by following the instructions here.

I tried to make it pretty easy to customize, but if you need something changed feel free to open an issue on the repo. Thanks for all the feedback!


Boy I was wrong about the Fediverse


I have never been an "online community first" person. The internet is how I stay in touch with people I met in real life. I'm not a "tweet comments at celebrities" guy. I was never funny enough to be the funniest person on Twitter.

So when Twitter was accidentally purchased by a fascist high on ketamine, I moved to Mastodon mostly because it seemed to be “Twitter without the bullshit”. No recommended for you feed, no ads, it was broken in a way I find charming. Of course search was broken because all OSS social tools must have one glaring lack of functionality. In a nightmare world full of constant change it’s good to have a few constants to hold on to.

A lot of the narrative at the time was “this is our flag in the ground in the fight against The Man”. It wasn’t clear in this context if they meant corporations or the media or the weird pseudo celebrity that had taken over social media where people would breathlessly tell me about shit like “Chris-Chan” and “Logan Paul bought a Pokemon card”.

We all need pointless hobbies, but I care about YouTube stars like I care about distant stars dying. It’s interesting to someone somewhere but those people don’t talk to me. I mostly use social media as a place to waste time, not a platform to form para-social relationships to narcissists. I prefer my narcissism farm to table. I’d rather dig a grave with a rusty spoon than watch a Twitch “star”.

Anyway, I watched mostly apathetically as the internet tried to rally itself to another cause. I read my news at the normal newspapers, watched my normal television and put social media off into its own silo. Then Trump effectively shut down the entire free press in the US in a series of bullshit lawsuits.

See I had forgotten the one golden rule of capitalism. To thrive in capitalism one must be amoral. Now you can be wildly sickeningly successful with morals but you cannot reach that absolute zenith of shareholder value. Either you accept a lower share price and don’t commit atrocities or you become evil. There is no third option.

So of course media corporations became bargaining chips for the oligarchs' actual businesses. Why fight a defamation suit when you can settle it by running favorable coverage and maybe bankrupting the media outlet you bought as a stocking stuffer? Suddenly I couldn’t find any reliable reporting about anything in the US. My beloved Washington Post became straight-up propaganda and desperate attempts to cope. "Best winter stews to make while you watch your neighbors get kidnapped at gunpoint." Twelve dollars a month for that.

Threads was worthless because it’s the most boring social media website ever imagined. It’s a social media network designed by brands for brands, like if someone made a cable channel that was just advertisements and meta commentary about the advertisements you just saw. Billions of dollars at their disposal and Meta made a hot new social media network with the appeal of junk mail.

Bluesky had a bunch of “stuff” but they’re trying to capture that 2008 Twitter lightning in a bottle which is a giant waste of time. We’re never going to go back to pretending that tweeting at politicians does anything and everyone there is desperately trying to build a “brand” as the funny one or whatever. I want news I don’t want your endless meta commentary on the news.

People talk a lot about the protocols that power Bluesky vs. ActivityPub, because we're nerds and we believe deep in our hearts that the superior protocol will win. This is adorable. It flies in the face of literally all of human history, where the more convenient thing always wins regardless of technical merit. VHS beat Betamax. USB-C took twenty years. The protocol fight is interesting the way medieval siege warfare is interesting — I'm glad someone's into it, but it has no bearing on my life. There's no actual plan to self-host Bluesky. Their protocol makes it easier to scale their service. That's why it was written and that's what it does. End of story.

Now EU news remained reliable, but sending European reporters into the madness of the US and trying to get a “report” out of it is an exercise in frustration. This became especially relevant for me when Trump threatened to invade Greenland and suddenly there was a distinct possibility that there might be an armed conflict between Denmark and the US. Danish reporters weren’t getting meetings with the right people and it was just endless rumors and Truth Social nonsense.

If the American press had given me 20 minutes of airtime I could have convinced everyone they don’t want to get involved with Greenland. We’re not tough enough as a people to survive in Greenland, much less “take it over”. Greenlandic people shrug off horrific injuries hundreds of kilometers from medical help with a smile. I watched a Greenlandic toddler munch meat from the spine of a seal with its head very much intact. We aren’t equipped to fuck with these people, they are the real deal.

So into this complete breakdown of the press came the Fediverse. It became the only reliable source of information I had. People posted links with a minimal amount of commentary, picking and choosing the best content from other social media networks. They're not doing it to "build a brand" because that's not a thing in the Fediverse. It's too disjointed to be a place to build a newsletter subscription base.

Instead it became the only place consistently posting trustworthy information I could actually access. This became personally relevant when Trump threatened to invade Greenland, which is the kind of sentence I never expected to type and yet here we are. It would be funny if I wasn't a tiny bit concerned that my new home was going to get a CIA overnight regime change special in the middle of the night.

It was somewhere in the middle of DMing with someone who had forgotten more about Greenland than I would ever know and someone who lived close to an RAF base in the UK that it clicked. This was what they had been talking about. Actual human beings were able to find each other and ask direct questions without this giant mountain of bullshit engagement piled on top of it. Meta or Oracle or whoever owns TikTok this week couldn't stop me.

I never expected to find my news from strangers on a federated social network that half the internet has never heard of. I never expected a lot of things. But there's something quietly beautiful about a place where people just... share what they know. No brand deals, no engagement metrics, no algorithm nudging you toward rage. Just someone who spent twenty years studying Arctic policy posting a thread at 2 AM because they think you should understand what's happening. It's the internet I was promised in 1996. It only took thirty years and the complete collapse of American journalism to get here.

Find me at: https://c.im/@matdevdug


I Sold Out for $20 a Month and All I Got Was This Perfectly Generated Terraform

Until recently the LLM tools I’ve tried have been, to be frank, worthless. Copilot was best at writing extremely verbose comments. Gemini would turn a 200 line script into a 700 line collection of gibberish. It was easy for me to, more or less, ignore LLMs for being the same over-hyped nonsense as the Metaverse and NFTs.

This is great for me because I understand that LLMs represent a massive shift in power from an already weakened worker class to an increasingly monarch-level wealthy class. By stealing all human knowledge and paying nothing for it, then selling the output of that knowledge, LLMs are an impossibly unethical tool. So if the energy wasting tool of the tech executive class is also a terrible tool, easy choice.

Like boycotting Tesla for being owned by an evil person and also being crappy overpriced cars, or not shopping at Hobby Lobby and just buying directly from their Chinese suppliers, the best boycotts are ones where you aren’t really losing much. Google can continue to choke out independent websites with their AI results that aren’t very good and I get to feel superior doing what I was going to do anyway by not using Google search.

This logic was all super straightforward right up until I tried Claude Code. Then it all got much more complicated.

Some Harsh Truths

Let’s just get this out of the way right off the bat. I didn't want to like Claude Code. I got a subscription with the purpose of writing a review on it where I would find that it was just as terrible as Gemini and Copilot. Except that's not what happened.

Instead it was like discovering the 2AM kebab place might actually make the best pizza in town. I kept asking Claude to do annoying tasks where it was easy for me to tell if it had made a mistake and it kept doing them correctly. It felt impossible but the proof was right in front of me.

I’ve written tens of thousands of lines of Terraform in my life. It is a miserable chore to endlessly flip back and forth between the provider documentation and Vim, adding all the required parameters. I don’t learn anything by doing it, it’s just a grind I have to push through to get back to the meaningful work.

The amount of my precious time on Earth I have wasted importing all of a company's DNS records into Terraform, then taking the autogenerated names and organizing them so they make sense for the business, is difficult to express. It's like if the only way I knew how to make a hamburger bun was to carefully put every sesame seed by hand on the top, only to stumble upon an 8-pack of buns for $4 at the grocery store after years of using tiny tweezers to put the seeds in exactly the right spot.

I feel the same way about writing robust READMEs, k8s YAML and reorganizing the file structure of projects. Setting up more GitHub Actions is as much fun as doing my taxes. If I never had to write another regex for the rest of my life, that would be a better life by every conceivable measure.

These are tasks that sap my enthusiasm for this type of work, not feed it. I’m not sad to offload them and switch to mostly reviewing its PRs.

But the tool being useful doesn’t remove what’s bad about it. This is where a lot of pro-LLM people start to delude themselves.

Pro-LLM Arguments

In no particular order, these are the arguments I keep seeing from people who want to keep using LLMs for why their use is fine.

IP/Copyright Isn’t Real

This is the most common one I see and the worst. It can be condensed down to “because most things on the internet originally existed to find pornography and/or pirate movies, stealing all content on the internet is actually fine because programmers don’t care about copyright”.

You also can’t have it both ways. OpenAI can’t decide to enforce NDAs and trademarks and then also declare law is meaningless. If I don’t get to launch a webmail service named Gmail+ then Google doesn’t get to steal all the books in human existence.

The argument basically boils down to: because we all pirated music in 2004, intellectual property is a fiction when it stands in the way of technology. By this logic I shoplifted a Snickers bar when I was 12 so property rights don't exist and I should be allowed to live in your house.

Code Quality Doesn't Matter (According to Someone Who Might Be Right)

I have an internet friend I met years ago playing EVE Online who is a brutally pragmatic person. To someone like him, code craftsmanship is a joke. For those of you who are unaware, EVE Online is the spaceship videogame where sociopaths spend months plotting against each other.

His approach to development is 80% refining requirements and getting feedback. He doesn’t care at all about DRY, he uses Node because then he can focus on just JavaScript, he doesn’t invest a second into optimization until the application hits a hard wall that absolutely requires it. His biggest source of clients? Creating fast full stacks because internal teams are missing deadlines. And he is booked up for at least 12 months out all the time because he hits deadlines.

When he started freelancing I thought he was crazy. Who was going to hire this band of Eastern European programmers who chain-smoke during calls and whose motto is basically "we never miss a deadline"? As it turns out, a lot of people.

Why doesn't he care?

Why doesn't he care about these things? He believes that programmers fundamentally don't understand the business they are in. "Code is perishable" is something he says a lot and he means it. Most of the things we all associate with quality (full test coverage, dependency management, etc) are programmers not understanding the rate of churn a project undergoes over its lifespan. The job of a programmer, according to him, is delivering features that people will use. How pleasant and well-organized that code is to work with is not really a thing that matters in the long term.

He doesn't see LLM-generated code as a problem because he's not building software with a vision that it will still be used in 10 years. Most of the stuff typically associated with quality he, more or less, throws in the trash. He built a pretty large stack for an automotive company, and my jaw must have hit the table when he revealed they're deploying m6g.4xlarge instances for a NodeJS full-stack application. "That seems large to me for that type of application" was my response.

He was like "yeah but all I care about are whether the user metrics show high success rate and high performance for the clients". It's $7000 a year for the servers, with two behind a load balancer. That's absolutely nothing compared with what having a team of engineers tune it would cost, and it means he can run laps around the internal teams who are, basically, his greatest competition.

To be clear, he is very technically competent. He simply rejects a lot of the conventional wisdom out there about what one has to do in order to make stuff. He focuses on features, then securing endpoints, and more or less gives up on the rest. For someone like this, LLMs are a logical choice.

Why This Argument Doesn't Work for Me

The annoying thing about my friend is that his bank account suggests he's right. But I can't get there. If I'm writing a simple script or something as a one-off, it can sometimes feel like we're all wasting the company's time when we have a long back-and-forth on the PR discussing comments or the linting or whatever. So it's not that this idea is entirely wrong.

But the problem with programming is you never know what is going to be "the core" of your work life for the next 5 years. Sometimes I write a feature, we push it out, it explodes in popularity and then I'm a little bit in trouble because I built an MVP and now it's a load-bearing, revenue-generating thing that has to be retooled.

I also just have trouble with the idea that this is my career and the thing I spend my limited time on earth doing and the quality of it doesn't matter. I delight in craftsmanship when I encounter it in almost any discipline. I love it when you walk into an old house and see all the hand crafted details everywhere that don't make economic sense but still look beautiful. I adore when someone has carefully selected the perfect font to match something.

Every programmer has that library or tool that they aspire to. That code base where you delight at looking at it because it proves perfection is possible even if you have never come close to reaching that level. For me it's always been looking through the source code of SQLite that restores my confidence. I might not know what I'm doing but it's good to be reminded that someone out there does.

Not everything I make is that great, but the concept of "well great doesn't matter at all" effectively boils down to "don't take pride in your work" which is probably the better economic argument but feels super bad to me. In a world full of cheap crap, it feels bad to make more of it and then stick my name on it.

So Why Are People Still Using Them?

The best argument for why programmers should be using LLMs is because it's going to be increasingly difficult to compete for jobs and promotions against people who are using them. In my experience Claude Code allows me to do two tasks at once. That's a pretty hard advantage to overcome.

Last Tuesday I had Claude Code write a GitHub Action for me while I worked on something else. When it was done, I reviewed it, approved it, and merged it. It was fine. It was better than fine, actually — it was exactly what I would have written, minus the forty-five minutes of resentment. I sat there for a moment, staring at the merged PR, feeling the way I imagine people feel when they hire a cleaning service for the first time: relieved, and then immediately guilty about the relief, and then annoyed at myself for feeling guilty about something that is, by any rational measure, a completely reasonable thing to do. Except it isn't reasonable. Or maybe it is. I genuinely don't know anymore, and that's the part that bothers me the most — not that the tool works, but that I've lost the clean certainty that it shouldn't.

So now I'm paying $20 a month to a company that scraped the collective knowledge of humanity without asking so that I can avoid writing Kubernetes YAML. I know what that makes me. I just haven't figured out a word for it yet that I can live with.

When I asked my EVE friend about it on a recent TeamSpeak session, he was quiet for a while. I thought that maybe my moral dilemma had shocked him into silence. Then he said, "You know what the difference is between you and me? I know I'm a mercenary. You thought you were an artist. We're both guys who type for money."

I couldn't think of a clever response to that. I still can't.


The Small Web is Tricky to Find

One of the most common requests I've gotten from users of my little Firefox extension (https://timewasterpro.xyz) has been more options around the categories of websites you get returned. This required me to go through and parse each website's information to attempt to sort it into a category. I tried a bunch of different approaches but ended up basically looking at the websites themselves and seeing if there was anything that looked like a tag or a hint on each site.
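As a rough illustration of what "a tag or a hint" means in practice, here's a minimal sketch of the kind of check I'm describing. The URL is a placeholder and plenty of small sites expose none of these tags, so treat it as a first pass rather than a reliable classifier.

# Crude first pass: pull the homepage and look for anything that smells like a topic hint.
curl -sL https://example.com | \
  grep -oiE '<meta[^>]+(name|property)="?(keywords|og:type|article:tag)[^>]*>' | head -5

# The generator tag at least tells you the platform (WordPress, Ghost, Hugo...),
# which loosely correlates with how much metadata you can expect to find.
curl -sL https://example.com | grep -oi '<meta name="generator"[^>]*>'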

This is the end result of my effort at putting stuff into categories.

Unknown just means I wasn't able to get any sort of data about it. This is the result of me combining Ghost, Wordpress and Kagi Small Web data sources.

Interestingly, one of my most common requests is "I would like less technical content," which as it turns out is tricky to provide because non-technical small sites are pretty hard to find. They sort of exist, but less technical writers don't seem to have bought into the own-your-own-domain value of the small web (or if they have, I haven't been able to figure out a reliable way to find them).

This is an interesting problem, especially because a lot of the tools I would have previously used to solve this problem are....basically broken. It's difficult for me to really use Google web search to find anything at this point even remotely like "give me all the small websites" because everything is weighted to steer me away from that towards Reddit. So anything that might be a little niche is tricky to figure out.

Interesting findings

So there's no point in building a web extension with a weighting algorithm to return less technical content if I cannot find a big enough pool of non-technical content to surface. It isn't that these sites don't exist, it's just that we never really figured out a way to reliably surface "what is a small website".

So from a technical perspective I have a bunch of problems.

  • First I need to reliably sort websites into a genre, which can be a challenge when we're talking about small websites because people typically write about whatever moves them that day. Most of the content on a site might be technical, but some of it might not be. Big sites tend to be precise with their SEO settings, but small sites that don't care about SEO don't bother, so I have fewer reliable signals to work with.
  • Then I need to come up with a lot of different sources feeding me independent websites. The Kagi Small Web was a good starting point, but Wordpress and Ghost websites have a much higher ratio of non-technical content. I need those sites, but it's hard to find a big batch of them reliably.
  • Once I have the type of website as a general genre and I have a series of locations, then I can start to reliably distribute the types of content you get.

I think I can solve....some of these, but the more I work on the problem the more I'm realizing that the entire concept of "the small web" had a series of pretty serious problems.

  • Google was the only place on Earth sending any traffic there
  • Because Google was the only one who knew about it, there never needed to be another distribution system
  • Now that Google is broken, it's almost impossible to recreate that magic of becoming the top of the list for a specific subgenre without a ton more information than I can get from public records.


GitButler CLI Is Really Good

My workflow has remained mostly the same for over a decade. I write everything in Vim using the configuration found here. I run Vim from inside of tmux with a configuration found here. I write things on a git branch made with the git CLI, then I add them to that branch with git add --patch, trying to run all of the possible linting and tests with git hooks before I waste my time on GitHub Actions. Then I run git up, which is an alias for pull --rebase --autostash. Finally I commit, then copy-paste the URL returned by GitHub to open a PR. Then I merge the PR and run git ma to go back to the primary branch, which is an alias for ma = "!f() { git checkout $(git primary) && git pull; }; f".

This workflow, I think, is pretty familiar to anyone working with GitHub a lot. You'll notice I keep saying GitHub rather than git, because almost nothing I'm doing has much to do with git itself. There's no advantage to my repo being local to my machine, because everything I need to actually merge and deploy code lives on GitHub. The CI runs there, the approval process runs there, the monitoring of the CI happens there, the injection of secrets happens there. If GitHub is down my local repo does, effectively, nothing.

My source of truth is always remote, which means I pay the price for git complexity locally but I don't benefit from it. At most jobs:

  • You can't merge without GitHub (PRs are the merge mechanism)
  • You can't deploy without GitHub (Actions is the deployment trigger)
  • You can't get approval without GitHub (code review lives there)
  • Your commits are essentially "drafts" until they exist on GitHub

This means the following is also true:

  • You never work disconnected intentionally
  • You don't use local branches as long-lived divergent histories
  • You don't merge locally between branches (GitHub PRs handle this)
  • You don't use git log for archaeology — you use GitHub's blame/history UI (I often use git log personally but I have determined I'm in the minority on this).

Almost all the features of git are wasted on me in this flow. Because git serves a million purposes and is designed around a way of working that almost nobody actually uses, we all pay the complexity price of git and never reap any of the benefits. So instead I keep having to add more aliases to paper over its shortcomings.

These are all the aliases I use at least once a week.

[alias]
  up = pull --rebase --autostash
  l = log --pretty=oneline -n 20 --graph --abbrev-commit
  # View the current working tree status using the short format
  s = status -s
  p = !"git pull; git submodule foreach git pull origin master"
  ca = !git add -A && git commit -av
  # Switch to a branch, creating it if necessary
  go = "!f() { git checkout -b \"$1\" 2> /dev/null || git checkout \"$1\"; }; f"
  # Show verbose output about tags, branches or remotes
  tags = tag -l
  branches = branch -a
  remotes = remote -v
  dm = "!git branch --merged | grep -v '\\*' | xargs -n 1 git branch -d"
  contributors = shortlog --summary --numbered
  st = status
  primary = "!f() { \
    git branch -a | \
    sed -n -E -e '/remotes.origin.ma(in|ster)$/s@remotes/origin/@@p'; \
  }; f"
  # Switch to main or master, whichever exists, and update it.
  ma = "!f() { \
    git checkout $(git primary) && \
    git pull; \
  }; f"
  mma = "!f() { \
    git ma && \
    git pull upstream $(git primary) --ff-only && \
    git push; \
  }; f"

Enter GitButler CLI

Git's offline-first design creates friction for online-first workflows, and GitButler CLI eliminates that friction by being honest about how we actually work.

(Edit: I forgot to add this disclaimer. I am not, nor have ever been an employee/investor/best friends with anyone from GitButler. They don't care that I've written this and I didn't communicate with anyone from that team before I wrote this.)

So let's take the most basic command as an example. This is my flow that I do 2-3 times a day without my aliases.

git checkout main
git pull
git checkout -b my-feature
# or if you're already on a branch:
git pull --rebase --autostash 
git status

I do this because git can't make assumptions about the state of the world.

  • Your local repo might be offline for days or weeks
  • The "remote" might be someone else's laptop, not a central server
  • Divergent histories are expected and merging is a deliberate, considered act

However because GitButler is designed with the assumption that I'm working online, we can skip a lot of this nonsense.

Its status command understands that there is always a remote main that I care about, and that when I run a status I need to see where I stand relative to remote main as it exists right now, not as it existed the last time I remembered to pull.
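For comparison, the closest I get to that in plain git is remembering to fetch first. A rough equivalent, assuming origin/main is the default branch:

# Refresh my picture of the remote before looking around
git fetch origin
git status
git log --oneline HEAD..origin/main   # commits on remote main that I don't have yet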

However this is far from the best trick it has up its sleeve.

Parallel Branches: The Problem Git Can't Solve

You're working on a feature, notice an unrelated bug, and now you have to stash, checkout, fix, commit, push, checkout back, stash pop. Context switching is expensive and error-prone.
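Spelled out in plain git, that detour looks something like this (the branch and file names are made up for illustration):

git stash                          # park the half-finished feature work
git checkout main
git checkout -b bugfix-typo
# fix the typo...
git add config.yaml
git commit -m "fix typo in config"
git push -u origin bugfix-typo
git checkout feature-auth          # back to what I was actually doing
git stash pop                      # and hope nothing conflicts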

GitButler effectively hacks a solution into git that fixes this with multiple branches applied simultaneously. Assign files to different branches without leaving your workspace. What do I mean by that? Let's start again with my status.

Great, looks good. Alright, so let's say I make two new branches. I'm working on a new feature adding auth, and while I'm working on that I see a typo I need to fix in a YAML file.

I can work on both things at the same time:

but stage istar_metrics_text.py feature-auth
but stage example.txt bugfix-typo

And easily commit to both at the same time without doing anything weird.
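The commit messages here are invented, but the shape mirrors the commit command shown later in the stacked-branch example, with the branch as the last argument:

but commit -m "add auth scaffolding" feature-auth
but commit -m "fix typo in example.txt" bugfix-typo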

Stacked PRs Without the Rebase Nightmare

Stacked PRs are the "right" way to break up large changes so people on your team don't throw up at being asked to review 2000 lines, but Git makes them miserable. When the base branch gets feedback, you have to rebase every dependent branch, resolve conflicts, force-push, and pray. Git doesn't understand branch dependencies. It treats every branch as independent, so you have to manually maintain the stack.
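For reference, here's roughly what maintaining even a two-branch stack by hand looks like in plain git once the base branch gets rewritten. old-tip is a placeholder for the pre-rewrite commit, which you usually have to dig out of git reflog:

# Address the review feedback by rewriting the base branch
git checkout api-endpoints
git commit --amend
git push --force-with-lease

# api-tests still sits on the old tip, so transplant it by hand
git checkout api-tests
git rebase --onto api-endpoints old-tip
git push --force-with-lease

# ...and repeat for every branch further up the stack.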

GitButler solves this problem with first-class stacked branches. The dependency is explicit, and updates propagate automatically.

So what do I mean? Let's say I'm adding a new API endpoint to some Django app. First I make the branch.

but branch new api-endpoints
# Then add my stuff to it
but commit -m "add REST endpoints" api-endpoints
# Create a stacked branch on top
but branch new --anchor api-endpoints api-tests

So let's say I'm working on the api-endpoints branch and get some good feedback on my PR. It's easy to resolve the comments there while leaving api-tests stacked on top of api-endpoints, with the relationship back to the first branch tracked for me.

In practice this is just a much nicer way of dealing with a super common workflow.

Easy Undo

Maybe the most requested feature from new git users I encounter is an easier undo. When you mess up in Git, recovery means diving into git reflog, understanding the cryptic output, and hoping you pick the right HEAD@{n}. One wrong move and you've made it worse.
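That recovery path typically looks something like this, with HEAD@{2} standing in for whichever entry you decide (or guess) is the one you want:

git reflog                     # scroll the list of everything HEAD has pointed at
git reset --hard HEAD@{2}      # jump back, hoping you counted the entries correctly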

GitButler's oplog is just easier to use. So the basic undo functionality is super simple to understand.

but undo rolls me back one operation.

To me the mental model of a snapshot makes a lot more sense than the git history model. I do an action, I want to undo that action. This is better than the git option of:

git log --oneline                 # figure out what you committed
git reset --soft HEAD~1           # undo commit, keep changes staged
git stash                         # stash the changes
git checkout correct-branch       # switch branches
git stash pop                     # restore changes (hope no conflict)
git add .                         # re-stage
git commit -m "message"           # recommit

Very exciting tool

I've been using GitButler in my daily work since I got the email that the CLI was available and I've really loved it. I'm a huge fan of what this team is doing to effectively remodel and simplify Git operations in a world where almost nobody is using it the way the tool was originally imagined to be used. I strongly encourage folks to go check it out for free at: https://docs.gitbutler.com/cli-guides/cli-tutorial/tutorial-overview. It does a ton of things (like help you manage PRs) that I didn't even touch on here.

Let me know if you find something cool that I forgot at: https://c.im/@matdevdug