matduggan.com

Making RSS More Fun

December 02, 2025

I don't like RSS readers. I know, this is blasphemous especially on a website where I'm actively encouraging you to subscribe through RSS. As someone writing stuff, RSS is great for me. I don't have to think about it, the requests are pretty light weight, I don't need to think about your personal data or what client you are using. So as a protocol RSS is great, no notes.

However as something I'm going to consume, it's frankly a giant chore. I feel pressured by RSS readers, where there is this endlessly growing backlog of things I haven't read. I rarely want to read all of a websites content from beginning to end, instead I like to jump between them. I also don't really care if the content is chronological, like an old post about something interesting isn't less compelling to me than a newer post.

What I want, as a user experience, is something akin to TikTok. The whole appeal of TikTok, for those who haven't wasted hours of their lives on it, is that I get served content based on an algorithm that determines what I might think is useful or fun. However what I would like is to go through content from random small websites. I want to sit somewhere and passively consume random small creators content, then upvote some of that content and the service should show that more often to other users. That's it. No advertising, no collecting tons of user data about me, just a very simple "I have 15 minutes to kill before the next meeting, show me some random stuff."

In this case the "algorithm" is pretty simple: if more people like a thing, more people see it. But with Google on its way to replacing search results with LLM generated content, I just wanted to have something that let me play around with the small web the way that I used to.

There actually used to be a service like this called StumbleUpon which was more focused on pushing users towards popular sites. It has been taken down, presumably because there was no money in a browser plugin that sent users to other websites whose advertising you didn't control.

TL;DR

You can go download the Firefox extension now and try this out and skip the rest of this if you want. https://timewasterpro.xyz/ If you hate it or find problems, let me know on Mastodon. https://c.im/@matdevdug

Functionality

So I wanted to do something pretty basic. You hit a button, get served a new website. If you like the website, upvote it, otherwise downvote it. If you think it has objectionable content then hit report. You have to make an account (because I couldn't think of another way to do it) and then if you submit links and other people like it, you climb a Leaderboard.

On the backend I want to (very slowly so I don't cost anyone a bunch of money) crawl a bunch of RSS feeds, stick the pages in a database and then serve them up to users. Then I want to track what sites get upvotes and return those more often to other users so that "high quality" content shows up more often. "High quality" would be defined by the community or just me if I'm the only user.

It's pretty basic stuff, most of it copied from tutorials scattered around the Internet. However I really want to drive home to users that this is not a Serious Thing. I'm not a company, this isn't a new social media network, there are no plans to "grow" this concept beyond the original idea unless people smarter than me ping with me ideas. So I found this amazing CSS library: https://sakofchit.github.io/system.css/

The Apple's System OS design from the late-80s to the early 90s was one of my personal favorites and I think would send a strong signal to a user that this is not a professional, modern service.

Great, the basic layout works. Let's move on!

Backend

So I ended up doing FastAPI because it's very easy to write. I didn't want to spend a ton of time writing the API because I doubt I nailed the API design on the first round. I use sqlalchemy for the database. The basic API layout is as follows:

admin - mostly just generating read-only reports of like "how many websites are there"
leaderboard - So this is my first attempt at trying to get users involved. Submit a website that other people like? Get points, climb leaderboard.

The source for the RSS feeds came from the (very cool) Kagi small web Github. https://github.com/kagisearch/smallweb. Basically I assume that websites that have submitted their RSS feeds here are cool with me (very rarely) checking for new posts and adding them to my database. If you want the same thing as this does, but as an iFrame, that's the Kagi small web service.

The scraping work is straightforward. We make a background worker, they grab 5 feeds every 600 seconds, they check for new content on each feed and then wait until the 600 seconds has elapsed to grab 5 more from the smallweb list of RSS feeds. Since we have a lot of feeds, this ends up look like we're checking for new content less than once a day which is the interval that I want.

Then we write it out to a sqlite database and basically track "has this URL been reported", if so, put it into a review queue and then how many times this URL has been liked or disliked. I considered a "real" database but honestly sqlite is getting more and more scalable every day and its impossible to beat the immediate start up and functionality. Plus very easy to back up to encrypted object storage which is super nice for a hobby project where you might wipe the prod database at any moment.

In terms of user onboarding I ended up doing the "make an account with an email, I send a link to verify the email". I actually hate this flow and I don't really want to know a users email. I never need to contact you and there's not a lot associated with your account, which makes this especially silly. I have a ton of email addresses and no real "purpose" in having them. I'd switch to Login with Apple, which is great from a security perspective but not everybody has an Apple ID.

I also did a passkey version, which worked fine but the OSS passkey handling was pretty rough still and most people seem to be using a commercial service that handled the "do you have the passkey? Great, if not, fall back to email" flow. I don't really want to do a big commercial login service for a hobby application.

Auth is a JWT, which actually was a pain and I regret doing it. I don't know why I keep reaching for JWTs, they're a bad user experience and I should stop.

Can I just have the source code?

I'm more than happy to release the source code once I feel like the product is in a somewhat stable shape. I'm still ripping down and rewriting relatively large chunks of it as I find weird behavior I don't like or just decide to do things a different way.

In the end it does seem to do whats on the label. We have over 600,000 individual pages indexed.

So how is it to use?

Honestly I've been pretty pleased. But there are some problems.

First I couldn't find a reliable way of switching the keyboard shortcuts to be Mac/Windows specific. I found some options for querying platform but they didn't seem to work, so I ended up just hardcoding them as Alt which is not great.

The other issue is that when you are making an extension, you spend a long time working with these manifests.json. The specific part I really wasn't sure about was:

"browser_specific_settings": {
    "gecko": {
      "id": "[email protected]",
      "strict_min_version": "80.0",
      "data_collection_permissions": {
        "required": ["authenticationInfo"]
      }
    }
  }

I'm not entirely sure if that's all I'm doing? I think so from reading the docs.

Anyway I built this mostly for me. I have no idea if anybody else will enjoy it. But if you are bored I encourage you to give it a try. It should be pretty light weight and straight-forward if you crack open the extension and look at it. I'm not loading any analytics into the extension so basically until people complain about it, I don't really know if its going well or not.

Future stuff

I need to sort stuff into categories so that you get more stuff in genres you like. I don't 100% know how to do that, maybe there is a way to scan a website to determine the "types" of content that is on there with machine learning? I'm still looking into it.
There's a lot of junk in there. I think if we reach a certain number of downvotes I might put it into a special "queue".
I want to ensure new users see the "best stuff" early on but there isn't enough data to determine "best vs worst".
I wish there were more independent photography and science websites. Also more crafts. That's not really a "future thing", just me putting a hope out into the universe. Non-technical beta testers get overwhelmed by technical content.

I broke and fixed my Ghost blog

October 16, 2025

Once a month I will pull down the latest docker images for this server and update the site. The Ghost CMS team updates things at a pretty regular pace so I try to not let an update sit for too long.

With this last round I suddenly found myself locked out of my Ghost admin panel. I was pretty confident that I hadn't forgotten my password and when I was looking at the logs, I saw this pretty spooky error.

blog-1               | [2025-10-15 11:36:29] ERROR "GET /ghost/api/admin/users/me/?include=roles" 403 188ms
blog-1               |
blog-1               | Authorization failed
blog-1               |
blog-1               | "Unable to determine the authenticated user or integration. Check that cookies are being passed through if using session authentication."
blog-1               |
blog-1               | Error ID:
blog-1               |     5b3ec250-aa84-11f0-bb51-b7057fc0f6b0
blog-1               |
blog-1               | ----------------------------------------
blog-1               |
blog-1               | NoPermissionError: Authorization failed
blog-1               |     at authorizeAdminApi (/var/lib/ghost/versions/5.130.5/core/server/services/auth/authorize.js:33:25)
blog-1               |     at Layer.handle [as handle_request] (/var/lib/ghost/versions/5.130.5/node_modules/express/lib/router/layer.js:95:5)
blog-1               |     at next (/var/lib/ghost/versions/5.130.5/node_modules/express/lib/router/route.js:149:13)
blog-1               |     at authenticate (/var/lib/ghost/versions/5.130.5/core/server/services/auth/session/middleware.js:55:13)
blog-1               |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
blog-1               |
blog-1               | [2025-10-15 11:36:29] ERROR "GET /ghost/api/admin/users/me/?include=roles" 403 13ms

I was surprised by this sudden error, especially when I dumped out the database and confirmed that the hashed password for my Ghost user matched the password I was giving it. If you want to try that, this is the guide I followed: https://hostarmada.com/tutorials/blog-cms/ghost/how-to-change-the-admin-password-of-your-ghost-blog-if-you-get-locked-out/

Maybe I messed up the Nginx?

So Ghost is a good CMS system, but it can be a little bit slow under load from automated scraping from RSS readers. I want to cache everything that I can with Nginx, so I use Nginx to store a lot of that junk. My configuration is not too terribly clever and has worked up to this point.

map $sent_http_content_type $expires {
      default                    off;
      text/css                   max;
      application/javascript     max;
      ~image/                    max;
  }

  server {
      listen 80;
      listen [::]:80;
      server_name matduggan.com www.matduggan.com;
      return 301 https://$server_name$request_uri;  # Changed to 301 (permanent)
  }

  proxy_cache_path /tmp/cache levels=1:2 keys_zone=STATIC:512m inactive=24h max_size=10g;
  client_max_body_size 1000M;

  server {
      listen 443 ssl http2;
      listen [::]:443 ssl http2;

      server_name matduggan.com www.matduggan.com;

      charset UTF-8;

      # SSL Configuration
      ssl_certificate         /etc/ssl/cert.pem;
      ssl_certificate_key     /etc/ssl/key.pem;
      ssl_client_certificate  /etc/ssl/cloudflare.crt;
      ssl_verify_client on;

      # Modern TLS settings
      ssl_protocols TLSv1.2 TLSv1.3;
      ssl_prefer_server_ciphers off;  # Let client choose (better for TLS 1.3)
      ssl_session_cache shared:SSL:10m;
      ssl_session_timeout 10m;
      ssl_buffer_size 4k;

      # Security headers
      add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
      add_header X-Frame-Options "SAMEORIGIN" always;
      add_header X-Content-Type-Options "nosniff" always;
      add_header X-XSS-Protection "1; mode=block" always;

      # Compression
      gzip on;
      gzip_vary on;
      gzip_proxied any;
      gzip_comp_level 6;
      gzip_types text/plain text/css text/xml text/javascript application/json application/javascript application/xml+rss application/rss+xml font/truetype font/opentype
  application/vnd.ms-fontobject image/svg+xml;

      expires $expires;

      # Ghost admin and protected routes - no caching
      location ~ ^/(ghost/|p/|\.ghost/|members/) {
          proxy_set_header Host $http_host;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          proxy_set_header X-Forwarded-Proto $scheme;
          proxy_set_header X-Forwarded-Host $http_host;
          proxy_buffering off;
          proxy_cache_bypass 1;
          proxy_no_cache 1;
          add_header Cache-Control "no-cache, no-store, must-revalidate";
          proxy_pass http://127.0.0.1:8080;
      }

      # Public content - cached
      location / {
          proxy_set_header Host $http_host;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          proxy_set_header X-Forwarded-Proto $scheme;

          proxy_buffering on;
          proxy_cache STATIC;
          proxy_cache_valid 200 1d;
          proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
          proxy_cache_bypass $http_cache_control;

          add_header X-Cache-Status $upstream_cache_status;

          proxy_pass http://127.0.0.1:8080;
          proxy_redirect off;
      }
  }

The basic point is to get caching on the public content and then definitely NOT cache the ghost admin panel. After some testing, I confirmed this seemed to all work. But I was still locked out.

To the changelog!

Alright so I still couldn't figure out what was going on, so I went through the docs. Then I found this seemingly new addition. https://docs.ghost.org/config?_ga=2.92846045.1713439663.1760543217-1048546310.1760543217#security

Now I have transactional email set up, but just looking at the error it seemed to feel related. So I added: security__staffDeviceVerification: false to my docker-compose file to disable this new feature and then blamo, suddenly works fine.

So if you are locked out of your Docker CMS admin panel, disable this (temporarily hopefully because it's a good feature) to let you continue to log in, debug your transactional email and then turn it back on. Hope that helps.

Greenland is a beautiful nightmare

September 27, 2025

Greenland is a complicated topic here in Denmark. The former colony that is still treated a bit like a colony is something that inspires a lot of emotions. Greenland has been subjected to a lot of unethical experiments by Denmark, from taking their kids to wild experiments in criminal justice. But there is also a genuine pride a lot of people have here for the place and you run into Danes who grew up there more often than I would have guessed.

When the idea of going to Greenland was introduced to me, I was curious. Having lived in Denmark for awhile, you hear a lot about the former colony and its 55,000 residents. We were invited by a family that my wife was close with growing up and is Danish. They wanted to take their father back to see the place he had spend some time in during his 20s and had left quite an impression. A few drinks in, I said "absolutely let's do it", not realizing we had already committed to going and I had missed the text message chain.

A few weeks before I went, I realized "I don't know anything about Greenland" and started to watch some YouTube videos. It was about this time when I started to get a pit in my stomach, the "oh god I think I've made a huge mistake" feeling I'm painfully familiar with after a career in tech. Greenland appeared to have roughly 9 people living there and maybe 5 things to look at. Even professional travel personalities seemed to be scraping the bottom of the barrel. "There's the grocery store again!" they would point out as they slipped down the snowy roads. I couldn't tell any difference between different towns in the country.

It reminded me a lot of driving through Indiana. For those not in the US, Indiana is a state in the US famous for being a state one must drive through in order to get somewhere better. If you live in Michigan, a good state and want to go to Illinois, another good state, one must pass through Indiana, a blank state. Because of this little strip here, you often found yourself passing through this place.

Driving through Indiana isn't bad, it's just an empty void. It's like a time machine back to the 90s when people still smoke in restaurants but also there's nothing that sticks out about it. There is nothing distinct about Indiana, it's just a place full of people who got too tired on their way to somewhere better and decided "this is good enough". The difference is that Greenland is very hard to get to, as I was about to learn.

Finally the day arrived. Me, my wife, daughter, 4 other children and 6 other adults all came to the Copenhagen Airport and held up a gate agent for what felt like an hour to slowly process all of our documents. Meanwhile, I nursed a creeping paranoia that I'd be treated as some sort of American spy, given my government's recent hobby of threatening to purchase entire countries like they're vintage motorcycles on Craigslist.

The 5 hour flight is uneventful, the children are beautifully behaved and I begin to think "well this seems ok!" like the idiot I am. As I can look down and see the airport, the pilot comes on and informs us that there is too much fog to land safely. Surely fog cannot stop a modern aircraft full of all these dials and screens I think, foolishly. We are informed there is enough fuel to circle the airport for 5 hours to wait for the fog to lift.

What followed was three hours of flying in lazy circles, like a very expensive, very slow merry-go-round. After the allotted time, we are informed that we must fly to Iceland to refuel and then we will be returning to Denmark. After a total of 15 hours in the air we will be going back to exactly where we started, to do the entire thing again. We were obviously upset at this turn of events, but I noticed the native Greenlandic folks seemed not surprised at this turn of events. As I later learned, this happens all the time.

The native Greenlanders on board seemed utterly unsurprised by this development, displaying the kind of resigned familiarity that suggested this was Tuesday for them. I began wondering if I could just pretend Iceland was Greenland—surely my family wouldn't notice the difference? But the pilot, apparently reading my mind, announced that no one would be disembarking in Iceland. It felt oddly authoritarian, like being grounded by an airline, as if they knew we'd all just wander off into Reykjavik and call it close enough.

We crash out in a airport hotel 20 minutes from our apartment after 15 hours in the air and tons of CO2 emissions only to wake up the next day to start again. This time, I notice that all of the people are asking for (and receiving) free beer from the crew that they are stashing in their bags. It turns out soda and beer, really anything that needs to be imported, is pretty expensive in Greenland. The complimentary drinks are there to be kept for later.

Finally we land. The first thing you notice when you land in Greenland is there are no trees or grass. There is snow and then there is exposed rock. The exterior of the airport is metal but the inside is wood, which is strange because again there are no trees. This would end up being a theme, where buildings representing Denmark were made out of lots of wood, almost to ensure that you understood they weren't from here. We ended up piling all of our stuff into a bus and heading for the hotel in Nuuk.

Nuuk

Nuuk is the capital of Greenland and your introduction to the incredible calm of the Greenlandic people. I have never met a less stressed out group of humans in my life. Nobody is really rushing anywhere, it's all pretty quiet and calm. The air is cold and crisp with lots of kids playing outside and just generally enjoying life.

The city itself sits in a landscape so dramatically inhospitable it makes the surface of Mars look cozy. Walking through the local mall, half the shops sell gear designed to help you survive what appears to be the apocalypse. Yet somehow, there's traffic. Actual traffic jams in a place where you can walk from one end to the other in twenty minutes. It's like being stuck behind a school bus in your own driveway.

To put this map into some perspective, it is only six kilometers from the sorta furthest tip to the airport.

But riding the bus around Nuuk was a peaceful experience that lets you see pretty much the entire city without needing to book a tour or spend a lot of money. We went to Katuaq, a cultural center with a cafe and a movie theater that was absolutely delicious food.

But again even riding the bus around it is impossible to escape the feeling that this is a fundamentally hostile to human life place. The sun is bright and during the summer its pretty hot, with my skin feeling like it was starting the burn pretty much the second it was exposed to the light. It's hard to even dress for, with layers of sunscreen, bug spray and then something warm on top if you suddenly got cold.

The sun, meanwhile, has apparently forgotten how to set, turning our hotel rooms into solar ovens. You wake up in a pool of your own sweat, crack a window for relief, and immediately get hit with air so cold it feels personal. It's like being trapped in a meteorological mood swing.

So after a night here, we went back to the airport again and flew to our final destination, Ilulissat.

Ilulissat

The flight to our final destination revealed Greenland's true nature: endless, empty hills stretching toward infinity, punctuated by ice formations that look like nature's sculpture garden.

Landing in Ilulissat felt like victory—we'd made it to the actual destination, not just another waypoint in our Arctic odyssey. Walking through the tiny airport, past Danish military recruitment posters (apparently someone, somewhere, thought this place needed defending), I felt genuinely optimistic for the first time in days.

Well you can sleep easy Danish military, because Ilulissat is completely protected from invasion. The second I stepped outside I was set upon by a flood of mosquitos like I have never experienced before. I have been to the jungles of Vietnam, the swamps of Florida and the Canadian countryside. This was beyond anything I've ever experienced.

There are bugs in my mouth, ears, eyes and nose almost immediately. The photo below is not me being dramatic, it is actually what is required to keep them off of me.

In fact what you need to purchase in order to walk around this area at all are basically bug nets for your face. They're effectively plastic mesh bags that you put on.

The Dogs

Our hotel, charming in that "remote Arctic outpost" way, sat adjacent to what I can only describe as a canine correctional facility. Dozens of sled dogs were chained to rocks like some sort of prehistoric parking lot, each with a tiny house they could retreat to when the existential weight of their circumstances became too much.

Now, I'd always imagined sled dogs living their best life—running through snow, tongues lolling, living the Disney version of Arctic life. I'd never really considered their downtime, assuming they frolicked in meadows or something equally wholesome. The reality was more "minimum security prison with a view."

The dogs are visited roughly twice a day by the person who owns and feeds them, which was quite the party for the dogs that lost their minds whenever the car pulled up. Soon the kids really looked forward to dog feeding time. The fish scrapes the dogs lived on came out of a chest freezer that was left exposed up on the rock face without electricity and you could smell it from 50 yards away when it opened.

During one such performance, a fellow parent leaned over and whispered with the casual tone of someone commenting on the weather, "I think that one is dead." Before I could process this information, the frozen canine was unceremoniously launched over a small cliff like a furry discus. A second doggy popsicle followed shortly after, right in front of our assembled children, who watched with the kind of wide-eyed fascination usually reserved for magic shows.

We stopped making dog feeding time a group activity after that and had to distract the kids from ravens flying away with tufts of dog fur.

Whales taste like seaweed

Obviously a big part of Greenland is the nature, specifically the icebergs. Icebergs are incredible and during the week we spend up there, I enjoyed watching them every morning. It's like watching a mountain slowly moving while you sit still. The visual contrast of the ice and the exposed stone is beautiful and peaceful.

Finding our tour operator proved to be an exercise in small-town efficiency. The man who gave me directions was the same person who picked us up from the airport, who was also our tour guide, who probably doubled as the mayor and local meteorologist. It was like a one-man civic operation disguised as multiple businesses—the ultimate small-town gig economy.

The sea around Greenland is calmer than anything I've ever been on before, perfectly calm and serene. All around us whales emerged, thrilling my daughter. However the biggest hit of the entire tour, maybe the entire trip, was a member of the crew who handed each of the kids a giant rock of glacier ice to eat. I had to pull my daughter away to observe the natural beauty as she ate glacier ice like it was ice cream. "LOOK AT MY ICE" she was yelling as they slipped and slid around the deck of this boat.

So if you've ever wonder "what is a glacier", let me tell you. Greenland has a lot of ice and it pushes out from the land that is covers into the sea. When that happens, a lot of it breaks off. This sounds more exciting than it is. On TV in 4K it looks incredible, giant mountains of ice falling into the ocean. Honestly you can go read the same thing I did here.

However that doesn't happen very often. So in order for us tourists to be able to see anything, we had to go to a very productive glacier. This means there are constantly small chunks breaking off and falling into the sea. Practically though, it kinda looks like you are a boat in a slushee. It's beautiful and something to see, but also depressing to see along the rock face how much more ice there used to be.

Back in town, we hopped on the "bus". Now the bus here is clearly a retrofitted party van, complete with blue LED lights. The payment system is zip tied to a desk chair that is, itself, wedged in the front. However the bus works well and does get you around. The confusing part is that you will, once again, sometimes encounter a lot of traffic. People are driving pretty quickly and really seem to have somewhere to go. You also see a lot of fancy cars parked outside of houses here.

Which begs a pretty basic question. If there was almost nowhere to drive to in Nuuk, where in the hell are these people driving. The distance between the end of the road and the beginning of the road is less than 6 km. Also the process to make a road here is beyond anything you've ever seen. Everything requires a giant pile of explosives.

Where did these vehicles even come from? Why does one ship a BMW to a place accessible only by plane and boat? More importantly, where was everyone going with such determination? It was like watching a very expensive version of bumper cars, except everyone was committed to the illusion that they had somewhere important to be. Everyone had dings and scrapes like crashes were common.

Grocery Store from the Sea

Anyway, as I dodged speeding cars filled with people heading nowhere, I decided to hop off the bus and head to the grocery store. Inside was less a store and more the idea of a store. There was a lot of alcohol, chips, candy and shelf-stable foods, which all makes sense to me. What was strange was there wasn't a lot else, including meat. Locals couldn't be eating at the local restaurants, where the prices were as high as Berlin or Copenhagen for food. So what were they eating?

When I asked one of my bus drivers, he told me that it was pretty unusual to buy meat. They purchased a lot of whale and seal meat. I had sorta heard this before, but when we stopped the bus he pointed out a group of men hauling guns out into a small boat to go shoot seals. The guns were held together with a surprising amount of duct tape, which is not something I associate with the wild.

I had assumed, based on my casual reading of the news, that we were mostly done killing whales. As it turns out, I was wrong. They eat a lot of whale and it is, in fact, not hard to find. If you are curious, whale does not taste fishy. It tastes a little bit like if you cooked reindeer in a pot of seaweed. I wouldn't go out of your way for it, but it's not terrible.

The argument I've always heard for why people still kill whales is because it's part of their culture and also because it's an important source of protein. When you hear the phrase "part of their culture" I always imagined like traditional boats going out with spears. What I didn't imagine was industrial fishing boats and an industrial crane that lifts the dead whale out of the water for "processing". Some of the illusion is broken when your boat tour guide points out the metal warehouse with the word "whale" on the side. "Yeah the water here was red with blood for a week" the guide said, counting the cigarettes left in a pack he had.

Should you go to Greenland?

It's a wild place unlike anywhere I've ever been. It is the closest I have ever felt to living a sci-fi type experience. The people of Greenland are amazing, tough, calm and kind. I have nothing but positive experiences to recount from the many people I met there, Danish and Greenlandic, who patiently sat through my millions of questions.

However it is, by far, the least hospitable to human life place I've ever been to. The folks who live there have adapted to the situation in, frankly, genius ways. If that's your idea of a good time, Greenland is perfect for you. Maybe don't get emotionally attached to the sled dogs though. Or the whales.

FYI: Broadcom is ruining Bitnami containers

August 28, 2025 in DevOps

For a long time Bitnami containers and Helm charts have been widely considered the easiest and fastest way to get reliable, latest versions of popular applications built following container best practices. They also have some of the better docs on the internet for figuring out how to configure all this stuff.

However Broadcom, in their infinite capacity for short term gain over long term relationships, has decided to bring that to a close. On July 16th they informed their users that the platform was changing. Originally they were going to break a ton of workflows with only 43 days warning, but have expanded that out to a generous 75 days.

It's impossible to read these timelines as anything other than Broadcom knows that enterprise customers won't be able to switch off in 43 or 75 days and is using this to extort people into paying them the rumored $50,000 a year to keep using the images.

You can read the entire announcement here: https://github.com/bitnami/containers/issues/83267

Here is my summary though:

TL;DR: Bitnami is significantly reducing their free container image offerings and moving most existing images to a legacy repository with no future updates.

What's Changing:

Free Community Tier (Severely Limited):

Only a small subset of hardened images will remain free
Available only with "latest" tags (no version pinning)
Intended for development use only
Find the limited selection at: https://hub.docker.com/u/bitnamisecure

Your Existing Images:

All current Bitnami images (including versioned tags) move to docker.io/bitnamilegacy
No updates, patches, or support for legacy images
Use legacy repo only as temporary migration solution

Production Users:

Need to subscribe to "Bitnami Secure Images" for continued support
Includes security patches, LTS branches, and full version catalog

Action Items for DevOps Teams:

Before September 29th:

Audit your deployments - Check which Bitnami images you're using
Update CI/CD pipelines - Remove dependencies on deprecated images
Choose your path:
- Development only: Migrate to the limited free tier (latest tags only)
- Production: Subscribe to Bitnami Secure Images or find alternatives
- Temporary fix: Update image references to bitnamilegacy/ (not recommended long-term)

Helm Charts:

Source code remains open source on GitHub
Existing OCI charts at docker.io/bitnamicharts won't receive updates
Charts will fail unless you override image repositories

Bottom Line:

If you're using Bitnami for anything beyond basic development with latest tags, you'll need to either pay for Bitnami Secure Images or migrate to alternative container images before September 29th.

What Does a Post-Google Internet Look Like

June 30, 2025

With the rise of the internet came the need to find information more quickly. The concept of search engines came into this space to fill this need, with a relatively basic initial design.

This is the basis of the giant megacorp Google, whose claim to fame was they made the best one of these. Into this stack they inject ads, both ads inside the sites themselves and then turning the search results themselves into ads.

As time went on, what we understood to be "Google search" was actually a pretty sophisticated machine that effectively determined what websites lived or died. It was the only portal that niche websites had to get traffic. Google had the only userbase large enough for a website dedicated to retro gaming or VR headsets or whatever to get enough clicks to pay their bills.

Despite the complexity, the basic premise remained. Google steers traffic towards your site, the user gets the answer from your site and then everyone is happy. Google showed some ads, you showed some ads, everyone showed everyone on Earth ads.

This incredibly lucrative setup was not enough, however, to drive endless continous growth, which is now the new expectation of all tech companies. It is not enough to be fabulously profitable, you must become Weyland-Yutani. So now Google is going to break this long-standing agreement with the internet and move everything we understand to be "internet search" inside their silo.

Zero-Click Results

In March 2024 Google moved to embed LLM answers in their search results (source). The AI Overview takes the first 100 results from your search query, combines their answers and then returns what it thinks is the best answer. As expected, websites across the internet saw a drop in traffic from Google. You started to see a flood of smaller websites launch panic membership programs, sell off their sites, etc.

It became clear that Google has decided to abandon the previous concept of how internet search worked, likely in the face of what it considers to be an existential threat from OpenAI. Maybe the plan was always to bring the entire search process in-house, maybe not, but OpenAI and its rise to fame seems to have forced Google's hand in this space.

This is not a new thing, Google has been moving in this direction for years. It was a trend people noticed going back to 2019.

It appears the future of Google Search is going to be a closed loop that looks like the following:

Google LLM takes the information from the results it has already ingested to respond to most questions.
Companies will at some point pay for their product or service to be "the answer" in different categories. Maybe this gets disclosed, maybe not, maybe there's just a little i in the corner that says "these answers may be influenced by marketing partners" or something.
Google will attempt to reassure strategic partners that they aren't going to kill them, while at the same time turning to their relationship with Reddit to supply their "new data".

This is all backed up by data from outside the Google ecosystem confirming that the ratio of scrapes to click is going up. Basically it's costing more for these services to make their content available to LLMs and they're getting less traffic from them.

This new global strategy makes sense, especially in the context of the frequent Google layoffs. Previously it made strategic sense to hold onto all the talent they could, now it doesn't matter because the gates are closing. Even if you had all the ex-Google engineers money could buy, you can't make a better search engine because the concept is obsolete. Google has taken everything they need from the internet, it no longer requires the cooperation or goodwill of the people who produce that content.

What happens next?

So the source of traffic for the internet is going to go away. My guess is there will be some effort to prevent this, some sort of alternative Google search either embraced or pushed by people. This is going to fail, because Google is an unregulated monopoly. Effectively because the US government is so bad at regulating companies and so corrupt with legalized bribery in the form of lobbying, you couldn't stop Google at this point even if you wanted to.

Android is the dominant mobile platform on Earth
Chrome is the dominant web browser
Apple gets paid to make the other mobile platform default to Google
Firefox gets paid to make the other web browser default to Google

While the US Department of Justice has finally decided to doing something, it's almost too late to make a difference. https://www.justice.gov/opa/pr/department-justice-prevails-landmark-antitrust-case-against-google

Even if you wanted to and had a lot of money to throw at the problem, it's too late. If Apple made their own search engine and pointed iOS to it as the default and paid Firefox to make it the default, it still wouldn't matter. The AI Overview is a good enough answer for most questions and so convincing consumers to:

switch platforms
and go back to a two/three/four step process compared to a one step process is a waste of time.

I'm confident there will still be sites doing web searching, but I suspect given the explosion in AI generated slop it's going to be impossible to use them even if you wanted to. We're quickly reaching a point where it would be possible to generate a web page on demand, meaning the capacity of the slop-generation exceeds the capacity of humans to fight it.

Because we didn't regulate the internet, we're going to end up with an unbreakable monopoly on all human knowledge held by Microsoft and Google. Then because we didn't learn anything we're going to end up with a system that can produce false data on demand and make it impossible to fact check anything that the LLM companies return. Paid services like Kogi will be the only search engines worth trying.

Impact down the line

So I think you are going to see a rush of shutdowns and paywalls like you've never seen before. In some respects, it is going to be a return to the pre-Google internet, where it will once again be important that consumers know your domain name and go directly to your site. It's going to be a massive consolidation of the internet down and I think the ad-based economy of the modern web will collapse. Google was the ad broker, but now they're going to operate like Meta and keep the entire cycle inside their system.

My prediction is that this is going to basically destroy any small or medium sized business that attempts to survive with the model of "produce content, get paid per visitor through ads". Everything instead is going to get moved behind aggressive paywalls, blocking archive.org. You'll also see prices go way up for memberships. Access to raw, human produced information is going to be a premium product, not something for everyday people. Fake information will be free.

Anyone attempting to make an online store is gonna get mob-style shakedown. You can either pay Amazon to let consumers see your product or you can pay Google to have their LLM recommend your product or you can (eventually) pay OpenAI/Microsoft to do it. I also think these companies will use this opportunity to dramatically reprice their advertising offerings. I don't think it'll be cheap to get the AI Summary to recommend your frying pan.

I suspect there will be a brief spike in other forms of marketing spend, like podcasts, billboards, etc. When companies see the sticker shock from Google they're going to explore other avenues like social media spend, influencers, etc. But all those channels are going to be eaten by the LLM snake at the same time.

If consumers are willing to engage with an LLM-generated influencer, that'll be the direction companies go in because they'll be cheaper and more reliable. Podcast search results are gonna be flooded with LLM-generated shows and my guess is that they're going to take more of the market share than anyone wants to admit. Twitch streaming has already moved from seeing the person to seeing an anime-style virtual overlay where you don't see the persons face. There won't be a reason for an actual human to be involved in that process.

End Game

My prediction is that a lot of the places that employ technical people are going to disappear. FAANG isn't going to be hiring at anywhere near the same rate they were before, because they won't need to. I don't need 10,000 people maintaining relationships with ad sellers and ad buyers or any of the staff involved in the maintenance or improvement of those systems.

The internet is going to return to more of its original roots, which are niche fan websites you largely find through social media or word of mouth. These sites aren't going to be ad driven, they'll be membership driven. Very few of them are going to survive. Subscription fatigue is a real thing and the math of "it costs a lot of money to pay people to write high quality content" isn't going to go away.

In a relatively short period of time, it will go from "very difficult" to absolutely impossible to launch a new commercially viable website and have users organically discover that website. You'll have to block LLM scrapers and need a tremendous amount of money to get a new site bootstrapped. Welcome to the future, where asking a question costs $4.99 and you'll never be able to find out if the answer is right or not.

What Would a Kubernetes 2.0 Look Like

June 19, 2025

Around 2012-2013 I started to hear a lot in the sysadmin community about a technology called "Borg". It was (apparently) some sort of Linux container system inside of Google that ran all of their stuff. The terminology was a bit baffling, with something called a "Borglet" inside of clusters with "cells" but the basics started to leak. There was a concept of "services" and a concept of "jobs", where applications could use services to respond to user requests and then jobs to complete batch jobs that ran for much longer periods of time.

Then on June 7th, 2014, we got our first commit of Kubernetes. The Greek word for 'helmsman' that absolutely no one could pronounce correctly for the first three years. (Is it koo-ber-NET-ees? koo-ber-NEET-ees? Just give up and call it k8s like the rest of us.)

Microsoft, RedHat, IBM, Docker join the Kubernetes community pretty quickly after this, which raised Kubernetes from an interesting Google thing to "maybe this is a real product?" On July 21st 2015 we got the v1.0 release as well as the creation of the CNCF.

In the ten years since that initial commit, Kubernetes has become a large part of my professional life. I use it at home, at work, on side projects—anywhere it makes sense. It's a tool with a steep learning curve, but it's also a massive force multiplier. We no longer "manage infrastructure" at the server level; everything is declarative, scalable, recoverable and (if you’re lucky) self-healing.

But the journey hasn't been without problems. Some common trends have emerged, where mistakes or misconfiguration arise from where Kubernetes isn't opinionated enough. Even ten years on, we're still seeing a lot of churn inside of ecosystem and people stepping on well-documented landmines. So, knowing what we know now, what could we do differently to make this great tool even more applicable to more people and problems?

What did k8s get right?

Let's start with the positive stuff. Why are we still talking about this platform now?

Containers at scale

Containers as a tool for software development make perfect sense. Ditch the confusion of individual laptop configuration and have one standard, disposable concept that works across the entire stack. While tools like Docker Compose allowed for some deployments of containers, they were clunky and still required you as the admin to manage a lot of the steps. I set up a Compose stack with a deployment script that would remove the instance from the load balancer, pull the new containers, make sure they started and then re-added it to the LB, as did lots of folks.

K8s allowed for this concept to scale out, meaning it was possible to take a container from your laptop and deploy an identical container across thousands of servers. This flexibility allowed organizations to revisit their entire design strategy, dropping monoliths and adopting more flexible (and often more complicated) micro-service designs.

Low-Maintenance

If you think of the history of Operations as a sort of "naming timeline from pets to cattle", we started with what I affectionately call the "Simpsons" era. Servers were bare metal boxes set up by teams, they often had one-off names that became slang inside of teams and everything was a snowflake. The longer a server ran, the more cruft it picked up until it became a scary operation to even reboot them, much less attempt to rebuild them. I call it the "Simpsons" era because among the jobs I was working at the time, naming them after Simpsons characters was surprisingly common. Nothing fixed itself, everything was a manual operation.

Then we transition into the "01 Era". Tools like Puppet and Ansible have become common place, servers are more disposable and you start to see things like bastion hosts and other access control systems become the norm. Servers aren't all facing the internet, they're behind a load balancer and we've dropped the cute names for stuff like "app01" or "vpn02". Organizations designed it so they could lose some of their servers some of the time. However failures still weren't self-healing, someone still had to SSH in to see what broke, write up a fix in the tooling and then deploy it across the entire fleet. OS upgrades were still complicated affairs.

We're now in the "UUID Era". Servers exist to run containers, they are entirely disposable concepts. Nobody cares about how long a particular version of the OS is supported for, you just bake a new AMI and replace the entire machine. K8s wasn't the only technology enabling this, but it was the one that accelerated it. Now the idea of a bastion server with SSH keys that I go to the underlying server to fix problems is seen as more of a "break-glass" solution. Almost all solutions are "destroy that Node, let k8s reorganize things as needed, make a new Node".

A lot of the Linux skills that were critical to my career are largely nice to have now, not need to have. You can be happy or sad about that, I certainly switch between the two emotions on a regular basis, but it's just the truth.

Running Jobs

The k8s jobs system isn't perfect, but it's so much better than the "snowflake cron01 box" that was an extremely common sight at jobs for years. Running on a cron schedule or running from a message queue, it was now possible to reliably put jobs into a queue, have them get run, have them restart if they didn't work and then move on with your life.

Not only does this free up humans from a time-consuming and boring task, but it's also simply a more efficient use of resources. You are still spinning up a pod for every item in the queue, but your teams have a lot of flexibility inside of the "pod" concept for what they need to run and how they want to run it. This has really been a quality of life improvement for a lot of people, myself included, who just need to be able to easily background tasks and not think about them again.

Service Discoverability and Load Balancing

Hard-coded IP addresses that lived inside of applications as the template for where requests should be routed has been a curse following me around for years. If you were lucky, these dependencies weren't based on IP address but were actually DNS entries and you could change the thing behind the DNS entry without coordinating a deployment of a million applications.

K8s allowed for simple DNS names to call other services. It removed an entire category of errors and hassle and simplified the entire thing down. With the Service API you had a stable, long lived IP and hostname that you could just point things towards and not think about any of the underlying concepts. You even have concepts like ExternalName that allow you to treat external services like they're in the cluster.

What would I put in a Kubernetes 2.0?

Ditch YAML for HCL

YAML was appealing because it wasn't JSON or XML, which is like saying your new car is great because it's neither a horse nor a unicycle. It demos nicer for k8s, looks nicer sitting in a repo and has the illusion of being a simple file format. In reality. YAML is just too much for what we're trying to do with k8s and it's not a safe enough format. Indentation is error-prone, the files don't scale great (you really don't want a super long YAML file), debugging can be annoying. YAML has so many subtle behaviors outlined in its spec.

I still remember not believing what I was seeing the first time I saw the Norway Problem. For those lucky enough to not deal with it, the Norway Problem in YAML is when 'NO' gets interpreted as false. Imagine explaining to your Norwegian colleagues that their entire country evaluates to false in your configuration files. Add in accidental numbers from lack of quotes, the list goes on and on. There are much better posts on why YAML is crazy than I'm capable of writing: https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell

Why HCL?

HCL is already the format for Terraform, so at least we'd only have to hate one configuration language instead of two. It's strongly typed with explicit types. There's already good validation mechanisms. It is specifically designed to do the job that we are asking YAML to do and it's not much harder to read. It has built-in functions people are already using that would allow us to remove some of the third-party tooling from the YAML workflow.

I would wager 30% of Kubernetes clusters today are already being managed with HCL via Terraform. We don't need the Terraform part to get a lot of the benefits of a superior configuration language.

The only downsides are that HCL is slightly more verbose than YAML, and its Mozilla Public License 2.0 (MPL-2.0) would require careful legal review for integration into an Apache 2.0 project like Kubernetes. However, for the quality-of-life improvements it offers, these are hurdles worth clearing.

Why HCL is better

Let's take a simple YAML file.

# YAML doesn't enforce types
replicas: "3"  # String instead of integer
resources:
  limits:
    memory: 512  # Missing unit suffix
  requests:
    cpu: 0.5m    # Typo in CPU unit (should be 500m)

Even in the most basic example, there are footguns everywhere. HCL and the type system would catch all of these problems.

replicas = 3  # Explicitly an integer

resources {
  limits {
    memory = "512Mi"  # String for memory values
  }
  requests {
    cpu = 0.5  # Number for CPU values
  }
}

Take a YAML file like this that you probably have 6000 in your k8s repo. Now look at HCL without needing external tooling.

# Need external tools or templating for dynamic values
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  # Can't easily generate or transform values
  DATABASE_URL: "postgres://user:password@db:5432/mydb"
  API_KEY: "static-key-value"
  TIMESTAMP: "2023-06-18T00:00:00Z"  # Hard-coded timestamp

resource "kubernetes_config_map" "app_config" {
  metadata {
    name = "app-config"
  }
  
  data = {
    DATABASE_URL = "postgres://${var.db_user}:${var.db_password}@${var.db_host}:${var.db_port}/${var.db_name}"
    API_KEY      = var.api_key != "" ? var.api_key : random_string.api_key.result
    TIMESTAMP    = timestamp()
  }
}

resource "random_string" "api_key" {
  length  = 32
  special = false
}

Here's all the pros you get with this move.

Type Safety: Preventing type-related errors before deployment
Variables and References: Reducing duplication and improving maintainability
Functions and Expressions: Enabling dynamic configuration generation
Conditional Logic: Supporting environment-specific configurations
Loops and Iteration: Simplifying repetitive configurations
Better Comments: Improving documentation and readability
Error Handling: Making errors easier to identify and fix
Modularity: Enabling reuse of configuration components
Validation: Preventing invalid configurations
Data Transformations: Supporting complex data manipulations

Allow etcd swap-out

I know, I'm the 10,000 person to write this. Etcd has done a fine job, but it's a little crazy that it is the only tool for the job. For smaller clusters or smaller hardware configuration, it's a large use of resources in a cluster type where you will never hit the node count where it pays off. It's also a strange relationship between k8s and etcd now, where k8s is basically the only etcd customer left.

What I'm suggesting is taking the work of kine and making it official. It makes sense for the long-term health of the project to have the ability to plug in more backends, adding this abstraction means it (should) be easier to swap in new/different backends in the future and it also allows for more specific tuning depending on the hardware I'm putting out there.

What I suspect this would end up looking like is much like this: https://github.com/canonical/k8s-dqlite. Distributed SQlite in-memory with Raft consensus and almost zero upgrade work required that would allow cluster operators to have more flexibility with the persistence layer of their k8s installations. If you have a conventional server setup in a datacenter and etcd resource usage is not a problem, great! But this allows for lower-end k8s to be a nicer experience and (hopefully) reduces dependence on the etcd project.

Beyond Helm: A Native Package Manager

Helm is a perfect example of a temporary hack that has grown to be a permanent dependency. I'm grateful to the maintainers of Helm for all of their hard work, growing what was originally a hackathon project into the de-facto way to install software into k8s clusters. It has done as good a job as something could in fulfilling that role without having a deeper integration into k8s.

All that said, Helm is a nightmare to use. The Go templates are tricky to debug, often containing complex logic that results in really confusing error scenarios. The error messages you get from those scenarios are often gibberish. Helm isn't a very good package system because it fails at some of the basic tasks you need a package system to do, which are transitive dependencies and resolving conflicts between dependencies.

What do I mean?

Tell me what this conditional logic is trying to do:

# A real-world example of complex conditional logic in Helm
{{- if or (and .Values.rbac.create .Values.serviceAccount.create) (and .Values.rbac.create (not .Values.serviceAccount.create) .Values.serviceAccount.name) }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ template "myapp.fullname" . }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
{{- end }}

Or if I provide multiple values files to my chart, which one wins:

helm install myapp ./mychart -f values-dev.yaml -f values-override.yaml --set service.type=NodePort

Ok, what if I want to manage my application and all the application dependencies with a Helm chart. This makes sense, I have an application that itself has dependencies on other stuff so I want to put them all together. So I define my sub-charts or umbrella charts inside of my Chart.yaml.

dependencies:
- name: nginx
  version: "1.2.3"
  repository: "<https://example.com/charts>"
- name: memcached
  version: "1.2.3"
  repository: "<https://another.example.com/charts>"

But assuming I have multiple applications, it's entirely possible that I have 2 services both with a dependency on nginx or whatever like this:

Helm doesn't handle this situation gracefully because template names are global with their templates loaded alphabetically. Basically you need to:

Don't declare a dependency on the same chart more than once (hard to do for a lot of microservices)
If you do have the same chart declared multiple times, has to use the exact same version

The list of issues goes on and on.

Cross-Namespace installation stinks
Chart verification process is a pain and nobody uses it

Let's just go to the front page of artifacthub:

I'll grab elasticsearch cause that seems important.

Seems pretty bad for the Official Elastic helm chart. Certainly ingress-nginx will be right, it's an absolute critical dependency for the entire industry.

Nope. Also how is the maintainer of the chart "Kubernetes" and it's still not marked as a verified publisher. Like Christ how much more verified does it get.

No metadata in chart searching. You can only search by name and description, not by features, capabilities, or other metadata.

Helm doesn't strictly enforce semantic versioning

# Chart.yaml with non-semantic version
apiVersion: v2
name: myapp
version: "v1.2-alpha"

If you uninstall and reinstall a chart with CRDs, it might delete resources created by those CRDs. This one has screwed me multiple times and is crazy unsafe.

I could keep writing for another 5000 words and still wouldn't have outlined all the problems. There isn't a way to make Helm good enough for the task of "package manager for all the critical infrastructure on the planet".

What would a k8s package system look like?

Let's call our hypothetical package system KubePkg, because if there's one thing the Kubernetes ecosystem needs, it's another abbreviated name with a 'K' in it. We would try to copy as much of the existing work inside the Linux ecosystem while taking advantage of the CRD power of k8s. My idea looks something like this:

The packages are bundles like a Linux package:

There's a definition file that accounts for as many of the real scenarios that you actually encounter when installing a thing.

apiVersion: kubepkg.io/v1
kind: Package
metadata:
  name: postgresql
  version: 14.5.2
spec:
  maintainer:
    name: "PostgreSQL Team"
    email: "[email protected]"
  description: "PostgreSQL database server"
  website: "https://postgresql.org"
  license: "PostgreSQL"
  
  # Dependencies with semantic versioning
  dependencies:
    - name: storage-provisioner
      versionConstraint: ">=1.0.0"
    - name: metrics-collector
      versionConstraint: "^2.0.0"
      optional: true
  
  # Security context and requirements
  security:
    requiredCapabilities: ["CHOWN", "SETGID", "SETUID"]
    securityContextConstraints:
      runAsUser: 999
      fsGroup: 999
    networkPolicies:
      - ports:
        - port: 5432
          protocol: TCP
    
  # Resources to be created (embedded or referenced)
  resources:
    - apiVersion: v1
      kind: Service
      metadata:
        name: postgresql
      spec:
        ports:
        - port: 5432
    - apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: postgresql
      spec:
        # StatefulSet definition
  
  # Configuration schema using JSON Schema
  configurationSchema:
    type: object
    properties:
      replicas:
        type: integer
        minimum: 1
        default: 1
      persistence:
        type: object
        properties:
          size:
            type: string
            pattern: "^[0-9]+[GMK]i$"
            default: "10Gi"
  
  # Lifecycle hooks with proper sequencing
  hooks:
    preInstall:
      - name: database-prerequisites
        job:
          spec:
            template:
              spec:
                containers:
                - name: init
                  image: postgres:14.5
    postInstall:
      - name: database-init
        job:
          spec:
            # Job definition
    preUpgrade:
      - name: backup
        job:
          spec:
            # Backup job definition
    postUpgrade:
      - name: verify
        job:
          spec:
            # Verification job definition
    preRemove:
      - name: final-backup
        job:
          spec:
            # Final backup job definition
  
  # State management for stateful applications
  stateManagement:
    backupStrategy:
      type: "snapshot"  # or "dump"
      schedule: "0 2 * * *"  # Daily at 2 AM
      retention:
        count: 7
    recoveryStrategy:
      type: "pointInTime"
      verificationJob:
        spec:
          # Job to verify recovery success
    dataLocations:
      - path: "/var/lib/postgresql/data"
        volumeMount: "data"
    upgradeStrategies:
      - fromVersion: "*"
        toVersion: "*"
        strategy: "backup-restore"
      - fromVersion: "14.*.*"
        toVersion: "14.*.*"
        strategy: "in-place"

There's a real signing process that would be required and allow you more control over the process.

apiVersion: kubepkg.io/v1
kind: Repository
metadata:
  name: official-repo
spec:
  url: "https://repo.kubepkg.io/official"
  type: "OCI"  # or "HTTP"
  
  # Verification settings
  verification:
    publicKeys:
      - name: "KubePkg Official"
        keyData: |
          -----BEGIN PUBLIC KEY-----
          MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAvF4+...
          -----END PUBLIC KEY-----
    trustPolicy:
      type: "AllowList"  # or "KeyRing"
      allowedSigners:
        - "KubePkg Official"
        - "Trusted Partner"
    verificationLevel: "Strict"  # or "Warn", "None"

Like how great would it be to have something where I could automatically update packages without needing to do anything on my side.

apiVersion: kubepkg.io/v1
kind: Installation
metadata:
  name: postgresql-main
  namespace: database
spec:
  packageRef:
    name: postgresql
    version: "14.5.2"
  
  # Configuration values (validated against schema)
  configuration:
    replicas: 3
    persistence:
      size: "100Gi"
    resources:
      limits:
        memory: "4Gi"
        cpu: "2"
  
  # Update policy
  updatePolicy:
    automatic: false
    allowedVersions: "14.x.x"
    schedule: "0 2 * * 0"  # Weekly on Sunday at 2am
    approvalRequired: true
  
  # State management reference
  stateRef:
    name: postgresql-main-state
    
  # Service account to use
  serviceAccountName: postgresql-installer

What k8s needs is a system that meets the following requirements:

True Kubernetes Native: Everything is a Kubernetes resource with proper status and events
First-Class State Management: Built-in support for stateful applications
Enhanced Security: Robust signing, verification, and security scanning
Declarative Configuration: No templates, just structured configuration with schemas
Lifecycle Management: Comprehensive lifecycle hooks and upgrade strategies
Dependency Resolution: Linux-like dependency management with semantic versioning
Audit Trail: Complete history of changes with who, what, and when, not what Helm currently provides.
Policy Enforcement: Support for organizational policies and compliance.
Simplified User Experience: Familiar Linux-like package management commands. It seems wild that we're trying to go a different direction from the package systems that have worked for decades.

IPv6 By Default

Try to imagine, across the entire globe, how much time and energy has been invested in trying to solve any one of the following three problems.

I need this pod in this cluster to talk to that pod in that cluster.
There is a problem happening somewhere in the NAT traversal process and I need to solve it
I have run out of IP addresses with my cluster because I didn't account for how many you use. Remember: A company starting with a /20 subnet (4,096 addresses), deploys 40 nodes with 30 pods each, and suddenly realizes they're approaching their IP limit. Not that many nodes!

I am not suggesting the entire internet switches over to IPv6 and right now k8s happily supports IPv6-only if you want and a dualstack approach. But I'm saying now is the time to flip the default and just go IPv6. You eliminate a huge collection of problems all at once.

Flatter, less complicated network topology inside of the cluster.
The distinction between multiple clusters becomes a thing organizations can choose to ignore if they want if they want to get public IPs.
Easier to understand exactly the flow of traffic inside of your stack.
Built-in IPSec

It has nothing to do with driving IPv6 adoption across the entire globe and just an acknowledgement that we no longer live in a world where you have to accept the weird limitations of IPv4 in a universe where you may need 10,000 IPs suddenly with very little warning.

The benefits for organizations with public IPv6 addresses is pretty obvious, but there's enough value there for cloud providers and users that even the corporate overlords might get behind it. AWS never needs to try and scrounge up more private IPv4 space inside of a VPC. That's gotta be worth something.

Conclusion

The common rebuttal to these ideas is, "Kubernetes is an open platform, so the community can build these solutions." While true, this argument misses a crucial point: defaults are the most powerful force in technology. The "happy path" defined by the core project dictates how 90% of users will interact with it. If the system defaults to expecting signed packages and provides a robust, native way to manage them, that is what the ecosystem will adopt.

This is an ambitious list, I know. But if we're going to dream, let's dream big. After all, we're the industry that thought naming a technology 'Kubernetes' would catch on, and somehow it did!

We see this all the time in other areas like mobile developer and web development, where platforms assess their situation and make radical jumps forward. Not all of these are necessarily projects that the maintainers or companies would take on but I think they're all ideas that someone should at least revisit and think "is it worth doing now that we're this nontrivial percentage of all datacenter operations on the planet"?

Questions/feedback/got something wrong? Find me here: https://c.im/@matdevdug

Simple Python Script for FTP Uploads

June 12, 2025

So awhile ago I purchased a Tp-link AX3000 wireless router as a temporary same-day fix to a dying AP. Of course, like all temporary fixes, this one ended up being super permanent. It's a fine wireless router, nothing interesting to report, but one of the features I stumbled upon when I was clicking around the webUI seemed like a great solution for a place to stick random files.

Inside of Advanced Settings, you'll see this pane:

You have a few options for how to expose this USB drive:

I actually didn't find the SMB to work that well in my testing, seemingly disconnecting all the time. But FTP works pretty well. So that's what I ended up using, which was fine except seemingly randomly files were getting corrupted when I moved them over.

FAT32 will never die

Looking at the failing files, I realized they were all over 4 GB and thought "there's no way in 2025 they are formatting this external drive in FAT32, right?" To be clear, I didn't partition this drive. The router offered to wipe it when I plugged it in and I said sure.

However that is exactly what they are doing, which means we have a file size limit of 4 GB per file. This explained the transfer problems and, while still annoying, is not a complicated thing to work around.

Script to transfer stuff to the local FTP

Notes:

LOCAL_DIR will obviously need to get changed
FTP_HOST has a different IP range than the default router range because of specific stuff for me. You'll need to check that.
FTP_PASS required an email address format. I don't know why.
The directory of "G" was assigned to me by the router, so I assume this is a common convention with these routers. I don't know why it puts a directory inside of the drive instead of writing out to the root of the drive. Presumably some Windows convention.

#!/usr/bin/env python3
import os
import ftplib
import hashlib
import sys
import time

# --- Configuration ---
# Adjust these settings to match your environment
LOCAL_DIR = "/mnt/usb/test"
FTP_HOST = "192.168.86.1"
FTP_USER = "anonymous"
FTP_PASS = "[email protected]"
FTP_TARGET_DIR = "G"

# Do not transfer files larger than this size (4 GiB - 1 byte).
MAX_FILE_SIZE = 4294967295

# The size of chunks to use for reading and hashing files.
CHUNK_SIZE = 8192

# --- Helper Functions ---

def get_file_hash(file_path):
    sha256 = hashlib.sha256()
    try:
        with open(file_path, "rb") as f:
            # Read the file in chunks to handle large files efficiently
            for byte_block in iter(lambda: f.read(CHUNK_SIZE), b""):
                sha256.update(byte_block)
        return sha256.hexdigest()
    except IOError as e:
        print(f"  - Error reading file for hashing: {e}")
        return None

def ftp_makedirs(ftp, path):
    """Recursively creates a directory structure on the FTP server."""
    parts = path.strip('/').split('/')
    current_dir = ''
    for part in parts:
        current_dir += '/' + part
        try:
            ftp.mkd(current_dir)
            print(f"  - Created remote directory: {current_dir}")
        except ftplib.error_perm as e:
            # Error 550 often means the directory already exists.
            if "550" in str(e):
                pass
            else:
                print(f"  - FTP error while creating directory {current_dir}: {e}")
                raise

def upload_and_verify(ftp, local_path, remote_filename):
    """
    Uploads a file, verifies its integrity via hashing, and deletes
    the local file upon successful verification.
    """
    
    print(f"  - Calculating hash for local file: {local_path}")
    local_hash = get_file_hash(local_path)
    if not local_hash:
        return False

    print(f"  - Uploading to '{ftp.pwd()}/{remote_filename}'...")
    try:
        with open(local_path, 'rb') as f:
            ftp.storbinary(f'STOR {remote_filename}', f, CHUNK_SIZE)
        print("  - Upload complete.")
    except ftplib.all_errors as e:
        print(f"  - !!! Upload failed: {e}")
        return False

    print("  - Verifying remote file integrity...")
    remote_hash = ""
    try:
        sha256_remote = hashlib.sha256()
        ftp.retrbinary(f'RETR {remote_filename}', sha256_remote.update, CHUNK_SIZE)
        remote_hash = sha256_remote.hexdigest()
    except ftplib.all_errors as e:
        print(f"  - !!! Verification failed. Could not download remote file: {e}")
        return False

    # 4. Compare hashes and delete local file if they match
    print(f"  - Local  Hash: {local_hash}")
    print(f"  - Remote Hash: {remote_hash}")
    if local_hash == remote_hash:
        print("  - ✅ Integrity check PASSED. Hashes match.")
        try:
            os.remove(local_path)
            print(f"  - Successfully deleted local file: {local_path}")
            return True
        except OSError as e:
            print(f"  - !!! Error deleting local file: {e}")
            return False
    else:
        print("  - ❌ Integrity check FAILED. Hashes do not match.")
        print("  - Deleting corrupt file from FTP server...")
        try:
            ftp.delete(remote_filename)
            print(f"  - Remote file '{remote_filename}' deleted.")
        except ftplib.all_errors as e:
            print(f"  - !!! Could not delete corrupt remote file: {e}")
        return False

def main():
    print("Starting FTP transfer process...")
    if not os.path.isdir(LOCAL_DIR):
        print(f"Error: Local directory '{LOCAL_DIR}' does not exist.")
        sys.exit(1)

    print("\n--- Pass 1: Scanning for oversized files ---")
    dirs_to_skip = set()
    for root, _, files in os.walk(LOCAL_DIR):
        for filename in files:
            file_path = os.path.join(root, filename)
            try:
                if os.path.getsize(file_path) > MAX_FILE_SIZE:
                    print(f"Oversized file found: {file_path}")
                    print(f" -> Marking directory for skipping: {root}")
                    dirs_to_skip.add(root)
                    break
            except OSError as e:
                print(f"Could not access {file_path}: {e}")
    print("--- Pass 1 Complete ---")

    # --- Pass 2: Connect to FTP and transfer valid files ---
    print("\n--- Pass 2: Connecting to FTP and transferring files ---")
    try:
        with ftplib.FTP(FTP_HOST, timeout=30) as ftp:
            ftp.login(FTP_USER, FTP_PASS)
            print(f"Successfully connected to {FTP_HOST}.")

            # Create and change to the base target directory
            ftp_makedirs(ftp, FTP_TARGET_DIR)
            ftp.cwd(FTP_TARGET_DIR)
            print(f"Changed to remote base directory: {ftp.pwd()}")

            for root, dirs, files in os.walk(LOCAL_DIR, topdown=True):
                # Check if the current directory is in our skip list
                if any(root.startswith(skip_dir) for skip_dir in dirs_to_skip):
                    print(f"\n⏭️  Skipping directory and its contents: {root}")
                    dirs[:] = []  # Prune subdirectories from the walk
                    continue

                print(f"\nProcessing directory: {root}")

                # Create the corresponding directory structure on the FTP server
                remote_subdir = os.path.relpath(root, LOCAL_DIR)
                remote_path = ftp.pwd()
                if remote_subdir != '.':
                    remote_path = os.path.join(ftp.pwd(), remote_subdir).replace("\\", "/")
                    ftp_makedirs(ftp, remote_path)

                # Process each file in the current valid directory
                for filename in files:
                    local_file_path = os.path.join(root, filename)

                    if os.path.getsize(local_file_path) > MAX_FILE_SIZE:
                        continue

                    print(f"\n📂 Processing file: {filename}")

                    ftp.cwd(remote_path)

                    upload_and_verify(ftp, local_file_path, filename)

    except ftplib.all_errors as e:
        print(f"\nFTP Error: {e}")
    except Exception as e:
        print(f"\nAn unexpected error occurred: {e}")

    print("\nProcess complete.")

if __name__ == "__main__":
    main()

Now I have an easy script that I can use to periodically offload directories full of files that I don't know if I want to delete yet, but I don't want on my laptop.

Write your own Ghost Theme

June 05, 2025 in ghost

In general, Ghost CMS has been a good tool for me. I've been pleased by the speed and reliability of the platform, with the few problems I have run into being fixed by the Ghost team pretty quickly. From the very beginning though I've struggled with the basic approach of the Ghost platform.

At its core, the Ghost CMS tool is a newsletter platform. This makes sense, it's how small content creators actually generate revenue. But I don't need any of that functionality, as I don't want to capture a bunch of users email addresses. I'm lucky enough to not need the $10 a month it costs to host this website on my own and I'd rather not have to think about who I would need to notify if my database got breached.

But it means that most of the themes for Ghost are completely cluttered with junk I don't need. I started working on my own CMS, but other than the more simplistic layout, I couldn't think of anything my CMS did that was better than Ghost or Wordpress. There was less code, but it was code I was going to have to maintain. After going through the source for a bunch of Ghost themes, I realized I could probably get where I wanted to go through the theme work alone.

I didn't find a ton of resources on how to actually crank out a theme, so I figured I would write up the base outline I sketched out as I worked.

Make your own Ghost theme

So Ghost uses the Handlebars library to make templates. Here's the basic layout:

/your-theme-name/
|
├── /assets/
|   ├── /css/
|   |   └── screen.css
|   ├── /js/
|   |   └── main.js
|   └── /fonts/
|       └── ...
|
├── /partials/
|   ├── header.hbs
|   ├── footer.hbs
|   └── ...
|
├── default.hbs
├── index.hbs
├── post.hbs
├── page.hbs
├── tag.hbs
├── author.hbs
└── package.json

This is what they all do:

package.json(required): The theme's "ID card." This JSON file contains metadata like the theme's name, version, author, and crucial configuration settings such as the number of posts per page.
default.hbs(optional but probably required): The main base template. Think of it as the master "frame" for your site. It typically contains the , , tags, your site-wide header and footer, and the crucial {{ghost_head}} and {{ghost_foot}} helpers. All other templates are injected into the {{{body}}} tag of this file.
index.hbs(required): The main template for listing your posts. It's used for your homepage by default and will also be used for tag and author archives if tag.hbs and author.hbs don't exist. It uses the {{#foreach posts}} helper to loop through and display your articles.
post.hbs(required): The template for a single post. When a visitor clicks on a post title from your index.hbs page, Ghost renders the content using this file. It uses the {{#post}} block helper to access all the post's data (title, content, feature image, etc.).
/partials/ (directory): This folder holds reusable snippets of template code, known as partials. It's perfect for elements that appear on multiple pages, like your site header, footer, sidebar, or a newsletter sign-up form. You include them in other files using {{> filename}}
/assets/ (directory): This is where you store all your static assets. It's organized into sub-folders for your CSS stylesheets, JavaScript files, fonts, and images used in the theme's design. You link to these assets using the {{asset}} helper (e.g., {{asset "css/screen.css"}}).
page.hbs (Optional): A template specifically for static pages (like an "About" or "Contact" page). If this file doesn't exist, Ghost will use post.hbs to render static pages instead.
tag.hbs (Optional): A dedicated template for tag archive pages. When a user clicks on a tag, this template will be used to list all posts with that tag. If it's not present, Ghost falls back to index.hbs.
author.hbs (optional): A dedicated template for author archive pages. This lists all posts by a specific author. If it's not present, Ghost falls back to index.hbs

How It All Fits Together: The Template Hierarchy

Ghost uses a logical hierarchy to decide which template to render for a given URL. This allows you to create specific designs for different parts of your site while having sensible defaults.

The Request: A visitor goes to a URL on your site (e.g., your homepage, a post, or a tag archive).
Context is Key: Ghost determines the "context" of the URL. Is it the homepage? A single post? A list of posts by an author?
Find the Template: Ghost looks for the most specific template file for that context.
- Visiting /tag/travel/? Ghost looks for tag.hbs. If it doesn't find it, it uses index.hbs.
- Visiting a static page like /about/? Ghost looks for page.hbs. If it's not there, it uses post.hbs.
Inject into the Frame: Once the correct template is found (e.g., post.hbs), Ghost renders it and injects the resulting HTML into the {{{body}}} helper inside your default.hbs file.

This system provides a clean separation of concerns, making your theme easy to manage and update. You can start with just the three required files (package.json, index.hbs, post.hbs) and add more specific templates as your design requires them.

Source code for this theme

You are more than welcome to use this theme as a starting point. The only part that was complex was the "Share with Mastodon" button that you see, which frankly I'm still not thrilled with. I wish there was a less annoying way to do it than prompting the user for their server, but I can't think of anything.

Checking your theme

So Ghost actually has an amazing checking tool for seeing if your theme will work available here: https://gscan.ghost.org/. It tells you all the problems and missing pieces from your theme and really helped me iterate quickly on the design. Just zip up the theme, upload it and you'll get back a nicely formatted list of problems.

Anyway I found the process of writing my own theme to be surprisingly fun. Hopefully folks like how it looks, but if you hate it I'm still curious to hear why.

TIL Simple Merge of two CSVs with Python

May 06, 2025 in TIL

I generate a lot of CSVs for my jobs, mostly as a temporary storage mechanism for data. So I make report A about this thing, I make report B for that thing and then I produce some sort of consumable report for the organization at large. Part of this is merging the CSVs so I don't need to overload each scripts to do all the pieces.

For a long time I've done this in Excel/LibreOffice, which totally works. But I recently sat down with the pandas library and I had no idea how easy it is use for this particular use case. Turns out this is a pretty idiot-proof way to do the same thing without needing to deal with the nightmare that is Excel.

Steps to Run

Make sure Python is installed
Run python3.13 -m venv venv
source venv/bin/activate
pip install pandas
Change file_one to the first file you want to consider. Same with file_two
The most important thing to consider here: I only want the output if the value in the column is in BOTH files. If you want all the output from file_one and then enrich it with the values from file_two if it is present, change how='inner' to how='left'

import pandas as pd
import os

# Define the filenames
file_one = 'one.csv'
file_two = 'two.csv'
output_file = 'combined_report.csv'

# Define the column names to use for joining
# These should match the headers in your CSVs exactly
deploy_join_col = 'Deployment Name'
stacks_join_col = 'name'

try:
    # Check if input files exist
    if not os.path.exists(file_one):
        raise FileNotFoundError(f"Input file not found: {file_one}")
    if not os.path.exists(file_two):
        raise FileNotFoundError(f"Input file not found: {file_two}")

    # Read the CSV files into pandas DataFrames
    print(f"Reading {file_one}...")
    df_deploy = pd.read_csv(file_one)
    print(f"Read {len(df_deploy)} rows from {file_one}")

    print(f"Reading {file_two}...")
    df_stacks = pd.read_csv(file_two)
    print(f"Read {len(df_stacks)} rows from {file_two}")

    # --- Data Validation (Optional but Recommended) ---
    if deploy_join_col not in df_deploy.columns:
        raise KeyError(f"Join column '{deploy_join_col}' not found in {file_one}")
    if stacks_join_col not in df_stacks.columns:
        raise KeyError(f"Join column '{stacks_join_col}' not found in {file_two}")
    # ----------------------------------------------------

    # Perform the inner merge based on the specified columns
    # 'how="inner"' ensures only rows with matching keys in BOTH files are included
    # left_on specifies the column from the left DataFrame (df_deploy)
    # right_on specifies the column from the right DataFrame (df_stacks)
    print(f"Merging dataframes on '{deploy_join_col}' (from deployment) and '{stacks_join_col}' (from stacks)...")
    df_combined = pd.merge(
        df_deploy,
        df_stacks,
        left_on=deploy_join_col,
        right_on=stacks_join_col,
        how='inner'
    )
    print(f"Merged dataframes, resulting in {len(df_combined)} combined rows.")

    # Sort the combined data by the join column for grouping
    # You can sort by either join column name as they are identical after the merge
    print(f"Sorting combined data by '{deploy_join_col}'...")
    df_combined = df_combined.sort_values(by=deploy_join_col)
    print("Data sorted.")

    # Write the combined and sorted data to a new CSV file
    # index=False prevents pandas from writing the DataFrame index as a column
    print(f"Writing combined data to {output_file}...")
    df_combined.to_csv(output_file, index=False)
    print(f"Successfully created {output_file}")

except FileNotFoundError as e:
    print(f"Error: {e}")
except KeyError as e:
     print(f"Error: Expected column not found in one of the files. {e}")
     print(f"Please ensure the join columns ('{deploy_join_col}' and '{stacks_join_col}') exist and are spelled correctly in your CSV headers.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Just a super easy to hook up script that has saved me a ton of time from having to muck around with Excel.

GitHub Copilot for Vim Review

May 01, 2025 in AI

The impact of Large Language Models (LLMs) on the field of software development is arguably one of the most debated topics in developer circles today, sparking discussions at meetups, in lunchrooms, and even during casual chats among friends. I won't attempt to settle that debate definitively in this post, largely because I lack the foresight required. My track record for predicting the long-term success or failure of new technologies is, frankly, about as accurate as a coin flip. In fact, if I personally dislike a technology, it seems destined to become an industry standard.

However, I do believe I'm well-positioned to weigh in on a much more specific question: Is GitHub Copilot beneficial for me within my primary work environment, Vim? I've used Vim extensively as my main development tool for well over a decade, spending roughly 4-5 hours in it daily, depending on my meeting schedule. My work within Vim involves a variety of technologies, including significant amounts of Python, Golang, Terraform, and YAML. Therefore, while I can't provide a universal answer to whether an LLM is right for you, I can offer concrete opinions based on my direct experience with GitHub Copilot as a dedicated Vim user today.

Testing

So just to prove I really set it up:

It's a real test, I've been using it every day for this time period. I have it set up in what I believe to be the "default configuration".

The Vim plugin I'm using is the official one located here: https://github.com/github/copilot.vim

How (I think) the plugin works

The plugin uses Vimscript to capture the current state of the editor. That includes stuff like:

The entire content of the current buffer (the file being edited).
The current cursor position within the buffer.
The file type or programming language of the current buffer.

The Node.js language server receives the request from the Vim/Neovim plugin. It processes the provided context and constructs a request to the actual GitHub Copilot API running on GitHub's servers. This request includes the code context and other relevant information needed by the Copilot AI model to generate suggestions.

The plugin receives the suggestions from the language server. It then integrates these suggestions into the Vim or Neovim interface, typically displaying them as "ghost text" inline with the user's code or in a separate completion window, depending on the plugin's configuration and the editor's capabilities.

How it feels to use

As you can tell from the output of vim --startuptime vim.log the plugin is actually pretty performant and doesn't add a notable time to my initial launch.

In terms of the normal usage, it works like it says on the box. You start typing and it shows the next line it thinks you might be writing.

The suggestions don't do much on their own. Basically the tool isn't smart enough to even keep track of what it has already suggested. So in this case I've just tab completed and taken all the suggestions and you can tell it immediately gets stuck in a loop.

Now you can use it to "vibe code" inside of Vim. That works by writing a comment describing what you want to do and then just tab accepting the whole block of code. So for example I wrote Write a new function to check if the JWT is encrypted or not. It produced the following.

So I made a somewhat misleading comment on purpose. I was trying to get it to write a function to see if a JWT was actually a JWE. Now this python code is (obviously) wrong. The code is_jwt_encrypted assumes the token will always have exactly three parts separated by dots (header, payload, signature). This is the structure of a standard JSON Web Token (JWT). However, a JSON Web Encryption (JWE), which is what a wrapped encrypted JWT is, has five parts:

Protected Header
Encrypted Key
Initialization Vector
Ciphertext
Authentication Tag

So this gives you a rough idea of the quality of the code snippets it produces. If you are writing something dead simple, the autogenerate will often work and can save you time. However go even a little bit off the golden path and, while Copilot will always give it a shot, the quality is all over the place.

Scores Based on Common Tasks

Reviewing a product like this is extremely hard because it does everything all the time and changes daily with no notice. I've had weeks where it seems like the Copilot intelligence gets cranked way up and weeks where its completely brain dead. However I will go through some common tasks I have to do all the time and rank it on how well it does.

Parsing JSON

90/100

This is probably the thing Copilot is best at. You have a JSON that you are getting from some API and then Copilot helps you fill in the parsing for that so you don't need to type the whole thing out. So just by filling in my imports it already has a good idea of what I'm thinking about here.

So in this example I write the comment with the example JSON object and then it fills in the rest. This code is....ok. However I'd like it to probably check the json_data to see if it matches the expectation before it parses. Changing the comment however changes the code.

This is very useful for me as someone who often needs to consume JSONs from source A and then send JSONs on to target B. Saves a lot of time and I think the quality looks totally acceptable to me. Some notes though:

Python Types greatly improve the quality of the suggestions
You need to check to make sure it doesn't truncate the list. Sometimes Copilot will "give up" like 80% through writing out all the items. It doesn't often make up ones, which is nice, but you do need to make sure everything you expected to be there ends up getting listed.

Database Operations

40/100

I work a lot with databases, like everyone on Earth does. Copilot definitely understands the concepts of databases but your experience can vary wildly depending on what you write and the mood it is in.

I mean this is sort of borderline making fun of me. Obviously I don't want to just check if the file named that exists?

This is better but it's still not good. If there is a file sitting there with the right name that isn't a database, sqlite3.connect will just make it. The except sqlite3.Error part is super shitty. Obviously that's not what I want to do. I probably want to at least log something?

Let me show another example. I wrote Write a method to create a table in the SQLite database if it does not already exist with the specified schema. Then I typed user_ID UUID and let it fill in the rest.

Not great. What it ended up making was even worse.

We're missing error handling, no try/finally blocks with the connection cursor, etc. This is pretty shitty code. My experience is it doesn't get much better the more you use. Some tips:

If you write out the SQL in the comments then you will have a way better time.

CREATE TABLE users (
	contact_id INTEGER PRIMARY KEY,
	first_name TEXT NOT NULL,
	last_name TEXT NOT NULL,
	email TEXT NOT NULL UNIQUE,
	phone TEXT NOT NULL UNIQUE
);

Just that alone seems to make it a lot happier.

Still not amazing but at least closer to correct.

Writing Terraform

70/100

Not much to report with Terraform.

So why the 70/100? I've had a lot of frustrations with Copilot hallucinations with Terraform where it will simply insert arguments that don't exist. I can't reliably reproduce it, but this is something that can really burn a lot of time when you hit it.

My advice with Terraform is to run something like terrascan after which will often catch weird stuff it inserts. https://github.com/tenable/terrascan

However I will admit it saves me a lot of time, especially when writing stuff that is mind-numbing like 1000 DNS entries. So easily worth the risk on this one.

Tips:

Make sure you use the let g:copilot_workspace_folders = ['/path/to/my/project1', '/path/to/another/project2']
That seems to ground the LLM with the rest of the code and allows it to detect things like "what is the cloud account you are using".

Writing Golang

0/100

This is a good summary of my experience with Copilot with Golang.

I don't know why. It will work fine for awhile and then at some point, roughly when the golang file hits around 300-400 lines, seems to just lose it. Maybe there's another plugin I have that's causing a problem with Copilot and Golang, maybe I'm holding it wrong, I have no idea.

There's nothing in the logs I can find that would explain why it seems to break on Golang. I'm not going to file a bug report because I don't consider this my job to fix.

Summary

Is Copilot worth $10 a month? I think that really depends on what your day looks like. If you are someone who is:

Writing microservices where the total LoC rarely exceeds 1000 per microservice
Spends a lot of your time consuming and producing JSONs for other services to receive
Are capable of checking SQL queries and confirming how they need to be fixed
Has good or great test coverage

Then I think this tool might be worth the money. However if your day looks like this:

Spends most of your day inside of a monolith or large codebase carefully adding new features or slightly modifying old features
Doesn't have any or good test coverage
Doesn't have a good database migration strategy.

I'd say stay far away from Copilot for Vim. It's going to end up causing you serious problems that are going to be hard to catch.