Why Kubernetes needs an LTS

December 04, 2023

There is no denying that containers have taken over the mindset of most modern teams. With containers, comes the need to have orchestration to run those containers and currently there is no real alternative to Kubernetes. Love it or hate it, it has become the standard platform we have largely adopted as an industry. If you exceed the size of docker-compose, k8s is the next step in that journey.

Despite the complexity and some of the hiccups around deploying, most organizations that use k8s that I've worked with seem to have positive feelings about it. It is reliable and the depth and width of the community support means you are never the first to encounter a problem. However Kubernetes is not a slow-moving target by infrastructure standards.

Kubernetes follows an N-2 support policy (meaning that the 3 most recent minor versions receive security and bug fixes) along with a 15-week release cycle. This results in a release being supported for 14 months (12 months of support and 2 months of upgrade period). If we compare that to Debian, the OS project a lot of organizations base their support cycles on, we can see the immediate difference.

Red Hat, whose entire existence is based on organizations not being able to upgrade often, shows you at what cadence some orgs can roll out large changes.

Now if Kubernetes adopted this cycle across OSS and cloud providers, I would say "there is solid evidence that it can be done and these clusters can be kept up to date". However cloud providers don't hold their customers to these extremely tight time windows. GCP, who has access to many of the Kubernetes maintainers and works extremely closely with the project, doesn't hold customers to anywhere near these timelines.

Neither does AWS or Azure. The reality is that nobody expects companies to keep pace with that cadence of releases because the tooling to do so doesn't really exist. Validating that a cluster can be upgraded and that it is safe to do so requires the use of third-party tooling or to have a pretty good understanding of what APIs are getting deprecated when. Add in time for validating in staging environments along with the sheer time involved in babysitting a Kubernetes cluster upgrade and a clear problem emerges.

What does upgrading a k8s cluster even look like?

For those unaware of what a manual upgrade looks like, this is the rough checklist.

Check all third-party extensions such as network and storage plugins
Update etcd (all instances)
Update kube-apiserver (all control plane hosts)
Update kube-controller-manager
Update kube-scheduler
Update the cloud controller manager, if you use one
Update kubectl
Drain every node and either replace the node or upgrade the node and then readd and monitor to ensure it continues to work
Run kubectl convert as required on manifests

None of this is rocket science and all of it can be automated, but it still requires someone to effectively be super on top of these releases. Most importantly it is not substantially easier than making a new cluster. If upgrading is, at best, slightly easier than making a new cluster and often quite a bit harder, teams can get stuck unsure what is the correct course of action. However given the aggressive pace of releases, spinning up a new cluster for every new version and migrating services over to it can be really logistically challenging.

Consider that you don't want to be on the .0 of a k8s release, typically .2. You lose a fair amount of your 14 month window waiting for that criteria. Then you spin up the new cluster and start migrating services over to it. For most teams this involves a fair amount of duplication and wasted resources, since you will likely have double the number of nodes running for at least some period in there. CI/CD pipelines need to get modified, docs need to get changed, DNS entries have to get swapped.

None of this is impossible stuff, or even terribly difficult stuff, but it is critical and even with automation the risk of one of these steps failing silently is high risk enough that few folks I know would fire and forget. Instead clusters seem to be in a state of constant falling behind unless the teams are empowered to make keeping up with upgrades a key value they bring to the org.

My experience with this has been extremely bad, often joining teams where a cluster has been left to languish for too long and now we're running into concerns over whether it can be safely upgraded at all. Typically my first three months running an old cluster is telling leadership I need to blow our budget out a bit to spin up a new cluster and cut over to it namespace by namespace. It's not the most gentle onboarding process.

Proposed LTS

I'm not suggesting that the k8s maintainers attempt to keep versions around forever. Their pace of innovation and adding new features is a key reason the platform has thrived. What I'm suggesting is a dead-end LTS with no upgrade path out of it. GKE allowed customers to be on 1.24 for 584 days and 1.26 for 572 days. Azure has a more generous LTS date of 2 years from the GA date and EKS from AWS is sitting at around 800 days that a version is supported from launch to end of LTS.

These are more in line with the pace of upgrades that organizations can safely plan for. I would propose an LTS release with a 24 months of support from GA and an understanding that the Kubernetes team can't offer an upgrade to the next LTS. The proposed workflow for operations teams would be clusters that live for 24 months and then organizations need to migrate off of them and create a new cluster.

This workflow makes sense for a lot of reasons. First creating fresh new nodes at regular intervals is best practice, allowing organizations to upgrade the underlying linux OS and hypervisor upgrades. While you should obviously be upgrading more often than every 2 years, this would be a good check-in point. It also means teams take a look at the entire stack, starting with a fresh ETCD, new versions of Ingress controllers, all the critical parts that organizations might be loathe to poke unless absolutely necessary.

I also suspect that the community would come in and offer a ton of guidance on how to upgrade from LTS to LTS, since this is a good injection point for either a commercial product or an OSS tool to assist with the process. But this wouldn't bind the maintainers to such a project, which I think is critical both for pace of innovation and just complexity. K8s is a complicated collection of software with a lot of moving pieces and testing it as-is already reaches a scale most people won't need to think about in their entire careers. I don't think its fair to put this on that same group of maintainers.

LTS WG

The k8s team is reviving the LTS workgroup, which was disbanded previously. I'm cautiously optimistic that this group will have more success and I hope that they can do something to make a happier middle ground between hosted platform and OSS stack. I haven't seen much from that group yet (the mailing list is empty: https://groups.google.com/a/kubernetes.io/g/wg-lts) and the Slack seems pretty dead as well. However I'll attempt to follow along with them as they discuss the suggestion and update if there is any movement.

I really hope the team seriously considers something like this. It would be a massive benefit to operators of k8s around the world to not have to be in a state of constantly upgrading existing clusters. It would simplify the third-party ecosystem as well, allowing for easier validation against a known-stable target that will be around for a little while. It also encourages better workflows from cluster operators, pushing them towards the correct answer of getting in the habit of making new clusters at regular intervals vs keeping clusters around forever.

AI is Already Killing Books

November 24, 2023

I love reading. It is the thing on this earth that brings me the most joy. I attribute no small part of who I am and how I think to the authors who I have encountered in my life. The speed by which LLMs are destroying this ecosystem is a tragedy that we're not going to understand for a generation. We keep talking about it as an optimization, like writing is a factory and books are the products that fly off the line. I think it's a tragedy that will cause people to give up on the idea of writing as a career, closing off a vital avenue for human expression and communication.

Books, like everything else, has evolved in the face of the internet. For a long time publishers were the ultimate gatekeepers and authors tried to eek out an existence by submitting to anyone who would read their stuff. Most books were commercial failures, but some became massive hits. Then eBooks came out and suddenly authors could bypass the publishers and editors of the world to get directly to readers. This was promised to unleash a wave of quality the world had never seen before.

In practice, to put it kindly, eBooks are a mixed success. Some authors benefited greatly from the situation, able to establish strong followings and keep a much higher percentage of their revenue than they would with a conventional publisher. Most released a book, nobody ever read it and that was it. However there was a medium success, where authors could find a niche and generate a pretty reliable stream of income. Not giant numbers, but even 100 copies sold or borrowed under Kindle Unlimited a month spread out across enough titles can let you survive.

AI-written text is quickly filling these niches, since scammers are able to identity lucrative subsections where it might not be worth a year of a persons life to try and write a book this audience will like, but having a machine generate a book and throw it up there is incredibly cheap. I'm seeing them more and more, these free on Kindle Unlimited books with incredibly specific topics that seem tailored towards getting recommended to users in sub-genres.

There is no feeling of betrayal like thinking you are about to read something that another person slaved over, only to discover you've been tricked. They had an idea, maybe even a good idea and instead of putting in the work and actually sitting there crafting something worth my precious hours on this Earth to read, they wasted my time with LLM dribble. Those too formal, politically neutral, long-winded paragraphs stare back at me as the ultimate indictment of how little of a shit the person who "wrote this" cared about my experience reading it. It's like getting served a microwave dinner at a sit down restaurant.

Maybe you don't believe me, or see the problem. Let me at least try to explain why this matters. Why the relationship between author and reader is important and requires mutual respect. Finally why this destruction is going to matter in the decades to come.

TLDR

Since I know a lot of people aren't gonna read the whole thing (which is fine), let me just bulletpoint my responses to anticipated objections addressed later.

LLMs will let people who couldn't write books before do it. That isn't a perk. Part of the reason people invest so many hours into reading is because we know the author invested far more in writing. The sea of unread, maybe great books, was already huge. This is expanding the problem and breaking the relationship of trust between author and reader.
It's not different from spellcheck or grammar check. It is though and you know that. Those tools made complex lookups easier against a large collection of rules, this is generating whole blobs of text. Don't be obtuse.
They let me get my words down with less work. There is a key thing about any creative area but especially in writing that people forget. Good writing kills its darlings. If you don't care enough about a section to write it, then I don't care enough to read it. Save us both time and just cut it.
Your blog is very verbose. I never said I was a good writer.
The market will fix the problem. The book market relies on a vast army of unpaid volunteers to effectively sacrifice their time and wade through a sea of trash to find the gems. Throwing more books at them just means more gems get lost. Like any volunteer effort, the pool of people doesn't grow at the same rate as the problem.
How big of a problem is it? Considering how often I'm seeing them, it feels big, but it is hard to calculate a number. It isn't just me link

Why Does It Matter?

Allow me to veer into my personal background to provide context on why I care. I grew up in small towns across rural Ohio, places where the people who lived there either had no choice but to stay or chose to stay because of the simple lifestyle and absolute consensus on American Christian values. We said the Pledge of Allegiance aggressively, we all went to church on Sunday, gay people didn't exist and the only non-white people we saw were the migrant farm workers who we all pretended didn't exist living in the trailers around farms surrounding the town. As a kid it was fine, children are neither celebrated or hated in this culture, instead we were mostly left alone.

There is a violent edge to these places that people don't see right away. You aren't encouraged to ask a lot of questions about the world around you. We were constantly flooded with religious messaging, at school, home, church, church camp, weekly classes at night or bible studies, movies and television that was specifically encouraged because they had a religious element. Anything outside of this realm was met with a chilly reaction from most adults, if not outright threats of violence. My parents didn't hit me, but I was very much in the minority of my group. More than once we turned the sound up on a videogame or tv to drown out the sobs of a child being struck with a hand or belt while we were at a friends house.

Small town opinion turns on a dime and around 4th grade it turned on me. Everyone knows your status because there aren't a lot of people so I couldn't just go hang out in a new neighborhood. Suddenly I had a lot of alone time, which I filled with reading. These books didn't just fill time, they made me invisible. I had something to do during lunch, recess, whenever. Soon I had consumed everything within the children's section of the library I was interested in reading and graduated to the adult section.

Adult Section

Bryan Main Library | Williams County Public Library — Not a terribly impressive building that I spent a lot of time in.

I was fortunate enough not to grow up today, where this loneliness and anger might have found an online community. They would reinforce my feelings, confirming that I was in the right and everyone else was in the wrong. If they rejected me, I would have wandered until I found another group. The power of the internet is the ability to self-select for your level of depravity.

Instead, wandering the poorly lit stacks of the only library in town, I came across a book that child me couldn't walk pass. A heavy tomb that seemed to contain exactly the sort of cursed knowledge that had been kept from me my entire life. The Book of the Dead.

The Ancient Egyptian Book Of The Dead by University of Texas Press & Faulkner

The version I read was an old hardcover, tucked away in a corner with a title that was too good to pass up. A book about other religions, old religions? From a Muslim country? I knew I couldn't take it home. If anyone saw me with this it would raise a lot of questions I couldn't answer. Instead I struggled through it sitting at the long wooden tables after school and on the weekends, trying to make sense of what was happening.

The text (for those that are curious: https://www.ucl.ac.uk/museums-static/digitalegypt/literature/religious/bdbynumber.html) is dense and hard to read. It took me forever to get through it, missing a lot of the meaning. I would spend days sitting there writing in my little composition notebook, looking up words and trying to parse hard to read sentences. The Book of the Dead is about two hundred "spells" or maybe "chants" would be a better way to describe it that basically take someone through the process of death. From preservation to the afterlife and finally to judgement, the soul was escorted through the process and each part was touched upon.

The part that blew my mind was the Hymn to Osiris

"(1) Hail to thee, Osiris, lord of eternity, king of the gods, thou who hast many names, thou disposer of created things, thou who hast hidden forms in the temples, thou sacred one, thou KA who dwellest in Tattu, thou mighty (2) one in Sekhem, thou lord to whom invocations are made in Anti, thou who art over the offerings in Annu, thou lord who makest inquisition in two-fold right and truth, thou hidden soul, the lord of Qerert, thou who disposest affairs in the city of the White Wall, thou soul of Ra, thou very body of Ra who restest in (3) Suten-henen, thou to whom adorations are made in the region of Nart, thou who makest the soul to rise, thou lord of the Great House in Khemennu, thou mighty of terror in Shas-hetep, thou lord of eternity, thou chief of Abtu, thou who sittest upon thy throne in Ta-tchesert, thou whose name is established in the mouths of (4) men, thou unformed matter of the world, thou god Tum, thou who providest with food the ka's who are with the company of the gods, thou perfect khu among khu's, thou provider of the waters of Nu, thou giver of the wind, thou producer of the wind of the evening from thy nostrils for the satisfaction of thy heart. Thou makest (5) plants to grow at thy desire, thou givest birth to . . . . . . . ; to thee are obedient the stars in the heights, and thou openest the mighty gates. Thou art the lord to whom hymns of praise are sung in the southern heaven, and unto thee are adorations paid in the northern heaven. The never setting stars (6) are before thy face, and they are thy thrones, even as also are those that never rest. An offering cometh to thee by the command of Seb. The company of the gods adoreth thee, the stars of the tuat bow to the earth in adoration before thee, [all] domains pay homage to thee, and the ends of the earth offer entreaty and supplication. When those who are among the holy ones (7) see thee they tremble at thee, and the whole world giveth praise unto thee when it meeteth thy majesty. Thou art a glorious sahu among the sahu's, upon thee hath dignity been conferred, thy dominion is eternal, O thou beautiful Form of the company of the gods; thou gracious one who art beloved by him that (8) seeth thee. Thou settest thy fear in all the world, and through love for thee all proclaim thy name before that of all other gods. Unto thee are offerings made by all mankind, O thou lord to whom commemorations are made, both in heaven and in earth. Many are the shouts of joy that rise to thee at the Uak[*] festival, and cries of delight ascend to thee from the (9) whole world with one voice. Thou art the chief and prince of thy brethren, thou art the prince of the company of the gods, thou stablishest right and truth everywhere, thou placest thy son upon thy throne, thou art the object of praise of thy father Seb, and of the love of thy mother Nut. Thou art exceeding mighty, thou overthrowest those who oppose thee, thou art mighty of hand, and thou slaughterest thine (10) enemy. Thou settest thy fear in thy foe, thou removest his boundaries, thy heart is fixed, and thy feet are watchful. Thou art the heir of Seb and the sovereign of all the earth;

To a child raised in a heavily Christian environment, this isn't just close to biblical writing, it's the same. The whole world praises and worships him with a father and mother and woe to his foes who challenge him? I had assumed all of this was unique to Christianity. I knew there had been other religions but I didn't know they were saying the exact same things.

As important as the text is the surrounding context the academic sources put the text in. An expert walks me through how translations work, the source of the material, how our understanding has changed over time. As a kid drawn in by a cool title, I'm learning a lot about how to intake information. I'm learning real history has citations, explanations, debates, ambiguity. Real academic writing has a style, which when I stumble across the metaphysical Egyptian magic nonsense makes it easy to spot.

The reason this book mattered is the expert human commentary. The words themselves with some basic context wouldn't have meant anything. It's by understand the amount of work that went into this translation, what it means, what it also could mean, that the importance sets in. That's the human element which creates all the value. You aren't reading old words, you are being taken on a guided tour by someone who has lived with this text for a long time and knows it up and down.

I quickly expanded, growing from this historical text to a wide range of topics. I quickly find there is someone there to meet me at every stage of life. When I'm lonely or angry as a teenager I find those authors and stories that speak to that, put those feelings into a context and bigger picture. This isn't a new experience, people have felt this way going back to the very beginning. So much of the value isn't just the words, it's the sense of a relationship between me and the author. When you encounter this in fiction or in historical text, you come to understand as overwhelming as it feels in that second it is part of being a human being. This person experienced it and lived, you will too.

You also get to experience emotions that you may never experience. A Passage to India was a book I enjoyed a lot as a teen, even though it is about the story of two British women touring around India and clashing with the colonial realities of British history. I know nothing about British culture, the 1920s, all of this is as alien to me as anything else. It's fiction but with so much historical backing you still feel like you are seeing something different, something new.

That's a powerful part of why books work. Even if you the author are just imagining those scenarios, real life bleeds in. You can make text that reads like A Farewell to Arms, but you would miss the point if you did. It's more interesting and more powerful because its Hemingway basically recanting his wartime experience through his characters (obviously pumping up the manliness as he goes). It is when writers draw on their personal lives that it hits hardest.

Instead of finding a community that reinforced how alone and sad I was in that moment, I found evidence it didn't matter. People had survived far worse and ultimately turned out to be fine. You can't read about the complex relationship of fear and respect Egyptians had with the Nile, where too little water was dead and too much was also death, then endlessly fixate on your own problems. Humanity is capable of adaptation and the promise is, so are you.

Why AI Threatens Books

As readers get older and they spend a few decades going through books, they discover authors they like and more importantly styles they like. However you also like to see some experimentation in the craft, maybe with some rough edges. To me it's like people who listen to concert recordings instead of the studio album. Maybe it's a little rougher but there is also genius there from time to time. eBooks quickly became where you found the indie gems that would later get snapped up by publishers.

The key difference between physical and eBooks is bookstores and libraries are curated. They'll stock the shelves with things they like and things that will sell. Indie bookstores tend to veer a little more towards things they like, but in general it's not hard to tell the difference between the stack of books the staff loves and the ones they think the general population will buy. However each one had to get read by a person. That is the key difference between music or film and books.

A music reviewer needs to invest between 30-60 minutes to listen to an album. A movie reviewer somewhere between 1-3 hours. An owner of a bookstore in Chicago broke down his experience pretty well:

Average person: 4 books a year if they read at all

Readers (people who consider it a hobby): 30-50 books a year

Super readers: 80 books

80 books is not a lot of books. Adult novels clock in at about 90,000 words, 200-300 words per minute reading speed, 7-8 hours to get through a book. To combat this discrepancy websites like Goodreads were popularized because frankly you cannot invest 8 hours of your life in shitty eBooks very often. At the very least your investment should hopefully scare off others considering doing the same (or at least they can make an informed choice).

The ebook market also started to not be somewhere you wanted to wade in randomly due to the spike in New Age nonsense writing and openly racist or sexist titles. This book below was found by searching the term "war" and going to the second page. As a kid I would have had to send a money order to the KKK to get my hands on a book like this, but now it's in my virtual bookstore next to everything else. Since Amazon, despite their wealth and power, has no interest in policing their content, you are forced to solve the problem through community effort.

https://www.amazon.com/s?k=WAR&i=stripbooks-intl-ship&page=2&crid=2M4G32CTJBZ6V&qid=1700229573&sprefix=war%2Cstripbooks-intl-ship%2C248&ref=sr_pg_2https://www.amazon.com/s?k=WAR&i=stripbooks-intl-ship&page=2&crid=2M4G32CTJBZ6V&qid=1700229573&sprefix=war%2Cstripbooks-intl-ship%2C248&ref=sr_pg_2

The reason why AI books are so devastating to this ecosystem should be obvious, but let's lay it out. It breaks the emotional connection between reader and writer and creates a sense of paranoia. Is this real or fake? In order to discover it, someone else needs to invest a full work day into reading it to figure out. Then you need to join a community with enough trusted reviewers willing to donate their time for free to tell you whether the book is good or bad. Finally you need to hope that you are a member of the right book reading community to discover the review.

So if we were barely surviving the flood of eBooks and missing tons and tons of good books, the last thing we needed was someone to crank up the volume of books shooting out into the marketplace. The chances that one of the sacred reviewers even finds a new authors book decreases, so the community won't find it and the author will see that they have no audience and will either stop writing or will ensure they don't write another book like the first book. The feedback loop, which was already breaking under the load, completely collapses.

Now that AI books exist, the probability that I will ever blind purchase another eBook on Amazon from an unknown author drops to zero. Now more than ever I entirely rely on the reviews of others. Before I might have wandered through the virtual stacks, but no more. I'm not alone in this assessment, friends and family have reported the same feeling, even if they haven't themselves been burned by an AI book they knew about.

AI books solve a problem that didn't exist, which is this presumption by tech people that what we needed was more people writing books. Instead, like so many technical solutions to problems that the architects never took any time to understand, the result doesn't help smaller players. It places all the power back into publishers and the small cadre of super reviewers since they're willing to invest the time to check for at least some low benchmark of quality.

The sad part is this is unstoppable. eBooks are too easy to make with LLMs and no reliable detection systems exist to screen them before they're uploaded to the market. Amazon has no interest in setting realistic limits to how many books users can upload to the Kindle Store, still letting people upload a laughable three books a day. Google Play Store seems to have no limit, same with Apple Books. It's depressing that another market will become so crowded with trash, but nobody in a position to change it seems to care.

The Future

So where does that leave us? Well kind of back to where we started. If you are excellent at marketing and can get the name of your eBook out there, then people can go directly to it. But similar to how the App Store and Play Store are ruined for new app discoverability, it's a lopsided system which favors existing players and stacks the deck against anyone new. Publishers will still be able to get the authors to do the free market research through the eBook market and then snap up proven winners.

Since readers pay the price for this system by investing money and time into fake books, it both increases the amount of terrible out there and further incentives the push down in eBook price. If there are 600,000 "free" eBooks on Kindle Unlimited and you are trying to complete with a book that took a fraction of the time to produce, you are going to struggle to justify more than the $1.99-$2.99 price point. So not only are you selling a year (or years) of your life for the cost of a large soda, the probability of someone organically finding your book went from "bad" to "grain of sand in the ocean".

Even if there are laws, there is no chance they'll be able to make a meaningful difference unless the laws mandate that AI produced text is watermarked in some distinct way that everyone will immediately remove. So what becomes a "hard but possible" dream turns into a "attempting to become a professional athlete" level of statistical improbability. The end result will be fewer people trying so we get less good stories and instead just endlessly retread the writing of the past.

Help Everyone Do Better Security

October 27, 2023

One interesting thing about the contrast between infrastructure and security is the expectation of open-source software. When a common problem arises we all experience, a company will launch a product to solve this problem. In infrastructure, typically the core tool is open-source and free to use, with some value-add services or hosting put behind licensing and paid support contracts. On the security side, the expectation seems to be that the base technology will be open-source but any refinement is not. If I find a great tool to manage SSH certificates, I have to pay for it and I can't see how it works. If I rely on a company to handle my login, I can ask for their security audits (sometimes) but the actual nuts and bolts of "how they solved this problem" is obscured from me.

Instead of "building on the shoulders of giants", it's more like "You've never made a car before. So you make your first car, load it full of passengers, send it down the road until it hits a pothole and detonates." Then someone will wander by and explain how what you did was wrong. People working on their first car to send down the road become scared because they have another example of how to make the car incorrectly, but are not that much closer to a correct one given the nearly endless complexity. They may have many examples of "car" but they don't know if this blueprint is a good car or a bad car (or an old car that was good and is now bad).

In order to be good at security, one has to see good security first. I can understand in the abstract how SSH certificates should work, but to implement it I would have to go through the work of someone with a deep understanding of the problem to grasp the specifics. I may understand in the abstract how OAuth works, but the low level "how do I get this value/store it correctly/validate it correctly" is different. You can tell me until you are blue in the face how to do logins wrong, but I have very few criteria by which I can tell if I am doing it right.

To be clear there is no shortage of PDFs and checklists telling me how my security should look at an abstract level. Good developers will look at those checklists, look at their code, squint and say "yeah I think that makes sense". They don't necessarily have the mindset of "how do I think like someone attempting to break this code", in part because they may have no idea how the code works. Their code presents the user a screen, they receive a token, that token is used for other things and they got an email address in the process. The massive number of moving parts they just used is obscured from them, code they'll never see.

Just to do session cookies correctly, you need to know about and check the following things:

Is the expiration good and are you checking it on the server?
Have you checked that you never send the Cookie header back to the client and break the security model? Can you write a test for this? How time consuming will that test be?
Have you set the Secure flag? Did you set the SameSite flag? Can you use the HttpOnly flag? Did you set it?
Did you scope the domain and path?
Did you write checks to ensure you aren't logging or storing the cookies wrong?

That is so many places to get just one thing wrong.

We have to come up with a better way of throwing flares up in peoples way. More aggressive deprecation, more frequent spec bumps, some way of communicating to people "the way you have done things is legacy and you should look at something else". On the other side we need a way to say "this is a good way to do it" and "that is a bad way to do it" with code I can see. Pen-testing, scanners, these are all fine, but without some concept of "blessed good examples" it can feel like patching a ship in the dark. I closed that hole, but I don't know how many more there are until a tool or attacker finds it.

I'm gonna go through four examples of critical load-bearing security-related tooling or technology that is set up wrong by default or very difficult to do correctly. This is stuff everyone gets nervous when they touch because it doesn't help you set it up right. If we want people to do this stuff right, the spec needs to be more opinionated about right and wrong and we need to show people what right looks like on a code level.

SSH Keys

This entire field of modern programming is build on the back of SSH keys. Starting in 1995 and continuing now with OpenSSH, the protocol uses an asymmetric encryption process with the Diffie-Hellman (DH) key exchange algorithm to form a shared secret key for the SSH connection. SFTP, deploying code from CI/CD systems, accessing servers, using git, all of this happens largely on the back of SSH keys. Now you might be thinking "wait, SSH keys are great".

At a small scale SSH is easy and effortless. ssh-keygen -t rsa, select where to store it and if you want a passphrase. ssh-copy-id username@remoteserverip to move it to the remote box, assuming you set up the remote box with cloud-init or ansible or whatever. At the end of every ssh tutorial there is a paragraph that reads something like the following: "please ensure you rotate, audit and check all SSH keys for permissions". This is where things get impossible.

SSH keys don't help administrators do the right thing. Here's all the things I don't know about the SSH key I would need to know to do it correctly:

When was the key made? Is this a new SSH key or are they reusing a personal one or one from another job? I have no idea.
Was this key secured with a passhrase? Again like such a basic thing, can I ensure all the keys on my server were set up with a passphrase? Like just include some flag on the public key that says "yeah the private key has a passphrase". I understand you could fake it but the massive gain in security for everyone outweighs the possibility that someone manipulates a public key to say "this has a passphrase".
Expiration. I need a value that I can statelessly query to say "is this public key expired or not" and also to check when enrolling public keys "does this key live too long".

This isn't just a "what-if" conversation. I've seen this and I bet you have too, or would if you looked at your servers.

Many keys on servers are unused and represent access that was never properly terminated or shouldn't have been granted. I find across most jobs it's like 10% of the keys that ever get used.
Nobody knows who has the corresponding private keys. We understand the user who made them, but we don't know where they are now.
Alright so we use certificates! Well except they're special to OpenSSH, make auditing SSH key based access impossible since you don't know what keys the server will accept by looking at it and all the granting and revoking tooling is on you to build.

OpenSSH Certificates solves almost all these problems. You get expiration, limiting commands, limit IP address etc. It's a step forward but we're not using them in small and medium orgs due to the complexity of setup and we need to port some of these security concerns down the chain. It's exactly what I was talking about in the beginning. The default experience is terrible because backwards compatibility and for those 1% who know of the existence of SSH Certificates and can operationally support the creation of this mission-critical tooling, they reap the benefits.

So sure if I set up all of the infrastructure to do all the pieces, I can enforce ssh key rotation. I'll check the public key into object storage, sync it with all my servers, check the date the key was entered and remove it after a certain date. But seriously? We can't make a new version of the SSH key with some metadata? The entire internet operates off SSH keys and they're a half-done idea, fixed through the addition of certificates nobody uses cause writing the tooling to handle the user certificate process is a major project where if you break it, you can't get into the box.

This is a crazy state of affairs. We know SSH keys live in infrastructure forever, we know they're used for way too long all over the place and we know the only way to enforce rotation patterns is through the use of expiration. We also know that passphrases are absolutely essential for the use of keys. Effectively to use SSH keys you need to stick a PAM in there to enforce 2FA like libpam-google-authenticator. BTW, talking about "critical infrastructure not getting a ton of time", this is the repo of the package every tutorial recommends. Maybe nothing substantial has happened in 3 years but feels a little unlikely.

Mobile Device Management/Device Scanning/Network MITM Scanning

Nothing screams "security theater" to me like the absolutely excessive MDM that has come to plague major companies. I have had the "joy" of working for 3 large companies that went all-in on this stuff and each time have gotten the pleasure of rip your hair out levels of frustration. I'm not an admin on my laptop, so now someone who has no idea what my job is or what I need to do it gets to decide what software I get to install. All my network traffic gets scanned, so forget privacy on the device. At random intervals my laptop becomes unusable because every file on the device needs to "get scanned" for something.

Now in theory the way this stuff is supposed to work is a back and forth between security, IT and the users. In practice it's a one-way street that once the stupid shit gets bought and turned on, it never gets turned off. All of the organizational incentives are there to keep piling this worthless crap on previously functional machines and then almost dare the employee to get any actual work done. It just doesn't make any sense to take this heavy of a hand with this stuff.

What about stuff exploiting employee devices?

I mean if you have a well-researched paper which shows that this stuff actually makes a difference, I'd love to see it. Mostly it seems from my reading like vendors repeating sales talking points to IT departments until they accept it as gospel truth, mixed with various audits requiring the tooling be on. Also we know from recent security exploits that social engineering against IT Helpdesk is a new strategy that is paying off, so assuming your IT pros will catch the problems that normal users won't is clearly a flawed strategy.

The current design is so user-hostile and so aggressively invasive that there is just no way to think of it other than "my employer thinks I'm an idiot". So often in these companies you are told the strategies to work around stuff. I once worked with a team where everybody used a decommissioned desktop tucked away in a closet connected to an Ethernet port with normal internet access to do actual work. They were SSHing into it from their locked-down work computers because they didn't have to open a ticket every time they needed to do everything and hid the desktops existence from IT.

I'm not blaming the people turning it on

The incentives here are all wrong. There's no reward in security for not turning on the annoying or invasive feature so rank and file people are happy. On the off chance that is the vector by which you are attacked, you will be held responsible for that decision. So why not turn it all on? I totally understand it, especially when we all know every company has a VIP list of people for whom this shit isn't turned on, so the people who make the decisions about this aren't actually bearing the cost of it being on.

"Don't use your work laptop for personal stuff": hey before you hit me up with this gem, save it. I spend too many hours of my life at work to never have the two overlap. I need to write emails, look up stuff, schedule appointments, so just take this horrible know-it-all attitude and throw it away. People use work devices for personal stuff and telling them not to is a waste of oxygen.

JWTs

You have users and you have services. The users need to access the things they are allowed to access, the services need to be able to talk to each other and share information in a way where you know the information wasn't tampered with. It's JSON, but special limited edition JSON. You have a header, which says what it is (a JWT) and the signing algorithm being used.

{
  "alg": "HS256",
  "typ": "JWT"
}

You have a payload with claims. There are predefined (still optional) claims and then public and private claims. So here are some common ones:

"iss" (Issuer) Claim: identifies the principal that issued the JWT
"sub" (Subject) Claim: The "sub" (subject) claim identifies the principal that is the subject of the JWT.

You can see them all here. The diagram below shows the design source

Seems great. What's the problem?

See that middle part where both things need access to the same secret key? That's the problem. The service that makes the JWT and the service that verifies the JWT are both reading and using the same key, so there's nothing stopping me from making my own JWT with new insane permissions on application 2 and having it get verified. That's only the beginning of the issues with JWTs. This isn't called out to people, so when you are dealing with micro-services or multiple APIs where you pass around JWTs, often there is an assumption of security where one doesn't exist.

Asymmetric JWT implementations exist and work well, but so often people do not think about it or realize such an option exists. There is no reason to keep on-boarding people with this default dangerous design assuming they will "figure out" the correct way to do things later. We see this all over the place with JWTs though.

Looking at the alg claim in the header and using it rather than hardcoding the algorithm that your application uses. Easy mistake to make, I've seen it a lot.
Encryption vs signatures. So often with JWTs people think the payload is encrypted. Can we warn them to use JWEs? This is such a common misunderstanding among people starting with JWTs it seems insane to me to not warn people somehow.
Should I use a JWT? Or a JWE? Should I sign AND encrypt the thing where the JWS (the signed version of the JWT) is the encrypted payload of the JWE? Are normal people supposed to make this decision?
Who in the hell said none should be a supported algorithm? Are you drunk? Just don't let me use a bad one. ("Well it is the right decision for my app because the encryption channel means the JWT doesn't matter" "Well then don't check the signature and move on if you don't care.")
several Javascript Object Signing and Encryption (JOSE) libraries fail to validate their inputs correctly when performing elliptic curve key agreement (the "ECDH-ES" algorithm). An attacker that is able to send JWEs of its choosing that use invalid curve points and observe the cleartext outputs resulting from decryption with the invalid curve points can use this vulnerability to recover the recipient's private key. Oh sure that's a problem I can check for. Thanks for the help.
Don't let the super important claims like expiration be optional. Come on folks, why let people pick and choose like that? It's just gonna cause problems. OpenID Connect went through great lengths to improve the security properties of a JWT. For example, the protocol mandates the use of the exp, iss and aud claims. To do it right, I need those claims, so don't make them optional.

Quick, what's the right choice?

HS256 - HMAC using SHA-256 hash algorithm
HS384 - HMAC using SHA-384 hash algorithm
HS512 - HMAC using SHA-512 hash algorithm
ES256 - ECDSA signature algorithm using SHA-256 hash algorithm
ES256K - ECDSA signature algorithm with secp256k1 curve using SHA-256 hash algorithm
ES384 - ECDSA signature algorithm using SHA-384 hash algorithm
ES512 - ECDSA signature algorithm using SHA-512 hash algorithm
RS256 - RSASSA-PKCS1-v1_5 signature algorithm using SHA-256 hash algorithm
RS384 - RSASSA-PKCS1-v1_5 signature algorithm using SHA-384 hash algorithm
RS512 - RSASSA-PKCS1-v1_5 signature algorithm using SHA-512 hash algorithm
PS256 - RSASSA-PSS signature using SHA-256 and MGF1 padding with SHA-256
PS384 - RSASSA-PSS signature using SHA-384 and MGF1 padding with SHA-384
PS512 - RSASSA-PSS signature using SHA-512 and MGF1 padding with SHA-512
EdDSA - Both Ed25519 signature using SHA-512 and Ed448 signature using SHA-3 are supported. Ed25519 and Ed448 provide 128-bit and 224-bit security respectively.

You are holding it wrong. Don't tell me to issue and use x509 certificates. Trying that for micro-services cut years off my life.

But have you tried XML DSIG?

I need to both give something to the user that I can verify that tells me what they're supposed to be able to do and I need some way of having services pass the auth back and forth. So many places have adopted JWTs because JSON = easy to handle. If there is a right (or wrong) algorithm, guide me there. It is fine to say "this is now depreciated". That's a totally normal thing to tell developers and it happens all the time. But please help us all do the right thing.

Alright I am making a very basic application. It will provide many useful features for users around the world. I just need them to be able to log into the thing. I guess username and password right? I want users to have a nice, understood experience.

No you stupid idiot passwords are fundamentally broken

Well you decide to try anyway. You find this helpful cheat sheet.

Use Argon2id with a minimum configuration of 19 MiB of memory, an iteration count of 2, and 1 degree of parallelism.
If Argon2id is not available, use scrypt with a minimum CPU/memory cost parameter of (2^17), a minimum block size of 8 (1024 bytes), and a parallelization parameter of 1.
For legacy systems using bcrypt, use a work factor of 10 or more and with a password limit of 72 bytes.
If FIPS-140 compliance is required, use PBKDF2 with a work factor of 600,000 or more and set with an internal hash function of HMAC-SHA-256.
Consider using a pepper to provide additional defense in depth (though alone, it provides no additional secure characteristics).

None of these mean anything to you but that's fine. It looks pretty straightforward at first.

>>> from argon2 import PasswordHasher
>>> ph = PasswordHasher()
>>> hash = ph.hash("correct horse battery staple")
>>> hash  # doctest: +SKIP
'$argon2id$v=19$m=65536,t=3,p=4$MIIRqgvgQbgj220jfp0MPA$YfwJSVjtjSU0zzV/P3S9nnQ/USre2wvJMjfCIjrTQbg'
>>> ph.verify(hash, "correct horse battery staple")
True
>>> ph.check_needs_rehash(hash)
False
>>> ph.verify(hash, "Tr0ub4dor&3")
Traceback (most recent call last):
  ...
argon2.exceptions.VerifyMismatchError: The password does not match the supplied hash

Got it. But then you see this.

Rather than a simple work factor like other algorithms, Argon2id has three different parameters that can be configured. Argon2id should use one of the following configuration settings as a base minimum which includes the minimum memory size (m), the minimum number of iterations (t) and the degree of parallelism (p).

    m=47104 (46 MiB), t=1, p=1 (Do not use with Argon2i)
    m=19456 (19 MiB), t=2, p=1 (Do not use with Argon2i)
    m=12288 (12 MiB), t=3, p=1
    m=9216 (9 MiB), t=4, p=1
    m=7168 (7 MiB), t=5, p=1

What the fuck does that mean. Do I want more memory and fewer iterations? That doesn't sound right. Then you end up here: https://www.rfc-editor.org/rfc/rfc9106.html which says I should be using argon2.profiles.RFC_9106_HIGH_MEMORY. Ok but it warns me that it requires 2 GiB, which seems like a lot? How does that scale with a lot of users? Does it change? Should I do low_memory?

Alright I'm sufficiently scared off. I'll use something else.

I've heard about passkeys and they seem easy enough. I'll do that.

Alright well that's ok. I got....most of the big ones.

If you have Windows 10 or up, you can use passkeys. To store passkeys, you must set up Windows Hello. Windows Hello doesn’t currently support synchronization or backup, so passkeys are only saved to your computer. If your computer is lost or the operating system is reinstalled, you can’t recover your passkeys.

Nevermind I can't use passkeys. Good to know.

Well if you put the passkeys in 1password then it works

Great so passkeys cost $5 a month per user and they get to pay for the priviledge of using my site. Sounds totally workable.

OpenID Connect/OAuth

Ok so first I need to figure out what kind of this thing I need. I'll just read through all the initial information I need to make this decision.

Now that I've completed a masters degree in login, it's time for me to begin.

Apple

Facebook/Google/Microsoft

So each one of these requires me to create an account, set up their tokens and embed the button. Not a huge deal, but I can never get rid of any of these and if one was to get deactivated, it would be a problem. See when Login with Twitter stopped being a thing people could use. Plus with Google and Microsoft they also offer email services, so presumably a lot of people will be using their email address, then I've gotta create a flow on the backend where I can associate the same user with multiple email addresses. Fine, no big deal.

I'm also loading Javascript from these companies on my page and telling them who my customers are. This is (of course) necessary, but seems overkill for the problem I'm trying to solve. I need to know that the user is who they say they are, but I don't need to know what the user can do inside of their Google account.

I don't really want this data

Here's the default data I get with Login with Facebook after the user goes through a scary authorization page.

id
first_name
last_name
middle_name
name
name_format
picture
short_name
email

I don't need that. Same with Google

BasicProfile.getId()
BasicProfile.getName()
BasicProfile.getGivenName()
BasicProfile.getFamilyName()
BasicProfile.getImageUrl()
BasicProfile.getEmail()

I'm not trying to say this is bad. These are great tools and I think the Google one especially is well made. I just don't want to prompt users to give me access to data if I don't want the data and I especially don't want the data if I have no idea if its the data you intended to give me. Who hasn't hit the "Login with Facebook" button and wondered "what email is this company going to send to". My Microsoft account is back from when I bought an Xbox OG. I have no idea where it sends messages now.

Fine, Magic Links

I don't know how to hash stuff correctly in such a way that I am confident I won't mess it up. Passkeys don't work yet. I can use OpenID Connect but really it is overkill for this use case since I don't want to operate as the user on the third-party and I don't want access to all the users information since I intend to ask them how they want me to contact them. The remaining option is "magic links".

How do we set up magic links securely?

Short lifespan for the password. The one-time password issued will be valid for 5 minutes before it expires
The user's email is specified alongside login tokens to stop URLs being brute-forced
Each login token will be at least 20 digits
The initial request and its response must take place from the same IP address
The initial request and its response must take place in the same browser
Each one-time link can only be used once
Only the last one-time link issued will be accepted. Once the latest one is issued, any others are invalidated.

The fundamental problem here is that email isn't a reliable system of delivery. It's a best-effort system. So if something goes wrong, takes a long time, etc, there isn't much I can really do to troubleshoot that. My advice to the user would be like "I guess you need to try a different email address".

So in order to do this for actual normal people to use, I have to turn off a lot of those security settings. I can't guarantee people don't sign up on their phones and then go to their laptops (so no IP address or browser check). I can't guarantee when they'll get the email (so no 5 minute check). I also don't know the order in which they're gonna get these emails, so it will be super frustrating for people if I send them 3 emails and the second one is actually the most "recent".

I also have no idea how secure this email account is. Effectively I'm just punting on security because it is hard and saying "well this is your problem now".

I could go on and on and on and on

I could write 20,000 words on this topic and still not be at the end. The word miserable barely does justice to how badly this stuff is designed for people to use. Complexity is an unavoidable side effect of flexibility in software. If your thing can do many things, it is harder to use.

We rely on expertise as a species to assist us with areas outside of our normal functions. I don't know anything about medicine, I go to a doctor. I have no idea how one drives a semi truck or flies a plane or digs a mine. Our ability to let people specialize is a key component to our ability to advance. So it is not reasonable to say "if you do anything with security at all you must become an expert in security".

Part of that is you need to use your skill and intelligence to push me along the right path. Don't say "this is the most recommended and this is less recommended and this one is third recommended". Show me what you want people to build and I bet most teams will jump at the chance to say "oh thank God, I can copy and paste a good example".

Corrections/notes/"I think you are stupid": https://c.im/@matdevdug

Can We Make Idiot-Proof Infrastructure pt1?

October 20, 2023

One complaint I hear all the time online and in real life is how complicated infrastructure is. You either commit to a vendor platform like ECS, Lightsail, Elastic Beanstalk or Cloud Run or you go all in with something like Kubernetes. The first are easy to run but lock you in and also sometimes get abandoned by the vendor (looking at you Beanstalk). Kubernetes runs everywhere but it is hard and complicated and has a lot of moving parts.

The assumption seems to be that with containers there should be an easier way to do this. I thought it was an interesting thought experiment. Could I, a random idiot, design a simpler infrastructure? Something you could adopt to any cloud provider without doing a ton of work, that is relatively future proof and that would scale to the point when something more complicated made sense? I have no idea but I thought it could be fun to try.

Fundamentals of Basic Infrastructure

Here are the parameters we're attempting to work within:

It should require minimal maintenance. You are a small crew trying to get a product out the door and you don't want to waste a ton of time.
You cannot assume you will detect problems. You lack the security and monitoring infrastructure to truly "audit" the state of the world and need to assume that you won't be able to detect a breach. Anything you put out there has to start as secure as possible and pretty much fix itself.
Controlling costs is key. You don't have the budget for surprises and massive spikes in CPU usage is likely a problem and not organic growth (or if it is organic growth, you'll want to likely be involved with deciding what to do about it)
The infrastructure should be relatively portable. We're going to try and keep everything movable without too many expensive parts.
Perfect uptime isn't the goal. Restarting containers isn't a hitless operation and while there are ways to queue up requests and replay them, we're gonna try to not bite off that level of complexity with the first draft. We're gonna drop some requests on the floor, but I think we can minimize that number.

Basic Setup

You've got your good idea, you've written some code and you have a private repo in GitHub. Great, now you need to get the thing out onto the internet. Let's start with some good tips before we get anywhere near to the internet itself.

Semantic Versioning is your friend. If you get into the habit now of structuring commits and cutting releases, you are going to reap those benefits down the line. It seems silly right this second when the entirety of the application code fits inside of your head, but soon that won't be the case if you continue to work on it. I really like Release-Please as a tool to cut releases automatically based on commits and let you use the version number to be a meaningful piece of data for you to work off.
Containers are mandatory. Just don't overthink this and commit early. Don't focus on container disk space usage. Disk space is not our largest concern. We want an easy to work with platform with a minimum amount of surface area for attacks. While Distroless isn't actually....without a linux Distro (I'm not entirely clear why that name was chosen), it is a great place to start. If you can get away with using these, this is what you want to do. Link
Be careful about what dependencies you rely on in the early phase. So many jobs I've had there are a few unmaintained packages that are mission critical impossible to remove load-bearing weights around our necks. If you can do it with the standard library great. When you find a dependency on the internet, look at what you need it to do and see "can I just copy paste the 40 lines of code I need from this" vs adding a new dependency forever. Dependency minimization isn't very cool right now but I think especially when starting out it pays off big.
Healthcheck. You need some route on your app that you can hit which provides a good probability that the application is up and functional. /health or whatever, but this is gonna be pretty key to the rest of this works.

Deployment and Orchestration

Alright so you've made the app, you have some way of tracking major/minor etc. Everything works great on your laptop. How do we put it on the internet.

You want a way to take a container and deploy it out to a Linux host
You don't want to patch or maintain the host
You need to know if the deployment has gone wrong
Either the deployment should roll back automatically or fail safe waiting for intervention
The whole thing needs to be as safe as possible.

Is there a lightweight way to do this? Maybe!

Basic Design

Cloudflare -> Autoscaling Group -> 4 instances setup with Cloud init -> Docker Compose with Watchtower -> DBaaS

When we deploy we'll be hitting the IP addresses of the instances on the Watertower HTTP route with curl and telling it to connect to our private container registry and pull down new versions of our application. We shouldn't need to SSH into the boxes ever and when a box dies or needs to be replaced, we can just delete it and run Terraform again to make a new one. SSL will be static long-lived certificates and we should be able to distribute traffic across different cloud providers however we'd like.

Cloudflare as the Glue

I know, a lot of you are rolling your eyes. "This isn't portable at all!" Let me defend my work at bit. We need a WAF, we need SSL, we need DNS, we need a load balancer and we need metrics. I can do all of that with open-source projects, but it's not easy. As I was writing it out, it started to get (actually) quite difficult to do.

Cloudflare is very cheap for what they offer. We aren't using anything here that we couldn't move somewhere else if needed. It scales pretty well, up to 20 origins (which isn't amazing but if you have hit 20 servers serving customer traffic you are ready to move up in complexity). You are free to change the backend CPU as needed (or even experiment with local machines, mix and match datacenter and cloud, etc). You also get a nice dashboard of what is going on without any work. It's a hard value proposition to fight against, especially when almost all of it is free. I also have no ideological dog in the fight of OSS vs SaaS.

Pricing

Up to 2 origin servers: $5 per month

Additional origins, up to 20: $5 per month per origin

First 500k DNS requests are free

$0.50 per every 500k DNS requests after

Compared to ALB pricing, we can see why this is more idiot proof. There we have 4 dimensions to cost: New connections (per second), Active connections (per minute), Processed bytes (GBs per hour), Rule evaluations (per second). The hourly bill is calculated by taking the maximum LCUs consumed across the four dimensions and we're charged on the highest one. Now ALBs can be much cheaper than Cloudflare, but it's harder to control the cost. If one element starts to explode in price, there isn't a lot you can do to bring it back down.

Cloudflare we're looking at $20 a month and then traffic. So if we get 60,000,000 requests a month we're paying $60 a month in DNS and $20 for the load balancer. For ALB it would largely depend on the type of traffic we're getting and how it is distributed.

BUT there are also much cheaper options. For €7 a month on Hetzner, you can get 25 targets and 20 TB of network traffic. € 1/TB for network traffic above that. So for our same cost we could handle a pretty incredible amount of traffic through Hetzner, but it commits us to them and violates the spirit of this thing. I just wanted to mention it in case someone was getting ready to "actually" me.

Also keep in mind we're just in the "trying ideas out" part of the exercise. Let's define a load balancer.

provider "cloudflare" {
  email   = "[email protected]"
  api_key = "your_api_key"
}

resource "cloudflare_load_balancer" "example_lb" {
  name   = "example-load-balancer.example.com"
  zone_id = "0da42c8d2132a9ddaf714f9e7c920711"
  default_pool_ids = [cloudflare_load_balancer_pool.pool1.id, cloudflare_load_balancer_pool.pool2.id]
  fallback_pool_id = cloudflare_load_balancer_pool.pool1.id
  steering_policy = "random"
  session_affinity = "none"
  proxied = true

  # Add other load balancer settings here from https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/load_balancer
  }

Then we need a monitor.

resource "cloudflare_load_balancer_monitor" "example" {
  account_id     = "f037e56e89293a057740de681ac9abbe"
  type           = "http"
  expected_body  = "alive"
  expected_codes = "2xx"
  method         = "GET"
  timeout        = 7
  path           = "/health"
  interval       = 60
  retries        = 2
  description    = "example http load balancer"
  header {
    header = "Host"
    values = ["example.com"]
  }
  allow_insecure   = false
  follow_redirects = true
  probe_zone       = "example.com"
}

Finally we need some pools

resource "cloudflare_load_balancer_pool" "pool1" {
  account_id = "f037e56e89293a057740de681ac9abbe"
  name       = "pool1"
  monitor = cloudflare_load_balancer_monitor.example.id
  origins {
    name    = "server01"
    address = "d9bb:3880:71b0:5fab:e426:8883:5a75:e82e"
    enabled = false
    header {
      header = "Host"
      values = ["server01"]
    }
  }
  origins {
    name    = "server02"
    address = "9726:61db:23a9:41d5:7eb0:649a:87b0:4291"
    header {
      header = "Host"
      values = ["server02"]
    }
  }
  description        = "example load balancer pool 1"
  enabled            = false
  minimum_origins    = 1
  notification_email = "[email protected]"
  load_shedding {
    default_percent = 55
    default_policy  = "random"
  }
  origin_steering {
    policy = "random"
  }
}

resource "cloudflare_load_balancer_pool" "pool2" {
  account_id = "f037e56e89293a057740de681ac9abbe"
  name       = "pool2"
  monitor = cloudflare_load_balancer_monitor.example.id
  origins {
    name    = "server03"
    address = "3601:03b9:88b7:fa50:8163:818c:eceb:bc14"
    enabled = false
    header {
      header = "Host"
      values = ["server03"]
    }
  }
  origins {
    name    = "server04"
    address = "8118:87ef:6b50:099d:fc4a:e66d:a991:5d20"
    header {
      header = "Host"
      values = ["server04"]
    }
  }
  description        = "example load balancer pool 2"
  enabled            = false
  minimum_origins    = 1
  notification_email = "[email protected]"
  load_shedding {
    default_percent = 55
    default_policy  = "random"
  }
  origin_steering {
    policy = "random"
  }
}

The addresses are just placeholders, but you'll need to swap values etc. This gives us a nice basic load balancer. Note that we don't have session affinity turned on, so we'll need to add Redis or something to help with state server-side. The IP addresses we point to will need to be reserved on the cloud provider side, but we can use IPv6 so hopefully should save us a few dollars a month there.

How much uptime is enough uptime

So there are two paths here we have to discuss before we get much further.

Path 1

When we deploy to a server, we make an API call to Cloudflare to mark the origin as not enabled. Then we wait for the connections to drain, deploy the container, bring it back up, wait for it to be healthy and then we mark it enabled again. This is traditionally the way we would need to do things, if we were targeting zero downtime.

Now we can do this. We have places later that we could stick such a script. But this is gonna be brittle. We'd basically need to do something like the following.

Run a GET against https://api.cloudflare.com/client/v4/user/load_balancers/pools
Take the result, look at the IP addresses, figure out which one is the machine in question and then mark it as not enabled IF all other origins were healthy. We wouldn't want to remove multiple machines at the same time. So we'd then need to hit: https://api.cloudflare.com/client/v4/user/load_balancers/pools/{identifier}/health and confirm the health of the pools.
But "health" isn't an instant concept. There is a delay between the concept of when the origin is unhealthly and I'll know about it, depending on how often I check and retries. So this isn't a perfect system, but it should work pretty well as long as I add a bit of jitter to it.

I think this exceeds what I want to do for the first pass. We can do it, but it's not consistent with the uptime discussion we had before. This is brittle and is going to require a lot of babysitting to get right.

Path 2

We rely on the healthchecks to steer traffic and assume that our deployments are going to be pretty fast, so while we might drop some traffic on the floor, a user (with our random distribution and server-side sessions) should be able to reload the page and hopefully get past the problem. It might not scale forever but it does remove a lot of our complexity.

Let's go with Path 2 for now.

Server setup + WAF

Alright so we've got the load balancer, it sits on the internet and takes traffic. Fabulous stuff. How do we set up a server? To do it cross-platform we have to use cloud-init.

The basics are pretty straight forward. We're gonna use latest debian and we're gonna update it and restart. Then we're gonna install Docker Compose and then finally stick a few files in there to run this. This is all pretty easy, but we do have a problem we need to tackle first. We need some way to do a level of secrets management so we can write out Terraform and cloud-init files, keep them in version control but also not have the secrets just kinda live there.

SOPS

So typically for secret management we want to use whatever our cloud provider gives us, but since we don't have something like that, we'll need to do something more basic.

We'll use age for encryption which is a great simple encryption library. You can install it here. We run age-keygen -o key.txt which gives us our secret file. Then we need to set an environmental variable with the path to the key like this: SOPS_AGE_KEY_FILE=/Users/mathew.duggan/key.txt

For those unfamiliar with how SOPS (installed here) works, basically you generate the age key as shown above and then you can encrypt files through a CLI or with Terraform locally. So:

secrets.json
{
   "username": "admin",
   "password": "password"
}

Turns into:

{
	"username": "ENC[AES256_GCM,data:+bGf/sI=,iv:J47szLfZ5wMWr6Ghc94VAABXs2Ec4Hi+e3ohc2HuF/Q=,tag:XIY1jOgDe9SBDMGxFhLwtw==,type:str]",
	"password": "ENC[AES256_GCM,data:RIHz14crqEk=,iv:H3g7/4Bd5vB/6U+Kf+rIR/xBRIGHGoZeN7U1zi5lgsM=,tag:+vD9BXb18rLhpf/sTsvYEA==,type:str]",
	"sops": {
		"kms": null,
		"gcp_kms": null,
		"azure_kv": null,
		"hc_vault": null,
		"age": [
			{
				"recipient": "age1j6dmaunhspfvh78lgnrtr6zkd7whcypcz6jdwypaydc6gaa79vtq5ryvzf",
				"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSA1YlcvdkpGc3pBbVFiUnhP\nYVJnalp0WlREVjlQZkFROGtvcWN2VWxsUUJnCmYvZ1ZPd3NzTjZxNHd6MEVNcmI1\nTTBZdnFaSEFSaXZRK28rc01VZGRxWHMKLS0tIGpZUjZCNDFDUnIvYXRJTDhtcGlu\nT3JJWlN1YlJYeU1ueEQ1cytDbDFXQ00K70mBEowf/AGgiFFNj3ocv0NfbI1IMJX/\nMJHMKtXPYJsoSKJla6Y+cXMXPe7LNNorSnmqvkNF7rgEMvONMNoEiA==\n-----END AGE ENCRYPTED FILE-----\n"
			}
		],
		"lastmodified": "2023-10-19T13:06:42Z",
		"mac": "ENC[AES256_GCM,data:q8R8Zb+PtpBs6TBPu6VJsQXEKLwi2+WtpE3culIy1obUNdfjWaXyBtC/zbWI5eeh2Z4u//2p49G2bMv0jSzMJZnH4TLIzpHxnd6XFjzu4TqObM6FnI3ZW/SSoPwTRxgHqvooMffm3NO5pxoz3FhnJDHwYk+jTK+JoGxyZF5nBe4=,iv:Ey+so87o/kYbvOaSUXs+vyIrEQXEC39vmswdl0L3Gvw=,tag:5mWJTfBgCFjXVuoYBUiDCA==,type:str]",
		"pgp": null,
		"unencrypted_suffix": "_unencrypted",
		"version": "3.8.1"
	}
}

By running this: sops --encrypt --age age1j6dmaunhspfvh78lgnrtr6zkd7whcypcz6jdwypaydc6gaa79vtq5ryvzf secrets.json > secrets.enc.json

So we can use this with Terraform pretty easily. We run export SOPS_AGE_KEY_FILE=/Users/mathew.duggan/key.txt just to ensure everything is set and then the Terraform looks like the following:

terraform {
  required_providers {
    sops = {
      source = "carlpett/sops"
      version = "~> 0.5"
    }
  }
}

data "sops_file" "secret" {
  source_file = "secrets.enc.json"
}

output "root-value-password" {
  # Access the password variable from the map
  value = data.sops_file.secret.data["password"]
  sensitive = true
}

Now you can use SOPS with AWS, GCP, Azure, or use their own secrets system. I present this only as a "we're small and am looking for a way to easily encrypt configuration files".

Cloud init

So now we're to the last part of the server setup. We'll need to define a cloud-init YAML to set up the host and we'll need to define a Docker Compose file to set up the application that is going to handle all the pulling of images from here. Now thankfully we should be able to reuse this stuff for the foreseeable future.

#cloud-config

package_update: true
package_upgrade: true
package_reboot_if_required: true

groups:
    - docker

users:
    - name: admin
      lock_passwd: true
      shell: /bin/bash
      ssh_authorized_keys:
      - ${init_ssh_public_key}
      groups: docker
      sudo: ALL=(ALL) NOPASSWD:ALL

packages:
  - apt-transport-https
  - ca-certificates
  - curl
  - gnupg-agent
  - software-properties-common
  - unattended-upgrades
  - nginx
  
write_files:
  - owner: root:root
    encoding: b64
    path: /etc/ssl/cloudflare.crt
    content: |
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tDQpNSUlHQ2pDQ0EvS2dBd0lCQWdJSVY1RzZsVmJDTG1Fd0RRWUpLb1pJaHZjTkFRRU5CUUF3Z1pBeEN6QUpCZ05WDQpCQVlUQWxWVE1Sa3dGd1lEVlFRS0V4QkRiRzkxWkVac1lYSmxMQ0JKYm1NdU1SUXdFZ1lEVlFRTEV3dFBjbWxuDQphVzRnVUhWc2JERVdNQlFHQTFVRUJ4TU5VMkZ1SUVaeVlXNWphWE5qYnpFVE1CRUdBMVVFQ0JNS1EyRnNhV1p2DQpjbTVwWVRFak1DRUdBMVVFQXhNYWIzSnBaMmx1TFhCMWJHd3VZMnh2ZFdSbWJHRnlaUzV1WlhRd0hoY05NVGt4DQpNREV3TVRnME5UQXdXaGNOTWpreE1UQXhNVGN3TURBd1dqQ0JrREVMTUFrR0ExVUVCaE1DVlZNeEdUQVhCZ05WDQpCQW9URUVOc2IzVmtSbXhoY21Vc0lFbHVZeTR4RkRBU0JnTlZCQXNUQzA5eWFXZHBiaUJRZFd4c01SWXdGQVlEDQpWUVFIRXcxVFlXNGdSbkpoYm1OcGMyTnZNUk13RVFZRFZRUUlFd3BEWVd4cFptOXlibWxoTVNNd0lRWURWUVFEDQpFeHB2Y21sbmFXNHRjSFZzYkM1amJHOTFaR1pzWVhKbExtNWxkRENDQWlJd0RRWUpLb1pJaHZjTkFRRUJCUUFEDQpnZ0lQQURDQ0Fnb0NnZ0lCQU4yeTJ6b2pZZmwwYktmaHAwQUpCRmVWK2pRcWJDdzNzSG12RVB3TG1xRExxeW5JDQo0MnRaWFI1eTkxNFpCOVpyd2JML0s1TzQ2ZXhkL0x1akpuVjJiM2R6Y3g1cnRpUXpzbzB4emxqcWJuYlFUMjBlDQppaHgvV3JGNE9rWkt5ZFp6c2RhSnNXQVB1cGxESDVQN0o4MnEzcmU4OGpRZGdFNWhxanFGWjNjbENHN2x4b0J3DQpoTGFhem0zTkpKbFVmemRrOTdvdVJ2bkZHQXVYZDVjUVZ4OGpZT09lVTYwc1dxbU1lNFFIZE92cHFCOTFiSm9ZDQpRU0tWRmpVZ0hlVHBOOHROcEtKZmI5TEluM3B1bjNiQzlOS05IdFJLTU5YM0tsL3NBUHE3cS9BbG5kdkEyS3czDQpEa3VtMm1IUVVHZHpWSHFjT2dlYTlCR2pMSzJoN1N1WDkzelRXTDAydTc5OWRyNlhrcmFkL1dTaEhjaGZqalJuDQphTDM1bmlKVURyMDJZSnRQZ3hXT2JzcmZPVTYzQjhqdUxVcGhXLzRCT2pqSnlBRzVsOWoxLy9hVUdFaS9zRWU1DQpscVZ2MFA3OFFyeG94UitNTVhpSndRYWI1RkI4VEcvYWM2bVJIZ0Y5Q21rWDkwdWFSaCtPQzA3WGpUZGZTS0dSDQpQcE05aEIyWmhMb2wvbmY4cW1vTGRvRDVIdk9EWnVLdTIrbXVLZVZIWGd3Mi9BNndNN093cmlueFppeUJrNUhoDQpDdmFBREg3UFpwVTZ6L3p2NU5VNUhTdlhpS3RDekZ1RHU0L1pmaTM0UmZIWGVDVWZIQWI0S2ZOUlhKd01zeFVhDQorNFpwU0FYMkc2Um5HVTVtZXVYcFU1L1YrRFFKcC9lNjlYeXlZNlJYRG9NeXdhRUZsSWxYQnFqUlJBMnBBZ01CDQpBQUdqWmpCa01BNEdBMVVkRHdFQi93UUVBd0lCQmpBU0JnTlZIUk1CQWY4RUNEQUdBUUgvQWdFQ01CMEdBMVVkDQpEZ1FXQkJSRFdVc3JhWXVBNFJFemFsZk5Wemphbm4zRjZ6QWZCZ05WSFNNRUdEQVdnQlJEV1VzcmFZdUE0UkV6DQphbGZOVnpqYW5uM0Y2ekFOQmdrcWhraUc5dzBCQVEwRkFBT0NBZ0VBa1ErVDlucWNTbEF1Vy85MERlWW1RT1cxDQpRaHFPb3I1cHNCRUd2eGJOR1YyaGRMSlk4aDZRVXE0OEJDZXZjTUNoZy9MMUNrem5CTkk0MGkzLzZoZURuM0lTDQp6VkV3WEtmMzRwUEZDQUNXVk1aeGJRamtOUlRpSDhpUnVyOUVzYU5RNW9YQ1BKa2h3ZzIrSUZ5b1BBQVlVUm9YDQpWY0k5U0NEVWE0NWNsbVlISi9YWXdWMWljR1ZJOC85YjJKVXFrbG5PVGE1dHVnd0lVaTVzVGZpcE5jSlhIaGd6DQo2QktZRGwwL1VQMGxMS2JzVUVUWGVUR0RpRHB4WllJZ2JjRnJSRERrSEM2QlN2ZFdWRWlINWI5bUgyQk9ONjB6DQowTzBqOEVFS1R3aTlqbmFmVnRaUVhQL0Q4eW9Wb3dkRkRqWGNLa09QRi8xZ0loOXFyRlI2R2RvUFZnQjNTa0xjDQo1dWxCcVphQ0htNTYzanN2V2Iva1hKbmxGeFcrMWJzTzlCREQ2RHdlQmNHZE51cmdtSDYyNXdCWGtzU2REN3kvDQpmYWtrOERhZ2piaktTaFlsUEVGT0FxRWNsaXdqRjQ1ZWFiTDB0MjdNSlY2MU8vakh6SEwzZGtuWGVFNEJEYTJqDQpiQStKYnlKZVVNdFU3S01zeHZ4ODJSbWhxQkVKSkRCQ0ozc2NWcHR2aERNUnJ0cURCVzVKU2h4b0FPY3BGUUdtDQppWVdpY240Nm5QRGpnVFUwYlgxWlBwVHByeVhidmNpVkw1UmtWQnV5WDJudGNPTERQbFpXZ3haQ0JwOTZ4MDdGDQpBbk96S2daazRSelpQTkF4Q1hFUlZ4YWpuL0ZMY09oZ2xWQUtvNUgwYWMrQWl0bFEwaXA1NUQyL21mOG83MnRNDQpmVlE2VnB5akVYZGlJWFdVcS9vPQ0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQ==
  - owner: root:root
    encoding: b64
    path: /etc/ssl/cert.pem
    content: |
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tDQpNSUlFcGpDQ0E0NmdBd0lCQWdJVUgzZXMwaHVaQy8rTUNxQWRyWXEwTE05UFY4QXdEUVlKS29aSWh2Y05BUUVMDQpCUUF3Z1lzeEN6QUpCZ05WQkFZVEFsVlRNUmt3RndZRFZRUUtFeEJEYkc5MVpFWnNZWEpsTENCSmJtTXVNVFF3DQpNZ1lEVlFRTEV5dERiRzkxWkVac1lYSmxJRTl5YVdkcGJpQlRVMHdnUTJWeWRHbG1hV05oZEdVZ1FYVjBhRzl5DQphWFI1TVJZd0ZBWURWUVFIRXcxVFlXNGdSbkpoYm1OcGMyTnZNUk13RVFZRFZRUUlFd3BEWVd4cFptOXlibWxoDQpNQjRYRFRJek1EY3pNVEUzTXprd01Gb1hEVE00TURjeU56RTNNemt3TUZvd1lqRVpNQmNHQTFVRUNoTVFRMnh2DQpkV1JHYkdGeVpTd2dTVzVqTGpFZE1Cc0dBMVVFQ3hNVVEyeHZkV1JHYkdGeVpTQlBjbWxuYVc0Z1EwRXhKakFrDQpCZ05WQkFNVEhVTnNiM1ZrUm14aGNtVWdUM0pwWjJsdUlFTmxjblJwWm1sallYUmxNSUlCSWpBTkJna3Foa2lHDQo5dzBCQVFFRkFBT0NBUThBTUlJQkNnS0NBUUVBdmtmbjB1eVZ3LzlSYlBDbDQ2dzhIeVZnTXZKREtVUWgvQUk0DQpIODRXRGRzM1hTRmxrbmFIK0FQdmJoM0Rsc3M5NEZnRDVGVVRMdENzQzRtSFpZVlNiRzJqeCtJbjJGcTdTSjdUDQp1QlJUbHBXWmNyVEViRjRBa00wRm53NGwwbEdQeFlZRjRaOG5uZm13YUtvNnlwb0Ftd3draXJWWXU3dWE4Mm01DQp3eWoyZHZKcWNkUExxTXdHRFVkYnlYemdwZE9IaXRBVFFoTE56VmtaOEI1L2RyODcweDR3TE8rRkVOOG92QUprDQpaNVZCRndSOEI5WEs4dUtEcmdBZkxYUVM5UVZ3WHpjcmQxQVp6S1RDVnBlMmlwemFiSGN5TUt1WDdpZjRTRGQ1DQpiZ2Ird1hycGY2dkNRWklDa3REdWJFcDdCVzlCNVhIUnlmMnJ2Yms2VEtjZ2xTbGNRUUlEQVFBQm80SUJLRENDDQpBU1F3RGdZRFZSMFBBUUgvQkFRREFnV2dNQjBHQTFVZEpRUVdNQlFHQ0NzR0FRVUZCd01DQmdnckJnRUZCUWNEDQpBVEFNQmdOVkhSTUJBZjhFQWpBQU1CMEdBMVVkRGdRV0JCU3pwcWpFOEJUK0FKYUg2c3VnRmwxajdqend4REFmDQpCZ05WSFNNRUdEQVdnQlFrNkZOWFhYdzBRSWVwNjVUYnV1RVdlUHdwcERCQUJnZ3JCZ0VGQlFjQkFRUTBNREl3DQpNQVlJS3dZQkJRVUhNQUdHSkdoMGRIQTZMeTl2WTNOd0xtTnNiM1ZrWm14aGNtVXVZMjl0TDI5eWFXZHBibDlqDQpZVEFwQmdOVkhSRUVJakFnZ2c4cUxtMWhkR1IxWjJkaGJpNWpiMjJDRFcxaGRHUjFaMmRoYmk1amIyMHdPQVlEDQpWUjBmQkRFd0x6QXRvQ3VnS1lZbmFIUjBjRG92TDJOeWJDNWpiRzkxWkdac1lYSmxMbU52YlM5dmNtbG5hVzVmDQpZMkV1WTNKc01BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQ3VvUG9KV05VZ0xPRXVmendLRlprMHBvL2tNR29qDQoxYTdCSGEzcWtNWGUrN2J4aW1pQTBvYzcyVEhYSm8zVm82bTIwaGRpbDRiSzVPYzZoTGpiUTFOR2ZXNm84MXk2DQpyUXZEaXBXN3JuL3R3V3hPTkpHTFNDZDZFalpqWXpUUW5EdFBSQWQrVnBwV1BuNUtLZHRSNkM2ZjhaMFlqeldjDQp3b3JLdkRuV2E5b0gycEUzZUNSRUZsc1lRUUtVNWxOYUpibm9nRXNaY2ZDa0MvU0JCaTRaN0lIRnJzWnd1YTU5DQorVDIxUWNOd3BKbExLZ2VRZlpLazMzTFc5MFlyYjRhNStMaTljQzZsVC9MRHdTc20ySkVVVm1nbDJOaC8wV2dpDQpBcHFxUjV5dmUwdUI2M0tTdW90Z2hyWlp0cnNhVW1OYytjRjhneHU4Si8rdXFhaWZQWk83NVZtVw0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQ==
  - owner: root:root
    encoding: b64
    path: /etc/ssl/key.pem
    permissions: '0600'
    content: ${private_ssl_key}
  - owner: admin:docker
    path: /home/admin/docker-compose.yaml
    content: |
    version: "3"
    services:
      app:
        image: ghcr.io/<org>/<image>:<tag>
        restart: unless-stopped
        ports:
          - "8000:2368"
        labels:
          - "com.centurylinklabs.watchtower.enable=true"
      watchtower:
        image: containrrr/watchtower
        command: --debug --http-api-update
        restart: unless-stopped
        environment:
          - WATCHTOWER_HTTP_API_TOKEN=${watchtower_token}
        labels:
          - "com.centurylinklabs.watchtower.enable=false"
        ports:
          - "8080:8080"
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
          - /home/admin/.docker/config.json:/config.json
    - owner: www-data:www-data
      path: /etc/nginx/sites-available/default
      content: |
        server {
          listen 443 ssl http2;
          listen [::]:443 ssl http2;
          charset UTF-8;
          ssl_session_timeout 5m;
          ssl_prefer_server_ciphers on;
          ssl_ciphers ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5;
          ssl_protocols TLSv1.2;
          ssl_buffer_size 4k;
          ssl_certificate         /etc/ssl/cert.pem;
          ssl_certificate_key     /etc/ssl/key.pem;
          ssl_client_certificate /etc/ssl/cloudflare.crt;
          ssl_verify_client on;
          
          server_name hostname.com www.hostname.com;
          
          location / {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Host $host;
            proxy_http_version 1.1;
            proxy_buffering        on;
            proxy_pass http://127.0.0.1:8000;
            proxy_redirect off;
            }
            
          location /v1/update {
            proxy_http_version 1.1;
            proxy_buffering on;
            proxy_pass http://127.0.0.1:8080;
            proxy_redirect off;
            }
          }
  
runcmd:
  - curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add -
  - add-apt-repository "deb [arch=$(dpkg --print-architecture)] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
  - apt-get update -y
  - apt-get install -y docker-ce docker-ce-cli containerd.io
  - systemctl start docker
  - systemctl enable docker
  - curl -L "https://github.com/docker/compose/releases/download/2.23.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
  - chmod +x /usr/local/bin/docker-compose
  - su admin -c 'docker login -u ${docker_username} -p ${docker_password} ${docker_repository}'
  - su admin -c 'docker compose -f /home/admin/docker-compose.yml up -d'

Now obviously you'll need to modify this and test it, it took some tweaks to get it working on mine and I'm confident there are improvements we could make. However I think we can use it as a sample reference doc with the understanding it is NOT ready to copy and paste.

So here's the basic flow. We're going to use the SSL certificates Cloudflare gives us as well as inserting their certificate for Authenticated Origin Pulls. This ensures all the traffic coming to our server is from Cloudflare. Now we could be traffic from another Cloudflare customer, a malicious one, but at least this gives us a good starting point to limit the traffic. Plus presumably if there is a malicious customer hitting you, at least you can reach out to Cloudflare and they'll do....something.

Now we put it together with Terraform and we have something we can deploy. We'll do Digital Ocean as our example but the cloud provider part doesn't really matter.

secrets.json

{
   "private_ssl_key": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tDQpNSUlFcGpDQ0E0NmdBd0lCQWdJVUgzZXMwaHVaQy8rTUNxQWRyWXEwTE05UFY4QXdEUVlKS29aSWh2Y05BUUVMDQpCUUF3Z1lzeEN6QUpCZ05WQkFZVEFsVlRNUmt3RndZRFZRUUtFeEJEYkc5MVpFWnNZWEpsTENCSmJtTXVNVFF3DQpNZ1lEVlFRTEV5dERiRzkxWkVac1lYSmxJRTl5YVdkcGJpQlRVMHdnUTJWeWRHbG1hV05oZEdVZ1FYVjBhRzl5DQphWFI1TVJZd0ZBWURWUVFIRXcxVFlXNGdSbkpoYm1OcGMyTnZNUk13RVFZRFZRUUlFd3BEWVd4cFptOXlibWxoDQpNQjRYRFRJek1EY3pNVEUzTXprd01Gb1hEVE00TURjeU56RTNNemt3TUZvd1lqRVpNQmNHQTFVRUNoTVFRMnh2DQpkV1JHYkdGeVpTd2dTVzVqTGpFZE1Cc0dBMVVFQ3hNVVEyeHZkV1JHYkdGeVpTQlBjbWxuYVc0Z1EwRXhKakFrDQpCZ05WQkFNVEhVTnNiM1ZrUm14aGNtVWdUM0pwWjJsdUlFTmxjblJwWm1sallYUmxNSUlCSWpBTkJna3Foa2lHDQo5dzBCQVFFRkFBT0NBUThBTUlJQkNnS0NBUUVBdmtmbjB1eVZ3LzlSYlBDbDQ2dzhIeVZnTXZKREtVUWgvQUk0DQpIODRXRGRzM1hTRmxrbmFIK0FQdmJoM0Rsc3M5NEZnRDVGVVRMdENzQzRtSFpZVlNiRzJqeCtJbjJGcTdTSjdUDQp1QlJUbHBXWmNyVEViRjRBa00wRm53NGwwbEdQeFlZRjRaOG5uZm13YUtvNnlwb0Ftd3draXJWWXU3dWE4Mm01DQp3eWoyZHZKcWNkUExxTXdHRFVkYnlYemdwZE9IaXRBVFFoTE56VmtaOEI1L2RyODcweDR3TE8rRkVOOG92QUprDQpaNVZCRndSOEI5WEs4dUtEcmdBZkxYUVM5UVZ3WHpjcmQxQVp6S1RDVnBlMmlwemFiSGN5TUt1WDdpZjRTRGQ1DQpiZ2Ird1hycGY2dkNRWklDa3REdWJFcDdCVzlCNVhIUnlmMnJ2Yms2VEtjZ2xTbGNRUUlEQVFBQm80SUJLRENDDQpBU1F3RGdZRFZSMFBBUUgvQkFRREFnV2dNQjBHQTFVZEpRUVdNQlFHQ0NzR0FRVUZCd01DQmdnckJnRUZCUWNEDQpBVEFNQmdOVkhSTUJBZjhFQWpBQU1CMEdBMVVkRGdRV0JCU3pwcWpFOEJUK0FKYUg2c3VnRmwxajdqend4REFmDQpCZ05WSFNNRUdEQVdnQlFrNkZOWFhYdzBRSWVwNjVUYnV1RVdlUHdwcERCQUJnZ3JCZ0VGQlFjQkFRUTBNREl3DQpNQVlJS3dZQkJRVUhNQUdHSkdoMGRIQTZMeTl2WTNOd0xtTnNiM1ZrWm14aGNtVXVZMjl0TDI5eWFXZHBibDlqDQpZVEFwQmdOVkhSRUVJakFnZ2c4cUxtMWhkR1IxWjJkaGJpNWpiMjJDRFcxaGRHUjFaMmRoYmk1amIyMHdPQVlEDQpWUjBmQkRFd0x6QXRvQ3VnS1lZbmFIUjBjRG92TDJOeWJDNWpiRzkxWkdac1lYSmxMbU52YlM5dmNtbG5hVzVmDQpZMkV1WTNKc01BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQ3VvUG9KV05VZ0xPRXVmendLRlprMHBvL2tNR29qDQoxYTdCSGEzcWtNWGUrN2J4aW1pQTBvYzcyVEhYSm8zVm82bTIwaGRpbDRiSzVPYzZoTGpiUTFOR2ZXNm84MXk2DQpyUXZEaXBXN3JuL3R3V3hPTkpHTFNDZDZFalpqWXpUUW5EdFBSQWQrVnBwV1BuNUtLZHRSNkM2ZjhaMFlqeldjDQp3b3JLdkRuV2E5b0gycEUzZUNSRUZsc1lRUUtVNWxOYUpibm9nRXNaY2ZDa0MvU0JCaTRaN0lIRnJzWnd1YTU5DQorVDIxUWNOd3BKbExLZ2VRZlpLazMzTFc5MFlyYjRhNStMaTljQzZsVC9MRHdTc20ySkVVVm1nbDJOaC8wV2dpDQpBcHFxUjV5dmUwdUI2M0tTdW90Z2hyWlp0cnNhVW1OYytjRjhneHU4Si8rdXFhaWZQWk83NVZtVw0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQ",
   "watchtower_token": "tx#okr#n+8_wpf%#n9cxr@30vi7wy_@*@69bw+smfic&k^zb8h",
   "docker_username": "username",
   "docker_password": "password",
   "docker_repository": "repository"
}

Base64 encoded private key for SSL along with the watchtower token to access the API and everything else

Terraform file

terraform {
  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      version = "2.30.0"
    }
    sops = {
      source  = "carlpett/sops"
      version = "~> 0.5"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "4.17.0"
    }
  }
}

variable "ssh_public_key" {
  type        = string
  description = "SSH public key"
  default     = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCeogciUcb1roDZWVXaTFrMSqU66qlb4YT2GhDMZQm+cM6kxAgl5GY72Yiuir/Sml8pHMvTRPV5ezg+17gSntnBtIbf3wNwuB0F/21l7vGS2XteY6p557cRHZjSFuc2uPiysnI21FfZCrsEJ7uM3Ebyd/zJ394URcWQm54NtVh/QxuHzfuK9QCbxhlsXXFAfTnrWvLVGQkq/R+fjtKy12o42Y59JIsZT4aORSGujDiagBysGOCXonYqRhs9gmdZPkcKUe3r8j6fZRY2l8/QX3D6zhDZ8x74Gi70ojuvR8oCsWs9tB2sF/XQi806G/s/mbhh6hcj7ALyo5Th+jw7I8rj matdevdug@matdevdug-ThinkPad-X1-Carbon-5th"
}

provider "digitalocean" {
  token = "secret_api_key"
}

data "sops_file" "secret" {
  source_file = "secrets.enc.json"
}

locals {
  virtual_machines = {
    "server01" = { vm_size = "s-4vcpu-8gb", zone = "nyc1" },
    "server02" = { vm_size = "s-4vcpu-8gb", zone = "nyc1" },
    "server03" = { vm_size = "s-4vcpu-8gb", zone = "nyc1" },
    "server04" = { vm_size = "s-4vcpu-8gb", zone = "nyc1" }
  }
}

resource "digitalocean_droplet" "web" {
  for_each = local.virtual_machines
  name     = each.key
  image    = "debian-12-x64"
  size     = each.value.vm_size
  region   = each.value.zone
  user_data = templatefile("${path.module}/cloud-init.yaml", {
    init_ssh_public_key = file(var.ssh_public_key)
    private_ssl_key     = data.sops_file.secret.data["private_ssl_key"]
    watchtower_token    = data.sops_file.secret.data["watchtower_token"]
    docker_username     = data.sops_file.secret.data["docker_username"]
    docker_password     = data.sops_file.secret.data["docker_password"]
    docker_repository   = data.sops_file.secret.data["docker_repository"]
  })
}

resource "digitalocean_reserved_ip" "example" {
  for_each   = digitalocean_droplet.web
  droplet_id = each.value.id
  region     = each.value.region
}

Hooking it all together

So we'll need to go back to the Cloudflare terraform and set the reserved_ips we get from the cloud provider as the IPs for the origins. Then we should be able to go through, set Authenticated Origin Pulls up as well as SSL to "Strict" in the Cloudflare control panel. Finally since we have Watchtower set up, all we need to deploy a new version of the application is to write a simple deploy script that curls each one of our servers IP addresses with the Watchtower HTTP Token set to tell it to pull a new version of our container from our registry and deploy it. Read more about that here.

In my testing (which was somewhat limited), even though the scripts needed tweaks and modifications, the underlying concept actually worked pretty well. I was able to see all my traffic coming through Cloudflare easily, the SSL components all worked and whenever I wanted to upgrade a host it was pretty simple to stop traffic to the host in the web UI, reboot or destroy and run Terraform again and then send traffic to it again.

In terms of encryption while my age solution wasn't perfect I think it'll hold together reasonably well. It is a secret value which you can safely commit to source control and rotate the secret pretty easily whenever you want.

Next Steps

Put the whole thing together in a structured Terraform module so it's more reliable and less prone to random breakage
Write out a bunch of different cloud provider options to make it easier to switch between them
Write a simple CLI to remove an origin from the load balancer before running the deploy and then confirming the origin is healthy before sticking it back in (for the requirement of zero-downtime deployments)
Taking a second pass at the encryption story.

Going through this is a useful exercise in explaining why these infrastructure products are so complicated. They're complicated because its hard to do and has a lot of moving parts. Even with the heavy use of existing tooling, this thing turned out to be more complicated than I expected.

Hopefully this has been an interesting thought experiment. I'm excited to take another pass at this idea and potentially turn it into a more usable product. If this was helpful (or if I missed something based), I'm always open to feedback. Especially if you thought of an optimization! https://c.im/@matdevdug

Terraform Cloud Review

September 13, 2023 in DevOps

If I were told to go off and make a hosted Terraform product, I would probably end up with a list of features that looked something like the following:

Extremely reliable state tracking
Assistance with upgrading between versions of Terraform and providers and letting users know when it looked safe to upgrade and when there might be problems between versions
Consistent running of Terraform with a fresh container image each time, providers and versions cached on the host VM so the experience is as fast as possible
As many linting, formatting and HCL optimizations I can offer, configurable on and off
Investing as much engineering work as I can afford in providing users an experience where, unlike with the free Terraform, if a plan succeeds on Terraform Cloud, the Apply will succeed
Assisting with Workspace creation. Since we want to keep the number of resources low, seeing if we can leverage machine learning to say "we think you should group these resources together as their own workspace" and showing you how to do that
Figure out some way for organizations to interact with the Terraform resources other than just running the Terraform CLI, so users can create richer experiences for their teams through easy automation that feeds back into the global source of truth that is my incredibly reliable state tracking
Try to do whatever I can to encourage more resources in my cloud. Unlimited storage, lots of workspaces, helping people set up workspaces. The more stuff in there the more valuable it is for the org to use (and also more logistically challenging for them to cancel)

This is me would be a product I would feel confident charging a lot of money for. Terraform Cloud is not that product. It has some of these features locked behind the most expensive tiers, but not enough of them to justify the price.

I've written about my feelings around the Terraform license change before. I won't bore you with that again. However since now the safest way to use Terraform is to pay Hashicorp, what does that look like? As someone who has used Terraform for years and Terraform Cloud almost daily for a year, it's a profoundly underwhelming experience.

Currently it is a little-loved product with lots of errors and sharp edges. This is as v0.1 of a version of this as I could imagine, except the pace of development has been glacial. Terraform Cloud is a "good enough" platform that seems to understand that if you could do better, you would. Like a diner at 2 AM on the side of the highway, it's primary selling point is the fact that it is there. That and the license terms you will need to accept soon.

Terraform Cloud - Basic Walkthrough

At a high level Terraform Cloud allows organizations to centralize their Projects and Workspaces and store that state with Hashicorp. It also gives you access to a Registry for you to host your own privacy Terraform modules and use them in your workspaces. The top level options look as follows:

You may be wondering "What does Usage do?" I have no idea, as the web UI has never worked for me even though I appear to have all the permissions one could have. I have seen the following since getting my account:

I'm not sure what access I lack or if the page was intended to work. It's very mysterious in that way.

There is Explorer, which lets you basically see "what versions of things do I use across the different repos". You can't do anything with that information, like I can't say "alright well upgrade these two to the version that everyone else uses". It's also a beta feature and not one that existed when I first started using the platform.

Finally there are the Workspaces, where you spend 99% of your time.

You get some ok stats here. Up in the top left you see "Needs Attention", "Errors", "Running", "Hold" and then "Applied." Even though you may have many Workspaces, you cannot change how many you see here. 20 is the correct number I guess.

Creating a Workspace

Workspaces are either based on a repo, CLI driven or you call the API. You tell it what VCS, what repo, if you want to use the root of the repo or a sub-directory (which is good because soon you'll have too many resources for one workspace for everything). You tell it Auto Apply (which is checked by default) or Manual and when to trigger a run (whenever a change, whenever specific files in a path change or whenever you push a tag). That's it.

You can see all the runs, what their status is and basically what resources have changed or will change. Any plan that you run from your laptop also show up here. Now you don't need to manage your runs here, you can still do local, but then there is absolutely no reason to use this product. Almost all of the features rely on your runs being handled by Hashicorp here inside of a Workspace.

Workspace flow

Workspaces show you when the run was, how long the plan took, what resources are associated with this (10 resources at a time even though you might have thousands. Details links you to the last run, there are tags and run triggers. Run triggers allow you to link workspaces together, so this workspace would dependent on the output of another workspace.

The settings are as follows:

Runs is pretty straight forward. States allow you to inspect the state changes directly. So you can see the full JSON of a resource and roll back to this specific state version. This can be nice for reviewing what specifically changed on each resource, but in my experience you don't get much over looking at the actual code. But if you are in a situation where something has suddenly broken and you need a fast way of saying "what was added and what was removed", this is where you would go.

NOTE: BE SUPER CAREFUL WITH THIS

The state inspector has the potential to show TONS of sensitive data. It's all the data in Terraform in the raw form. Just be aware it exists when you start using the service and take a look to ensure there isn't anything you didn't want there.

Variables are variables and the settings allow you to lock the workspace, apply Sentinel settings, set an SSH key for downloading private modules and finally if you want changes to the VCS to trigger an action here. So for instance, when you merge in a PR you can trigger Terraform Cloud to automatically apply this workspace. Nothing super new here compared to any CI/CD system, but still it is all baked in.

That's it!

No-Code Modules

One selling point I heard a lot about, but haven't actually seen anyone use. The idea is good though, where you write premade modules and push them to your private registry. Then members of your organization can just run them to do things like "stand up a template web application stack". Hashicorp has a tutorial here that I ran though and found it to work pretty much as expected. It isn't anywhere near the level of power that I would want, compared to something like Pulumi, but it is a nice step forward for automating truly constant tasks (like adding domain names to an internal domain or provisioning some SSL certificate for testing).

Dynamic Credentials

You can link Terraform Cloud and Vault, if you use it, so you no longer need to stick long-living credentials inside of the Workspace to access cloud providers. Instead you can leverage Vault to get short-lived credentials that improve the security of the Workspaces. I ran through this and did have problems getting it worked for GCP, but AWS seemed to work well. It requires some setup inside of the actual repository, but it's a nice security improvement vs leaving production credentials in this random web application and hoping you don't mess up the user scoping.

User scoping is controlled primarily through "projects", which basically trickle down to the user level. You make a project, which has workspaces, that have their own variables and then assign that to a team or business unit. That same logic is reflected inside of credentials.

Private Registry

This is one thing Hashicorp nailed. It's very easy to hook up Terraform Cloud to allow your workspaces to access internal modules backed by your private repositories. It supports the same documentation options as public modules, tracks downloads and allows for easy versioning control through git tags. I have nothing but good things to say about this entire thing.

Sharing between organizations is something they lock at the top tier, but this seems like a very niche usecase so I don't consider it to be too big of a problem. However if you are someone looking to produce a private provider or module for your customers to use, I would reach out to Hashicorp and see how they want you to do that.

The primary value for this is just to easily store all of your IaC logic in modules and then rely on the versioning inside of different environments to roll out changes. For instance, we do this for things like upgrading a system. Make the change, publish the new version to the private registry and then slowly roll it out. Then you can monitor the rollout through git grep pretty easily.

Pricing

$0.00014 per hour per resource. So a lot of money when you think "every IAM custom role, every DNS record, every SSL certificate, every single thing in your entire organization". You do get a lot of the nice features at this "standard" tier, but I'm kinda shocked they don't unlock all the enterprise features at this price point. No-code provisioning is only available at the higher levels, as well as Drift detection, Continuous validation (checks between runs to see if anything has changed ) as well as Ephemeral workspaces. The last one is a shame, because it looks like a great feature. Set up your workspace to self-destruct at regular intervals so you can nuke development environments. I'd love to use that but alas.

Problems

Oh the problems. So the runners sometimes get "stuck", which seems to usually happen after someone cancels a job in the web UI. You'll run into an issue, try to cancel a job, fix the problem and rerun the runner only to have it get stuck forever. I've sat there and watched it try to load the modules for 45 minutes. There isn't any way I have seen to tell Terraform Cloud "this runner is broken, go get me another one". Sometimes they get stuck for an unknown reason.

Since you need to make all the plans and applies remotely to get any value out of the service, it can also sometimes cause traffic jams in your org. If you work with Terraform a lot, you know you need to run plans pretty regularly. Since you need to wait for a runner every single time, you can end up wasting a lot of time sitting there waiting for another job to finish. Again I'm not sure what triggers you getting another runner. You can self host, but then I'm truly baffled at what value this tool brings.

Even if that was an option for you and you wanted to do it, its locked behind the highest subscription tier. So I can't even say "add a self-hosted runner just for plans" so I could unstick my team. This seems like an obvious add, along with a lot more runner controls so I could see what was happening and how to avoid getting it jammed up.

Conclusion

I feel bad this is so short, but there just isn't anything else to write. This is a super bare-bones tool that does what it says on the box for a lot of money. It doesn't give you a ton of value over Spacelift or or any of the others. I can't recommend it, it doesn't work particularly well and I haven't enjoyed my time with it. Managing it vs using an S3 bucket is an experience I would describe as "marginally better". It's nice that it handles contention across team mates for me, but so do all the others at a lower price.

I cannot think of a single reason to recommend this over Spacelift, which has better pricing, better tooling and seems to have a better runner system except for the license change. Which was clearly the point of the license change. However for those evaluating options, head elsewhere. This thing isn't worth the money.

We need a different name for non-technical tech conferences

September 06, 2023

I recently returned from Google Cloud Next. Typically I wouldn't go to a vendor conference like this, since they're usually thinly veiled sales meetings wearing the trench-coat of a conference. However I've been to a few GCP events and found them to be technical and well-run, so I rolled the dice and hopped on the 11 hour flight from London to San Francisco.

We all piled into Moscone Center and I was pretty hopeful. There were a lot of engineers from Google and other reputable orgs, the list of talks we had signed up for before showing up sounded good, or at least useful. I figured this could be a good opportunity to get some idea of where GCP was going and perhaps hear about some large customers technical workarounds to known limitations and issues with the platform. Then we got to the keynote.

AI. The only topic discussed and the only thing anybody at the executive level cared about was AI. This would become a theme, a constant refrain among every executive-type I spoke to. AI was going to replace customer service, programmers, marketing, copy writers, seemingly every single person in the company except for the executives. It seemed only the VPs and the janitors were safe. None of the leaders I spoke to afterwards seemed to appreciate my observation that if they spent most of their day in meetings being shown slide decks, wouldn't they be the easiest to replace with a robot? Or maybe their replacement could be a mop with sunglasses leaned against an office chair if no robot was available.

I understand keynotes aren't for engineers, but the sense I got from this was "nothing has happened in GCP anywhere else except for AI". This isn't true, like objectively I know new things have been launched, but it sends a pretty clear message that it's not a priority if nobody at the executive level seems to care about them. This is also a concern because Google famously has institutional ADHD with an inability to maintain long-term focus on slowly incrementing and improving a product. Instead it launches amazing products, years ahead of the competition then, like a child bored with a toy, drops them into the backyard and wanders away. But whatever, let's move on from the keynote.

Over the next few days what I was to experience was an event with some fun moments, mostly devoid of any technical discussion whatsoever. Rarely were talks geared towards technical staff, when technical questions came up during the recorded events they were almost never answered. Most importantly there was no presentation I heard that even remotely touched on long-known missing features of GCP when compared to peers or roadmaps. When I asked technical questions, often Google employees would come up to me after the talk with the answer, which I appreciate. But everyone at home and in the future won't get that experience and miss out on the benefit.

Most talks were the GCP products marketing page turned into slides, with a seemingly mandatory reference to AI in each one. Several presenters joked about "that was my required AI callout", which started funny but as time went on I began to worry...maybe they were actually required to mention AI? There are almost no live demos (pre-recorded which is ok but live is more compelling), zero code shown, mostly a tour of existing things the GCP web console could do along with a few new features. I ended up getting more value from finding the PMs of various products on the floor and subjecting to these poor souls to my many questions.

This isn't just a Google problem. Every engineer I spoke to about this talked about a similar time they got burned going to "not a conference conference". From AWS to Salesforce and Facebook, these organizations pitch people on getting facetime with engineers and concrete answers to questions. Instead they're opportunity to try and pitch you on more products, letting executives feel loved by ensuring they get one-on-one time from senior folks in the parent company. They sound great but mostly it's an opportunity to collect stickers.

We need to stop pretending these types of conferences are technical conferences. They're not. It's an opportunity for non-technical people inside of your organization who interact with your technical SaaS providers to get facetime with employees of that company and ask basic questions in a shame-free environment. That has value and should be something that exists, but you should also make sure engineers don't wander into these things.

Here are the 7 things I think you shouldn't do if you call yourself a tech conference.

7 Deadly Sins of "Tech" Conferences

Discussing internal tools that aren't open source and that I can't see or use. It's great if X corp has worked together with Google to make the perfect solution to a common problem. It doesn't mean shit to me if I can't use it or at least see it and ask questions about it. Don't let it into the slide deck if it has zero value to the community outside of showing that "solving this problem is possible".
Not letting people who work with customers talk about common problems. I know, from talking to Google folks and from lots of talks with other customers, common issues people experience with GCP products. Some are misconfigurations or not understanding what the product is good at and designed to do. If you talk about a service, you need to discuss something about "common pitfalls" or "working around frequently seen issues".
Pretending a sales pitch is a talk. Nothing makes me see red like halfway through a talk, inviting the head of sales onto the stage to pitch me on their product. Jesus Christ, there's a whole section of sales stuff, you gotta leave me alone in the middle of talks.
Not allowing a way for people to get questions into the livestream. Now this isn't true for every conference, but if this is the one time a year people can ask questions of the PM for a major product and see if they intend to fix a problem, let me ask that question. I'll gladly submit it beforehand and let people vote on it, or whatever you want. It can't be a free-for-all but there has to be something.
Skipping all specifics. If you are telling me that X service is going to solve all my problems and you have 45 minutes, don't spend 30 explaining how great it is in the abstract. Show me how it solves those problems in detail. Some of the Google presenters did this and I'm extremely grateful to them, but it should have been standard. I saw the "Google is committed to privacy and safety" generic slides so many times across different presentations that I remembered the stock photo of two women looking at code and started trying to read what she had written. I think it was Javascript.
Blurring the line between presenter and sponsor. Most well-run tech conferences I've been to make it super clear when you are hearing from a sponsor vs when someone is giving an unbiased opinion. A lot of these not-tech tech conferences don't, where it sounds like a Google employee is endorsing a third-party solution who has also sponsored the event. For folks new to this environment, it's misleading. Is Google saying this is the only way they endorse doing x?
Keeping all the real content behind NDAs. Now during Next there were a lot of super useful meetings that happened, but I wasn't in them. I had to learn about them from people at the bar who had signed NDAs and were invited to learn actual information. If you aren't going to talk about roadmap or any technical details or improvements publically, don't bother having the conference. Release a PDF with whatever new sales content you want me to read. The folks who are invited to the real meetings can still go to those. No judgement, you don't want to have those chats publically, but don't pretend you might this year.

One last thing: if you are going to have a big conference with people meeting with your team, figure out some way you want them to communicate with that team. Maybe temporary email addresses or something? Most people won't use them, but it means a lot to people to think they have a way of having some line of communication with the company. If they get weird then just deactivate the temp email. It's weird to tell people "just come find me afterwards". Where?

What are big companies supposed to do?

I understand large companies are loathe to share details unless forced to. I also understand that companies hate letting engineers speak directly to the end users, for fear that the people who make the sausage and the people who consume the sausage might learn something terrible about how its made. That is the cost of holding a tech conference about your products. You have to let these two groups of people interact with each other and ask questions.

Now obviously there are plenty of great conferences based on open-source technology or about more general themes. These tend to be really high quality and I've gone to a ton I love. However there is value, as we all become more and more dependent on cloud providers, to letting me know more about what this platform is moving towards. I need to know what platforms like GCP are working on so I know what is the technology inside the stack on the rise and which are on the decline.

Instead these conferences are for investors and the business community instead of anyone interested in the products. The point of Next was to show the community that Google is serious about AI. Just like the point of the last Google conference was to show investors that Google is serious about AI. I'm confident the next conference on any topic Google has will also be asked to demonstrate their serious committment to AI technology.

You can still have these. Call them something else. Call them "leadership conferences" or "vision conferences". Talk to Marketing and see what words you can slap in there that conveys "you are an important person we want to talk about our products with" that also tells me, a technical peon, that you don't want me there. I'll be overjoyed not to fly 11 hours and you'll be thrilled not to have me asking questions of your engineers. Everybody wins.

Terraform is dead; Long live Pulumi?

August 18, 2023

The best tools in tech scale. They're not always easy to learn, they might take some time to get good with but once you start to use them they just stick with you forever. On the command line, things like gawk and sed jump to mind, tools that have saved me more than once. I've spent a decade now using Vim and I work with people who started using Emacs in university and still use it for 5 hours+ a day. You use them for basic problems all the time but when you need that complexity and depth of options, they scale with your problem. In the cloud when I think of tools like this, things like s3 and SQS come to mind, set and forget tooling that you can use from day 1 to day 1000.

Not every tool is like this. I've been using Terraform at least once a week for the last 5 years. I have led migrating two companies to Infrastructure as Code with Terraform from using the web UI of their cloud provider, writing easily tens of thousands of lines of HCL along the way. At first I loved Terraform, HCL felt easy to write, the providers from places like AWS and GCP are well maintained and there are tons of resources on the internet to get you out of any problem.

As the years went on, our relationship soured. Terraform has warts that, at this point, either aren't solvable or aren't something that can be solved without throwing away a lot of previous work. In no particular orders, here are my big issues with Terraform:

It scales poorly. Terraform often starts with dev stage and prod as different workspaces. However since both terraform plan and terraform apply make API calls to your cloud provider for each resource, it doesn't take long for this to start to take a long time. You run plan a lot when working with Terraform, so this isn't a trivial thing.
Then you don't want to repeat yourself, so you start moving more complicated logic into Modules. At this point the environments are completely isolated state files with no mixing, if you try to cross accounts things get more complicated. The basic structure you quickly adopt looks like this.

At some point you need to have better DRY coverage, better environment handling, different backends for different environments and you need to work with multiple modules concurrently. Then you explore Terragrunt which is a great tool, but is now another tool on top of the first Infrastructure as code tool and it works with Terraform Cloud but it requires some tweaks to do so.
Now you and your team realize that Terraform can destroy the entire company if you make a mistake, so you start to subdivide different resources out into different states. Typically you'll have the "stateless resources" in one area and the "stateful" resources in another, but actually dividing stuff up into one or another isn't completely straightforward. Destroying an SQS queue is really bad, but is it stateful? Kubernetes nodes don't have state but they're not instantaneous to fix either.
HCL isn't a programming language. It's a fine alternative to YAML or JSON, but it lacks a lot of the tooling you want when dealing with more complex scenarios. You can do many of the normal things like conditionals, joins, trys, loops, for_each, but they're clunky and limited when compared to something like Golang or Python.
The tooling around HCL is pretty barebones. You get some syntax checking, but otherwise it's a lot of switching tmux panes to figure out why it worked one place and didn't work another place.
terraform validate and terraform plan don't mean the thing is going to work. You can write something, it'll pass both check stages and fail on apply. This can be really bad as your team needs to basically wait for you to fix whatever you did so the infrastructure isn't in an inconsistent place or half working. This shouldn't happen in theory but its a common problem.
If an apply fails, it's not always possible to back out. This is especially scary when there are timeouts, when something is still happening inside of the providers stack but now Terraform has given up on knowing what state it was left in.
Versioning is bad. Typically whatever version of Terraform you started with is what you have until someone decides to try to upgrade and hope nothing breaks. tfenv becomes a mission critical tool. Provider version drift is common, again typically "whatever the latest version was when someone wrote this module".

License Change

All of this is annoying, but I've learned to grumble and live with it. Then HashiCorp decided to pull the panic lever of "open-source" companies which is a big license change. Even though Terraform Cloud, their money-making product, was never open-source, they decided that the Terraform CLI needed to fall under the BSL. You can read it here. The specific clause people are getting upset about is below:

You may make production use of the Licensed Work,
provided such use does not include offering the Licensed Work to third parties on a hosted or embedded basis which is competitive with HashiCorp's products.

Now this clause, combined with the 4 year expiration date, effectively kills the Terraform ecosystem. Nobody is going to authorize internal teams to open-source any complementary tooling with the BSL in place and there certainly isn't going to be any competitive pressure to improve Terraform. While it doesn't, at least how I read it as not a lawyer, really impact most usage of Terraform as just a tool that you run on your laptop, it does make the future of Terraform development directly tied to Terraform Cloud. This wouldn't be a problem except Terraform Cloud is bad.

Terraform Cloud

I've used it for a year, it's extremely bare-bones software. It picks the latest version when you make the workspace of Terraform and then that's it. It doesn't help you upgrade Terraform, it doesn't really do any checking or optimizations, structure suggestions or anything else you need as Terraform scales. It sorta integrates with Terragrunt but not really. Basically it is identical to the CLI output of Terraform with some slight visual dressing. Then there's the kicker: the price.

$0.00014 per resource per hour. This is predatory pricing. First, because Terraform drops in value to zero if you can't put everything into Infrastructure as Code. HashiCorp knows this, hence the per-resource price. Second because they know it's impossible for me, the maintainer of the account, to police. What am I supposed to do, tell people "no you cannot have a custom IAM policy because we can't have people writing safe scoped roles"? Maybe I should start forcing subdomain sharing, make sure we don't get too spoiled with all these free hostnames. Finally it's especially grating because we're talking about sticking small collections of JSON onto object storage. There's no engineering per resource, no scaling concerns on HashiCorp's side and disk space is cheap to boot.

This combined with the license change is enough for me. I'm out. I'll deal with some grief to use your product, but at this point HashiCorp has overplayed the value of Terraform. It's a clunky tool that scales poorly and I need to do all the scaling and upgrade work myself with third-party tools, even if I pay you for your cloud product. The per-hour pricing is just the final nail in the coffin from HashiCorp.

I asked around for an alternative and someone recommend Pulumi. I've never heard of them before, so I thought this could be a super fun opportunity to try them out.

Pulumi

Pulumi and Terraform are similar, except unlike Terraform with HCL, Pulumi has lots of scale built in. Why? Because you can use a real programming language to write your Infrastructure as Code. It's a clever concept, letting you scale up the complexity of your project from writing just YAML to writing Golang or Python.

Here is the basic outline of how Pulumi structures infrastructure.

You write programs inside of projects with Nodejs, Python, Golang, .Net, Java or YAML. Programs define resources. You then run the programs inside of stacks, which are different environments. It's nice that Pulumi comes with the project structure defined vs Terraform you define it yourself. Every stack has its own state out of the box which again is a built-in optimization.

Installation was easy and they had all the expected install options. Going through the source code I was impressed with the quality, but was concerned about the 1,718 open issues as of writing this. Clicking around it does seem like they're actively working on them and it has your normal percentage of "not real issues but just people opening them as issues" problem. Also a lot of open issues with comments suggests an engaged user base. The setup on my side was very easy and I opted not to use their cloud product, mostly because it has the same problem that Terraform Cloud has.

A Pulumi Credit is the price for managing one resource for one hour. If using the Team Edition, each credit costs $0.0005. For billing purposes, we count any resource that's declared in a Pulumi program. This includes provider resources (e.g., an Amazon S3 bucket), component resourceswhich are groupings of resources (e.g., an Amazon EKS cluster), and stacks which contain resources (e.g., dev, test, prod stacks).

You consume one Pulumi Credit to manage each resource for an hour. For example, one stack containing one S3 bucket and one EC2 instance is three resources that are counted in your bill. Example: If you manage 625 resources with Pulumi every month, you will use 450,000 Pulumi Credits each month. Your monthly bill would be $150 USD = (450,000 total credits - 150,000 free credits) * $0.0005.

My mouth was actually agape when I got to that monthly bill. I get 150k credits for "free" with Teams which is 200 resources a month. That is absolutely nothing. That's "my DNS records live in Infrastructure as Code". But paying per hour doesn't even unlock all the features! I'm limited on team size, I don't get SSO, I don't get support. Also you are the smaller player, how do you charge more than HashiCorp? Disk space is real cheap and these files are very small. Charge me $99 a month per runner or per user or whatever you need to, but I don't want to ask the question "are we putting too much of our infrastructure into code". It's either all in there or there's zero point and this pricing works directly against that goal.

Alright so Pulumi Cloud is out. Maybe the Enterprise pricing is better but that's not on the website so I can't make a decision based on that. I can't mentally handle getting on another sales email list. Thankfully Pulumi has state locking with S3 now according to this so this isn't a deal-breaker. Let's see what running it just locally looks like.

Pulumi Open-Source only

Thankfully they make that pretty easy. pulumi login --local means your state is stored locally, encrypted with a passphrase. To use s3 just switch that to pulumi login s3:// Now managing state locally or using S3 isn't a new thing, but it's nice that switching between them is pretty easy. You can start local, grow to S3 and then migrate to their Cloud product as you need. Run pulumi new python for a new blank Python setup.

❯ pulumi new python
This command will walk you through creating a new Pulumi project.

Enter a value or leave blank to accept the (default), and press <ENTER>.
Press ^C at any time to quit.

project name: (test) test
project description: (A minimal Python Pulumi program)
Created project 'test'

stack name: (dev)
Created stack 'dev'
Enter your passphrase to protect config/secrets:
Re-enter your passphrase to confirm:

Installing dependencies...

Creating virtual environment...
Finished creating virtual environment
Updating pip, setuptools, and wheel in virtual environment...

I love that it does all the correct Python things. We have a venv, we've got a requirements.txt and we've got a simple configuration file. Working with it was delightful. Setting my Hetzner API key as a secret was easy and straight-forward with: pulumi config set hcloud:token XXXXXXXXXXXXXX --secret. So what does working with it look like. Let's look at an error.

❯ pulumi preview
Enter your passphrase to unlock config/secrets
    (set PULUMI_CONFIG_PASSPHRASE or PULUMI_CONFIG_PASSPHRASE_FILE to remember):
Previewing update (dev):
     Type                 Name               Plan     Info
     pulumi:pulumi:Stack  matduggan.com-dev           1 error


Diagnostics:
  pulumi:pulumi:Stack (matduggan.com-dev):
    error: Program failed with an unhandled exception:
    Traceback (most recent call last):
      File "/opt/homebrew/bin/pulumi-language-python-exec", line 197, in <module>
        loop.run_until_complete(coro)
      File "/opt/homebrew/Cellar/[email protected]/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
        return future.result()
               ^^^^^^^^^^^^^^^
      File "/Users/mathew.duggan/Documents/work/pulumi/venv/lib/python3.11/site-packages/pulumi/runtime/stack.py", line 137, in run_in_stack
        await run_pulumi_func(lambda: Stack(func))
      File "/Users/mathew.duggan/Documents/work/pulumi/venv/lib/python3.11/site-packages/pulumi/runtime/stack.py", line 49, in run_pulumi_func
        func()
      File "/Users/mathew.duggan/Documents/work/pulumi/venv/lib/python3.11/site-packages/pulumi/runtime/stack.py", line 137, in <lambda>
        await run_pulumi_func(lambda: Stack(func))
                                      ^^^^^^^^^^^
      File "/Users/mathew.duggan/Documents/work/pulumi/venv/lib/python3.11/site-packages/pulumi/runtime/stack.py", line 160, in __init__
        func()
      File "/opt/homebrew/bin/pulumi-language-python-exec", line 165, in run
        return runpy.run_path(args.PROGRAM, run_name='__main__')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "<frozen runpy>", line 304, in run_path
      File "<frozen runpy>", line 240, in _get_main_module_details
      File "<frozen runpy>", line 159, in _get_module_details
      File "<frozen importlib._bootstrap_external>", line 1074, in get_code
      File "<frozen importlib._bootstrap_external>", line 1004, in source_to_code
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "/Users/mathew.duggan/Documents/work/pulumi/__main__.py", line 14
        )], user_data="""
                      ^
    SyntaxError: unterminated triple-quoted string literal (detected at line 17)

We get all the super clear output of a Python error message, we still get the secrets encryption and we get all the options of Python when writing the file. However things get a little unusual when I go to inspect the state files.

Local State Files

For some reason when I select local, Pulumi doesn't store the state files in the same directory as where I'm working. Instead it stores them as a user preference at ~/.pulumi which is odd. I understand I selected local, but it's weird to assume I don't want to store the state in git or something. It is also storing a lot of things in my user directory: 358 directories, 848 files. Every template is its own directory.

How can you set it up to work correctly?

rm -rf ~/.pulumi
mkdir test && cd test
mkdir pulumi
pulumi login file://pulumi/
pulumi new --force python
cd ~/.pulumi
336 directories, 815 files

If you go back into the directory and go to /test/pulumi/.pulumi you do see the state files. The force flag is required to let it create the new project inside a directory with stuff already in it. It all ends up working but it's clunky.

Maybe I'm alone on this, but I feel like this is unnecessarily complicated. If I'm going to work locally, the assumption should be I'm going to sit this inside of a repo. Or at the very least I'm going to expect the directory to be a self-contained thing. Also don't put stuff at $HOME/.pulumi. The correct location is ~/.config. I understand nobody follows that rule but the right places to put it are: in the directory where I make the project or in ~/.config.

S3-compatible State

Since this is the more common workflow, let me talk a bit about S3 remote backend. I tried to do a lot of testing to cover as many use-cases as possible. The lockfile works and is per stack, so you do have that basic level of functionality. Stacks cannot reference each other's outputs unless they are in the same bucket as far as I can tell, so you would need to plan for one bucket. Sharing stack names across multiple projects works, so you don't need to worry that every project has a dev, stage and prod. State encryption is your problem, but that's pretty easy to deal with in modern object storage.

The login process is basically pulumi login 's3://?region=us-east-1&awssdk=v2&profile=' and for GCP pulumi login gs://. You can see all the custom backend setup docs here. I also moved between custom backends, going from local to s3 and from s3 to GCP. It all functioned like I would expect, which was nice.

Otherwise nothing exciting to report. In my testing it worked as well as local, and trying to break it with a few folks working on the same repo didn't reveal any obvious problems. It seems as reliable as Terraform in S3, which is to say not perfect but pretty good.

Daily use

Once Pulumi was set up to use object storage, I tried to use it to manage a non-production project in Google Cloud along with someone else who agreed to work with me on it. I figured with at least two people doing the work, the experience would be more realistic.

Compared to working with Terraform, I felt like Pulumi was easier to use. Having all of the options and autocomplete of an IDE available to me when I wanted it really sped things up, plus handling edge cases that previously would have required a lot of very sensitive HCL were very simple with Python. I also liked being able to write tests for infrastructure code, which made things like database operations feel less dangerous. In Terraform the only safety check is whoever is looking at the output, so having another level of checking before potentially destroying resources was nice.

While Pulumi does provide more opinions on how to structure it, even with two of us there were quickly some disagreements on the right way to do things. I prefer more of a monolithic design and my peer prefers smaller stacks, which you can do but I find chaining together the stack output to be more work than its worth. I found the micro-service style in Pulumi to be a bit grating and easy to break, while the monolithic style was much easier for me to work in.

Setting up a CI/CD pipeline wasn't too challenging, basing everything off of this image. All the CI/CD docs on their website presuppose you are using the Cloud product, which again makes sense and I would be glad to do if they changed the pricing. However rolling your own isn't hard, it works as expected, but I want to point out one sticking point I ran into that isn't really Pulumi's fault so much as it is "the complexity of adding in secrets support".

Pulumi Secrets

So Pulumi integrates with a lot of secret managers, which is great. It also has its own secret manager which works fine. The key things to keep in mind are: if you are adding a secret, make sure you flag it as a secret to keep it from getting printed on the output. If you are going to use an external secrets manager, set aside some time to get that working. It took a bit of work to get the permissions such that CI/CD and everything else worked as expected, especially with the micro-service design where one program relied on the output of another program. You can read the docs here.

Unexpected Benefits

Here are some delightful (maybe obvious) things I ran into while working with Pulumi.

We already have experts in these languages. It was great to be able to ask someone with years of Python development experience "what is the best way to structure large Python projects". There is so much expertise and documentation out there vs the wasteland that is Terraform project architecture.
Being able to use a database. Holy crap, this was a real game-changer to me. I pulled down the GCP IAM stock roles, stuck them in SQLite and then was able to query them depending on the set of permissions the service account or user group required. Very small thing, but a massive time-saver vs me going to the website and searching around. It also lets me automate the entire process of Ticket -> PR for IAM role.

You can set up easy APIs. Making a website that generates HCL to stick into a repo and then make a PR? Nightmare. Writing a simple Flask app that runs Pulumi against your infrastructure with scoped permissions? Not bad at all. If your org does something like "add a lot of DNS records" or "add a lot of SSH keys", this really has the potential to change your workday. Also it's easy to set up an abstraction for your entire Infrastructure. Pulumi has docs on how to get started with all of this here. Slack bots, simple command-line tools, all of it was easy to do.
Tests. It's nice to be able to treat infrastructure like its important.
Getting better at a real job skill. Every hour I get more skilled in writing Golang, I'm more valuable to my organization. I'm also just getting more hours writing code in an actual programming language, which is always good. Every hour I invest in HCL is an hour I invested in something that no other tool will ever use.
Speed seemed faster than Terraform. I don't know why that would be, but it did feel like especially on successive previews the results just came much faster. This was true on our CI/CD jobs as well, timing them against Terraform it seemed like Pulumi was faster most of the time. Take this with a pile of salt, I didn't do a real benchmark and ultimately we're hitting the same APIs, so I doubt there's a giant performance difference.

Conclusion

Do I think Pulumi can take over the Terraform throne? There's a lot to like here. The product is one of those great ideas, a natural evolution from where we started in DevOps to where we want to go. Moving towards treating infrastructure like everything else is the next logical leap and they have already done a lot of the ground work. I want Pulumi to succeed, I like it as a product.

However it needs to get out of its own way. The pricing needs a rethink, make it a no-brainier for me to use your cloud product and get fully integrated into it. If you give me a reliable, consistent bill I can present to leadership, I don't have to worry about Pulumi as a service I need to police. The entire organization can be let loose to write whatever infra they need, which benefits us and Pulumi as we'll be more dependent on their internal tooling.

If cost management is a big issue, have me bring my own object storage and VMs for runners. Pulumi can still thrive and be very successfully without being a zero-setup business. This is a tool for people who maintain large infrastructures. We can handle some infrastructure requirements if that is the sticking point.

Hopefully the folks running Pulumi see this moment as the opportunity it is, both for the field at large to move past markup languages and for them to make a grab for a large share of the market.

If there is interest I can do more write-ups on sample Flask apps or Slack bots or whatever. Also if I made a mistake or you think something needs clarification, feel free to reach out to me here: https://c.im/@matdevdug.

Adventures in IPv6 Part 2

August 08, 2023

As I discussed in Part 1 I've converted this site over to pure IPv6. Well at least as pure as I could get away with. I still have some problems though, chief among them that I cannot send emails with the Ghost CMS. I've switched from Mailgun to Scaleway which does have IPv6 for their SMTP service.

smtp.tem.scw.cloud has IPv6 address 2001:bc8:1201:21:d6ae:52ff:fed0:418e
smtp.tem.scw.cloud has IPv6 address 2001:bc8:1201:21:d6ae:52ff:fed0:6aac

I've also confirmed that my docker-compose stack running Ghost can successfully reach IPv6 external addresses with no issues.

matdevdug-busy-1      | PING google.com (2a00:1450:4002:411::200e): 56 data bytes
matdevdug-busy-1      | 64 bytes from 2a00:1450:4002:411::200e: seq=0 ttl=113 time=15.079 ms
matdevdug-busy-1      | 64 bytes from 2a00:1450:4002:411::200e: seq=1 ttl=113 time=14.607 ms
matdevdug-busy-1      | 64 bytes from 2a00:1450:4002:411::200e: seq=2 ttl=113 time=14.540 ms
matdevdug-busy-1      | 64 bytes from 2a00:1450:4002:411::200e: seq=3 ttl=113 time=14.593 ms
matdevdug-busy-1      |
matdevdug-busy-1      |
matdevdug-busy-1      | --- google.com ping statistics ---
matdevdug-busy-1      | 4 packets transmitted, 4 packets received, 0% packet loss
matdevdug-busy-1      | round-trip min/avg/max = 14.540/14.704/15.079 ms

I've also confirmed that Scaleway is reachable by the container no problem with the domain name, so it isn't a DNS problem.

PING smtp.tem.scw.cloud(ff6ad116-d710-4726-b5d3-1687dceb56cb.fr-par-2.baremetal.scw.cloud (2001:bc8:1201:21:d6ae:52ff:fed0:6aac)) 56 data bytes
64 bytes from ff6ad116-d710-4726-b5d3-1687dceb56cb.fr-par-2.baremetal.scw.cloud (2001:bc8:1201:21:d6ae:52ff:fed0:6aac): icmp_seq=1 ttl=53 time=23.1 ms
64 bytes from ff6ad116-d710-4726-b5d3-1687dceb56cb.fr-par-2.baremetal.scw.cloud (2001:bc8:1201:21:d6ae:52ff:fed0:6aac): icmp_seq=2 ttl=53 time=22.2 ms
64 bytes from ff6ad116-d710-4726-b5d3-1687dceb56cb.fr-par-2.baremetal.scw.cloud (2001:bc8:1201:21:d6ae:52ff:fed0:6aac): icmp_seq=3 ttl=53 time=22.2 ms
64 bytes from ff6ad116-d710-4726-b5d3-1687dceb56cb.fr-par-2.baremetal.scw.cloud (2001:bc8:1201:21:d6ae:52ff:fed0:6aac): icmp_seq=4 ttl=53 time=22.1 ms

--- smtp.tem.scw.cloud ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 22.086/22.397/23.063/0.388 ms

At this point I have three theories.

It's an SMTP problem. Possible, but unlikely given how long SMTP has supported IPv6. A quick check by running it over bash by following the instructions here shows that works fine.
Something is blocking the port.

telnet smtp.tem.scw.cloud 587
Trying 2001:bc8:1201:21:d6ae:52ff:fed0:6aac...
Connected to smtp.tem.scw.cloud.
Escape character is '^]'.
220 smtp.scw-tem.cloud ESMTP Service Ready

Alright it's not that.

3. Nodemailer is being stupid. It looks like Ghost relies on Nodemailer so let's check it out. Let's install Node and NPM on my debian junk machine.

sudo apt install npm
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  eslint gyp handlebars libjs-async libjs-events libjs-inherits libjs-is-typedarray libjs-prettify libjs-regenerate libjs-source-map
  libjs-sprintf-js libjs-typedarray-to-buffer libjs-util libnode-dev libssl-dev libuv1-dev node-abbrev node-agent-base node-ajv node-ajv-keywords
  node-ampproject-remapping node-ansi-escapes node-ansi-regex node-ansi-styles node-anymatch node-aproba node-archy node-are-we-there-yet
  node-argparse node-arrify node-assert node-async node-async-each node-babel-helper-define-polyfill-provider node-babel-plugin-add-module-exports
  node-babel-plugin-lodash node-babel-plugin-polyfill-corejs2 node-babel-plugin-polyfill-corejs3 node-babel-plugin-polyfill-regenerator node-babel7
  node-babel7-runtime node-balanced-match node-base64-js node-binary-extensions node-brace-expansion node-braces node-browserslist node-builtins
  node-cacache node-camelcase node-caniuse-lite node-chalk node-chokidar node-chownr node-chrome-trace-event node-ci-info node-cli-table node-cliui
  node-clone node-clone-deep node-color-convert node-color-name node-colors node-columnify node-commander node-commondir node-concat-stream
  node-console-control-strings node-convert-source-map node-copy-concurrently node-core-js node-core-js-compat node-core-js-pure node-core-util-is
  node-css-loader node-css-selector-tokenizer node-data-uri-to-buffer node-debbundle-es-to-primitive node-debug node-decamelize
  node-decompress-response node-deep-equal node-deep-is node-defaults node-define-properties node-defined node-del node-delegates node-depd
  node-diff node-doctrine node-electron-to-chromium node-encoding node-end-of-stream node-enhanced-resolve node-err-code node-errno node-error-ex
  node-es-abstract node-es-module-lexer node-es6-error node-escape-string-regexp node-escodegen node-eslint-scope node-eslint-utils
  node-eslint-visitor-keys node-espree node-esprima node-esquery node-esrecurse node-estraverse node-esutils node-events node-fancy-log
  node-fast-deep-equal node-fast-levenshtein node-fetch node-file-entry-cache node-fill-range node-find-cache-dir node-find-up node-flat-cache
  node-flatted node-for-in node-for-own node-foreground-child node-fs-readdir-recursive node-fs-write-stream-atomic node-fs.realpath
  node-function-bind node-functional-red-black-tree node-gauge node-get-caller-file node-get-stream node-glob node-glob-parent node-globals
  node-globby node-got node-graceful-fs node-gyp node-has-flag node-has-unicode node-hosted-git-info node-https-proxy-agent node-iconv-lite
  node-icss-utils node-ieee754 node-iferr node-ignore node-imurmurhash node-indent-string node-inflight node-inherits node-ini node-interpret
  node-ip node-ip-regex node-is-arrayish node-is-binary-path node-is-buffer node-is-extendable node-is-extglob node-is-glob node-is-number
  node-is-path-cwd node-is-path-inside node-is-plain-obj node-is-plain-object node-is-stream node-is-typedarray node-is-windows node-isarray
  node-isexe node-isobject node-istanbul node-jest-debbundle node-jest-worker node-js-tokens node-js-yaml node-jsesc node-json-buffer
  node-json-parse-better-errors node-json-schema node-json-schema-traverse node-json-stable-stringify node-json5 node-jsonify node-jsonparse
  node-kind-of node-levn node-loader-runner node-locate-path node-lodash node-lodash-packages node-lowercase-keys node-lru-cache node-make-dir
  node-memfs node-memory-fs node-merge-stream node-micromatch node-mime node-mime-types node-mimic-response node-minimatch node-minimist
  node-minipass node-mkdirp node-move-concurrently node-ms node-mute-stream node-n3 node-negotiator node-neo-async node-nopt
  node-normalize-package-data node-normalize-path node-npm-bundled node-npm-package-arg node-npm-run-path node-npmlog node-object-assign
  node-object-inspect node-once node-optimist node-optionator node-osenv node-p-cancelable node-p-limit node-p-locate node-p-map node-parse-json
  node-path-dirname node-path-exists node-path-is-absolute node-path-is-inside node-path-type node-picocolors node-pify node-pkg-dir node-postcss
  node-postcss-modules-extract-imports node-postcss-modules-values node-postcss-value-parser node-prelude-ls node-process-nextick-args node-progress
  node-promise-inflight node-promise-retry node-promzard node-prr node-pump node-punycode node-quick-lru node-randombytes node-read
  node-read-package-json node-read-pkg node-readable-stream node-readdirp node-rechoir node-regenerate node-regenerate-unicode-properties
  node-regenerator-runtime node-regenerator-transform node-regexpp node-regexpu-core node-regjsgen node-regjsparser node-repeat-string
  node-require-directory node-resolve node-resolve-cwd node-resolve-from node-resumer node-retry node-rimraf node-run-queue node-safe-buffer
  node-schema-utils node-semver node-serialize-javascript node-set-blocking node-set-immediate-shim node-shebang-command node-shebang-regex
  node-signal-exit node-slash node-slice-ansi node-source-list-map node-source-map node-source-map-support node-spdx-correct node-spdx-exceptions
  node-spdx-expression-parse node-spdx-license-ids node-sprintf-js node-ssri node-string-decoder node-string-width node-strip-ansi node-strip-bom
  node-strip-json-comments node-supports-color node-tapable node-tape node-tar node-terser node-text-table node-through node-time-stamp
  node-to-fast-properties node-to-regex-range node-tslib node-type-check node-typedarray node-typedarray-to-buffer
  node-unicode-canonical-property-names-ecmascript node-unicode-match-property-ecmascript node-unicode-match-property-value-ecmascript
  node-unicode-property-aliases-ecmascript node-unique-filename node-uri-js node-util node-util-deprecate node-uuid node-v8-compile-cache
  node-v8flags node-validate-npm-package-license node-validate-npm-package-name node-watchpack node-wcwidth.js node-webassemblyjs
  node-webpack-sources node-which node-wide-align node-wordwrap node-wrap-ansi node-wrappy node-write node-write-file-atomic node-y18n node-yallist
  node-yargs node-yargs-parser terser webpack
Suggested packages:
  node-babel-eslint node-esprima-fb node-inquirer libjs-angularjs libssl-doc node-babel-plugin-polyfill-es-shims node-babel7-debug javascript-common
  livescript chai node-jest-diff node-opener
Recommended packages:
  javascript-common build-essential node-tap
The following NEW packages will be installed:
  eslint gyp handlebars libjs-async libjs-events libjs-inherits libjs-is-typedarray libjs-prettify libjs-regenerate libjs-source-map
  libjs-sprintf-js libjs-typedarray-to-buffer libjs-util libnode-dev libssl-dev libuv1-dev node-abbrev node-agent-base node-ajv node-ajv-keywords
  node-ampproject-remapping node-ansi-escapes node-ansi-regex node-ansi-styles node-anymatch node-aproba node-archy node-are-we-there-yet
  node-argparse node-arrify node-assert node-async node-async-each node-babel-helper-define-polyfill-provider node-babel-plugin-add-module-exports
  node-babel-plugin-lodash node-babel-plugin-polyfill-corejs2 node-babel-plugin-polyfill-corejs3 node-babel-plugin-polyfill-regenerator node-babel7
  node-babel7-runtime node-balanced-match node-base64-js node-binary-extensions node-brace-expansion node-braces node-browserslist node-builtins
  node-cacache node-camelcase node-caniuse-lite node-chalk node-chokidar node-chownr node-chrome-trace-event node-ci-info node-cli-table node-cliui
  node-clone node-clone-deep node-color-convert node-color-name node-colors node-columnify node-commander node-commondir node-concat-stream
  node-console-control-strings node-convert-source-map node-copy-concurrently node-core-js node-core-js-compat node-core-js-pure node-core-util-is
  node-css-loader node-css-selector-tokenizer node-data-uri-to-buffer node-debbundle-es-to-primitive node-debug node-decamelize
  node-decompress-response node-deep-equal node-deep-is node-defaults node-define-properties node-defined node-del node-delegates node-depd
  node-diff node-doctrine node-electron-to-chromium node-encoding node-end-of-stream node-enhanced-resolve node-err-code node-errno node-error-ex
  node-es-abstract node-es-module-lexer node-es6-error node-escape-string-regexp node-escodegen node-eslint-scope node-eslint-utils
  node-eslint-visitor-keys node-espree node-esprima node-esquery node-esrecurse node-estraverse node-esutils node-events node-fancy-log
  node-fast-deep-equal node-fast-levenshtein node-fetch node-file-entry-cache node-fill-range node-find-cache-dir node-find-up node-flat-cache
  node-flatted node-for-in node-for-own node-foreground-child node-fs-readdir-recursive node-fs-write-stream-atomic node-fs.realpath
  node-function-bind node-functional-red-black-tree node-gauge node-get-caller-file node-get-stream node-glob node-glob-parent node-globals
  node-globby node-got node-graceful-fs node-gyp node-has-flag node-has-unicode node-hosted-git-info node-https-proxy-agent node-iconv-lite
  node-icss-utils node-ieee754 node-iferr node-ignore node-imurmurhash node-indent-string node-inflight node-inherits node-ini node-interpret
  node-ip node-ip-regex node-is-arrayish node-is-binary-path node-is-buffer node-is-extendable node-is-extglob node-is-glob node-is-number
  node-is-path-cwd node-is-path-inside node-is-plain-obj node-is-plain-object node-is-stream node-is-typedarray node-is-windows node-isarray
  node-isexe node-isobject node-istanbul node-jest-debbundle node-jest-worker node-js-tokens node-js-yaml node-jsesc node-json-buffer
  node-json-parse-better-errors node-json-schema node-json-schema-traverse node-json-stable-stringify node-json5 node-jsonify node-jsonparse
  node-kind-of node-levn node-loader-runner node-locate-path node-lodash node-lodash-packages node-lowercase-keys node-lru-cache node-make-dir
  node-memfs node-memory-fs node-merge-stream node-micromatch node-mime node-mime-types node-mimic-response node-minimatch node-minimist
  node-minipass node-mkdirp node-move-concurrently node-ms node-mute-stream node-n3 node-negotiator node-neo-async node-nopt
  node-normalize-package-data node-normalize-path node-npm-bundled node-npm-package-arg node-npm-run-path node-npmlog node-object-assign
  node-object-inspect node-once node-optimist node-optionator node-osenv node-p-cancelable node-p-limit node-p-locate node-p-map node-parse-json
  node-path-dirname node-path-exists node-path-is-absolute node-path-is-inside node-path-type node-picocolors node-pify node-pkg-dir node-postcss
  node-postcss-modules-extract-imports node-postcss-modules-values node-postcss-value-parser node-prelude-ls node-process-nextick-args node-progress
  node-promise-inflight node-promise-retry node-promzard node-prr node-pump node-punycode node-quick-lru node-randombytes node-read
  node-read-package-json node-read-pkg node-readable-stream node-readdirp node-rechoir node-regenerate node-regenerate-unicode-properties
  node-regenerator-runtime node-regenerator-transform node-regexpp node-regexpu-core node-regjsgen node-regjsparser node-repeat-string
  node-require-directory node-resolve node-resolve-cwd node-resolve-from node-resumer node-retry node-rimraf node-run-queue node-safe-buffer
  node-schema-utils node-semver node-serialize-javascript node-set-blocking node-set-immediate-shim node-shebang-command node-shebang-regex
  node-signal-exit node-slash node-slice-ansi node-source-list-map node-source-map node-source-map-support node-spdx-correct node-spdx-exceptions
  node-spdx-expression-parse node-spdx-license-ids node-sprintf-js node-ssri node-string-decoder node-string-width node-strip-ansi node-strip-bom
  node-strip-json-comments node-supports-color node-tapable node-tape node-tar node-terser node-text-table node-through node-time-stamp
  node-to-fast-properties node-to-regex-range node-tslib node-type-check node-typedarray node-typedarray-to-buffer
  node-unicode-canonical-property-names-ecmascript node-unicode-match-property-ecmascript node-unicode-match-property-value-ecmascript
  node-unicode-property-aliases-ecmascript node-unique-filename node-uri-js node-util node-util-deprecate node-uuid node-v8-compile-cache
  node-v8flags node-validate-npm-package-license node-validate-npm-package-name node-watchpack node-wcwidth.js node-webassemblyjs
  node-webpack-sources node-which node-wide-align node-wordwrap node-wrap-ansi node-wrappy node-write node-write-file-atomic node-y18n node-yallist
  node-yargs node-yargs-parser npm terser webpack
0 upgraded, 349 newly installed, 0 to remove and 1 not upgraded.
Need to get 13.8 MB of archives.
After this operation, 106 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Jesus Christ NPM, what is happening

Now that I have that nightmare factory installed.

"use strict";
const nodemailer = require("nodemailer");

const transporter = nodemailer.createTransport({
  host: "smtp.tem.scw.cloud",
  port: 587,
  // Just so I don't need to worry about it
  secure: false,
  auth: {
    // TODO: replace `user` and `pass` values from <https://forwardemail.net>
    user: 'scaleway-user-name',
    pass: 'scaleway-password'
  }
});

// async..await is not allowed in global scope, must use a wrapper
async function main() {
  // send mail with defined transport object
  const info = await transporter.sendMail({
    from: '"Dead People 👻" <[email protected]>', // sender address
    to: "[email protected]", // list of receivers
    subject: "Hello", // Subject line
    text: "Hello world", // plain text body
    html: "<b>Hello world?</b>", // html body
  });

  console.log("Message sent: %s", info.messageId);
}

main().catch(console.error);

Looks like Nodemailer doesn't seem to understand this is an IPv6 box.

node example.js
Error: connect ENETUNREACH 51.159.99.81:587 - Local (0.0.0.0:0)
    at internalConnect (node:net:1060:16)
    at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
    at node:net:1244:9
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11) {
  errno: -101,
  code: 'ESOCKET',
  syscall: 'connect',
  address: '51.159.99.81',
  port: 587,
  command: 'CONN'
}

It looks like this should have been fixed here: https://github.com/nodemailer/nodemailer/pull/1311 but clearly isn't. What happens if I just manually set the IPv6 address.

Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: 2001:bc8:1201:21:d6ae:52ff:fed0:6aac is not in the cert's list:

However if you set it to use an IP for host and a DNS entry for hostname, everything seems to work great.

"use strict";
const nodemailer = require("nodemailer");

const transporter = nodemailer.createTransport({
  host: "2001:bc8:1201:21:d6ae:52ff:fed0:6aac",
  port: 587,
  secure: false,
  tls: {
    rejectUnauthorized: true,
    servername: "smtp.tem.scw.cloud"},
  auth: {
    user: 'scaleway-username',
    pass: 'scaleway-password'
  }
});

// async..await is not allowed in global scope, must use a wrapper
async function main() {
  // send mail with defined transport object
  const info = await transporter.sendMail({
    from: '"Test" <[email protected]>', // sender address
    to: [email protected]", // list of receivers
    subject: "Hello ✔", // Subject line
    text: "Hello world?", // plain text body
    html: "<b>Hello world?</b>", // html body
  });

  console.log("Message sent: %s", info.messageId);
}

main().catch(console.error);

Alright well issue submitted here: https://github.com/TryGhost/Ghost/issues/17627

It is a little alarming that the biggest Node email package doesn't work with IPv6 and seemingly only one person noticed and tried to fix it. Well whatever, we have a workaround.

Python

Alright let's try to fix the pip problems I was seeing before in various scripts.

pip3 install requests
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.

    See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

Right I forgot Python was doing this now. Fine, I'll use venv, not a problem. I guess first I compile a version of Python if I want the latest? I don't see any newer ARM packages out there. Alright, compiling Python.

sudo apt install build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev libsqlite3-dev wget libbz2-dev

wget https://www.python.org/ftp/python/3.11.4/Python-3.11.4.tgz

cd Python-3.11.4/

sudo make -j 2

sudo make altinstall

Alright now pip works great on the latest version inside of a venv. My scripts all seem to work fine and there appears to be no issues. Whatever problem there was before is resolved. Specific shoutout to requests where I'm doing some strange things with network traffic and it seems to have no problems.

Conclusion

So the amount of work to get a pretty simple blog up was nontrivial, but we're here now. I have a patch for Ghost that I can apply to the container, Python seems to be working fine/great now and Docker seems to work as long as I use a user-created network with IPv6 strictly defined. The Docker default bridge also works if you specify the links inside of the docker-compose file, but that seems to be depricated so let's not waste too much time on that. For those looking for instructions on the Docker part I just followed the guide outlined here.

Now that everything is up and running it seems fine, but again if you are thinking of running an IPv6 only server infrastructure, set aside a lot of time for problem solving. Even simple applications like this require a lot of research to get up and running successfully with outbound network functioning and everything linked up in the correct way.

IPv6 Is A Disaster (but we can fix it)

August 04, 2023 in Networking

IP addresses have been in the news a lot lately and not for good reasons. AWS has announced they are charging $.005 per IPv4 address per hour, joining other cloud providers in charging for the luxury of a public IPv4 address. GCP charges $.004, same with Azure and Hetzner charges €0.001/h. Clearly the era of cloud providers going out and purchasing more IPv4 space is coming to an end. As time goes on, the addresses are just more valuable and it makes less sense to give them out for free.

So the writing is on the wall. We need to switch to IPv6. Now I was first told that we were going to need to switch to IPv6 when I was in high school in my first Cisco class and I'm 36 now, to give you some perspective on how long this has been "coming down the pipe". Up to this point I haven't done much at all with IPv6, there has been almost no market demand for those skills and I've never had a job where anybody seemed all that interested in doing it. So I skipped learning about it, which is a shame because it's actually a great advancement in networking.

Now is the second best time to learn though, so I decided to migrate this blog to IPv6 only. We'll stick it behind a CDN to handle the IPv4 traffic, but let's join the cool kids club. What I found was horrifying: almost nothing works out of the box. Major dependencies cease functioning right away and workarounds cannot be described as production ready. The migration process for teams to IPv6 is going to be very rocky, mostly because almost nobody has done the work. We all skipped it for years and now we'll need to pay the price.

Why is IPv6 worth the work?

I'm not gonna do a thing about what is IPv4 vs IPv6. There are plenty of great articles on the internet about that. Let's just quickly recap though "why would anyone want to make the jump to IPv6".

Address space (obviously)
Smaller number of header fields (8 vs 13 on v4)

Faster processing: No more checksum, so routers don't have to do a recalculation for every packet.
Faster routing: More summary routes and hierarchical routes. (Don't know what that is? No stress. Summary route = combining multiple IPs so you don't need all the addresses, just the general direction based on the first part of the address. Ditto with routes, since IPv6 is globally unique you can have small and efficient backbone routing.)
QoS: Traffic Class and Flow Label fields make QoS easier.
Auto-addressing. This allows IPv6 hosts on a LAN to connect without a router or DHCP server.
You can add IPsec to IPv6 with the Authentication Header and Encapsulating Security Payload.

Finally the biggest one: because IPv6 addresses are free and IPv4 ones are not.

Setting up an IPv6-Only Server

The actual setup process was simple. I provisioned a Debian box and selected "IPv6". Then I got my first surprise. My box didn't get an IPv6 address. I was given a /64 of addresses, which is 18,446,744,073,709,551,616. It is good to know that my small ARM server could scale to run all the network infrastructure for every company I've ever worked for on all public addresses.

Now this sounds wasteful but when you look at how IPv6 works, it really isn't. Since IPv6 is much less "chatty" than IPv4, even if I had 10,000 hosts on this network it doesn't matter. As discussed here it actually makes sense to keep all the IPv6 space, even if at first it comes across as insanely wasteful. So just don't think about how many addresses are getting sent to each device.

Important: resist the urge to optimize address utilization. Talking to more experienced networking folks, this seems to be a common trap people fall into. We've all spent so much time worrying about how much space we have remaining in an IPv4 block and designing around that problem. That issue doesn't exist anymore. A /64 prefix is the smallest you should configure on an interface.

Attempting to stick a smaller prefix, which is something I've heard people try, like a /68 or a /96 can break stateless address auto-configuration. Your mentality should be a /48 per site. That's what the Regional Internet Registries hands out when allocating IPv6. When thinking about network organization, you need to think about the nibble boundary. (I know, it sounds like I'm making shit up now). It's basically a way to make IPv6 easier to read.

Let's say you have 2402:9400:10::/48. You would divide it up as follows if you wanted only /64 for each box as a flat network.

Subnet #	Subnet Address
0	2402:9400:10::/64
1	2402:9400:10:1::/64
2	2402:9400:10:2::/64
3	2402:9400:10:3::/64
4	2402:9400:10:4::/64
5	2402:9400:10:5::/64

A /52 works a similar way.

Subnet #	Subnet Address
0	2402:9400:10::/52
1	2402:9400:10:1000::/52
2	2402:9400:10:2000::/52
3	2402:9400:10:3000::/52
4	2402:9400:10:4000::/52
5	2402:9400:10:5000::/52

You can still at a glance know which subnet you are looking at.

Alright I've got my box ready to go. Let's try to set it up like a normal server.

Problem 1 - I can't SSH in

This was a predictable problem. Neither my work or home ISP supports IPv6. So it's great that I have this box set up, but now I can't really do anything with it. Fine, I attach an IPv4 address for now, SSH in and I'll set up cloudflared to run a tunnel. Presumably they'll handle the conversion on their side.

Except that isn't how Cloudflare rolls. Imagine my surprise when the tunnel collapses when I remove the IPv4 address. By default the cloudflared utility assumes IPv4 and you need to go in and edit the systemd service file to add: --edge-ip-version 6. After this, the tunnel is up and I'm able to SSH in.

Problem 2 - I can't use GitHub

Alright so I'm on the box. Now it's time to start setting up stuff. I run my server setup script and it immediately fails. It's trying to access the installation script for hishtory, a great shell history utility I use on all my personal stuff. It's trying to pull the install file from GitHub and failing. "Certainly that can't be right. GitHub must support IPv6?"

Nope. Alright fine, seems REALLY bad that the service the entire internet uses to release software doesn't work with IPv6, but you know Microsoft is broke and also only cares about fake AI now, so whatever. I ended up using the TransIP Github Proxy which worked fine. Now I have access to Github. But then Python fails with urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>. Alright I give up on this. My guess is the version of Python 3 in Debian doesn't like IPv6, but I'm not in the mood to troubleshoot it right now.

Problem 3 - Can't set up Datadog

Let's do something more basic. Certainly I can set up Datadog to keep an eye on this box. I don't need a lot of metrics, just a few historical load numbers. Go to Datadog, log in and start to walk through the process. Immediately collapses. The simple setup has you run curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh. Now S3 supports IPv6, so what the fuck?

curl -v https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh
*   Trying [64:ff9b::34d9:8430]:443...
*   Trying 52.216.133.245:443...
* Immediate connect fail for 52.216.133.245: Network is unreachable
*   Trying 54.231.138.48:443...
* Immediate connect fail for 54.231.138.48: Network is unreachable
*   Trying 52.217.96.222:443...
* Immediate connect fail for 52.217.96.222: Network is unreachable
*   Trying 52.216.152.62:443...
* Immediate connect fail for 52.216.152.62: Network is unreachable
*   Trying 54.231.229.16:443...
* Immediate connect fail for 54.231.229.16: Network is unreachable
*   Trying 52.216.210.200:443...
* Immediate connect fail for 52.216.210.200: Network is unreachable
*   Trying 52.217.89.94:443...
* Immediate connect fail for 52.217.89.94: Network is unreachable
*   Trying 52.216.205.173:443...
* Immediate connect fail for 52.216.205.173: Network is unreachable

It's not S3 or the box, because I can connect to the test S3 bucket AWS provides just fine.

curl -v  http://s3.dualstack.us-west-2.amazonaws.com/
*   Trying [2600:1fa0:40bf:a809:345c:d3f8::]:80...
* Connected to s3.dualstack.us-west-2.amazonaws.com (2600:1fa0:40bf:a809:345c:d3f8::) port 80 (#0)
> GET / HTTP/1.1
> Host: s3.dualstack.us-west-2.amazonaws.com
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 307 Temporary Redirect
< x-amz-id-2: r1WAG/NYpaggrPl3Oja4SG1CrcBZ+1RIpYKivAiIhiICtfwiItTgLfm6McPXXJpKWeM848YWvOQ=
< x-amz-request-id: BPCVA8T6SZMTB3EF
< Date: Tue, 01 Aug 2023 10:31:27 GMT
< Location: https://aws.amazon.com/s3/
< Server: AmazonS3
< Content-Length: 0
<
* Connection #0 to host s3.dualstack.us-west-2.amazonaws.com left intact

Fine I'll do it the manual way through apt.

0% [Connecting to apt.datadoghq.com (18.66.192.22)]

Goddamnit. Alright Datadog is out. It's at this point I realize the experiment of trying to go IPv6 only isn't going to work. Almost nothing seems to work right without proxies and hacks. I'll try to stick as much as I can on IPv6 but going exclusive isn't an option at this point.

NAT64

So in order to access IPv4 resources from IPv6 you need to go through a NAT64 service. I ended up using this one: https://nat64.net/. Immediately all my problems stopped and I was able to access resources normally. I am a little nervous about relying exclusively on what appears to be a hobby project for accessing critical internet resources, but since nobody seems to care upstream of me about IPv6 I don't think I have a lot of choice.

I am surprised there aren't more of these. This is the best list I was able to find:

Most of them seem to be gone now. Dresel's link doesn't work, Trex in my testing had problems, August Internet is gone, most of the Go6lab test devices are down, Tuxis worked but they launched the service in 2019 and seem to have no further interaction with it. Basically Kasper Dupont seems to be the only person on the internet with any sort of widespread interest in allowing IPv6 to actually work. Props to you Kasper.

Basically one person props up this entire part of the internet.

Kasper Dupont

So I was curious about Kasper and emailed him to ask a few questions. You can see that back and forth below.

Me: I found the Public NAT64 service super useful in the transition but would love to know a little bit more about why you do it.

Kasper: I do it primarily because I want to push IPv6 forward. For a few years
I had the opportunity to have a native IPv6-only network at home with
DNS64+NAT64, and I found that to be a pleasant experience which I
wanted to give more people a chance to experience.

When I brought up the first NAT64 gateway it was just a proof of
concept of a NAT64 extension I wanted to push. The NAT64 service took
off, the extension - not so much.

A few months ago I finally got native IPv6 at my current home, so now
I can use my own service in a fashion which much more resembles how my
target users would use it.

Me: You seem to be one of the few remaining free public services like this on the internet and would love to know a bit more about what motivated you to do it, how much it costs to run, anything you would feel comfortable sharing.

Kasper: For my personal products I have a total of 7 VMs across different
hosting providers. Some of them I purchase from Hetzner at 4.51 Euro
per month: https://hetzner.cloud/?ref=fFum6YUDlpJz

The other VMs are a bit more expensive, but not a lot.

Out of those VMs the 4 are used for the NAT64 service and the others
are used for other IPv6 transition related services. For example I
also run this service on a single VM: http://v4-frontend.netiter.com/

I hope to eventually make arrangements with transit providers which
will allow me to grow the capacity of the service and make it
profitable such that I can work on IPv6 full time rather than as a
side gig. The ideal outcome of that would be that IPv4-only content
providers pay the cost through their transit bandwidth payments.

Me: Any technical details you would like to mention would also be great

Kasper: That's my kind of audience :-)

I can get really really technical.

I think what primarily sets my service aside from other services is
that each of my DNS64 servers is automatically updated with NAT64
prefixes based on health checks of all the gateways. That means the
outage of any single NAT64 gateway will be mostly invisible to users.
This also helps with maintenance. I think that makes my NAT64 service
the one with the highest availability among the public NAT64 services.

The NAT64 code is developed entirely by myself and currently runs as a
user mode daemon on Linux. I am considering porting the most
performance critical part to a kernel module.

This site

Alright so I got the basics up and running. In order to pull docker containers over IPv6 you need to add: registry.ipv6.docker.com/library/ to the front of the image name. So for instance:
image: mysql:8.0 becomes image: registry.ipv6.docker.com/library/mysql:8.0

Docker warns you this setup isn't production ready. I'm not really sure what that means for Docker. Presumably if it were to stop you should be able to just pull normally?

Once that was done, we set up the site as an AAAA DNS record and allowed Cloudflare to proxy, meaning they handle the advertisement of IPv6 and bring the traffic to me. One thing I did modify from before was previously I was using Caddy webserver but since I now have a hard reliance on Cloudflare for most of my traffic, I switched to Nginx. One nice thing you can do now that you know all traffic is coming from Cloudflare is switch how SSL works.

Now I have an Origin Certificate from Cloudflare hard-loaded into Nginx with Authenticated Origin Pulls set up so that I know for sure all traffic is running through Cloudflare. The certificate is signed for 15 years, so I can feel pretty confident sticking it in my secrets management system and not thinking about it ever again. For those that are interested there is a tutorial here on how to do it: https://www.digitalocean.com/community/tutorials/how-to-host-a-website-using-cloudflare-and-nginx-on-ubuntu-22-04

Alright the site is back up and working fine. It's what you are reading right now, so if it's up then the system works.

Unsolved Problems

My containers still can't communicate with IPv4 resources even though they're on an IPv6 network with an IPv6 bridge. The DNS64 resolution is working, and I've added fixed-cidr-v6 into Docker. I can talk to IPv6 resources just fine, but the NAT64 conversion process doesn't work. I'm going to keep plugging away at it.
Before you ping me I did add NAT with ip6tables.
SMTP server problems. I haven't been able to find a commercial SMTP service that has an AAAA record. Mailgun and SES were both duds as were a few of the smaller ones I tried. Even Fastmail didn't have anything that could help me. If you know of one please let me know: https://c.im/@matdevdug

Why not stick with IPv4?

Putting aside "because we're running out of addresses" for a minute. If we had adopted IPv6 earlier, the way we do infrastructure could be radically different. So often companies use technology like load balancers and tunnels not because they actually need anything that these things do, but because they need some sort of logical division between private IP ranges and a public IP address they can stick in an DNS A record.

If you break a load balancer into its basic parts, it is doing two things. It is distributing incoming packets onto the back-end servers and it s checking the health of those servers and taking unhealthy ones out of the rotation. Nowadays they often handle things like SSL termination and metrics, but it's not a requirement to be called a load balancer.

There are many ways to load balance, but the most common are as follows:

Round-robin of connection requests.
Weighted Round-Robin with different servers getting more or less.
Least-Connection with servers that have the fewest connections getting more requests.
Weighted Least-Connection, same thing but you can tilt it towards certain boxes.

What you notice is there isn't anything there that requires, or really even benefits from a private IP address vs a public IP address. Configuring the hosts to accept traffic from only one source (the load balancer) is pretty simple and relatively cheap to do, computationally speaking. A lot of the infrastructure designs we've been forced into, things like VPCs, NAT gateways, public vs private subnets, all of these things could have been skipped or relied on less.

The other irony is that IP whitelisting, which currently is a broken security practice that is mostly a waste of time as we all use IP addresses owned by cloud providers, would actually be something that mattered. The process for companies to purchase a /44 for themselves would have gotten easier with demand and it would have been more common for people to go and buy a block of IPs from American Registry for Internet Numbers (ARIN), Réseaux IP Européens Network Coordination Centre (RIPE), or Asia-Pacific Network Information Centre (APNIC).

You would never need to think "well is Google going to buy more IP addresses" or "I need to monitor GitHub support page to make sure they don't add more later". You'd have one block they'd use for their entire business until the end of time. Container systems wouldn't need to assign internal IP addresses on each host, it would be trivial to allocate chunks of public IPs for them to use and also advertise over standard public DNS as needed.

Obviously I'm not saying private networks serve no function. My point is a lot of the network design we've adopted isn't based on necessity but on forced design. I suspect we would have ended up designing applications with the knowledge that they sit on the open internet vs relying entirely on the security of a private VPC. Given how security exploits work this probably would have been a benefit to overall security and design.

So even if cost and availability isn't a concern for you, allowing your organization more ownership and control over how your network functions has real measurable value.

Is this gonna get better?

So this sucks. You either pay cloud providers more money or you get a broken internet. My hope is that the folks who don't want to pay push more IPv6 adoption, but it's also a shame that it has taken so long for us to get here. All these problems and issues could have been addressed gradually and instead it's going to be something where people freak out until the teams that own these resources make the required changes.

I'm hopeful the end result might be better. I think at the very least it might open up more opportunities for smaller companies looking to establish themselves permanently with an IP range that they'll own forever, plus as IPv6 gets more mainstream it will (hopefully) get easier for customers to live with. But I have to say right now this is so broken it's kind of amazing.

If you are a small company looking to not pay the extra IP tax, set aside a lot of time to solve a myriad of problems you are going to encounter.

Thoughts/corrections/objections: [email protected]

Serverless Functions Post-Mortem

July 28, 2023

Around 2016, the term "serverless functions" started to take off in the tech industry. In short order, it was presented as the undeniable future of infrastructure. It's the ultimate solution to redundancy, geographic resilience, load balancing and autoscaling. Never again would we need to patch, tweak or monitor an application. The cloud providers would do it, all we had to do is hit a button and deploy to internet.

I was introduced to it like most infrastructure technology is presented to me, which is as a veiled threat. "Looks like we won't need as many Operations folks in the future with X" is typically how executives discuss it. Early in my career this talk filled me with fear, but now that I've heard it 10+ times, I adopt a "wait and see" mentality. I was told the same thing about VMs, going from IBM and Oracle to Linux, going from owning the datacenter to renting a cage to going to the cloud. Every time it seems I survive.

Even as far as tech hype goes, serverless functions picked up steam fast. Technologies like AWS Lambda and GCP Cloud Functions were adopted by orgs I worked at very fast compared to other technology. Conference after conference and expert after expert proclaimed that serverless was inevitable. It felt like AWS Lambda and others were being adopted for production workloads at a breakneck pace.

Then, without much fanfare, it stopped. Other serverless technologies like GKE Autopilot and ECS are still going strong, but the idea of a serverless function replacing the traditional web framework or API has almost disappeared. Even cloud providers pivoted, positioning the tools as more "glue between services" than the services themselves. The addition of being able to run Docker containers as functions seemed to help a bit, but it remains a niche component of the API world.

What happened? Why were so many smart people wrong? What can we learn as a community about hype and marketing around new tools?

Promise of serverless

Above we see a serverless application as initially pitched. Users would ingress through the API Gateway technology, which handles everything from traffic management, CORS, authorization and API version management. It basically serves as the web server and framework all in one. Easy to test new versions with multiple versions of the same API at the same time, easy to monitor and easy to set up.

After that comes the actual serverless function. These could be written in whatever language you wanted and could run for up to 15 minutes as of 2023. So instead of having, say, a Rails application where you are combining the Model-View-Controller into a monolith, you can break it into each route and use different tools to solve for each situation.

This suggests how one might structure a new PHP applications for instance.

Since these were only invoked in response to a request coming from a user, it was declared a cost savings. You weren't paying for server resources you weren't using, unlike traditional servers where you would provision the expected capacity beforehand based on a guess. The backend would also endlessly scale, meaning it would be impossible to overwhelm the service with traffic. No more needing to worry about DDoS or floods of traffic.

Finally at the end would be a database managed by your cloud provider. All in all you aren't managing any element of this process, so no servers or software updates. You could deploy a thousand times a day and precisely control the rollout and rollback of code. Each function could be written in the language that best suited it. So maybe your team writes most things in Python or Ruby but then goes back through for high volume routes and does those in Golang.

Combined with technologies like S3 and DynamoDB along with SNS you have a compelling package. You could still send messages between functions with SNS topics. Storage was effectively unlimited with S3 and you had a reliable and flexible key-value store with DynamoDB. Plus you ditched the infrastructure folks, the monolith, any dependency on the host OS and you were billed by your cloud provider for your actual usage based on the millisecond.

Initial Problems

The initial adoption of serverless was challenging for teams, especially teams used to monolith development.

Local development. Typically a developer pulls down the entire application they're working on and runs it on their device to be able to test quickly. With serverless, that doesn't really work since the application is potentially thousands of different services written in different languages. You can do this with serverless functions but it's way more complicated.
Hard to set resources correctly. How much memory did this function need under testing can be very different from how much it needs under production. Developers tended to set their limits high to avoid problems, wiping out much of the cost savings. There is no easy way to adjust functions based on real-world data outside of doing it by hand one by one.
AWS did make this process easier with AWS Lambda Power Tuning but you'll still need to roll out the changes yourself function by function. Since even a medium sized application can be made up of 100+ functions, this is a non-trivial thing to do. Plus these aren't static things, changes can get rolled out that dramatically change the memory usage with no warning
Is it working? Observability is harder with a distributed system vs a monolith and serverless just added to that. Metrics are less useful as are old systems like uptime checks. You need, certainly in the beginning, to rely on logs and traces a lot more. For smaller teams especially, the monitoring shift from "uptime checks + grafana" to a more complex log-based profile of health was a rough adjustment.

All these problems were challenges but it seems many were able to get through it with momentum intact. We started to see a lot of small applications launch that were serverless function based, from APIs to hobby developer projects. All of this is reflected by the Datadog State of Serverless report for 2020 which you can see here.

At this point everything seems great. 80% of AWS container users have adopted Lambda in some capacity, paired with SQS and DynamoDB. NodeJS and Python are the dominant languages, which is a little eyebrow raising. This suggests that picking the right language for the job didn't end up happening, instead picking the language easiest for the developer. But that's fine, that is also an optimization.

What happened? What went wrong?

Production Problems

Across the industry we started to hear feedback from teams that had gone hard into serverless functions backing back out. I started to see problems in my own teams that had adopted serverless. The following trends came up in no particular order.

Latency. Traditional web frameworks and containers are fast at processing requests, typically hitting latency in database calls. Serverless functions were slow depending on the last time you invoked them. This led to teams needing to keep "functions warm." What does this mean?

When the function gets a request it downloads the code and gets ready to run it. After that for a period of time, the function is just ready to rerun until it is recycled and the process needs to be run again. The way around this at first was typically an EventBridge rule to keep the function running every minute. This kind of works but not really.

Later Provisioned Concurrency was added, which is effectively....a server. It's a VM where your code is already loaded. You are limited per account to how many functions you can have set to be Provisioned Concurrency, so it's hardly a silver bullet. Again none of this happens automatically, so its up to someone to go through and carefully tune each function to ensure it is in the right category.

Scaling. Serverless functions don't scale to infinity. You can scale concurrency levels up every minute by an additional 500 microVMs. But it is very possible for one function to eat all of the capacity for every other function. Again it requires someone to go through and understand what Reserved Concurrency each function needs and divide that up as a component of the whole.

In addition, serverless functions don't magically get rid of database concurrency limits. So you'll hit situations where a spike of traffic somewhere else kills your ability to access the database. This is also true of monoliths, but it is typically easier to see when this is happening when the logs and metrics are all flowing from the same spot.

In practice it is far harder to scale serverless functions than an autoscaling group. With autoscaling groups I can just add more servers and be done with it. With serverless functions I need an in-depth understanding of each route of my app and where those resources are being spent. Traditional VMs give you a lot of flexibility in dealing with spikes, but serverless functions don't.

There are also tiers of scaling. You need to think of KMS throttling, serverless function concurrency limit, database connection limits, slow queries. Some of these don't go away with traditional web apps, but many do. Solutions started to pop up but they often weren't great.

Teams switched from always having a detailed response from the API to just returning a 200 showing that the request had been received. That allowed teams to stick stuff into an SQS queue and process it later. This works unless there is a problem in processing, breaking the expectations from most clients that 200 means the request was successful, not that the request had been received.

Functions often needed to be rewritten as you went, moving everything you could to the initialization phase and keeping all the connection logic out of the handler code. The initial momentem of serverless was crashing into the rewrites as teams learned painful lesson after painful lesson.

Price. Instead of being fire and forget, serverless functions proved to be very expensive at scale. Developers don't think of routes of an API in terms of how many seconds they need to run and how much memory they use. It was a change in thinking and certainly compared to a flat per-month EC2 pricing, the spikes in traffic and usage was an unpleasant surprise for a lot of teams.

Combined with the cost of RDS and API Gateway and you are looking at a lot of cash going out every month.

The other cost was the requirement that you have a full suite of cloud services identical to production for testing. How do you test your application end to end with serverless functions? You need to stand up the exact same thing as production. Traditional applications you could test on your laptop and run tests against it in the CI/CD pipeline before deployment. Serverless stacks you need to rely a lot more on Blue/Green deployments and monitoring failure rates.

Slow deployments. Pushing out a ton of new Lambdas is a time-consuming process. I've waited 30+ minutes for a medium-sized application. God knows how long people running massive stacks were waiting.
Security. Not running the server is great, but you still need to run all the dependencies. It's possible for teams to spawn tons of functions with different versions of the same dependencies, or even choosing to use different libraries. This makes auditing your dependency security very hard, even with automation checking your repos. It is more difficult to guarantee that every compromised version of X dependency is removed from production than it would be for a smaller number of traditional servers.

Why didn't this work?

I think three primary mistakes were made.

The complexity of running a server in a modern cloud platform was massively overstated. Especially with containers, running a linux box of some variety and pushing containers to it isn't that hard. All the cloud platform offer load balancers, letting you offload SSL termination, so really any Linux box with Podman or Docker can run listening on that port until the box has some sort of error.

Setting up Jenkins to be able to monitor Docker Hub for an image change and trigger a deployment is not that hard. If the servers are just doing that, setting up a new box doesn't require the deep infrastructure skills that serverless function advocates were talking about. The "skill gap" just didn't exist in the way that people were talking about.
People didn't think critically about price. Serverless functions look cheap, but we never think about how many seconds or minute a server is busy. That isn't how we've been conditioned to think about applications and it showed. Often the first bill was a shocker, meaning the savings from maintenance had to be massive and they just weren't.
Really hard to debug problems. Relying on logs and X-Ray to figure out what went wrong is just much harder than pulling the entire stack down to your laptop and triggering the same requests. It is a new skill and one that people had not developed up to that point. The first time you have a long-running production issue that would have been trivial to fix in the old monolith application design style that persists for a long time in the serverless function world, the enthusiasm from leadership evaporates very quickly.

Conclusion

Serverless functions fizzled out and it's important for us as an industry to understand why the hype wasn't real. Important questions were skipped over in an attempt to increase buy-in to cloud platforms and simplify the deployment and development story for teams. Hopefully this provides us a chance to be more skeptical of promises like this in the future. We should have adopted a much more wait and see to this technology instead of rushing straight in and hitting all the sharp edges right away.

Currently serverless functions live as what they're best at, which is either glue between different services, triggering longer-running jobs or as very simple platforms that allow for tight cost control by single developers who are putting together something for public use. If you want to use something serverless for more, you would be better off looking at something like ECS with Fargate or Cloud Run in GCP.