I recently read this great piece by Timur Tukaev discussing how to approach the growing complexity of Kubernetes. You can read it here. Basically as Kubernetes continues to expand to be the everything platform, the amount of functionality it contains is taking longer and longer to learn.
Right now every business that adopts Kubernetes is basically rolling their own bespoke infrastructure. Timur's idea is to try and solve this problem by following the Linux distro model. You'd have groups of people with similar needs work together to make an out-of-the-box Kubernetes setup geared towards their specific needs. I wouldn't start from a blank cluster, but from a cluster already configured for my specific use case (ML, web applications, batch job processing).
I understand the idea, but I think the root cause of all of this is simply a lack of a meaningful package management system for Kubernetes. Helm has done the best it can, but practically speaking it's really far from where we would need to be in order to have something even approaching a Linux package manager.
More specifically we need something between the very easy to use but easy to mess up Helm and the highly bespoke and complex to write Operator concept.
Centralized State Management
Maintain a robust, centralized state store for all deployed resources, akin to a package database.
Provide consistency checks to detect and reconcile drifts between the desired and actual states.
Advanced Dependency Resolution
Implement dependency trees with conflict resolution.
Ensure dependencies are satisfied dynamically, including handling version constraints and providing alternatives where possible.
Granular Resource Lifecycle Control
Include better support for orchestrating changes across interdependent Kubernetes objects.
Secure Packaging Standards
Enforce package signing and verification mechanisms with a centralized trust system.
Native Support for Multi-Cluster Management
Allow packages to target multiple clusters and namespaces with standardized overrides.
Provide tools to synchronize package versions across clusters efficiently.
Rollback Mechanisms
Improve rollback functionality by snapshotting cluster states (beyond Helm’s existing rollback features) and ensuring consistent recovery even after partial failures.
Declarative and Immutable Design
Introduce a declarative approach where the desired state is managed directly (similar to GitOps) rather than relying on templates.
Integration with Kubernetes APIs
Directly leverage Kubernetes APIs like Custom Resource Definitions (CRDs) for managing installed packages and versions.
Provide better integration with Kubernetes-native tooling (e.g., kubectl, kustomize).
Again, a lot of this is Operators, but Operators are proving too complicated for normal people to write. I think we could reuse a lot of that work, keep that functionality, and create something similar to what the Operator allows you to do with less maintenance complexity.
Still, I'd love to see the Kubernetes folks do anything in this area. The current state of the world is so bespoke and frankly broken that there is a ton of low-hanging fruit in this space.
Giving advice is always tricky—especially now, in a world where everything seems to change every week. Recently, I had the chance to chat with some folks who’ve either just graduated or are about to graduate and are looking for jobs in tech. It was an informal, unstructured conversation, which, frankly, was refreshing.
A lot of the exchange surprised me in good ways. But one thing stood out: how little they actually knew about working for a tech company.
Most of what they knew came from one of three places:
1. Corporate blog posts gushing about how amazing life is for employees.
2. LinkedIn (because those who don't work post on LinkedIn about work).
3. Random anecdotes from people they’d bumped into after graduating and starting their first job.
Spoiler alert: Much of this information is either lies, corporate fantasy, or both.
If history is written by the victors, then the narrative about life inside tech companies is written (or at least approved) by the marketing department. It’s polished, rehearsed, and utterly useless for preparing you for the reality of it all. That’s what I’m here to do: share a bit of that reality.
I’ve been fortunate enough to work at both Fortune 500 companies and scrappy startups with 50 people, across the US and Europe. I’m not claiming to know everything (far from it). In fact, I’d strongly encourage others to chime in and add their own insights. The more voices, the better.
That said, I think I’ve been around long enough to offer a perspective worth sharing.
One last thing: this advice is engineering-specific. I haven’t worked in other roles, so while some of this might apply to other fields, don’t hold me accountable if it doesn’t. Consider yourself warned.
Where Should I Apply?
A lot of newcomers gravitate toward companies they’ve heard of—and honestly, that’s not the approach I’d recommend. The bigger and fancier a tech company is, the less their tech stack resembles anything you’ll ever see anywhere else.
Take it from someone who’s watched ex-Googlers and former Facebook folks struggle to understand GitHub: starting at the top isn’t always the blessing it sounds like. (Seriously, just because you worked on a world-class distributed system doesn’t mean you know how to use Postgres.)
Finding your “sweet spot” takes time. For me, it’s companies with 200 to 1,000 employees. That size means they’re big enough to have actual resources but not so bloated with internal politics that all your time is spent in meetings about meetings. Bonus: you might even get to ship something users will see!
Here’s a pro tip: Don’t assume name recognition = better place to work. Plenty of people at Apple hate their jobs. Sometimes, smaller companies will give you way better experience and way fewer existential crises.
And remember: your first job isn’t your forever job. The goal here is just to get something solid on your resume and learn the ropes. That’s it.
Interview
Congratulations! You’ve slogged through a million online applications and finally landed your first interview. Exciting, right? Before you dive in, here are some tips for navigating this bizarre ritual, especially if you’ve never done it before.
Interviews ≠ The Job
Let’s get this straight: interviews have almost nothing to do with the job you’ll actually be doing. They’re really just opportunities for engineers to ask whatever random questions they think are good “signals.”
Someone once said that technical interviews are like “showing you can drive a race car so you can get a job driving a garbage truck.” They weren’t wrong.
Protect Your Time
Here’s the deal: some companies will waste your time. A lot of it.
I’ve been through this circus—take-home tests, calls with unrelated questions, and in-person interviews where no one has even glanced at my earlier work. It’s maddening.
• Take-Home Assignments: Fine, but they should replace some in-person tests, not stack on top of them.
• In-Person Interviews: If they insist on multiple rounds, the interviewers better be sharing notes. If not, walk away.
Remember: the average interview doesn’t lead to a job. Don’t ditch other opportunities because you’re on round three with a “Big Company.” You’re not as close to an offer as you might think.
Watch for the Sunk-Cost Fallacy
If you find yourself doing two take-homes, six interviews, and a whiteboard session, ask yourself: Is this really worth it?
I’ve ended interview processes midstream when I realized they were wasting my time—and I’ve never regretted it. On the flip side, every time I’ve stuck it out for those marathon processes, the resulting job was…meh. Not worth it.
Ask the Hard Questions
This is your chance to interview them. Don’t hold back:
On-Call Work: Ask to see recent pages. Are they fixing real problems, or are you signing up for a never-ending nightmare?
Testing: What’s their test coverage like? Are the tests actually useful, or are they just there to pass “sometimes”?
Turnover: What’s the average tenure? If everyone’s left within 18 months, that’s a big, waving red flag.
Embrace the Leetcode Circus
I know, I know—Leetcode is frustrating and ridiculous, but there’s no way around it. Companies love to make you grind through live coding problems. Just prepare for it, get through it, and move on.
Failure Is Normal
Failing an interview means absolutely nothing. It stings, sure, especially after investing so much time. But rejection is just part of the process. Don’t dwell on it.
Common Questions
Why are interviews so stupid if they're also really expensive to do?
Because interview time is treated like “free time.” Your time, their time—it’s all apparently worthless. Why? Probably something they heard in a TED talk.
Should I contribute to an open-source project to improve my resume?
No. Companies that rely on open-source rarely contribute themselves, so why should you? Unless you want to be an unpaid idealist, skip it.
I’ve always dreamed of working at [Insert Company]. Should I show my enthusiasm?
Absolutely not. Recruiters can smell enthusiasm, and it works against you. For some reason, acting mildly disinterested is the best way to convince a company you’re a great fit. Don’t ask me why—it just works.
Getting Started
One of the biggest mistakes newcomers make in tech is believing that technical decisions are what truly matter. Spoiler: they’re not.
Projects don’t succeed or fail because of technical issues, and teams don’t thrive or collapse based on coding prowess alone. The reality is this: human relationships are the most critical factor in determining your success. Everything else is just noise that can (usually) be worked around or fixed.
Your Manager Is Key
When starting out, the most important relationship you’ll have is with your manager. Unfortunately, management in tech is…well, often terrible.
Most managers fall into two categories:
1. The Engineer Turned Manager: An engineer who got told, “Congrats, you’re a manager now!” without any training.
2. The People Manager™: Someone who claims to be an “expert” at managing people but doesn’t understand (or care about) how software development actually works.
Both types can be problematic, but the former engineer is usually the lesser evil. Great managers do exist, but they’re rare. Teams with good managers tend to have low turnover, so odds are, as a junior, you’re more likely to get stuck with a bad one.
The way you know you have a good manager is that they understand the role is one of service. It's not about extracting work from you; it's about enabling you to do the thing you know how to do. They do the following stuff:
They don't shy away from hard tasks
They understand the tasks their team is working on and can effortlessly answer questions about them
Without thinking about it you would say they add value to most interactions they are in
They have a sense of humor. It's not all grim productivity.
Let’s meet the cast of bad managers you might encounter—and how to survive them.
1800s Factory Tycoon
This manager runs their team like an industrial assembly line and sees engineers as interchangeable cogs. Their dream? Climbing the corporate ladder as quickly as possible.
Signs You Work for a Tycoon
Obsession with “Keeping Things Moving”: Complaints or objections? Clearly, someone’s trying to sabotage the conveyor belt.
“Everyone Is Replaceable”: They dismiss concepts like institutional knowledge or individual skill levels. An engineer is just someone who “writes code at a computer.”
No Onboarding: “Here’s the codebase. It’s self-documenting.” Then they wander off.
Fear of Change: Nothing new gets approved unless it meets impossible criteria:
Zero disruption to current work.
Requires no training.
Produces a shiny metric showing immediate improvement.
No one else needs to do anything differently.
I Am The Star. They're the face of every success and the final word in every debate. You exist to implement their dream, and when the dream doesn't match what they imagined, it's because you suck.
Tips for Dealing With The Tycoon
Stay out of their way.
Expect zero career support—build your own network and contacts.
Be patient. Tycoons often self-destruct when their lack of results catches up to them. A new manager will eventually step in.
Sad Engineer
This person was a good engineer who got promoted into management…without any guidance. They mean well but often struggle with the non-technical aspects of leadership.
• Can’t Stay Out of the Code: They take on coding tasks, but no one on the team feels comfortable critiquing their PRs.
• Poor Communication: They’ll multitask during meetings, avoid eye contact, or act distracted—leaving their team feeling undervalued.
• Technical Debates > Team Conflicts: Great at discussing system architecture, useless at resolving interpersonal issues.
• No Work/Life Balance: They’re often workaholics, unintentionally setting a toxic example for their team.
Tips For Dealing with the SE
Be direct. Subtle hints about what you need or dislike will go over their head.
Use their technical expertise to your advantage—they love helping with code.
Don’t get attached. SEs rarely stick with management roles for long.
Jira as a Religion (JaaR)
The JaaR manager believes in the divine power of process. They dream of turning the chaos of software development into a predictable, assembly-line-like utopia. Spoiler: it never works.
• Obsessed with Estimates: They treat software delivery like Amazon package tracking, demanding updates down to the minute.
• Can’t Say No: They agree to every request from other teams, leaving their own team drowning in extra work.
• The Calendar Always Wins: Quality doesn’t matter; deadlines are sacred. If you ship a mess, that’s fine—as long as it’s on time.
• Meetings. Endless Meetings.: Every decision requires a meeting, a slide deck, and 20 attendees. Progress crawls.
Tips for Surviving the JaaR
• Find the Real Decision-Makers: JaaRs defer to others. Identify who actually calls the shots and work with them directly.
• Play Their Game: Turn everything into a ticket. They love tickets. Detailed ones make them feel productive, even if nothing gets done.
Your Team
The Core Reality of Technical Work
Technical work is often driven by passion. Many engineers enjoy solving problems and building things, even in their free time. However, one universal truth often emerges: organizational busywork expands to fill the available working day. No matter how skilled or motivated your team is, you will face an inevitable struggle against the growing tide of meetings, updates, and bureaucracy.
Every action you and your teammates take is designed to delay this harsh reality for as long as possible. Nobody escapes it; eventually you will be crushed by endless status updates, tickets, syncs and Slack channels. You'll know it's happened when you are commuting home and can't remember anything you actually did that week.
Teams will often pitch one of the following in an attempt to escape the looming black hole of busywork. Don't let yourself get associated with one of these ideas; they almost never work out.
Common Losing Concepts in Team Discussions
1. “If we adopt X, all our problems go away.” Tools and technologies can solve some problems but always introduce new challenges. Stay cautious when someone pitches a “silver bullet” solution.
2. “We’re being left behind!” Tech loves fads and you don't have to participate in every one. Don't stress about "well if I don't work with Y technology then nobody will ever hire me". It's not true. Productivity in software development doesn't grow by leaps and bounds year over year regardless of how "fast" you adopt new junk.
3. “We need to tackle the backlog.” Backlogs often consist of:
Important but complex tasks requiring effort and time.
Low-value tasks that remain because no one has questioned their necessity.
If a thing brings a lot of value to users and isn't terribly painful to write, developers will almost always snap it up for the dopamine rush of making it. This means addressing a backlog without critically evaluating its contents is a waste of resources.
4. “We need detailed conventions for everything.”
While consistency is good, overengineering processes can be counterproductive. Practical examples and lightweight guidelines often work better than rigid rules.
5. “Let’s rewrite everything in [new language].”
Rewrites are rarely worth the cost. Evolutionary changes and refactoring are almost always more effective than starting over.
Team Dynamics
The Golden Rule
People can be either nice or good at their jobs; both are equally valuable.
Teams need balance. While “10x engineers” may excel at writing code, they often disrupt team dynamics by overengineering or pursuing unnecessary rewrites. A harmonious team with diverse strengths typically achieves more than one overloaded with “superstars.”
Tips for Thriving on a Team
1. Quietly Grind on the Problem
As a junior developer, expect to spend a lot of time simply reading code and figuring things out. While asking for help is encouraged, much of your growth comes from independently wrestling with problems.
Start with small, low-risk changes to learn the codebase and deployment process.
Accept that understanding a system takes time, and no explanation can replace hands-on exploration.
2. Understand How Your Team Communicates
Some teams use Slack for everything; others rely on email or tools like GitHub. Learn the preferred methods of communication and actively participate.
3. Create a “How We Make Money” Diagram
Understanding your company’s business model helps you prioritize your work effectively. Map out how the company generates revenue and identify critical systems.
• Focus on less-sensitive parts of the system for experimentation or learning.
• Warn teammates when working on the “make money” areas to avoid unnecessary risks.
4. Plan Before Coding
The more complex a problem, the more time you should spend planning. Collaboration and context are key to successful implementation.
• Discuss proposed changes thoroughly.
• Understand the historical context behind existing decisions.
Assuming that most of the decisions you are looking at were made by a person as smart as or smarter than you, and working from there, has been a good mental framework for me when joining a team.
Take Care Of Yourself
Tech work can be unpredictable—projects get canceled, layoffs happen, or sometimes a lucky break might land you a new opportunity. Regardless, this is your career, and it’s essential to make sure you’re learning and advancing at each job, because you never know when you might need to move on.
All Offices Suck
Open-office layouts are a disaster for anyone who values focus, especially for new employees. You’re constantly bombarded by conversations and interruptions, often with no warning. The office environment may look sleek and modern, but it’s rarely conducive to concentration or productivity.
If you need headphones to drown out the sound of people talking, your leadership cares more about how the office looks to a visitor than how it functions. Try to find a quiet spot where you can sit and work on your tasks.
Monitor Turnover
High employee turnover can be one of the most damaging issues for a team. Not only does it drain time and resources through offboarding, interviewing, and training, but it also kills morale and focus.
Turnover, through voluntary departures or layoffs, is one of the most destructive things to your team dynamics and overall morale. Turnover by itself costs about 20% of all the hours invested by a team (offboarding, interviewing, bringing someone new on), but there's a more sinister angle.
Teams with high turnover don't care about the future. You'll often get trapped in a death spiral of watching a looming problem grow and grow but unable to get anybody invested because they're already planning their exit.
Now high-turnover teams also promote really quickly, so this could be a good opportunity to get some more senior titles on your resume. But be warned that these teams are incredibly chaotic and often are shut down with little warning by upper management.
High turnover also says management is shitty at cost-benefit analysis. It takes forever to onboard engineers, especially in niche or high-complexity sectors.
Try to find out why people are leaving.
Is this "just a job" to them? Fine, let it be that way to you.
Do they feel disposable or ignored? That suggests more serious management issues that go up the chain.
"Loyalty is for idiots". People naturally want to be loyal to organizations and teams, so if the sentiment is "only a sucker would be loyal to this place", understand you don't likely have a long career here.
Corporate Goals Don't Matter
A lot of companies assume that their stated high-level goals somehow matter to you at the bottom. They don't, and it's a little crazy to think they would.
Managers align more strongly with these high-level goals because that is the goal of their team. They don't matter to you because that's not what drives your happiness. Doing work that feels fulfilling, interesting and positive matters. Seeing the profit numbers go up doesn't do anything for you.
Programming Is Lonely
It can be hard for people to transition from university to a silent tech office where people rarely speak outside of meetings.
Try to make friends across departments. I personally always loved hanging out during breaks with the sales folks, mostly because their entire lives are social contact (and they're often naturally funny people). It's important to escape your group from time to time.
Desired End State
Our goal with all of this is to get ourselves onto a good team. You'll know you are on a good team when you spend most of your time just clearing the highway of problems, such is your pace of progress. People are legitimately happy, even if the subject matter is boring. Some of my favorite teams in my career worked on mind-numbingly mundane things but we just had a great time doing it.
Once you find one of these, my advice is to ride it out for as long as possible. They don't last forever, management will come in and destroy it if given enough time. But these are the groups where you will do the most learning and develop the most valuable connections. These are the people who you will hire (or will hire you) five or ten years after you stop working together. That's how incredible the bond of a high-functioning group is and that's what you need to find.
But it takes a while. Until you do, you just need to keep experimenting. Don't stay at a job you don't like for more than 2-3 years. Instead, take those learnings and move on to the next chance.
Ultimately, your career is about learning, growing, and being part of a team that values you. The road can be long, and it’s okay to experiment with different roles and environments until you find the right fit. Stay focused on what matters to you, take care of yourself, and don’t be afraid to move on if the job isn’t fulfilling your needs.
Sometimes life gets you down. Maybe it's a crushing political situation in your home country, perhaps you read the latest scientific data about global warming or hey sometimes you just need to make something stupid to remind yourself why you ever enjoyed doing this. Whatever the reason, let's take a load off and make a pointless Flask app. You can do it too!
Pokemon TCG Pocket Friend Website
I want to find some friends for the mobile game Pokemon TCG Pocket, but I don't want to make a new Reddit account and I don't want to join a Discord. So let's make one. It's a pretty good, straightforward one-day kind of problem.
Why Flask?
Python Flask is the best web framework for dumb ideas that you want to see turned into websites with as little work as possible. Designed for people like me who can hold no more than 3 complex ideas in their heads at a time, it feels like working with Rails if Rails didn't try to constantly wrench the steering wheel away from you and drive the car.
Rails wants to drive for awhile
It's easy to start using, pretty hard to break and extremely easy to troubleshoot.
We're gonna try to time limit this pretty aggressively. I don't want to put in a ton of time on this project, because I think a critical part of a fun project is to get something out onto the Internet as quickly as possible. The difference between fun projects and work projects is the time gap between "idea" and "thing that exists for people to try". We're also not going to obsess about trying to get everything perfectly right. Instead we'll take some small steps to try and limit the damage if we do something wrong.
Let me just see what you made and skip the tutorial
Feel FREE to use this as the beginning template for anything fun that you make and please let me know if you make something cool I can try.
Note:
This is not a "how do I Flask" tutorial. This is showing you how you can use Flask to do fun stuff quickly, not the basics of how the framework operates. There's a good Flask tutorial you'll have to do in order to do the stuff I'm talking about: https://flask.palletsprojects.com/en/stable/tutorial/
Getting Started
Alright let's set this bad-boy up. We'll kick it off with my friend venv. Assuming you got Python from The Internet somewhere, let's start writing some routes.
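A minimal hello.py is enough to prove everything is wired up. Here's a sketch of what that file can look like, assuming you've created a venv and run pip install flask inside it:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # Simplest possible route: return a string and Flask handles the rest
    return "Hello, world!"

if __name__ == "__main__":
    # Flask's built-in dev server is fine for local tinkering (not for production)
    app.run(debug=True)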
Run it with python hello.py and enjoy your hello world.
Let's start writing stuff
Basically Flask apps have a few parts. There's a config, the app, templates and static. But before we start all that, let's just quickly define what we actually need.
We need an index.html as the /
We need a /sitemap.xml for search engines
Gonna need a /register for people to add their codes
Probably want some sort of /search
If we have user accounts you probably want a /profile
Finally gonna need a /login and /logout
So to store all that junk we'll probably want a database but not something complicated because it's friend codes and we're not looking to make something serious here. SQLite it is! Also nice because we're trying to bang this out in one day so easier to test.
At a basic level Flask apps work like this. You define a route in your app.py (or whatever you want to call it).
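Here's a minimal sketch of that route (the other routes the index template links to, like find_friends, register and login, are omitted here):

from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def index():
    # Flask looks for index.html inside the templates/ directory
    return render_template("index.html")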
Then inside of your templates directory you have some Jinja2 templates that will get rendered back to the client. Here is my index.html
{% extends "base.html" %}
{% block content %}
<div class="container mt-4">
<h1 class="text-center text-danger">Pokémon TCG Friend Finder</h1>
<p>Welcome to the Pokémon TCG Friend Finder, where you can connect with players from all over the world!</p>
<div class="mt-4">
<h4>How to Find Friend Codes:</h4>
<p>To browse friend codes shared by other players, simply visit our <a class="btn btn-primary btn-sm" href="{{ url_for('find_friends') }}">Find Friends</a> page. No registration is required!</p>
</div>
<div class="mt-4">
<h4>Want to Share Your Friend Code?</h4>
<p>If you'd like to share your own friend code and country, you need to register for an account. It's quick and free!</p>
<p>
{% if current_user.is_authenticated %}
<a class="btn btn-primary" href="{{ url_for('find_friends') }}">Visit Find Friends</a>
{% else %}
<a class="btn btn-success" href="{{ url_for('register') }}">Register</a> or
<a class="btn btn-info" href="{{ url_for('login') }}">Log in</a> to get started!
{% endif %}
</p>
</div>
<div class="mt-4">
<h4>Spread the Word:</h4>
<p>Let others know about this platform and grow the Pokémon TCG community!</p>
<div class="share-buttons">
<a href="#" onclick="shareOnFacebook()" title="Share on Facebook">
<img src="{{ url_for('static', filename='images/facebook.png') }}" alt="Share on Facebook" style="width: 64px;">
</a>
<a href="#" onclick="shareOnTwitter()" title="Share on Twitter">
<img src="{{ url_for('static', filename='images/twitter.png') }}" alt="Share on Twitter" style="width: 64px;">
</a>
<a href="#" onclick="shareOnReddit()" title="Share on Reddit">
<img src="{{ url_for('static', filename='images/reddit.png') }}" alt="Share on Reddit" style="width: 64px;">
</a>
</div>
</div>
</div>
<!-- JavaScript for sharing -->
<script>
const url = encodeURIComponent(window.location.href);
const title = encodeURIComponent("Check out Pokémon TCG Friend Finder!");
function shareOnFacebook() {
window.open(`https://www.facebook.com/sharer/sharer.php?u=${url}`, '_blank');
}
function shareOnTwitter() {
window.open(`https://twitter.com/intent/tweet?url=${url}&text=${title}`, '_blank');
}
function shareOnReddit() {
window.open(`https://www.reddit.com/submit?url=${url}&title=${title}`, '_blank');
}
</script>
{% endblock %}
Some quick notes:
I am using Bootstrap because Bootstrap lets people who are not good at frontend put a decent-looking one together really quickly: https://getbootstrap.com/
Basically that's it. You make a route on Flask that points to a template, the template is populated from data from your database and you proudly display it for the world to see.
Instead let me run you through what I did that isn't "in the box" with Flask and why I think it helps.
Recommendations to do this real fast
Start with it inside of a container from the beginning.
FROM python:3.12-slim
# Create a non-root user
RUN groupadd -r nonroot && useradd -r -g nonroot nonroot
WORKDIR /app
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY . .
RUN chown -R nonroot:nonroot /app
USER nonroot
ENTRYPOINT ["./gunicorn.sh"]
You are going to have to use a different HTTP server for Flask anyway, gunicorn is.....one of those. So you might as well practice like you play. Here is the compose file
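Something like this works as a sketch (the port, build context and host paths are placeholders; match them to whatever your gunicorn.sh actually binds to):

services:
  web:
    build: .
    ports:
      - "8000:8000"
    environment:
      - FLASK_ENV=development
      - SECRET_KEY=changeme
      - DATABASE_PATH=/data/users.db
    volumes:
      - ./data:/data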
Change the volumes to be wherever you want the database mounted. This is for local development but switching it to "prod" should be pretty straight forward.
config is just "the stuff you are using to configure your application"
import os

class Config:
    SECRET_KEY = os.environ.get("SECRET_KEY") or "secretttssss"
    SQLALCHEMY_DATABASE_URI = f"sqlite:///{os.getenv('DATABASE_PATH', '/data/users.db')}"
    SQLALCHEMY_TRACK_MODIFICATIONS = False
    WTF_CSRF_ENABLED = True

    if os.getenv('FLASK_ENV') == 'development':
        DEBUG = True
    else:
        DEBUG = False
        SERVER_NAME = "poketcg.club"
Finally the models stuff is just the database things broken out to their own file.
from flask_sqlalchemy import SQLAlchemy
from flask_bcrypt import Bcrypt
from flask_login import UserMixin, LoginManager

db = SQLAlchemy()
bcrypt = Bcrypt()
login_manager = LoginManager()

class User(db.Model, UserMixin):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(150), unique=True, nullable=False)
    password = db.Column(db.String(150), nullable=False)
    friend_code = db.Column(db.String(50))
    country = db.Column(db.String(50))
    friend_requests = db.Column(db.Integer, default=0)
You'll probably want to do a better job of defining the data you are inputting into the database than I did, but move fast break things etc.
Logs are pretty much all you are gonna have
The Python logging library is unfortunately relatively basic, but it's important to note that logs are going to be pretty much the only way you will know if something is working or not.
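Here's a sketch of the kind of setup I mean, using only the standard library (the log file name is arbitrary):

import logging
import sys

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    handlers=[
        logging.FileHandler("app.log"),     # write everything to a file
        logging.StreamHandler(sys.stdout),  # and echo it to stdout
    ],
)

logging.getLogger(__name__).info("logging configured")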
That's writing out to a log file and also stdout. You can choose either/or depending on what you want, with the understanding that it's more container-y to run them just as stdout.
Monitor Basic Response Times
So when I'm just making the app and I want to see "how long it takes to do x" I'll add a very basic logging element to track how long Flask thinks the request took.
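A sketch of that using Flask's before_request/after_request hooks (this assumes the app object and logging setup from earlier):

import time
from flask import g, request

@app.before_request
def start_timer():
    g.request_start = time.perf_counter()

@app.after_request
def log_request_time(response):
    elapsed = time.perf_counter() - g.request_start
    app.logger.info("%s %s took %.3fs", request.method, request.path, elapsed)
    return response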
This doesn't tell you a lot but it usually tells me "whoa that took WAY too long something is wrong". It's pretty easy to put OpenTelemetry into Flask but that's sort of overkill for what we're talking about.
Skipping Emails and Password Resets
One thing that consumes a ton of time when working on something like this is coming up with the account recovery story. I've written a ton on this before so I won't bore you with that again, but my recommendation for fun apps is just to skip it.
In terms of account management make it super easy for the user to delete their account.
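As a sketch of what that can look like with the User model above (the route name and redirect target are placeholders, and it assumes the app and db objects from earlier):

from flask import redirect, url_for
from flask_login import current_user, login_required, logout_user

@app.route("/delete-account", methods=["POST"])
@login_required
def delete_account():
    user = User.query.get(current_user.id)  # the logged-in user's row
    logout_user()
    db.session.delete(user)                 # remove the row entirely, no soft delete
    db.session.commit()
    return redirect(url_for("index"))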
Deploying to Production
The most straightforward way to do this is Docker Compose, using a file like the one shown earlier.
You can decide how complicated or simple you want to make this, but you should be able to (pretty easily) set this up on anything from a Pi to a $5 a month server.
See? Wasn't that hard!
So is my website a giant success? Absolutely not. I've only gotten a handful of users on it and I'm not optimistic anyone will ever use it. But I did have a ton of fun making it, so honestly mission success.
For as long as I’ve been around tech enthusiasts, there has been a recurring “decentralization dream.” While the specifics evolve, the essence remains the same: everyone would own a domain name and host their digital identity. This vision promises that people, liberated from the chore of digital maintenance, would find freedom in owning their slice of the internet. The basic gist is at some point people would wake up to how important online services are to them and demand some ownership over how they work.
This idea, however, always fails. From hosting email and simple HTML websites in my youth to the current attempts at decentralized Twitter- or YouTube-like platforms, the tech community keeps waiting for everyday people to take the baton of self-hosting. They never will—because the effort and cost of maintaining self-hosted services far exceeds the skill and interest of the audience. The primary “feature” of self-hosting is, for most, a fatal flaw: it’s a chore. It’s akin to being “free” to change the oil in your car—it’s an option, but not a welcome one for most.
Inadvertently, self-hosting advocates may undermine their own goal of better privacy and data ownership. By promoting open-source, self-hosted tools as the solution for those concerned about their privacy, they provide an escape valve for companies and regulators alike. Companies can claim, “Well, if users care about privacy, they can use X tool.” This reduces the pressure for legal regulation. Even Meta’s Threads, with its integration of ActivityPub, can claim to be open and competitive, deflecting criticism and regulation—despite this openness largely flowing from Threads to ActivityPub and not the other way around.
What people actually need are laws. Regulations like the GDPR must become the international standard for platforms handling personal data. These laws ensure a basic level of privacy and data rights, independent of whether a judge forces a bored billionaire to buy your favorite social network. Suggesting self-hosting as a solution in the absence of such legal protections is as naive as believing encrypted messaging platforms alone can protect you from government or employer overreach.
What do users actually deserve?
We don't need to treat this as a hypothetical. What citizens in the EU get is the logical "floor" of what citizens around the world should demand.
Right to access
What data do you have on me?
How long do you keep it?
Why do you have it? What purpose does it serve?
Right to Rectification
Fix errors in your personal data.
Right to be Forgotten
There's no reason when you leave a platform that they should keep your contribution forever.
Right to Data Portability
Transfer your data to another platform in a standardized machine-readable format.
Right to Withdraw Consent
Opt out of data collection whenever you want, even if you originally agreed.
These are not all GDPR rights, but they form the backbone of what allows users to engage with platforms confidently, knowing they have levers to control their data. Regulations like these are binding and create accountability—something neither self-hosting nor relying on tech billionaires can achieve.
Riding this roller coaster of "I need digital platforms to provide me essential information and access" and trying to balance it with "whatever rich bored people are doing this week" has been a disaster. It's time to stop pretending these companies are our friends and force them to do the things they say they'll do when they're attempting to attract new users.
The fallacies of decentralization as a solution
The decentralization argument often assumes that self-hosted platforms or volunteer-driven networks are inherently superior. But this isn’t practical:
Self-hosting platforms are fragile.
Shutting down a small self-hosted platform running on a VPS provider is pretty trivial. These are basically paid for by one or two people and they would be insane to fight any challenge, even a bad one. How many self-hosted platforms would stand up to a threatening letter from a lawyer, much less an actual government putting pressure on their hosting provider?
Even without external pressure there isn't any practical way to fund these efforts. You can ask for donations, but that's not a reliable source of revenue for a cost that will only grow over time. At a certain size the maintainer will need to form a nonprofit in order to continue collecting the donations, a logistical and legal challenge well outside of the skillset of the people we're talking about.
It's effectively free labor. You are taking a job, running a platform, removing the pay for that job, adding in all the complexity of running a nonprofit and adding in the joys of being the CPA, the CEO, the sysadmin, etc. At some point people get sick, they lose interest, etc.
Decentralization doesn’t replace regulation.
While decentralization aligns with the internet’s original ethos, it doesn’t negate the need for legal protections. Regulations like GDPR raise the minimum level of privacy and security, while decentralization remains an optional enhancement. You lose nothing by moving the floor up.
Regulation is not inherently bad.
A common refrain among technical enthusiasts is a libertarian belief that market pressures and having a superior technical product will "win out" and legislation is bad because it constrains future development. You saw this a lot in the US tech press over the EU move from proprietary chargers to USB-C, a sense of "well when the next big thing comes we won't be able to use it because of silly government regulation".
Global legislation forces all companies—not just a niche few catering to privacy enthusiasts—to respect users’ rights. Unlike market-driven solutions or self-hosting, laws are binding and provide universal protections.
It is impossible for an average user to keep track of who owns which platforms and what their terms of service are now. Since they can be changed with almost no notice, whatever "protections" they can provide are laughably weak. In resisting legislation you make the job of large corporations easier, not harder.
The reality of privacy as a privilege
Right now, privacy often depends on technical skills, financial resources, or sheer luck:
• I value privacy and have money: You can pay for premium platforms like Apple or Fastmail. These platforms could change the rules whenever they want to but likely won't because their entire brand is based on the promise of privacy.
• I value privacy and have technical skills: You can self-host and manage your own services.
• I value privacy but lack money and technical skills: You’re left hoping that volunteers or nonprofits continue offering free tools—and that they don’t disappear overnight. Or you try to keep abreast of a constantly churning ecosystem where companies change hands all the time and the rules change whenever they want.
This is a gatekeeping problem. Privacy should not be a luxury or dependent on arbitrary skill sets. Everyone deserves it.
It actually makes a difference
As someone who has experienced the difference between the U.S. and the EU’s approach to privacy, I can attest to how much better life is with stronger regulations. GDPR isn’t perfect, but it provides a foundation that improves quality of life for everyone. Instead of treating regulation as burdensome or unrealistic, we should view it as essential.
The dream of a decentralized internet isn’t inherently wrong, but waiting for it to materialize as a universal solution is a mistake. Laws—not utopian ideals—are the only way to ensure that users everywhere have the protections they deserve. It’s time to stop pretending companies will prioritize ethics on their own and instead force them to.
Every few years I will be on a team and the topic of quantum computing will come up. Inevitably the question will get asked "well is there something we are supposed to be doing about that or is it just a looming threat?" We will all collectively stare at each other and shrug, then resume writing stuff exactly like we were writing it before.
In 2024 it would be hard to make a strong justification for worrying a lot about post-quantum cryptography in a world where your most likely attack vector is someone breaking into your company Slack and just asking for access to something. However it is a question developers like to worry about because it involves a lot of math and cool steampunk looking computers. It's definitely a more interesting problem than how to get everyone to stop blindly approving access to the company Confluence.
Looks like something in Star Trek someone would trip and pull a bunch of wires out of.
Since I get asked the question every few years and I basically have no idea what I'm talking about, I figured I'd do the research now and then refer back to this in the future when someone asks and I need to look clever in a hurry.
TL/DR: The tooling to create post-quantum safe secrets exists and mostly works, but for normal developers dealing with data that is of little interest 12 months after it is created, I think this is more a "nice to have". That said, these approaches are different enough from encryption now that developers operating with more important data would be well-served in investing the time in doing the research now on how to integrate some of these. Now that the standard is out I suspect there will be more professional interest in supporting these approaches and the tooling will get more open source developer contributions.
Think of a conventional computer like a regular Pokémon player. This player makes decisions based on clear rules, one at a time, and can only do one move at a time.
In the Pokémon card game, you have:
A limited number of cards (like a computer’s memory)
You play one card at a time (like a computer performing one calculation at a time)
You follow a clear set of rules (like how classical computers follow step-by-step instructions)
Every time you want to pick a card, attack, or use a move, you do it one by one in a specific order, just like a classical computer processes 0s and 1s in a step-by-step manner. If you want to calculate something or figure out the best strategy, you would test one option, then another, and so on, until you find the right solution. This makes conventional computers good at handling problems that can be broken down into simple steps.
Quantum Computers:
Now, imagine a quantum computer is like a player who can somehow look at all the cards in their deck at the same time and choose the best one without flipping through each card individually.
In the quantum world:
Instead of playing one card at a time, it’s like you could play multiple cards at once, but in a way that combines all possibilities (like a super-powered move).
You don’t just pick one strategy, you could explore all possible strategies at once. It’s as if you’re thinking of all possible moves simultaneously, which could lead to discovering new ways to win the game much faster than in a regular match.
Quantum computers rely on something called superposition, which is like having your Pokémon be both active and benched at the same time, until you need to make a decision. Then, they “collapse” into one state—either active or benched.
This gives quantum computers the ability to solve certain types of problems much faster because they are working with many possibilities at once, unlike classical computers that work on problems step-by-step.
Why Aren't Quantum Computers More Relevant To Me?
We'll explain this with Pokemon cards again.
The deck of cards (which represents the quantum system) in a quantum player’s game is extremely fragile. The cards are like quantum bits (qubits), and they can be in many states at once (active, benched, etc.). However, if someone bumps the table or even just looks at the cards wrong, the whole system can collapse and go back to a simple state.
In the Pokémon analogy, this would be like having super rare and powerful cards, but they’re so sensitive that if you shuffle too hard or drop the deck, the cards get damaged or lost. Because of this, it’s hard to keep the quantum player’s strategy intact without ruining their game.
In real life, quantum computers need extremely controlled environments to work—like keeping them at near absolute zero temperatures. Otherwise, they make too many errors to be useful for most tasks.
The quantum player might be amazing at playing certain types of Pokémon battles, like tournaments that require deep strategy or involve many complex moves. However, if they try to play a quick, casual game with a simple strategy, their special abilities don’t help much. They may even be worse at simple games than regular players.
Got it, so Post-Quantum Cryptography
So conventional encryption algorithms often work with the following design. They select 2 very large prime numbers and then multiply them to obtain an even larger number. The act of multiplying the prime numbers is easy, but it's hard to figure out what you used to make the output. These two numbers are known as the prime factors and are what you are talking about obtaining when you are talking about breaking encryption.
Sometimes you hear this referred to as "the RSA problem": how do you get the private key with only the public key? Since this not-yet-existing quantum computer would be good at finding these prime factors, a lot of the assumptions we have about how encryption works would be broken. For years and years the idea that it is safe to share a public key has been an underpinning of much of the software that has been written. Cue much panic.
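A toy illustration of that asymmetry (with comically small primes; real RSA moduli are hundreds of digits long):

p, q = 1_000_003, 1_000_033   # two small primes
n = p * q                     # multiplying them is instant

def factor(n):
    # Brute-force trial division: fine for toy numbers,
    # hopeless at the sizes real keys use
    i = 2
    while i * i <= n:
        if n % i == 0:
            return i, n // i
        i += 1
    return None

print(n)
print(factor(n))  # recovering p and q is the hard direction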
But since it takes 20 years for us to do anything as an industry we have to start planning now even though it seems more likely in 20-30 years we'll be struggling to keep any component of the internet functional through massive heat waves and water wars. Anywho.
So NIST, starting in 2016, asked for help selecting some post-quantum standards and ended up settling on three of them. Let's talk about them and why they are (probably) better suited to solving this problem.
FIPS 203 (Module-Lattice-Based Key-Encapsulation Mechanism Standard)
Basically we have two different things happening here. We have a Key-Encapsulation Mechanism, which is a known thing you have probably used. Layered on top of that is a Module-Lattice-Based KEM.
Key-Encapsulation Mechanism
You and another entity need to establish a private key between the two of you, but only using non-confidential communication. Basically the receiver generates a key pair and transmits the public key to the sender. The sender needs to ensure they got the right public key. The sender, using that public key, generates another key and encrypted text and then sends that back to the receiver over a channel that could either be secure or insecure. You've probably done this a lot in your career in some way or another.
More Lattices
There are two common algorithms that allow us to secure key-encapsulation mechanisms.
Ring Learning with Errors
Learning with Errors
Ring Learning with Errors
So we have three parts to this:
Ring: A set of polynomials whose coefficients are limited to a specific range. If you, like me, forgot what a polynomial is I've got you.
Modulus: The maximum value of the coefficients in the ring (e.g., q = 1024).
Error Term: Random values added during key generation, simulating noise.
How Does It Work?
Key generation:
Choose a large prime number (p)
Generate two random polynomials within the ring: a and s. These will be used to create the public and private keys.
Public Key Creation
Compute the product of a and the secret polynomial s, then add the error term e (b = a·s + e). The error term e represents the "noise" that hides s.
Private Key: Keep s secret. It's used for decryption.
Public Key: Publish a and b. This is how others can send you encrypted messages.
Assuming you have a public key, in order to encrypt stuff you need to do the following:
Generate a random polynomial r
Encrypt the message using a, r, and some additional computation (c = ar + e'). The error term e' represents the "noise" added during encryption.
To decrypt:
Computing the difference between the ciphertext and a multiple of the public key (d = c - as). This eliminates the noise introduced during encryption.
Solving for r: Since we know that c = ar + e', subtracting as from both sides gives us an equation to solve for r.
Extracting the shared secret key: Once you have r, use it as a shared secret key.
What does this look like in Python?
Note: This is not a good example to use for real data. I'm trying to show how it works at a basic level. Never use a rando's Python script to encrypt actual real data.
import numpy as np

def rlsr_keygen(prime):
    # Generate large random numbers within the ring
    poly_degree = 4
    # Create a polynomial for key generation
    s = np.random.randint(0, 2**12, size=poly_degree)
    # Compute product of a and x, adding an error term (noise) during key generation
    A = np.random.randint(0, 2**12, size=(poly_degree, poly_degree))
    e = np.random.randint(-2**11, 2**11, size=poly_degree)
    return s, A

def rlsr_encapsulate(A, message):
    # Generate random polynomial to be used for encryption
    modulus = 2**12
    r = np.random.randint(0, 2**12, size=4)
    # Compute ciphertext with noise
    e_prime = np.random.randint(-modulus//2, modulus//2, size=4)
    c = np.dot(A, r) + e_prime
    return c

def rlsr_decapsulate(s, A, c):
    # Compute difference between ciphertext and a multiple of the public key
    d = np.subtract(c, np.dot(A, s))
    # Solve for r (short vector in the lattice)
    # In practice, this is done using various algorithms like LLL reduction
    return d

def generate_shared_secret_key():
    prime = 2**16 + 1  # Example value
    modulus = 2**12
    s, A = rlsr_keygen(prime)
    # Generate a random message (example value)
    message = np.random.randint(0, 256, size=4)
    c = rlsr_encapsulate(A, message)
    # Compute shared secret key
    d = rlsr_decapsulate(s, A, c)
    return d

shared_secret_key = generate_shared_secret_key()
print(shared_secret_key)
FIPS 204 (Module-Lattice-Based Digital Signature Standard)
A digital signature is a way to verify the authenticity and integrity of electronic documents, messages, or data. This is pretty important for software supply chains and packaging along with a million other things.
How It Works
Key Generation: A random public matrix A is generated within the chosen lattice, along with a secret vector s. This process creates two keys:
Public Key (A): Published for others to use when verifying a digital signature.
Private Key (s): Kept secret by the sender and used to create a digital signature.
Message Hashing: The sender takes their message or document, which is often large in size, and converts it into a fixed-size string of characters called a message digest or hash value using a hash function (e.g., SHA-256). This process ensures that any small change to the original message will result in a completely different hash value.
Digital Signature Creation: The sender combines their private key (s) with the message digest using a mathematical operation defined by the scheme. This produces a unique digital signature for the original message.
Message Transmission: The sender transmits the digitally signed message (message + digital signature) to the recipient.
Digital Signature Verification:
When receiving the digitally signed message, the recipient can verify its authenticity using their public key (A). Here's how:
Check the Signature: The recipient uses the sender's public key (A) and the received digital signature to compute a verification value tied to the message digest. The sender's private key itself is never revealed or recovered.
Message Hashing (Again): The recipient recreates the message digest from the original message, which should match the one used during the digital signature creation process.
Verification: If the values match, it confirms that the original message hasn't been tampered with and was indeed signed by the sender.
Module-Lattice-Based Digital Signature
So a lot of this is the same as the stuff in FIPS 203. I'll provide a Python example for you to see how similar it is.
import numpy as np

def rlsr_keygen(prime):
    # Generate large random numbers within the ring
    modulus = 2**12
    poly_degree = 4
    s = np.random.randint(0, 2**12, size=poly_degree)
    A = np.random.randint(0, 2**12, size=(poly_degree, poly_degree))
    e = np.random.randint(-2**11, 2**11, size=poly_degree)
    return s, A

def rlsr_sign(A, message):
    # Generate random polynomial to be used for signing
    modulus = 2**12
    r = np.random.randint(0, 2**12, size=4)
    # Compute signature with noise
    e_prime = np.random.randint(-modulus//2, modulus//2, size=4)
    c = np.dot(A, r) + e_prime
    return c

def rlsr_verify(s, A, c):
    # Compute difference between ciphertext and a multiple of the public key
    d = np.subtract(c, np.dot(A, s))
    # Check if message can be recovered from signature (in practice, this involves solving for r using LLL reduction)
    return True

def generate_signature():
    prime = 2**16 + 1  # Example value
    modulus = 2**12
    s, A = rlsr_keygen(prime)
    message = np.random.randint(0, 256, size=4)
    c = rlsr_sign(A, message)
    signature_validity = rlsr_verify(s, A, c)
    if signature_validity:
        print("Signature is valid.")
        return True
    else:
        print("Signature is invalid.")
        return False

generate_signature()
Basically the same concept as before but for signatures.
FIPS 205 (Stateless Hash-Based Digital Signature Standard)
The Stateless Hash-Based Digital Signature Algorithm (SLH-DSA) is a family of digital signature schemes that use hash functions and do not require the signer to keep any state between signatures. SLH-DSAs are designed to be highly efficient and secure, making them suitable for various applications.
Basically because they use hashes and are stateless they are more resistant to quantum computers.
Basic Parts
Forest of Random Subsets (FORS): A collection of random subsets generated from a large set.
Hash Functions: Used to compute the hash values for the subsets.
Subset Selection: A mechanism for selecting a subset of subsets based on the message to be signed.
How It Works
Key Generation: Generate multiple random subsets from a large set using a hash function (e.g., SHA-256).
Message Hashing: Compute the hash value of the message to be signed.
Subset Selection: Select a subset of subsets based on the hash value of the message.
Signature Generation: Generate a signature by combining the selected subsets.
The Extended Merkle Signature Scheme (XMSS) is a multi-time signature scheme that uses the Merkle Tree technique to generate digital signatures. Using it basically comes down to the following four steps.
Key Generation: Generate a Merkle Tree using multiple levels of random hash values.
Message Hashing: Compute the hash value of the message to be signed.
Tree Traversal: Traverse the Merkle Tree to select nodes that correspond to the message's hash value.
Signature Generation: Generate a signature by combining the selected nodes.
Can I have a Python example?
Honestly I really tried on this. But there was not a lot on the internet on how to do this. I will give you what I wrote, but it doesn't work and I'm not sure exactly how to fix it.
Python QRL library. This seems like it'll work but I couldn't get the package to install successfully with Python 3.10, 3.11 or 3.12.
Quantcrypt: This worked but honestly the "example" doesn't really show you anything interesting except that it seems to output what you think it should output.
Standard library: I messed part of it up but I'm not sure exactly where I went wrong.
import hashlib
import os

# Utility function to generate a hash of data
def hash_data(data):
    return hashlib.sha256(data).digest()

# Generate a pair of keys (simplified as random bytes for the demo)
def generate_keypair():
    private_key = os.urandom(32)  # Private key (random 32 bytes)
    public_key = hash_data(private_key)  # Public key derived by hashing the private key
    return private_key, public_key

# Create a simplified Merkle tree with n leaves
def create_merkle_tree(leaf_keys):
    # Create parent nodes by hashing pairs of leaf nodes
    current_level = [hash_data(k) for k in leaf_keys]
    while len(current_level) > 1:
        next_level = []
        # Pair nodes and hash them to create the next level
        for i in range(0, len(current_level), 2):
            left_node = current_level[i]
            right_node = current_level[i+1] if i + 1 < len(current_level) else left_node  # Handle odd number of nodes
            parent_node = hash_data(left_node + right_node)
            next_level.append(parent_node)
        current_level = next_level
    return current_level[0]  # Root of the Merkle tree

# Sign a message using a given private key
def sign_message(message, private_key):
    # Hash the message and then "sign" it by using the private key
    message_hash = hash_data(message)
    signature = hash_data(private_key + message_hash)
    return signature

def verify_signature(message, signature, public_key):
    message_hash = hash_data(message)
    # Instead of using public_key, regenerate what the signature would be if valid
    expected_signature = hash_data(public_key + message_hash)  # Modify this logic
    return expected_signature == signature

# Example of using the above functions
# 1. Generate key pairs for leaf nodes in the Merkle tree
tree_height = 4  # This allows for 2^tree_height leaves
num_leaves = 2 ** tree_height
key_pairs = [generate_keypair() for _ in range(num_leaves)]
private_keys, public_keys = zip(*key_pairs)

# 2. Create the Merkle tree from the public keys (leaf nodes)
merkle_root = create_merkle_tree(public_keys)
print(f"Merkle Tree Root: {merkle_root.hex()}")

# 3. Sign a message using one of the private keys (simplified signing)
message = b"Hello, this is a test message for XMSS-like scheme"
leaf_index = 0  # Choose which key to sign with (0 in this case)
private_key = private_keys[leaf_index]
public_key = public_keys[leaf_index]
signature = sign_message(message, private_key)
print(f"Signature: {signature.hex()}")

# 4. Verify the signature
is_valid = verify_signature(message, signature, public_key)
print("Signature is valid!" if is_valid else "Signature is invalid!")
It always says the signature is invalid, and as far as I can tell the core problem is that sign_message hashes with the private key while verify_signature hashes with the public key, and since the public key is itself just a hash of the private key those two values can never match. Honestly I sort of lost enthusiasm for this as we went. Hopefully the code you shouldn't be using at least provides some context.
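If you want something in this family that actually verifies, here is a rough Lamport-style one-time signature sketch. This is a toy, not the real XMSS construction (which uses WOTS+ chains with a Merkle tree on top), but it shows the trick hash-based signatures rely on: the signature reveals secret preimages selected by the bits of the message hash, and the verifier checks them against the published hashes.
import hashlib
import os
def hash_data(data):
    return hashlib.sha256(data).digest()
# Lamport one-time keypair: one pair of random secrets per bit of the message hash
def lamport_keygen(bits=256):
    private_key = [(os.urandom(32), os.urandom(32)) for _ in range(bits)]
    public_key = [(hash_data(a), hash_data(b)) for a, b in private_key]
    return private_key, public_key
# Sign by revealing one secret per bit of the message hash
def lamport_sign(message, private_key):
    digest = hash_data(message)
    bits = ''.join(f'{byte:08b}' for byte in digest)
    return [private_key[i][int(bit)] for i, bit in enumerate(bits)]
# Verify by hashing each revealed secret and comparing it to the public key
def lamport_verify(message, signature, public_key):
    digest = hash_data(message)
    bits = ''.join(f'{byte:08b}' for byte in digest)
    return all(hash_data(secret) == public_key[i][int(bit)]
               for i, (secret, bit) in enumerate(zip(signature, bits)))
private_key, public_key = lamport_keygen()
message = b"Hello, this is a test message"
signature = lamport_sign(message, private_key)
print("Signature is valid!" if lamport_verify(message, signature, public_key) else "Signature is invalid!")
The catch, and the reason the real schemes put a Merkle tree on top, is that each Lamport keypair can only safely sign a single message.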
Is This Something I Should Worry About Now?
That really depends on what data you are dealing with. If I was dealing with tons of super-sensitive data, I would probably start preparing the way now. This isn't a change you are going to want to make quickly, in no small part because you need to account for the performance difference between some of these approaches and more standard encryption. Were I working on something like a medical device or secure communications it would definitely be something I'd at least spike out and try to see what it looked like.
So basically if someone asks you about this, I hope now you can at least talk intelligently about it for 5 minutes until they wander away from boredom. If this is something you actually have to deal with, start with PQClean and work from there.
I've spent a fair amount of time around networking. I've worked for a small ISP, helped to set up campus and office networks and even done a fair amount of work with BGP and assisting with ISP failover and route work. However in my current role I've been doing a lot of mobile network diagnostics and troubleshooting which made me realize I actually don't know anything about how mobile networks operate. So I figured it was a good idea for me to learn more and write up what I find.
It's interesting that cellular internet is, without a doubt, either going to become or has already become the default Internet for most humans alive, yet almost no developers I know have any idea how it works (including myself until recently). As I hope I demonstrate below, an untold amount of amazing work has been applied to this problem over decades and it has produced incredible results. As it turns out the network engineers working with cellular were doing nuclear physics while I was hot-gluing stuff together.
I am not an expert. I will update this as I get better information, but use this as a reference for stuff to look up, not a bible. It is my hope, over many revisions, to turn this into an easier-to-read PDF that folks can download. However I want to get it out in front of people to help find mistakes.
TL/DR: There is a shocking, eye-watering amount of complexity when it comes to cellular data as compared to a home or datacenter network connection. I could spend the next six months of my life reading about this and feel like I barely scratched the surface. However I'm hoping that I have provided some basic-level information about how this magic all works.
Corrections/Requests: https://c.im/@matdevdug. I know I didn't get it all right, I promise I won't be offended.
Basics
A modern cellular network is, at its core, composed of three basic elements:
the RAN (radio access network)
CN (core network)
Services network
RAN
The RAN contains the base stations that communicate with the phones using radio signals. When we think of a cell tower we are thinking of the RAN. When we think of what a cellular network provides in terms of services, a lot of that is actually contained within the CN. That's where things like user authorization, which services are turned on or off for the user, and all the background work for the transfer and hand-off of user traffic live. Think SMS and phone calls for most users today.
Key Components of the RAN:
Base Transceiver Station (BTS): The BTS is a radio transmitter/receiver that communicates with your phone over the air interface.
Node B (or Evolved Node B (eNodeB) for 4G, gNodeB for 5G): In modern cellular networks, Node B refers to a base station that manages multiple cell sites. It aggregates data from these cell sites and forwards it to the RAN controller.
Radio Network Controller (RNC): The RNC is responsible for managing the radio link between your phone and the BTS/Node B.
Base Station Subsystem (BSS): The BSS is a term used in older (GSM) networks, referring to the combination of the BTS and its controller, the BSC.
Cell Search and Network Acquisition. The device powers on and begins searching for available cells by scanning the frequencies of surrounding base stations (e.g., eNodeB for LTE, gNodeB for 5G).
┌──────────────┐ ┌──────────────┐
│ Base Station│ │ Mobile │
│ │ │ Device │
│ Broadcast │ │ │
│ ──────────> │ Search for │ <────────── │
│ │ Sync Signals│ Synchronizes │
│ │ │ │
└──────────────┘ └──────────────┘
- Device listens for synchronization signals.
- Identifies the best base station for connection.
Random Access. After identifying the cell to connect to, the device sends a random access request to establish initial communication with the base station. This is often called RACH. If you want to read about it I found an incredible amount of detail here: https://www.sharetechnote.com/html/RACH_LTE.html
┌──────────────┐ ┌──────────────┐
│ Base Station│ │ Mobile │
│ │ │ Device │
│ Random Access Response │ │
│ <────────── │ ──────────> │ Random Access│
│ │ │ Request │
└──────────────┘ └──────────────┘
- Device sends a Random Access Preamble.
- Base station responds with timing and resource allocation.
Dedicated Radio Connection Setup (RRC Setup). The base station allocates resources for the device to establish a dedicated radio connection using the Radio Resource Control (RRC) protocol.
Device-to-Core Network Communication (Authentication, Security, etc.). Once the RRC connection is established, the device communicates with the core network (e.g., EPC in LTE, 5GC in 5G) for authentication, security setup, and session establishment.
┌──────────────┐ ┌──────────────┐
│ Base Station│ │ Mobile │
│ ──────────> │ Forward │ │
│ │ Authentication Data │
│ │ <────────── │Authentication│
│ │ │ Request │
│ │ │ │
└──────────────┘ └──────────────┘
- Device exchanges authentication and security data with the core network.
- Secure communication is established.
Data Transfer (Downlink and Uplink). After setup, the device starts sending (uplink) and receiving (downlink) data using the established radio connection.
┌──────────────┐ ┌──────────────┐
│ Base Station│ │ Mobile │
│ ──────────> │ Data │ │
│ Downlink │ │ <───────── │
│ <────────── │ Data Uplink │ ──────────> │
│ │ │ │
└──────────────┘ └──────────────┘
- Data is transmitted between the base station and the device.
- Downlink (BS to Device) and Uplink (Device to BS) transmissions.
Handover. If the device moves out of range of the current base station, a handover is initiated to transfer the connection to a new base station without interrupting the service.
Signaling
As shown in the diagram above, there are a lot of references to something called "signaling". Signaling seems to be shorthand for handling a lot of the configuration and hand-off between the tower, the device and the core network. As far as I can tell it can be broken into three types.
Access Stratum Signaling
Set of protocols to manage the radio link between your phone and cellular network.
Handles authentication and encryption
Radio bearer establishment (setting up a dedicated channel for data transfer)
Mobility management (handovers, etc)
Quality of Service control.
Non-Access Stratum (NAS) Signaling
Set of protocols used to manage the interaction between your phone and the cellular network's core infrastructure.
It handles tasks such as authentication, billing, and location services.
Authentication with the Home Location Register (HLR)
Roaming management
Charging and billing
IMSI Attach/Detach procedure
Lower Layer Signaling on the Air Interface
This refers to the control signaling that occurs between your phone and the cellular network's base station at the physical or data link layer.
It ensures reliable communication over the air interface, error detection and correction, and efficient use of resources (e.g., allocating radio bandwidth).
Modulation and demodulation control
Error detection and correction using CRCs (Cyclic Redundancy Checks)
High Level Overview of Signaling
You turn on your phone (AS signaling starts).
Your phone sends an Initial Direct Transfer (IDT) message to establish a radio connection with the base station (lower layer signaling takes over).
The base station authenticates your phone using NAS signaling, contacting the HLR for authentication.
Once authenticated, lower layer signaling continues to manage data transfer between your phone and the base station.
What is HLR?
The Home Location Register contains the subscriber data for a network: the IMSI, phone number and service information. It is also what keeps track of where in the world the user physically is.
Duplexing
You have a lot of devices and you have a few towers. You need to do many uplinks and downlinks to many devices.
It is important that with any cellular communications system you can send and receive in both directions at the same time. This enables conversations to be made, with either end being able to talk and listen as required. In order to be able to transmit in both directions, a device (UE) and base station must have a duplex scheme. There are a lot of them, including Frequency Division Duplex (FDD), Time Division Duplex (TDD), Semi-static TDD and Dynamic TDD.
Duplexing Types:
Frequency Division Duplex (FDD): Uses separate frequency bands for downlink and uplink signals.
Downlink: The mobile device receives data from the base station on a specific frequency (F1).
Uplink: The mobile device sends data to the base station on a different frequency (F2).
Key Principle: Separate frequencies for uplink and downlink enable simultaneous transmission and reception.
┌──────────────┐ ┌──────────────┐
│ Base Station│ │ Mobile │
│ │ │ Device │
│ ──────────> │ F1 (Downlink)│ <────────── │
│ │ │ │
│ <────────── │ F2 (Uplink) │ ──────────> │
└──────────────┘ └──────────────┘
Separate frequency bands (F1 and F2)
Time Division Duplex (TDD): Alternates between downlink and uplink signals over the same frequency band.
Downlink: The base station sends data to the mobile device in a time slot.
Uplink: The mobile device sends data to the base station in a different time slot using the same frequency.
Key Principle: The same frequency is used for both uplink and downlink, but at different times.
┌──────────────┐ ┌──────────────┐
│ Base Station│ │ Mobile Phone│
│ (eNodeB/gNB) │ │ │
└──────────────┘ └──────────────┘
───────────► Time Slot 1 (Downlink)
(Base station sends data)
◄─────────── Time Slot 2 (Uplink)
(Mobile sends data)
───────────► Time Slot 3 (Downlink)
(Base station sends data)
◄─────────── Time Slot 4 (Uplink)
(Mobile sends data)
- The same frequency is used for both directions.
- Communication alternates between downlink and uplink in predefined time slots.
Semi-static Time Division Duplex (Semi-static TDD): Uses predetermined time slots for uplink and downlink, but the split can be changed periodically (e.g., every few minutes or hours).
Key Principle: Time slots are allocated statically for longer durations but can be switched based on network traffic patterns (e.g., heavier downlink traffic during peak hours).
Frame design: a frame typically lasts 10 ms and is divided into time slots for downlink (DL) and uplink (UL).
"Guard" time slots are used to allow switching between transmission and reception.
Dynamic Time Division Duplex (Dynamic TDD):
Downlink/Uplink: Time slots for uplink and downlink are dynamically adjusted in real time based on instantaneous traffic demands.
Key Principle: Uplink and downlink time slots are flexible and can vary dynamically to optimize the usage of the available spectrum in real-time, depending on the traffic load.
See second diagram for what "guard periods" are. Basically windows to ensure there are gaps and the signal doesn't overlap.
┌──────────────┐ ┌──────────────┐
│ Base Station│ │ Mobile Phone│
│ (eNodeB/gNB) │ │ │
└──────────────┘ └──────────────┘
───────────► Time Slot 1 (Downlink)
───────────► Time Slot 2 (Downlink)
───────────► Time Slot 3 (Downlink)
◄─────────── Time Slot 4 (Uplink)
───────────► Time Slot 5 (Downlink)
◄─────────── Time Slot 6 (Uplink)
- More slots for downlink in scenarios with high download traffic (e.g., streaming video).
- Dynamic slot assignment can change depending on the real-time demand.
┌──────────────┐ ┌──────────────┐
│ Base Station│ │ Mobile Phone│
│ (eNodeB/gNB) │ │ │
└──────────────┘ └──────────────┘
───────────► Time Slot 1 (Downlink)
───────────► Time Slot 2 (Downlink)
[Guard Period] (Switch from downlink to uplink)
◄─────────── Time Slot 3 (Uplink)
[Guard Period] (Switch from uplink to downlink)
───────────► Time Slot 4 (Downlink)
- Guard periods allow safe switching from one direction to another.
- Guard periods prevent signals from overlapping and causing interference.
Core
So I've written a lot about what the RAN does. But we haven't really touched on what the core network does. Basically once the device registers with the base station using the random access procedure discussed above, the core network takes over and does a bunch of the stuff that we typically associate with "having a cellular plan".
For modern devices when we say authentication we mean "mutual authentication", which means the device authenticates the network and the network authenticates the device. This typically works with a subscriber-specific secret key: the network sends a random challenge, the device uses the key and the challenge to generate a response, and the network checks that response against the one it expects. The network also sends an authentication token, and the device compares this token with the expected token to authenticate the network. It looks like the following:
┌───────────────────────┐
│ Encryption &          │
│ Integrity Algorithms  │
├───────────────────────┤
│ - AES (Encryption)    │
│ - SNOW 3G (Encryption)│
│ - ZUC (Encryption)    │
│ - SHA-256 (Integrity) │
└───────────────────────┘
- AES: Strong encryption algorithm commonly used in LTE/5G.
- SNOW 3G: Stream cipher used for encryption in mobile communications.
- ZUC: Encryption algorithm used in 5G.
- SHA-256: Integrity algorithm ensuring data integrity.
The steps of the core network are as follows:
Registration (also called attach procedure): The device connects to the core network (e.g., EPC in LTE or 5GC in 5G) to register and declare its presence. This involves the device identifying itself and the network confirming its identity.
Mutual Authentication: The network and device authenticate each other to ensure a secure connection. The device verifies the network’s authenticity, and the network confirms the device’s identity.
Security Activation: After successful authentication, the network and the device establish a secure channel using encryption and integrity protection to ensure data confidentiality and integrity.
Session Setup and IP Address Allocation: The device establishes a data session with the core network, which includes setting up bearers (logical paths for data) and assigning an IP address to enable internet connectivity.
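As a toy sketch of the mutual authentication step (with HMAC-SHA256 standing in for the real 3GPP AKA functions, so this shows the flow rather than the actual algorithms or message formats):
import hashlib
import hmac
import os
def f(key, data):
    # Stand-in for the operator-defined authentication functions
    return hmac.new(key, data, hashlib.sha256).digest()
subscriber_key = os.urandom(16)  # K, shared between the SIM and the core network
# Network side: generate a challenge, the expected response, and a network token
rand = os.urandom(16)
expected_response = f(subscriber_key, rand + b"response")
network_token = f(subscriber_key, rand + b"network")
# Device side: authenticate the network first, then answer the challenge
device_token = f(subscriber_key, rand + b"network")
assert hmac.compare_digest(device_token, network_token), "network failed authentication"
response = f(subscriber_key, rand + b"response")
# Network side: check the device's response
print("Device authenticated" if hmac.compare_digest(response, expected_response) else "Device rejected")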
How Data Gets To Phone
Alright we've talked about how the phone finds a tower to talk to, how the tower knows who the phone is and all the millions of steps involved in getting the mobile phone an actual honest-to-god IP address. How is data actually getting to the phone itself?
Configuration for Downlink Measurement: Before downlink data transmission can occur, the mobile device (UE) must be configured to perform downlink measurements. This helps the network optimize transmission based on the channel conditions. Configuration messages are sent from the base station (eNodeB in LTE or gNB in 5G) to instruct the UE to measure certain DL reference signals.
Reference Signal (Downlink Measurements): The mobile device receives reference signals from the network. These reference signals are used by the UE to estimate DL channel conditions. In LTE, Cell-specific Reference Signals (CRS) are used, and in 5G, Channel State Information-Reference Signals (CSI-RS) are used.
DL Channel Conditions (CQI, PMI, RI): The mobile device processes the reference signals to assess the downlink channel conditions and generates reports such as CQI (Channel Quality Indicator), PMI (Precoding Matrix Indicator), and RI (Rank Indicator). These reports are sent back to the base station.
DL Resource Allocation and Packet Transmission: Based on the UE’s channel reports (CQI, PMI, RI), the base station allocates appropriate downlink resources. It determines the modulation scheme, coding rate, MIMO layers, and frequency resources (PRBs) and sends a DL scheduling grant to the UE. The data packets are then transmitted over the downlink.
Positive/Negative Acknowledgement (HARQ Feedback): After the UE receives the downlink data, it checks the integrity of the packets using CRC (Cyclic Redundancy Check). If the CRC passes, the UE sends a positive acknowledgement (ACK) back to the network. If the CRC fails, a negative acknowledgement (NACK) is sent, indicating that retransmission is needed.
New Transmission or Retransmission (HARQ Process): If the network receives a NACK, it retransmits the packet using the HARQ process. The retransmission is often incremental (IR-HARQ), meaning the device combines the new transmission with previously received data to improve decoding.
Uplink is a little different but is basically the device asking for a timeslot to upload, getting a grant, sending the data up and then getting an ack that it is sent.
Gs
So as everyone knows cellular networks have gone through a series of revisions over the years around the world. I'm going to talk about them and just try to walk through how they are different and what they mean.
1G
Starts in Japan, moves to Europe and then the US and UK.
Speeds up to 2.4kbps, operating in frequency bands around 150 MHz.
Didn't work between countries, had low capacity, unreliable handoff and no security. Basically any receiver can listen to a conversation.
2G
Launched in 1991 in Finland
Allows for text messages, picture messages and MMS.
Speeds up to 14.4kbps between 900MHz and 1800MHz bands
Actual security between sender and receiver with messages digitally encrypted.
Wait, are text messages encrypted?
So this was completely new to me but I guess my old Nokia brick had some encryption on it. Here's how that process worked:
Mobile device stores a secret key in the SIM card and the network generates a random challenge and sends it to the mobile device.
The A3 algorithm is used to compute a Signed Response (SRES) using the secret key and the random value.
Then the A8 algorithm is used with the secret key and the random value to generate a session encryption key Kc (a 64-bit key). This key will be used for encrypting data, including SMS.
After the authentication process and key generation, encryption of SMS messages begins. GSM uses a stream cipher to encrypt both voice and data traffic, including text messages. The encryption algorithm used for SMS is either A5/1 or A5/2, depending on the region and network configuration.
A5/1: A stronger encryption algorithm used in Europe and other regions.
A5/2: A weaker variant used in some regions, but deprecated due to its vulnerabilities.
The A5 algorithm generates a keystream that is XORed with the plaintext message (SMS) to produce the ciphertext, ensuring the confidentiality of the message.
So basically text messages from the phone to the base station were encrypted and then exposed there. However I honestly didn't even know that was happening.
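If it helps to see the shape of that flow, here is a toy version with SHA-256 standing in for the real A3/A8/A5 algorithms (which are entirely different ciphers, so treat this as an illustration of the steps, not the actual crypto):
import hashlib
import os
def keyed_hash(key, data):
    return hashlib.sha256(key + data).digest()
ki = os.urandom(16)    # Secret key stored on the SIM
rand = os.urandom(16)  # Random challenge sent by the network
# "A3": signed response proving the SIM knows Ki, sent back to the network
sres = keyed_hash(ki, rand + b"A3")[:4]
# "A8": derive the session key Kc from Ki and the challenge
kc = keyed_hash(ki, rand + b"A8")[:8]
# "A5": generate a keystream from Kc and XOR it with the SMS
def keystream(key, length):
    out = b""
    counter = 0
    while len(out) < length:
        out += keyed_hash(key, counter.to_bytes(4, "big"))
        counter += 1
    return out[:length]
sms = b"meet me at the arcade at 7"
ciphertext = bytes(a ^ b for a, b in zip(sms, keystream(kc, len(sms))))
plaintext = bytes(a ^ b for a, b in zip(ciphertext, keystream(kc, len(ciphertext))))
print(sres.hex(), ciphertext.hex(), plaintext)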
TDMA and CDMA
I remember a lot of conversations about GSM vs CDMA when people talked about cellular networks, but at the time all I really knew was "GSM is European and CDMA is US".
GSM is based on TDMA, which divides each frequency channel into time slots shared between users
CDMA allocates each user a special code to communicate over multiple physical channels
GSM is where we see services like voice mail, SMS, call waiting
EDGE
So everyone who is old like me remembers EDGE on cellphones, including the original iPhone I waited in line for. EDGE was effectively a retrofit you could put on top of an existing GSM network, keeping the cost of adding it low. You got speeds of 9.6-200kbps.
3G
Welcome to the year 2000
Frequency spectrum of 3G transmissions is 1900-2025MHz and 2110-2200MHz.
UMTS takes over for GSM and CDMA2000 takes over from CDMA.
Maxes out around 8-10Mbps
IMT-2000 = 3G
So let's just recap quickly how we got here.
2G (GSM): Initially focused on voice communication and slow data services (up to 9.6 kbps using Circuit Switched Data).
2.5G (GPRS): Introduced packet-switched data with rates of 40-50 kbps. It allowed more efficient use of radio resources for data services.
2.75G (EDGE): Enhanced the data rate by improving modulation techniques (8PSK). This increased data rates to around 384 kbps, making it more suitable for early mobile internet usage.
EDGE introduced 8-PSK (8-Phase Shift Keying) modulation, which allowed the encoding of 3 bits per symbol (as opposed to 1 bit per symbol with the original GSM’s GMSK (Gaussian Minimum Shift Keying) modulation). This increased spectral efficiency and data throughput.
EDGE had really high latency so it wasn't really usable for things like video streaming or online gaming.
3G (WCDMA): Max data rate: 2 Mbps (with improvements over EDGE in practice). Introduced spread-spectrum (CDMA) technology with QPSK modulation.
3.5G (HSDPA): Enhanced WCDMA by introducing adaptive modulation (AMC), HARQ, and NodeB-based scheduling. Max data rate: 14.4 Mbps (downlink).
So when we say 3G we actually mean a pretty wide range of technologies all underneath the same umbrella.
4G
4G, or LTE as it is usually called, evolved from WCDMA. Instead of developing all-new radio interfaces and technology, existing and newly developed wireless systems like GPRS, EDGE, Bluetooth, WLAN and HiperLAN were integrated together.
4G has a download speed of 67.65Mbps and upload speed of 29.37Mbps
4G operates at frequency bands of 2500-2570MHz for uplink and 2620-2690MHz for downlink with channel bandwidth of 1.25-20MHz
4G has a few key technologies, mainly OFDM, SDR and Multiple-Input Multiple-Output (MIMO).
OFDM (Orthogonal Frequency Division Multiplexing)
Allows for more efficient use of the available bandwidth by breaking down data into smaller pieces and sending them simultaneously
Since each channel uses a different frequency, if one channel experiences interference or errors, the others remain unaffected.
OFDM can adapt to changing network conditions by dynamically adjusting the power levels and frequencies used for each channel.
SDR (Software Defined Radio)
Like it sounds, it is a technology that enables flexible and efficient implementation of wireless communication systems by using software algorithms to control and process radio signals in real-time. In cellular 4G, SDR is used to improve performance, reduce costs, and enable advanced features like multi-band support and spectrum flexibility.
MIMO (multiple-input multiple-output)
A technology used in cellular 4G to improve the performance and capacity of wireless networks. It allows for the simultaneous transmission and reception of multiple data streams over the same frequency band, using multiple antennas at both the base station and mobile device.
Works by having both the base station and the mobile device equipped with multiple antennas
Each antenna transmits and receives a separate data stream, allowing for multiple streams to be transmitted over the same frequency band
There is Spatial Multiplexing, where multiple data streams are transmitted over the same frequency band using different antennas. Then Beamforming, where advanced signal processing techniques are used to direct the transmitted beams towards specific users, improving signal quality and reducing interference. Finally Massive MIMO, where you use a lot of antennas (64 or more) to improve capacity and performance.
5G
The International Telecommunication Union (ITU) defines 5G as a wireless communication system that supports speeds of at least 20 Gbps (gigabits per second), with ultra-low latency of less than 1 ms (millisecond).
5G operates on a much broader range of frequency bands than 4G
Low-band frequencies: These frequencies are typically below 3 GHz and are used for coverage in rural areas or indoor environments. Examples include the 600 MHz, 700 MHz, and 850 MHz bands.
Mid-band frequencies: These frequencies range from approximately 3-10 GHz and are used for both coverage and capacity in urban areas. Examples include the 3.5 GHz, 4.5 GHz, and 6 GHz bands.
High-band frequencies: These frequencies range from approximately 10-90 GHz and are used primarily for high-speed data transfer in dense urban environments. Examples include the 28 GHz, 39 GHz, and 73 GHz bands.
5G network designs are a step up in complexity from their 4G predecessors, with a separate control plane and user plane, each handled by its own network functions, whereas 4G networks largely kept the two together.
5G uses advanced modulation schemes such as 256-Quadrature Amplitude Modulation (QAM) to achieve higher data transfer rates than 4G, which typically uses 64-QAM or 16-QAM
All the MIMO stuff discussed above.
What the hell is Quadrature Amplitude Modulation?
I know, it sounds like a Star Trek thing. It is a way to send digital information over a communication channel, like a wireless network or cable. It's a method of "modulating" the signal, which means changing its characteristics in a way that allows us to transmit data.
When we say 256-QAM, it refers to the specific type of modulation being used. Here's what it means:
Quadrature: This refers to the fact that the signal is being modulated using two different dimensions (or "quadratures"). Think of it like a coordinate system with x and y axes.
Amplitude Modulation (AM): This is the way we change the signal's characteristics. In this case, we're changing the amplitude (magnitude) of the signal to represent digital information.
256: This refers to the number of possible states or levels that the signal can take on. Think of it like a binary alphabet with 2^8 = 256 possible combinations.
Why does 5G want this?
More information per symbol: With 256-QAM, each "symbol" (or signal change) can represent one of 256 different values. This means we can pack more data into the same amount of time.
Faster transmission speeds: As a result, we can transmit data at higher speeds without compromising quality.
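The payoff is easy to put a number on: bits per symbol is just log2 of the number of constellation points, so moving from 16-QAM to 256-QAM doubles what each symbol carries (assuming the signal is clean enough to tell 256 states apart, which is why the fancier modulations only get used when conditions are good).
import math
for points in (16, 64, 256, 1024):
    print(f"{points}-QAM carries {int(math.log2(points))} bits per symbol")
# 16-QAM carries 4 bits per symbol
# 64-QAM carries 6 bits per symbol
# 256-QAM carries 8 bits per symbol
# 1024-QAM carries 10 bits per symbol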
Kubernetes and 5G
Kubernetes is a popular technology in 5G and is used for a number of functions, including the following:
Virtual Network Functions (VNFs): VNFs are software-based implementations of traditional network functions, such as firewalls or packet filters. Kubernetes is used to deploy and manage these VNFs.
Cloud-Native Network Functions (CNFs): CNFs are cloud-native applications that provide network function capabilities, such as traffic management or security filtering. Kubernetes is used to deploy and manage these CNFs.
Network Function Virtualization (NFV) Infrastructure: NFV infrastructure provides the underlying hardware and software resources for running VNFs and CNFs. Kubernetes is used to orchestrate and manage this infrastructure.
Conclusion
So one of the common sources of frustration for developers I've worked with when debugging cellular network problems is that often while there is plenty of bandwidth for what they are trying to do, the latency involved can be quite variable. If you look at all the complexity behind the scenes and then factor in that the network radio on the actual cellular device is constantly flipping between an Active and Idle state in an attempt to save battery life, this suddenly makes sense.
Because all of the complexity I'm talking about ultimately gets you back to the same TCP stack we've been using for years with all the overhead involved in that back and forth. We're still ending up with a SYN -> SYN-ACK. There are tools you can use to shorten this process somewhat (TCP Fast Open) and changing the initial congestion window but still you are mostly dealing with the same level of overhead you always dealt with.
Ultimately there isn't much you can do with this information, as developers have almost no control over the elements present here. However I think it's useful as cellular networks continue to become the dominant default Internet for the Earth's population that more folks understand the pieces happening in the background of this stack.
I've complained a lot about the gaps in offerings for login security in the past. The basic problem is this domain of security serves a lot of masters. To get the widest level of buy-in from experts, the solution has to scale from normal logins to national security. This creates a frustrating experience for users because it is often overkill for the level of security they need. Basically is it reasonable that you need Google Authenticator to access your gym website? In terms of communication, the solutions we hear about the most, i.e. with the most marketing, allow for the insertion of SaaS services into the chain so that an operation that was previously free now pays a monthly fee based on usage.
This creates a lopsided set of incentives where only the most technologically complex and extremely secure solutions are endorsed and when teams are (understandably) overwhelmed by their requirements a SaaS attempts to get inserted into a critical junction of their product.
The tech community has mostly agreed that usernames and passwords chosen by the user are not sufficient for even basic security. What we haven't done is precisely explain what it is that we want normal, average, non-genius developers to do about that. We've settled on this really weird place with the following rules:
Email accounts are always secure but SMS is never secure. You can always email a magic link and that's fine for some reason.
You should have TOTP but we've settled on very short time windows because I guess we decided NTP was a solved problem. There's no actual requirement the code changes every 30 seconds, we're just pretending that we're all spies and someone is watching your phone. Also consumers should be given recovery codes, which are basically just passwords you generate and give to them and only allow to be used once. It is unclear why generating a one-time password for the user is bad but if we call the password a "recovery code" it is suddenly sufficient.
TOTP serves two purposes. One is it ensures there is one randomly generated secret associated with the account that we don't hash (we actually can't, because the server needs the raw secret to compute the same codes the client does), so it's actually kind of a dangerous password that we need to encrypt and can't rotate. The other is we tacked on this stupid idea that it is multi-device, even though there's zero requirement that the code lives on another device. Just someone decided that because there is a QR code it is now multi-device because phones scan QR codes.
At some point we decided to add a second device requirement, but those devices live in entirely different ecosystems. Even if you have an iPhone and a work MacBook, they shouldn't be using the same Apple ID, so I'm not really clear how things would ever line up. It seems like most people sync things like TOTP with their personal Google accounts across different work devices over time. I can't imagine that was the intended functionality.
Passkeys are great but also their range of behavior is bizarre and unpredictable so if you implement them you will be expected to effectively build every other possible recovery flow into this system. Even highly technical users cannot be relied upon to know whether they will lose their passkey when they do something.
Offloading the task to a large corporation is good, but you cannot pick one big corporation. You must have a relationship with Apple and Facebook and Microsoft and Google and Discord and anyone else who happens to be wandering around when you build this. Their logins are secured with magic and unbreakable, but if they are bypassed you can go fuck yourself because that is your problem, not theirs.
All of this is sort of a way to talk around the basic problem. I need a username and a password for every user on my platform. That password needs to be randomly generated and never stored as plain text in my database. If I had a way to know that the browser generated and stored the password, this basic level of security is met. As far as I can tell, there's no way for me to know that for sure. I can guess based on the length of the password and how quickly it was entered into a form field.
Keep in mind all I am trying to do is build a simple login route on an application that is portable, somewhat future proof and doesn't require a ton of personal data from the user to resolve common human error problems. Ideally I'd like to be able to hand this to someone else, they generate a new secret and they too can enroll as many users as they want. This is a simple thing to build so it should be simple to solve the login story as well.
Making a simple CMS
The site you are reading this on is hosted on Ghost, a CMS that is written in Node. It supports a lot of very exciting features I don't use and comes with a lot of baggage I don't need. Effectively all I actually use it for is:
RSS
Writing posts in its editor
Fixing typos in the posts I publish (sometimes, my writing is not good)
Letting me write a million drafts for everything I publish
Minimizing the amount of JS I'm inflicting on people and trying whenever possible to stick to just HTML and CSS
Ghost supports a million things on top of the things I have listed and it also comes with some strange requirements like running MySQL. I don't really need a lot of that stuff and running a full MySQL for a CMS that doesn't have any sort of multi-instance scaling functionality seems odd. I also don't want to stick something this complicated on the internet for people to use for long periods of time without regular maintenance.
Before you say it I don't care for static site generators. I find it's easier for me to have a tab open, write for ten minutes, then go back to what I was doing before.
My goal with this is just to make a normal friendly baby CMS that I could share with a group of people, less technical people, so they could write stuff when they felt like it. We're not trading nuclear secrets here. The requirements are:
Needs to be open to the public internet with no special device enrollment or network segmentation
Not administered by me. Whatever normal problems arise have to be solvable by a non-technical person.
Making the CMS
So in a day when I was doing other stuff I put this together: https://gitlab.com/matdevdug/ezblog. It's nothing amazing, just sort of a basic template I can build on top of later. Uses sqlite and it does the things you would expect it to do. I can:
Write posts in Quill
Save the posts as drafts or as published posts
Edit the posts after I publish them
Have a valid RSS feed of the posts
The whole frontend is just HTML/CSS so it'll load fast and be easy to cache
Then there is the whole workflow of draft to published.
For one day's work this seems to be roughly where I hoped to be. Now we get to the crux of the matter. How do I log in?
What you built is bad and I hate it
The point is I should be able to solve this problem quickly and easily for a hobby website, not that you personally like what I made. The examples are not fully-fleshed out examples, just templates to demonstrate the problem. Also I'm allowed to make stuff that serves no other function than it amuses me.
Password Login
The default for most sites (including Ghost) is just a username and password. The reason for this: it's easy, works on everything and it's pretty simple to work out a fallback flow for users. Everyone understands it, there's no concerns around data ownership or platform lock-in.
I've got a csrf_token in there and the rest is pretty straightforward. Server-side is also pretty easy.
@bp.route('/login', methods=('GET', 'POST'))
@limiter.limit("5 per minute")
def login():
if request.method == 'POST':
username = request.form['username']
password = request.form['password']
db = get_db()
error = None
user = db.execute(
'SELECT * FROM user WHERE username = ?', (username,)
).fetchone()
if user is None:
error = 'Incorrect username.'
elif not check_password_hash(user['password'], password):
error = 'Incorrect password.'
if error is None:
session.clear()
session['user_id'] = user['id']
return redirect(url_for('index'))
flash(error)
return render_template('auth/login.html')
I'm not storing the raw password, just the hash. It requires almost no work to do. It works exactly the way I think it should. Great, fine.
Why are passwords insufficient?
This has been talked to death but let's recap for the sake of me being able to say I did it and you can just kinda scroll quickly through this part.
Users reuse usernames and passwords, so even though I might not know the raw text of the password another website might be (somehow) even lazier than me and their database gets leaked and then oh no I'm hacked.
The password might be a bad password and it's just one people try and oh no they are in the system.
I have to build in a password reset flow because humans are bad at remembering things and that's just how it is.
Password Reset Flow
Everyone has seen this, but let's talk about what I would need to modify about this small application to allow more than one person to use it.
I would need to add a route that handles allowing the user to reset their password by requesting it through their email
To know where to send that email, I would need to receive and store the email address for every user
I would also need to verify the users email to ensure it worked
All of this hinges on having a token I could send to that user that I could generate with something like the following:
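In Flask-land the easy route is itsdangerous, which Flask already pulls in, so something roughly like this (the helper names here are made up, the important part is the salt):
from itsdangerous import URLSafeTimedSerializer, BadSignature, SignatureExpired
def generate_reset_token(user, secret_key):
    # Salt the serializer with the current password hash so the token
    # stops working the moment the password changes
    s = URLSafeTimedSerializer(secret_key, salt=user['password'])
    return s.dumps(user['id'])
def verify_reset_token(token, user, secret_key, max_age=3600):
    s = URLSafeTimedSerializer(secret_key, salt=user['password'])
    try:
        return s.loads(token, max_age=max_age) == user['id']
    except (BadSignature, SignatureExpired):
        return False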
Since I'm salting it with the hash of the current password which will change when they change the password, the token can only be used once. Makes sense.
Why is this bad?
For a ton of reasons.
I don't want to know an email address if I don't need it. There's no reason to store more personal information about a user that makes the database more valuable if someone were to steal it.
Email addresses change. You need to write another route which handles that process, which isn't hard but then you need to decide whether you need to confirm that the user has access to address 1 with another magic URL or if it is sufficient to say they are currently logged in.
Finally it sort of punts the problem to email and says "well I assume and hope your email is secure even if statistically you probably use the same password for both".
How do you fix this?
The problem can be boiled down to 2 basic parts:
I don't want the user to tell me a username, I want a randomly generated username so it further reduces the value of information stored in my database. It also makes it harder to do a random drive-by login attempt.
I don't want to own the password management story. Ideally I want the browser to do this on its side.
In a perfect world I want a response that says "yes we have stored these credentials somewhere under this users control" and I can wash my hands of that until we get into the situation where somehow they've lost access to the sync account (which should hopefully be rare enough that we can just do that in the database).
The annoying thing is this technology already exists.
The Credential Manager API does the things I am talking about. Effectively I would need to add some Javascript to my Registration page:
<script>
document.getElementById('register-form').addEventListener('submit', function(event) {
event.preventDefault(); // Prevent form submission
const username = document.getElementById('username').value;
const password = document.getElementById('password').value;
// Save credentials using Credential Management API
if ('credentials' in navigator) {
const cred = new PasswordCredential({
id: username,
password: password
});
// Store credentials in the browser's password manager
navigator.credentials.store(cred).then(() => {
console.log('Credentials stored successfully');
// Proceed with registration, for example, send credentials to your server
registerUser(username, password);
}).catch(error => {
console.error('Error storing credentials:', error);
});
}
});
function registerUser(username, password) {
// Simulate server registration request
fetch('/register', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ username: username, password: password })
}).then(response => {
if (response.ok) {
console.log('User registered successfully');
// Redirect or show success message
} else {
console.error('Registration failed');
}
});
}
</script>
Then on my login page something like this:
function attemptAutoLogin() {
if ('credentials' in navigator) {
navigator.credentials.get({password: true}).then(cred => {
if (cred) {
// Send the credentials to the server to log in the user
fetch('/login', {
method: 'POST',
body: new URLSearchParams({
'username': cred.id,
'password': cred.password
})
}).then(response => {
// Handle login success or failure
if (response.ok) {
console.log('User logged in');
} else {
console.error('Login failed');
}
});
}
}).catch(error => {
console.error('Error retrieving credentials:', error);
});
}
}
// Call the function when the page loads
document.addEventListener('DOMContentLoaded', attemptAutoLogin);
So great, I assign a random cred.id and cred.password, stick it in the browser and then I sorta wash my hands of it.
We know the password is stored somewhere and can be synced for free
We know the user can pull the password out and put it somewhere else if they want to switch platforms
Browsers handle password migrations for users
The problem with this approach is I don't know if I'm supposed to use it.
I have no idea what this means. Could this go away? In testing it does seem like the performance is all over the place. Firefox seems to have some issues with this, whereas Chrome seems to always nail it. iOS Safari also seems to have some problems. So this isn't seemingly reliable enough to use.
Just please just make this a thing that works everywhere.
Before you yell at me about Math.random I think the following would make a good password:
function generatePassword(length) {
const charset = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
let password = "";
for (let i = 0; i < length; i++) {
const randomIndex = Math.floor(Math.random() * charset.length);
password += charset.charAt(randomIndex);
}
return password;
}
const password = generatePassword(32);
console.log(password);
Alright so I can't get away with just a password, so I have to assume the password is bunk and use it as one element of login. Then I have to use either TOTP or HOTP.
From a user perspective TOTP works as follows:
Set up 2FA for your online account.
Get a QR code.
You scan this QR code with an authenticator app of your choice
Your app will immediately start generating these six-digit tokens.
The website asks you to provide one of these six-digit tokens.
Practically this is pretty straightforward. I add a few extra libraries:
import io
import pyotp
import qrcode
from flask import send_file
I have to generate a secret with totp_secret = pyotp.random_base32(), which I then have to store in the database. Then I have to generate a QR code to show the user so they can set up the time-based codes.
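Roughly, the enrollment and verification look like this (a sketch using pyotp and qrcode, assuming the usual Flask imports from the login route, with user_data standing in for however you persist things):
@app.route('/setup_2fa')
def setup_2fa():
    # Generate and persist a per-user secret (this is the thing you have to protect)
    user_data['totp_secret'] = pyotp.random_base32()
    uri = pyotp.TOTP(user_data['totp_secret']).provisioning_uri(
        name='user@ezblog', issuer_name='ezblog')
    # Render the provisioning URI as a QR code the authenticator app can scan
    buf = io.BytesIO()
    qrcode.make(uri).save(buf)
    buf.seek(0)
    return send_file(buf, mimetype='image/png')
@app.route('/verify_totp', methods=['POST'])
def verify_totp():
    totp = pyotp.TOTP(user_data['totp_secret'])
    # valid_window=1 also accepts the previous/next 30-second code to forgive clock drift
    if totp.verify(request.form['otp'], valid_window=1):
        return redirect(url_for('index'))
    flash('Invalid code')
    return render_template('verify_2fa.html')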
However the more you look into this, the more complicated it gets.
You actually don't need the token to be 6 digits. It can be up to 10. I don't know why I'd want more or less. Presumably more is better.
The token can be valid for longer than 30 seconds. From reading it seems like that makes the code less reliant on perfect time sync between client and server (great) but also increases the probability of someone stealing the TOTP and using it. That doesn't seem like a super likely attack vector here so I'll make it way longer. But then why don't more services use longer tokens if the only concern then is if someone sees my code? Is this just people being unspeakably annoying?
I need to add some recovery step in case you lose access to the TOTP code.
How do you recover from a TOTP failure?
Effectively I'm back to my original problem. I can either:
Go back to the email workflow I don't want because again I don't want to rely on email as some sort of super-secure bastion and I really don't want to store email addresses.
Or I generate a recovery code and give you those codes which let you bypass the TOTP requirement. That at least lets me be like "this is no longer my fault". I like that.
How do I make a recovery code?
Honest to god I have no idea. As far as I can tell a "recovery code" is just a randomly generated value I hash and stick in the database and then when the user enters it on a form, check the hash. It's just another password. I don't know why all the recovery codes I see are numbers, since it seems to have no relationship to that and would likely work with any string.
Effectively all I need to do with the recovery code is ensure it gets burned once used. Which is fine, but now I'm confused. So I'm generating passwords for the user and then I give the passwords back to the user and tell them to store them somewhere? Why don't I just give them the one good password for the initial login and call it a day? Why is one forbidden and the other is mandatory?
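For what it's worth, the whole recovery code mechanism is only a few lines (a sketch using the same werkzeug hashing the login route already uses, with made-up function names):
import secrets
from werkzeug.security import generate_password_hash, check_password_hash
def generate_recovery_codes(count=8):
    # Plaintext codes go to the user exactly once; only the hashes get stored
    codes = [secrets.token_hex(5) for _ in range(count)]
    hashes = [generate_password_hash(code) for code in codes]
    return codes, hashes
def redeem_recovery_code(submitted, stored_hashes):
    for h in list(stored_hashes):
        if check_password_hash(h, submitted):
            stored_hashes.remove(h)  # Burn the code so it can't be reused
            return True
    return False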
Does HOTP help?
I'm really still not clear how HOTP works. Like I understand the basics:
@app.route('/verify_2fa', methods=['GET', 'POST'])
def verify_2fa():
if request.method == 'POST':
hotp = pyotp.HOTP(user_data['hotp_secret'])
otp = request.form['otp']
if hotp.verify(otp, user_data['counter']):
user_data['counter'] += 1 # Increment the counter after successful verification
return redirect(url_for('index'))
flash('Invalid OTP')
return render_template('verify_2fa.html')
There is a secret per-user and a counter and then I increment the counter every single time the user logs in. As far as I can tell there isn't a forcing mechanism which keeps the client and the server in-sync, so basically you tap a button and generate a password and then if you accidentally tap the button again the two counters are off. It seems like then the server has to decide "are you a reasonable number of times off or an unreasonable amount of counts off". With the PyOTP library I don't see a way for me to control that:
verify(otp: str, counter: int) → bool
Verifies the OTP passed in against the current counter OTP.
Parameters:
otp – the OTP to check against
counter – the OTP HMAC counter
So I mean I could test it against a certain range of counters from the counter I know and then accept it if it falls within that window, but you still are either running a different application or an app on your phone to enter this code. I'm not sure exactly why I would ever use this over TOTP, but it definitely doesn't seem easier to recover from.
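A sketch of what that window check could look like (the function name and the ten-count window are made up for illustration):
def verify_hotp_with_window(secret, otp, stored_counter, look_ahead=10):
    hotp = pyotp.HOTP(secret)
    # Try the stored counter plus the next few values, in case the user
    # hit the button on their generator a few extra times
    for counter in range(stored_counter, stored_counter + look_ahead):
        if hotp.verify(otp, counter):
            # Re-sync: the next expected counter is one past the match
            return True, counter + 1
    return False, stored_counter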
So TOTP would work with the recovery code, but this seems aggressive: asking normal people to install a separate program on their computer or phone in order to log in, based on a time-based code which will stop working if the client and server (who have zero way to sync time with each other) drift too far apart. Then I need to give you recovery codes and just sorta hope you have somewhere good to put those.
That said, it is the closest to solving the problem because those are at least normal understandable human problems and it does meet my initial requirement of "the user has one good password". It's also portable and allows administrators to be like "well you fell through the one safety net, account is locked, make a new one".
What is the expected treatment of the TOTP secret?
When I was writing this out I became unsure if I'm allowed to hash this secret. As far as I can tell the answer is no: unlike a password, the server has to reproduce the exact code the client generates, which means it needs the raw secret at verification time, so a one-way hash breaks verification. If the user were to go through a TOTP reset flow I would generate a new secret anyway, but that doesn't get me out of needing the original one at every login.
None of the tutorials I was able to find seemed to have any opinion on this topic. It seems like using encryption is the SOP, which is fine (it's not sitting on disk as a plain string) but introduces another failure point. It seems odd there isn't a way to negotiate a rotation with a client or really provide any sort of feedback. It meets my initial requirement, but the more I read about TOTP the more surprised I was it hasn't been better thought out.
Things I would love from TOTP/HOTP
Some sort of secret rotation process would be great. It doesn't have to be common, but it would be nice if there was some standard way of informing the client.
Be great if we more clearly explained to people how long the codes should be valid for. Certainly 1 hour is sufficient for consumer-level applications right?
Explain like what would I do if the counters get off with HOTP. Certainly some human error must be accounted for by the designers. People are going to hit the button too many times at some point.
Use Google/Facebook/Apple
I'm not against using these sorts of login buttons except I can't offer just one, I need to offer all of them. I have no idea what login that user is going to have or what makes sense for them to use. It also means I need to manage some sort of app registration with each one of these companies for each domain that they can suspend approximately whenever they feel like it because they're giant megacorps.
So now I can't just spin up as many copies of this thing as I want with different URLs and I need to go through and test each one to ensure they work. I also need to come up with some sort of migration path for if one of them disappears and I need to authenticate the users into their existing accounts but using a different source of truth.
Since I cannot think of a way to do that which doesn't involve me basically emailing a magic link to the email address I get sent in the response from your corpo login and then allowing that form to update your user account with a different "real_user_id" I gotta abandon this. It just seems like a tremendous amount of work to not really "solve" the problem but just make the problem someone else's fault if it doesn't work.
Like if a user could previously log into a Facebook account and now no longer can, there's no customer service escalation they can go on. They can effectively go fuck themselves because nobody cares about one user encountering a problem. But that means you would still need some way of being like "you were a Facebook user and now you are a Google user". Or what if the user typically logs in with Google, clicks Facebook instead and now has two accounts? Am I expected to reconcile the two?
It's also important to note that I don't want any permissions and I don't want all the information I get back. I don't want to store email address or real name or anything like that, so again like the OAuth flow is overkill for my usage. I have no intention of requesting permissions on behalf of these users with any of these providers.
Use Passkeys
Me and passkeys don't get along super well, mostly because I think they're insane. I've written a lot about them in the past: https://matduggan.com/passkeys-as-a-tool-for-user-retention/ and I won't dwell on it except to say I don't think passkeys are designed with the first goal being an easy user experience.
But regardless passkeys do solve some of my problems.
Since I'm getting a public key I don't care if my database gets leaked
In theory I don't need an email address for fallback because on some platforms some of the time they sync
If users care a lot about ownership of personal data they can use a password manager sometimes if the password manager knows the right people and idk is friends with the mayor of passkeys or something. I don't really understand how that works, like what qualifies you to store the passkeys.
Mayor of passkeys
My issue with passkeys is I cannot conceive of an even "somewhat ok" fallback plan. So you set it up on an iPhone with a Windows computer at home. You break your iPhone and get an Android. It doesn't seem that crazy of a scenario to me to not have any solution for. Do I need your phone number on top of all of this? I don't want that crap sitting in a database.
Tell the users to buy a cross-platform password manager
Oh ok yeah absolutely normal people care enough about passwords to pay a monthly fee. Thanks for the awesome tip. I think everyone on Earth would agree they'd give up most of the price of a streaming platform full of fun content to pay for a password manager. Maybe I should tell them to spin up a docker container and run bitwarden while we're at it.
Anyway I have a hyper-secure biometric login as step 1 and then what is step 2, as the fallback? An email magic link? Based on what? Do I give you "recovery codes" like I did with TOTP? It seems crazy to layer TOTP on top of passkeys but maybe that...makes some sense as a fallback route? That seems way too secure but also possibly the right answer?
I'm not even trying to be snarky, I just don't understand what would be the acceptable position to take here.
What to do from here
Basically I'm left where I started. Here are my options:
Let the user assign a username and password and hope they let the browser or password manager do it and assume it is a good one.
Use the API in the browser to generate a good username and password and store it, hoping they always use a supported browser and that this API doesn't go away in the future.
Generate a TOTP but then also give them passwords called "recovery codes" and then hope they store those passwords somewhere good.
Use email magic links a lot and hope they remember to change their email address here when they lose access to an old email.
Use passkeys and then add on one of the other recovery systems and sort of hope for the best.
What basic stuff would I need to solve this problem forever:
The browser could tell me if it generated the password or if the user typed the password. If they type the password, force the 2FA flow. If not, don't. Let me tell the user "seriously let the system make the password". 1 good password criteria met.
Have the PasswordCredential API work everywhere all the time and I'll make a random username and password on the client and then we can just be done with this forever.
Passkeys but they live in the browser and sync like a normal password. Passkey lite. Passkey but not for nuclear secrets.
TOTP but if recovery codes are gonna be a requirement can we make it part of the spec? It seems like a made-up concept we sorta tacked on top.
I don't think these are crazy requirements. I just think if we want people to build more stuff and for that stuff to be secure, someone needs to sit down and realistically map out "how does a normal person do this". We need consistent reliable conventions I can build on top of, not weird design patterns we came up with because the initial concept was never tested on normal people before being formalized into a spec.
So for years I've used Docker Compose as my stepping stone to k8s. If the project is small, mostly for my own consumption, or the business requirements don't really support the complexity of k8s, I use Compose. It's simple to manage with bash scripts for deployments, not hard to set up on fresh servers with cloud-init, and the process of removing a server from a load balancer, pulling the new container, then adding it back in has been bulletproof for teams with limited headcount or services where uptime is less critical than cost control and ease of long-term maintenance. You avoid almost all of the complexity of really "running" a server while being able to scale up to about 20 VMs and still keep deployment times reasonable.
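As a rough sketch of that deploy loop (host names, paths and the health check endpoint here are hypothetical, and the load balancer steps depend entirely on your provider, so they're just placeholders):

```
#!/usr/bin/env bash
# Rolling deploy across plain Docker Compose hosts, one box at a time.
set -euo pipefail

# Hypothetical inventory and paths; swap in your own.
HOSTS=("app-01" "app-02" "app-03")
APP_DIR="/srv/myapp"

for host in "${HOSTS[@]}"; do
  echo "==> Deploying to ${host}"
  # 1. Drain: remove the box from the load balancer with your provider's
  #    CLI/API here (varies by vendor, so left as a placeholder).
  # 2. Pull the new image and restart the containers.
  ssh "deploy@${host}" "cd ${APP_DIR} && docker compose pull && docker compose up -d"
  # 3. Hypothetical health check before putting the box back in rotation.
  ssh "deploy@${host}" "curl -fsS http://localhost:8080/healthz"
  # 4. Re-add the box to the load balancer (again, provider-specific).
done
```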
What are you talking about
Sure, so one common issue I hear is "we're a small team, k8s feels like overkill, what else is on the market"? The issue is there are tons and tons of ways to run containers on virtually every cloud platform, but a lot of them are locked to that cloud platform. They're also typically billed at premium pricing because they remove all the elements of "running a server".
That's fine, but for small teams buying in too heavily to a vendor solution can be hard to get out of. Maybe they pick wrong and it gets deprecated, etc. So I try to push them towards a simpler stack that is more idiot-proof to manage. It varies by VPS provider but the basic stack looks like the following:
Debian servers set up with cloud-init to run all the updates, reboot and install the container manager of choice (a rough cloud-init sketch follows this list).
This also sets up Cloudflare tunnels so we can access the boxes securely and easily. Tailscale also works great/better for this. Avoids needing public IPs for each box.
Add a tag to each one of those servers so we know what it does (redis, app server, database)
Put them into a VPC together so they can communicate
Take the deploy script, have it SSH into the box and run the container update process
Linux updates involve a straightforward process of de-registering, destroying the VM and then starting fresh. The database is a bit more complicated but still doable. It's all easily done in simple scripts that you can tie to GitHub Actions if you are so inclined. Docker Compose has been the glue that handles the actual launching and restarting of the containers for this sample stack.
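For the first bullet, a minimal cloud-init sketch might look something like this; the cloudflared/Tailscale install steps vary by provider, so they're just a placeholder comment here:

```
#cloud-config
# Minimal Debian bootstrap: patch everything, install the container manager,
# reboot if the updates require it.
package_update: true
package_upgrade: true
package_reboot_if_required: true

packages:
  - podman

runcmd:
  # Placeholder: install and authenticate cloudflared or tailscale here,
  # using whatever bootstrap mechanism your provider supports.
  - [ systemctl, enable, --now, podman.socket ]
```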
When you outgrow this approach, you are big enough that you should have a pretty good idea of where to go now. Since everything is already in containers you haven't been boxed in and can migrate in whatever direction you want.
Why Not Docker
However I'm not thrilled with the current state of Docker as a full product. Even when I've paid for Docker Desktop I found it to be a profoundly underwhelming tool. It's slow, the UI is clunky, there's always an update pending, it's sort of expensive for what people use it for, and Windows users seem to hate it. When I've compared Podman vs Docker on servers or my local machines, Podman is faster, seems better designed and in general as a product is trending in a stronger direction. If I don't like Docker Desktop and prefer Podman Desktop, to me it's worth migrating the entire stack over and just dumping Docker as a tool I use. Fewer things to keep track of.
Now the problem is that while Podman has sort of a compatibility layer with Docker Compose, it's not a one-to-one replacement and you want to be careful using it. My testing showed it worked okay for basic examples, but with more complex setups you start to run into problems. It also seems like work on the project has mostly been abandoned by the core maintainers. You can see it here: https://github.com/containers/podman-compose
I think podman-compose is the right solution for local dev, where you aren't using terribly complex examples and the uptime of the stack matters less. It's hard to replace Compose in this role because it's just so straightforward. As a production deployment tool I would stay away from it. This is important to note because right now the local dev container story often involves running k3s on your laptop. My experience is people loathe Kubernetes for local development and will go out of their way to avoid it.
The people I know who are all-in on Podman pushed me towards Quadlet as an alternative which uses systemd to manage the entire stack. That makes a lot of sense to me, because my Linux servers already have systemd and it's already a critical piece of software that (as far as I can remember) works pretty much as expected. So the idea of building on top of that existing framework makes more sense to me than attempting to recreate the somewhat haphazard design of Compose.
Wait I thought this already existed?
Yeah I was also confused. So there was a command, podman-generate-systemd, that I had used previously to run containers with Podman using systemd. That has been deprecated in favor of Quadlet units, which are more powerful and offer more of the Compose functionality, but are also more complex and less magically generated.
So if all you want to do is run a container or pod using systemd, then you can still use podman-generate-systemd, which in my testing worked fine and did exactly what it says on the box. However if you want to emulate the functionality of Compose with networks and volumes, then you want Quadlet.
What is Quadlet
The name comes from this excellent pun:
What do you get if you squash a Kubernetes kubelet?
A quadlet
Actually laughed out loud at that. Anyway Quadlet is a tool for running Podman containers under Systemd in a declarative way. It has been merged into Podman 4.4 so it now comes in the box with Podman. When you install Podman it registers a systemd-generator that looks for files in the following directories:
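From memory of the Quadlet docs (double-check against your Podman version, since the exact search paths have shifted a little between releases):

```
/etc/containers/systemd/              # rootful units you write yourself
/usr/share/containers/systemd/        # rootful units shipped by packages
~/.config/containers/systemd/         # rootless units for your user
/etc/containers/systemd/users/<uid>/  # rootless units an admin manages for a specific user
```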
You put unit files in the directory you want (creating the directory if it isn't already present, which it probably isn't), with the file extension telling you what kind of object you are looking at.
For example, if I wanted a simple volume I would make the following file:
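Something like this, a minimal sketch with hypothetical names (a myapp-data.volume plus a myapp.container that mounts it):

```
# /etc/containers/systemd/myapp-data.volume
[Volume]
# A bare [Volume] section is enough; Podman creates the volume on first use.
Label=app=myapp
```

and a container that mounts it:

```
# /etc/containers/systemd/myapp.container
[Unit]
Description=My app container
# If this depended on another Quadlet unit (say a db.container), you would
# reference the generated service name here, e.g. Requires=db.service.

[Container]
Image=docker.io/library/nginx:latest
# Referencing the .volume file directly; Quadlet wires up the dependency for you.
Volume=myapp-data.volume:/usr/share/nginx/html
PublishPort=8080:80

[Service]
Restart=always

[Install]
# With [Install] defined the unit starts on the next boot without systemctl enable.
WantedBy=multi-user.target
```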
and you should be able to use systemctl status to check all of these running processes. You don't need to run systemctl enable to get them to run on next boot IF you have the [Install] section defined. Also notice that when you are setting dependencies (Requires, After), the unit is called name-of-thing.service, not name-of-thing.container or .volume. It threw me off at first but I just wanted to call that out.
One thing I want to call out
Containers support AutoUpdate, which means if you just want Podman to pull down the freshest image from your registry, that is supported out of the box. It's just AutoUpdate=registry. If you change that to local, Podman will restart the container when a new build of that image lands locally with a deployment. If you need more information about logging into registries with Podman you can find that here.
I find this very helpful for testing environments where I can tell servers to just run podman auto-update and get the newest containers. It's also great because it has options to help handle rollbacks and failure scenarios, which are rare but can really blow up in your face with containers outside of k8s. You can see that here.
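As a sketch, the relevant line in the .container file and the commands you'd run on the box (image name hypothetical):

```
# myapp.container (excerpt)
[Container]
Image=registry.example.com/myapp:latest
# Pull and restart whenever the tag in the registry points at a new image.
AutoUpdate=registry
```

```
# Preview what would change, then apply it; failed updates get rolled back.
podman auto-update --dry-run
podman auto-update
```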
What if you don't store images somewhere?
So often with smaller apps it doesn't make sense to add a middle layer of building the image in one place, storing it, and then pulling it, versus just building the image on the machine you are deploying to with docker compose up -d --no-deps --build myapp
You can do the same thing with Quadlet build files. The unit files are similar to the ones above but with a .build extension, and the documentation makes it pretty simple to figure out how to convert whatever you are looking at.
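A minimal sketch of what that might look like, with a hypothetical app and build directory; the key names are from my reading of the Quadlet docs, so double-check them against your Podman version:

```
# myapp.build
[Build]
# Tag the result so a .container file can reference it with Image=localhost/myapp:latest
ImageTag=localhost/myapp:latest
# Build context on the box itself, i.e. the directory you rsync your code into.
SetWorkingDirectory=/srv/myapp
File=Containerfile
```

Newer Podman versions should also let you point Image= in the .container file at myapp.build directly, but tagging the image and referencing the tag works either way.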
I found this nice for quick testing so I could easily rsync changes to my test box and trigger a fast rebuild with the container layers mostly getting pulled from cache and only my code changes making a difference.
How do secrets work?
So secrets are supported with Quadlets. Effectively they just build on top of podman secret or secrets in Kubernetes. Assuming you don't want to go the Kubernetes route for this purpose, you have a couple of options.
Make a secret from a local file (probably a bad idea): podman secret create my_secret ./secret.txt
Make a secret from an environment variable on the box (better idea): podman secret create --env=true my_secret MYSECRET
Use stdin: printf <secret> | podman secret create my_secret -
Then you can reference these secrets inside of the .container file with Secret=name-of-podman-secret and then the options. By default these secrets are mounted at /run/secrets/secretname as a file inside of the container. You can configure it to be an environment variable (along with a bunch of other stuff) with the options outlined here.
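Putting those two steps together, a sketch with hypothetical names:

```
# Create the secret from an environment variable already set on the box.
export MYSECRET='super-secret-value'
podman secret create --env=true my_secret MYSECRET
```

```
# myapp.container (excerpt)
[Container]
Image=localhost/myapp:latest
# Default behavior: mounted as a file at /run/secrets/my_secret in the container.
Secret=my_secret
# Alternative: expose it as an environment variable instead of a file.
# Secret=my_secret,type=env,target=MY_SECRET
```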
Rootless
So my examples above were not rootless containers, which are the best practice. You can get them to work, but the behavior is more complicated and has problems I wanted to call out. You need to use default.target and not multi-user.target, and it also looks like you do need loginctl enable-linger to allow your user to start the containers without that user being logged in.
Also remember that all of the systemctl commands need the --user argument and that you might need to change your sysctl parameters to allow rootless containers to run on privileged ports.
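The rootless checklist roughly looks like this, for a hypothetical deploy user; the sysctl is only needed if you want to bind ports below 1024:

```
# Let the user's services run without an active login session.
sudo loginctl enable-linger deploy

# Rootless units live under the user's config dir and are managed with --user.
mkdir -p ~/.config/containers/systemd/
systemctl --user daemon-reload
systemctl --user start myapp.service
systemctl --user status myapp.service

# Only if you need rootless containers on privileged ports (e.g. 80/443):
echo 'net.ipv4.ip_unprivileged_port_start=80' | sudo tee /etc/sysctl.d/50-rootless-ports.conf
sudo sysctl --system
```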
So for rootless networking Podman previously used slirp4netns and now uses pasta. Pasta doesn't do NAT and instead copies the IP address from your main network interface to the container namespace. "Main" in this case is defined as whatever interface has the default route. This can cause (obvious) problems with inter-container connections since it's all the same IP. You need to configure containers.conf to get around this problem.
Also ping didn't work for me. You can fix that with the solution here.
That sounds like a giant pain in the ass.
Yeah I know. It's not actually the fault of the Podman team. The way rootless containers work is basically they use user_namespaces to emulate the privileges needed to create containers. Inside of the UserNS they can do things like create mount namespaces and set up networking. Outgoing connections are tricky because vEth pairs cannot be created across UserNS boundaries without root. Inbound relies on port forwarding.
So tools like slirp4netns and pasta are used since they can translate Ethernet packets to unprivileged socket system calls by making a tap interface available in the namespace. However the end result is you need to account for a lot of potential strangeness in the configuration file. I'm confident this will get less fiddly as time goes on.
Podman also has a tutorial on how to get it set up here: https://github.com/containers/podman/blob/main/docs/tutorials/rootless_tutorial.md which did work for me. If you do the work of rootless containers now you have a much easier security story for the rest of your app, so I do think it ultimately pays off even if it is annoying in the beginning.
Impressions
So as a replacement for Docker Compose on servers, I've really liked Quadlet. I find the logging easier to figure out since we're just using the standard systemctl commands, and checking status is also easier and more straightforward. Getting the rootless containers running took... more time than I expected, because I didn't think about how, without the linger work, they wouldn't start until the user logged back in.
It does stink that this is absolutely not a solution for local dev in most places. I prefer that Podman remains daemonless and instead hooks into the existing functionality of systemd, but for people not running Linux on their local workstations (most people on Earth) you are either going to need to use the Podman Desktop Kubernetes functionality or use podman-compose and just be aware that it's not something you should use in actual production deployments.
But if you are looking for something that scales well, runs containers and is super easy to manage and keep running, this has been a giant hit for me.
A lot has been written in the last few weeks about the state of IT security in the aftermath of the CrowdStrike outage. A range of opinions have emerged, ranging from blaming Microsoft for signing the CrowdStrike software (who in turn blame the EU for making them do it) to blaming the companies themselves for allowing all of these machines access to the Internet to receive the automatic template update. Bike-shedding among the technical community continues to be focused on the underlying technical deployment, which misses the forest for the trees.
The better question is what was the forcing mechanism that convinced every corporation in the world that it was a good idea to install software like this on every single machine? Why is there such a cottage industry of companies that are effectively undermining operating system security with the argument that they are doing more "advanced" security features, allowing (often unqualified) security and IT departments to make fundamental changes to things like TLS encryption and basic OS functionality? How did all these smart people let a random company push updates to everyone on Earth with zero control? The justification often given is "to pass the audit".
These audits and certifications, of which there are many, are a fundamentally broken practice. The intent of the frameworks was good, allowing for the standardization of good cybersecurity practices while not relying on the expertise of an actual cybersecurity expert to validate the results. We can all acknowledge there aren't enough of those people on Earth to actually audit all the places that need to be audited. The issue is the audits don't actually fix real problems, but instead create busywork for people so it looks like they are fixing problems. It lets people cosplay as security experts without needing to actually understand what the stuff is.
I don't come to this analysis lightly. Between HIPAA, PCI, GDPR, ISO27001 and SOC2 I've seen every possible attempt to boil requirements down to a checklist that you can do. Add in the variations on these that large companies like to send out when you are attempting to sell them an enterprise SaaS and it wouldn't surprise me at all to learn that I've spent over 10,000 hours answering and implementing solutions to meet the arbitrary requirements of these documents. I have both produced the hundred page PDFs full of impressive-looking screenshots and diagrams AND received the PDFs full of diagrams and screenshots. I've been on many calls where it is clear neither of us understands what the other is talking about, but we agree that it sounds necessary and good.
I have also been there in the room when inept IT and Security teams use these regulations, or more specifically their interpretation of these regulations, to justify kicking off expensive and unnecessary projects. I've seen laptops crippled due to full filesystem scans looking for leaked AWS credentials and Social Security numbers, even if the employee has nothing to do with that sort of data. I've watched as TLS encryption is broken with proxies so that millions of files can be generated and stored inside of S3 for security teams to never ever look at again. Even I have had to reboot my laptop to apply a non-critical OS update in the middle of an important call. All this inflicted on poor people who had to work up the enthusiasm to even show up to their stupid jobs today.
Why?
Why does this keep happening? How is it that every large company keeps falling into the same trap of repeating the same expensive, bullshit processes?
The actual steps to improve cybersecurity are hard and involve making executives mad. You need to update your software, including planning ahead for end-of-life technology. Since this dark art is apparently impossible to do and would involve a lot of downtime to patch known-broken shit and reboot it, we won't do that. Better apparently to lose the entire Earth's personal data.
Everyone is terrified that there might someday be a government regulation with actual consequences, so they reach for an industry solution that sounds impressive but carries no real punishments. If Comcast executives could go to jail for knowingly running out-of-date Citrix NetScaler software, it would have been fixed. So instead we get impressive-sounding things that can be held up as evidence of compliance, and if they ultimately don't end up preventing leaks, the consequences are minor.
Nobody questions the justification of "we need to do x because of our certification". The actual requirements are too boring to read so it becomes this blank check that can be used to roll out nearly anything.
Easier to complete a million nonsense steps than it is to get in contact with someone who understands why the steps are nonsense. The number of times I've turned on silly "security settings" to pass an audit when the settings weren't applicable to how we used the product is almost too high to count.
Most security teams aren't capable of stopping a dedicated attacker and, in their souls, know that to be true. Especially in large organizations, the number of conceivable attack vectors becomes too painful to even think about. Therefore too much faith is placed in companies like Zscaler and CrowdStrike to use "machine learning and AI" (read: magic) to close up all the possible exploits before they happen.
If your IT department works exclusively with Windows and spends their time working with GPOs and Powershell, every problem you hand them will be solved with Windows. If you handed the same problem to a Linux person, you'd get a Linux solution. People just use what they know. So you end up with a one-size-fits-all approach to problems. Like mice in a maze where almost every step is electrified, if Windows loaded up with bullshit is what they are allowed to deploy without hassles that is what you are going to get.
Future
We all know this crap doesn't work and the sooner we can stop pretending it makes a difference, the better. AT&T had every certification on the planet and still didn't take the incredibly basic step of enforcing 2FA on a database of all the most sensitive data it has in the world. If following these stupid checklists and purchasing the required software ended up with more secure platforms, I'd say "well at least there is a payoff". But time after time we see the exact same thing which is an audit is not an adequate replacement for someone who knows what they are doing looking at your stack and asking hard questions about your process. These audits aren't resulting in organizations doing the hard but necessary step of taking downtime to patch critical flaws or even applying basic security settings across all of their platforms.
Because cryptocurrency now allows for hacking groups to demand millions of dollars in payments (thanks crypto!), the financial incentives to cripple critical infrastructure have never been better. At the same time most regulations designed to encourage the right behavior are completely toothless. Asking the tech industry to regulate itself has failed, without question. All that does is generate a lot of pain and suffering for their employees, who most businesses agree are disposable and idiots. All this while doing nothing to secure personal data. Even in organizations that had smart security people asking hard questions, that advice is entirely optional. There is no stick with cybersecurity and businesses, especially now that almost all of them have made giant mistakes.
I don't know what the solution is, but I know this song and dance isn't working. The world would be better off if organizations stopped wasting so much time and money on these vendor solutions and instead stuck to much more basic solutions. Perhaps if we could just start with "have we patched all the critical CVEs in our organization" and "did we remove the shared username and password from the cloud database with millions of call records", then perhaps AFTER all the actual work is done we can have some fun and inject dangerous software into the most critical parts of our employees devices.
It was 4 AM when I first heard the tapping on the glass. I had been working for 30 minutes trying desperately to get everything from the back store room onto the sales floor when I heard a light knocking. Peeking out from the back I saw an old woman wearing sweat pants and a Tweetie bird jacket, oxygen tank in tow, tapping a cane against one of the big front windows. "WE DON'T OPEN UNTIL 5" shouted my boss, who shook her head and resumed stacking boxes. "Black Friday is the worst" she said to nobody as we continued to pile the worthless garbage into neat piles on the store floor.
What people know now but didn't understand then was the items for sale on Black Friday weren't our normal inventory. These were TVs so poorly made they needed time to let their CRT tubes warm up before the image became recognizable. Radios with dials so brittle some came out of the box broken. Finally, a mixer that when we tested it in the back let out such a stench of melted plastic we all screamed to turn it off before we burned down the building. I remember thinking as I unloaded it from the truck that certainly nobody was gonna want this crap.
Well here they were and when we opened the doors they rushed in with a violence you wouldn't expect from a crowd of mostly senior citizens. One woman pushed me to get at the TVs, which was both unnecessary (I had already hidden one away for myself and put it behind the refrigerators in the back) and not helpful as she couldn't lift the thing on her own. I watched in silence as she tried to get her hands around the box with no holes cut out, presumably a cost savings on Sears part, grunting with effort as the box slowly slid while she held it. At the checkout desk a man told me he was buying the radio "as a Christmas gift for his son". "Alright but no returns ok?" I said keeping a smile on my face.
We had digital cameras the size of shoe-boxes, fire-hazard blenders and an automatic cat watering dish that I just knew was going to break a lot of hearts when Fluffy didn't survive the family trip to Florida. You knew it was quality when the dye from the box rubbed off on your hands when you picked it up. Despite my jokes about worthless junk, people couldn't purchase it fast enough. I saw arguments break out in the aisles and saw Robert, our marine veteran sales guy, whisper "forget this" and leave for a smoke by the loading dock. When I went over to ask if I could help, the man who had possession of the digital camera spun around and told me to "either find another one of these cameras or butt the fuck out". They resumed their argument and I resumed standing by the front telling newcomers that everything they wanted was already gone.
Hours later I was still doing that, informing everyone who walked in that the item they had circled in the newspaper was already sold out. "See, this is such a scam, why don't you stock more of it? It's just a trick to get us into the store". Customer after customer told me variations on the above, including one very kind looking grandfather type informing me I could "go fuck myself" when I wished him a nice holiday.
Beginnings
The store was in my small rural farming town in Ohio, nestled between the computer shop where I got my first job and a carpet store that was almost certainly a money laundering front since nobody ever went in or out. I was interviewed by the owner, a Vietnam veteran who spent probably half our interview talking about his two tours in Vietnam. "We used to throw oil drums in the water and shoot at them from our helicopter, god that was fun. Don't even get me started about all the beautiful local women." I nodded, unsure what this had to do with me but sensing this was all part of his process. In the years to come I would learn to avoid sitting down in his office, since then you would be trapped listening to stories like these for an hour plus.
After these tales of what honestly sounded like a super fun war full of drugs and joyrides on helicopters, he asked me why I wanted to work at Sears. "It's an American institution and I've always had a lot of respect for it" I said, not sure if he would believe it. He nodded and went on to talk about how Sears built America. "Those kit houses around town, all ordered from Sears. Boy we were something back in the day. Anyway fill out your availability and we'll get you out there helping customers." I had assumed at some point I would get training on the actual products, which never happened in the years I worked there. In the back were dust-covered training manuals which I was told I should look at "when I got some time". I obviously never did and still sometimes wonder about what mysteries they contained.
I was given my lanyard and put on the floor, which consisted of half appliances, one quarter electronics and the rest tools. Jane, one of the saleswomen, told me to "direct all the leads for appliances to her" and not check one out myself, since I didn't get commission. Most of my job consisted of swapping broken Craftsman tools since they had a lifetime warranty. You filled out a carbon paper form, dropped the broken tool into a giant metal barrel and then handed them a new one. I would also set up deliveries for rider lawnmowers and appliances, working on an ancient IBM POS terminal that required memorizing a series of strange keyboard shortcuts to navigate the calendar.
When there was downtime, I would go into the back and help Todd assemble the appliances and rider lawnmowers. Todd was a special needs student at my high school who was the entirety of our "expert assembly" service. He did a good job, carefully following the manual every time. Whatever sense of superiority I felt as an honor roll student disappeared when he watched me try to assemble a rider mower myself. "You need to read the instructions and then do what they say" he would helpfully chime in as I struggled to figure out why the brakes did nothing. His mowers always started on the first try while mine were safety hazards that I felt certain were going to be on the news. "Tonight a Craftsman rider lawnmower killed a family of 4. It was assembled by this idiot." Then just my yearbook photo where I had decided to bleach my hair blonde like a chonky Backstreet Boy overlaid on top of live footage of blood-splattered house siding.
Any feeling I had that people were wasting $200 by paying us to assemble their rider mowers disappeared when I saw the first one a customer had tried to assemble himself. If my mowers were death traps, these were actual IEDs whose only conceivable purpose on Earth was to trick innocent people into thinking they were rider lawnmowers until you turned the key and they blew you into the atmosphere. One guy brought his back with several ziplock bags full of screws, bashfully explaining that he tried his best but "there's just no way that's right". That didn't stop me from holding my breath every time someone drove a mower I had worked on up the ramp into the back of the truck. "Please god just don't fall apart right now, wait until they get it home" was my prayer to whatever deity looked after idiots in jobs they shouldn't have.
Sometimes actual adults with real jobs would come in asking me questions about tools, conversations that both of us hated. "I'm looking for an oil filter wrench" they would say, as if this item was something I knew about and could find. "Uh sure, could you describe it?" "It's a wrench, used for changing oil filters, has a loop on it." I'd nod and then feebly offer them up random items until they finally grabbed it themselves. One mechanic, when I offered a claw hammer in response to his request for a cross-pein hammer, said "you aren't exactly handy, are you?" I shook my head and went back behind the counter, attempting to establish what little authority I had left with the counter. I might not know anything about the products we sell, but only one of us is allowed back here sir.
Sears Expert
As the months dragged on I was moved from the heavier foot traffic shifts to the night shifts. This was because customers "didn't like talking to me", a piece of feedback I felt was true but still unfair. I had learned a lot, like every incorrect way to assemble a lawn mower and that refrigerators are all the same except for the external panels. Night shifts were mostly getting things ready for the delivery company, a father and son team who were always amusing.
The father was a chain-smoking tough guy who would regularly talk about his "fuck up" of a son. "That idiot dents another oven when we're bringing it in I swear to god I'm going to replace him with one of those Japanese robots I keep seeing on the news." The son was the nicest guy on Earth, really hard working, always on time for deliveries and we got like mountains of positive feedback about him. Old ladies would tear up as they told me about the son hauling their old appliances away in a blizzard on his back. He would just sit there, smile frozen on his face while his father went on and on about how much of a failure he was. "He's just like this sometimes" the son would tell me by the loading dock, even though I would never get involved. "He's actually a nice guy". This was often punctuated by the father running into a minor inconvenience and flying off the handle. "What kind of jackass would sort the paperwork alphabetically instead of by order of delivery?" he'd scream from the parking lot.
When the son went off to college he was replaced by a Hispanic man who took zero shit. His response to customer complaints was always that they were liars and I think the father was afraid of him. "Oh hey don't bother Leo with that, he's not in the mood, I'll call them and work it out" the father would tell me as Leo glared at us from the truck. Leo was incredibly handy though, able to fix almost any dent or scratch in minutes. He popped the dent out of my car door by punching the panel, which is still one of the cooler things I've seen someone do.
Other than the father and son duo, I was mostly alone with a woman named Ruth. She fascinated me because her life was unspeakably bleak. She had been born and raised in this town and had only left the county once in her life, to visit the Sears headquarters in Chicago. She'd talk about it like she had been permitted to visit heaven. "Oh it was something, just a beautiful shiny building full of the smartest people you ever met. Boy I'd love to see it again sometime." She had married her high school boyfriend, had children and now worked here in her 60s as her reward for a life of hard work. She had such bad pain in her knees she had to lean on the stocking cart as she pushed it down the aisles, often stopping to catch her breath. The store would be empty except for the sounds of a wheezing woman and squeaky wheels.
When I would mention Chicago was a 4 hour drive and she could see it again, she'd roll her eyes at me and continue stocking shelves. Ruth was a type of rural person I encountered a lot who seemed to get off on the idea that we were actually isolated from the outside world by a force field. Mention leaving the county to go perhaps to the next county and she would laugh or make a comment about how she wasn't "that kind of person". Every story she would tell had these depressing endings that left me pondering what kind of response she was looking for. "My brother, well he went off to war and when he came back was just a shell of a man. Never really came back if you ask me. Anyway let's clean the counters."
She'd talk endlessly about her grandson, a 12 year old who was "stupid but kind". His incredibly minor infractions were relayed to me like she was telling me about a dark family scandal. "Then I said, who ate all the chips? I knew he had, but he just sat there looking at me and I told him you better wipe those crumbs off your t-shirt smartass and get back to your homework". He finally visited and I was shocked to discover there was also a granddaughter who I had never heard about. He smirked when he met me and told me that Ruth had said I was "a lazy snob".
I'll admit, I was actually a little hurt. Was I a snob compared to Ruth? Absolutely. To be honest with you I'm not entirely sure she was literate. I'd sneak books under the counter to read during the long periods where nothing was happening and she'd often ask me what they were about even if the title sort of explained it. "What is Battle Cry of Freedom: The Civil War Era about? Um well the Civil War." I'd often get called over to "check" documents for her, which typically included anything more complicated than a few sentences. I still enjoyed working with her.
Our relationship never really recovered after I went to Japan when I was 16. I went by myself and wandered around Tokyo, having a great time. When I returned full of stories and pictures of the trip, I could tell she was immediately sick of me. "Who wants to see a place like Japan? Horrible people" she'd tell me as I tried to tell her that things had changed a tiny bit since WWII. "No it's really nice and clean, the food was amazing, let me tell you about these cool trains they have". She wasn't interested and it was clear my getting a passport and leaving the US had changed her opinion of me.
So when her grandson confided that she had called me lazy AND a snob my immediate reaction was to lean over and tell him that she had called him "a stupid idiot". Now she had never actually said "stupid idiot", but in the heat of the moment I went with my gut. Moments after I did that the reality of a 16 year old basically bullying a 12 year old sunk in and I decided it was time for me to go take out some garbage. Ruth of course found out what I said and mentioned it every shift after that. "Saying I called my grandson a stupid idiot, who does that, a rude person that's who, a rude snob" she'd say loud enough for me to hear as the cart very slowly inched down the aisles. I deserved it.
Trouble In Paradise
At a certain point I was allowed back in front of customers and realized with a shock that I had worked there for a few years. The job paid very little, which was fine as I had nothing in the town to actually buy, but enough to keep my lime green Ford Probe full of gas. It shook violently if you exceeded 70 MPH, which I should have asked someone about but never did. I was paired with Jane, the saleswoman who was a devout Republican and liked to make fun of me for being a Democrat. This was during the George W Bush vs Kerry election and she liked to point out how Kerry was a "flipflopper" on things. "He just flips and flops, changes his mind all the time". I'd point out we had vaporized the country of Iraq for no reason and she'd roll her eyes and tell me I'd get it when I was older.
My favorite was when we were working together during Reagan's funeral, an event which elicited no emotion from me but drove her to tears multiple times. "Now that was a man and a president" she'd exclaim to the store while the funeral procession was playing on the 30 TVs. "He won the Cold War you know?" she'd shout at a woman looking for replacement vacuum cleaner bags. Afterwards she asked me what my favorite Reagan memory was. All I could remember was that he had invaded the small nation of Grenada for some reason, so I said that. "Really showed those people not to mess with the US" she responded. I don't think either one of us knew that Grenada is a tiny island nation with a population less than 200,000.
Jane liked to dispense country wisdom, witty one-liners that only sometimes were relevant to the situation at hand. When confronted with an angry customer she would often say afterwards that you "You can't make a silk purse out of a sow's ear" which still means nothing to me. Whatever rural knowledge I was supposed to obtain through osmosis my brain clearly rejected. Jane would send me over to sell televisions since I understood what an HDMI cord was and the difference between SD and HD television.
Selling TVs was perhaps the only thing I did well, that and the fun vacuum demonstration where we would dump a bunch of dirt on a carpet tile and suck it up. Some poor customer would tell me she didn't have the budget for the Dyson and I'd put my hand up to silence her. "You don't have to buy it, just watch it suck up a bunch of pebbles. I don't make commission anyway so who cares." Then we'd both watch as the Dyson would make a horrible screeching noise and suck in a cup's worth of small rocks. "That's pretty cool huh?" and the customer would nod, probably terrified of what I would do if she said no.
Graduation
When I graduated high school and prepared to go off to college, I had the chance to say goodbye to everyone before I left. They had obviously already replaced me with another high school student, one that knew things about tools and was better looking. You like to imagine that people will miss you when you leave a job, but everyone knew that wasn't true here. I had been a normal employee who didn't steal and mostly showed up on time.
My last parting piece of wisdom from Ruth was not to let college "make me forget where I came from". Sadly for her I was desperate to do just that, entirely willing to adopt whatever new personality was presented to me. I hated rural life and still do, the spooky dark roads surrounded by corn. Yelling at Amish teens to stop shoplifting during their Rumspringa, when they would get dropped off in the middle of town and left to their own devices.
Still I'm grateful that I at least know how to assemble a rider lawnmower, even if it did take a lot of practice runs on customers' mowers.