Developer's Guide to Moving to Denmark

I've wanted to write a guide for tech workers looking to leave the US and move to Denmark for a while now. I made the move over four years ago and finally feel like I can write on the topic with enough detail to answer most questions.

Denmark gets a lot of press in the US for being a socialist paradise and is often held up as the standard by which things are judged. The truth is more complicated: Europe has its own issues that may affect you more or less depending on your background.

Here's the short version: moving to Europe from the US is a significant
improvement in quality of life for most people. There are pitfalls, but
especially if you have children, every aspect of being a parent, from
the amount of time you get to spend with them to the safety and quality
of their schools, is better. If you have never considered it, you
should, even if not Denmark (although I can't help you with how). It
takes a year to do, so even if things seem OK right this second, try to think longer-term.

TL;DR
Reasons to move to Denmark

  • 5 or 6 weeks of vacation a year (with some asterisks)
  • Very good public healthcare
  • Amazing work/life balance, with lots of time for hobbies and activities
  • Great public childcare at really affordable prices
  • Be in a union
  • Amazing summer weather
  • Safety. Denmark is really safe compared to even a safe US area.
  • Very low stress. You don't worry about retirement or health insurance or childcare or any of the things that you might constantly obsess over.
  • Freedom from religious influence. Denmark has a very good firewall against religious influence in politics.
  • Danes are actually quite friendly. I know, media portrayals suggest otherwise, but they are. They won't start a conversation with you, but they'd love to chat if you start one.

Reasons not to move to Denmark

  • You are gonna miss important stuff back home. People are gonna die and you won't be there. Weddings, birthdays, promotions and divorces are all still happening and you aren't there. I won't sugarcoat it, you are leaving most of your old life behind.
  • Eating out is a big part of your life; there are restaurants, but they're expensive and for the most part unimpressive. If someone who worked at Noma owns it, then it's probably great, but otherwise often meh.
  • You refuse to bike or take public transit. Owning a car here is possible but very expensive and difficult to park.
  • Lower salary. Tech workers make a lot less here before taxes.
  • Taxes. They're high. All Danes pay a base 15%, and if you hit the "top tax" bracket you pay another 15% on income above the threshold. A tech salary will almost certainly bump you into it.
  • Food. High quality ingredients at stores but overall very bland. You'll eat a lot of meals at your work cafeteria and it's healthy but uninspired.
  • Buying lots of items is your primary joy in life. Electronics are more expensive in the EU, and Amazon doesn't exist in Denmark, so your selection is much more limited.

Leaving

There is certainly no shortage of reasons today why one would consider leaving the US. From mass shootings to a broken political system where the majority is held hostage by the minority rural voter, it's not the best time to live in the US. When Trump was elected, I decided it was time to get out. Trump wasn't the reason I left (although man, wouldn't you be a little bit impressed if it were?).

I was tired of everything being so hopeless. Everyone I knew in Chicago was working professional jobs, often working late hours, but nobody was getting ahead. I was lucky that I could afford a house and a moderately priced car, but for so many I knew it just seemed pointless. Grind forever and maybe you would get to buy a condo. All you could do was run the numbers with people and you realized they were never gonna get out.

Everything felt like it was building up to this explosion. I would go back to rural Ohio and people were desperately poor even though the economy had, on paper, been good for years. Flex scheduling at chain retail locations meant you couldn't take another job because your first job might call you at any time. Nobody had health insurance outside of the government-provided "Buckeye card". Friends I knew, people who had been moderates growing up, were carrying AR-15s to Wal-Mart and talking about the upcoming war with liberals. There were Confederate flags everywhere in a state that fought on the Union side.

I'm not really qualified to talk about politics, so I won't. The only relevant skill I bring to this situation is having watched a lot of complex systems fail. This felt like watching a giant company's infrastructure collapse. Every piece had problems, different problems, that nobody seemed to understand holistically. I couldn't get it out of my head, this fear that I would need to flee quickly and wouldn't be able to. "If you think it's gonna explode, you wanna go now before people realize how bad it is" was the thought that ran over and over in my head.

So I'd sit out the "burning to the ground" part. Denmark seemed the perfect place to do it: often held up as the perfect society, with its well-functioning welfare state, low levels of corruption and high political stability. They have a shortage of tech workers and I was one of those. I'd sell my condo, car and all my possessions and wait out the collapse. It wasn't a perfect plan (the US economy is too important to think its collapse wouldn't be felt in Denmark), but it was the best plan I could come up with.

My thought was that doing something is better than sitting on Twitter getting sad all the time.

Is it a paradise?

No. I think the US media builds Denmark, Norway and Sweden up to unrealistic levels. It's a nice country with decent people, and while DSB doesn't run on time, the trains do run and exist, which is more than I can say for US trains. There are problems, and often the problems don't get discussed until you get here. I'll try to give you some top-level stuff you should be aware of.

There is a lot of anger towards communities that have immigrated from the Muslim world. These neighborhoods are often officially classified as "ghettos" under a 2018 law:

And the Danish state decides whether areas are deemed ghettoes not just by their crime, unemployment or education rates, but on the proportion of residents who are deemed “non-western” – meaning recent, first-, or second-generation migrants.

You'll sometimes hear this discussed as the "parallel societies" problem: the idea that Denmark is not Danish enough anymore unless steps are taken to break up these neighborhoods and disperse their residents. The proposed solution was to change the terms: the Interior Ministry revealed proposed reforms that would remove the word "ghetto" from current legislation and reduce the share of people of "non-Western" origin in social housing to 30% within 10 years. Families removed from these areas would be relocated to other parts of the country.

It's not a problem that stops at one generation either. I've seen Danes whose families immigrated over a generation ago, who speak fluent Danish (as they are Danish), be asked "where they come from" at social events multiple times. So even if you are a citizen, speak the language and go through the educational system, you aren't fully integrated by a lot of folks' standards. This can be demoralizing for some people.

I also love their health system, but it's not fair to all the workers who maintain it. Medical staff in Denmark don't get paid enough for all the work they do, especially when you compare, say, Danish nurses to nurses in the US. Similarly, a lot of critical workers like daycare workers, teachers, etc., are in the same boat. It's not as bad as the US for teachers, but there's still definitely a gap there.

Denmark also doesn't apply all these great benefits uniformly. Rural Denmark is pretty poor and has limited access to a lot of these services. It's not West Virginia, but some of the magic of a completely flat fair society disappears when you spend a week in rural Jutland. These towns are peeling paint and junk on the front lawn, just like you would expect in any poor small town. There's still a safety net and they still have a much better time of it than an American in the same situation, but still.

I hope Danish people reading this don't get upset. I'm not saying this to speak ill of your country, but sometimes I see people emotionally crash and burn when they come expecting liberal paradise and encounter many problems which look similar to ones back home. It's important to be realistic about what living here looks like. Denmark has not solved all racial, gender or economic issues in their society. They are still trying to, which is more than I can say for some.

Steps

The next part is all the practical steps I could think of. I'm glad to elaborate on any of them if there is useful information that is missing. If I missed something or you disagree, the easiest way to reach me is on the Fediverse at: [email protected].

Why do I say this is a guide for developers? Because I haven't run this by a large group to fact-check, just developers moving from the US or Canada. So this is true of what we've experienced, but I have no idea how true it is for other professions. Most of it should be applicable to anyone, but maybe not, you have been warned etc etc.

Getting a Visa

The first part of the process is the most time consuming but not very difficult. You need to get a job from an employer looking to sponsor someone for a work visa. Since developers tend towards the higher end of the pay scale, you'll likely qualify for a specific (and easier) visa process.

In terms of job posting sites I like Work in Denmark and Job Index. I used Work in Denmark and it was great. If a job listing is in Danish, don't bother translating it and applying; it means they want a local. Danish CVs are similar to US resumes, but folks often include a photo in theirs. It's not a requirement, but I've seen it a fair amount when people are applying to jobs.

It can be a long time before you hear anything, which is just how it works. Even if you seem amazing for a job, expect to wait: my experience with US tech companies was often hearing back within a week for an interview, while with Denmark it's often 2-3 weeks just to get a rejection. You just gotta wait them out.

Where do you want to live

Short answer? As close to Copenhagen as possible. It's the capital city, it has the most resources by a lot and it is the easiest to adjust to IMO. I originally moved to Odense, the third largest city and found it far too small for me. I ended up passing time by hanging out in the Ikea food court because I ran out of things to do, which is as depressing as it sounds.

The biggest cities in Denmark are Copenhagen, Aarhus on the Jutland peninsula and Odense on the island of Fyn sitting between the two. Here's a map that shows you what I'm talking about.

A lot of jobs will contact you that are based in Jutland. I would think long and hard before committing to living in Jutland if you haven't spent a lot of time in Denmark. The further you get from Copenhagen, the more expectation there is that you are fluent in Danish. These cities are also painfully small by US standards.

  • Copenhagen: 1,153,615
  • Aarhus: 237,551
  • Odense: 145,931

Typically jobs in Jutland are more desperate for applicants and are easier to get for foreign workers. If you are looking for a smaller city or maybe even living out in the countryside (which is incredibly beautiful), it's a good option. Just be sure that's what you want to do. You'll want to enroll in Danish classes immediately, and at a faster pace, to get around and do things like "read menus" and "answer the phone".

There are perks to not living in Copenhagen. My wife got to ride horses once a week, which is something she did as a little kid and could do again for a very reasonable $50 a month. I enjoyed the long walks through empty nature around Fyn and the leisurely pace of life for a while. Just be sure, because these towns are very sleepy and can make you go a bit insane.

Interviews

Danish interviews are a bit different from US ones. Take home assignments and test projects are less common, with most companies comfortable assuming you aren't lying on your resume. They may ask for a GitHub handle just to see if you have anything up there. The pace is pretty relaxed compared to the US, don't expect a lot of live code challenges or random quizzes. You walk through the work you've done and they'll ask follow ups.

Even though the interviews are relaxed, they're surprisingly easy to fail. Danes really don't like bragging or "dominating" the conversation. Make sure you attribute victories to your team, you were part of a group that did all this great work. It's not cheap to move someone to Denmark, so try and express why you want to do it. A lot of foreign workers bounce off of Denmark when they move here, so you are trying to convince them you are worth the work.

After the interview you'll have....another interview. Then another interview. You'll be shocked how often people want to talk to you. This is part of the group consensus thing that is pretty important here. Jobs really want the whole team to be happy with a decision and get a chance to weigh in on who they work with. Managers and bosses have a lot less power than in the US and you see it from the very beginning of the interview.

Remember, keep it light, lots of self-deprecating humor. Danes love that stuff, poking fun at yourself or just injecting some laughter into the interview. They also love to hear how great Denmark is, so drop some of that in too. You'll feel a little weird celebrating their country in a job interview, but I've found it really creates positive feelings among the people you are talking to.

Don't answer the baby question. They can't directly ask you if you are gonna have kids, but often places bringing foreign workers over will dance around the question. "Oh it's just you and your partner? Denmark is a great place for kids." The right answer is no. I gave a sad no and stared off screen for a moment. I don't have any fertility issues, it just seemed an effective way to sell it.

Alright you got the job. Now we start the visa process for real. That was actually the easy part.

Sitting in VFS

This wasn't going to work. That was my thought as I sat in the waiting room of VFS Chicago, a visa application processing company. Think airport waiting area meets DMV. Basically for smaller countries it doesn't make sense for them to pay to staff places with employees to intake immigrants, so they outsource it to this depressing place. I was surrounded by families all carrying heavy binders and all I had was my tiny thin binder.

I watched in horror as a French immigration official told a woman "she was never getting to France" as a binder was closed with authority. Apparently the French staff their counter with actual French people who seem to take some joy in crushing dreams. This woman immediately started to cry and plead that she needed the visa, she had to get back. She had easily 200 pages of some sort of documentation. I looked on in horror as she collapsed sobbing into a seat.

On the flip side I had just watched a doctor get approved in three minutes. He walked in still wearing scrubs, said "I'm here to move to Sweden", they checked his medical credentials and stamped a big "APPROVED" on the document. If you or your spouse is a doctor or nurse, there's apparently nowhere in the EU that won't instantly give you a visa.

My process ended up fine, with some confusion over whether I was trying to move to the Netherlands or Denmark. "You don't want a Dutch visa, correct?" I was asked more than once. They took my photo and fingerprints and we moved on. Then I waited for a long time for a PDF saying "approved". I was a little bit sad they didn't mail me anything.

Work Visa Process

Just because it seems like nobody in either sphere understands how the other works

The specific visa we are trying to get is outlined here. This website is where you do everything. Danish immigration doesn't have phone support and nothing happens on paper. It's all through this website. Basically your employer fills out one part and you fill out the rest. It's pretty straightforward and the form is hard to mess up. But also, your workplace has probably done it before and can answer most questions.

This can be weird for Americans where we are still a paper-based society. Important things come with a piece of paper generally. When my daughter was born in a Danish hospital I freaked out because when it was time to discharge her they were like "ok time to go!". "Certainly there's a birth certificate or something that I get about her?" The nurse looked confused and then told me "the church handles all that sort of stuff." She was correct, the church (for some reason) is where we got the document that we provided to the US to get her citizenship.

Almost nothing you'll get in this entire process is on paper. It's all through websites and email. Once you get used to it, it's fine, but I have a natural aversion to important documents existing only on government websites where (in the US) they can disappear with no warning. I recommend backups of everything, even though it rarely comes up. The Danish systems mostly just work, or if they break they break for everyone.

IMPORTANT

There is a part of the process that they don't draw particular attention to. You need to get your biometrics taken, which means a photo and fingerprints. This process is a giant pain in the ass in the US. You have a very limited time window from when you submit the application to get your biometrics recorded, so check the appointment availability BEFORE you hit submit. The place that offers biometric intake is VFS. You have to get it done within 14 days of submitting and there are often no appointments.

Here are the documents you will need over and over:

  • full color copies of your passport including covers
  • the receipt from the application website showing you paid the fee. THIS IS INCREDIBLY IMPORTANT and the website does not tell you how important it is when you pay the fee. That ID number it generates is needed by everything.
  • Employment contract
  • Documentation of education. For me this included basically my resume and jobs I had done as a proxy for not having a computer science degree.

Make a binder and put all this stuff in, with multiple copies. It will save you a ton of work in the long-term. This binder is your life for this entire process. All hail the binder.

Alright, you've applied after checking for a biometrics appointment. You paid your fee, sat through the interviews, put in the application. Now you wait for an email. It can take a mysterious amount of time, but you just need to be patient. Hopefully you get the good-news email with your new CPR number. Congrats, you are in the Danish system.

Moving

Moving stuff to Denmark is a giant pain in the ass. There are a lot of international moving companies and I hear pretty universally bad things about all of them. You need to think of your possessions in terms of cargo containers: how many containers' worth of stuff do you currently have in your house, and how much of it can you get rid of? Our moving company advised us to try to fit within a 20-foot cargo container for the best pricing.

It's not a ton of space. We're talking 1,094 cubic feet.

You gotta get everything inside there and ideally you go way smaller. Moving prices can vary wildly between $1000 and $10,000 depending on how much junk you have. You cannot be sentimental here, you want to get rid of everything possible. Don't bring furniture, buy new stuff at Ikea. Forget bringing a car, the cost to register it in Denmark will be roughly what you paid for the car to begin with. Sell the car, sell the furniture, get rid of everything you can.

Check to see if anything with a plug will work. If your device shows a rating for a range of 110V-220V, then all you need is a plug adapter. If you only see a rating for 110V, then you need a transformer to step the electricity down from 220V to 110V. If you attempt to plug in your device without a transformer, bad things happen. I wouldn't bother bringing anything that won't work with 220V. Plug adapters are cheap, but transformers aren't.

Stuff you will want to stockpile

This is a pretty good idea of what American stuff you can get.

  • Over-the-counter medicine: it doesn't really exist here outside of Panodil
      • Pepto, aspirin, melatonin, cold and flu pills; buy a lot because you can't get more
  • Spices and sauces
      • Cream of tartar
      • Pumpkin pie spice
      • Meatloaf mix
      • Good chili spice mixes, or chili spices in general
      • Hot peppers: a variety of dried peppers, especially ones from Mexico, are almost impossible to find here
      • Everything bagel seasoning, I just love it
      • Ranch dressing
      • Hot sauces, they're terrible here
      • BBQ sauces, also terrible here
      • Liquid butter for popcorn if that's your thing
      • Taco mix, it's way worse here
  • Foods
      • Cheez-Its and Goldfish crackers don't exist
      • Gatorade powder (you can buy it per bottle but it's expensive)
      • Tex-Mex anything, really Mexican food in general
      • Cereal, American sugar cereal doesn't exist
      • Cooler Ranch Doritos
      • Mac and cheese
      • Good dill pickles (Danish pickles are sweet and gross)
      • Peanut butter, it's here but it's expensive

You are going to get used to Danish food, I promise, but it's painfully bland at first. There's a transition period and spices can help get you over the hurdle.

Note: If you eat a lot of peppers like jalapeños, it is too expensive to buy them every time. You will want to grow them in your house. This is common among American expats, but be aware if you are used to them being everywhere and cheap.

Medical Records
When you get your yellow card (your health insurance card), you are also assigned a doctor. In order to get your medical records into the Danish system, you need to bring them with you. If you don't have a complicated medical history I think it's fine to skip this step (they'll ask you all the standard questions), but if you have a more complicated health issue you'll want those documents with you. The lead time to get a GP appointment here in Denmark isn't long, typically same week for kids and two weeks for adults.

Different people have different experiences with the health system in Denmark, but I want to give you a few high-level notes. Typically Danes get a lot less medication than Americans, so don't expect to walk out of the doctor's office with a prescription. There is a small fee for medicine, but it's a small fraction of what it costs with insurance in the US. Birth control pills, IUDs and other resources are easy to get and quite affordable (or free).

If you need a specific medication for a disease, try to get as much as you can from the US doctor. The process for getting specific medicine can sometimes be complicated in Denmark, possibly requiring a referral to a specialist and additional testing. You'll want to allocate some time between when you arrive and when you can get a new script. Generally it works, but it might take a while.

Landing

The pets and I waiting for the bus with my stolen luggage cart

My first week was one of the harder weeks I've had in my life. I landed and then immediately had to take off to go grab the dog and cat. The plan was simple: the pets had been flown on a better airline than me. I would grab them and then take the train from the airport to Odense. It's like an hour and a half train ride. Should be simple. I am all jitters when I land but I find the warehouse where the pets were unloaded.

Outside are hundreds of truck drivers and I realize I have made a critical error. People had told me over and over that I didn't need to rent a car, which might have been true if I didn't have pets. But the distance between the warehouse and where I needed to be was too far to walk with animals in crates. The truck drivers are sitting around laughing and drinking energy drinks while I wander around waiting for the warehouse to let me in.

I decide to steal an abandoned luggage cart outside of the UPS building. "I'm bringing it closer to where it should be anyway" is my logic. The drivers find this quite funny, with many jokes being made at my expense. Typically I'd chalk this up to paranoia but they are pointing and laughing at me.  I get the dog and cat, they're not in great shape but they're alive. I give them some water and take off for the bus to the airport.

Loading two crated animals onto a city bus isn't easy in the best of times. Doing it while the cat-pee smell coming out of one crate is strong enough to make your eyes water is another thing entirely. I have taken over the middle of this bus and people are waving their hands in front of their faces because of the smell. After loading everyone on, I check Google Maps again and feel great. This bus is going to turn around but will take me back to the front of the airport, where I want to go.

It does not do that. Instead it takes off to the countryside. After ten minutes of watching the airport disappear far into the background, I get off at the next stop. In front of a long line of older people (tourists?) I get the dog out of the box, throw the giant kennel into a dumpster, zip tie the cat kennel to the top of my suitcase and start off again.

We make it to the train, where a conductor is visibly disgusted by the smell. I sit next to the bathroom hoping the smell of a public train bathroom will cover it. I attempt to grab a taxi to take me to where I am staying to get set up. No go, there are no taxis. I had not planned for there to be no taxis. On the train I had swapped out the cat pad so the smell was not nearly so intense, but it still wasn't great.

I then walked the kilometers from the train station to where I was staying, sweating the entire time. The dog was desperately trying to escape after the trauma of flying and staying in the SAS animal holding area with race horses and other exotic animals. There were giant slugs on the ground everywhere, something I have since learned is just a Thing in Denmark. We eventually get there and I collapse onto the unmade bed.

What I have with me is what I'm going to need to get set up. There is a multi-month delay between when you land and when your stuff gets there, so for a long time you are starting completely fresh. The next day I start the millions of appointments you need to get set up.

Week 1

Alright you've landed, your stuff is on a boat on its way to you. Typically jobs will either put you up in corporate housing to let you find an apartment or they'll stick you in a hotel. You are gonna be overwhelmed at first, so try to take care of the basics. There is a great outline of all the steps here.

It is a pretty extreme culture shock at first. My first night in Denmark was a disaster. I didn't realize you had to buy the shopping bags and just stole a few by accident. So basically within 24 hours of landing I was already committing crimes. My first meal included a non-alcoholic beer because I assumed Carlsberg Nordic meant "lots of booze", not "no booze".

When you wake up, have a plan for what you need to get done that day. It's really tiring, you are gonna be jet-lagged, you aren't used to biking, so don't beat yourself up if you only get that one thing done. But you are time-limited here, so it's important to hit these milestones quickly. You are also going to burn through kind of a lot of cash to get set up. You'll make it up over time, but be aware.

Get a phone plan

You can bring a cellphone from the US and have it work here. Cellphone plans are quite cheap, with a pay-as-you-go SIM available for 99 DKK a month with 100 GB of data and 100 hours of talk time. You can get that deal here. If you require an eSIM, I recommend 3, although it is a bit more. They are here.

Find an apartment
The gold standard for apartment hunting is BoligPortal here. Findboliger was also OK but has a much smaller amount of inventory. You can get a list of all the good websites here.

These services cost you money. I'm not exactly sure why (presumably because they can, so why not). Just remember to cancel once you find the apartment.

Some tips for apartment hunting

  • Moving into an apartment in Denmark can be jaw-droppingly expensive. Landlords are allowed to ask for up to 3 months of rent as a deposit AND 3 months of prepaid rent before you move in. You may have to pay 6 months of rent before you get a single paycheck from your new job.
  • You aren't going to get back all of that deposit. Danish landlord companies are incredibly predatory about this. They will act quite casual when you move in, but when you move out they will inspect everything for an hour plus. You need to document all damage when you move in, same as in the US. But mentally you should write off half that deposit.
  • After you have moved in, you have 14 days to fill out a list of defects and send it to your landlord.
  • Don't pay rent in cash. If the landlord says pay in cash it's a scam. Move on.
  • See if you have separate meters in your apartment for water/electric. You want this ideally.
  • Fiber internet is surprisingly common in Denmark. In general they have awesome internet. If this is a priority ask the apartment folks about it. Even if the building you are looking at doesn't have it, chances are there is a building they manage that does.
This doesn't have anything to do with this, I just love this picture

Appliances
Danish washers and dryers are great. Their refrigerators suck so goddamn hard. They're small, a pool of water often forms at the bottom for some reason, the seal needs to be reglued from time to time, and stuff freezes if it's anywhere near the back wall. I've never seen a good fridge after three tries, so just expect it to be crap.

All the normal kitchen appliances are here, but there are distinct tiers of fancy. Grocery stores like Netto often have cheap appliances like toasters, Ikea sells some, but stay away from the electronics stores like Power unless you know you want a fancy one. Amazon Germany will ship to Denmark and that's where I got my vacuum and a few other small items.

Due to the cost of eating out in Denmark you are going to be cooking a lot. So get whatever you need to make that process less painful. Here's what I found to be great:

  • Instant Pot: slow cooker and a rice cooker
  • Salad washer: their lettuce is very dirty
  • Hand blender: if you wanna do soups
  • Microwave: I got the cheapest I could find, weirdly no digital controls just a knob you turn. Not sure why
  • Coffee bean grinder: Pre-ground coffee is always bad, Danish stuff is nightmarish bad
  • Hot water kettle: just get one you'll use it all the time
  • Drip coffee maker: again surprisingly hard to find. Amazon Germany for me.
  • Vacuum

Kitchen Tools

  • Almost all stove-tops are induction so expect to have to buy new pots and pans, don't bring non-induction ones from the US
  • Counter space is limited and there is not a ton of kitchen storage in your average Danish apartment so think carefully about anything you might not need or use on a regular basis
  • Magasin will sell you any exotic tools you might want or need and there are plenty of specialist cooking stores around town

Go visit ICS
You can make an appointment here.

They will get you set up with MitID, the digital ID service. This is what you use to log into your bank account, government websites, the works. They'll also get you your yellow card as well as sign you up for your doctor. The process is pretty painless.

Bank

  • pick whichever you want, bring your US passport, Danish yellow card and employment contract
  • it takes forever, so also maybe a book
  • they'll walk you through what you need there, but it's pretty straightforward
  • credit card rewards don't exist in Denmark and you don't really need a credit card for anything

If the bank person tells you they need to contact the US, ask to speak to someone else. I'm not sure why some Danish bank employees think this, but there is nobody at the US Department of Treasury they can speak to. It was a bizarre roadblock that left me trying to hunt down who they would be talking to at a giant federal organization. In the end another clerk explained she was wrong and just set me up, but I've heard this issue from other Americans so be aware.

I did enjoy how the woman was like "I'll just call the US" and I thought I am truly baffled at who she might be calling.

First night

Moving In

  • Danish apartments don't come with light fixtures installed. This means your first night is gonna be pretty dark if you aren't prepared. Trust me, I know from having spent my first night sleeping on the floor in the dark because I assumed I would have lights to unpack stuff. You are gonna see these on the wall:

Here's the process to install a light fixture:

  1. Turn off the power
  2. Pop the inner plastic part out with a screwdriver
  3. Put the wire from the light fixture through the hole
  4. Strip the cables from the light fixture like 4 cm
  5. Insert the two leads of your lamp into the N and M1 terminals
  6. If colored, the blue wire goes into N and the brown wire into M1
  7. If not colored it shouldn't matter

Here is a video that walks you through it.

You are gonna wanna do this while the sun is out for obvious reasons so plan ahead.

Buying a Bike

See me wearing jeans? Like a fucking idiot?

Your bike in Denmark is going to be your primary form of transportation. You ride it, rain or shine, everywhere. You'll haul groceries on it, carry Ikea stuff home on it, this thing is going to be a giant part of your life. Buying one is....tricky. You want something like this:

Here's the stuff you want:

  • Mudguards. It rains a lot in Denmark.
  • Kevlar tires. Your bike tires will get popped at the worst possible moments, typically during a massive downpour.
  • Basket. You want a basket on the front and you want them to put it on. Sometimes men get weird about this but this isn't the time for that. Just get the basket.
  • Cargo rack on the back.
  • Wheel lock, the weird circular lock on the back wheel. It's what keeps people from stealing it (kinda). You also need a chain if the bike is new.
  • Lights, ideally permanently mounted lights. They're a legal requirement here for bikes and police do give tickets.
  • If you haven't changed a tube on a bike in a while, practice it. You'll have to do it on the road sometime.
  • Get a road tool kit.
  • Get a flashlight in this tool kit, because the sun sets early in the winter in Denmark and hell is trying to swap a tube in the dark by the light of a cellphone while it's raining.
  • If you can get disc brakes, they're less work and last longer
  • Minimum three gears, five if you can.
  • Denmark always has a bike lane. Use it; never ride in the car lanes with traffic.
You need all that

It doesn't have to be that one but it should have everything that one does plus a flashlight

Bike Ownership

  • home insurance mostly covers your bike, but make sure you have that option (and get home insurance)
  • write down the frame number off the bike, it's also on the receipt. You need it for insurance claims
  • You should lubricate the chain every week with daily use and clean the chain at least once a month. A lot of people don't and end up with very broken bikes
  • Danes use hand signals to indicate turns and stops.

You are expected to use these every time.

  • Danes are very serious about biking. You need to treat it like driving a car. Stay to the right unless you are passing, don't ride together blocking people from passing, move out of the way of people who ring their bells.
  • Never ever walk in a bike lane
  • Wear a helmet
  • Buy rain gear. It rained every morning on my way to work for a month when I first moved here. I got hit in the eye with hail and fell off the bike. You need gear.

Rain Gear

Rain jackets: Regnjakker
Best stuff is: https://www.hellyhansen.com/en_dk/ or McKinley on a budget.

Rain pants: regnbukser
I love the Patagonia rain pants cause they're not just hot rubber pants. Get some with air slots if you can.

You can grab a full set here if you don't want to mix and match: https://www.spejdersport.dk/asivik-rain-regnsaet-dame

Rain boots:
Tretorn is the brand to beat. You can grab that here: https://www.tretorn.dk/ They also sell all the gear you need.

Backpack:
Get a rain cover for the backpack and also get a waterproof backpack. I'm not kidding when I say it rains a lot. Rain covers are everywhere and I used a shopping bag for two months when I kept forgetting mine.

Alright you got your apartment, yellow card, bank account, bike and rain gear. You are ready to start going to work. Get ready for Danish work culture, which is pretty different from US work culture.

Work

Danish work can be a rough adjustment for someone growing up in the American style of work. I'll try to guide you through it. Danes have to work 37 hours a week, but in practice this can be a bit flexible. You'll want to be there at 9 your first day but don't be shocked if you are pretty alone when you get there. Danes often get to work a little later.

You'll want to join your union. You aren't eligible for the unemployment payouts since you are here on a work visa, but the union is still the best place to turn to in Denmark to get information about whether something is allowed or not. They're easy to talk to, with my union I submit an email and get a call the next day. They are also the ones who track what salaries are across the industry and whether you are underpaid. This is critical to salary negotiation and can be an immense amount of leverage when sitting down with your boss or employer.

Just another day biking to work

Seriously, join a union

If you get fired in Denmark, you have the right to get your union in there to negotiate the best possible exit package for you. I have heard a lot of horror stories from foreigners moving to Denmark about not getting paid, about being lied to about what to do if they get hurt on the job, the list goes on and on. This is the group that can help you figure out what is and isn't allowed. They're a bargain at twice the price.

Schedules tend to be pretty relaxed in Denmark as long as you are hitting around that 37. It's socially acceptable to take an hour to run an appointment or take care of something. Lunches are typically short, like 30 minutes, with most workplaces providing food you pay for in a canteen. It's cheaper than bringing lunch and usually pretty good. A lot of Danes are vegetarian or vegan so that shouldn't be a problem.

Titles don't mean anything

This can be tricky for Americans who see "CTO" or "principal engineer" and act really deferential. Danes will give (sometimes harsh) feedback to management pretty often. This is culturally acceptable where management isn't really "above" anyone, it's just another role. You really want to avoid making decisions that impact other people without their approval, or at least the opportunity to give that approval, even in high management positions.

Danish work isn't the same level of competitive as US/China/India

As an American, if you want a high-paying job you need a combination of luck, family background and basically winning a series of increasingly tight competitions. You need to do well in high school and standardized tests to get into an ok university where you need to major in the right thing to make enough money to pay back the money you borrowed to go to the university. You need a job that offers good enough health insurance that you don't declare bankruptcy with every medical issue you encounter.

US Tech interviews are grueling, multi-day affairs involving a phone screen, take home, on-site personality and practical exam AND the job can fire you at any second with zero warning. You have to be consistently providing value on a project the executive level cares about. So it's not even enough to be doing a good job, you have to do a good job on whatever hobby project is hot that quarter.

Danes don't live in that universe. They are competitive people when it comes to sports or certain schools, but they don't have the "if I fail I'm going to be in serious physical distress" mentality. So things like job titles, which to Americans are "how I tell you how important I am", mean nothing here. Don't try to impress with a long list of your previous titles, just be like "I worked a bunch of places and here's what I did". Always shoot for casual, not panicked and intense.

Cultural Norms

Dress is pretty casual. I've never seen people working in suits and ties outside of a bank or government office. There isn't AC in most places, so dress in the summer for comfort. Typically once a week someone brings in cake and there are beers or sodas provided by the workplace. Friday beer is actually kind of important and you don't want to always skip it. It's one of the big bonding opportunities in Denmark among coworkers.

Many things considered taboo in American workplaces are fine here. You are free to discuss salary and people often will. You are encouraged to join a union, which I did and found to be worthwhile. They'll help with any dispute or provide you with advice if you aren't sure if something is allowed. Saying you need to leave early is totally fine. Coffee and tea are always free but soda isn't and it's not really encouraged at any workplace I've been at in Denmark to consume soda every day.

There are requirements around desk ergonomics which means you can ask for things like a standing desk, ergonomic mouse and keyboard, standing pad, etc. Often workplaces will bring in someone to assess desks and provide recommendations, which can be useful. If you need something ask for it. Typically places will provide it without too much hassle.

Working Late/On-Call

It happens, but a lot less. Typically if you work after-hours or late, you would be expected to get that time back later on by leaving early or coming in late. The 37 hours is all hours worked. The expectations around on-call are a bit mixed and, as far as I know, aren't defined in any formal rules. Just be aware that your boss shouldn't be asking you to work late, and unlike the US, being on salary doesn't mean you can be asked to work unlimited hours in a week.

Vacation

Danish summer isn't bad

Danish vacation is mostly awesome. Here's the part that kinda stinks. Some jobs will ask that you use a big chunk of your vacation over a summer holiday, which is two or three weeks when the office is closed, somewhere between May 1 and September 30. Your boss can require that you use your vacation during this period, which is a disaster for foreigners: you don't have anywhere to go, everything in Denmark is already booked during the summer vacation, and everything travel-related is more expensive.

Plus you'll probably want to spend more of that vacation back home with family. So try to find a job that doesn't mandate when you use your vacation. Otherwise you'll be stuck either flying out at higher prices or doing a lame staycation in your apartment while everyone else flees to their summer houses in Jutland.

Conclusion

Is it worth it? I think so. You'll feel the reduction in stress within six months. For the first time maybe in your entire adult life, you'll have time to explore new hobbies. Wanna try basketweaving or kayaking or horseback riding? There's a club for that. You'll also have the time to try those things. It sounds silly but the ability to just relax during your off-time and not have to do something related to tech at all has had a profound impact on my stress levels.

Some weeks are easier than others. You'll miss home. It'll be sad. But you can push through and adapt if you want to. If I missed something or you need more information, please reach out at [email protected] on the Fediverse. Good luck!


Monitoring is a Pain

And we're all doing it wrong (including me)

I have a confession. Despite having been hired multiple times in part due to my experience with monitoring platforms, I have come to hate monitoring. Monitoring and observability tools commit the cardinal sin of tricking people into thinking this is an easy problem. It is very simple to monitor a small application or service. Almost none of those approaches scale.

Instead monitoring becomes an endless series of small failures. Metrics disappear for a while, logs get dropped for a few hours, the web UI for traces doesn't work anymore. You set up these tools with a "set and forget" mentality, but they actually require ever-increasing amounts of maintenance. Some of the tools break and are never fixed. The number of times I've joined a company to find an unloved, broken Jaeger deployment is far too high.

It feels like we have more tools than ever to throw at monitoring, but we're not making progress. Instead the focus seems to be on getting applications to emit ever more telemetry, which increases the revenue of the companies selling the monitoring. Very little seems to be happening around the idea of transmitting fewer logs and metrics over the wire from the client. I'm running more complicated stacks to capture massive amounts of data in order to use less and less of it.

Here are the best suggestions I have along with my hopes and dreams. I encourage you to tell me I'm wrong and there are better solutions. It would (actually) make my life much easier so feel free: https://c.im/@matdevdug

Logs

They seem like a good idea, right? Small little notes you leave for future you, letting you know what is going on. Logs begin, in my experience, as basically "print statements stored to disk". This quickly stops scaling as disk space gets consumed storing useless information that served a function during testing but that nobody cares about now. "Let's use log levels." Alright, now we're off to the confusing Olympics.

1. Log Levels Don't Mean Anything

[Level tables: Syslog levels, Python levels, Golang levels]
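
To make the mismatch concrete, here's a small sketch of the same idea in code. The syslog severities come from RFC 5424, the numeric values are Python's logging module, and Go's standard log package has no levels at all (log/slog adds just four):

    import logging

    # RFC 5424 syslog severities: lower number = more severe.
    SYSLOG = {0: "emerg", 1: "alert", 2: "crit", 3: "err",
              4: "warning", 5: "notice", 6: "info", 7: "debug"}

    # Python's logging module: higher number = more severe, and there is
    # no "notice", "alert" or "emergency" level at all.
    PYTHON = {50: "CRITICAL", 40: "ERROR", 30: "WARNING", 20: "INFO", 10: "DEBUG"}

    # Go's log/slog defines only Debug (-4), Info (0), Warn (4) and Error (8);
    # the classic log package has no levels.

    if __name__ == "__main__":
        # The "same" warning is severity 4 in syslog, 30 in Python and 4 in slog,
        # which is why cross-language log pipelines end up with mapping tables.
        logging.basicConfig(level=logging.INFO)
        logging.warning("disk usage high")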

2. Log formats are all over the place

  • JSON logging - easy to parse, but nested JSON can break parsers and the format is easy to change by developers
  • Windows event log - tons of data, unclear from docs how much of a "standard" it is
  • Common Event Format - good spec (you can read it here) but I've never seen anyone use it outside of network hardware companies.
  • GELF - a really good format designed to work nicely with UDP for logging (which is a requirement of some large companies) that I've never heard of before writing this. You can see it here.
  • Common Log Format - basically Apache logs (parsed in the sketch after this list): 127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
  • Nginx Log Format - log_format combined '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_user_agent"';
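
Since the Common Log Format line above is the kind of thing you inevitably end up parsing by hand, here's a rough sketch of what that looks like. The regex and field names are my own illustration rather than any standard library, and the whole thing breaks the moment the format shifts:

    import re

    # One regex per log format, and it breaks when someone "improves" the format.
    CLF = re.compile(
        r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
        r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
        r'(?P<status>\d{3}) (?P<size>\d+|-)'
    )

    line = ('127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326')

    match = CLF.match(line)
    if match:
        entry = match.groupdict()
        print(entry["status"], entry["request"])  # -> 200 GET /apache_pb.gif HTTP/1.0
    else:
        # This branch is where unparsed lines silently pile up in real pipelines.
        print("unparseable line:", line)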

No consensus on what logs are even for

Traditionally the idea is that you use debug locally or in your dev environment, info is mostly thrown away (you probably don't need to know in intense detail when an application has done something normal) and you keep everything above info. The problem is that, especially with modern microservices and distributed requests, logging is often the only place you can say with any degree of confidence "we know everything that happened inside of the system".

What often happens is that someone will attach some ID header to the request, like a UUID. Then this UUID is returned back to the end consumer and this is how customer service can look through requests and determine "what happened to this customer at this time". So suddenly the logging platform becomes much more than capturing print statements that happen when stuff crashes, it's the primary tool that people use to debug any problems inside of the platform.
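
Here's a minimal sketch of that pattern using only the standard library: reuse or mint a request ID, emit every log line as flat JSON carrying it, and hand the same ID back to the caller. The header name and JSON field names are just illustrative choices, not any standard:

    import json
    import logging
    import uuid

    class JsonFormatter(logging.Formatter):
        """Emit flat JSON so the log pipeline can index request_id directly."""
        def format(self, record):
            return json.dumps({
                "level": record.levelname,
                "message": record.getMessage(),
                "request_id": getattr(record, "request_id", None),
            })

    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("api")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    def handle_request(headers: dict) -> dict:
        # Reuse the caller's ID if present, otherwise mint one.
        request_id = headers.get("X-Request-ID", str(uuid.uuid4()))
        logger.info("charging card", extra={"request_id": request_id})
        # Returning the ID is what lets customer service search for it later.
        return {"status": "ok", "request_id": request_id}

    print(handle_request({}))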

That pattern impacts customer service (what customer did what, when), and it matters for auditing requirements that mandate you keep a record of every interaction. So soon the simple requirement of "please capture and send everything above info" turns into a MUCH bigger project, where the log search and capture infrastructure is super mission critical. It's the only way you can work backwards to what specifically happened with any individual user or interaction. Soon this feeds into business analytics, where logs become the source of truth for how many requests you got, whether a new customer is using the platform, etc.

Suddenly your very simple syslog setup isn't sufficient to do this, because you cannot have someone SSHing into a box to run a query for customer service. You need some sort of user-friendly interface. Maybe you start with an ELK stack, but running Elasticsearch is actually a giant pain in the ass. You try out SigNoz and that works, but now it's a new mission-critical piece of infrastructure that often kinda gets thrown out there.

Chances are this isn't someone's full-time job in your org, they just happened to pick up logging. It's not supposed to be a full-time gig, so I totally get it. They installed a few Helm charts, put it behind an OAuth proxy and basically hoped for the best. Instead they get a constant flood of complaints from consumers of the logging system: "Logs are missing, the search doesn't work, my parser doesn't return what I expect".

Logs start to serve as business intelligence source of truth, customer service tool, primary debugging tool, the way you know deploys worked, etc. I've seen this pattern at several jobs, and often the fragility of this approach is met with a "well, it's worked pretty well up to this point".

Not me, I just put it in cloud/SaaS/object storage.

Great, but since you need every log line your costs grow with every incoming customer. That sucks for a ton of reasons, but if your applications are chatty or you just have a lot of requests in a day, it can actually become a serious problem. My experience is companies do not anticipate that the cost of monitoring an application can easily exceed the cost of hosting the application even for simple applications.

Logging always ends up the same way. You eventually either add some sort of not-log system for the user requests you actually care about, stick with the SaaS and aggressively monitor usage while hoping for the best, and/or maintain a full end-to-end logging infrastructure that writes everything out to disks you manage.

Logs make sense as a concept, but they don't work as an actual tool unless you are willing to commit real engineering time every cycle to keeping the logging functional OR you are willing to throw a lot of cash at a provider. On top of that, soon you'll have people writing log parsers to alert on certain situations, which seems fine, but then the logs become even MORE critical and now you need to enforce logging structure standards or convert old log formats to the new one.

The other problem is that logs are such a stupid thing to have to store. 99.9999% of them are never useful, the ones that are look exactly like the rest, and at some point you end up sticking them in object storage forever, where no human being will ever interact with them until the end of time. The number of times I've written some variation on a "take terabytes of logs nobody has ever looked at from A and move them to B" script is too high. Even worse, the cost of tools like Athena to run a query against a massive bucket means this isn't something where you want developers splunking around looking for info.

Suggestions

  • If log messages are the primary way you monitor the entirety of a microservice-based system, you need to sit down and really think that through. What does it cost, how often does it have problems, can you scale it? Can you go without logs being stored?
  • When you have a log that must be stored for compliance or legal reasons, don't stick it into the same system you use to store every 200 OK line. Write it to a database (ideally) or an object store outside of the logging pipeline. I've used DynamoDB for this and had it work pretty well by sticking it in an SQS pipeline -> Lambda -> Dynamo (a sketch of that handler follows this list). Then your internal application can query this and you don't need to worry about log expiration thanks to DynamoDB TTL.
  • If you aren't going to make logging a priority (which I totally respect), then you need to set and enforce a low SLA. An SLA of 99% is about 7 hours and 14 minutes of downtime a month. This is primarily a management problem, but it means you need to let the system experience problems to break people of the habit of treating it as an infinitely reliable source of truth.
  • Your org needs a higher SLA than that? Pay a SaaS and calculate that into the cost of running the app. It's important to set billing labels with an external SaaS on as per-app a basis as possible. You need to be able to go back to teams and say "your application is costing us too much in observability", not "the business as a whole is spending a lot on observability".
  • Sampling is your friend. OpenTelemetry supports log sampling as an alpha feature here. It supports sampling based on priority, which to me is key. You want some percentage of lower-priority logs, but ideally as services mature you can continue to tune that down.
  • If you have to write a bunch of regex to parse it, start praying to whatever gods you believe in that it's a stable format.
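
For the compliance-log suggestion above, here's a rough sketch of the Lambda end of that SQS -> Lambda -> DynamoDB pipeline. The table name, key fields and retention period are made up for illustration; the only real requirement is a TTL attribute enabled on the table so DynamoDB expires records for you:

    import json
    import time

    import boto3

    dynamodb = boto3.resource("dynamodb")
    # Hypothetical table with TTL enabled on the "expires_at" attribute.
    table = dynamodb.Table("compliance-logs")

    RETENTION_SECONDS = 7 * 365 * 24 * 3600  # e.g. keep for roughly seven years

    def handler(event, context):
        """Triggered by SQS: persist each compliance-relevant record."""
        for record in event["Records"]:
            body = json.loads(record["body"])
            table.put_item(Item={
                # Keys are illustrative; pick whatever your internal app
                # will actually query by (customer, request ID, ...).
                "customer_id": body["customer_id"],
                "timestamp": body["timestamp"],
                "payload": json.dumps(body),
                # DynamoDB TTL deletes the item after this epoch second.
                "expires_at": int(time.time()) + RETENTION_SECONDS,
            })
        return {"stored": len(event["Records"])}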

Hopes and Dreams

  • Schema validation as a component of collectors for JSON logs (a sketch of what I mean follows this list). It seems weird that I can't really do this already, but it should be possible to globally enforce whether logs are ingested into my system by ensuring they follow an org schema. It'd be great to enforce it in the dev environment so people immediately see "hey, logs don't show up".
  • Sampled logs being more of a thing. My dream would be to tie them to deployments, so I crank the retention to 100% before I deploy, as I deploy and then for some period of time after I deploy. The collector makes an API call to see what the normal failure rate for this application is (how many 2xx, 4xx, 5xx) and then, if the application sticks with that breakdown, increases the sampling.
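
And a minimal sketch of the schema-enforcement wish, using the jsonschema package. The schema fields and the drop-versus-ingest behavior are my assumptions about how I'd want a collector to behave, not a description of any existing collector feature:

    import json

    from jsonschema import Draft7Validator

    # Hypothetical org-wide schema: every line must be flat JSON with these fields.
    ORG_LOG_SCHEMA = {
        "type": "object",
        "required": ["level", "message", "service", "request_id"],
        "properties": {
            "level": {"enum": ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]},
            "message": {"type": "string"},
            "service": {"type": "string"},
            "request_id": {"type": "string"},
        },
    }

    validator = Draft7Validator(ORG_LOG_SCHEMA)

    def ingest(raw_line: str) -> bool:
        """Return True to accept the line, False to reject it at the collector."""
        try:
            doc = json.loads(raw_line)
        except json.JSONDecodeError:
            return False
        errors = list(validator.iter_errors(doc))
        if errors:
            # In dev you'd surface this loudly so people notice immediately.
            print("rejected:", errors[0].message)
            return False
        return True

    print(ingest('{"level": "INFO", "message": "ok", "service": "api", "request_id": "abc"}'))  # True
    print(ingest('{"msg": "oops"}'))  # False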

I love what GCP does here for flow logs:

Even though Google Cloud doesn't capture every packet, log record captures can be quite large. You can balance your traffic visibility and storage cost needs by adjusting the following aspects of logs collection:

  • Aggregation interval: Sampled packets for a time interval are aggregated into a single log entry. This time interval can be 5 seconds (default), 30 seconds, 1 minute, 5 minutes, 10 minutes, or 15 minutes.
  • Sample rate: Before being written to Logging, the number of logs can be sampled to reduce their number. By default, the log entry volume is scaled by 0.5 (50%), which means that half of entries are kept. You can set this from 1.0 (100%, all log entries are kept) to 0.0 (0%, no logs are kept).
  • Metadata annotations: By default, flow log entries are annotated with metadata information, such as the names of the source and destination VMs or the geographic region of external sources and destinations. Metadata annotations can be turned off, or you can specify only certain annotations, to save storage space.
  • Filtering: By default, logs are generated for every flow in the subnet. You can set filters so that only logs that match certain criteria are generated.

I want that for everything all the time.

Metrics

Alright logs are crap and the signal to noise ratio is all off. We're gonna use metrics instead. Great! Metrics begin as super easy. Adding Prometheus-compatible metrics to applications is simple with one of the client libraries. You ensure that Prometheus grabs those metrics, typically with some k8s DNS regex or internal zone DNS work. Finally you slap Grafana in front of Prometheus, adding in Login with Google and you are good to go.
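
For reference, this is roughly what that "simple" starting point looks like with the official Python client library, prometheus_client. The metric names and port are arbitrary examples:

    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    # Arbitrary example metrics; Prometheus scrapes them from /metrics.
    REQUESTS = Counter("app_requests_total", "Total requests handled", ["status"])
    LATENCY = Histogram("app_request_duration_seconds", "Request latency in seconds")

    @LATENCY.time()
    def handle_request():
        time.sleep(random.uniform(0.01, 0.1))
        status = "200" if random.random() > 0.05 else "500"
        REQUESTS.labels(status=status).inc()

    if __name__ == "__main__":
        start_http_server(8000)  # exposes http://localhost:8000/metrics
        while True:
            handle_request()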

Except you aren't actually good to go, right? Prometheus is really designed to run on one server. You can scale vertically as you add more metrics and targets, but there's a finite cap on how big you can grow. Plus, when there is a Prometheus problem, you lose visibility into your entire stack at once. Then you need to start designing for federation. This is where people panic and start to talk about paying someone to do it.

Three Scaling Options

You can either:
1. Adopt hierarchical federation, where a higher-level Prometheus server scrapes aggregated metrics from the lower-level servers. It looks like this:

The complexity jump here cannot be overstated. You go from "store everything and let god figure it out" to needing to understand which metrics matter and which matter less, how to do aggregations inside of Prometheus, and how to add out-of-band monitoring for all these new services. I've done it, it's doable, but it is a pain in the ass.

2. Cross-service federation, which is less complicated to set up but has its own weirdness. Basically it's normal Prometheus servers at the bottom, lower-cardinality Prometheus servers reading from them, and you point everything at a "primary" node, for lack of a better term.

This design works, but it uses a lot of disk space and you still have the same monitoring problems as before. Plus, again, it's a big leap in complexity (though in practice I find managing this level of complexity, even solo, to be doable).
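
Both options are built on the same mechanism: the "higher" Prometheus scrapes the /federate endpoint of the servers below it. A minimal sketch of that scrape job, assuming you only federate pre-aggregated recording rules; the job name and hostnames are placeholders:

# prometheus.yml on the "global" server
scrape_configs:
  - job_name: federate
    honor_labels: true
    metrics_path: /federate
    params:
      "match[]":
        - '{__name__=~"job:.*"}'   # only pull pre-aggregated series
    static_configs:
      - targets:
          - prometheus-leaf-1:9090   # placeholder leaf servers
          - prometheus-leaf-2:9090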

I need an actual end-to-end metrics and alerting platform

Alright, so my examples work fine for short-term metrics. You can scale to basically "the disk size of a machine", which in practice in the cloud is probably fine. However, all of this has been about metrics as a tool for developers. Similar to logs, as metrics get more useful they attract interest from outside the scope of just debugging applications.

You can now track all sorts of things across the stack and compare things like "how successful was a marketing campaign". "Hey we need to know if Big Customer suddenly gets 5xxs on their API integration so we can tell their account manager." "Can you tell us if a customer stops using the platform so we know to reach out to them with a discount code?" These are all requests I've gotten and so many more, at multiple jobs.

I need A Lot of Metrics Forever

So as time goes on, the retention people want for metrics will inevitably increase, as will the cardinality. They want more specific information about not just services but, in many cases, customers or specific routes. They'll also want to alert on those routes, store them for (maybe) forever, and do all sorts of upstream things with the metrics.

This is where it starts to get Very Complicated.

Cortex

Cortex is a push service: you push metrics to it from your Prometheus servers and it takes over from there. There are some really nice features in Cortex, including deduplicating incoming samples from redundant Prometheus servers, so you can stand up a pair of identical Prometheus servers, point them at Cortex, and store only one copy of each sample. For this to work, though, you need to add a key-value store, so that's another thing on the list of services you are now running.
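
The push itself is just Prometheus remote write. A minimal sketch of what each redundant Prometheus server ships with, assuming Cortex's default HA-deduplication labels; the endpoint and label values are placeholders:

# Added to each (redundant) Prometheus server's prometheus.yml
global:
  external_labels:
    cluster: prod            # HA pair identifier Cortex dedupes on
    __replica__: replica-1   # unique per replica; dropped after deduplication
remote_write:
  - url: http://cortex-distributor:9009/api/v1/push   # placeholder Cortex endpoint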

I've used Cortex once; it's very good, but it is a lot of work to run. Between the Prometheus servers you are already running, managing Cortex itself, writing the configs and monitoring everything, it has reached Big Project status. You probably want it running in its own k8s cluster or server group.

Thanos

Similar goals to Cortex, different design. It's a sidecar process that ingests the metrics and moves them around using (to me) a simpler, more modular system. I've only just started to use Thanos but have found it to be a pretty straightforward system. However, it's still a lot to add on top of what started as a pretty simple problem. Of the two, I'd recommend Thanos just based on ease of getting started. Here are the services you are adding (a sample object-storage config follows the list):

  • Sidecar: connects to Prometheus, reads its data for query and/or uploads it to cloud storage.
  • Store Gateway: serves metrics inside of a cloud storage bucket.
  • Compactor: compacts, downsamples and applies retention on the data stored in the cloud storage bucket.
  • Receiver: receives data from Prometheus’s remote write write-ahead log, exposes it, and/or uploads it to cloud storage.
  • Ruler/Rule: evaluates recording and alerting rules against data in Thanos for exposition and/or upload.
  • Querier/Query: implements Prometheus’s v1 API to aggregate data from the underlying components.
  • Query Frontend: implements Prometheus’s v1 API to proxy it to Querier while caching the response and optionally splitting it by queries per day.

This is too complicated I'm gonna go with SaaS

Great but they're expensive. All the same rules as logging apply. You need to carefully monitor ingestion and ensure you aren't capturing high-cardinality metrics for no reason. Sticker shock when you get the first bill is common, so run some estimates and tests before you plug it in.

Suggestions

  • Define a hard limit for retention for metrics from day 1. What you are going to build really differs greatly depending on how long you are gonna keep this stuff. I personally cap the "only Prometheus" design at 30 days of metrics. I know people who go way longer with the federated designs but I find it helps to keep the 30 days as my north star of design.
  • If metrics are going to be your primary observability tool, don't do it in half measures. It's way harder to upgrade just Prometheus once the entire business is relying on it and downtime needs to be communicated up and down the chain. I'd start with either Thanos or Cortex from launch so you have a lot more flexibility if you want to keep a lot of metrics for a long period of time.
  • Outline an acceptable end state. If you are looking at a frightening number of metrics, Cortex is a better tool for sheer volume. I've seen a small group of people who knew it well manage Cortex at 1.6 million metrics a second with all the tools it provides to control and process that much data. However if the goal is less about sheer volume and more about long-term storage and accessibility, I'd go with Thanos.
  • Unlike a lot of folks, I think you just need to accept that metrics are going to be something you spend a lot of time working with. I've never seen a completely hands-off system that Just Works at high volume without insane costs. You need to monitor them, change their ingestion, tinker with the configuration, and then go back and do it again; it's time consuming.

Tracing

Logs are good for knowing exactly what happened but have a bad signal-to-noise ratio. Metrics are great for knowing what happened but can't work for infinite cardinality. Enter tracing, the hot new thing from 5 years ago. Traces solve a lot of the problems above, allowing tremendous amounts of data to be collected about requests as they move through your stack. In addition, tracing allows for amazing platform-agnostic monitoring: you can follow a request from your app to your load balancer to backend services and microservices and back.

Now, the real advantage of tracing to me is that it comes out of the box with the idea of sampling. It is a debugging and troubleshooting tool, not something with compliance or business uses, so it hasn't been completely mucked up with people jamming all sorts of weird requirements into it over time. You can very safely sample because it's only for developer troubleshooting.
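
As a concrete example of sampling being a first-class concept, here's a minimal sketch with the OpenTelemetry Python SDK that keeps roughly 10% of traces. The exporter here just prints to the console; in practice you'd swap in your backend's exporter.

# Head-based sampling: the keep/drop decision is derived from the trace ID at span creation.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

provider = TracerProvider(sampler=TraceIdRatioBased(0.1))  # keep ~10% of traces
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")
with tracer.start_as_current_span("charge-card"):
    pass  # the actual request handling goes here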

I'll be honest with you: my experience with setting up tracing has been "use SaaS and configure tracing". The one I've used the most and had the best experience with is Cloud Trace. It was easy to implement, controlling pricing was pretty straightforward, and I liked the troubleshooting workflow.

The problem with me and tracing is that nobody uses it. When I monitor the team's usage of traces, it is always a small fraction of the development team that ever logs in to use them. I don't know why the tool hasn't gained more popularity among developers. It's possible folks are more comfortable with metrics and logs, or perhaps they don't see the value (or maybe they feel like they already know where the time-consuming services are in their stack, so they just need the round-trip time off the load balancer). So far I haven't seen tracing "done right".

Hopefully that'll change some day.

Conclusion

Maybe I'm wrong about monitoring. Maybe everyone else is having a great time with it and I'm the one struggling. My experience has been that monitoring is an unloved internal service of the worst kind: it requires a lot of work, costs a lot of money and never makes the company any money.


The point of AI chat is selling ads

It's all advertising, all the way down.

The Robots are Coming

My entire life, automation has been presented as a threat. It is hard to count how often business has used that threat to keep wages down and push workers to keep increasing productivity. While the mechanism of the threatened automation changes over time (factory line robots, computers, AI), the basic message remains the same: if you demand anything more from work at any time, we'll replace you.

The reason this never happens is that automation is hard and requires intense organizational precision. You can't buy a factory robot and then decide to arbitrarily change things about the product. Human cashiers can deal with a much wider range of situations than a robotic cashier. If an organization wants to automate everything, it needs a structure capable of detailing what it wants to happen at every step, along with leadership informed enough about how the product works to account for every edge case.

Is this possible? Absolutely; in fact we see it with call center decision trees, customer support flows and chat bots. Does it work? Define work! Does it reduce the number of human workers you need giving unhelpful answers to questions? Yes. Are your users happy? No, but that's not a metric we care about anymore.

Let us put aside the narrative that AI is coming for your job for a minute. Why are companies so interested in this technology that they're willing to pour billions into it? The appeal, I think, is the delivery system: a conversation versus serving you up a bunch of results. You see advertising in search results. Users are now used to scrolling down until the ads are gone (or blocking them when possible).

With AI bots, users interact with data only through a service controlled by one company. The opportunity for selling ads to those users is immense. There already exist advertising marketplaces for companies to bid on spots to users depending on a wide range of criteria. If you are the company that controls all those pieces, you can now run ads inside the answer itself.

There is also the reality that AI is going to destroy web searching and social media. If these systems can replicate normal human text well enough that a casual read cannot detect them, and generate images on demand good enough that it takes detailed examination to determine they're fake, conventional social media and web search cannot survive. Any algorithm can be instantly gamed; people can be endlessly impersonated or just overwhelmed with fake users posting real-sounding opinions and objections.

So now we're in an arms race. The winner gets to be the exclusive source of truth for users and do whatever they want to monetize that position. The losers stop being relevant within a few years and join the hall of dead and dying tech companies.

Scenario 1 - Buying a Car Seat

Meet Todd. He works a normal job, with the AI chatbot installed on his Android phone. He hasn't opted out of GAID, so his unique advertising ID is tracked across all of his applications. Advertising networks know he lives in Baltimore and have a pretty good idea of his income, both from location information and the phone model information they get. Todd uses Chrome with the Topics API enabled and rolled out.

Right off the bat we know a lot about Todd. Based on the initial spec sheet for the taxonomy of topics (which is not a final draft, could change, etc.) available here: https://github.com/patcg-individual-drafts/topics, there's a ton of information we can get about Todd. You can download the IAB Tech Lab list of topics here: https://iabtechlab.com/wp-content/uploads/2023/03/IABTL-Audience-Taxonomy-1.1-Final-3.xlsx

Let's say Todd is in the following:

  • Demographic | Age Range | 30-34
  • Demographic | Education & Occupation | Undergraduate Education
  • Demographic | Education & Occupation | Skilled/Manual Work
  • Demographic | Education & Occupation | Full-Time
  • Demographic | Household Data | $40,000-$49,999
  • Demographic | Household Data | Adults (no children)
  • Demographic | Household Data | Median Home Value (USD) | $200,000-$299,999
  • Demographic | Household Data | Monthly Housing Payment (USD) | $1,000-$1,499
  • Interest | Automotive | Classic Cars

That's pretty precise data about Todd. We can answer a lot of questions about him, what he does, where he lives, what kind of house he has and what kinds of advertising would speak to him. Now let's say we know all that already and can combine that information with a new topic which is:

Interest | Family and Relationships | Parenting |

Todd opens his AI chat app and starts to ask questions about what the best car seat is. Anyone who has ever done this search in real life knows Google search results are jam-packed with SEO spam, so you end up needing to search "best car seat reddit" or "best car seat wirecutter". Todd doesn't know that trick, so instead he turns to his good friend the AI. When the AI gets that query, it can route the request to the auction system to decide who is going to get returned as an answer.

Is this nefarious? Only if you consider advertising on the web nefarious. This is mostly a more efficient way of doing the same thing other advertising is trying to do, but with a hyper-focus that other systems lack.

Auction System

The existing ad auction system is actually pretty well equipped to do this. The AI parses the question, determines what keywords apply, and then sees who is bidding on those keywords. Depending on the information Google knows about the user (a ton of information), it can adjust the Ad Rank of different ads to serve up the response that is most relevant to that specific user. So Todd won't get a response for a $5000 car seat that is a big seller in the Bay Area, because he doesn't make enough money to reasonably consider a purchase like that.

Instead Todd gets a response back from the bot steering him towards a cheaper model. He assumes the bot has considered the safety, user scores and any possible recalls when doing this calculation, but it didn't. It offered up the most relevant advertising response to his question with a link to buy the product in question. Google is paid for this response at likely a much higher rate than their existing advertising structure since it is so personalized and companies are more committed than ever to expanding their advertising buy with Google.

Since the bot doesn't show sources when it returns an answer, just the text of the answer, he cannot do any further research without going back to search. There is no safety check for this data, since Amazon reviews are also broken. Another bot might return a different answer, but how would he compare them?

Unless Todd wants to wander the neighborhood asking people what they bought, this response is a likely winner. Even if the bot discloses that the link is a sponsored link, which presumably it will have to do, it doesn't change the effect of the approach.

Scenario 2 - Mary is Voting

Mary is standing in line waiting to vote. She knows who she wants to vote for in the big races, but the ballot is going to have a lot of smaller candidates on it as well. She's a pretty well-informed person, but even she doesn't know where the local sheriff candidates stand on the issues or which judge is better than another. She has some time before she gets to vote, so she asks the AI who is running for sheriff and for information about them.

Mary uses an iPhone, so it hides her IP from the AI. She has also declined ATT, so the amount of information we know about her is pretty limited: some geoIP data off the Private Relay IP address. Yet we don't need that much information to do what we want to do.

Let's assume these companies aren't going to be cartoonishly evil for a minute and place some ethical guidelines on responses. If she were to ask "who is the better candidate for sheriff", we would assume the bot would return a list of candidates and information about them. Yet we can still follow that ethical guideline and have an opportunity to make a lot of money.

One of the candidates for sheriff recently had an embarrassing scandal. He's the front-runner and will likely win as long as enough voters don't hear about the terrible thing he did. How much could an advertising company charge to not mention it? It's not a lie; you are still answering the question, you just leave out some context. You could charge a tremendous amount for this service and still be (somewhat) ok. You might not even have to disclose it.

You already see this with conservative- and liberal-bent news in the US, so there is an established pattern. Instead of the bent being one way or the other, adjust the weights based on who pays more. It doesn't even need to be that blatant: the AI can still answer the question if asked directly "what is the recent scandal with candidate for sheriff X", so the omission appears accidental.

Mary gets the list of candidates and reviews their stances on positions important to her. Everything she interacted with looked legitimate and data-driven with detailed answers to questions. It didn't mention the recent scandal so she proceeds to act as if it had never happened.

In a world where the majority of people consume information from their phones after searching for it, the ability to keep information a company wants hidden from ever surfacing to users is massive. Even if the company has no particular interest in doing so for its own benefit, the ability to offer it, or to tilt the scales, is so powerful that it is hard to ignore.

The value of AI to advertising is the perception of its intelligence

What we are doing right now is publishing as many articles and media pieces as we can claiming how intelligent AI is. It can pass the bar exam, it can pass certain medical exams, it can even interpret medical results. This creates the perception among people that these systems are highly intelligent. The assumption people then make is that this intelligence will be used to replace existing workers in those fields.

While that might happen, Google is primarily an ad company. YouTube ads account for 10.2% of its revenue, Google Network ads for 11.4%, and ads from Google Search & other properties for 57.2%. Meta is even more one-dimensional, with 97.5% of its revenue coming from advertising. None of these companies are going to turn down opportunities to deploy their AI systems into workplaces, but those are slow-growth businesses. It'll take years to convince hospitals to let an AI review results, work through the regulatory problems of doing so, have the results peer-checked, etc.

Instead there's simpler, lower-hanging fruit we're all missing. By funneling users away from different websites where they do the data analysis themselves and towards the AI "answer", you can directly target users with high-cost advertising that will have a higher ROI than any conventional system. Users will be convinced they are receiving unbiased data-based answers while these companies will be able to use their control of side systems like phone OS, browser and analytics to enrich the data they know about the user.

That's the gold-rush element of AI. Whoever can establish their platform as the one users see as intelligent first, and get it installed on phones, will win. Once established, it's going to be difficult to convince users to double-check answers across different bots. The winner will be able to grab the gold ring of advertising: a personalized recommendation from a trusted voice.

If this obvious approach occurred to me, I assume it's old news for people inside of these respective teams. Even if regulators "cracked down" we know the time delay between launching the technology and regulation of that technology is measured in years, not months. That's still enough time to generate the kind of insane year over year growth demanded by investors.

I'll always double-check the results

That presupposes you can. The ability to detect whether content was generated by an AI is extremely bad right now, and there's no reason to think it will get better quickly. So you will be alone, cruising the internet looking for trusted sources on topics, with search results increasingly jam-packed with SEO-optimized junk text.

Will there be websites you can trust? Of course, you'll still be able to read the news. But even news sites are going to start adopting this technology (on top of many now being owned by politically-motivated owners). In a sea of noise, it's going to become harder and harder to figure out what is real and what is fake. These AI bots are going to be able to deliver concise answers without dealing with the noise.

Firehose of Falsehoods

According to a 2016 RAND Corporation study, the firehose of falsehood model has four distinguishing factors: it (1) is high-volume and multichannel, (2) is rapid, continuous, and repetitive, (3) lacks a commitment to objective reality; and (4) lacks commitment to consistency.[1] The high volume of messages, the use of multiple channels, and the use of internet bots and fake accounts are effective because people are more likely to believe a story when it appears to have been reported by multiple sources.[1] In addition to the recognizably-Russian news source, RT, for example, Russia disseminates propaganda using dozens of proxy websites, whose connection to RT is "disguised or downplayed."[8] People are also more likely to believe a story when they think many others believe it, especially if those others belong to a group with which they identify. Thus, an army of trolls can influence a person's opinion by creating the false impression that a majority of that person's neighbors support a given view.[1]

I think you are going to see this technique everywhere. The low cost of flooding conventional information channels with fake messages, even obviously fake ones, means real sources get drowned out. People will need to turn to this automation just to get quick answers to simple questions. By destroying the basic functionality of search and the open internet, these tools will be positioned to be the only source of truth.

The amount of work you will need to do in order to find primary-source independent information about a particular topic, especially a controversial topic, is going to be so high that it will simply exceed the capacity of your average person. So while some simply live with the endless barrage of garbage information, others use AI bots to return relevant results.

That's the value. Tech companies won't have to compete with each other, or with the open internet or start-up social media websites. If you want your message to reach its intended audience, this will be the only way to do it in a sea of fake. That's the point and why these companies are going to throw every resource they have at this problem. Whoever wins will be able to exclude the others for long enough to make them functionally irrelevant.

Think I'm wrong? Tell me why on Mastodon: https://c.im/@matdevdug


MRSK Review

I, like the entire internet, have enjoyed watching the journey of 37Signals from cloud to managed datacenter. For those unfamiliar, it's worth a read here. This has spawned endless debates about whether the cloud is worth it or whether we should all be buying hardware again, which is always fun; I enjoy having the same debates every 5 years just like every person who works in tech. However, mentioned in their migration documentation was a reference to an internal tool called "MRSK" which they used to manage their infrastructure. You can find their site for it here.

When I read this, my immediate thought was "oh god no". I have complicated emotions about creating custom in-house tooling unless it directly benefits your customers (which can include internal customers) enough that the inevitable burden of maintaining it over the years is worth it. It's often easier to yeet out software than it is to keep it running and design around its limitations, especially in the deployment space. My fear is that this kind of software is often the baby of one engineer; it gets adopted by other teams, that engineer leaves, and now the entire business is on a custom stack nobody can hire for.

All that said, 37Signals has open-sourced MRSK and I tried it out. It was better than expected (clearly someone has put love into it) and the underlying concepts work. However, if the argument is that this is an alternative to a cloud provider, I would expect to hit fewer sharp edges. This reeks of an internal tool made by a few passionate people who assumed nobody would run it any differently than they do. Currently it's hard to recommend to anyone outside of maybe "single developers who work with no one else and don't mind running into all the sharp corners".

How it works

The process to run it is pretty simple. Set up a server wherever (I'll use DigitalOcean) and configure it to start with an SSH key. You need to select Ubuntu (a tiny bummer, I would have preferred Debian, but whatever) and then you are off to the races.

Then select a public SSH key you already have in the account.

Setting up MRSK

On your computer run gem install mrsk if you have ruby or alias mrsk='docker run --rm -it -v $HOME/.ssh:/root/.ssh -v /var/run/docker.sock:/var/run/docker.sock -v ${PWD}/:/workdir  ghcr.io/mrsked/mrsk' if you want to do it as a Docker container. I did the second option, sticking that line in my .zshrc file.

Once installed you run mrsk init which generates all you need.

The following is the configuration file that is generated and gives you an idea of how this all works.

# Name of your application. Used to uniquely configure containers.
service: my-app

# Name of the container image.
image: user/my-app

# Deploy to these servers.
servers:
  - 192.168.0.1

# Credentials for your image host.
registry:
  # Specify the registry server, if you're not using Docker Hub
  # server: registry.digitalocean.com / ghcr.io / ...
  username: my-user

  # Always use an access token rather than real password when possible.
  password:
    - MRSK_REGISTRY_PASSWORD

# Inject ENV variables into containers (secrets come from .env).
# env:
#   clear:
#     DB_HOST: 192.168.0.2
#   secret:
#     - RAILS_MASTER_KEY

# Call a broadcast command on deploys.
# audit_broadcast_cmd:
#   bin/broadcast_to_bc

# Use a different ssh user than root
# ssh:
#   user: app

# Configure builder setup.
# builder:
#   args:
#     RUBY_VERSION: 3.2.0
#   secrets:
#     - GITHUB_TOKEN
#   remote:
#     arch: amd64
#     host: ssh://[email protected]

# Use accessory services (secrets come from .env).
# accessories:
#   db:
#     image: mysql:8.0
#     host: 192.168.0.2
#     port: 3306
#     env:
#       clear:
#         MYSQL_ROOT_HOST: '%'
#       secret:
#         - MYSQL_ROOT_PASSWORD
#     files:
#       - config/mysql/production.cnf:/etc/mysql/my.cnf
#       - db/production.sql.erb:/docker-entrypoint-initdb.d/setup.sql
#     directories:
#       - data:/var/lib/mysql
#   redis:
#     image: redis:7.0
#     host: 192.168.0.2
#     port: 6379
#     directories:
#       - data:/data

# Configure custom arguments for Traefik
# traefik:
#   args:
#     accesslog: true
#     accesslog.format: json

# Configure a custom healthcheck (default is /up on port 3000)
# healthcheck:
#   path: /healthz
#   port: 4000

Good to go?

Well not 100%. On first run I get this:

❯ mrsk deploy
Acquiring the deploy lock
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
  ERROR (RuntimeError): Can't use commit hash as version, no git repository found in /workdir

Apparently the directory you work in needs to be a git repo. Fine, easy fix. Then I got a perplexing SSH error.

❯ mrsk deploy
Acquiring the deploy lock
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
  INFO [39265e18] Running /usr/bin/env mkdir mrsk_lock && echo "TG9ja2VkIGJ5OiAgYXQgMjAyMy0wNS0wOVQwOToyNzoxNloKVmVyc2lvbjog
SEVBRApNZXNzYWdlOiBBdXRvbWF0aWMgZGVwbG95IGxvY2s=
" > mrsk_lock/details on 206.81.22.60
  ERROR (Net::SSH::AuthenticationFailed): Authentication failed for user [email protected]

❯ ssh [email protected]
Welcome to Ubuntu 22.10 (GNU/Linux 5.19.0-23-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Tue May  9 09:26:40 UTC 2023

  System load:  0.0               Users logged in:       0
  Usage of /:   6.7% of 24.06GB   IPv4 address for eth0: 206.81.22.60
  Memory usage: 19%               IPv4 address for eth0: 10.19.0.5
  Swap usage:   0%                IPv4 address for eth1: 10.114.0.2
  Processes:    98

0 updates can be applied immediately.

New release '23.04' available.
Run 'do-release-upgrade' to upgrade to it.


Last login: Tue May  9 09:26:41 2023 from 188.177.18.83
root@ubuntu-s-1vcpu-1gb-fra1-01:~#

So Ruby's SSH authentication failed even though I had the host configured in my SSH config and a standard SSH login worked without issue. Then a bad thought occurred to me: "Does it care... what the key is called? Nobody would make a tool that relies on SSH and assume it's id_rsa, right?"

❯ mrsk deploy
Acquiring the deploy lock
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
  INFO [6c25e218] Running /usr/bin/env mkdir mrsk_lock && echo "TG9ja2VkIGJ5OiAgYXQgMjAyMy0wNS0wOVQwOTo1Mjo0NloKVmVyc2lvbjog
SEVBRApNZXNzYWdlOiBBdXRvbWF0aWMgZGVwbG95IGxvY2s=
" > mrsk_lock/details on 142.93.110.241
Enter passphrase for /root/.ssh/id_rsa:
Booooooh

Moving past the bad SSH

Then I get this error:

❯ mrsk deploy
Acquiring the deploy lock
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
  INFO [3b53d161] Running /usr/bin/env mkdir mrsk_lock && echo "TG9ja2VkIGJ5OiAgYXQgMjAyMy0wNS0wOVQwOTo1ODoyOVoKVmVyc2lvbjog
SEVBRApNZXNzYWdlOiBBdXRvbWF0aWMgZGVwbG95IGxvY2s=
" > mrsk_lock/details on 142.93.110.241
Enter passphrase for /root/.ssh/id_rsa:
  INFO [3b53d161] Finished in 6.094 seconds with exit status 0 (successful).
Log into image registry...
  INFO [2522df8b] Running docker login -u [REDACTED] -p [REDACTED] on localhost
  INFO [2522df8b] Finished in 1.209 seconds with exit status 0 (successful).
  INFO [2e872232] Running docker login -u [REDACTED] -p [REDACTED] on 142.93.110.241
  Finished all in 1.3 seconds
Releasing the deploy lock
  INFO [2264c2db] Running /usr/bin/env rm mrsk_lock/details && rm -r mrsk_lock on 142.93.110.241
  INFO [2264c2db] Finished in 0.064 seconds with exit status 0 (successful).
  ERROR (SSHKit::Command::Failed): docker exit status: 127
docker stdout: Nothing written
docker stderr: bash: line 1: docker: command not found

docker command not found? I thought MRSK set it up.

From the GitHub:

This will:

    Connect to the servers over SSH (using root by default, authenticated by your ssh key)
    Install Docker on any server that might be missing it (using apt-get): root access is needed via ssh for this.
    Log into the registry both locally and remotely
    Build the image using the standard Dockerfile in the root of the application.
    Push the image to the registry.
    Pull the image from the registry onto the servers.
    Ensure Traefik is running and accepting traffic on port 80.
    Ensure your app responds with 200 OK to GET /up.
    Start a new container with the version of the app that matches the current git version hash.
    Stop the old container running the previous version of the app.
    Prune unused images and stopped containers to ensure servers don't fill up.

However:

root@ubuntu-s-1vcpu-1gb-fra1-01:~# which docker
root@ubuntu-s-1vcpu-1gb-fra1-01:~#

Fine I guess I'll install Docker. Not feeling like this is saving a lot of time vs rsyncing a Docker Compose file over.

sudo apt update
sudo apt upgrade -y
sudo apt install -y docker.io curl git
sudo usermod -a -G docker ubuntu

Now we have Docker on the machine.

Did it work after that?

Yeah, so my basic Flask app needed a new route added to it, but once I saw that you need to serve a route at /up and added one, it worked fine. Traffic is successfully paused during deployment and resumed once the application is healthy again. Overall, once I got it running it worked much as intended.
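
For anyone else hitting that: the route itself is about this much code. A minimal Flask sketch (not my actual app), matching the default healthcheck of GET /up noted in the generated config above:

from flask import Flask

app = Flask(__name__)

@app.route("/up")
def up():
    # MRSK/Traefik only needs a 200 back to consider the container healthy.
    return "OK", 200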

I also tried accessories, which is their term for necessary supporting services like MySQL. These are more like standard Docker Compose definitions, but it's nice to be able to include them. Again, it feels a little retro to say "please install MySQL on the MySQL box" and just hope that box doesn't go down, but it's totally serviceable. I didn't encounter anything interesting with the accessory testing.

Impressions

MRSK is an interesting tool. I think, if the community adopts it and irons out the edge cases, it'll be a good building-block technology for people not interested in running infrastructure. Comparing it to Kubernetes is madness, in the same way I wouldn't compare a go-kart I made in my garage to a semi-truck.

That isn't to hate on MRSK; I think it's a good tool for people with less complicated needs. However, part of the reason more complicated tools are complicated is that they cover more edge cases and automate more failure scenarios. MRSK doesn't cover those, so it gets to be simpler, but as you grow, more of those concerns shift back to you.

It's the difference between managing 5 hosts with Ansible and managing 1,500: 5 is easy, 1,500 becomes a nightmare. MRSK in its current state should be seen as a bridge technology unless your team expends the effort to customize it for your workflow and fill in the gaps in monitoring.

If it were me and I was starting a company today, I'd probably invest the effort in something like GKE Autopilot where GCP manages almost all the node elements and I worry exclusively about what my app is doing. But I have a background in k8s so I understand I'm an edge case. If you are looking to start a company or a project and want to keep it strictly cloud-agnostic, MRSK does do it.

What I would love to see added to MRSK to make it more production-ready:

  • Adding support for 1Password/secret manager for the SSH key component so it isn't a key on your local machine
  • Adding support for multiple users with different keys on the box, managed inside of some secret configuration, so you can tell which user did which deployment, and so key rotation can be part of deployment as needed (you can set a user per config file, but that isn't really granular enough to scale)
  • Fixing the issue where the ssh_config doesn't seem to be respected
  • Providing an example project in the documentation showing exactly what you need to hit mrsk deploy and have a functional project up and running
  • Let folks know that having the configuration file inside of a git repo is a requirement
  • Ideally integrating some concept of an autoscaling group into the configuration, with some lookup back to the config file (which you can do with a template, but it would be nice to build in)
  • Do these servers update themselves? What happens if Docker crashes? Can I pass resource limits to the service and not just accessories? A lot of missing pieces there.
  • mrsk details is a great way to quickly see the health status, but you obviously need to do more to monitor whether your app is functional or not. That's more on you than the MRSK team.

Should you use MRSK today

If you are a single developer who runs a web application, ideally a Rails application, and you provision your servers one by one with Terraform or whatever, where static IP addresses (internal or external) are something you can get and that don't change often, this is a good tool for you. I wouldn't recommend using the accessories functionality; I think you'll probably want to use a hosted database service if possible. However, it did work, so just consider how critical uptime is to you when you roll this out.

However if you are on a team, I don't know if I can recommend this at the current juncture. Certainly not run from a laptop. If you integrate this into a CI/CD system where the users don't have access to the SSH key and you can lock that down such that it stops being a problem, it's more workable. However as (seemingly) envisioned this tool doesn't really scale to multiple employees unless you have another system swapping the deployment root SSH key at a regular interval and distributing that to end users.

You also need to do a lot of work around upgrades, health monitoring of the actual VMs, and writing some sort of replacement system if a VM dies and you need to put another one in its place. What is the feedback loop back to this static config file to populate IP addresses? What about automating rollbacks if something fails, monitoring deployments to ensure they're not left in a bad state, or staggering the rollout (which MRSK does support)? A lot of what comes in the box with conventional tooling you have to write yourself here.

If you want to use it today

Here's the minimum I would recommend.

  • I'd use something like the 1Password SSH agent so you can at least distribute keys across the servers without having to manually add them to each laptop: https://developer.1password.com/docs/ssh/agent/
  • I'd set up a bastion server (which is supported by MRSK and did work in my testing). This is a cheap box that means you don't need to expose your application and database servers directly to the internet. There is a decent tutorial on how to make one here: https://zanderwork.com/blog/jump-host/
  • Ideally do this all from within a CI/CD stack so that you are running it from one central location and can more easily centralize the secret storage.

Parse YAML and push to Confluence in Python

I recently rewrote a system to output a YAML file containing a bunch of information for internal users. However, we use Confluence as our primary information-sharing system, so I needed to parse the YAML file on GitHub (where I was pushing it after every generation), generate some HTML and then push this up to Confluence on a regular basis. This was surprisingly easy to do, so I wanted to share how I did it.

from atlassian import Confluence
from bs4 import BeautifulSoup
import yaml
import requests
import os

# Debugging only: this prints every environment variable (including secrets) to the logs.
print(os.environ)
git_username = "github-username"
git_token = os.environ['GIT-TOKEN']
confluence_password = os.environ['CONFLUENCE-PASSWORD']
url = 'https://raw.githubusercontent.com/org/repo/file.yaml'
page_id=12345678
page_title='Title-Of-Confluence-Page'
path='/tmp/file.yaml'
original_html =  '''<table>
  <tr>
    <th>Column Header 1</th>
    <th>Column Header 2</th>
    <th>Column Header 3</th>
    <th>Column Header 4</th>
  </tr>
</table>'''

def get_file_from_github(url, username, password):
    response = requests.get(url, stream=True, auth=(username,password))
    print(response)
    with open(path, 'wb') as out_file:
        out_file.write(response.content)
        print('The file was saved successfully')

def update_confluence(path, page_id, page_title, original_html):
    with open(path, 'r') as yamlfile:
        current_yaml = yaml.safe_load(yamlfile)

    confluence = Confluence(
            url='https://your-hosted-confluence.atlassian.net',
            username='[email protected]',
            password=confluence_password,
            cloud=True)
    soup = BeautifulSoup(original_html, 'html5lib')
    table = soup.find('table')
    
    #This part is going to change based on what you are parsing but hopefully provides a template. 

    for x in current_yaml['top-level-yaml-field']:
        dump = '\n'.join(x['list-of-things-you-want'])
        pieces = x['desc'].split("-")
        # Assumption: the first segment of the desc field is the display name for the table.
        name = pieces[0]

        table.append(BeautifulSoup(f'''
                                <tr>
                                  <td>{name}</td>
                                  <td>{x['role']}</td>
                                  <td>{x['assignment']}</td>
                                  <td style="white-space:pre-wrap; word-wrap:break-word">{dump}</td>
                                </tr>''', 'html.parser'))
    
    body = str(soup)
    update = confluence.update_page(page_id, page_title, body, parent_id=None, type='page', representation='storage', minor_edit=False, full_width=True)
    
    print(update)

def main(request):
    if confluence_password is None:
        print("There was an issue accessing the secret.")
    get_file_from_github(url, git_username, git_token)
    update_confluence(path, page_id, page_title, original_html)
    return "Confluence is updated"

Some things to note:

  • obviously the YAML parsing depends on the file you are going to parse
  • The Confluence Page ID is most easily grabbed from the URL in Confluence when you make the page. You can get instructions on how to grab the Page ID here.
  • I recommend making the Confluence page first, grabbing the ID and then running it as an update.
  • I'm running logging through a different engine.
  • The github token should be a read-only token scoped to just the repo you need. Don't make a large token.

The deployment process on GCP couldn't have been easier.  Put your secrets in the GCP secret manager and then run:

gcloud functions deploy confluence_updater --entry-point main --runtime python310 --trigger-http --allow-unauthenticated --region=us-central1 --service-account serverless-function-service-account@gcp-project-name.iam.gserviceaccount.com --set-secrets 'GIT-TOKEN=confluence_git_token:1,CONFLUENCE-PASSWORD=confluence_password:1'
  • I have --allow-unauthenticated just for testing purposes. You'll want to put it behind auth.
  • The --set-secrets flag loads them as environment variables.

There you go! You'll have a free function you can use forever to parse YAML or any other file format from GitHub and push to Confluence as HTML for non-technical users to consume.

The requirements.txt I used is below:

atlassian-python-api==3.34.0
beautifulsoup4==4.11.2
functions-framework==3.3.0
install==1.3.5
html5lib==1.1

Problems? Hit me up on Mastodon: https://c.im/@matdevdug


TIL How to write a Python CLI tool that writes Terraform YAML

I'm trying to use more YAML in my Terraform as a source of truth, instead of endlessly repeating the creation of resources, and to write CLIs that automate generating that YAML. One area where I've had a lot of luck with this is GCP IAM. This is due to a limitation in GCP that doesn't allow you to combine pre-existing IAM roles into custom roles, which is annoying. I end up needing to assign people the same permissions across many different projects and wanted to come up with an easier way to do this.

I did run into one small problem. When attempting to write out the YAML file, PyYAML was inserting strange YAML tags into the output that looked like this: !!python/tuple.

It turns out this is intended behavior: when PyYAML serializes arbitrary Python objects (here, the tuple that click produces for a multiple=True option), it inserts deserialization hint tags. This breaks Terraform's yamldecode, which can't understand the inserted tags. The breaking code looks as follows.

with open(path,'r') as yamlfile:
    current_yaml = yaml.safe_load(yamlfile)
    current_yaml['iam_roles'].append(permissions)

if current_yaml:
    with open(path,'w') as yamlfile:
        yaml.encoding = None
        yaml.dump(current_yaml, yamlfile, indent=4, sort_keys=False)

I ended up stumbling across a custom Emitter setting to fix this issue for Terraform. This is probably not a safe option to enable, but it does seem to work for me and does what I would expect.

The flag is: yaml.emitter.Emitter.prepare_tag = lambda self, tag: ''

So the whole thing, including the click elements looks as follows.

import click
import yaml

@click.command()
@click.option('--desc', prompt='For what is this role for? Example: analytics-developer, devops, etc', help='Grouping to assign in yaml for searching')
@click.option('--role', prompt='What GCP role do you want to assign?', help="All GCP premade roles can be found here: https://cloud.google.com/iam/docs/understanding-roles#basic")
@click.option('--assignment', prompt="Who is this role assigned to?", help="This needs the syntax group:, serviceAccount: or user: before the string. Example: group:[email protected] or serviceAccount:[email protected]")
@click.option('--path', prompt="Enter the relative path to the yaml you want to modify.", help="This is the relative path from this script to the yaml file you wish to append to", default='project-roles.yaml')
@click.option('--projects', multiple=True, type=click.Choice(['test', 'example1', 'example2', 'example3']))
def iam_augmenter(path, desc, role, assignment, projects):
    permissions = {}
    permissions["desc"] = desc
    permissions["role"] = role
    permissions["assignment"] = assignment
    permissions["projects"] = projects

    with open(path,'r') as yamlfile:
        current_yaml = yaml.safe_load(yamlfile)
        current_yaml['iam_roles'].append(permissions)

    if current_yaml:
        with open(path,'w') as yamlfile:
            yaml.emitter.Emitter.prepare_tag = lambda self, tag: ''
            yaml.encoding = None
            yaml.dump(current_yaml, yamlfile, indent=4, sort_keys=False)

if __name__ == '__main__':
    iam_augmenter()

This worked as intended, allowing me to easily append to an existing YAML file with the following format:

iam_roles:
  - desc: analytics-reader-bigquery-data-viewer
    role: roles/bigquery.dataViewer
    assignment: group:[email protected]
    projects:
    - example1
    - example2
    - example3

This allowed me to easily add the whole thing to automation that can be called from a variety of locations, meaning we can keep using the YAML file as the source of truth but quickly append to it from different sources. Figured I would share as this took me an hour to figure out and maybe it'll save you some time.

The Terraform that parses the file looks like this:

locals {
  all_iam_roles = yamldecode(file("project-roles.yaml"))["iam_roles"]


  stock_roles = flatten([for iam_role in local.all_iam_roles :
    {
      "description" = iam_role.desc
      "role"        = iam_role.role
      "member"      = iam_role.assignment
      "project"     = iam_role.projects
    }
  ])
  
  # Shortname for projects to full names
  test          = "test-dev"
  example1      = "example1-dev"
  example2      = "example2-dev"
  example3      = "example3-dev"
}

resource "google_project_iam_member" "test-dev" {
  for_each = {
    for x in local.stock_roles : x.description => x
    if contains(x.project, local.test) == true
  }
  project = local.test
  role    = each.value.role
  member  = each.value.member
}

resource "google_project_iam_member" "example1-dev" {
  for_each = {
    for x in local.stock_roles : x.description => x
    if contains(x.project, local.example1) == true
  }
  project = local.example1
  role    = each.value.role
  member  = each.value.member
}

Hopefully this provides someone out there in GCP land some help with handling large numbers of IAM permissions. I've found it to be much easier to wrangle as a Python CLI that I can hook up to different sources.

Did I miss something or do you have questions I didn't address? Hit me up on Mastodon: https://c.im/@matdevdug


Layoffs are Cruel and Don't Work

Imagine you had a dog. You got the dog when it was young, trained and raised it. This animal was a part of your family and you gave it little collars and cute little clothes with your family name on it. The dog came to special events and soon thought of this place as its home and you all as loved ones. Then one day, with no warning, you locked the dog out of the house. You and the other adults in the house had decided that getting rid of a random dog was important to the bank that owned your house, so you locked the door. Eventually it wandered off, unsure of why you had done this, still wearing the sad little collar and t-shirt with your name.

If Americans saw this in a movie, people would warn each other that it was "too hard to watch". In real life, this is an experience a huge percentage of people working in tech will go through. It is a jarring thing to watch: former coworkers finding out they don't work there anymore when their badges are deactivated and they can't swipe through the door. I had an older coworker, who we'll call Bob, who upon learning it was layoffs took off for home. "I can't watch this again," he said as he quickly shoved his stuff into his bag and ran out the door.

In that moment all illusion vanishes. This place isn't your home, these people aren't your friends and your executive leadership would run you over with their cars if you stood between them and revenue growth. Your relationship to work changes forever. You will never again believe that you are "critical" to the company or that the company is interested in you as a person. I used to think before the layoffs that Bob was a cynic, never volunteering for things, always double-checking the fine print of any promise made by leadership. I was wrong and he was right.

Layoffs don't work

Let us set aside the morality of layoffs for a moment. Do layoffs work? Are these companies better positioned to compete after they terminate some large percentage of their people? The answer appears to be no:

The current study investigated the financial effects of downsizing in Fortune 1000 Companies during a five-year period characterized by continuous economic growth. Return on assets, profit margin, earnings per share, revenue growth, and market capitalization were measured each year between 2003 and 2007. In general, the study found that both downsized and nondownsized companies reported positive financial outcomes during this period. The downsized companies, however, were outperformed consistently by the nondownsized ones during the initial two years following the downsizing. By the third year, these differences became statistically nonsignificant. Consequently, although many companies appear to conduct downsizing because the firm is in dire financial trouble, the results of this study clearly indicated that downsizing does not enhance companies' financial competitiveness in the near-term. The authors discuss the theoretical and practical implications of these findings.

Source

In all my searching I wasn't able to find any hard data suggesting layoffs either enable a company to compete better or improve earnings in the long term. The logic executives employ seems to make sense on its face: you eliminate employees and departments, which lets you redirect that spending to more profitable areas of the business. You are scaling to meet demand, so you don't have employees churning away at something they don't need to be working on. Finally, you are eliminating low-performing employees.

It’s about the triumph of short-termism, says Wharton management professor Adam Cobb. “For most firms, labor represents a fairly significant cost. So, if you think profit is not where you want it to be, you say, ‘I can pull this lever and the costs will go down.’ There was a time when social norms around laying off workers when the firm is performing relatively well would have made it harder. Now it’s fairly normal activity.”

This all tracks until you start getting into the details. Think about it strictly from a financial perspective. Firms hire during boom periods, paying a premium for talent. Then they lay people off, taking the institutional hit of losing all of that knowledge and experience. The next time they need to hire, they're paying that premium again. It is classic buying high and selling low. In retail and customer-facing channels, this results in a worse customer experience, meaning the move designed to save money costs more in the long term. Investors don't even reliably reward you for doing it, even though they ask for it.

Among the current tech companies this logic makes even less sense. Meta, Alphabet, PayPal and others are profitable companies, so this isn't even a desperate bid to stay alive. These companies are laying people off in response to investor demand and imitative behavior. After decades of research, executives know layoffs don't do what it says on the box, but their boards are asking why they aren't considering layoffs, so they proceed anyway.

Low-performing Employees

A common argument I've heard is "well, ok, maybe layoffs don't help the company directly, but it is an opportunity to get rid of dead weight". Sure, except presumably at-will employers could have done that at any time if they had hard data suggesting this pool of employees wasn't working out.

Recently, we asked 30 North American human resource executives about their experiences conducting white-collar layoffs not based on seniority — and found that many believed their organizations had made some serious mistakes. More than one-third of the executives we interviewed thought that their companies should have let more people go, and almost one-third thought they should have laid off fewer people. In addition, nearly one-third of the executives thought their companies terminated the wrong person at least 20% of the time, and approximately an additional quarter indicated that their companies made the wrong decision 10% of the time. More than one-quarter of the respondents indicated that their biggest error was terminating someone who should have been retained, while more than 70% reported that their biggest error was retaining someone who should have been terminated.

Source

Coming up with a scientific way of determining who is doing a good job and who is doing a bad job is extremely hard. If your organization wasn't able to identify those people before layoffs, you can't do it at layoff time. My experience with layoffs is it is less a measure of quality and more an opportunity for leadership to purge employees who are expensive, sick or aren't friends with their bosses.

All in all we know layoffs don't do the following:

  • They don't reliably increase stock price (American Express post layoffs)
  • Layoffs don't increase productivity or employee engagement (link)
  • It doesn't keep the people you have. For example, layoffs targeting just 1% of the workforce preceded, on average, a 31% increase in turnover. Source
  • It doesn't help you innovate or reliably get rid of low-performance employees.

Human Cost

Layoffs also kill people. Not in the spiritual sense, but in the real physical sense. In the light beach book "MORTALITY, MASS-LAYOFFS, AND CAREER OUTCOMES: AN ANALYSIS USING ADMINISTRATIVE DATA", which you can download here, we see some heavy human costs for this process.

We find that job displacement leads to a 15-20% increase in death rates during the following 20 years. If such increases were sustained beyond this period, they would imply a loss in life expectancy of about 1.5 years for a worker displaced at age 40.

The impact isn't just on the people you lay off, but on the people who have to lay them off and the employees who remain. It is a massive trickle-down effect which destroys morale at a critical juncture for your company. Your middle management is going to be more stressed and less capable. The employees you have left are going to be less efficient and capable as well.

This isn't a trivial amount of damage being done here. Whatever goodwill an employer has built with their employees is burned to the ground. The people you have left are going to trust you less, not work as hard, be more stressed and resent you more, all at a time when you are asking more of the remaining teams, feeding into that increase in turnover.

If you were having trouble executing before, there is no way in hell it gets better after this.

Alternatives

“Companies often attempt to move out of an unattractive game and into an attractive one through acquisition. Unfortunately, it rarely works. A company that is unable to strategize its way out of a current challenging game will not necessarily excel at a different one—not without a thoughtful approach to building a strategy in both industries. Most often, an acquisition adds complexity to an already scattered and fragmented strategy, making it even harder to win overall.”

So if layoffs don't work, what are the options? SAS Institute has always been presented as a fascinating outlier in this area as a software company that bucks the trends. One example I kept seeing was SAS Institute has never done layoffs, instead hiring during downturns as a way to pick up talent for cheap.  You can read about it here.

Now in reality, SAS Institute has done small rounds of layoffs, so this often-repeated story isn't as true as it sounds. Here they are laying off 100 people. These folks were in charge of a lot of office operations during a time when nobody was going to the office, but it still counts. However, the logic behind not doing mass layoffs still holds true, despite the oft-repeated lie that SAS Institute never does them.

Steve Jobs also bucked this trend somewhat famously.

"We've had one of these before, when the dot-com bubble burst. What I told our company was that we were just going to invest our way through the downturn, that we weren't going to lay off people, that we'd taken a tremendous amount of effort to get them into Apple in the first place -- the last thing we were going to do is lay them off. And we were going to keep funding. In fact we were going to up our R&D budget so that we would be ahead of our competitors when the downturn was over. And that's exactly what we did. And it worked. And that's exactly what we'll do this time."

If you truly measure the amount of work it takes to onboard employees, get them familiar with your procedures and expectations, and retain them during the boom times, it really stops making sense to jettison them during survivable downturns. These panic layoffs that aren't based on any sort of hard science or logic are amazing opportunities for companies that are willing to weather some bad times and emerge intact with a motivated workforce.

It's not altruism at work. Rather, executives at no-layoff companies argue that maintaining their ranks even in terrible times breeds fierce loyalty, higher productivity, and the innovation needed to enable them to snap back once the economy recovers.

So if you work for any company, especially in tech, and leadership starts discussing layoffs, you should know a few things. They know it doesn't do what they say it does. They don't care that it is going to cause actual physical harm to some of the people they are doing it to. These execs are also aware it isn't going to be a reliable way of getting rid of low-performing employees or retaining high-performing ones.

If you choose to stay after a round of layoffs, you are going to be asked to do more with less. The people you work with are going to be uninterested in their jobs or careers and likely less helpful and productive than ever before. Any loyalty or allegiance to the company is dead and buried, so expect to see more politics and manipulation as managers attempt to give leadership what they want in order to survive.

On the plus side you'll never have the same attitude towards work again.


Why are passwords a user's problem?

In light of GoTo admitting their breach was worse than initially reported, I have found myself both discussing passwords with people more than ever before and directing a metric ton of business towards 1Password. However, it has raised an obvious question for me: why are users involved with passwords at all? Why is this still something I have to talk to my grandparents about?

Let us discuss your password storage system again

All the major browsers have password managers that sync across devices. These stores are (as far as I can tell) reasonably secure. Access to the device would reveal them, but excluding physical access to an unlocked computer they seem fine. There is a common API, the Credential Management API (docs here), that allows a website to query the password store inside the browser for a login, even allowing for federated logins and different (or the same) logins for subdomains as part of the spec. This makes for a truly effortless login experience for users without needing them to do anything. These browsers already have syncing with a master password concept across mobile/desktop and can generate passwords upon request.

If the browser can: generate a password, store a password, sync a password and return the password when asked, why am I telling people to download another tool that does the exact same thing? A tool made by people who didn't make the browser and most of whom haven't been independently vetted by anybody.

Surely it can't be that easy

So when doing some searching about the Credential Management API, one of the sites you run across a lot is this demo site: https://credential-management-sample.appspot.com/. This allows you to register an account, log out and then see the login auto-filled by the browser when you get back to it. The concept seems to work as expected on Chrome.

Bummer

Alright, so it doesn't work on Firefox and Safari, but honestly, neither do 10% of the websites I go to. Covering 88% of all the users in the world still isn't bad, so I'm not willing to throw the idea out entirely.

Diving into how the process works, again, it seems pretty straightforward.

var signin = document.querySelector('#signin');
signin.addEventListener('click', (e) => {
  if (window.PasswordCredential || window.FederatedCredential) {
    navigator.credentials
      .get({
        password: true,
        federated: {
          providers: ['https://accounts.google.com'],
        },
        mediation: 'optional',
      })
      .then((c) => {
        if (c) {
          switch (c.type) {
            case 'password':
              return sendRequest(c);
            case 'federated':
              return gSignIn(c);
          }
        } else {
          return Promise.resolve();
        }
      })
      .then((profile) => {
        if (profile) {
          updateUI(profile);
        } else {
          location.href = '/signin';
        }
      })
      .catch((error) => {
        location.href = '/signin';
      });
  }
});

If the user has a login then get it. It supports federated logins or passwords and falls back to redirecting to the sign-in page if you cannot locate a login. I tried the samples available here and they seemed to mostly be plug and play. In fact in my testing this seemed to be a far superior user experience to using traditional password managers with browser extensions.

Also remember that even for browsers that don't support it, I'm just falling back to the normal password storage system. So for websites that support it, the experience is magical on Chrome and the same as using a password manager with every other browser. It doesn't cost anything, it isn't complicated and it is a better experience.

I know someone out there is gearing up 

Are Password Managers Better?

One common theme when you search for this stuff is an often-repeated opinion that browser password managers are trash and dedicated password managers are better. Looking more into how they work, this seems to come with some pretty big asterisks. Most password managers seem to use some JavaScript from their CDN to insert their interface into the login form fields.

This is a little nerve-racking because websites could interact with that element, but the communication between the password manager's extension and its local application is also a potential source of problems. Communication to a local HTTP target seems to make sense, but this can be a source of problems (and has been in the past). Example Example Example

So, at a minimum, you'd need the tool you chose to meet these requirements just to match or exceed the security of the browser's built-in manager.

  • The add-on runs in a sandboxed background page
  • Communication between the password manager and the page isn't happening in the DOM
  • Any password element would need to be an iframe or something else that stops the site from interacting with the content
  • CSP is set up flawlessly
  • Communication between the extension and anything outside of the extension is secure and involves some verification step
  • Code validation in pretty much every direction: is the browser non-modified, is the server process valid, is the extension good, etc

This isn't even getting to the actual meat of the encryption on the secrets or the security of the syncing. We're just talking about whether the thing that interacts with the secrets and jams them into the page is itself secure.

To make a product that does this and does it well and consistently across releases isn't an easy problem. Monitoring for regressions and breaches would be critical, disclosures would be super important to end users and you would need to get your stack vetted by an outside firm kind of a lot. I trust the developers of my web browser in part because I have to and because, over the years, Mozilla has been pretty good to me. The entire browser stack is under constant attack because it has effectively become the new OS we all run.

Well these companies are really good at all that

Are they? Frankly, in my research I wasn't really blown away by the amount of technical auditing most of these companies seem to do or produce any evidence of. The only exceptions to this were 1Password and Bitwarden.

1Password

I love that they have a whitepaper available here but nobody finished writing it.

No rush I guess

However, they do have actual independent audits of their software: recent audits done by reputable firms and available for review. You can see all of these here. For the record, this should be on every single one of these companies' websites for public review.

Keeper Password Manager

I found what they call a whitepaper but it's 17 pages and basically says "We're ISO certified". That's great I guess, but not the level of detail I would expect at all. You can read it here. This doesn't mean you are doing things correctly, just that you have generated enough documentation to get ISO certified.

Not only do we implement the most secure levels of encryption, we also adhere to very strict internal practices that are continually audited by third parties to help ensure that we continue to develop secure software and provide the world’s most secure cybersecurity platform.

Great, can I read these audits?

Dropbox Password

Nothing seems to exist discussing this product's technical merits at all. I don't know how it works. I can look into it more if someone can point me to something else, but it seems to be an encrypted file that lives in your Dropbox folder, secured with a key generated by Dropbox and returned to you upon enrollment.

Dashlane

I found a great security assessment from 2016 that seemed to suggest the service was doing pretty well. You can get that here. I wasn't able to find one more recent. Reading their whitepaper here they actually do go into a lot of detail and explain more about how the service works, which is great and I commend them for that.

It's not sufficient though. I'm glad you understand how the process should work but I have no idea if that is still happening or if this is more of an aspirational document. I often understand the ideal way software should work but the real skill of the thing is getting it to work that way.

Bitwarden

They absolutely kill it in this department. Everything about them is out in the open like it should be. However sometimes they discover issues, which is good for the project but underscores what I was talking about above. It is hard to write a service that attempts to handle your most sensitive data and inject that data into random websites.

These products introduce a lot of complexity and failure points into the secret management game. All of them, with the exception of 1Password, seem to really bet the farm on the solo Master/Primary Password concept. This is great if your user picks a good password, but statistically this idea seems super flawed to me. This is a password they're going to enter all the time; won't they pick a crap one? Even with 100,000 iterations of key derivation on that password, it's pretty dangerous.
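
To make the math concrete, here is a minimal sketch of that kind of key stretching, assuming PBKDF2-SHA256 (the actual KDF, salt handling, and iteration counts vary by product, and the password below is obviously made up):

import hashlib
import os

# The kind of password a person will happily type fifty times a day.
master_password = b"Fluffy2015!"
salt = os.urandom(16)

# Stretch it into a vault key. 100,000 iterations makes every guess roughly
# 100,000x more expensive for an attacker...
vault_key = hashlib.pbkdf2_hmac("sha256", master_password, salt, 100_000)

# ...but it adds zero entropy. An attacker working through the few million most
# common password patterns still lands on a weak Master Password quickly; the
# iteration count buys a constant factor, not safety.
print(vault_key.hex())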

Plus if you are going to rely on the concept of "well if the Master/Primary Password is good then the account is secure" then we're certainly not justifying the extra work here. It's as good as the Firefox password manager and not as good as the Safari password manager. Download Firefox and set a good Primary Password.

Can we be honest with each other?

I want you to go to this website and I want you to type in your parents' password. You know the one, the one they use for everything? The one that's been shouted through the halls and texted/emailed/written on so many post-it notes that any concept of security has long since left the building.

That's the password they're gonna use to secure the vault. They shouldn't, but they're gonna. Now I want you to continue on this trust exercise with me. If someone got read/write access to a random cross-section of your coworkers' computers, are passwords really the thing that is gonna destroy your life? Not an errant PDF, an Excel document of customer data or an unsecured AWS API key?

I get it, security stuff is fun to read. "How many super computers will it take to break in" feels very sci-fi.

Well but my family/coworkers/lovers all share passwords

I'm not saying there is zero value to a product where there is a concept of sharing and organizing passwords with a nice UI, but there's also no default universal way of doing it. If all the password managers made a spec that they held to that allowed for secure bidirectional sharing between these services, I'd say "yeah the cost/benefit is likely worth it". However chances are if we're in a rush and sharing passwords, I'm going to send you the password through an insecure system anyway.

Plus the concept of sharing introduces ANOTHER huge layer of possible problems. Permission mistakes, associating the secret with the wrong user, and the user copying the secret into their personal vault and not getting updates when the shared secret changes are all weird issues I've seen at workplaces. To add insult to injury, the process of getting someone added to a shared folder they need is often so time-consuming that people will just bypass it and copy/paste the secret anyway.

Also let's be honest among ourselves here. Creating one shared login for a bunch of employees to use was always a bad idea. We all knew it was a bad idea and you knew it while you were doing it. Somewhere in the back of your mind you were like "boy it'll suck if someone decides to quit and steals these".

I think we can all agree on this

I know, "users will do it anyway". Sure but you don't have to make it institutional policy. The argument of "well users are gonna share passwords so we should pay a service to allow them to do it easier" doesn't make a lot of sense. I also know sometimes you can't avoid it, but for those values, if they're that sensitive, it might not make sense to share them across all employees in a department. Might make more sense to set them up with a local tool like pass.

Browsers don't prompt the user to make a Master/Primary Password

That is true, and perhaps the biggest point in the category of "you should use a password manager". The way the different browsers do this is weird. Chrome effectively uses the user's login as the key: on Windows it calls a Windows API that encrypts the SQLite database and decrypts it when the user logs in. On the Mac there is a login keychain entry with a random value that seems to serve the same function. If the user is logged in, the SQLite database is accessible. If they aren't, it isn't.

On Firefox there is a Primary Password you can set that effectively works like most of the password managers we saw. Unlike those password managers, this isn't synced, so you would set a different Primary Password on every Firefox device. The Firefox account still controls what syncs where; this just ensures that someone who takes the database of usernames and passwords would need this key to decrypt it.

So for Chrome, if your user is logged in, the entire password database is available. On macOS they can get access to the decryption key through the login keychain, and on Firefox the values are encrypted in the file, with the Primary Password adding extra protection and stopping random users from poking at them through the browser. There is a great write-up of how local browser password stores work here.

There are more steps than Chrome, but it allows for a Primary Password

Is that a sufficient level of security?

Honestly? Yeah I think so. The browser prompts the user to generate a secure value, stores the value, syncs the value securely and then, for 88% of the users on the web, the site can use a well-documented API to automatically fill in that value in the future. I'd love to see Chrome add a few more security levels, some concept of Primary Password so that I can lock the local password storage to something that isn't just me being logged into my user account.

However, we're also rapidly reaching a point where the common wisdom is that everything important needs 2FA. So if we're already going to treat authentication as a tiered approach, I think a pretty good argument could be made that it is safer for a user to store their passwords in the browser store (understanding that the password was always something a malicious actor with access to their user account could grab through keyloggers, clipboard theft, etc.) and keep the 2FA on a phone, as compared to what a lot of people do, which is keep the 2FA and the password inside the same third-party password manager.

TOTPs are just password x2

When you scan that QR code, you are getting back a string that looks something like this:

otpauth://totp/example:user@example.com?algorithm=SHA1&digits=6&issuer=mywebsite&period=30&secret=CelwNEjn3l7SWIW5SCJT

This, combined with the time, gets you your 6-digit code. The value of this approach is twofold: it checks whether the user possesses another source of authentication, and it introduces a secret which we know is randomly generated and which effectively serves as a second password. This secret isn't exposed to normal users as a string, so we don't need to worry about that.

If I have the secret value, I can make the same code. If we remove the second-device component, like we do when the TOTP lives in the password manager, what we're saying is "TOTP is just another random password". If we had a truly random password to begin with, I'm not adding much to the security model by adding 2FA but sticking it in the same place.
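
To illustrate, here is a minimal sketch of the TOTP math (RFC 6238 with the usual SHA1/30-second/6-digit defaults): anyone holding the base32 secret from that otpauth:// URI can mint the same codes, which is why storing the secret next to the password collapses the two factors back into one.

import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, period: int = 30, digits: int = 6) -> str:
    # Decode the base32 secret from the otpauth:// URI (padding is optional in URIs)
    secret_b32 = secret_b32.upper()
    key = base64.b32decode(secret_b32 + "=" * (-len(secret_b32) % 8))
    # Number of completed time steps since the Unix epoch
    counter = int(time.time()) // period
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the last nibble picks a 4-byte window
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

# Feed it the throwaway secret from the example URI above and you get the same
# 6-digit code your authenticator app would show right now.
print(totp("CelwNEjn3l7SWIW5SCJT"))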

What if they break into my phone

On iOS, even without a Primary Password set in Firefox, it prompts for Face ID authentication before allowing someone access to the list of stored passwords. So that's already a pretty intense level of security. Add in a Primary Password and we've reached the same level of security as 1Password. Chrome is the same story.

It's the same level of security with Android. Attempt to open the saved passwords and you get a PIN or biometric check, depending on the phone. That's pretty good! Extra worried about it? Use a TOTP app that requires biometrics before it reveals the code. Here is one for iOS.

Even if someone steals your phone and attempts to break into your accounts, there are some non-trivial security measures in their way with the standard browser password storage combined with a free TOTP app that checks identity.

I use my password manager for more than passwords

Sure, and so do I, but that doesn't really matter to my point. The common wisdom that all users would benefit from the use of a dedicated password manager is iffy at best. We've now seen a commonly recommended one become so catastrophically breached that anything stored there now needs to be considered leaked. This isn't the first credential leak or the 10th or the 100th; there is now just a constant, never-ending parade of password leaks and cracks.

So if that is true, and a single password cannot ever truly serve as the single step of authentication for important resources, then we're always going to be relying on adding another factor. Therefore, the value a normal user gets out of a password manager vs the browser they're already using is minimal. With passkeys and the Credential Management API, the era of exposing the user to the actual values being used in the authentication step is coming to a close anyway. Keys synced by the browser vendor will become the default authentication step for users.

In the light of that reality, it doesn't really make sense to bother users with the additional work and hassle of running a new program to manage secrets.

Summary of my rant

  • Normal users don't need to worry about password managers and would be better served by using the passwords the browser generates and investing that effort into adding 2FA using a code app on their phone or a YubiKey.
  • In the face of new APIs and standards, the process of attempting to manage secrets with an external manager will become exceedingly challenging. It is going to be much, much easier to pick one browser and commit to it everywhere vs attempting to use a tool to inject all these secrets.
  • With the frequency of breaches, we've already accepted that passwords are, at best, part of a complete auth story. The best solution we have right now is "2 passwords".
  • Many of the tools users rely on to manage all their secrets aren't frequently audited, or if they are, any security assessment of their stack isn't being published.
  • For more technical users looking to store a lot of secrets for work, using something like pass will likely fulfill that need with a smaller, less complicated, and less error-prone technical implementation. It does less, so less stuff can fail.
  • If you are going to use a password manager, there are only two options: 1Password and Bitwarden. 1Password is the only one that doesn't rely exclusively on the user-supplied password, so if you are dealing with very important secrets this is the right option.
  • It is better to tell users "shared credentials are terrible and please only use them if you absolutely have no choice at all" than to set up a giant business-wide tool of shared credentials which are never rotated.

My hope is that with passkeys and the Credential Management API this isn't a forever problem. Users won't be able to export private keys, so nobody is going to be sharing accounts. The Credential Management UI and flow are so easy for developers and users that it becomes the obvious choice for any new service. My suspicion is we'll still be telling users to set up 2FA well after its practical lifespan has ended, but all we're doing is replicating the same flow as the browser password storage.

Like it or not you are gonna start to rely on the browser password manager a lot soon, so might as well get started now.

Wanna send me angry messages? What else is the internet good for! https://c.im/@matdevdug


Upgrading Kubernetes - A Practical Guide

One common question I see on Mastodon and Reddit is "I've inherited a cluster, how do I safely upgrade it". It's surprising that this still isn't a better understood process given the widespread adoption of k8s, but I've had to take over legacy clusters a few times and figured I would write up some of the tips and tricks I've found over the years to make the process easier.

A very common theme in these questions is "the version of Kubernetes is very old, what do I do". Often this question is asked with shame, but don't feel bad. K8s is better at the long-term maintenance story than it was a few years ago, but it is still a massive amount of work to keep a cluster upgraded and patched. Organizations start to fall behind almost immediately, and teams are hesitant to touch a working cluster to run the upgrades.

NOTE: A lot of this doesn't apply if you are using hosted Kubernetes. In that case, the upgrade process is documented through the provider and is quite a bit less complicated.

How often do I need to upgrade Kubernetes?

This is something people new to Kubernetes seem to miss a lot, so I figured I would touch on it. Unlike a lot of legacy infrastructure projects, k8s moves very quickly in terms of versions. Upgrading can't be treated like switching to a new Linux distro LTS release; you need to plan to do it all the time.

To be fair to the Kubernetes team, they've done a lot to help make this process less horrible. They have a support policy of N-2, meaning that the 3 most recent minor versions receive security and bug fixes. So you have time to get a cluster stood up and start the process of planning upgrades, but it needs to be in your initial cluster design document. You cannot wait until you are almost EOL to start thinking "how are we going to upgrade". Every release gets patched for 14 months, which seems like a lot, but chances are you aren't going to be installing the absolute latest release.

Current support timeline

So the answer to "how often do you need to be rolling out upgrades to Kubernetes" is often. They are targeting 3 releases a year, down from the previous 4 releases a year. You can read the projects release goals here. However in order to vet k8s releases for your org, you'll likely need to manage several different versions at the same time in different environments. I typically try to let a minor version "bake" for at least 2 weeks in a dev environment and same for stage/sandbox whatever you call the next step. Prod version upgrades should ideally have a month of good data behind them suggesting the org won't run into problems.

My staggered layout

  1. Dev cluster should be as close to bleeding edge as possible. A lot of this has to do with establishing SLAs for the dev environment, but the internal communication should look something like "we upgrade dev often during such and such a time and rely on it to surface early problems". My experience is you'll often hit some sort of serious issue almost immediately when you try to do this, which is good. You have time to fix it and know the maximum version you can safely upgrade to as of the day of testing.
  2. Staging is typically a minor release behind dev. "Doesn't this mean you can get into a situation where you have incompatible YAMLs?" It can but it is common practice at this point to use per-environment YAMLs. Typically folks are much more cost-aware in dev environments and so some of the resource requests/limits are going to change. If you are looking to implement per-environment configuration check out Kustomize.
  3. Production I try to keep as close to staging as possible. I want to keep my developers' lives as easy as possible, so I don't want to split the versions endlessly. My experience with Kubernetes patch releases has been that they're pretty conservative with changes and I rarely encounter problems. My release cadence for patches on the same minor version is two weeks in staging and then out to production.
  4. IMPORTANT. Don't upgrade the minor version until it hits patch .2 AT LEAST. What does this mean?

Right now the latest version of Kubernetes is 1.26.0. I don't consider this release ready for a dev release until it hits 1.26.2. Then I start the timer on rolling from dev -> stage -> production. By the time I get the dev upgrade done and roll to staging, we're likely at the .3 release (depending on the time of year).

That's too slow. Maybe, but I've been burned quite a few times in the past by jumping too early. It's nearly impossible for the k8s team to account for every use-case and guard against every regression, and by the time we hit .2, there tends to be wide enough testing that most issues have been discovered. A lot of people wait until .5, which is very slow (but also the safest path).

In practice this workflow looks like this:

  • Put in the calendar when releases reach EOL which can be found here.
  • Keep track of the upcoming releases and put them in the calendar as well. You can see that whole list in their repo here.
  • You also need to do this with patch releases, which typically come out monthly.
  • If you prefer to keep track of this in RSS, good news! If you add .atom to the end of the release URL, you can add it to a reader. Example: https://github.com/kubernetes/kubernetes/releases.atom. This makes it pretty easy to keep a list of all releases. You can also just subscribe in GitHub, but I find the RSS method to be a bit easier (plus it's super simple to script, which I'll publish later; a rough sketch follows this list).
  • As new releases come out, roll latest to dev once it hits .2. I typically do this as a new cluster, leaving the old cluster there in case of serious problems. Then I'll cut over deployments to the new cluster and monitor for issues. In case of massive problems, switch back to the old cluster and start the documentation process for what went wrong.
  • When I bump the dev environment, I then circle around and bump the stage environment to one minor release below that. I don't typically do a new cluster for stage (although you certainly can). There's a lot of debate in the k8s community over "should you upgrade existing vs make new". I do it for dev because I would rather upgrade often with fewer checks and have the option to fall back.
  • Finally we bump prod. For prod I rarely make a new cluster. This is a matter of personal choice and there are good arguments for starting fresh often, but I like to maintain the history in etcd and I find that with proper planning a rolling upgrade is safe.
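
As a rough illustration of how simple that release-tracking script can be, here is a sketch using feedparser (the same library the Mastodon bot later in this post uses); it assumes the atom entry titles are the release tags, which is how the GitHub feed looks today:

import feedparser

RELEASES_FEED = "https://github.com/kubernetes/kubernetes/releases.atom"

def recent_releases(limit: int = 15) -> list[str]:
    """Return the most recent Kubernetes release titles from the atom feed."""
    feed = feedparser.parse(RELEASES_FEED)
    return [entry.title for entry in feed.entries[:limit]]

if __name__ == "__main__":
    # Eyeball which minor versions have reached the .2 patch release yet
    for title in recent_releases():
        print(title)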

This feels like a giant pain in the ass.

I know. Thankfully, cloud providers tend to maintain their own versions, which buy you a lot more time, and that is typically how people are going to be running Kubernetes anyway. But I know a lot of people like to run their own clusters end to end, or just need to for various reasons. It is, however, a pain to do this all the time.

Is there an LTS version?

So there was a Kubernetes working group set up to discuss this, and their conclusion was that it didn't make sense to do one. I don't agree with this assessment, but it has been discussed.

My dream for Kubernetes would be to add a 2 year LTS version and say "at the end of two years there isn't a path to upgrade". I make a new cluster with the LTS version, push new patches as they come out and then at the end of two years know I need to make a new cluster with the new LTS version. Maybe the community comes up with some happy path to upgrade, but logistically it would be easier to plan a new cluster every 2 years vs a somewhat constant pace of pushing out and testing upgrades.

How do I upgrade Kubernetes?

  1. See if you can upgrade safely against API paths. I use Pluto. This will check to see if you are calling deprecated or removed API paths in your configuration or helm charts. Run Pluto against local files with: pluto detect-files -d. You can also check Helm with: pluto detect-helm -owide. Adding all of this to CI is also pretty trivial and something I recommend for people managing many clusters.

  2. Check your Helm releases for upgrades. Since typically things like the CNI and other dependencies like CoreDNS are installed with Helm, this is often the fastest way to make sure you are running the latest version (check patch notes to ensure they support the version you are targeting). I use Nova for this.

  3. Get a snapshot of etcd. You'll want to make sure you have a copy of the data in your production cluster in the case of a loss of all master nodes. You should be doing this anyway.

  4. Start the upgrade process. The steps to do this are outlined here.

If you are using managed Kubernetes

This process is much easier. Follow 1 + 2, set a pod disruption budget to allow for node upgrades and then follow the upgrade steps of your managed provider.

I messed up and waited too long, what do I do?

Don't feel bad, it happens ALL the time. Kubernetes is often set up by a team that is passionate about it, then that team is disbanded and maintenance becomes a secondary concern. Folks who inherit working clusters are (understandably) hesitant to break something that is working.

With k8s you need to go from minor -> minor in order, not jumping releases. So you need to basically (slowly) bump versions as you go. If you don't want to do that, your other option is to make a new cluster and migrate to it. I find for solo operators or small teams the upgrade path is typically easier but more time consuming.

The big things you need to anticipate are as follows:

  • Ingress. You need to really understand how traffic is coming into the cluster and through what systems.
  • Service mesh. Are you using one, what does it do and what version is it set at? Istio can be a BEAR to upgrade, so if you can switch to Linkerd you'll likely be much happier in the long term. However understanding what controls access to what namespaces and pods is critical to a happy upgrade.
  • CSI drivers. Do you have them, do they need to be upgraded, what are they doing?
  • CNI. Which one are you using, is it still supported, what is involved in upgrading it.
  • Certificates. By default they expire after a year. You get fresh ones with every upgrade, but you can also trigger a manual refresh whenever you like with kubeadm certs renew. If you are running an old cluster, PLEASE check the expiration dates of your client certificates now with: kubeadm certs check-expiration.
  • Do you have stateful deployments? Are they storing something, where are they storing it and how do you manage them? This would be databases, redis, message queues, applications that hold state. These are often the hardest to move or interact with during an upgrade. You can review the options for moving those here. The biggest thing is to set the pod disruption budget so that there is some minimum available during the upgrade process, as shown here (a small sketch follows this list).
  • Are you upgrading etcd? Etcd supports restoring from snapshots that are taken from an etcd process of the major.minor version, so be aware if you are going to be jumping more than a patch. Restoring might not be an option.
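
If it helps to see the pod disruption budget piece spelled out, here is a small sketch using the official kubernetes Python client; a plain YAML manifest applied with kubectl does the same thing, and the app label, name, and namespace here are made up for the example:

from kubernetes import client, config

config.load_kube_config()

# Keep at least one replica of the hypothetical "postgres" app available
# while nodes are cordoned and drained during the upgrade.
pdb = client.V1PodDisruptionBudget(
    metadata=client.V1ObjectMeta(name="postgres-pdb", namespace="default"),
    spec=client.V1PodDisruptionBudgetSpec(
        min_available=1,
        selector=client.V1LabelSelector(match_labels={"app": "postgres"}),
    ),
)

client.PolicyV1Api().create_namespaced_pod_disruption_budget(
    namespace="default", body=pdb
)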

Otherwise follow the steps above along with the official guide and you should be ok. The good news is that once you bite the bullet and get up to a current version, maintenance is easier. The bad news is that the initial EOL -> Supported path is soul-sucking and incredibly nerve-racking. I'm sorry.

I'm running a version older than 1.21 (January 2023)

So you need to do all the steps shown above to check that you can upgrade, but my guiding rule is if the version is more than 2 EOL versions ago, it's often easier to make a new cluster. You CAN still upgrade, but typically this means nodes have been running for a long time and are likely due for OS upgrades anyway. You'll likely have a more positive experience standing up a new cluster and slowly migrating over.

You'll start with fresh certificates, helm charts, node OS versions and everything else. Switching over at the load balancer level shouldn't be too bad and it can be a good opportunity to review permissions and access controls to ensure you are following the best procedures.

I hate that advice

I know. It's not my favorite thing to tell people. I'm sorry. I don't make the rules.

Note on Node OS choices

A common trend I see in organizations is to select whatever Linux distro they use for VMs as their Node OS: Debian, Ubuntu, Rocky, etc. I don't recommend this. You shouldn't think of Nodes as VMs that you SSH into on a regular basis and do things in. They're just platforms to run k8s on. I've had a lot of success with Flatcar Linux here. Upgrading the nodes is as easy as rebooting, and you can easily define things like SSH access with a nice configuration system, shown here.

With the Node OS, I would much rather get security updates more quickly and know that I have to reboot the node on a regular basis, as opposed to keeping track of traditional package upgrades and the EOL for different Linux distros and then tracking whether reboots are required. Often folks will combine Flatcar Linux with Rancher Kubernetes Engine for a super simple and reliable k8s standup process. You can see more about that here. This is a GREAT option if you are making a new cluster and want to make your life as easy as possible in the future. Check out those docs here.

If you are going to use a traditional OS, check out kured. This allows you to monitor the reboot-required flag at /var/run/reboot-required and schedule the automatic cordoning, draining, and uncordoning of the node. It also ensures only one node is touched at a time. This is the thing almost everyone forgets to do with Kubernetes: maintain the Node itself.

Conclusion

I hope this was helpful. The process of keeping Kubernetes upgraded is less terrible the more often you do it, but the key thing is to try to get as much baking time as possible for each minor release in your lower environments. If you stay on a regular schedule, the process of upgrading clusters is pretty painless and idiot-proof as long as you do some checking.

If you are reading this and think "I really want to run my own cluster but this seems like a giant nightmare" I strongly recommend checking out Rancher Kubernetes Engine with Flatcar Linux. It's tooling designed to be idiot-proof and can be easily run by a single operator or a pair. If you want to stick with kubeadm it is doable, but requires more work.

Stuck? Think I missed something obvious? Hit me up here: https://c.im/@matdevdug


Make a Mastodon Bot on AWS Free Tier

With the recent exodus from Twitter due to Elon being a deranged sociopath, many folks have found themselves moving over to Mastodon. I won't go into Mastodon except to say I've moved over there as well (@matdevdug@c.im) and have really enjoyed myself. It's a super nice community and I have a lot of hope for the ActivityPub model.

However, when I got on Mastodon I found a lot of abandoned bot accounts. These accounts, for folks who don't know, tend to do things like scrape RSS feeds and pump that information into Twitter so you can have everything in one pane of glass. Finding a derelict Ars Technica bot, I figured why not take this opportunity to make a bot of my own. While this would be very easy to do with SQLite, I wanted it to be an AWS Lambda so it wouldn't rely on some Raspberry Pi being functional (or me remembering that it was running on some instance and then accidentally terminating it because I love to delete servers).

Criteria for the project

  • Pretty idiot-proof
  • Runs entirely within the free tier of AWS
  • Set and forget

Step 1 - DynamoDB

I've never used DynamoDB before, so I figured this could be a fun challenge. I'm still not entirely sure I used it correctly. To be honest, I ran into more problems than I was expecting, given its reputation as an idiot-proof database.

You can see the simple table structure I made here.

Some things to keep in mind. Because of how DynamoDB stores numbers, the type of the number is Decimal, not int or float. This can cause some strange errors when attempting to store and retrieve ID values. You can read the conversation about it here. I ended up storing the ID as a string, which is probably not optimal for performance but did make the error go away.

When using DynamoDB, it is vital not to use scan. Query is what I ended up using for all my requests, since then I get to make lookups on my secondary tables with the key. The difference in speed during load testing, when I generated a lot of fake URLs, was pretty dramatic: hundreds of milliseconds vs tens of seconds.

Source
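
As a rough sketch of the query-versus-scan difference (boto3, with a made-up table and key name; the real table structure is behind the link earlier in this section):

import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("rss-bot-posts")  # hypothetical table name

# Query: a targeted lookup against the partition key. Returns in milliseconds
# and only bills for the items it actually reads. Note the ID is a string.
resp = table.query(KeyConditionExpression=Key("id").eq("abc123"))
posted_items = resp["Items"]

# Scan: walks (and bills for) the entire table and filters afterwards.
# Fine with ten rows, painful once the table filled up with fake URLs.
resp = table.scan(FilterExpression=Attr("id").eq("abc123"))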

Now that I've spent some time playing around with DynamoDB, I do see the appeal. The free tier is surprisingly generous. I've allocated 5 provisioned read and write capacity units, but honestly the bot needs a tiny fraction of that.

Step 2 - Write the Lambda

You can see my Python lambda here.

NOTE: This is not production-grade Python. This is hobby-level Python. Were this a work project, I would have changed some things about its design. Before you ping me about collisions, I calculate that with that big a range of random IDs to pull from, it would take ~6 thousand years of work to have a 1% probability of at least one collision. So please, for the love of all things holy, don't ping me. Wanna use UUIDs? Go for it.

For those who haven't deployed to AWS Lambda before, it's pretty easy.

  • Make sure you have Python 3.9 installed (since AWS doesn't support 3.10)
  • Copy that snippet to a directory and call it lambda_function.py
  • Change the rss_feed = to be whatever feed you want to make a bot of.
  • run python3.9 -m venv venv
  • run source venv/bin/activate
  • Then you need to install the dependencies:
    - pip install --target ./package feedparser
    - pip install --target ./package Mastodon.py
    - pip install --target ./package python-dotenv
  • You'll want to cd into the package directory and then run zip -r ../my-deployment-package.zip . to bundle the dependencies together.
  • Finally, take the actual Python file you want to run and add it to the zip archive: zip my-deployment-package.zip lambda_function.py

You can also use serverless or AWS SAM to do all of this, but I find the ZIP file is pretty idiot-proof. Then you just upload it through the AWS web interface, but hold off on doing that. Now that we have the Python environment set up, we can generate the credentials.

Step 3 - Mastodon Credentials

Now, go back into the Python virtual environment we made before, in the same directory.

  1. Run source venv/bin/activate
  2. Start the Python 3.9 REPL
  3. Run from mastodon import Mastodon
  4. Run: Mastodon.create_app('your-app-name', scopes=['read', 'write'], api_base_url="https://c.im") (note I'm using c.im but you can use any server you normally use)
  5. Follow the steps outlined here.
  6. You'll get back three values by the end: CLIENT_ID and CLIENT_SECRET from when you registered the bot with the server, and finally an ACCESS_TOKEN after you make an account for the bot and pass the email/password. (A rough sketch of this whole flow follows these steps.)

7. Copy these values to a .env file in the same directory as the lambda_function.py file from before.

CLIENT_ID=cff45dc4cdae1bd4342079c83155ce0a001a030739aa49ab45038cd2dd739ce
CLIENT_SECRET=d228d1b0571f880c0dc865522855a07a3f31f1dbd95ad81d34163e99fee
ACCESS_TOKEN=Ihisuhdiuhdsifh-OIJosdfgojsdu-RUhVgx6zCows
Example of the .env file alongside the lambda_function.py

8. Run: zip my-deployment-package.zip .env to copy the secret into the zip archive.

You can also store these as environment variables in the Lambda, but I prefer to manage them like this. Make sure the .env file is not committed to your git repo.
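
Here is a rough sketch of steps 3 through 6 in one place, assuming the standard Mastodon.py create_app/log_in flow; the app name, email, and password are placeholders:

from mastodon import Mastodon

# Step 4: register the app with the instance; returns (client_id, client_secret)
client_id, client_secret = Mastodon.create_app(
    "my-rss-bot",                      # hypothetical app name
    scopes=["read", "write"],
    api_base_url="https://c.im",
)

# Steps 5-6: log in as the bot account to obtain the ACCESS_TOKEN
mastodon = Mastodon(
    client_id=client_id,
    client_secret=client_secret,
    api_base_url="https://c.im",
)
access_token = mastodon.log_in(
    "bot@example.com",                 # the bot account's email (placeholder)
    "a-long-unique-password",          # and its password (placeholder)
    scopes=["read", "write"],
)

# These are the three values that go into the .env file
print(client_id, client_secret, access_token)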

Step 4 - Deploy

  1. Make a new AWS Lambda function with whatever name you like and ensure it has the ability to access our DynamoDB table. You can get instructions on how to do that here.
  2. Upload the ZIP by just uploading it through the web interface. It's 2 MB total so should be fine.
  3. Set up an EventBridge cron job to trigger the Lambda by following the instructions here (or see the sketch after this list).
  4. Watch as your Lambda triggers on a regular interval.
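
If you would rather wire the schedule up from code instead of the console, here is a hedged sketch with boto3; the rule name, function name, and ARN are placeholders, and the linked console instructions accomplish the same thing:

import boto3

FUNCTION_NAME = "mastodon-rss-bot"  # hypothetical Lambda name
FUNCTION_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:mastodon-rss-bot"  # placeholder

events = boto3.client("events")
lambda_client = boto3.client("lambda")

# Fire the Lambda every 30 minutes
rule = events.put_rule(Name="mastodon-bot-schedule", ScheduleExpression="rate(30 minutes)")

# Allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName=FUNCTION_NAME,
    StatementId="eventbridge-invoke",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)

# Point the rule at the Lambda
events.put_targets(
    Rule="mastodon-bot-schedule",
    Targets=[{"Id": "mastodon-rss-bot", "Arn": FUNCTION_ARN}],
)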

Step 5 - Cleanup

  1. Inside of the Mastodon bot account there are a few things you'll want to check. First you want to make sure that the following two options are selected under "Profile"

2. You'll probably want to add an alert for failures under CloudWatch Alarms. AWS has docs on how to do that here.

Conclusion

Hopefully this is a fun way of adding a simple bot to Mastodon. I've had a lot of fun interacting with the Mastodon.py library. You can see the bot I ended up making here.

If you run into problems please let me know: https://c.im/@matdevdug