Terraform Cloud Review

If I were told to go off and make a hosted Terraform product, I would probably end up with a list of features that looked something like the following:

Extremely reliable state tracking
Assistance with upgrading between versions of Terraform and providers and letting users know when it looked safe to upgrade and when there might be problems between versions
Consistent running of Terraform with a fresh container image each time, providers and versions cached on the host VM so the experience is as fast as possible
As many linting, formatting and HCL optimizations I can offer, configurable on and off
Investing as much engineering work as I can afford in providing users an experience where, unlike with the free Terraform, if a plan succeeds on Terraform Cloud, the Apply will succeed
Assisting with Workspace creation. Since we want to keep the number of resources low, seeing if we can leverage machine learning to say "we think you should group these resources together as their own workspace" and showing you how to do that
Figure out some way for organizations to interact with the Terraform resources other than just running the Terraform CLI, so users can create richer experiences for their teams through easy automation that feeds back into the global source of truth that is my incredibly reliable state tracking
Try to do whatever I can to encourage more resources in my cloud. Unlimited storage, lots of workspaces, helping people set up workspaces. The more stuff in there the more valuable it is for the org to use (and also more logistically challenging for them to cancel)

This is me would be a product I would feel confident charging a lot of money for. Terraform Cloud is not that product. It has some of these features locked behind the most expensive tiers, but not enough of them to justify the price.

I've written about my feelings around the Terraform license change before. I won't bore you with that again. However since now the safest way to use Terraform is to pay Hashicorp, what does that look like? As someone who has used Terraform for years and Terraform Cloud almost daily for a year, it's a profoundly underwhelming experience.

Currently it is a little-loved product with lots of errors and sharp edges. This is as v0.1 of a version of this as I could imagine, except the pace of development has been glacial. Terraform Cloud is a "good enough" platform that seems to understand that if you could do better, you would. Like a diner at 2 AM on the side of the highway, it's primary selling point is the fact that it is there. That and the license terms you will need to accept soon.

Terraform Cloud - Basic Walkthrough

At a high level Terraform Cloud allows organizations to centralize their Projects and Workspaces and store that state with Hashicorp. It also gives you access to a Registry for you to host your own privacy Terraform modules and use them in your workspaces. The top level options look as follows:

You may be wondering "What does Usage do?" I have no idea, as the web UI has never worked for me even though I appear to have all the permissions one could have. I have seen the following since getting my account:

I'm not sure what access I lack or if the page was intended to work. It's very mysterious in that way.

There is Explorer, which lets you basically see "what versions of things do I use across the different repos". You can't do anything with that information, like I can't say "alright well upgrade these two to the version that everyone else uses". It's also a beta feature and not one that existed when I first started using the platform.

Finally there are the Workspaces, where you spend 99% of your time.

You get some ok stats here. Up in the top left you see "Needs Attention", "Errors", "Running", "Hold" and then "Applied." Even though you may have many Workspaces, you cannot change how many you see here. 20 is the correct number I guess.

Creating a Workspace

Workspaces are either based on a repo, CLI driven or you call the API. You tell it what VCS, what repo, if you want to use the root of the repo or a sub-directory (which is good because soon you'll have too many resources for one workspace for everything). You tell it Auto Apply (which is checked by default) or Manual and when to trigger a run (whenever a change, whenever specific files in a path change or whenever you push a tag). That's it.

You can see all the runs, what their status is and basically what resources have changed or will change. Any plan that you run from your laptop also show up here. Now you don't need to manage your runs here, you can still do local, but then there is absolutely no reason to use this product. Almost all of the features rely on your runs being handled by Hashicorp here inside of a Workspace.

Workspace flow

Workspaces show you when the run was, how long the plan took, what resources are associated with this (10 resources at a time even though you might have thousands. Details links you to the last run, there are tags and run triggers. Run triggers allow you to link workspaces together, so this workspace would dependent on the output of another workspace.

The settings are as follows:

Runs is pretty straight forward. States allow you to inspect the state changes directly. So you can see the full JSON of a resource and roll back to this specific state version. This can be nice for reviewing what specifically changed on each resource, but in my experience you don't get much over looking at the actual code. But if you are in a situation where something has suddenly broken and you need a fast way of saying "what was added and what was removed", this is where you would go.

NOTE: BE SUPER CAREFUL WITH THIS

The state inspector has the potential to show TONS of sensitive data. It's all the data in Terraform in the raw form. Just be aware it exists when you start using the service and take a look to ensure there isn't anything you didn't want there.

Variables are variables and the settings allow you to lock the workspace, apply Sentinel settings, set an SSH key for downloading private modules and finally if you want changes to the VCS to trigger an action here. So for instance, when you merge in a PR you can trigger Terraform Cloud to automatically apply this workspace. Nothing super new here compared to any CI/CD system, but still it is all baked in.

That's it!

No-Code Modules

One selling point I heard a lot about, but haven't actually seen anyone use. The idea is good though, where you write premade modules and push them to your private registry. Then members of your organization can just run them to do things like "stand up a template web application stack". Hashicorp has a tutorial here that I ran though and found it to work pretty much as expected. It isn't anywhere near the level of power that I would want, compared to something like Pulumi, but it is a nice step forward for automating truly constant tasks (like adding domain names to an internal domain or provisioning some SSL certificate for testing).

Dynamic Credentials

You can link Terraform Cloud and Vault, if you use it, so you no longer need to stick long-living credentials inside of the Workspace to access cloud providers. Instead you can leverage Vault to get short-lived credentials that improve the security of the Workspaces. I ran through this and did have problems getting it worked for GCP, but AWS seemed to work well. It requires some setup inside of the actual repository, but it's a nice security improvement vs leaving production credentials in this random web application and hoping you don't mess up the user scoping.

User scoping is controlled primarily through "projects", which basically trickle down to the user level. You make a project, which has workspaces, that have their own variables and then assign that to a team or business unit. That same logic is reflected inside of credentials.

Private Registry

This is one thing Hashicorp nailed. It's very easy to hook up Terraform Cloud to allow your workspaces to access internal modules backed by your private repositories. It supports the same documentation options as public modules, tracks downloads and allows for easy versioning control through git tags. I have nothing but good things to say about this entire thing.

Sharing between organizations is something they lock at the top tier, but this seems like a very niche usecase so I don't consider it to be too big of a problem. However if you are someone looking to produce a private provider or module for your customers to use, I would reach out to Hashicorp and see how they want you to do that.

The primary value for this is just to easily store all of your IaC logic in modules and then rely on the versioning inside of different environments to roll out changes. For instance, we do this for things like upgrading a system. Make the change, publish the new version to the private registry and then slowly roll it out. Then you can monitor the rollout through git grep pretty easily.

Pricing

$0.00014 per hour per resource. So a lot of money when you think "every IAM custom role, every DNS record, every SSL certificate, every single thing in your entire organization". You do get a lot of the nice features at this "standard" tier, but I'm kinda shocked they don't unlock all the enterprise features at this price point. No-code provisioning is only available at the higher levels, as well as Drift detection, Continuous validation (checks between runs to see if anything has changed ) as well as Ephemeral workspaces. The last one is a shame, because it looks like a great feature. Set up your workspace to self-destruct at regular intervals so you can nuke development environments. I'd love to use that but alas.

Problems

Oh the problems. So the runners sometimes get "stuck", which seems to usually happen after someone cancels a job in the web UI. You'll run into an issue, try to cancel a job, fix the problem and rerun the runner only to have it get stuck forever. I've sat there and watched it try to load the modules for 45 minutes. There isn't any way I have seen to tell Terraform Cloud "this runner is broken, go get me another one". Sometimes they get stuck for an unknown reason.

Since you need to make all the plans and applies remotely to get any value out of the service, it can also sometimes cause traffic jams in your org. If you work with Terraform a lot, you know you need to run plans pretty regularly. Since you need to wait for a runner every single time, you can end up wasting a lot of time sitting there waiting for another job to finish. Again I'm not sure what triggers you getting another runner. You can self host, but then I'm truly baffled at what value this tool brings.

Even if that was an option for you and you wanted to do it, its locked behind the highest subscription tier. So I can't even say "add a self-hosted runner just for plans" so I could unstick my team. This seems like an obvious add, along with a lot more runner controls so I could see what was happening and how to avoid getting it jammed up.

Conclusion

I feel bad this is so short, but there just isn't anything else to write. This is a super bare-bones tool that does what it says on the box for a lot of money. It doesn't give you a ton of value over Spacelift or or any of the others. I can't recommend it, it doesn't work particularly well and I haven't enjoyed my time with it. Managing it vs using an S3 bucket is an experience I would describe as "marginally better". It's nice that it handles contention across team mates for me, but so do all the others at a lower price.

I cannot think of a single reason to recommend this over Spacelift, which has better pricing, better tooling and seems to have a better runner system except for the license change. Which was clearly the point of the license change. However for those evaluating options, head elsewhere. This thing isn't worth the money.