It's weird that one of the reasons you endorse AWS is that you had regular meetings with your account manager, but you then regret premium support, which is the whole reason you had regular meetings with your account manager.
unsnap_biceps 1 hour ago [-]
If you spend enough (or they think you'll spend enough), you'll get an account manager without the premium support contract, especially early in the onboarding
necubi 24 minutes ago [-]
Or if you’re a newish startup who they hope will eventually spend enough to justify it.
dangus 28 minutes ago [-]
As a counterpoint, I find our AWS super team to be a mix of 40% helpful, 40% “things we say are going over their head,” 20% attempting to upsell and expand our dependence. It’s nice that we have humans but I don’t think it’s a reason to choose it or not.
GCP’s architecture seems clearly better to me especially if you are looking to be global.
Every organization I’ve ever witnessed eventually ends up with some kind of struggle with AWS’ insane organizations and accounts nightmare.
GCP’s use of folders makes way more sense.
GCP having global VPCs is also potentially a huge benefit if you want your users to hit servers that are physically close to them. On AWS you have to architect your own solution with global accelerator which becomes even more insane if you need to cross accounts, which you’ll probably have to do eventually because of the aforementioned insanity of AWS account/organization best practices.
calmbonsai 3 hours ago [-]
This is the best post to HN in quite some time. Kudos to the detailed and structured break-down.
If the author had a Ko-Fi they would've just earned $50 USD from me.
I've been thinking of making the leap away from JIRA, and I concur on RDS, Terraform for IaC, and FaaS whenever possible. Google support is non-existent and I only recommend GCP for pure compute. I hear good things about Bigtable, but I've never used it in production.
I disagree on Slack usage, aside from the postmortem automation. Slack is just gonna be messy no matter what policies are put in place.
xyzzy_plugh 36 minutes ago [-]
Anecdotally, I've actually had pretty good interactions with GCP, including fast turnarounds on bugs that couldn't possibly affect many other customers.
unethical_ban 2 hours ago [-]
What do you use if not Slack? OP's advice is standard best practice: respect people's time by not expecting an immediate response, and use team- or function-based channels as much as possible.
Other options are email, of course, and what, Teams for instant messages?
jasonpeacock 1 hour ago [-]
I’ve always thought that forums are much better suited to corporate communications than email or chat.
They're organized by topic, inherently threaded, and asynchronous by default. You can still opt in to notifications, and the history is well organized and preserved.
jasonpeacock 1 hour ago [-]
The bullet points for using Slack basically describe email (and distribution lists).
It’s funny how we get an instant messaging platform and derive best practices that try to emulate a previous technology.
Btw, email is pretty instant.
unethical_ban 29 minutes ago [-]
I get it, email accomplishes a lot. But it "feels" like a place these days for one-off group chats, especially for people from different organizations. Realtime chat has its places and can also step in to that email role within a team. All my opinion, none too strongly held.
kstrauser 3 hours ago [-]
> Picking Terraform over Cloudformation: Endorse
I, too, prefer McDonald's cheeseburgers to ground glass mixed with rusty nails. It's not so much that I love Terraform (spelled OpenTofu) as that it's far and away the least bad tool I've used in the space.
orwin 35 minutes ago [-]
Terraform/OpenTofu is more than OK. The fact that you can use it to configure your Cisco products as well as AWS is honestly great for us. It's also a bit like Ansible: if you don't manage it carefully and separate concerns as early as possible, it starts bloating, so you have to curate it from the start.
Terragrunt is the only sane way to deploy terraform/openTofu in a professional environment though.
nine_k 45 minutes ago [-]
Any opinion on Pulumi?
gouggoug 21 minutes ago [-]
Not an opinion on Pulumi specifically, but an opinion on using imperative programming languages for infrastructure configuration: don't do it. (This includes using things like CDKTF)
Infrastructure needs to be consistent, intuitive and reproducible. Imperative languages are too unconstrained. In particular, they allow you to write code whose output is unpredictable (for example, it'd be easy to write code that creates a resource based on the current time of day...).
With infrastructure, you want predictability and reproducibility. You want to focus more on writing _what_ your infra should look like, less _how_ to get there.
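The distinction can be sketched with a toy reconciler in Python (a minimal illustration, not any real tool's API; all resource names are invented): the desired state is plain data, and planning is a pure function of desired vs. observed state.

```python
# Toy illustration of declarative infra: the desired state is plain data,
# and a "plan" step computes what must change. Nothing here is real
# Terraform/Pulumi behavior; resource names are invented.

def plan(desired: dict, actual: dict) -> dict:
    """Return the create/update/delete actions needed to reach `desired`."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "delete": sorted(actual.keys() - desired.keys()),
        "update": sorted(k for k in desired.keys() & actual.keys()
                         if desired[k] != actual[k]),
    }

desired = {
    "vpc-main": {"cidr": "10.0.0.0/16"},
    "subnet-a": {"cidr": "10.0.1.0/24"},
}
actual = {
    "vpc-main":   {"cidr": "10.0.0.0/16"},
    "subnet-old": {"cidr": "10.0.9.0/24"},
}

print(plan(desired, actual))
# -> {'create': ['subnet-a'], 'delete': ['subnet-old'], 'update': []}
```

Running the same plan twice against the same inputs always yields the same actions; that determinism is exactly the predictability being argued for.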
easterncalculus 15 minutes ago [-]
You can honestly do a lot of what people do with Terraform now just using Docker and Ansible. I'm surprised more people don't try to. Most clouds are supported, even private clouds and stuff like MAAS.
walt_grata 49 minutes ago [-]
I've been very happy using CDK for interacting with AWS. Much better than Terraform and the like.
Grimburger 1 hour ago [-]
> There are no great FaaS options for running GPU workloads
Knative on k8s works well for us; there are some oddities about it, but in general it does the job.
kaycey2022 1 hour ago [-]
Feels like a minor glimpse into what's involved in running tech companies these days. Sure this list could be much simpler, but then so would the scope of the company's offerings. So AI would offer enough accountability to replace all of this? Agents juggling million token contexts? It's kind of hard to wrap my head around.
nine_k 47 minutes ago [-]
Agents run tools, too. You can make an LLM count by means of language processing, but it's much more efficient to let it run a Python script.
By the same token, it's more efficient to let an LLM operate all these tools (and more) than to force an LLM to keep all of that on its "mind", that is, context.
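A toy sketch of the tool-running idea (no real LLM involved; the "model turn" is hard-coded purely for illustration): the model emits a tool call instead of trying to count characters token by token.

```python
# Toy sketch of "let the model run a tool". There is no real LLM here;
# `fake_model_turn` stands in for a model that emits a tool call rather
# than counting characters in its context.

def run_python_tool(expression: str) -> str:
    # Real agents sandbox this properly; eval() is only for the toy example.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"python": run_python_tool}

def fake_model_turn(question: str) -> dict:
    # A real model would decide this itself; we hard-code the tool call.
    return {"tool": "python", "args": "'strawberry'.count('r')"}

call = fake_model_turn("How many r's are in 'strawberry'?")
answer = TOOLS[call["tool"]](call["args"])
print(answer)  # -> 3
```

The point of the pattern: the model only has to produce a short, checkable expression, while the exact arithmetic happens outside its context window.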
robszumski 3 hours ago [-]
Thanks for sharing, really helpful to see your thinking. I haven't fully embraced FaaS myself but never regretted it either.
Curious to hear more about Renovate vs Dependabot. Is it complicated to debug _why_ it's making a choice to upgrade from A to B? Working on a tool to do app-specific breaking change analysis so winning trust and being transparent about what is happening is top of mind.
When were you using quay.io? In the pre-CoreOS years, CoreOS years (2014-2018), or the Red Hat years?
mwcampbell 2 hours ago [-]
I disagree on Kubernetes versus ECS. For me, the reasons to use ECS are not having to pay for a control plane, and not having to keep up with the Kubernetes upgrade treadmill.
jrjeksjd8d 3 hours ago [-]
I see you regret Datadog but there's no alternative - did you end up homebrewing metrics, or are you just living with their insane pricing model? In my experience they suck but not enough to leave.
stackskipton 2 hours ago [-]
Not the author, but Prometheus is a perfectly acceptable alternative if you don't want to go the whole OTel route.
lelandbatey 22 minutes ago [-]
Currently going through leaving DD at work. There are many potential options and many companies trying to break in. The one that calls to me spiritually: throw it all in ClickHouse (hosted ClickHouse is shockingly cheap) with a hosted HyperDX (logs and metrics UI) instance in front of it. HyperDX has its issues, but it's shocking how cheap it is to toss a couple hundred TB of logs/metrics into ClickHouse per month (compared to the king's ransom DD charges). And you can just query the raw rows, which really comes in handy for understanding some in-the-weeds metrics questions.
ink_13 36 minutes ago [-]
(2024)
Just FYI, the article is two years old.
weedhopper 3 hours ago [-]
Great post. I wouldn't even mind more details, especially about Datadog or, as others pointed out, the apparent contradiction with AWS support.
0xbadcafebee 2 hours ago [-]
Using GCP gives me the same feeling as vibe-coded source code. Technically works but deeply unsettling. Unless GCP is somehow saving you boatloads of cash, AWS is much better.
RDS is a very quick way to expand your bill, followed by EC2, followed by S3. RDS for production is great, but you should avoid the bizarre HN trope of "Postgres for everything" with RDS. It makes your database unnecessarily large, which expands your bill. Use it strategically and your cost will remain low while the database stays very stable and easy to manage. You may still end up DIYing backups. Aurora Serverless v2 is another useful way to reduce the bill. If you want to do custom fancy SQL/host/volume things, RDS Custom may enable it.
I'm starting to think Elasticache is a code smell. I see teams adopt it when they literally don't know why they're using it. Similar to the "Postgres for everything" people, they're often wasteful, causing extra cost and introducing more complexity for no benefit. If you decide to use Elasticache, Valkey Serverless is the cheapest option.
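Before reaching for Elasticache at all, it's worth checking whether a process-local cache covers the need; a minimal stdlib sketch (the expensive lookup and all names are invented for illustration):

```python
# Before adding an Elasticache cluster, ask whether a process-local cache
# covers the need. `load_profile` simulates an expensive lookup; the names
# here are invented for illustration.
from functools import lru_cache

CALLS = {"count": 0}

@lru_cache(maxsize=1024)
def load_profile(user_id: int) -> dict:
    CALLS["count"] += 1          # track how often the expensive work runs
    return {"id": user_id, "name": f"user-{user_id}"}

for _ in range(1000):
    load_profile(42)             # 999 of these are served from memory

print(CALLS["count"])  # -> 1
```

A shared cache only earns its keep when multiple hosts must see the same entries or the working set exceeds one machine's memory; otherwise it's exactly the extra cost and complexity described above.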
Always use ECR in AWS. Even if you have some enterprise artifact manager with container support... run your prod container pulls with ECR. Do not enable container scanning, it just increases your bill, nobody ever looks at the scan results.
I no longer endorse using GitHub Actions except for non-business-critical stuff. I was bullish early on with their Actions ecosystem, but the whole thing is a mess now, from the UX to the docs to the features and stability. I use it for my OSS projects but that's it. Most managed CI/CD sucks. Use Drone.io for free if you're small, use WoodpeckerCI otherwise.
Buying an IP block is a complicated and fraught thing (it may not seem like it, but eventually it is). Buy reserved IPs from AWS, keep them as long as you want, you never have to deal with strange outages from an RIR not getting the correct contact updated in the correct amount of time or some foolishness.
He mentions K8s, and it really is useful, but as a staging and dev environment. For production you run the risk of insane complexity exploding, plus the constant death march of upgrades and compatibility issues from the 12-month EOL; I would not recommend even managed K8s for prod. But for staging/dev, it's fantastic. Give your devs their own namespace (or virtual cluster, ideally) and they can go hog wild deploying infrastructure and testing apps in a protected private environment. You can spin things up and down much more easily than with typical AWS infra (no need for Terraform, just use Helm) and with less risk, and horizontal autoscaling makes it easier to save money. Compare that to the difficulty of setting up least-privilege AWS IAM to allow experiments, where you're constantly risking blowing up real infra.
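The per-developer setup can be sketched in a few lines of Python that stamp out a namespace plus a quota as JSON manifests (Kubernetes accepts JSON as well as YAML; the names and quota sizes here are invented, not recommendations):

```python
# Sketch of "namespace per developer": Kubernetes accepts JSON manifests,
# so a small script can stamp out an isolated namespace with a quota per
# dev. Names and quota numbers are illustrative only.
import json

def dev_namespace(user: str) -> list[dict]:
    ns = f"dev-{user}"
    return [
        {"apiVersion": "v1", "kind": "Namespace",
         "metadata": {"name": ns}},
        {"apiVersion": "v1", "kind": "ResourceQuota",
         "metadata": {"name": "dev-quota", "namespace": ns},
         "spec": {"hard": {"requests.cpu": "4", "requests.memory": "8Gi"}}},
    ]

manifests = dev_namespace("alice")
print(json.dumps(manifests, indent=2))  # pipe into `kubectl apply -f -`
```

The quota is what makes "go hog wild" safe: a dev can break their own namespace without starving the cluster.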
Helm is a perfectly acceptable way to quickly install K8s components; there are big libraries of apps out there on https://artifacthub.io/. A big advantage is its atomic rollouts, which make simple deploy/rollback a breeze. But ExternalSecrets is one of the most over-complicated, annoying garbage projects I've ever dealt with. It's useful, but I will fight hard to avoid it in the future. There are multiple ways to use it, all with arcane syntax, yet it actually lacks some useful functionality. I spent way too much time trying to get it to do some basic things, and troubleshooting it is difficult. Beware.
I don't see a lot of architectural advice, which is strange. You should start your startup out using every part of the AWS Well-Architected Framework that could possibly apply to your current startup. That means things like 1) multiple AWS accounts (the more the better) with a management account & security account, 2) Identity Center SSO, no IAM users for humans, 3) reserved CIDRs for VPCs, 4) a transit gateway between accounts, 5) a hard split between stage & prod, 6) an OpenVPN or WireGuard proxy on each VPC to get into private networks, 7) tagging and naming standards, with everything you build getting the tags, 8) management account policies and CloudTrail to enforce limitations on all the accounts and do things like add default protections and auditing. If you're thinking "well my startup doesn't need that": only if your startup dies will you not need it, and it will be an absolute nightmare to do it later (ever changed the wheels on a moving bus before?). And if you plan on working for more than one startup in your life, doing it once early on means it's easier the second time. Finally, if you think "well that will take too long!", we have AI now; just ask it to do the thing and it'll do it for you.
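As one concrete example of item 7, a pre-deploy tag check is only a few lines of Python (the required tag keys here are illustrative, not an AWS-mandated set):

```python
# Minimal sketch of enforcing a tagging standard before deploy. The
# required tag keys and resource names are examples, not an AWS standard.

REQUIRED_TAGS = {"Owner", "Environment", "CostCenter"}

def missing_tags(resources: dict) -> dict:
    """Map resource name -> sorted list of required tags it lacks."""
    return {
        name: sorted(REQUIRED_TAGS - tags.keys())
        for name, tags in resources.items()
        if REQUIRED_TAGS - tags.keys()
    }

resources = {
    "prod-db": {"Owner": "data", "Environment": "prod", "CostCenter": "42"},
    "scratch": {"Owner": "alice"},
}
print(missing_tags(resources))
# -> {'scratch': ['CostCenter', 'Environment']}
```

Wiring a check like this into CI (failing the build on a non-empty result) is how a tagging standard stays a standard instead of a wish.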
zbentley 10 minutes ago [-]
> Do not enable container scanning, it just increases your bill, nobody ever looks at the scan results.
God I wish that were true. Unfortunately, ECR scanning is often cheaper and easier to start consuming than buying $giant_enterprise_scanner_du_jour, and plenty of people consider free/OSS scanners insufficient.
Stupid self-inflicted problems, to be sure, but far from "nobody uses ECR scanning".
zem 3 hours ago [-]
I would love to read more about the pros and cons of using a single database, if anyone has pointers to articles
Everything in the article is an excellent point, but another big one is that schema changes become extremely difficult, because you may have unknown applications relying on that schema.
Also, at a certain point the database becomes absolutely massive, and you will need teams of DBAs for its care and feeding.
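One standard mitigation for the schema problem is the expand/contract pattern: add the new column, backfill it, and keep the old column until every reader has migrated. A stdlib SQLite sketch (table and column names invented for illustration):

```python
# Expand/contract migration sketch: old readers select `fullname`, the new
# schema wants `display_name`. Adding the new column first keeps unknown
# applications working; the old column is dropped only after every reader
# has migrated. Uses SQLite from the stdlib; all names are invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
db.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# Expand: an additive change, safe for readers that don't know about it.
db.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
db.execute("UPDATE users SET display_name = fullname")  # backfill

# Both old and new readers work during the transition window.
old_reader = db.execute("SELECT fullname FROM users").fetchone()[0]
new_reader = db.execute("SELECT display_name FROM users").fetchone()[0]
print(old_reader, new_reader)  # -> Ada Lovelace Ada Lovelace
```

The "contract" step (dropping `fullname`) is exactly what becomes impossible to schedule when you can't enumerate the readers, which is the point being made above.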