GKE Autopilot Sucks
I am a big fan of GCP and was quite excited to move to GKE Autopilot. While I am pretty familiar with creating my own clusters with Kubespray or using off-the-shelf GCP with managed node pools, I wanted Autopilot because the workload I was dealing with fluctuated in both deployments and demand, and I wanted the cost to reflect that rather than the peak node pool configuration.
Yes, I know that node pools can have surge values, but GKE also tends to over-allocate far more than needed, so I wanted to be as atomic as I could.
However, one big caveat is that using GKE Autopilot means we must also use GCP Logging & Monitoring. In fact, you may turn off your own logs, but the system logs are forcibly enabled. This seemed innocent at first, but after a couple of GKE releases it quickly became an issue.
This happened over a New Year weekend, which was quite unfortunate, as it went unnoticed. By the time action was taken, we had been billed over $400 more than we should have been.
After this fiasco, I moved our cluster back to the original Standard pool with big nodes. Even adjusting for over-provisioning, it was still more cost-effective than Autopilot. It is sad, because Autopilot might have been the more environmentally friendly choice, but the way the pricing and services are tied together makes it a worse experience. My personal takeaway is a reinforced belief in staying away from managed Kubernetes, as it is not really required in most use cases.