
# 2025-10-03 We Broke Our EKS Cluster Autoscaler During Amazon AL2023 Migration (and Fixed It)

**We Broke Our EKS Cluster Autoscaler During Amazon AL2023 Migration (and Fixed It)— Here’s What We Learned**\
<https://medium.com/@dilshanwijesooriya/we-broke-our-eks-cluster-autoscaler-during-amazon-al2023-migration-and-fixed-it-heres-what-we-learned-xxxxxxxx>

***

The article details the unexpected failure of an **Amazon EKS Cluster Autoscaler** during a migration from **Amazon Linux 2 (AL2)** to **Amazon Linux 2023 (AL2023)**.

**Key Points:**

* **Reason for Migration:**
  * AL2023 offers better performance, enhanced security, and longer support.
  * AWS ends support for EKS-optimized AL2 AMIs on November 26, 2025.
* **Problem Encountered:**
  * Cluster Autoscaler broke after switching to AL2023.
  * Symptoms included:
    * Datadog agents stopped sending metrics
    * Service crashes and failing health checks
    * Autoscaler logs showing `Unauthorized` errors
  * Root Cause:
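    * AL2023 **disables pod access to EC2 instance metadata by default**: on managed node groups IMDSv2 is required and the hop limit defaults to 1, so requests from pods that are not on the host network never reach the metadata service.
    * Any pod relying on the node’s IAM role for AWS API calls (like the autoscaler) can no longer obtain credentials, so its AWS calls fail.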
* **Solution Implemented:**
  1. **Switch to IRSA (IAM Roles for Service Accounts)** for the autoscaler.
  2. **Add Kubernetes RBAC** for necessary resource access.
  3. **Disable default service account** in the Helm deployment.
  4. **Remove the autoscaler's IAM permissions** from the nodegroup's IAM role, so access comes solely from IRSA + RBAC.
* **Lessons Learned:**
  * AL2023 requires stricter access handling; node IAM role shortcuts no longer work.
  * IRSA + RBAC is essential for production stability.
  * Test autoscaling and draining behavior in staging before production migration.
  * Consider **EKS Pod Identity** for future migrations, though IRSA is currently reliable.

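Step 1 of the solution depends on an IAM role whose trust policy allows exactly one Kubernetes service account to assume it through the cluster's OIDC provider. The sketch below builds such a trust policy as a Python dict. The function name `irsa_trust_policy`, the account ID, the OIDC provider ID, and the `kube-system`/`cluster-autoscaler` names are illustrative placeholders, not values from the article, and the ARN assumes the standard `aws` partition.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build an IAM trust policy that lets one service account assume a role.

    oidc_provider is the cluster's OIDC issuer without the "https://" prefix.
    All argument values used below are placeholders for illustration.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    # Assumes the standard "aws" partition.
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict the role to a single service account.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        account_id="111122223333",  # placeholder account ID
        oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",  # placeholder
        namespace="kube-system",
        service_account="cluster-autoscaler",
    )
    print(json.dumps(policy, indent=2))
```

The service account then needs the `eks.amazonaws.com/role-arn` annotation pointing at the role that carries this trust policy; how that annotation is set in the Helm deployment depends on the chart version in use.
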
***

#### Migration and Fix Process (Mermaid Diagram)

{% @mermaid/diagram content="sequenceDiagram
participant Dev as Developer
participant EKS as EKS Cluster
participant Autoscaler as Cluster Autoscaler
participant AWS as AWS APIs

Dev->>EKS: Migrate nodegroup to AL2023
EKS->>Autoscaler: Deploy on new AL2023 nodes
Autoscaler->>AWS: Request node/ASG info using node IAM role
AWS-->>Autoscaler: Unauthorized

Note over Autoscaler: Autoscaler fails to scale<br>Pods crash, alerts trigger

Dev->>EKS: Configure IRSA for autoscaler
Dev->>EKS: Apply RBAC for Kubernetes access
Autoscaler->>AWS: Assume IAM role via IRSA
AWS-->>Autoscaler: Authorized

Note over Autoscaler: Autoscaler resumes scaling<br>Cluster stabilizes" %}

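The failure path in the diagram starts with a pod that silently depends on the node IAM role. A rough pre-migration check, sketched below, lists pods that are not on the host network and whose service account has no IRSA annotation. The function name `pods_to_review` and the input shapes are assumptions for illustration: `pods` mimics the `items` of `kubectl get pods -A -o json`, and `service_accounts` maps `(namespace, name)` to that service account's annotations. The result is a list of candidates to review, not confirmed breakages, since many pods never call AWS APIs and EKS Pod Identity associations do not show up as annotations.

```python
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def pods_to_review(pods, service_accounts):
    """Return "namespace/name" for pods that cannot fall back to IRSA.

    pods: list of pod objects shaped like `kubectl get pods -o json` items.
    service_accounts: dict mapping (namespace, sa_name) -> annotations dict.
    """
    flagged = []
    for pod in pods:
        meta = pod["metadata"]
        spec = pod["spec"]
        # Host-network pods share the node's network namespace, so a hop
        # limit of 1 does not block their metadata requests.
        if spec.get("hostNetwork", False):
            continue
        sa_name = spec.get("serviceAccountName", "default")
        annotations = service_accounts.get((meta["namespace"], sa_name), {})
        if IRSA_ANNOTATION not in annotations:
            flagged.append(f'{meta["namespace"]}/{meta["name"]}')
    return flagged


# Hypothetical sample data, not taken from the article.
sample_pods = [
    {"metadata": {"namespace": "kube-system", "name": "cluster-autoscaler-abc"},
     "spec": {"serviceAccountName": "cluster-autoscaler"}},
    {"metadata": {"namespace": "monitoring", "name": "datadog-agent-xyz"},
     "spec": {"serviceAccountName": "datadog", "hostNetwork": True}},
    {"metadata": {"namespace": "apps", "name": "uploader-123"},
     "spec": {}},
]
sample_service_accounts = {
    ("kube-system", "cluster-autoscaler"): {
        IRSA_ANNOTATION: "arn:aws:iam::111122223333:role/example-autoscaler"
    },
    ("monitoring", "datadog"): {},
}

print(pods_to_review(sample_pods, sample_service_accounts))
```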
**Conclusion:**\
Migrating to AL2023 improves security but breaks legacy IAM patterns that depend on the node role. Cluster components that call AWS APIs, such as the autoscaler, need pod-level credentials through **IRSA + RBAC** or **EKS Pod Identity** to function correctly.

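To confirm which of the two mechanisms named in the conclusion a running pod actually uses, one option is to inspect the environment variables that each mechanism injects into the container. The sketch below is a hedged illustration: the function name `credential_source` and its return labels are made up for this example, and it only inspects the environment, so it cannot prove that the role's permissions are correct.

```python
import os


def credential_source(env=None):
    """Guess how a pod obtains AWS credentials from its environment.

    Returns "irsa", "pod-identity", or "node-role-or-none".
    """
    env = os.environ if env is None else env
    # IRSA: the pod identity webhook injects a role ARN and a token file path.
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        return "irsa"
    # EKS Pod Identity: the agent exposes a container credentials endpoint.
    if env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI") and env.get(
        "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE"
    ):
        return "pod-identity"
    # Neither is set: the SDK would fall back to instance metadata, which
    # pods on AL2023 nodes typically cannot reach.
    return "node-role-or-none"


if __name__ == "__main__":
    print(credential_source())
```

Running this inside the autoscaler pod before and after the fix should show the label change from `node-role-or-none` to `irsa`, assuming the service account annotation was applied and the pod was restarted.
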
