@osterman
Member

Summary

  • New blog post explaining why managed node groups are recommended over Fargate for EKS add-ons
  • Covers bootstrap deadlock issues, high availability challenges, and cost/flexibility benefits
  • Written in Cloud Posse voice: technically grounded, lightly opinionated, and story-driven

Test plan

  • Review blog post content for technical accuracy
  • Verify MDX formatting and component usage
  • Check that the post renders correctly on the website
  • Confirm date format matches existing posts
  • Validate author attribution in frontmatter

🤖 Generated with Claude Code

…EKS Add-Ons

This post explains the practical challenges of running EKS add-ons on Fargate-only clusters and why a small managed node group provides better reliability, cost efficiency, and automation for production environments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Combine "The Terraform Catch-22" with "The Problem with No Nodes" to eliminate duplication
- Fix logical inconsistency: clarify co-location issue is with MNG, not Fargate
- Add acknowledgment that recommendation diverges from official AWS guidance
- Add citations to AWS EKS Best Practices, Karpenter docs, and Fargate configuration docs
- Add context about why Fargate was initially attractive
- Document additional Fargate architectural constraints
- Note evolution of Karpenter's own defaults to MNG
- Add "Your Mileage May Vary" section acknowledging teams that successfully use Fargate
- Clarify that frequently-rebuilt dev clusters are worse candidates for Fargate
- Strengthen conclusion to focus on operational requirements determining choice

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

@vyrwu left a comment

Thank you for clarifying the recommendation - it makes a lot of sense!


<FeatureList>
- Use Graviton-based instances (c7g.medium) to cut costs nearly in half
- Mix On-Demand nodes for reliability and Spot nodes (via Karpenter) for efficiency

Surely the MNGs should have on-demand nodes only for stability, or is running Spot on MNGs also part of the recommendation?

@osterman requested a review from Copilot on October 16, 2025, 11:50
Contributor

Copilot AI left a comment

Pull Request Overview

Adds a new MDX blog post recommending EKS managed node groups over Fargate for running cluster add-ons, discussing bootstrap deadlocks, HA considerations, and cost/flexibility trade-offs.

  • Introduces rationale against Fargate-only setups for production.
  • Provides comparisons of operational characteristics and cost.
  • Documents scenarios where Fargate-only may still be acceptable.

Fix confusion about which component uses which instance type:
- Static MNG runs On-Demand instances for reliability of cluster-critical add-ons
- Karpenter provisions Spot instances for dynamic application workloads
- Update "Cost and Flexibility" section to clearly distinguish the two
- Update "Lessons Learned" section to specify instance types per component

This addresses the concern that mixing Spot instances in the static MNG would
undermine the reliability we're advocating for.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

The previous wording "dynamic workloads" was ambiguous and could be misread
as including cluster add-ons. This explicitly states:
- MNG with On-Demand instances = cluster add-ons (stable foundation)
- Karpenter with Spot instances = application workloads only (cost savings)

This distinction is critical to the stability argument.
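
The split described above can be pictured as a small placement check. This is only a sketch in Python: the `NodePool` type, the `ADDON`/`APP` labels, and the `violates_split` helper are hypothetical names for illustration and are not taken from the blog post, Karpenter, or any AWS API. It encodes just the one rule stated here, that cluster add-ons belong on the On-Demand managed node group and not on Karpenter-provisioned Spot capacity.

```python
from dataclasses import dataclass

# Hypothetical workload labels for this sketch only.
ADDON = "cluster-addon"
APP = "application"


@dataclass(frozen=True)
class NodePool:
    name: str
    provisioner: str    # "managed-node-group" or "karpenter"
    capacity_type: str  # "on-demand" or "spot"


def violates_split(workload_kind: str, pool: NodePool) -> bool:
    """Return True when a cluster add-on lands anywhere other than
    an On-Demand managed node group. Application workloads are not
    restricted by this check."""
    if workload_kind != ADDON:
        return False
    on_stable_foundation = (
        pool.provisioner == "managed-node-group"
        and pool.capacity_type == "on-demand"
    )
    return not on_stable_foundation


static_mng = NodePool("addons", "managed-node-group", "on-demand")
karpenter_spot = NodePool("apps", "karpenter", "spot")

print(violates_split(ADDON, static_mng))      # add-on on the MNG
print(violates_split(ADDON, karpenter_spot))  # add-on on Spot
print(violates_split(APP, karpenter_spot))    # application on Spot
```

Only the second call reports a violation, which is the case the reviewer's question about Spot on MNGs was probing.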

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

EKS Auto Mode (announced December 2024) solves the bootstrap deadlock problem
by running Karpenter and other cluster components off-cluster as AWS-managed
services. This eliminates the chicken-and-egg dependency entirely.
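
One way to picture the chicken-and-egg dependency is as a cycle in a start-up dependency graph. The sketch below uses Python's standard `graphlib` module; the component names are illustrative labels chosen for this example, not identifiers from the blog post or from AWS, and the graphs are a simplification of how a real cluster boots.

```python
from graphlib import CycleError, TopologicalSorter


def bootstrap_order(depends_on):
    """Return a valid start-up order for the components, or None
    if the dependencies form a cycle (a bootstrap deadlock)."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# In-cluster Karpenter with no other source of nodes: Karpenter's
# pods need a node to run on, and nodes only come from Karpenter.
karpenter_only = {
    "karpenter": {"nodes"},
    "nodes": {"karpenter"},
}

# A static managed node group breaks the cycle: it depends only on
# the control plane, and Karpenter runs on top of it.
with_mng = {
    "managed-node-group": {"control-plane"},
    "karpenter": {"managed-node-group"},
    "karpenter-nodes": {"karpenter"},
}

# Auto Mode, as described above: the provisioner runs off-cluster,
# so nodes no longer depend on anything scheduled in the cluster.
auto_mode = {
    "nodes": {"aws-managed-provisioner"},
}

print(bootstrap_order(karpenter_only))
print(bootstrap_order(with_mng))
print(bootstrap_order(auto_mode))
```

The first graph has no valid order, while the other two do, which is the sense in which both a managed node group and Auto Mode remove the deadlock.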

Added balanced coverage noting:
- How Auto Mode sidesteps the bootstrap problem
- Trade-offs: 12-15% cost premium, CNI lock-in, less control
- When it makes sense vs when MNG + Karpenter approach is still relevant

This provides readers with awareness of all current options.
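
To give the 12-15% premium a sense of scale, here is a rough arithmetic sketch in Python. It assumes the premium applies as a flat percentage of compute spend, which this commit message does not confirm, and the $1,000 monthly baseline is a made-up figure for illustration. Amounts are kept in integer cents to avoid floating-point rounding.

```python
def auto_mode_premium_cents(baseline_cents: int, premium_pct: int) -> int:
    """Extra monthly cost, in cents, for a flat percentage premium
    applied to a baseline compute bill (an assumption of this sketch)."""
    return baseline_cents * premium_pct // 100


# Hypothetical baseline: $1,000.00 of monthly compute spend.
baseline_cents = 100_000

low = auto_mode_premium_cents(baseline_cents, 12)
high = auto_mode_premium_cents(baseline_cents, 15)

print(f"Premium range: ${low / 100:,.2f} to ${high / 100:,.2f} per month")
```

Under these assumptions the premium works out to between $120.00 and $150.00 per month on the hypothetical baseline.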

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

@osterman merged commit 204c66e into master on Oct 16, 2025
2 of 3 checks passed
@osterman deleted the osterman/fargate-vs-mng branch on October 16, 2025, 18:39

3 participants