Skip to content

Conversation

Luay-Sol
Copy link
Collaborator

What type of PR is this?

What this PR does / why we need it:

This pull request encompasses the migration from AWS IAM Roles for Service Accounts (IRSA) to the new IRSAv2, also known as Pod Identity for Amazon EKS. The migration involves the installation of the eks-pod-identity-agent addon and updating several critical components to utilize Pod Identity instead of the traditional IRSA module.

Changes

  • Installation of eks-pod-identity-agent Addon: Added the eks-pod-identity-agent to our EKS clusters to enable the use of Pod Identity, which provides a more secure and flexible mechanism for associating AWS IAM roles with Kubernetes service accounts.
  • Migration of Components to Use Pod Identity:
    • Cluster Autoscaler: Updated the Cluster Autoscaler to leverage Pod Identity for accessing AWS resources, enhancing security and scalability.
    • AWS Load Balancer Controller: Migrated the AWS Load Balancer Controller to use Pod Identity, ensuring secure and efficient load balancing operations.
    • EBS CSI Driver: Transitioned the EBS CSI Driver to utilize Pod Identity, improving security and efficiency in managing EBS volumes.
    • VPC CNI Plugin: Updated the VPC CNI Plugin to use Pod Identity, enhancing network security and performance.

Copy link
Contributor

@nagsubhrajitt nagsubhrajitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
annotations : try(var.use_irsa_v1 ? {
"eks.amazonaws.com/role-arn" : module.cluster_autoscaler_irsa_role[0].iam_role_arn
} : null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Luay-Sol I am not sure about the syntax but AFAIK the try function takes 2 arguments. I can see that you have made the first one but which one is the fallback expression? Am I missing anything here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not real, here's the definition of try:

try evaluates all of its argument expressions in turn and returns the result of the first one that does not produce any errors.

And below an example:

Screenshot 2024-07-18 at 9 16 10 PM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one expression with a result will work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. However, if there is only one expression (so there is no fallback), can we not just use the expression?

Copy link
Contributor

@nagsubhrajitt nagsubhrajitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@nagsubhrajitt nagsubhrajitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Luay-Sol While testing existing cluster migration to use pod identity, I found an issue. We need to address that while creating the cluster as well. Can you please take a look 🙏

source = "terraform-aws-modules/eks-pod-identity/aws"
version = "1.2.1"

name = "${var.cluster_name}-ca"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to also set use_name_prefix to false because by default it uses prefix to create all resources and the prefix is having length limit of 37. Hence, the deployment will fail for long datacenter name. I found the issue when I was testing the pod identity migration for existing cluster.

source = "terraform-aws-modules/eks-pod-identity/aws"
version = "1.2.1"

name = "${var.cluster_name}-lbc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before to use use_name_prefix

source = "terraform-aws-modules/eks-pod-identity/aws"
version = "1.2.1"

name = "${var.cluster_name}-ebs-csi"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before to use use_name_prefix

source = "terraform-aws-modules/eks-pod-identity/aws"
version = "1.2.1"

name = "${var.cluster_name}-vpc-cni"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before to use use_name_prefix

@@ -335,7 +260,7 @@ resource "aws_eks_addon" "csi-driver" {
resource "aws_eks_addon" "vpc-cni" {
cluster_name = aws_eks_cluster.cluster.name
addon_name = "vpc-cni"
service_account_role_arn = module.vpc_cni_irsa_role.iam_role_arn
service_account_role_arn = var.use_irsa_v1 ? module.vpc_cni_irsa_role.iam_role_arn : null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Luay-Sol I recommend to also add addon_version argument here, otherwise it will use default version for the EKS cluster version.

The pod identity does not work if the vpc-cni version is earlier than 1.15.5-eksbuild.1. If we don't set the addon_version explicitly here, the default version for 1.27 for example, is v1.15.1-eksbuild.1 which is earlier than the required version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagsubhrajitt do you think it is required here. Considering 1.27 it is going to be out of regular support in couple day from now (July 24).

Another consideration is this project is supposed to be used for cluster creation of latest version of k8s/eks. So I don't see a situation or a reason for supporting older versions.

Reason I am on the fence about add-ons version hard coding: it is going to be challenging for us (as maintainers) for this project as well as for the users.

}

################################################################################
# IRSA v1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This header should say v2

@@ -70,3 +70,15 @@ variable "use_random_suffix_in_node_group_name" {
type = bool
default = true
}

variable "use_irsa_v1" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be better as an 'enum' instead of two booleans?

@lumberbaron lumberbaron merged commit 8630b95 into main Jul 26, 2024
3 checks passed
@lumberbaron lumberbaron deleted the pod-identity branch July 26, 2024 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants