feat(weave): Add CSI driver support for internal service authentication #373
Implements Secrets Store CSI Driver integration for AWS to enable secure, dynamic secret mounting from AWS Secrets Manager into Kubernetes.

Changes:
- Install Secrets Store CSI Driver (v1.4.7) and AWS provider (v0.3.9) via Helm in the kube-system namespace
- Create `weave_worker` IAM role with IRSA for service account authentication
- Configure IAM policies for read-only access to secrets
- Generate a random weave worker auth token and store it in AWS Secrets Manager
- Add Helm chart configuration for weave-trace-worker with CSI driver support
- Remove the direct `kubernetes_secret` resource (now synced via CSI driver)

The CSI driver automatically syncs secrets from AWS Secrets Manager to Kubernetes, enabling secret rotation and eliminating the need to manage secrets in both places. SecretProviderClass resources are created by the operator-wandb Helm chart.

This provides the infrastructure for internal service authentication, where services authenticate via Kubernetes secrets that are dynamically mounted from cloud secrets managers.
Walkthrough

Introduces the AWS Secrets Store CSI Driver and AWS provider via Helm, adds Weave worker IAM role/policies and attachments for Secrets Manager access, migrates Weave auth secret handling to AWS Secrets Manager and the Secrets Store, creates a shared S3 bucket for bufstream, and exposes related outputs and variables.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Pod as Kubernetes Pod
    participant CSI as Secrets Store CSI Driver
    participant AWS as AWS Secrets Manager
    participant IAM as IAM (weave_worker role)
    Pod->>CSI: Mount SecretProviderClass (requests secret)
    CSI->>IAM: Assume role annotation (weave_worker)
    IAM->>AWS: Call GetSecretValue / DescribeSecret
    AWS-->>IAM: Return secret value
    IAM-->>CSI: Provide secret
    CSI-->>Pod: Mount secret (optionally sync to k8s secret)
    Note right of CSI: Rotation poll interval configured (120s)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/app_eks/main.tf (1)
241-264: Remove the redundant policy attachment to the node role. The `weave_worker_auth_secret_reader` policy is unnecessary and violates IRSA principles.

The codebase already contains a dedicated `weave_worker` IAM role (iam-roles.tf:32) with a properly scoped policy `weave_worker_secrets_access` (iam-policy-docs.tf:131) attached via IRSA. This policy already grants access to all secrets matching the namespace pattern, which includes the `weave_worker_auth` secret.

The policy in main.tf:241-264 duplicates this permission at a wider scope but is incorrectly attached to the node role instead of the weave role. Attaching application-level secret access to the general node role violates least-privilege principles.
Delete the `aws_iam_policy "weave_worker_auth_secret_reader"` (lines 241-258) and the corresponding policy attachment (lines 261-264) from main.tf. The existing IRSA configuration already handles the required permissions.
🧹 Nitpick comments (1)
modules/app_eks/secrets_store/variables.tf (1)
1-9: Add default values to match parent module and clarify version specifications.

These variables lack default values, making the submodule non-functional without explicit caller input. Since the parent module (modules/app_eks/variables.tf) defines defaults (1.4.7 and 0.3.9), these submodule variables should either:
- Inherit defaults from the parent module, or
- Have their own defaults to support independent usage.
Additionally, consider documenting which Helm chart versions are compatible with your EKS cluster version.
Apply defaults matching the parent module:
variable "secrets_store_csi_driver_version" { type = string description = "The version of the Secrets Store CSI Driver Helm chart to install." + default = "1.4.7" } variable "secrets_store_csi_driver_provider_aws_version" { type = string description = "The version of the AWS Secrets Manager Provider for Secrets Store CSI Driver Helm chart to install." + default = "0.3.9" }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- main.tf (1 hunks)
- modules/app_eks/iam-policies.tf (1 hunks)
- modules/app_eks/iam-policy-docs.tf (1 hunks)
- modules/app_eks/iam-role-attachments.tf (1 hunks)
- modules/app_eks/iam-roles.tf (1 hunks)
- modules/app_eks/main.tf (2 hunks)
- modules/app_eks/outputs.tf (1 hunks)
- modules/app_eks/secrets_store/secrets_store.tf (1 hunks)
- modules/app_eks/secrets_store/variables.tf (1 hunks)
- modules/app_eks/variables.tf (1 hunks)
🔇 Additional comments (11)
modules/app_eks/iam-roles.tf (1)
31-53: Verify namespace variable usage: `var.k8s_namespace` vs. `var.namespace`

The weave_worker role condition at line 46 uses `var.k8s_namespace`, while the existing irsa role at line 22 uses `var.namespace`. This difference could cause service account authentication to fail if weave-trace-worker is deployed to an unexpected namespace.

Verify the deployment target and ensure the role condition matches where weave-trace-worker's service account will be created (see the trust-policy sketch below). Can you confirm:

- Is weave-trace-worker intentionally deployed to the `k8s_namespace` (default: "default") rather than `namespace`?
- Should both roles use the same namespace variable?
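For reference, here is a minimal sketch of how an IRSA trust condition typically pins a role to a single service account in a single namespace. The OIDC variable names and the `weave-trace-worker` service-account name are illustrative assumptions, not necessarily the exact identifiers used in iam-roles.tf:

```hcl
# Sketch only: an IRSA trust policy scoped to one service account.
data "aws_iam_policy_document" "weave_worker_assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn] # the cluster's OIDC provider (assumed variable)
    }

    condition {
      test     = "StringEquals"
      variable = "${replace(var.oidc_provider_url, "https://", "")}:sub"
      # Whichever namespace variable is used here must match the namespace
      # where weave-trace-worker's service account is actually created.
      values = ["system:serviceaccount:${var.k8s_namespace}:weave-trace-worker"]
    }
  }
}
```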
modules/app_eks/iam-role-attachments.tf (1)
63-67: LGTM – Policy attachment is correctly structured.

The use of `aws_iam_role_policy_attachment` follows best practices and correctly references the weave_worker policy defined in iam-policies.tf.

modules/app_eks/variables.tf (1)
258-268: Verify CSI driver and AWS provider versions are current and compatible.

The defaults (1.4.7 for Secrets Store CSI Driver and 0.3.9 for AWS provider) should be validated against your minimum supported EKS cluster version and against upstream security advisories.
Please verify:
- These versions are compatible with your supported EKS cluster versions (currently defined by `var.cluster_version`)
- There are no published security advisories for these Helm chart versions
You can check the upstream repositories:
- Secrets Store CSI Driver: https://github.com/kubernetes-sigs/secrets-store-csi-driver
- AWS Secrets Manager Provider: https://github.com/aws/secrets-store-csi-driver-provider-aws
main.tf (2)
430-439: LGTM – weave-trace-worker IRSA configuration is correctly structured.

The serviceAccount annotation properly references the weave_worker IAM role ARN, and secretsStore is enabled for CSI integration (a hypothetical sketch of these values follows).
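A hypothetical sketch of the shape such values take. The key names (`serviceAccount.annotations`, `secretsStore.enabled`) and the output reference are assumptions about the chart schema and module wiring, not confirmed from this diff:

```hcl
# Hypothetical values shape for the weave-trace-worker chart section.
# The IRSA annotation key is the standard EKS one; everything else is assumed.
locals {
  weave_trace_worker_values = {
    serviceAccount = {
      annotations = {
        "eks.amazonaws.com/role-arn" = module.app_eks.weave_worker_iam_role_arn
      }
    }
    secretsStore = {
      enabled = true
    }
  }
}
```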
441-452: The review comment is based on incorrect findings and should be dismissed.

All resources claimed to be missing actually exist in the codebase:

- `random_password.weave_worker_auth` is defined at modules/app_eks/main.tf:220-223 (generates a 32-character token)
- `aws_secretsmanager_secret.weave_worker_auth` is defined at modules/app_eks/main.tf:225-233
- `aws_secretsmanager_secret_version.weave_worker_auth` is defined at modules/app_eks/main.tf:235-238 and correctly stores the random token
- The `weave_worker_auth_secret_name` output is defined at modules/app_eks/outputs.tf:25-28
- `module.secrets_store` is declared at modules/app_eks/main.tf:208-217

The secretsStore configuration at lines 441-452 in main.tf correctly references these existing resources. No fixes are required (a condensed sketch of this generate-and-store pattern follows below).
Likely an incorrect or invalid review comment.
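For readers following along, a condensed sketch of the pattern these resources implement. Attribute values are illustrative, and the actual secret name in the PR later gains a `-00` suffix:

```hcl
# Sketch of the generate-and-store pattern described above.
resource "random_password" "weave_worker_auth" {
  length  = 32
  special = false
}

resource "aws_secretsmanager_secret" "weave_worker_auth" {
  # The PR uses a namespace-prefixed name (with a "-00" suffix in later commits).
  name = "${var.namespace}-weave-worker-auth"
}

resource "aws_secretsmanager_secret_version" "weave_worker_auth" {
  secret_id     = aws_secretsmanager_secret.weave_worker_auth.id
  secret_string = random_password.weave_worker_auth.result
}
```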
modules/app_eks/iam-policies.tf (1)
67-72: LGTM – weave_worker IAM policy is correctly structured.

The policy is properly defined with clear naming and references the policy document from iam-policy-docs.tf.
modules/app_eks/iam-policy-docs.tf (1)
131-140: LGTM – Policy document follows least-privilege principle with read-only access.

The policy correctly restricts the weave-worker service account to only `GetSecretValue` and `DescribeSecret` actions on namespace-prefixed secrets, which aligns with the intended use case of reading secrets via the CSI driver. A sketch of this shape follows.
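A minimal sketch of a policy document with this shape; the ARN pattern and data-source usage are assumptions, not the literal contents of iam-policy-docs.tf:

```hcl
# Read-only access to namespace-prefixed secrets, as described above.
data "aws_iam_policy_document" "weave_worker_secrets_access" {
  statement {
    effect = "Allow"
    actions = [
      "secretsmanager:GetSecretValue",
      "secretsmanager:DescribeSecret",
    ]
    # Assumed ARN pattern: all secrets prefixed with the deployment namespace.
    resources = [
      "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:${var.namespace}-*",
    ]
  }
}
```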
modules/app_eks/main.tf (2)

267-270: Clear architectural documentation for CSI-driver-based secret management.

The comment clearly explains the shift from Terraform-managed Kubernetes secrets to CSI Driver-based secret synchronization, delegating SecretProviderClass creation to the operator-wandb Helm chart. This decoupling avoids CRD timing issues and cleanly separates infrastructure (Terraform) from application-level configuration (Helm).
208-217: Code changes are correct; variables properly declared and configured.

Both variables are correctly declared in the parent module with sensible defaults (versions 1.4.7 and 0.3.9) and descriptions, and the submodule properly receives them. The module instantiation correctly passes the variables via `var.secrets_store_csi_driver_version` and `var.secrets_store_csi_driver_provider_aws_version`. No changes needed.

modules/app_eks/secrets_store/secrets_store.tf (2)
2-23: CSI driver Helm configuration is solid.

The chart is from the official kubernetes-sigs repository, and the configuration enables both secret synchronization and rotation with a reasonable 120-second poll interval. The version is correctly parameterized for flexibility.
26-36: AWS provider Helm release is correctly sequenced and configured.

The chart is from the official AWS repository, the version is parameterized, and the explicit `depends_on` ensures the base CSI driver installs before the AWS provider. This dependency ordering is critical and correctly implemented. A sketch of this two-release pattern follows.
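A minimal sketch of the two-release pattern, assuming the upstream chart repositories and value keys documented by the respective projects; this is illustrative, not the module's exact code:

```hcl
# Base CSI driver with secret sync and rotation enabled (120s poll).
resource "helm_release" "secrets_store_csi_driver" {
  name       = "secrets-store-csi-driver"
  namespace  = "kube-system"
  repository = "https://kubernetes-sigs.github.io/secrets-store-csi-driver/charts"
  chart      = "secrets-store-csi-driver"
  version    = var.secrets_store_csi_driver_version

  set {
    name  = "syncSecret.enabled"
    value = "true"
  }
  set {
    name  = "enableSecretRotation"
    value = "true"
  }
  set {
    name  = "rotationPollInterval"
    value = "120s"
  }
}

# AWS provider plugin; must install after the base driver.
resource "helm_release" "secrets_store_csi_driver_provider_aws" {
  name       = "secrets-store-csi-driver-provider-aws"
  namespace  = "kube-system"
  repository = "https://aws.github.io/secrets-store-csi-driver-provider-aws"
  chart      = "secrets-store-csi-driver-provider-aws"
  version    = var.secrets_store_csi_driver_provider_aws_version

  depends_on = [helm_release.secrets_store_csi_driver]
}
```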
| output "weave_worker_auth_secret_name" { | ||
| value = aws_secretsmanager_secret.weave_worker_auth.name | ||
| description = "Name of the AWS Secrets Manager secret containing the weave worker auth token (used by SecretProviderClass)" | ||
| } | ||
|
|
||
| output "weave_worker_iam_role_arn" { | ||
| value = aws_iam_role.weave_worker.arn | ||
| description = "ARN of the IAM role for weave worker service accounts to access AWS Secrets Manager via CSI driver" | ||
| } |
CRITICAL: Output depends on undefined aws_secretsmanager_secret.weave_worker_auth resource.
The weave_worker_auth_secret_name output correctly exposes the secret name for CSI driver integration, but it references a resource that doesn't exist in the provided code (aws_secretsmanager_secret.weave_worker_auth). This will cause Terraform to fail during planning/apply.
See the related comment in main.tf for the critical fix required to define this secret resource.
The weave_worker_iam_role_arn output is correctly implemented and properly exposes the role ARN for IRSA configuration.
🤖 Prompt for AI Agents
In modules/app_eks/outputs.tf around lines 25-33, the output
weave_worker_auth_secret_name references a non-existent resource
aws_secretsmanager_secret.weave_worker_auth; either create that Secrets Manager
resource or change the output to point to the correct existing secret. Fix by
adding an aws_secretsmanager_secret "weave_worker_auth" (and corresponding
aws_secretsmanager_secret_version if needed) in the module/main.tf with an
appropriate name and secret_string, or update the output to reference the actual
resource/data source name used in the codebase (e.g.,
aws_secretsmanager_secret.<existing_name>.name or
data.aws_secretsmanager_secret.<existing_name>.name). Ensure the resource is
created in the same module or exported into this module so the output has a
valid reference.
This reverts commit ac3d04a.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/app_eks/main.tf (1)
225-227: Increase the recovery window for this critical auth secret from immediate deletion to a recoverable window.

The code correctly uses `recovery_window_in_days = 0` to request immediate deletion via AWS Secrets Manager, but this is an operational risk. For a critical Weave worker authentication token (used by the CSI Driver), an accidental `terraform destroy` or state corruption would result in permanent, irreversible secret loss. Change to a non-zero recovery window (7–30 days), such as `recovery_window_in_days = 7`, to allow emergency recovery.
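For illustration, the suggested change would look roughly like this (other arguments elided):

```hcl
resource "aws_secretsmanager_secret" "weave_worker_auth" {
  # ...
  # Keeps the secret recoverable for a week after deletion, at the cost of
  # blocking immediate re-creation under the same name.
  recovery_window_in_days = 7
}
```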
🧹 Nitpick comments (1)
modules/app_eks/main.tf (1)
241-243: Add tags to the IAM policy for consistency.

Other resources in this file (e.g., `aws_secretsmanager_secret`) include `TerraformNamespace` and `TerraformModule` tags. Consider adding the same tags to the IAM policy for organizational consistency and resource tracking.

```diff
 resource "aws_iam_policy" "weave_worker_auth_secret_reader" {
   name        = "${var.namespace}-weave-worker-auth-secret-reader"
   description = "Allow reading weave worker auth secret from Secrets Manager"
+
+  tags = {
+    TerraformNamespace = var.namespace
+    TerraformModule    = "terraform-aws-wandb/module/app_eks"
+  }

   policy = jsonencode({
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- modules/app_eks/main.tf (2 hunks)
🔇 Additional comments (4)
modules/app_eks/main.tf (4)
208-217: Secrets Store module addition looks good.

The dependency on `module.eks` is correctly placed, and version parameters are properly threaded through.
220-238: Verify secret generation and lifecycle.

The random password is generated once per state and stored in Secrets Manager. Confirm that:

- The 32-character, non-special token meets the security requirements for weave-worker authentication.
- The secret rotation strategy is documented (currently the token only rotates if the `random_password` resource is recreated); one scheduled-rotation option is sketched below.
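If scheduled rotation is ever wanted, one possible approach (assumed, not part of this PR) is to tie the password's `keepers` to a `time_rotating` resource from the hashicorp/time provider; rotation would then still only take effect on the next `terraform apply`:

```hcl
# Regenerate the token roughly every 90 days on the next apply.
resource "time_rotating" "weave_worker_auth" {
  rotation_days = 90
}

resource "random_password" "weave_worker_auth" {
  length  = 32
  special = false

  # Changing the keeper forces a new password to be generated.
  keepers = {
    rotation = time_rotating.weave_worker_auth.id
  }
}
```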
241-264: Verify IAM policy attachment scope aligns with IRSA intent.

The policy is attached to `aws_iam_role.node` (line 262), which grants all pods on any worker node read access to the weave-worker secret. However, the PR objectives state the intent is IRSA ("weave_worker IAM role with IRSA for service account authentication").

If the goal is fine-grained, pod-level access (standard IRSA practice), the policy should instead attach to a dedicated `weave_worker` IRSA role that only the weave-worker service account assumes—not the broad node role.

Confirm whether this attachment should target the node role (broad) or a dedicated IRSA role (fine-grained). If IRSA is intended, verify the policy attaches to the correct role (likely defined in a separate file like `iam-roles.tf` or `iam-role-attachments.tf`).
267-270: Comment clarification is helpful.

The updated comment clearly explains the architecture shift to CSI driver-based secret management and the benefits (elimination of dual management, automatic rotation). This is good context for future maintainers.
Actionable comments posted: 3
🧹 Nitpick comments (1)
bufstream-shared-bucket.tf (1)
44-47: Document the security implications of attaching the policy to the node role.

This policy is attached to the EKS node role, which means all pods on all nodes across all namespaces gain access to this shared bucket. This is a broad blast radius. Ensure this is the intended security model.
Consider alternatives:
- Use IRSA (IAM Roles for Service Accounts) to limit access to specific service accounts/namespaces (a rough sketch of this alternative appears after the diff below)
- Document why all-namespace access is required for this shared resource
Add a comment explaining the security tradeoff:
```diff
 # Attach policy to the node role (used by all pods across all namespaces)
+# WARNING: This grants access to all workloads on the cluster. If finer-grained
+# access control is needed, consider using IRSA (IAM Roles for Service Accounts)
+# instead.
 resource "aws_iam_role_policy_attachment" "bufstream_shared" {
```
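A rough sketch of the IRSA alternative mentioned in the list above, with assumed role and service-account names; the trust-policy data source is only referenced here, not shown:

```hcl
# Dedicated role that only the bufstream service account can assume,
# instead of attaching the bucket policy to the shared node role.
resource "aws_iam_role" "bufstream" {
  name = "${var.namespace}-bufstream"
  # OIDC trust document scoped to the bufstream service account (assumed).
  assume_role_policy = data.aws_iam_policy_document.bufstream_assume.json
}

resource "aws_iam_role_policy_attachment" "bufstream_shared" {
  role       = aws_iam_role.bufstream.name
  policy_arn = aws_iam_policy.bufstream_shared.arn
}
```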
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- bufstream-shared-bucket.tf (1 hunks)
- data.tf (1 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
bufstream-shared-bucket.tf
[medium] 6-14: Ensure that an S3 bucket has a lifecycle configuration
(CKV2_AWS_61)
🪛 GitHub Actions: Terraform Lint and Format Check
data.tf
[warning] 14-14: Data 'aws_caller_identity' 'current' is declared but not used (terraform_unused_declarations).
🔇 Additional comments (1)
bufstream-shared-bucket.tf (1)
7-7: Clarify if the "-sa-demo" bucket naming is appropriate for production.

The bucket name includes "-sa-demo", which typically suggests demo or testing code. Verify this is intentional for production infrastructure. If this is meant for production, use a more descriptive, production-appropriate name.
| resource "aws_s3_bucket" "bufstream_shared" { | ||
| bucket = "${var.namespace}-bufstream-sa-demo" | ||
| force_destroy = false | ||
|
|
||
| tags = { | ||
| Namespace = var.namespace | ||
| Role = "bufstream-shared-bucket" | ||
| } | ||
| } |
Add a lifecycle configuration to the S3 bucket.
The Checkov static analysis flagged this S3 bucket for missing lifecycle configuration (CKV2_AWS_61). Production S3 buckets should include lifecycle policies for automatic cleanup, retention, or transition of objects.
Apply this diff to add a basic lifecycle configuration:
resource "aws_s3_bucket" "bufstream_shared" {
bucket = "${var.namespace}-bufstream-sa-demo"
force_destroy = false
tags = {
Namespace = var.namespace
Role = "bufstream-shared-bucket"
}
}
+
+resource "aws_s3_bucket_lifecycle_configuration" "bufstream_shared" {
+ bucket = aws_s3_bucket.bufstream_shared.id
+
+ rule {
+ id = "delete-old-objects"
+ status = "Enabled"
+
+ expiration {
+ days = 90
+ }
+ }
+}Adjust the retention period based on your requirements.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resource "aws_s3_bucket" "bufstream_shared" { | |
| bucket = "${var.namespace}-bufstream-sa-demo" | |
| force_destroy = false | |
| tags = { | |
| Namespace = var.namespace | |
| Role = "bufstream-shared-bucket" | |
| } | |
| } | |
| resource "aws_s3_bucket" "bufstream_shared" { | |
| bucket = "${var.namespace}-bufstream-sa-demo" | |
| force_destroy = false | |
| tags = { | |
| Namespace = var.namespace | |
| Role = "bufstream-shared-bucket" | |
| } | |
| } | |
| resource "aws_s3_bucket_lifecycle_configuration" "bufstream_shared" { | |
| bucket = aws_s3_bucket.bufstream_shared.id | |
| rule { | |
| id = "delete-old-objects" | |
| status = "Enabled" | |
| expiration { | |
| days = 90 | |
| } | |
| } | |
| } |
🧰 Tools
🪛 Checkov (3.2.334)
[medium] 6-14: Ensure that an S3 bucket has a lifecycle configuration
(CKV2_AWS_61)
🤖 Prompt for AI Agents
In bufstream-shared-bucket.tf around lines 6 to 14, the S3 bucket resource lacks
a lifecycle configuration (Checkov CKV2_AWS_61); add a lifecycle_rule block to
the aws_s3_bucket "bufstream_shared" resource with a descriptive id, enabled =
true, and at minimum an expiration days value (e.g., 365) to automatically
remove objects, and optionally noncurrent_version_expiration, transition rules,
and abort_incomplete_multipart_upload for multipart cleanup; adjust the
expiration/transition days to match your retention requirements.
```hcl
policy = jsonencode({
  Version = "2012-10-17"
  Statement = [
    {
      Sid    = "AllowAllS3ActionsOnSharedBufstreamBucket"
      Effect = "Allow"
      Action = [
        "s3:*"
      ]
      Resource = [
        aws_s3_bucket.bufstream_shared.arn,
        "${aws_s3_bucket.bufstream_shared.arn}/*"
      ]
    }
  ]
})
```
🛠️ Refactor suggestion | 🟠 Major
Restrict IAM permissions to specific S3 actions.
The IAM policy grants s3:* (all S3 actions), which is overly permissive. Restrict this to only the actions needed by the workloads accessing this bucket (e.g., s3:GetObject, s3:PutObject).
Apply this diff to restrict permissions:
```diff
   Statement = [
     {
       Sid    = "AllowAllS3ActionsOnSharedBufstreamBucket"
       Effect = "Allow"
-      Action = [
-        "s3:*"
-      ]
+      Action = [
+        "s3:GetObject",
+        "s3:PutObject",
+        "s3:DeleteObject",
+        "s3:ListBucket"
+      ]
       Resource = [
         aws_s3_bucket.bufstream_shared.arn,
         "${aws_s3_bucket.bufstream_shared.arn}/*"
       ]
     }
   ]
```

Adjust the action list based on what your workloads actually need to perform.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```hcl
policy = jsonencode({
  Version = "2012-10-17"
  Statement = [
    {
      Sid    = "AllowAllS3ActionsOnSharedBufstreamBucket"
      Effect = "Allow"
      Action = [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ]
      Resource = [
        aws_s3_bucket.bufstream_shared.arn,
        "${aws_s3_bucket.bufstream_shared.arn}/*"
      ]
    }
  ]
})
```
🤖 Prompt for AI Agents
In bufstream-shared-bucket.tf around lines 21 to 36, the IAM policy currently
allows "s3:*" which is overly broad; replace it with a minimal set of S3 actions
required by the workloads (for example s3:GetObject and s3:PutObject for object
access, add s3:ListBucket for listing and s3:DeleteObject only if deletes are
required), and scope object-level actions to the bucket objects ARN
("${aws_s3_bucket.bufstream_shared.arn}/*") while keeping bucket-level actions
(like ListBucket) on the bucket ARN (aws_s3_bucket.bufstream_shared.arn); adjust
the final action list to exactly match what your workloads need.
```hcl
data "aws_region" "current" {}

data "aws_caller_identity" "current" {}
```
🧩 Analysis chain
Verify that the data source is used elsewhere in the module.
The Terraform linter reports this data source as unused. While the AI summary mentions it's used in modules/app_eks/iam-policy-docs.tf, please confirm this is actually referenced, as cross-file linter detection can be incomplete.
Run the following script to verify usage:
🏁 Script executed:

```shell
#!/bin/bash
# Search for references to aws_caller_identity.current across the modules directory
rg -n 'aws_caller_identity\.current' modules/ --type tf
```

Length of output: 2042
I can see the data source is actively used across many modules. Now I need to verify where aws_caller_identity is declared to ensure the configuration is correct.
---

🏁 Script executed:

```shell
#!/bin/bash
# Search for all declarations of aws_caller_identity data source
rg -n 'data "aws_caller_identity"' . --type tf
```

Length of output: 338
I need to check if the root module itself uses this data source, which would justify keeping it in data.tf.
---

🏁 Script executed:

```shell
#!/bin/bash
# Search for aws_caller_identity usage at root level (excluding modules/)
rg -n 'data\.aws_caller_identity\.current' . --type tf -g '!modules/**'
```

Length of output: 0
Remove the unused data source from data.tf line 14.
The data source declaration is unused at the root module level. In Terraform, child modules cannot access parent module data sources, and verification confirms each module that needs aws_caller_identity declares it locally. The root-level declaration provides no value and should be deleted.
🧰 Tools
🪛 GitHub Actions: Terraform Lint and Format Check
[warning] 14-14: Data 'aws_caller_identity' 'current' is declared but not used (terraform_unused_declarations).
🤖 Prompt for AI Agents
In data.tf around line 14, the root-module declaration data
"aws_caller_identity" "current" is unused and should be removed; delete that
entire data source block from data.tf so there is no unused root-level
aws_caller_identity declaration (child modules should continue to declare their
own local aws_caller_identity as needed).
Force-pushed from e2cc2ef to de023ec.
Actionable comments posted: 0
♻️ Duplicate comments (3)
data.tf (1)
14-14: Remove the unused root-level data source.

This `aws_caller_identity` data source at the root module level is unused and causes the build to fail. The tflint warning persists because child modules cannot access parent module data sources directly; each module that needs this data must declare it locally.

Apply this diff to remove it:

```diff
-data "aws_caller_identity" "current" {}
```

If `modules/app_eks/iam-policy-docs.tf` or other child modules need the AWS account ID, they should declare their own `data "aws_caller_identity" "current"` locally.

bufstream-shared-bucket.tf (2)
6-14: Add a lifecycle configuration to the S3 bucket.

The S3 bucket is missing a lifecycle policy. This was flagged in the prior review and remains unresolved. Production buckets should include lifecycle policies for automatic cleanup, retention, or transition of objects to prevent unbounded growth.
Apply this diff to add a lifecycle configuration:
resource "aws_s3_bucket" "bufstream_shared" { bucket = "${var.namespace}-bufstream-sa-demo" force_destroy = false tags = { Namespace = var.namespace Role = "bufstream-shared-bucket" } } + +resource "aws_s3_bucket_lifecycle_configuration" "bufstream_shared" { + bucket = aws_s3_bucket.bufstream_shared.id + + rule { + id = "delete-old-objects" + status = "Enabled" + + expiration { + days = 90 + } + } +}Adjust the retention period (
days) based on your actual requirements.
21-36: Restrict IAM permissions to only required S3 actions.

The IAM policy grants `s3:*` (all S3 actions), which exceeds the principle of least privilege. This was flagged in the prior review and remains unaddressed. Scope this policy to the specific S3 operations your workloads actually require.

Apply this diff to restrict permissions:

```diff
 Action = [
-  "s3:*"
+  "s3:GetObject",
+  "s3:PutObject",
+  "s3:ListBucket"
 ]
```

Include `s3:DeleteObject` only if the workload requires object deletion. Adjust this list to match your actual workload needs.
🧹 Nitpick comments (1)
modules/app_eks/main.tf (1)
267-270: Confirm CSI driver secret provisioning is properly configured.

The comment indicates secrets are now provisioned by the Secrets Store CSI Driver via the `operator-wandb` Helm chart through `SecretProviderClass` resources. Verify that:

- The operator-wandb Helm chart creates the appropriate `SecretProviderClass` resource pointing to this secret in AWS Secrets Manager (a rough sketch of what such a resource needs to contain is included below).
- The weave-trace-worker pod spec references the CSI driver volume to mount the secret at the expected path.
- The secret name suffix (`-00`) matches what the Helm chart references in the `SecretProviderClass`.

Would you like me to search for the operator-wandb chart configuration to verify the CSI driver integration is complete?
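For illustration only, roughly what such a `SecretProviderClass` needs to carry, expressed here as a Terraform `kubernetes_manifest` even though in this PR it is created by the Helm chart, not Terraform. Every name below (namespace, secret names, key) is a hypothetical placeholder:

```hcl
# Illustrative only: in the PR the operator-wandb Helm chart owns this object.
resource "kubernetes_manifest" "weave_worker_auth_spc" {
  manifest = {
    apiVersion = "secrets-store.csi.x-k8s.io/v1"
    kind       = "SecretProviderClass"
    metadata = {
      name      = "weave-worker-auth"
      namespace = "default" # must match where weave-trace-worker runs
    }
    spec = {
      provider = "aws"
      parameters = {
        # objectName must match the AWS Secrets Manager name exactly,
        # including any suffix such as "-00".
        objects = yamlencode([
          {
            objectName = "example-namespace-weave-worker-auth-00" # placeholder
            objectType = "secretsmanager"
          }
        ])
      }
      # Sync into a Kubernetes Secret so pods can consume it conventionally.
      secretObjects = [
        {
          secretName = "weave-worker-auth"
          type       = "Opaque"
          data = [
            {
              objectName = "example-namespace-weave-worker-auth-00" # placeholder
              key        = "token"
            }
          ]
        }
      ]
    }
  }
}
```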
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- bufstream-shared-bucket.tf (1 hunks)
- data.tf (1 hunks)
- modules/app_eks/main.tf (2 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
bufstream-shared-bucket.tf
[medium] 6-14: Ensure that an S3 bucket has a lifecycle configuration
(CKV2_AWS_61)
🪛 GitHub Actions: Terraform Lint and Format Check
data.tf
[error] 14-14: tflint: Warning: [Fixable] data "aws_caller_identity" "current" is declared but not used (terraform_unused_declarations). This fix caused the task to fail with exit code 2.
🔇 Additional comments (3)
modules/app_eks/main.tf (3)
225-238: Secrets infrastructure setup looks good.

The secret resource name change (suffix `-00`), secret version creation, and IAM policy scoping are all correct. The policy grants read-only access (`GetSecretValue`, `DescribeSecret`) scoped specifically to the weave worker auth secret ARN, which follows least-privilege principles.

241-264: IAM policy attachment correctly scoped.

The policy is appropriately attached to the node IAM role, enabling all pods on those nodes to retrieve the secret. The policy itself is narrowly scoped to the single weave worker secret and read-only operations.

208-217: Variables are properly defined.

Both `secrets_store_csi_driver_version` and `secrets_store_csi_driver_provider_aws_version` are declared in `modules/app_eks/variables.tf` with appropriate descriptions and defaults (`1.4.7` and `0.3.9` respectively). The module instantiation is correct.
Force-pushed from de023ec to 4e07cdc.
Actionable comments posted: 1
♻️ Duplicate comments (2)
bufstream-shared-bucket.tf (2)
6-14: Add lifecycle configuration to the S3 bucket (flagged in prior review).

The bucket lacks a lifecycle configuration (Checkov CKV2_AWS_61). Production S3 buckets should include lifecycle policies for automatic cleanup or retention management.
Apply this diff to add a basic lifecycle configuration:
resource "aws_s3_bucket" "bufstream_shared" { bucket = "${var.namespace}-bufstream-sa-demo" force_destroy = false tags = { Namespace = var.namespace Role = "bufstream-shared-bucket" } } + +resource "aws_s3_bucket_lifecycle_configuration" "bufstream_shared" { + bucket = aws_s3_bucket.bufstream_shared.id + + rule { + id = "delete-old-objects" + status = "Enabled" + + expiration { + days = 90 + } + } +}Adjust retention based on your data retention requirements.
27-29: Restrict IAM permissions to specific S3 actions (flagged in prior review).

The policy grants `s3:*` (all S3 actions), which exceeds the principle of least privilege. Restrict this to only the actions workloads actually need.

Apply this diff to restrict permissions:

```diff
 Action = [
-  "s3:*"
+  "s3:GetObject",
+  "s3:PutObject",
+  "s3:DeleteObject",
+  "s3:ListBucket"
 ]
```

Adjust the action list to match what your workloads actually require.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- bufstream-shared-bucket.tf (1 hunks)
- data.tf (1 hunks)
- modules/app_eks/main.tf (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- data.tf
🧰 Additional context used
🪛 Checkov (3.2.334)
bufstream-shared-bucket.tf
[medium] 6-14: Ensure that an S3 bucket has a lifecycle configuration
(CKV2_AWS_61)
🔇 Additional comments (4)
modules/app_eks/main.tf (4)
208-217: Secrets Store CSI Driver module integration looks correct.

The module is properly placed, depends on `module.eks`, and passes version inputs. This setup aligns with the PR objectives for dynamic secret mounting from AWS Secrets Manager.

241-258: IAM policy for secret access is properly scoped.

The policy correctly restricts permissions to only `secretsmanager:GetSecretValue` and `secretsmanager:DescribeSecret` on the specific secret ARN, following the principle of least privilege.

261-264: Verify that node-role-level secret attachment aligns with intended access scope.

The weave-worker auth secret policy is attached to the node role, granting all pods in the cluster access to read the secret. Confirm this cluster-wide access scope is intentional and acceptable for this use case, or consider using IRSA with a dedicated service account if stricter access control is needed.
This design choice depends on the threat model and workload isolation requirements. If only specific workloads should access this secret, you may want to:
- Create a dedicated service account for weave-worker
- Use IRSA (IAM Roles for Service Accounts) to limit access
- Create a separate IAM role with only the necessary permissions
Confirm whether cluster-wide access via node role is the intended design.
267-270: Comment is factually accurate; verification of external dependency required.

The comment correctly documents that the Secrets Store CSI Driver manages Kubernetes secrets via the SecretProviderClass defined in the operator-wandb Helm chart. Within this repository, the infrastructure is properly set up: the `weave-worker-auth` secret is created in AWS Secrets Manager (modules/app_eks/main.tf:226), and the delegation to the operator-wandb Helm chart is explicitly documented in modules/app_eks/secrets_store/secrets_store.tf:38–40.

However, the operator-wandb Helm chart is an external dependency hosted at https://github.com/wandb/helm-charts. Verification that this external chart correctly defines the SecretProviderClass for `weave-worker-auth` cannot be completed within this repository alone. Ensure the operator-wandb Helm chart version used in your deployment includes the proper SecretProviderClass configuration.
```diff
 resource "aws_secretsmanager_secret" "weave_worker_auth" {
-  name = "${var.namespace}-weave-worker-auth"
+  name = "${var.namespace}-weave-worker-auth-00"
```
🧩 Analysis chain
Verify the weave-worker secret naming convention.
The secret name was changed to include a -00 suffix: "${var.namespace}-weave-worker-auth-00". Confirm this naming is intentional (e.g., for versioning or multi-tenant isolation) rather than a temporary/test name.
🏁 Script executed:

```shell
#!/bin/bash
# Description: Verify the weave-worker secret name is referenced consistently in the codebase.
# Expected: The "-00" suffix should appear in references or be a documented versioning scheme.

# Search for references to weave-worker-auth in the codebase
rg -n 'weave-worker-auth' --type=tf --type=py --type=yaml
```

Length of output: 314
🏁 Script executed:

```shell
#!/bin/bash
# Get more context around weave-worker-auth references
echo "=== Context around modules/app_eks/main.tf lines 220-250 ==="
sed -n '220,250p' modules/app_eks/main.tf

echo -e "\n=== Context around main.tf lines 440-450 ==="
sed -n '440,450p' main.tf

echo -e "\n=== Check for any -00 pattern in the repo ==="
rg '\-00' --type=tf --type=py --type=yaml -n | head -20
```

Length of output: 1503
Remove the -00 suffix from the secret name in line 226—it appears unintentional and is not referenced elsewhere in the codebase.
The AWS Secrets Manager secret name includes a -00 suffix that does not appear in any other references or follow a documented versioning pattern. AWS Secrets Manager has built-in versioning; the -00 suffix is non-standard. Either:
- Remove the suffix: `"${var.namespace}-weave-worker-auth"`, or
- Document the versioning scheme if intentional and implement it consistently across all related resources.
🤖 Prompt for AI Agents
In modules/app_eks/main.tf around line 226, the AWS Secrets Manager secret name
includes an unintended "-00" suffix; update the resource to remove the suffix so
the name becomes "${var.namespace}-weave-worker-auth" (or, if the suffix is
intentional, add a comment documenting the versioning scheme and make the same
suffix change consistently across all related resources and references such as
outputs, variables, IAM policies, or other modules). Ensure you search the repo
for the old name and update any references to the new name to avoid mismatches.