Double check that k8s cert file was copied #11461
base: master
Was seeing the very first felix container fail to start in some runs, with a failure to mount the cert. Check that we can stat/open/sync the file after copying it out of the API server's container.
Branch updated from 544ddeb to 2b6008d.
Pull request overview
This PR addresses intermittent FV test failures where the first felix container occasionally fails to access the Kubernetes API server certificate file. The fix adds verification steps after copying the certificate and switches to --mount for better error diagnostics.
Key Changes:
- Enhanced the cert file copy retry loop with additional verification steps (stat, open, sync)
- Switched from -v to --mount syntax for Docker volume bindings to improve error reporting
- Added cleanup of the cert file before each retry attempt
  "-e", "KUBERNETES_MASTER=" + kds.Endpoint,
  "-e", "K8S_INSECURE_SKIP_TLS_VERIFY=true",
- "-v", kds.CertFileName + ":/tmp/apiserver.crt",
+ "--mount", fmt.Sprintf("type=bind,source=%s,target=%s", kds.CertFileName, "/tmp/apiserver.crt"),
Copilot (AI), Nov 28, 2025
[nitpick] The --mount syntax is missing the readonly option. Since this is a certificate file that should not be modified by the container, consider adding readonly for better security:
    "--mount", fmt.Sprintf("type=bind,source=%s,target=%s,readonly", kds.CertFileName, "/tmp/apiserver.crt"),
This prevents accidental modification of the certificate file from within the container.
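As a sketch of the reviewer's suggestion, a hypothetical bindMount helper could build the --mount value with an optional readonly flag. It does not quote values: Docker parses --mount as comma-separated fields, so a path containing a comma would need extra handling.

```go
package main

import (
	"strings"
)

// bindMount (hypothetical) formats a Docker --mount value for a bind mount,
// appending the readonly option when requested. Values are not quoted, so
// source/target paths must not contain commas.
func bindMount(source, target string, readonly bool) string {
	opts := []string{"type=bind", "source=" + source, "target=" + target}
	if readonly {
		opts = append(opts, "readonly")
	}
	return strings.Join(opts, ",")
}
```

For example, "--mount", bindMount(kds.CertFileName, "/tmp/apiserver.crt", true) would produce the readonly form suggested in the comment.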
  "-e", "K8S_API_ENDPOINT=" + kds.BadEndpoint,
  "-e", "K8S_INSECURE_SKIP_TLS_VERIFY=true",
- "-v", kds.CertFileName + ":/tmp/apiserver.crt",
+ "--mount", fmt.Sprintf("type=bind,source=%s,target=%s", kds.CertFileName, "/tmp/apiserver.crt"),
Copilot (AI), Nov 28, 2025
[nitpick] The --mount syntax is missing the readonly option. Since this is a certificate file that should not be modified by the container, consider adding readonly for better security:
    "--mount", fmt.Sprintf("type=bind,source=%s,target=%s,readonly", kds.CertFileName, "/tmp/apiserver.crt"),
This prevents accidental modification of the certificate file from within the container.
Branch updated from 2b6008d to 2cbf821.
Description
Speculative fix/diagnostics for an issue I'm seeing in FV. Sometimes the very first felix container that we launch fails with an error accessing the k8s API server's cert file. Double-check that the copy of the file succeeded, and open/sync it just to make sure.
Switch to using --mount on the felix container, since it's supposed to have better error reporting than -v.
Related issues/PRs
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label.
- docs-pr-required: This change requires a change to the documentation that has not been completed yet.
- docs-completed: This change has all necessary documentation completed.
- docs-not-required: This change has no user-facing impact and requires no docs.
Every PR needs one release-note-* label.
- release-note-required: This PR has user-facing changes. Most PRs should have this label.
- release-note-not-required: This PR has no user-facing changes.
Other optional labels:
- cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
- needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.