-
Notifications
You must be signed in to change notification settings - Fork 174
fix: Ensure Orphan CR Retention in Longhorn When orphan-auto-deletion is Set to False #3961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
74e9877 to
8ea7fbe
Compare
shuo-wu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggested solution includes 2 parts:
1. We shouldn't clean up orphan CR when its node is down, missing manager, or Longhorn manager cannot [get disk stat](https://github.com/longhorn/longhorn-manager/blob/c4e7942684cc1f8ece900854d09126a7b1f8c0b6/util/util.go#L434-L435)
2. We still should delete orphan CR when node is deleted, evicting, or the actual data of the orphan CR no longer exist
How does this PR achieve the actual data of the orphan CR no longer exist ?
|
71d5933 to
ee881bb
Compare
|
Hi @derekbit @COLDTURNIP Could I have a question about orphan delete setting: The current code is Description: "This setting allows Longhorn to automatically delete orphan resources and their corresponding orphaned resources. \n\n" +
"Orphan resources located on nodes that are in down or unknown state will not be cleaned up automatically. \n\n" +
"List the enabled resource types in a semicolon-separated list. \n\n" +
"Available items are: \n\n" +
"- **replica-data**: replica data store \n\n" +
"- **instance**: engine and replica runtime instance \n\n",From the design perceptive. What would be the expectation?
|
|
This PR is related to longhorn/longhorn#7795 |
Leaving aside the impact of this setting, I think the 2nd one is expected. The CR reflects the status of the resource in the cluster. When a resource is not reachable from the cluster (e.g. node leave), the corresponding CR should be deleted. Once a node is unreachable by Longhorn, the underlying resource, including the landed data, are not controlled by Longhorn, except the eviction is required. Back to the auto deletion, when this setting is enabled, we expects the controller clear the orphaned resource once it is identified. Without the auto deletion enabled, the CRs are expected to reflect the detectability of the resource, and delete the them only when user asks to. When auto deletion enabled:
No matter the auto deletion is enabled or not, when a node/disk eviction requested:
No matter the auto deletion is enabled or not, when a node leaves Longhorn without eviction, and there are some resources on the leaving node:
|
Second that 2 is expected. |
This makes sense.
For the eviction case, I prefer to retain the related data resource and clean up the orphan CR only. One typical eviction case is node upgrade. After that, Longhorn may be able to continue using it. Besides, we need to be careful about one corner case for the node down/manager missing scenario. If the node is suddenly back to healthy when deleting orphan CRs, Longhorn cannot accidently clean up the related data. |
ref:7795 Signed-off-by: Nina Zhan <[email protected]>
Which issue(s) this PR fixes:
Issue longhorn/longhorn#7795
What this PR does / why we need it:
Delete orphan CR only when node is deleted, evicting, or the actual data of the orphan CR no longer exist
Special notes for your reviewer:
Additional documentation or context