Add documentation on how to upgrade from scratchfs to swap #34031
Open: alex-hunt-materialize wants to merge 1 commit into MaterializeInc:main from alex-hunt-materialize:swap_docs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| --- | ||
| title: "Upgrading to swap" | ||
| description: "Procedure for upgrading to versions with swap enabled by default." | ||
| menu: | ||
| main: | ||
| parent: "installation" | ||
| weight: 69 | ||
| --- | ||
|
|
||
| Materialize v26 and later enable swap by default. | ||
| To provide an upgrade path that does not disrupt existing installations, we have added labels to the node selectors for clusterd pods. | ||
| Because of these new selector labels, v26 clusterd pods will intentionally not be scheduled on your existing nodes. | ||
| You must complete the preparation steps below before upgrading to v26. | ||
|
|
||
| If you wish to opt out of swap and retain the old behavior, set `operator.clusters.swap_enabled: false` in your Helm values. | ||
| Otherwise, continue below. | ||
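|
|
||
| For the opt-out case described above, a minimal sketch of the Helm values fragment follows. It only expands the dotted path `operator.clusters.swap_enabled` into nested YAML; any other keys in your values file are assumed to stay unchanged. | ||
|
|
||
| ```yaml | ||
| # Helm values fragment (sketch): opt out of swap and keep the old behavior. | ||
| operator: | ||
|   clusters: | ||
|     swap_enabled: false | ||
| ``` | ||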
|
|
||
| ## Upgrade preparation steps | ||
|
|
||
| 1. Label existing scratchfs/lgalloc node groups | ||
|
|
||
| If using lgalloc on scratchfs volumes, you must add the additional `"materialize.cloud/scratch-fs": "true"` label to your existing node groups and nodes running Materialize workloads. | ||
|
|
||
| Adding this label to the node group (or nodepool) configuration applies it to newly spawned nodes but, depending on your cloud provider, may not apply it to existing nodes. | ||
|
|
||
| If the label is not applied automatically, use `kubectl label` to add it to the existing nodes. | ||
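|
|
||
| For illustration only: once the label is present, the node's metadata (for example, as printed by `kubectl get node` with `-o yaml`) should include it roughly as shown here. The node name is hypothetical, and all other fields are omitted. | ||
|
|
||
| ```yaml | ||
| # Fragment of a Node object after labeling (sketch, other fields omitted). | ||
| apiVersion: v1 | ||
| kind: Node | ||
| metadata: | ||
|   name: example-scratchfs-node  # hypothetical node name | ||
|   labels: | ||
|     materialize.cloud/scratch-fs: "true" | ||
| ``` | ||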
|
|
||
| 1. Modify existing scratchfs/lgalloc disk setup daemonset selector labels | ||
|
|
||
| If using our [ephemeral-storage-setup image](https://github.com/MaterializeInc/ephemeral-storage-setup-image/) as a daemonset to configure scratchfs LVM volumes for lgalloc, you must add the additional `"materialize.cloud/scratch-fs": "true"` label in multiple places: | ||
| * `spec.selector.matchLabels` | ||
| * `spec.template.metadata.labels` | ||
| * (if using `nodeAffinity`) `spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms` | ||
| * (if using `nodeSelector`) `spec.template.spec.nodeSelector` | ||
|
|
||
| You **must** use at least one of `nodeAffinity` or `nodeSelector`. | ||
|
|
||
| It is recommended to rename this daemonset to make it clear that it is only for the legacy scratchfs/lgalloc nodes (for example, change the name `disk-setup` to `disk-setup-scratchfs`). | ||
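|
|
||
| The label placements listed in this step can be sketched as the following DaemonSet fragment, using the `nodeSelector` variant. This is not a complete manifest: the `app` label is a hypothetical stand-in for whatever labels your daemonset already uses, and the containers and volumes are omitted and assumed to be unchanged. | ||
|
|
||
| ```yaml | ||
| # DaemonSet fragment (sketch): where the scratch-fs label goes. | ||
| apiVersion: apps/v1 | ||
| kind: DaemonSet | ||
| metadata: | ||
|   name: disk-setup-scratchfs | ||
| spec: | ||
|   selector: | ||
|     matchLabels: | ||
|       app: disk-setup-scratchfs  # hypothetical existing label | ||
|       materialize.cloud/scratch-fs: "true" | ||
|   template: | ||
|     metadata: | ||
|       labels: | ||
|         app: disk-setup-scratchfs  # hypothetical existing label | ||
|         materialize.cloud/scratch-fs: "true" | ||
|     spec: | ||
|       # Restrict the daemonset to the legacy scratchfs/lgalloc nodes. | ||
|       nodeSelector: | ||
|         materialize.cloud/scratch-fs: "true" | ||
|       # initContainers, containers, and volumes omitted (unchanged). | ||
| ``` | ||
|
|
||
| Kubernetes treats a DaemonSet's `spec.selector` as immutable, so changing `matchLabels` generally means creating a new daemonset and deleting the old one. Renaming the daemonset, as recommended here, has that effect. | ||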
|
|
||
| 1. Create a new node group for swap | ||
|
|
||
| 1. Create a new node group (or ec2nodeclass and nodepool if using Karpenter in AWS) using an instance type with local NVMe disks. If in GCP, the disks must be in `raw` mode. | ||
|
|
||
| 1. Label the node group with `"materialize.cloud/swap": "true"`. | ||
|
|
||
| 1. If using AWS Bottlerocket AMIs (highly recommended if running in AWS), set the following in the userdata to configure the disks for swap, and enable swap in the kubelet: | ||
|
|
||
| ```toml | ||
| [settings.oci-defaults.resource-limits.max-open-files] | ||
| soft-limit = 1048576 | ||
| hard-limit = 1048576 | ||
|
|
||
| [settings.bootstrap-containers.diskstrap] | ||
| source = "docker.io/materialize/ephemeral-storage-setup-image:v0.4.0" | ||
| mode = "once" | ||
| essential = "true" | ||
| # ["swap", "--cloud-provider", "aws", "--bottlerocket-enable-swap"] | ||
| user-data = "WyJzd2FwIiwgIi0tY2xvdWQtcHJvdmlkZXIiLCAiYXdzIiwgIi0tYm90dGxlcm9ja2V0LWVuYWJsZS1zd2FwIl0=" | ||
|
|
||
| [settings.kernel.sysctl] | ||
| "vm.swappiness" = "100" | ||
| "vm.min_free_kbytes" = "1048576" | ||
| "vm.watermark_scale_factor" = "100" | ||
| ``` | ||
|
|
||
| 1. If not using AWS or not using Bottlerocket AMIs, and your node group supports it (Azure does not as of 2025-11-05), add a startup taint. This taint will be removed after the disk is configured for swap. | ||
|
|
||
| ```yaml | ||
| taints: | ||
|   - key: startup-taint.cluster-autoscaler.kubernetes.io/disk-unconfigured | ||
|     value: "true" | ||
|     effect: NoSchedule | ||
| ``` | ||
|
|
||
| 1. Create a new disk-setup-swap daemonset | ||
|
|
||
| If using Bottlerocket AMIs in AWS, you may skip this step, as you should have configured swap using userdata previously. | ||
|
|
||
| Create a new daemonset using our [ephemeral-storage-setup image](https://github.com/MaterializeInc/ephemeral-storage-setup-image/) to configure the disks for swap and to enable swap in the kubelet. | ||
|
|
||
| The arguments to the init container in this daemonset need to be configured for swap. See the examples in the linked git repository for more details. | ||
|
|
||
| This daemonset should run only on the new swap nodes, so we need to ensure it has the `"materialize.cloud/swap": "true"` label in several places: | ||
|
|
||
| * `spec.selector.matchLabels` | ||
| * `spec.template.metadata.labels` | ||
| * (if using `nodeAffinity`) `spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms` | ||
| * (if using `nodeSelector`) `spec.template.spec.nodeSelector` | ||
|
|
||
| You **must** use at least one of `nodeAffinity` or `nodeSelector`. | ||
|
|
||
| It is recommended to name this daemonset to clearly indicate that it is for configuring swap (for example, `disk-setup-swap`), as opposed to other disk configurations. | ||
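|
|
||
| As a rough sketch of the placements listed in this step, the fragment below uses the `nodeAffinity` variant. The `app` label is hypothetical, and the init container and its swap arguments are omitted; take those from the examples in the linked repository. The toleration is an assumption that applies only if you added the startup taint in the previous step, since pods without a matching toleration are not scheduled onto a `NoSchedule`-tainted node. | ||
|
|
||
| ```yaml | ||
| # DaemonSet fragment (sketch): swap label placement using nodeAffinity. | ||
| apiVersion: apps/v1 | ||
| kind: DaemonSet | ||
| metadata: | ||
|   name: disk-setup-swap | ||
| spec: | ||
|   selector: | ||
|     matchLabels: | ||
|       app: disk-setup-swap  # hypothetical label | ||
|       materialize.cloud/swap: "true" | ||
|   template: | ||
|     metadata: | ||
|       labels: | ||
|         app: disk-setup-swap  # hypothetical label | ||
|         materialize.cloud/swap: "true" | ||
|     spec: | ||
|       affinity: | ||
|         nodeAffinity: | ||
|           requiredDuringSchedulingIgnoredDuringExecution: | ||
|             nodeSelectorTerms: | ||
|               - matchExpressions: | ||
|                   - key: materialize.cloud/swap | ||
|                     operator: In | ||
|                     values: ["true"] | ||
|       # Assumption: needed only when the startup taint is in use. | ||
|       tolerations: | ||
|         - key: startup-taint.cluster-autoscaler.kubernetes.io/disk-unconfigured | ||
|           operator: Exists | ||
|           effect: NoSchedule | ||
|       # initContainers and containers omitted; see the linked repository. | ||
| ``` | ||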
|
|
||
| 1. (Optional) Configure environmentd to also use swap | ||
|
|
||
| Swap is enabled by default for clusterd, but not for environmentd. To enable swap for environmentd as well, add `"materialize.cloud/swap": "true"` to the `environmentd.node_selector` Helm value. | ||
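|
|
||
| A minimal sketch of that Helm values fragment, assuming `environmentd.node_selector` is a plain map of node labels. Any selector entries you already have there would sit alongside the new one. | ||
|
|
||
| ```yaml | ||
| # Helm values fragment (sketch): schedule environmentd on the swap nodes. | ||
| environmentd: | ||
|   node_selector: | ||
|     materialize.cloud/swap: "true" | ||
| ``` | ||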
|
|
||
| 1. Upgrade the Materialize operator helm chart to v26 | ||
|
|
||
| The cluster size definitions for existing Materialize instances will not change at this point, but any newly created or upgraded Materialize instances will pick up the new sizes. | ||
|
|
||
| Do not create any new Materialize instances at versions earlier than v26, and do not roll out existing Materialize instances to versions earlier than v26. | ||
|
|
||
| 1. Upgrade existing Materialize instances to v26 | ||
|
|
||
| The new v26 pods should be scheduled on the new swap nodes. | ||
|
|
||
| You can verify that swap is enabled and working by running `cat /sys/fs/cgroup/memory.swap.max` inside a clusterd pod (for example, via `kubectl exec`). A value greater than 0 means swap is enabled and the pod is allowed to use it. | ||
|
|
||
| 1. (Optional) Delete old scratchfs/lgalloc node groups and disk-setup-scratchfs daemonset | ||
|
|
||
| If you no longer have anything running on the old scratchfs/lgalloc nodes, you may delete their node group and the disk-setup-scratchfs daemonset. | ||
Q: [Catching up on the topic] How does this affect the deployment guideline pages:
https://materialize.com/docs/installation/install-on-gcp/appendix-deployment-guidelines/
Q2: This is specifically for upgrading to v26?
(double-checking to find out where best to have this for visibility for people who'll need to do this as well as not get in the way of people who won't need it going forward -- new deployment people + people who took the steps to upgrade)
The old Terraform is not updated for swap support. If they are using that, they don't need this yet. When they migrate to the new Terraform monorepo, they will need to do these operations.
We haven't developed the procedure for migrating from the old Terraform to the new one.