docs/configuration/nodegroup.md (34 additions & 6 deletions)
@@ -29,12 +29,12 @@ node_groups:
 taint_effect: NoExecute
 max_node_age: 24h
 aws:
-fleet_instance_ready_timeout: 1m
-launch_template_id: lt-1a2b3c4d
-launch_template_version: "1"
-lifecycle: on-demand
-instance_type_overrides: ["t2.large", "t3.large"]
-resource_tagging: false
+fleet_instance_ready_timeout: 1m
+launch_template_id: lt-1a2b3c4d
+launch_template_version: "1"
+lifecycle: on-demand
+instance_type_overrides: ["t2.large", "t3.large"]
+resource_tagging: false
 ```
 
 ## Options
@@ -273,3 +273,31 @@ When not at the minimum, the natural scaling up and down of the node group will
 node group.
 
 This is an optional feature and by default is disabled.
+
+### `unhealthy_node_grace_period`
+
+Defines the minimum age a node must reach before it is eligible for health checking.
+
+When this field is set, Escalator periodically tests instances to determine whether they are healthy. If the percentage of unhealthy instances exceeds the threshold configured for the nodegroup, Escalator pauses all scaling activity and flushes out the unhealthy instances. It repeats this until enough instances in the nodegroup are healthy for normal scaling activity to resume.
+
+Cordoned nodes are skipped and are never considered unhealthy.
+
+This is an optional field. The default value is empty, which disables the feature.
+
+### `health_check_newest_nodes_percent`
+
+**[Only used if `unhealthy_node_grace_period` is set.]**
+
+The percentage of nodes in the nodegroup, ordered from newest to oldest, that are considered when checking whether the nodegroup exceeds its maximum allowed unhealthy nodes. The nodes captured by this percentage form the "test set". Only nodes older than `unhealthy_node_grace_period` are included in the test set.
+
+This field is required.
+
+### `max_unhealthy_nodes_percent`
+
+**[Only used if `unhealthy_node_grace_period` is set.]**
+
+The maximum percentage of unhealthy nodes allowed in the test set defined by `health_check_newest_nodes_percent`. When this threshold is exceeded, all scaling activity is paused and unhealthy nodes are flushed out.
+
+> **Note:** The valid range for `max_unhealthy_nodes_percent` is `0%` to `99%`.
+
+This is an optional field. If not set, it defaults to `0%`.
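
For illustration, the three health-check fields might be combined in a single node group entry roughly as sketched below. The node group name, all values, and the plain-integer form of the percentages are assumptions and are not taken from the diff above; only the field names and the duration style (as in `max_node_age: 24h`) come from it.

```yaml
node_groups:
  - name: "example"  # hypothetical node group name
    # Nodes younger than this are excluded from health checks (assumed value).
    unhealthy_node_grace_period: 10m
    # The test set is drawn from the newest 20% of nodes (assumed value);
    # nodes still inside the grace period are left out of it.
    health_check_newest_nodes_percent: 20
    # Pause scaling and flush unhealthy nodes once more than 50% of the
    # test set is unhealthy (assumed value; documented range is 0% to 99%).
    max_unhealthy_nodes_percent: 50
```

Leaving `unhealthy_node_grace_period` empty would disable the feature, in which case the other two fields are not used.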