Description
When deploying the Slurm Helm chart (not necessarily the operator), the slurm-controller pod logs JWT decode errors on startup:
[2025-11-07T12:38:23] error: auth_p_verify: jwt_decode failure: Invalid argument
[2025-11-07T12:38:23] error: slurm_unpack_received_msg: [10.42.242.171:48216] auth_g_verify: REQUEST_NODE_INFO has authentication error: Invalid authentication credential
[2025-11-07T12:38:23] error: slurm_unpack_received_msg: [10.42.242.171:48216] Protocol authentication error
This is a Rancher managed RKE2 cluster (k8s - v1.31.12).
I suspected that FIPS was in the way (this cluster's backend hosts run RedHat 9 with FIPS enabled), so I tested the same deployment against an RKE1 cluster without FIPS and a different RKE2 cluster without FIPS, and both started up. I'm not surprised this is an issue, as FIPS enablement causes problems with a lot of software.
Steps to Reproduce
I'm not sure how easy this is to test or reproduce on your side, but I expect that deploying the chart to a cluster whose nodes are FIPS enabled would show the issue.
Expected Behavior
No JWT issues
Additional Context
Is there a way to avoid JWT auth, or to make the chart generate the key differently? I had hoped the auto-generated JWT key would work, since it was generated on a FIPS-enabled system, but maybe what it generates isn't right. I don't have anything else I can think of to debug this with. I'm also not sure there is anything you can do about it (maybe there's a setting to disable the token entirely, or to use something else?), but I wanted to bring it to your attention.
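In case it helps narrow this down, here is a minimal sketch of two checks that could be run on a node or inside the controller pod: whether the kernel reports FIPS mode, and how large the JWT signing key file is. The `/proc/sys/crypto/fips_enabled` path is the standard Linux flag; the key file location is not known here, so it is passed as an argument rather than assumed. One unconfirmed possibility is that a FIPS-mode crypto library rejects HMAC keys it considers too short (NIST guidance disallows keys under 112 bits), which is why the key size is worth printing. This is only a guess at a cause, not a diagnosis.

```python
import sys
from pathlib import Path

# Standard Linux kernel flag file; its content is "1" when FIPS mode is on.
FIPS_FLAG = Path("/proc/sys/crypto/fips_enabled")


def fips_enabled(flag_path: Path = FIPS_FLAG) -> bool:
    """Return True if the kernel reports FIPS mode; False if off or unreadable."""
    try:
        return flag_path.read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: treat as "not enabled / unknown".
        return False


def key_size_bits(key_path: Path) -> int:
    """Return the size of a raw key file in bits (file size in bytes * 8)."""
    return key_path.stat().st_size * 8


if __name__ == "__main__":
    print("FIPS enabled:", fips_enabled())
    # Hypothetical usage: pass the path of the mounted JWT key file, if known.
    if len(sys.argv) > 1:
        print("key size (bits):", key_size_bits(Path(sys.argv[1])))
```

Running it with the path of the mounted key file as the only argument prints both values; comparing the output on the FIPS cluster against one of the working non-FIPS clusters would show whether the key itself differs or only the host crypto policy does.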