
S3 mount problems #5050

@EllisonTravel

Hi there,

Thanks for working on this item: #4705
This workaround is inconsistent and not really a good production solution.
Any other suggestions?
Thanks

Here is what the administrators said:

However, this approach is the direct cause of the issues you've been experiencing: system slowness, scheduler hangs, and intermittent task failures.

Upon checking, the Supervisor logs showed that the queue workers were repeatedly terminated by SIGKILL between Oct 18 and Oct 28, confirming the instability. The workers were reconfigured and restarted on Oct 29 and have remained in a stable RUNNING state since then. Please refer to the attachment. These worker terminations are consistent with the latency and blocking caused by the s3fs mount.
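To reproduce a per-day count of SIGKILL terminations from the Supervisor logs, a small script like the following could be used. It is a sketch, assuming supervisord's default log line format (for example `2024-10-18 03:12:45,123 INFO exited: worker_00 (terminated by SIGKILL; not expected)`); the year, the process names, and the helper names `sigkill_exits_per_day` and `sigkill_exits_in_file` are hypothetical, and the log file location depends on how Supervisor was installed.

```python
import re
from collections import Counter
from datetime import date, datetime
from typing import Iterable

# Assumed supervisord log format, e.g.:
# 2024-10-18 03:12:45,123 INFO exited: worker_00 (terminated by SIGKILL; not expected)
# "stopped:" lines are matched too, since Supervisor itself escalates to SIGKILL
# when a process does not exit within its stop timeout.
SIGKILL_RE = re.compile(
    r"^(?P<day>\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2},\d{3} \w+ "
    r"(?:exited|stopped): (?P<name>\S+) \(terminated by SIGKILL"
)


def sigkill_exits_per_day(lines: Iterable[str], start: date, end: date) -> Counter:
    """Count SIGKILL terminations per calendar day, start and end inclusive."""
    counts: Counter = Counter()
    for line in lines:
        m = SIGKILL_RE.match(line)
        if m is None:
            continue  # not a SIGKILL termination line
        day = datetime.strptime(m.group("day"), "%Y-%m-%d").date()
        if start <= day <= end:
            counts[day] += 1
    return counts


def sigkill_exits_in_file(path: str, start: date, end: date) -> Counter:
    """Same as above, reading a supervisord log file from disk."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return sigkill_exits_per_day(fh, start, end)
```

For example, `sigkill_exits_in_file("/var/log/supervisor/supervisord.log", date(2024, 10, 18), date(2024, 10, 28))` (path and year assumed) returns a `Counter` keyed by day, which makes it easy to compare the Oct 18 to Oct 28 window against the period after the Oct 29 restart.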

The s3fs mount introduces high latency and blocking behaviour because it is not a local filesystem but a network-based FUSE layer: file operations on the mount can turn into HTTP requests to S3. As long as s3fs remains in use, these issues will persist.
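One way to check the latency claim directly is to time a few metadata and read operations on the s3fs mount and compare them with a local directory on the same host. This is a minimal sketch, assuming Python 3 is available on the server; `probe_latency` and any paths passed to it are hypothetical, and the numbers will vary with s3fs caching options and network conditions.

```python
import os
import stat
import statistics
import time
from typing import Dict, List


def probe_latency(directory: str, max_files: int = 50, read_bytes: int = 4096) -> Dict[str, float]:
    """Time a directory listing plus a stat (and small read) per entry.

    Returns latencies in milliseconds. Intended for comparing a local
    directory with an s3fs mount point on the same host.
    """
    # Listing a directory on s3fs typically requires a request to S3.
    t0 = time.perf_counter()
    names = sorted(os.listdir(directory))[:max_files]
    listdir_ms = (time.perf_counter() - t0) * 1000

    samples: List[float] = []
    for name in names:
        path = os.path.join(directory, name)
        t0 = time.perf_counter()
        st = os.stat(path)  # metadata lookup
        if stat.S_ISREG(st.st_mode):
            # Read only the first few KiB so large objects don't dominate.
            with open(path, "rb") as fh:
                fh.read(read_bytes)
        samples.append((time.perf_counter() - t0) * 1000)

    return {
        "listdir_ms": listdir_ms,
        "entries": float(len(samples)),
        "median_ms": statistics.median(samples) if samples else 0.0,
        "max_ms": max(samples, default=0.0),
    }
```

For instance, calling `probe_latency("/mnt/s3-bucket")` alongside `probe_latency("/var/tmp")` (both paths assumed) should show whether operations on the mount are noticeably slower. Repeated runs may look faster because s3fs can cache metadata.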

"The s3fs mount is the root cause of the recent performance and scheduler issues."

After reviewing the logs and system behavior in detail, it became clear that the instability was caused specifically by the latency and blocking introduced by the s3fs layer. This is also supported by the Supervisor logs, which showed repeated worker terminations between Oct 18 and Oct 28. Since the reconfiguration and restart on Oct 29, the workers have remained stable, further confirming that the issue was related to the s3fs mount.
