
[BUG] SparkApplication hangs in "submitted" state when driver pod fails with ImagePullBackOff #2667

@Vadim-elo

What happened?

When the Spark driver pod cannot pull its container image and enters the ErrImagePull or ImagePullBackOff state, the SparkApplication stays in the "submitted" state indefinitely. As far as I can tell, the operator provides no mechanism to detect or recover from this situation.

Is there currently any built-in mechanism in Spark Operator to handle driver image pull failures?

If not, are there any recommended workarounds to address this issue?

Would the maintainers be open to implementing proper failure detection for image pull errors?

✋ I also found a closed issue describing the same problem, with no answers or solutions: #1737

Reproduction Code

Submit a SparkApplication with an invalid image reference, for example an image with a nonexistent tag.
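A minimal sketch of such a reproduction, written as a Python dict that serializes to a JSON manifest (`kubectl apply -f` accepts JSON as well as YAML). The application name, namespace, image tag, jar path, and service account are hypothetical placeholders, not values from my cluster; the only part that matters is that the tag does not exist.

```python
import json


def build_broken_spark_app(image: str) -> dict:
    """Build a SparkApplication manifest that uses the given (invalid) image.

    All concrete values below are illustrative placeholders; adjust them
    to match your own cluster and Spark image layout.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": "spark-pi-bad-image", "namespace": "default"},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            # The image reference is the only field that matters for this bug.
            "image": image,
            "mainClass": "org.apache.spark.examples.SparkPi",
            # Hypothetical jar path inside the image.
            "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples.jar",
            "sparkVersion": "3.5.0",
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
            "executor": {"instances": 1, "cores": 1, "memory": "512m"},
        },
    }


if __name__ == "__main__":
    # Hypothetical tag that is assumed not to exist in the registry.
    manifest = build_broken_spark_app("spark:no-such-tag-0.0.0")
    print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as `repro.py` (a hypothetical file name), piping its output to `kubectl apply -f -` should create the application; the driver pod should then sit in ErrImagePull/ImagePullBackOff.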

Expected behavior

SparkApplication should transition to the "failed" state with an appropriate error message when an image pull failure occurs.
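To make the expected check concrete, here is a rough sketch of how an image pull failure can be recognized from a Pod object as returned by `kubectl get pod -o json`. It relies only on the standard Pod status fields (`status.containerStatuses[].state.waiting.reason`); the function name `image_pull_failure` is hypothetical, and this is not the operator's code (the operator is written in Go).

```python
from typing import Optional

# Waiting reasons the kubelet reports when an image cannot be pulled.
IMAGE_PULL_FAILURE_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_failure(pod: dict) -> Optional[str]:
    """Return a short description of the first image pull failure, or None.

    `pod` is a Pod object decoded from JSON. Both init containers and
    regular containers are inspected.
    """
    status = pod.get("status") or {}
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    for container_status in statuses:
        waiting = (container_status.get("state") or {}).get("waiting")
        if waiting and waiting.get("reason") in IMAGE_PULL_FAILURE_REASONS:
            name = container_status.get("name", "<unknown>")
            message = waiting.get("message", "")
            return f"{name}: {waiting['reason']}: {message}"
    return None


# Hand-written example of a driver pod stuck on a bad tag.
example_pod = {
    "status": {
        "phase": "Pending",
        "containerStatuses": [
            {
                "name": "spark-kubernetes-driver",
                "state": {
                    "waiting": {
                        "reason": "ImagePullBackOff",
                        "message": 'Back-off pulling image "spark:no-such-tag-0.0.0"',
                    }
                },
            }
        ],
    }
}

if __name__ == "__main__":
    print(image_pull_failure(example_pod))
```

Because the kubelet keeps retrying and a pull can fail transiently, a real implementation would probably want a grace period before marking the application as failed rather than failing on the first observation.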

Actual behavior

SparkApplication hangs in the "submitted" state when the driver pod cannot pull its image.

Environment & Versions

  • Kubernetes Version: 1.25.11
  • Spark Operator Version: 2.3.0
  • Apache Spark Version: >= 3.5.0

Additional context

No response

Impacted by this bug?

Give it a 👍. We prioritize the issues with the most 👍.

Labels

kind/bug (Something isn't working)
