
[ST] RecoveryST.testRecoveryFromKafkaServiceDeletion() is flaky #12134

@david-simon

Description


Bug Description

The aforementioned test fails intermittently in our pipeline at Cloudera. The Kafka Service is recreated, but the clients then fail to connect to the cluster.
I suspect that, although the Service object exists, its DNS records have not been published yet, so the bootstrap address cannot be resolved. This could be fixed either by waiting until the Service name is resolvable or by letting the client Jobs retry (their backoffLimit is currently 0).
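A minimal sketch of the first option (waiting until the Service name resolves), assuming it runs somewhere that uses the cluster's DNS, for example inside the client container before the producer is built. The class and method names (`DnsWait`, `waitUntilResolvable`) are hypothetical and not part of Strimzi's test utilities; the timeout and poll interval are arbitrary.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.Duration;

// Hypothetical helper: not an existing Strimzi or Kafka API.
public final class DnsWait {

    /**
     * Polls DNS until {@code host} resolves or {@code timeout} elapses.
     * Returns true if the name resolved, false on timeout or interruption.
     */
    public static boolean waitUntilResolvable(String host, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                InetAddress.getByName(host); // throws UnknownHostException if the name does not resolve yet
                return true;
            } catch (UnknownHostException e) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // the DNS record never appeared within the timeout
                }
                try {
                    Thread.sleep(pollInterval.toMillis());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                    return false;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Assumed host; in the test this would be the recreated bootstrap Service name.
        String host = args.length > 0 ? args[0] : "localhost";
        boolean ok = waitUntilResolvable(host, Duration.ofSeconds(30), Duration.ofSeconds(1));
        System.out.println(host + (ok ? " is resolvable" : " did not resolve in time"));
    }
}
```

One caveat with polling from a single JVM: failed lookups are cached for `networkaddress.cache.negative.ttl` seconds (10 by default), so the poll may keep seeing a stale failure for that long. The second option works at the Kubernetes level instead: a Job's `backoffLimit` is the number of retries before the Job is marked failed, so raising it above 0 lets a Pod that hit the unresolved name simply be re-run.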

If any of these seem reasonable I'd be happy to implement it.

Steps to reproduce

  1. Run RecoveryST.testRecoveryFromKafkaServiceDeletion() repeatedly
  2. Observe the intermittent failure (roughly 1 in 100 runs on our infrastructure)
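To estimate how many repetitions step 1 needs, assume each run fails independently with probability p (about 0.01 here). The chance of seeing at least one failure in n runs is 1 - (1 - p)^n, so the smallest sufficient n is ceil(ln(1 - c) / ln(1 - p)) for a target confidence c. The 95% confidence level and the class name `FlakeRuns` below are illustrative assumptions.

```java
// Hypothetical helper for sizing a reproduction loop; assumes independent runs.
public final class FlakeRuns {

    /**
     * Smallest n such that 1 - (1 - failureRate)^n >= confidence,
     * i.e. the runs needed to observe at least one failure with that probability.
     */
    public static int runsNeeded(double failureRate, double confidence) {
        if (failureRate <= 0 || failureRate >= 1 || confidence <= 0 || confidence >= 1) {
            throw new IllegalArgumentException("failureRate and confidence must be in (0, 1)");
        }
        return (int) Math.ceil(Math.log(1 - confidence) / Math.log(1 - failureRate));
    }

    public static void main(String[] args) {
        // ~1 in 100 failure rate from the report; 95% confidence is an assumption.
        System.out.println(runsNeeded(0.01, 0.95)); // prints 299
    }
}
```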

Expected behavior

The test passes

Strimzi version

0.47

Kubernetes version

Kubernetes 1.32.1

Installation method

No response

Infrastructure

K3s on EC2

Configuration files and logs

Test log:

io.strimzi.test.WaitException: Timeout after 280000 ms waiting for client Jobs to finish successfully
	at io.strimzi.test.TestUtils.waitFor(TestUtils.java:130)
	at io.strimzi.systemtest.utils.ClientUtils.waitForClientsSuccess(ClientUtils.java:87)
	at io.strimzi.systemtest.utils.ClientUtils.waitForInstantClientSuccess(ClientUtils.java:54)
	at io.strimzi.systemtest.utils.ClientUtils.waitForInstantClientSuccess(ClientUtils.java:42)
	at io.strimzi.systemtest.operators.RecoveryST.verifyStabilityBySendingAndReceivingMessages(RecoveryST.java:139)
	at io.strimzi.systemtest.operators.RecoveryST.testRecoveryFromKafkaServiceDeletion(RecoveryST.java:84)

Producer job log:

Exception in thread "main" org.apache.kafka.common.KafkaException: Failed to construct kafka producer
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:463)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:301)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:328)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:313)
	at io.strimzi.kafka.KafkaProducerClient.<init>(KafkaProducerClient.java:50)
	at io.strimzi.Main.main(Main.java:39)
Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
	at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:104)
	at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:63)
	at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:59)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:438)
	... 5 more
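The trace shows the failure happens inside the `KafkaProducer` constructor (via `ClientUtils.parseAndValidateAddresses`): the bootstrap hosts are resolved once at construction time, and if none of them resolve the constructor throws instead of retrying, so with `backoffLimit: 0` the Job fails outright. The following is a simplified, hypothetical sketch of that kind of check using only the JDK; it is not Kafka's implementation, and `BootstrapCheck`/`resolvableBootstrap` are illustrative names.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; Kafka's real logic also honours client.dns.lookup and more.
public final class BootstrapCheck {

    /** Parses "host:port" entries and keeps only those whose host currently resolves. */
    public static List<InetSocketAddress> resolvableBootstrap(String bootstrapServers) {
        List<InetSocketAddress> resolved = new ArrayList<>();
        for (String entry : bootstrapServers.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Invalid url: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // The constructor attempts a DNS lookup; isUnresolved() reports whether it failed.
            InetSocketAddress address = new InetSocketAddress(host, port);
            if (!address.isUnresolved()) {
                resolved.add(address);
            }
        }
        if (resolved.isEmpty()) {
            // Same message as the ConfigException in the producer log above.
            throw new IllegalStateException("No resolvable bootstrap urls given in bootstrap.servers");
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolvableBootstrap("localhost:9092"));
    }
}
```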

K8s dump:
test_0 4.zip

Additional context

No response
