I have been following the GRPC example to add chaos testing to my distributed system. The determinism has been extremely nice for reproducing issues! However, I keep running into issues that TCP would generally handle on its own.
I've run into situations like this when setting builder.fail_rate(0.01).repair_rate(0.9):
- Client and server do the TCP 3-way handshake
- Client sends its request packet, which gets dropped
- The packet is never re-sent, but also the link isn't considered broken
- The client task hangs until the request deadline hits, and no more packets are sent out
Is this considered normal? Please forgive me if my understanding of TCP is rusty 😅