Skip to content

Conversation

@flub
Copy link
Contributor

@flub flub commented Nov 13, 2025

Description

For some odd reason AuthenticationError was given a branch that had
quinn::ConnectionError inside it. But logically the connection error
has nothing to do with authentication. That should have been a red
flag.

AuthenticationError itself is almost always wrapped in
ConnectingError, which does correctly have a quinn::ConnectionError
branch. And the few places where it was directly returned to the user
it arguably should have been wrapped in a ConnectingError.

The result of this is that before this fix you would get a very
confusing authentication error if the remote client closed the
connection right at the same time as the handshake completed for
it (yes, this is difficult to do at the right time, and it only
happens for the client since that completes the handshake one network
hop before the server). But this was no authentication error, it is
simply a closed connection. The new error structure captures this
correctly.

Similarly the InternalConsistencyError belongs on the ConnectingError.
Though that one should be impossible to produce since it's supposed to
be an invariant.

Breaking Changes

  • AuthenticationError loses the ConnectionError and
    InternalConsistencyError branches. Both are on the
    ConnectingError instead.
  • OutgoingZeroRttConnection::handshake_completed and
    IncomingZeroRttConnection now return a ConnectingError instead
    of AuthenticationError.

Notes & open questions

While I've managed to trigger this error somewhat occasionally using a
program that races the closing with the completed handshake in a very
tight loop before applying this fix. I'm completely failing to
trigger it since applying this fix, so I can admire the beautiful new
error reporting this fix should give. It's a bit confusing.

edit: it is confusing. But it is correct. Because my flaky failure does always close the connection after it completes the handshake. So ALPN and EndpointId are always available. And then you yield a valid Connection when awaiting an Incoming, it just is already closed.

This fix could also be made against main. I believe the same commit should be able to be cherry-picked and will probably apply fairly clean. Do you think I should make it against main?

Change checklist

  • Self-review.
  • All breaking changes documented.
    • List all breaking changes in the above "Breaking Changes" section.

For some odd reason AuthenticationError was given a branch that had
quinn::ConnectionError inside it.  But logically the connection error
has nothing to do with authentication.  That should have been a red
flag.

AuthenticationError itself is almost always wrapped in
ConnectingError, which does correctly have a quinn::ConnectionError
branch.  And the few places where it was directly returned to the user
it arguably **should** have been wrapped in a ConnectingError.

The result of this is that before this fix you would get a very
confusing authentication error if the remote client closed the
connection right at the same time as the handshake completed for
it (yes, this is difficult to do at the right time, and it only
happens for the client since that completes the handshake one network
hop before the server).  But this was no authentication error, it is
simply a closed connection.  The new error structure captures this
correctly.

Similarly the InternalConsistencyError belongs on the ConnectingError.
Though that one should be impossible to produce since it's supposed to
be an invariant.
@flub flub requested review from Frando and ramfox November 13, 2025 17:46
// If the authentication error raced with a connection error, the connection
// error wins.
if let Some(conn_err) = conn.close_reason() {
return Err(e!(ConnectingError::ConnectionError { source: conn_err }));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so this actually fixes the racy thing I had. It took me a while to figure out why. But basically what happens is that because the handshake does complete on the client before the connection is closed, we never get the auth_err here, because all the info is available. So it never checks on whether the connection is already closes. Which in turn apparently means you just get a closed Connection. Which is pretty neat.

I don't think this means this branch can be removed though, because if the client closes before it completes the handshake you can get here again.

@n0bot n0bot bot added this to iroh Nov 13, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Nov 13, 2025
Copy link
Member

@ramfox ramfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Fixes a panic and makes the error reporting much more clear!

@ramfox
Copy link
Member

ramfox commented Nov 13, 2025

Ah, yes to answer your question: yes I think this should be made against main.

@flub
Copy link
Contributor Author

flub commented Nov 14, 2025

Ah, yes to answer your question: yes I think this should be made against main.

I'm retracting this offer and will push it here only. Turns out main doesn't have the fallible EndpointStateActor::register_connection yet, and I don't really want to deal with losing that and then having to re-apply it when merging main into here.

@github-actions
Copy link

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3663/docs/iroh/

Last updated: 2025-11-14T09:14:12Z

@flub flub merged commit dab9d5f into feat-multipath Nov 14, 2025
16 of 28 checks passed
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in iroh Nov 14, 2025
@flub flub deleted the flub/accepting-connecting-err branch November 14, 2025 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

5 participants