-
Notifications
You must be signed in to change notification settings - Fork 165
feat: async and multi-result set APIs WIP #3607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| /// being returned successfully. ADBC_STATUS_NOT_FOUND is returned | ||
| /// when there are no more result sets. | ||
| ADBC_EXPORT | ||
| AdbcStatusCode AdbcStatementNextResultSet(struct AdbcStatement* statement, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if this also worked for "schema-only" evaluation. (I started down this road last year and my work-in-progress was at main...CurtHagenlocher:arrow-adbc:MoreResults.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have AdbcStatementExecuteSchema for schema-only evaluation, so you're suggesting a NextResultSet function for that one too?
|
Do we want async variants of things like GetObjects, Prepare, etc.? |
|
From a tactical perspective, should 1.2-related work all happen on a separate branch (at least until we feel the proposed changes have been proven out)? On an unrelated note, are there Arrow interop tests yet for the device APIs? I seem to recall that there weren't any a year ago. |
Yeah, I think that'll have to be what happens due to the header changes. |
c/include/arrow-adbc/adbc.h
Outdated
| /// | ||
| /// A partition can be retrieved from AdbcPartitions. | ||
| /// | ||
| /// This AdbcConnection must outlive the ArrowAsyncDeviceStreamHandler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to comment on the lifetime of serialized_partition too - does it need to live until the call returns or until the callback finishes?
c/include/arrow-adbc/adbc.h
Outdated
| ADBC_EXPORT | ||
| AdbcStatusCode AdbcStatementExecuteQueryAsync( | ||
| struct AdbcStatement* statement, struct ArrowAsyncDeviceStreamHandler* handler, | ||
| int64_t* rows_affected, struct AdbcError* error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would rows_affected be populated? Maybe it should be included in schema metadata or something instead...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it could go in ArrowAsyncProducer's additional_metadata. Or, provide an AdbcAsyncProducerGetMetadata(ArrowAsyncProducer*) like how we have an AdbcErrorFromArrayStream. (Speaking of which, we need an equivalent of that for the async stream.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's a good point. Since it's an asynchronous execution it wouldn't be populated before returning. I guess we can define a canonical metadata key to indicate total rows?
|
Not to expand the scope too much, but looking at #1514 I wonder if we should add this parameter to Execute. (It might be cleaner to just add a separate call for it, though.) |
I'm not sure if an async variant of Would an async variant for Prepare be something like |
|
Added more functions based on the comments and back-and-forth here. The current collection of new functions is:
personally, I'd prefer a separate call for this and do it in a follow-up PR rather than in this one. But I'm open to it here if there's consensus. |
|
Prepare may have to do I/O, hence it should have an async variant. |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Probably an async wrapper around a sync array stream in nanoarrow would help fill some of these in.
It's a bummer we have to repeat so much documentation but I suppose the existing documentation has mostly stayed the same.
Not to expand the scope too much, but looking at #1514 I wonder if we should add this parameter to Execute. (It might be cleaner to just add a separate call for it, though.)
I would love this (although I think a separate function makes sense). I can write that up as separate PR if that would be helpful.
Wouldn't it make more sense to just have nanoarrow implement the Async stream handler from the C Data interface?
It would! Thanks! |
Added StatementPrepareAsync |
|
|
Just a shower thought (that can be split into a separate discussion) but I wonder, instead of adding async and non-async versions of APIs that don't return data, we could just add a set of |
I like the idea that lets us avoid needing to have async and non-async versions of the functions, but I think this might be too complex. @CurtHagenlocher @paleolimbot thoughts? |
| /// being returned successfully. ADBC_STATUS_NOT_FOUND is returned | ||
| /// when there are no more result sets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like overloading the return code. I think it'd be more consistent to return a schema with no release callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can update this to do that, then I'll attempt to implement these APIs and see how it goes
A draft for a possible update to the ADBC API to add several driver methods:
This would also be a 1.2.0 revision of the ADBC API
Actually implementing these functions would be done in a separate PR once we've reached consensus on this addition to the API