-
Notifications
You must be signed in to change notification settings - Fork 57
Description
This crate currently supports registering remote tables from different providers. However, it currently lacks support for registering remote views and procedures, making it difficult to query from views and parametrized view. This feature would be particularly useful for many people who want datafusion to only have partial access to a table ( for ex. in multi-tenancy ) or hiding a very complicated vendor specific logic abstracted behind a view.
Problems
- Currently, there is no support for passing table function arguments to a
SqlTableor any of vendor-specific implementation. - Many vendor implementations rely on specific query for fetching schema when a table is created. These queries are tailored for fetching table schema only and do not work for fetching schema of a view for example.
- Additional changes need to happen in the
datafusion-federationcrate to support this functionality.
Proposed Solution:
Based on some experimenting that I have done till now, I want to propose following changes to bring this feature.
Either extending SqlTable ( and any similar vendor specific Table Provider ) to store optional arguments as sqlpaser::ast::TableFunctionArgs. Or create a new type similar to SqlTable called SqlView
Extending SyncDbConnection and AsyncDbConnection with new method get_view_schema for retrieving schema for a view.
When creating a SQL string for SqlExec , First create a basic logical plan with an alias set, unparse the logical plan to a Statement. Then use the following visitor to append the arguments if any.
pub struct TableArgReplace {
pub(crate) args: TableFunctionArgs,
}
impl ast::VisitorMut for TableArgReplace {
type Break = ();
fn pre_visit_table_factor(&mut self, table_factor: &mut ast::TableFactor) -> ControlFlow<Self::Break> {
if let ast::TableFactor::Table { args, .. } = table_factor {
*args = Some(self.args.clone())
}
ControlFlow::Continue(())
}
}Similar approach can done in datafusion-federation sql module.
Other Benefits
This will also allow for sort of curried table functions. Essentially allowing for defining a function like read_from(path), which can the internally create a new table provider that calls a remote view read_from(path, cluster)
Open Questions:
- Should this functionality be a part of
SqlTableor a new type will be better for it? - What TableFactory for creating views will look like ?
- Which vendors support fetching query for a view and how. ( I have some ideas for PostgreSQL and MySQL implementation )
- Can we automatically discover arguments required for querying a view and register it for them as a table function in datafusion ?
We’d love feedback from the community on this proposal. Are there use cases we might be overlooking? Would this fit well with scope of this project.
Note
I will update this thread and provide a PR or reference to my forked branch that people can look at and see if this seems okay.