Skip to content

Proposal: Support for Parameterized Views #301

@trueleo

Description

@trueleo

This crate currently supports registering remote tables from different providers. However, it currently lacks support for registering remote views and procedures, making it difficult to query from views and parametrized view. This feature would be particularly useful for many people who want datafusion to only have partial access to a table ( for ex. in multi-tenancy ) or hiding a very complicated vendor specific logic abstracted behind a view.

Problems

  • Currently, there is no support for passing table function arguments to a SqlTable or any of vendor-specific implementation.
  • Many vendor implementations rely on specific query for fetching schema when a table is created. These queries are tailored for fetching table schema only and do not work for fetching schema of a view for example.
  • Additional changes need to happen in the datafusion-federation crate to support this functionality.

Proposed Solution:

Based on some experimenting that I have done till now, I want to propose following changes to bring this feature.
Either extending SqlTable ( and any similar vendor specific Table Provider ) to store optional arguments as sqlpaser::ast::TableFunctionArgs. Or create a new type similar to SqlTable called SqlView

Extending SyncDbConnection and AsyncDbConnection with new method get_view_schema for retrieving schema for a view.

When creating a SQL string for SqlExec , First create a basic logical plan with an alias set, unparse the logical plan to a Statement. Then use the following visitor to append the arguments if any.

pub struct TableArgReplace {
	pub(crate) args: TableFunctionArgs,
}

impl ast::VisitorMut for TableArgReplace {
	type Break = ();
	fn pre_visit_table_factor(&mut self, table_factor: &mut ast::TableFactor) -> ControlFlow<Self::Break> {
		if let ast::TableFactor::Table { args, .. } = table_factor {
			*args = Some(self.args.clone())
		}
		ControlFlow::Continue(())
	}
}

Similar approach can done in datafusion-federation sql module.

Other Benefits

This will also allow for sort of curried table functions. Essentially allowing for defining a function like read_from(path), which can the internally create a new table provider that calls a remote view read_from(path, cluster)

Open Questions:

  • Should this functionality be a part of SqlTable or a new type will be better for it?
  • What TableFactory for creating views will look like ?
  • Which vendors support fetching query for a view and how. ( I have some ideas for PostgreSQL and MySQL implementation )
  • Can we automatically discover arguments required for querying a view and register it for them as a table function in datafusion ?

We’d love feedback from the community on this proposal. Are there use cases we might be overlooking? Would this fit well with scope of this project.

Note

I will update this thread and provide a PR or reference to my forked branch that people can look at and see if this seems okay.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions