@rjzamora
Member

Brief proposal for a performance-motivated metadata update in Dask-DataFrame.

NOTE: Although this proposal is distinct from a high-level graph or query-optimization system, it should make such a system much easier to implement. I say this because such a system would still need to track and manage the same kind of metadata in one place.

See also: dask/dask#9473

@rjzamora
Member Author

cc @jrbourbeau @mrocklin for visibility

There is not much to review here yet, but I am starting to organize my thoughts on this, and I'm feeling somewhat confident that this would be a worthwhile effort. The rough POC wasn't very difficult to put together, and I think proper HLG/HLE-optimization would require us to do much of the same refactoring anyway (unless we choose to do high-level optimization in a completely new library, that is).
