
MaxRowsError for DuckDB table #557

@asterix314


I'm using:

  • vegafusion 2.0.1
  • altair 5.5.0
  • duckdb 1.1.3

After loading a DuckDB table from a CSV file (about 20K rows):

import duckdb

housing = duckdb.read_csv("housing.csv")

I get a MaxRowsError when I try to draw a histogram with Altair, even after enabling the "vegafusion" data transformer. The same code works if I first convert the DuckDBPyRelation to a polars DataFrame (alt.Chart(housing.pl())).

import altair as alt
alt.data_transformers.enable("vegafusion")
alt.renderers.enable("jupyter")

(
    alt.Chart(housing).mark_bar()
    .encode(
        alt.X('population').bin(maxbins=50).title(None),
        alt.Y('count()').title(None))
    .properties(width=200, height=100)
)
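The histogram above only needs one row per non-empty bin (at most 50 here), not the ~20K input rows. The VegaFusion transformer is meant to pre-evaluate transformations like this binning in Python, so the chart spec only has to carry the aggregated rows. A minimal sketch of such a bin-and-count step on synthetic data follows. It assumes simple equal-width bins and is not Vega-Lite's actual binning algorithm (which chooses "nice" bin boundaries); `bin_and_count` is a hypothetical helper.

```python
import random


def bin_and_count(values, maxbins=50):
    """Equal-width binning followed by a count per bin.

    Simplified illustration only; Vega-Lite's bin transform picks "nice" boundaries.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / maxbins or 1.0  # guard against all values being equal
    counts = [0] * maxbins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(idx, maxbins - 1)] += 1  # the maximum value lands in the last bin
    return [
        {"bin_start": lo + i * width, "bin_end": lo + (i + 1) * width, "count": c}
        for i, c in enumerate(counts)
        if c  # a group-by count produces no rows for empty bins
    ]


random.seed(0)
# Synthetic stand-in for the 'population' column; not the real housing.csv data
population = [random.randint(0, 10_000) for _ in range(20_000)]
rows = bin_and_count(population, maxbins=50)
print(len(population), "input rows ->", len(rows), "aggregated rows")
```

Only the aggregated rows would need to pass any row limit, which is why pre-evaluating the transform sidesteps the 5000-row check.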

The error message was:

---------------------------------------------------------------------------
MaxRowsError                              Traceback (most recent call last)
File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/vegalite/v5/api.py:1998, in TopLevelMixin.to_dict(self, validate, format, ignore, context)
   1995     except TypeError:
   1996         # Non-narwhalifiable type supported by Altair, such as dict
   1997         data = original_data
-> 1998     copy.data = _prepare_data(data, context)
   1999     context["data"] = data
   2001 # remaining to_dict calls are not at top level

File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/vegalite/v5/api.py:283, in _prepare_data(data, context)
    281 elif not isinstance(data, dict) and _is_data_type(data):
    282     if func := data_transformers.get():
--> 283         data = func(nw.to_native(data, pass_through=True))
    285 # convert string input to a URLData
    286 elif isinstance(data, str):

File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/utils/_vegafusion_data.py:105, in vegafusion_data_transformer(data, max_rows)
    100     return {"url": VEGAFUSION_PREFIX + table_name}
    101 else:
    102     # Use default transformer for geo interface objects
    103     # # (e.g. a geopandas GeoDataFrame)
    104     # Or if we don't recognize data type
--> 105     return default_data_transformer(data)

File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/vegalite/data.py:42, in default_data_transformer(data, max_rows)
     39     return pipe
     41 else:
---> 42     return to_values(limit_rows(data, max_rows=max_rows))

File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/utils/data.py:165, in limit_rows(data, max_rows)
    162     values = data
    164 if max_rows is not None and len(values) > max_rows:
--> 165     raise_max_rows_error()
    167 return data

File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/utils/data.py:148, in limit_rows.<locals>.raise_max_rows_error()
    135 def raise_max_rows_error():
    136     msg = (
    137         "The number of rows in your dataset is greater "
    138         f"than the maximum allowed ({max_rows}).\n\n"
   (...)
    146         "on how to plot large datasets."
    147     )
--> 148     raise MaxRowsError(msg)

MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000).

Try enabling the VegaFusion data transformer which raises this limit by pre-evaluating data
transformations in Python.
    >> import altair as alt
    >> alt.data_transformers.enable("vegafusion")

Or, see https://altair-viz.github.io/user_guide/large_datasets.html for additional information
on how to plot large datasets.
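The traceback shows the fallback path: `vegafusion_data_transformer` did not recognize the DuckDBPyRelation, so it took its `else` branch and called `default_data_transformer(data)` without forwarding a `max_rows` value, and `limit_rows` then enforced the 5000-row default. A simplified model of that control flow is sketched below, assuming a plain `isinstance` check for "recognized" types. The `recognized_types` parameter, the `Recognized`/`Unrecognized` classes, the table name, and the `VF_PREFIX` value are hypothetical stand-ins, not Altair's actual code.

```python
class MaxRowsError(Exception):
    """Stand-in for altair's MaxRowsError."""


DEFAULT_MAX_ROWS = 5000  # the limit reported in the error message
VF_PREFIX = "vegafusion+dataset://"  # assumed value of VEGAFUSION_PREFIX


def limit_rows(data, max_rows=DEFAULT_MAX_ROWS):
    # Simplified version of the check in limit_rows shown in the traceback
    if max_rows is not None and len(data) > max_rows:
        raise MaxRowsError(f"{len(data)} rows > maximum allowed ({max_rows})")
    return data


def default_transformer(data, max_rows=DEFAULT_MAX_ROWS):
    # Inline every row into the spec, after enforcing the row limit
    return {"values": list(limit_rows(data, max_rows=max_rows))}


def vegafusion_transformer(data, recognized_types):
    if isinstance(data, recognized_types):
        # Recognized tables are passed to VegaFusion by reference; no row check here
        return {"url": VF_PREFIX + "table_0"}
    # Fallback: no max_rows is forwarded, so the 5000-row default applies
    return default_transformer(data)


class Recognized(list):
    """Hypothetical stand-in for a supported type (e.g. a polars DataFrame)."""


class Unrecognized(list):
    """Hypothetical stand-in for the DuckDBPyRelation in this report."""


print(vegafusion_transformer(Recognized(range(20_000)), (Recognized,)))
try:
    vegafusion_transformer(Unrecognized(range(20_000)), (Recognized,))
except MaxRowsError as err:
    print("MaxRowsError:", err)
```

Under this model, converting the relation to a type the transformer recognizes (as `housing.pl()` does) avoids the fallback entirely.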
