Open
Description
Currently, if `X` and `y` have common columns, the error "ValueError: `X` and `y` must not share column names" is raised.
Would it be possible to check for common columns in `X` and `y` after the recipe has been applied?
Given that `Drop` and `Select` are available as steps, it would make more sense to enforce the no-common-columns rule after the pipeline has processed the data, not before.
import pandas as pd
import ibis
import ibis_ml as ml

con = ibis.duckdb.connect()
df = pd.DataFrame({
    'cat1': ['AA', 'BBB', 'AA', 'BBB', 'CCC'],
    'cat2': ['X', 'Y', 'Y', 'X', 'Z'],
    'value': [10, 20, 30, 40, 50]
})
tbl = con.create_table("tmp", df, overwrite=True)

# `value` is in both X (tbl) and y (tbl.value), but the recipe drops it from X
tr_oe = ml.Recipe(
    ml.OrdinalEncode(ml.string(), min_frequency=2),
    ml.Drop("value")
).fit(tbl, tbl.value)
# ValueError: `X` and `y` must not share column names
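
To illustrate the requested behaviour, here is a rough sketch that works on plain lists of column names rather than on ibis-ml internals. The helper names `shared_columns` and `check_after_transform` and the `dropped` argument are hypothetical and are not part of the ibis-ml API. The point is only that the overlap check would run on the columns that survive the steps:

```python
def shared_columns(x_columns, y_columns):
    """Return the column names present in both X and y, in X's order."""
    y_set = set(y_columns)
    return [name for name in x_columns if name in y_set]


def check_after_transform(x_columns, y_columns, dropped=()):
    """Hypothetical deferred check: validate only the surviving X columns.

    `dropped` stands in for whatever a Drop/Select-like step would remove.
    """
    dropped_set = set(dropped)
    surviving = [name for name in x_columns if name not in dropped_set]
    overlap = shared_columns(surviving, y_columns)
    if overlap:
        raise ValueError(f"`X` and `y` must not share column names: {overlap}")
    return surviving


x_cols = ["cat1", "cat2", "value"]
y_cols = ["value"]

# Checking before the steps run (roughly the current behaviour) finds an overlap.
overlap_before = shared_columns(x_cols, y_cols)

# Checking after a Drop("value")-like step finds none, so fitting could proceed.
surviving = check_after_transform(x_cols, y_cols, dropped=["value"])
```

With this ordering, a recipe that leaves `value` in `X` would still fail, but one that drops it would not.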
Status
backlog