dataviper is a SQL-based tool that gets basic data preparation done the easy way. It can:
- Create "Data Profile" report of a table
- One-hot encode for "Categorical Columns" and create a "one-hot" table
- Check "Joinability" between 2 tables
- // TODO: and more
```sh
pip install dataviper
```

Your main.py will look like this:

```python
from dataviper import Client
from dataviper.source import MySQL

client = Client(source=MySQL({
    'host': 'localhost',
    'user': 'root',
    'password': 'xxxxxx',
    'database': 'your_database'
}))

with client.connect():
    profile = client.profile('Your_Table')
    profile.to_excel()
```

```sh
python3 main.py
# Then you will get 'profile_Your_Table.xlsx' 🤗
```

"Data Profiling" is known to require scanning every row of a table. If you try to do this naively with pandas, or with any library that uses pandas internally, you can't avoid eating up a lot of your local machine's memory and freezing your work.
dataviper is a SQL-based Data Profiling tool: it simply and dynamically generates SQL and lets the database server do the heavy calculation.
With dataviper, you don't need a massive local machine. All you need is a stable network and a reachable SQL database.
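For intuition, the kind of per-column aggregate query a SQL-based profiler can push down to the database looks roughly like the hand-written sketch below. It is not the exact SQL dataviper emits, just an illustration of why only a tiny summary row has to travel back to your machine.

```python
# Hand-written sketch (NOT the exact SQL dataviper generates): a per-column
# aggregate that lets the database do the full-table scan and returns only
# one small summary row to the client.
table, column = "Sales", "price"

profile_sql = f"""
SELECT
    COUNT(*)                                          AS total_count,
    SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS null_count,
    COUNT(DISTINCT {column})                          AS unique_count,
    MIN({column})                                     AS min_value,
    MAX({column})                                     AS max_value,
    AVG({column})                                     AS avg_value
FROM {table}
"""

print(profile_sql)  # hand this to any DB driver; the heavy work stays on the server
```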
You can choose your data source from the following (see the sketch after this list for switching sources):
- SQL Server
  - profile
  - pivot
  - joinability
  - histogram
- MySQL
  - profile
  - pivot
  - joinability
  - histogram
- PostgreSQL
- CSV
  - profile
  - pivot
  - joinability
  - histogram
- Excel
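Switching sources should only change the object you pass to Client. Below is a minimal sketch, assuming the other source classes follow the same constructor pattern as MySQL above; the class name SQLServer and its argument keys are assumptions, so check which classes your installed version actually exports.

```python
from dataviper import Client
# Assumed import: the class name is a guess modelled on dataviper.source.MySQL above.
from dataviper.source import SQLServer

client = Client(source=SQLServer({
    'host': 'localhost',
    'user': 'sa',
    'password': 'xxxxxx',
    'database': 'your_database',
}))

with client.connect():
    profile = client.profile('Your_Table')
    profile.to_excel()
```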
Create "Data Profile" excel file of a specified table.
When you have Sales table like this
| id | region | sales_type | price | rep_id |
|---|---|---|---|---|
| 1 | jp | phone | 240 | 115723 |
| 2 | us | web | 90 | 125901 |
| 3 | jp | web | 560 | 8003 |
| 4 | us | shop | 920 | 182234 |
| 5 | jp | NULL | 90 | 92231 |
| 6 | us | shop | 180 | 100425 |
| 7 | us | shop | 70 | 52934 |
do:

```python
with client.connect() as conn:
    table_name = 'Sales'
    profile = client.profile(table_name, example_count=3)
    profile.to_excel()
```

then you will get a profile_Sales.xlsx file with:
| column_name | data_type | null_count | null_% | unique_count | unique_% | min | max | avg | std | example_top_3 | example_last_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| id | bigint | 0 | 0 | 7 | 100.00 | 1 | 7 | 4.0 | 2.0 | [1,2,3] | [5,6,7] |
| region | varchar | 0 | 0 | 2 | 28.57 | | | | | [jp,us,jp] | [jp,us,us] |
| sales_type | varchar | 1 | 14.28 | 3 | 42.85 | | | | | [phone,web,web] | [None,shop,shop] |
| price | int | 0 | 0 | 6 | 85.71 | 70 | 920 | 307.1428 | 295.379 | [240,90,560] | [90,180,70] |
| rep_id | int | 0 | 0 | 7 | 100.00 | 8003 | 182234 | 96778.7142 | 51195.79065 | [115723,125901,8003] | [92231,100425,52934] |
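As a sanity check on what the columns mean, the numeric row for price can be reproduced by hand from the seven sample values. dataviper computes these on the database side; the snippet below is only a local check, and the std column matches the population standard deviation.

```python
import statistics

# The price column from the Sales example above.
prices = [240, 90, 560, 920, 90, 180, 70]

print(len(set(prices)))           # 6 -> unique_count, i.e. 85.71 % of 7 rows
print(min(prices), max(prices))   # 70 920
print(sum(prices) / len(prices))  # 307.142857... -> avg
print(statistics.pstdev(prices))  # 295.379...    -> std (population std dev)
```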
Spread categorical columns into N binary columns.
When you have the Sales table above, do:

```python
with client.connect() as conn:
    table_name = 'Sales'
    key = 'id'
    categorical_columns = ['region', 'sales_type']
    profile = client.get_schema(table_name)
    client.pivot(profile, key, categorical_columns)
```

then you will get a Sales_PIVOT_YYYYmmddHHMM table with:
| id | region_jp | region_us | sales_type_phone | sales_type_web | sales_type_shop |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 0 | 0 |
| 2 | 0 | 1 | 0 | 1 | 0 |
| 3 | 1 | 0 | 0 | 1 | 0 |
| 4 | 0 | 1 | 0 | 0 | 1 |
| 5 | 1 | 0 | 0 | 0 | 0 |
| 6 | 0 | 1 | 0 | 0 | 1 |
| 7 | 0 | 1 | 0 | 0 | 1 |
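The SQL behind a one-hot pivot is typically one CASE expression per distinct category value. Below is a hand-written sketch of the idea, not the exact statement dataviper generates, and the output table name is only mentioned in a comment:

```python
# Hand-written sketch of a SQL-side one-hot encode for Sales.region.
# dataviper builds this kind of statement from the distinct category values it
# finds; the SQL it actually emits (and the pivot table name) may differ.
categories = ["jp", "us"]  # distinct values of Sales.region in the example above

case_columns = ",\n    ".join(
    f"CASE WHEN region = '{value}' THEN 1 ELSE 0 END AS region_{value}"
    for value in categories
)

pivot_sql = f"""
SELECT
    id,
    {case_columns}
FROM Sales
"""

print(pivot_sql)  # the result of this SELECT would be stored as the pivot table
```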
Check how well 2 tables can be joined, by counting matched and dropped rows.
When you have the Sales table above and a Reps table like this:
| id | name | tenure |
|---|---|---|
| 8003 | Hiromu | 9 |
| 8972 | Ochiai | 6 |
| 52934 | Taro | 1 |
| 92231 | otiai10 | 2 |
| 100425 | Hanako | 7 |
| 125901 | Chika | 3 |
| 182234 | Mary | 5 |
| 199621 | Jack | 1 |
do:

```python
with client.connect() as conn:
    report = client.joinability(on={'Sales': 'rep_id', 'Reps': 'id'})
    report.to_excel()
```

then you will get a join_Sales_Reps.xlsx file with:
| table | key | total | match | match_% | drop | drop_% |
|---|---|---|---|---|---|---|
| Sales | [rep_id] | 7 | 6 | 85.714 | 1 | 14.285 |
| Reps | [id] | 8 | 6 | 75.00 | 2 | 25.00 |
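The match / drop figures are ordinary join counts. Here is a hand-written sketch of the idea for the Sales side of the report, again not the exact SQL dataviper runs:

```python
# Hand-written sketch (not dataviper's actual SQL): count how many Sales rows
# find a partner in Reps via a LEFT JOIN. With the sample data above this
# returns total=7, match_count=6, drop_count=1 (rep_id 115723 has no Reps.id).
joinability_sql = """
SELECT
    COUNT(*)                                      AS total,
    SUM(CASE WHEN r.id IS NULL THEN 0 ELSE 1 END) AS match_count,
    SUM(CASE WHEN r.id IS NULL THEN 1 ELSE 0 END) AS drop_count
FROM Sales AS s
LEFT JOIN Reps AS r ON s.rep_id = r.id
"""

print(joinability_sql)
```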
