python_scripts

A selection of Python scripts for file processing, validation, and splitting.

Requirements

- Python 3.x
- pandas
- openpyxl
- tqdm
- colorama
- tabulate

Setting Up a Virtual Environment

Using a virtual environment is recommended to keep these dependencies isolated from other projects. To create and activate one:

Create a virtual environment:
```
python3 -m venv venv
```
Activate the virtual environment:
- On Linux and macOS:
```
source venv/bin/activate
```
- On Windows:
```
.\venv\Scripts\activate
```

Install the required packages:

`pip install pandas openpyxl tqdm colorama tabulate`
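
To confirm that the install worked, a short check like the one below can be run inside the activated environment. This is a minimal sketch that is not part of the repository; the helper name `missing_packages` is hypothetical, and the package names are taken from the Requirements list.

```python
import importlib.util

# Package names taken from the Requirements list above.
REQUIRED = ["pandas", "openpyxl", "tqdm", "colorama", "tabulate"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If any names are reported as missing, check that the virtual environment is active before re-running `pip install`.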

Scripts

1. `file-splitter.py`

This script splits large CSV or Excel files into smaller files compatible with spreadsheet applications like Google Sheets, Excel, or LibreOffice.

Usage

Basic usage:

`python file-splitter.py large_data_file.csv`

Advanced usage:

`python file-splitter.py large_data_file.xlsx --output-dir=split_files --target-app=google_sheets --output-format=csv --max-rows=30000 --max-size-mb=15`

Command-Line Arguments

- `input_file`: Path to the large CSV or Excel file to split
- `--output-dir`: Directory to save output files (optional)
- `--target-app`: Target application (`excel`, `google_sheets`, or `libreoffice`)
- `--output-format`: Format for output files (`csv` or `excel`)
- `--max-rows`: Maximum number of rows per output file
- `--max-size-mb`: Maximum file size in MB per output file
- `--chunk-size`: Number of rows to process at a time (for memory efficiency)
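
The row-limit idea behind `--max-rows` can be illustrated with a small standalone sketch. It is not the code in `file-splitter.py`: it uses only the standard library `csv` module rather than pandas, handles CSV input only, ignores the size limit, and the function name `split_csv` and the `_partN` file naming are assumptions made for illustration.

```python
import csv
from pathlib import Path


def split_csv(input_path, output_dir, max_rows):
    """Split a CSV into parts of at most max_rows data rows each.

    The header row is repeated at the top of every part.
    Returns the list of paths that were written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(input_path).stem
    written = []

    with open(input_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to split

        out, writer, rows_in_part = None, None, 0
        for row in reader:
            # Start a new part on the first row or when the current part is full.
            if writer is None or rows_in_part >= max_rows:
                if out is not None:
                    out.close()
                # Hypothetical naming scheme: <input stem>_part<N>.csv
                part_path = output_dir / f"{stem}_part{len(written) + 1}.csv"
                out = open(part_path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                written.append(part_path)
                rows_in_part = 0
            writer.writerow(row)
            rows_in_part += 1
        if out is not None:
            out.close()
    return written
```

Because rows are streamed one at a time, memory use stays flat regardless of input size, which is the same goal that `--chunk-size` serves.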

2. `data-validator.py`

This script validates CSV and Excel files before database ingestion. It checks for issues such as missing values, duplicate primary keys, and inconsistent data types.

Usage

Basic usage:

`python data-validator.py input_file.csv --pk id`

Advanced usage:

`python data-validator.py input_file.xlsx --pk "id,email" --required "name,email,phone" --config data-validator-config.json`

Command-Line Arguments

- `input_file`: Path to the CSV or Excel file to validate
- `--pk`: Column name(s) to use as primary key(s) for duplicate detection
- `--required`: Comma-separated list of columns that must not contain empty values
- `--config`: Path to JSON configuration file with validation rules
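
The two checks driven by `--pk` and `--required` can be sketched roughly as follows. This is an illustration, not the logic in `data-validator.py`: it uses the standard library instead of pandas, the function names `validate_rows` and `validate_csv` are hypothetical, and it assumes that multiple `--pk` columns form one composite key, which the script may handle differently.

```python
import csv
from collections import Counter


def validate_rows(rows, pk_columns, required_columns):
    """Return (duplicate_keys, missing) for a list of dict rows.

    duplicate_keys: key tuples that occur more than once.
    missing: (row_number, column) pairs where a required value is empty.
    """
    # Assumption: several pk columns are treated as a single composite key.
    key_counts = Counter(
        tuple(row.get(col, "") for col in pk_columns) for row in rows
    )
    duplicate_keys = [key for key, count in key_counts.items() if count > 1]

    missing = []
    for row_number, row in enumerate(rows, start=1):
        for col in required_columns:
            # Treat absent, None, and whitespace-only values as empty.
            if not (row.get(col) or "").strip():
                missing.append((row_number, col))
    return duplicate_keys, missing


def validate_csv(path, pk, required):
    """Load a CSV and validate it; pk and required are comma-separated names."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return validate_rows(rows, pk.split(","), required.split(","))
```

Row numbers in the result count data rows starting at 1, so they exclude the header line.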

3. `csv-excel-processor.py`

This script reads a CSV or Excel file, validates its column headers, renames specified columns, transforms the data, generates unique identifiers, and writes the processed data to a new file.

Usage

Basic usage:

`python csv-excel-processor.py input_file.csv`

Advanced usage:

`python csv-excel-processor.py input_file.csv --output-file=processed_file.csv --output-format=csv --chunk-size=5000`

Command-Line Arguments

- `input_file`: Path to the input CSV or Excel file
- `--output-file`: Path to the output file (optional)
- `--output-format`: Format of the output file (`csv` or `excel`)
- `--chunk-size`: Number of rows to process at a time
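
The processing steps described for this script (header validation, column renaming, unique ID generation, chunked processing) might look roughly like the sketch below. It is a standalone illustration and not the code in `csv-excel-processor.py`: the function names, the `record_id` column, and the use of UUID4 for identifiers are all assumptions.

```python
import uuid
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items, mirroring the idea of --chunk-size."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_rows(rows, expected_headers, rename_map, id_column="record_id"):
    """Validate headers, rename columns, and add a unique ID to each row.

    rows: iterable of dicts (for example from csv.DictReader).
    rename_map: {old_name: new_name}; unlisted columns keep their names.
    id_column: hypothetical name for the generated identifier column.
    """
    processed = []
    for row in rows:
        absent = [h for h in expected_headers if h not in row]
        if absent:
            raise ValueError(f"Missing expected columns: {absent}")
        renamed = {rename_map.get(col, col): value for col, value in row.items()}
        renamed[id_column] = str(uuid.uuid4())  # assumption: random UUIDs as IDs
        processed.append(renamed)
    return processed
```

A caller could combine the two helpers by looping over `chunked(reader, 5000)` and passing each chunk to `process_rows`, so that only one chunk is held in memory at a time.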

License

This repository is licensed under the MIT License. See the LICENSE file for more information.
