
A machine learning-based anomaly detection system for railway sensor data. Utilizes One-Class SVM and Isolation Forest to detect potential equipment failures, optimize maintenance, and enhance safety.


altairje/Trainomaly


Doors Anomaly Detection Repository

This repository provides a set of scripts to train anomaly detection models on door sensor data and then run inference (predict anomalies) on new data. The workflow relies on Python 3.11 and several common data science libraries. All dependencies and environment setup are managed by Poetry via a pyproject.toml file.


1. Repository Structure

.
├── Doors_example/                      # Dataset
│   ├── csv_labeled/                    # Folder for labeled CSV files
│   ├── csv_unlabeled/                  # Folder for unlabeled CSV files
│   ├── label_json_mini/                # Dataset in JSON format
│   └── doors_anomaly_detector_config   # Configuration file for system
├── my_utils.py                         # Common utility functions: CSV loading, resampling, wagon filtering, metrics plotting, etc.
├── my_anomaly_train.py                 # Main script to train anomaly detection models
├── my_anomaly_inference.py             # Main script to run anomaly detection inference using trained models
├── trained_models/                     # Directory where trained model files (.pkl) are saved
├── training_stats/                     # Directory for saving training-related statistics (if needed)
├── result/                             # Directory for saving inference results (.csv)
├── pyproject.toml                      # Poetry configuration for environment dependencies
└── README.md                           # This file

2. Environment Setup

Follow the steps below to create and activate the environment using Poetry:

# 1. Install Poetry if not already installed:
pip install poetry

# 2. Install dependencies from pyproject.toml:
poetry install

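# 3. Activate the environment shell:
#    (On Poetry 2.0+, `poetry shell` requires the poetry-plugin-shell plugin;
#    alternatively, prefix commands with `poetry run`, e.g. `poetry run python my_anomaly_train.py`.)
poetry shell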

Run all scripts from within this Poetry environment so they use the project's dependencies rather than conflicting system-wide packages.

3. Training Models

my_anomaly_train.py handles reading multiple CSV files, optionally resampling them, filtering for a specific wagon, and training one or more anomaly detection models. The script:

  1. Reads multiple CSV files (paths are currently hardcoded as examples).
  2. Filters columns for a specific wagon if needed.
  3. Resamples the data (e.g., every 10s).
  4. Splits the data into train/validation.
  5. Trains models (e.g., IsolationForest, OneClassSVM), either one per sensor or a single multivariate model.
  6. Saves trained model files (.pkl) in trained_models/.
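As a rough illustration of steps 3 to 6, here is a minimal sketch assuming each CSV has already been loaded into a DataFrame with a DatetimeIndex and numeric sensor columns. The helper names (`load_and_resample`, `train_val_split`, `train_models`), the chronological 80/20 split, the use of `pickle`, and the hyperparameters are illustrative assumptions, not the code in my_anomaly_train.py; wagon filtering is left out because the column naming scheme is not documented here.

```python
import os
import pickle

import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM


def load_and_resample(frames, rule="10s"):
    """Concatenate per-file DataFrames (DatetimeIndex, numeric sensor columns)
    and resample to a fixed interval by averaging within each bin."""
    df = pd.concat(frames).sort_index()
    return df.resample(rule).mean().dropna()


def train_val_split(df, val_fraction=0.2):
    """Chronological split: the last val_fraction of rows becomes validation,
    so the model never trains on data later than what it is validated on."""
    n_val = int(round(len(df) * val_fraction))
    cut = len(df) - n_val
    return df.iloc[:cut], df.iloc[cut:]


def train_models(train, out_dir="trained_models"):
    """Fit one multivariate model per model type and pickle it to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    models = {
        # Hyperparameters are placeholders; tune them for your data.
        "isolation_forest": IsolationForest(n_estimators=100, random_state=0),
        "one_class_svm": OneClassSVM(kernel="rbf", nu=0.05, gamma="scale"),
    }
    paths = []
    for name, model in models.items():
        model.fit(train.to_numpy())
        path = os.path.join(out_dir, f"model_{name}_ALL.pkl")
        with open(path, "wb") as fh:
            pickle.dump(model, fh)
        paths.append(path)
    return paths
```

A chronological split is used here because random shuffling of time series would leak future readings into training. OneClassSVM is sensitive to feature scale, so in practice you would likely standardize the sensor columns before fitting it.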

To run:

python my_anomaly_train.py

By default, trained_models/ will contain several .pkl files, e.g. model_one_class_svm_ALL.pkl (a multivariate model) or model_isolation_forest_01_1_0040.pkl (a per-sensor model).
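The two naming patterns correspond to the individual_model switch described under Customizations. Below is a minimal sketch of that branching, assuming per-sensor files are named `model_<model_type>_<sensor>.pkl` (inferred from the example names, not confirmed in the scripts). `fit_per_target` is a hypothetical helper, and IsolationForest stands in for either model type.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def fit_per_target(df: pd.DataFrame, individual_model: bool,
                   model_type: str = "isolation_forest") -> dict:
    """Fit one model per sensor column (individual_model=True) or one
    multivariate model on all columns (individual_model=False).

    Returns {file_name: fitted_model}, with file names following the
    assumed pattern model_<model_type>_<sensor or ALL>.pkl.
    """
    if individual_model:
        # Each model sees a single-column frame for its own sensor.
        targets = {col: df[[col]] for col in df.columns}
    else:
        # One model sees every sensor column at once.
        targets = {"ALL": df}
    fitted = {}
    for tag, data in targets.items():
        model = IsolationForest(random_state=0).fit(data.to_numpy())
        fitted[f"model_{model_type}_{tag}.pkl"] = model
    return fitted
```

With two hypothetical sensor columns 01_1_0040 and 01_1_0041, individual_model=True yields two files, while individual_model=False yields a single model_isolation_forest_ALL.pkl that can capture cross-sensor patterns.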

4. Running Inference

my_anomaly_inference.py loads those .pkl model files from trained_models/, reads new CSV test data, does the same optional resampling/filtering, and applies each model to predict anomalies. The script:

  1. Reads CSV files for test data (hardcoded in the script).
  2. Resamples/filters similarly to training.
  3. Loads each .pkl from trained_models/.
  4. Produces a DataFrame with columns:
    • <sensor_name>
    • anomaly_score (the negative of the model's decision_function, so higher values mean more anomalous)
    • anomaly_probability (a sigmoid transform of the score; a convenience scaling, not a calibrated probability)
    • prediction ("anomaly" or "normal")
  5. If the test data includes an anomaly column, computes Precision, Recall, F1, and related metrics, and can optionally plot them.
  6. Saves results to result/ as CSV.
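The output columns in step 4 can be reproduced roughly as follows. This sketch assumes a fitted scikit-learn IsolationForest or OneClassSVM (both expose `decision_function`, which is positive for inliers and negative for outliers) and a plain logistic sigmoid. The helper names `score_frame` and `evaluate`, and thresholding the score at 0, are assumptions rather than the script's exact logic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, precision_score, recall_score


def score_frame(model, X: pd.DataFrame) -> pd.DataFrame:
    """Return X's sensor columns plus anomaly_score, anomaly_probability, prediction."""
    raw = model.decision_function(X.to_numpy())  # > 0 inlier, < 0 outlier
    score = -raw                                 # flip sign: higher => more anomalous
    out = X.copy()
    out["anomaly_score"] = score
    # Logistic squash into (0, 1); a convenience scaling, not a calibrated probability.
    out["anomaly_probability"] = 1.0 / (1.0 + np.exp(-score))
    # score > 0 is the same as decision_function < 0, the sign predict() treats as outlier.
    out["prediction"] = np.where(score > 0, "anomaly", "normal")
    return out


def evaluate(results: pd.DataFrame, labels) -> dict:
    """Compare predictions with ground-truth labels (1 = anomaly, 0 = normal)."""
    y_true = np.asarray(labels).astype(int)
    y_pred = (results["prediction"] == "anomaly").to_numpy().astype(int)
    return {
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }


if __name__ == "__main__":
    # Toy usage: train on normal readings, then score two typical values and one spike.
    rng = np.random.default_rng(0)
    model = IsolationForest(random_state=0).fit(rng.normal(size=(200, 1)))
    X = pd.DataFrame({"sensor": [0.0, 0.1, 50.0]})
    res = score_frame(model, X)
    print(res)
    print(evaluate(res, [0, 0, 1]))
```

Because the sigmoid is monotonic, anomaly_probability ranks rows exactly as anomaly_score does; it only rescales the values into (0, 1).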

To run:

python my_anomaly_inference.py

5. Where Outputs Go

  • Trained model files (.pkl) are saved in trained_models/.
  • Training stats (if any) go to training_stats/.
  • Inference results (.csv) appear in result/.

6. Customizations

  • Dates/paths: Hardcoded in my_anomaly_train.py and my_anomaly_inference.py. Change them to suit your data structure.
  • Rolling features: If use_features=True, a range of rolling stats and correlations is computed in _add_features(). Tweak the rolling windows or remove that logic to reduce complexity.
  • Individual vs. multivariate: If individual_model=True, you’ll get multiple .pkl files (one per sensor). If False, you get just one file per model type (ALL in the filename).
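To make the rolling-feature option concrete, here is a hypothetical stand-in for _add_features(), assuming a DataFrame of resampled sensor columns. The function name `add_rolling_features`, the window sizes (6 and 30 rows, i.e. 1 and 5 minutes at the 10 s resampling mentioned above), and the exact statistics are assumptions; the repository's implementation may compute a different set.

```python
import itertools

import pandas as pd


def add_rolling_features(df: pd.DataFrame, windows=(6, 30)) -> pd.DataFrame:
    """Hypothetical stand-in for _add_features(): for each window size (in rows),
    append a rolling mean and std per sensor plus a rolling correlation for
    every pair of sensors."""
    feats = [df]
    for w in windows:
        roll = df.rolling(w)
        feats.append(roll.mean().add_suffix(f"_mean{w}"))
        feats.append(roll.std().add_suffix(f"_std{w}"))
        for a, b in itertools.combinations(df.columns, 2):
            corr = df[a].rolling(w).corr(df[b])
            feats.append(corr.rename(f"corr_{a}_{b}_{w}"))
    out = pd.concat(feats, axis=1)
    # Rows before the largest window is full have incomplete statistics; drop them.
    return out.iloc[max(windows) - 1:]
```

With n sensors, each window adds 2n + n(n-1)/2 columns, so the pairwise correlations dominate as sensors are added; shrinking the window list or dropping the correlation loop is the simplest way to reduce complexity.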
