maplexgitx0302/NTUHEPML-CWoLa

Higgs Production Classifier Using Weak Supervision

This repository accompanies the paper "Higgs Production Classifier Using Weak Supervision". It implements deep-learning methods that distinguish the Vector Boson Fusion (VBF) and Gluon–Gluon Fusion (GGF) Higgs production modes using the Classification Without Labels (CWoLa) framework.

The project extends the ideas in "Classification Without Labels: Learning from Mixed Samples in High Energy Physics" and adapts full-event and particle-level architectures for weakly supervised learning directly from mixed samples.
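To make the CWoLa idea concrete, the following is a minimal sketch, not the repository's data pipeline. It builds two mixed samples with different signal fractions and labels each event by the sample it came from rather than by its true class. The function name, the toy Gaussian features, and the fractions 0.7 and 0.3 are all illustrative assumptions.

```python
import numpy as np

def make_cwola_dataset(signal, background, f1, f2, n_per_sample, seed=0):
    """Build two mixed samples with signal fractions f1 and f2.

    Labels mark the sample of origin (1 for the first mixed sample,
    0 for the second), never the true signal/background class.
    """
    rng = np.random.default_rng(seed)

    def draw_mixed(frac):
        n_sig = int(round(frac * n_per_sample))
        n_bkg = n_per_sample - n_sig
        sig = signal[rng.choice(len(signal), n_sig, replace=False)]
        bkg = background[rng.choice(len(background), n_bkg, replace=False)]
        return np.concatenate([sig, bkg])

    m1 = draw_mixed(f1)
    m2 = draw_mixed(f2)
    x = np.concatenate([m1, m2])
    y = np.concatenate([np.ones(len(m1)), np.zeros(len(m2))])
    return x, y

# Toy stand-ins for VBF-like and GGF-like events (assumed, not real data).
toy_rng = np.random.default_rng(1)
vbf_like = toy_rng.normal(1.0, 1.0, size=(1000, 3))
ggf_like = toy_rng.normal(-1.0, 1.0, size=(1000, 3))
x, y = make_cwola_dataset(vbf_like, ggf_like, f1=0.7, f2=0.3, n_per_sample=500)
```

A classifier trained on `(x, y)` only ever sees sample-origin labels; as long as the two fractions differ, a classifier that separates the two mixed samples well also tends to separate the underlying classes.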


Repository Structure

NTUHEPML-CWoLa/
│
├── data/                 # Place Zenodo + Google Drive data here
├── src/                  # Model implementations and utilities
├── scripts/              # Training, inference, and helper scripts
├── notebooks/            # Analysis and figure-generation notebooks
├── output/               # Saved checkpoints (after running)
├── figures/              # Paper figures (if downloaded from Drive)
└── environment.yml       # Conda environment specification

Environment Setup

1. Install Miniconda

# Linux example
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh

2. Create the project environment

conda env create -f environment.yml      # first time
conda env update -f environment.yml      # when environment.yml changes

3. Activate / deactivate

conda activate cwola
conda deactivate

4. (Optional) VSCode .env setup

# Path to Miniconda environment packages
PYTHONPATH=~/miniconda3/envs/cwola/lib/python3.12/site-packages

# Or add project source
PYTHONPATH=~/miniconda3/envs/cwola/lib/python3.12/site-packages:~/NTUHEPML-CWoLa/src

5. Dataset Access

  • Simulated Higgs dataset: available on Zenodo. Place the downloaded dataset in data/.
  • Additional files (cuts, checkpoints, figures): available on Google Drive.

Place the following Google Drive folders into the corresponding local folders:

Folder      Contents                        Notes
data/       .npy files                      must be combined with Zenodo dataset
output/     pretrained model checkpoints    for reproducing paper results
figures/    paper figures                   generated by notebooks in ./notebooks
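Before training or running the notebooks, it can help to confirm that the folders listed above are in place. The helper below is a hypothetical convenience sketch, not part of the repository; it checks only that the three directories exist, not what they contain.

```python
from pathlib import Path

# Folders named in the table above.
EXPECTED_DIRS = ("data", "output", "figures")

def missing_dirs(repo_root):
    """Return the expected folder names that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]

if __name__ == "__main__":
    # Run from the repository root; an empty list means all folders exist.
    print(missing_dirs("."))
```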

Models

For the detailed PyTorch implementations, see src/model_*.py.

CNN_EventCNN

A convolutional architecture inspired by "VBF vs. GGF Higgs with Full-Event Deep Learning". It processes full-event images for binary classification.
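As an illustration of what a full-event image can look like, the sketch below bins particle transverse momentum on an (η, φ) grid. This is an assumption about the general technique, not the repository's preprocessing: the function name, the 40×40 grid, and the |η| < 5 range are hypothetical choices.

```python
import numpy as np

def make_event_image(pt, eta, phi, n_bins=40, eta_max=5.0):
    """Sum particle pT into pixels of an (eta, phi) grid.

    Returns an array of shape (n_bins, n_bins); particles outside the
    eta range are dropped by the histogram.
    """
    image, _, _ = np.histogram2d(
        eta,
        phi,
        bins=n_bins,
        range=[[-eta_max, eta_max], [-np.pi, np.pi]],
        weights=pt,
    )
    return image

# Three toy particles (illustrative values only).
pt = np.array([50.0, 30.0, 10.0])
eta = np.array([2.5, -3.0, 0.1])
phi = np.array([0.3, -2.0, 3.0])
image = make_event_image(pt, eta, phi)
```

An image like this would typically be stacked into channels and passed to 2D convolution layers; the channel definition used by CNN_EventCNN is given in src/model_*.py.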

ParT_Light

A lighter implementation of the Particle Transformer from "Particle Transformer for Jet Tagging". It uses attention mechanisms adapted for particle-wise inputs.
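The core operation behind particle-wise attention can be sketched as masked scaled dot-product self-attention over a padded list of particles. The NumPy version below is a generic illustration, not the ParT_Light layer: learned query/key/value projections and multiple heads are omitted, and the optional `bias` argument merely stands in for a pairwise term added to the attention scores.

```python
import numpy as np

def particle_self_attention(x, mask, bias=None):
    """Single-head self-attention over particles.

    x:    (n_particles, d) feature array, zero-padded.
    mask: (n_particles,) bool, True for real particles.
    bias: optional (n_particles, n_particles) additive score term.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    if bias is not None:
        scores = scores + bias
    # Padded particles must not receive attention as keys.
    scores = np.where(mask[None, :], scores, -np.inf)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

# Four slots, the last one is padding (toy input).
toy = np.random.default_rng(0).normal(size=(4, 8))
toy[3] = 0.0
real = np.array([True, True, True, False])
out, attn = particle_self_attention(toy, real)
```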


Training

All training pipelines are built on PyTorch and Lightning. Running them through script/run.sh avoids having to activate the Conda environment manually each time.

Example usage:

python ./training.py --channel diphoton --data_mode supervised
python ./training.py --channel diphoton --data_mode jet_flavor
python ./training.py --channel diphoton --data_mode jet_flavor --num_phi_augmentation 5
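The flags used in these commands could be declared with `argparse` roughly as follows. This is a hypothetical sketch of such an interface, not the contents of training.py: the choices mirror the values listed under "Command-line arguments", while the `required=True` settings and the default of 0 for `--num_phi_augmentation` are assumptions.

```python
import argparse

def build_parser():
    """Declare the three documented training flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the training CLI")
    parser.add_argument(
        "--channel",
        choices=["diphoton", "zz4l", "za2l"],
        required=True,  # assumption
        help="Higgs decay channel",
    )
    parser.add_argument(
        "--data_mode",
        choices=["supervised", "jet_flavor"],
        required=True,  # assumption
        help="supervised truth labels or CWoLa mixed-sample labels",
    )
    parser.add_argument(
        "--num_phi_augmentation",
        type=int,
        default=0,  # assumption: no augmentation unless requested
        help="number of phi rotations applied as augmentation",
    )
    return parser

# Parse the third example command from above.
args = build_parser().parse_args(
    ["--channel", "diphoton", "--data_mode", "jet_flavor",
     "--num_phi_augmentation", "5"]
)
```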

Command-line arguments

--channel Selects decay channel:

  • diphoton : $H\to\gamma\gamma$
  • zz4l : $H\to ZZ \to 4\ell$
  • za2l : $H\to Z\gamma \to 2\ell\gamma$

--data_mode

  • supervised: Uses simulated truth labels.
  • jet_flavor (CWoLa mode): Generates mixed samples weighted by cross sections, branching ratios, and luminosities. Labels correspond to the sample origin (not true VBF/GGF).
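For orientation, the expected number of events for a process is commonly estimated as cross section × branching ratio × integrated luminosity, optionally times a selection efficiency. The sketch below applies that standard relation and derives the VBF fraction of a mixed sample. How the repository actually combines these quantities is not shown here; the function names, the efficiency factor, and every number are placeholders chosen for easy arithmetic, not physical values.

```python
def expected_events(cross_section_fb, branching_ratio, luminosity_fb_inv,
                    efficiency=1.0):
    """Expected yield N = sigma * BR * L * efficiency."""
    return cross_section_fb * branching_ratio * luminosity_fb_inv * efficiency

def vbf_fraction(n_vbf, n_ggf):
    """Fraction of VBF events in a sample containing only VBF and GGF."""
    return n_vbf / (n_vbf + n_ggf)

# Placeholder inputs (NOT physical values).
n_vbf = expected_events(100.0, 0.5, 10.0, efficiency=0.5)
n_ggf = expected_events(1000.0, 0.5, 10.0, efficiency=0.25)
fraction = vbf_fraction(n_vbf, n_ggf)
```

Two mixed samples built this way are usable for CWoLa only if their VBF fractions differ.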

--num_phi_augmentation Applies φ-rotation augmentation; the integer value sets the number of rotations applied.
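A φ-rotation augmentation can be sketched as shifting every particle in an event by the same azimuthal angle and wrapping the result back into [-π, π). The version below is a minimal illustration under assumptions: the function name is hypothetical, and drawing the angles uniformly at random is one possible choice; the repository may use fixed or evenly spaced angles instead.

```python
import numpy as np

def phi_rotation_augment(phi, num_rotations, seed=0):
    """Return rotated copies of one event's phi coordinates.

    phi: (n_particles,) azimuthal angles in radians.
    Output shape is (num_rotations, n_particles); each row applies one
    common shift to all particles, wrapped into [-pi, pi).
    """
    rng = np.random.default_rng(seed)
    shifts = rng.uniform(-np.pi, np.pi, size=(num_rotations, 1))
    rotated = phi[None, :] + shifts
    return (rotated + np.pi) % (2 * np.pi) - np.pi

event_phi = np.array([0.3, -2.0, 3.0])
augmented = phi_rotation_augment(event_phi, num_rotations=5)
```

Because each copy uses a single shift for the whole event, angular separations between particles are unchanged modulo 2π, so the event's physical content is preserved.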


Inference

Model checkpoints are saved to:

./output/<channel>/<data_mode>/<timestamp>/

To run inference:

  1. Identify the timestamp folder for your trained model.

  2. Edit the inference_info_list inside ./script/inference.py.

  3. Run via run.sh or manually:

    python ./inference.py

The inference script will load the specified checkpoint(s) and generate evaluation outputs.
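Step 1 can be automated with a small helper that picks the most recent run folder. This is a hypothetical sketch, not part of the repository. It assumes timestamp folder names sort chronologically when sorted as strings (for example a year-month-day style name); the README does not state the actual format.

```python
from pathlib import Path

def latest_run_dir(output_root, channel, data_mode):
    """Return the last sub-folder of <output_root>/<channel>/<data_mode>/.

    "Last" means last in sorted name order, which equals the newest run
    only if folder names sort chronologically (assumption).
    Returns None when no run folder exists.
    """
    base = Path(output_root) / channel / data_mode
    if not base.is_dir():
        return None
    runs = sorted(p for p in base.iterdir() if p.is_dir())
    return runs[-1] if runs else None

if __name__ == "__main__":
    # Example: newest jet_flavor run for the diphoton channel, if any.
    print(latest_run_dir("./output", "diphoton", "jet_flavor"))
```

The returned path is what you would then enter into inference_info_list by hand.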


Citation

If you use this repository or the accompanying datasets, please cite the corresponding paper.

About

Applying "CWoLa" on simulated Higgs dataset with CNN and Particle Transformer.