This repository accompanies the paper *Higgs Production Classifier Using Weak Supervision*. It implements deep-learning methods to distinguish Vector Boson Fusion (VBF) from Gluon–Gluon Fusion (GGF) Higgs production modes using the Classification Without Labels (CWoLa) framework.

The project extends the ideas in *Classification Without Labels: Learning from Mixed Samples in High Energy Physics* and adapts full-event and particle-level architectures for weakly supervised learning directly from mixed samples.
```
NTUHEPML-CWoLa/
│
├── data/            # Place Zenodo + Google Drive data here
├── src/             # Model implementations and utilities
├── scripts/         # Training, inference, and helper scripts
├── notebooks/       # Analysis and figure-generation notebooks
├── output/          # Saved checkpoints (after running)
├── figures/         # Paper figures (if downloaded from Drive)
└── environment.yml  # Conda environment specification
```
```bash
# Linux example
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh

conda env create -f environment.yml   # first time
conda env update -f environment.yml   # when environment.yml changes

conda activate cwola
conda deactivate
```

```bash
# Path to Miniconda environment packages
PYTHONPATH=~/miniconda3/envs/cwola/lib/python3.12/site-packages
# Or add project source
PYTHONPATH=~/miniconda3/envs/cwola/lib/python3.12/site-packages:~/NTUHEPML-CWoLa/src
```
- Simulated Higgs dataset: available on Zenodo. Place the downloaded dataset in `data/`.
- Additional files (cuts, checkpoints, figures): available on Google Drive.
Place the following Google Drive folders into the corresponding local folders:

| Folder | Contents | Notes |
|---|---|---|
| `data/` | `.npy` files | must be combined with the Zenodo dataset |
| `output/` | pretrained model checkpoints | for reproducing paper results |
| `figures/` | paper figures | generated by notebooks in `./notebooks` |
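Once both sources are in place, the `.npy` pieces can be concatenated along the event axis before training. The sketch below is illustrative only — the filenames and array shapes are made-up stand-ins, not the actual dataset files:

```python
import numpy as np

def combine_event_arrays(paths):
    """Load several .npy event arrays and stack them into one array."""
    return np.concatenate([np.load(p) for p in paths], axis=0)

# Demo with synthetic stand-ins written to disk first
# (replace with the real files under data/):
np.save("part_a.npy", np.zeros((100, 40, 40)))
np.save("part_b.npy", np.ones((50, 40, 40)))
events = combine_event_arrays(["part_a.npy", "part_b.npy"])
print(events.shape)  # (150, 40, 40)
```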
For the detailed PyTorch implementations, see `src/model_*.py`.
A convolutional architecture inspired by *VBF vs. GGF Higgs with Full-Event Deep Learning*, processing full-event images for binary classification.
A lighter implementation of the Particle Transformer (*Particle Transformer for Jet Tagging*), using attention mechanisms adapted for particle-wise inputs.
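The core operation such a particle-level model builds on is scaled dot-product self-attention over the set of particles in an event. A minimal NumPy sketch of that operation (not the repository's actual implementation, which uses PyTorch modules):

```python
import numpy as np

def particle_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over particles.

    x          : (n_particles, d) per-particle features
    wq, wk, wv : (d, d) query/key/value projection matrices
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])           # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ v                               # attention-weighted features

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                         # 8 particles, 16 features
wq, wk, wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = particle_attention(x, wq, wk, wv)
print(out.shape)  # (8, 16)
```

Because attention is permutation-equivariant, the model treats the particles as an unordered set, which is the natural symmetry of the input.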
All training pipelines rely on PyTorch and Lightning. Use `script/run.sh` to avoid manually activating Conda each time.
Example usage:

```bash
python ./training.py --channel diphoton --data_mode supervised
python ./training.py --channel diphoton --data_mode jet_flavor
python ./training.py --channel diphoton --data_mode jet_flavor --num_phi_augmentation 5
```

`--channel`
Selects the decay channel:

- `diphoton`: $H\to\gamma\gamma$
- `zz4l`: $H\to ZZ \to 4\ell$
- `za2l`: $H\to Z\gamma \to 2\ell\gamma$
`--data_mode`

- `supervised`: uses simulated truth labels.
- `jet_flavor` (CWoLa mode): generates mixed samples weighted by cross sections, branching ratios, and luminosities. Labels correspond to the sample origin (not true VBF/GGF).
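The CWoLa idea behind `jet_flavor` mode can be illustrated with a toy example. The fractions, sample sizes, and 1D Gaussian feature below are all invented for illustration; only the labeling scheme (sample origin, not truth) mirrors the mode's behavior:

```python
import numpy as np

rng = np.random.default_rng(42)

def draw_mixed_sample(n_events, f_vbf, rng):
    """Draw a mixture whose events are VBF with probability f_vbf.

    Returns (features, is_vbf). In CWoLa training the true origin
    is never used; only the sample-level label (which mixture) is.
    """
    is_vbf = rng.random(n_events) < f_vbf
    # Toy 1D feature: VBF and GGF drawn from shifted Gaussians.
    x = rng.normal(loc=np.where(is_vbf, 1.0, -1.0))
    return x, is_vbf

# Two mixtures with different (unknown to the classifier) VBF fractions.
x1, _ = draw_mixed_sample(1000, f_vbf=0.7, rng=rng)   # labeled y = 1
x0, _ = draw_mixed_sample(1000, f_vbf=0.2, rng=rng)   # labeled y = 0
x = np.concatenate([x1, x0])
y = np.concatenate([np.ones_like(x1), np.zeros_like(x0)])
```

The CWoLa result is that a classifier trained to separate the two mixtures `(x, y)` is, in the large-sample limit, monotonically related to the optimal VBF-vs-GGF classifier, even though no event-level truth labels were used.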
`--num_phi_augmentation`
Applies φ-rotation augmentation (the integer sets the number of rotations applied).
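Since physics is invariant under rotations in azimuthal angle φ, each event can be cheaply augmented with rotated copies. The sketch below is one plausible reading of the flag (random rotations, wrapped back into [−π, π)); the repository's exact scheme may differ:

```python
import numpy as np

def phi_augment(phi, num_phi_augmentation, rng):
    """Return the original event plus rotated copies in azimuthal angle.

    phi : (n_particles,) azimuthal angles of one event's particles
    """
    copies = [phi]
    for _ in range(num_phi_augmentation):
        shift = rng.uniform(0.0, 2.0 * np.pi)
        # Rotate every particle by the same angle and wrap to [-pi, pi).
        copies.append((phi + shift + np.pi) % (2.0 * np.pi) - np.pi)
    return np.stack(copies)

rng = np.random.default_rng(0)
phi = rng.uniform(-np.pi, np.pi, size=12)
augmented = phi_augment(phi, num_phi_augmentation=5, rng=rng)
print(augmented.shape)  # (6, 12): original event + 5 rotated copies
```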
Model checkpoints are saved to:

```
./output/<channel>/<data_mode>/<timestamp>/
```
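If the timestamp folder names use a zero-padded format (so that lexicographic order matches chronological order), the newest run can be located with a small helper. This helper is not part of the repository; it is a convenience sketch under that naming assumption:

```python
import tempfile
from pathlib import Path

def latest_checkpoint_dir(channel, data_mode, root="output"):
    """Return the newest <timestamp> folder under root/channel/data_mode, or None."""
    run_dir = Path(root) / channel / data_mode
    if not run_dir.is_dir():
        return None
    timestamps = sorted(p.name for p in run_dir.iterdir() if p.is_dir())
    return run_dir / timestamps[-1] if timestamps else None

# Demo with a synthetic output tree (hypothetical timestamp format):
root = Path(tempfile.mkdtemp())
for ts in ["2024-01-02_10-00-00", "2024-03-05_09-30-00"]:
    (root / "diphoton" / "jet_flavor" / ts).mkdir(parents=True)
latest = latest_checkpoint_dir("diphoton", "jet_flavor", root=root)
print(latest.name)  # 2024-03-05_09-30-00
```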
To run inference:

1. Identify the timestamp folder for your trained model.
2. Edit the `inference_info_list` inside `./script/inference.py`.
3. Run via `run.sh` or manually:

```bash
python ./inference.py
```
The inference script will load the specified checkpoint(s) and generate evaluation outputs.
If you use this repository or the accompanying datasets, please cite the corresponding paper.