This repository accompanies the paper *Higgs Production Classifier Using Weak Supervision*. It implements deep-learning methods to distinguish Vector Boson Fusion (VBF) from Gluon–Gluon Fusion (GGF) Higgs production modes using the Classification Without Labels (CWoLa) framework.

The project extends the ideas in *Classification Without Labels: Learning from Mixed Samples in High Energy Physics* and adapts full-event and particle-level architectures for weakly supervised learning directly from mixed samples.
```
NTUHEPML-CWoLa/
│
├── data/            # Place Zenodo + Google Drive data here
├── src/             # Model implementations and utilities
├── scripts/         # Training, inference, and helper scripts
├── notebooks/       # Analysis and figure-generation notebooks
├── output/          # Saved checkpoints (after running)
├── figures/         # Paper figures (if downloaded from Drive)
└── environment.yml  # Conda environment specification
```
```bash
# Linux example
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh

conda env create -f environment.yml   # first time
conda env update -f environment.yml   # when environment.yml changes

conda activate cwola
conda deactivate
```

```bash
# Path to Miniconda environment packages
PYTHONPATH=~/miniconda3/envs/cwola/lib/python3.12/site-packages
# Or add project source
PYTHONPATH=~/miniconda3/envs/cwola/lib/python3.12/site-packages:~/NTUHEPML-CWoLa/src
```
- Simulated Higgs dataset: available on Zenodo. Place the downloaded dataset in `data/`.
- Additional files (cuts, checkpoints, figures): available on Google Drive.
Place the following Google Drive folders into the corresponding local folders:

| Folder | Contents | Notes |
|---|---|---|
| `data/` | `.npy` files | must be combined with the Zenodo dataset |
| `output/` | pretrained model checkpoints | for reproducing paper results |
| `figures/` | paper figures | generated by notebooks in `./notebooks` |
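Once both sources are in place, the `.npy` pieces can be concatenated along the event axis before training. The sketch below is illustrative only — the filenames and array shapes are made-up stand-ins, not the actual dataset files:

```python
import numpy as np

def combine_event_arrays(paths):
    """Load several .npy event arrays and stack them into one array."""
    return np.concatenate([np.load(p) for p in paths], axis=0)

# Demo with synthetic stand-ins written to disk first
# (replace with the real files under data/):
np.save("part_a.npy", np.zeros((100, 40, 40)))
np.save("part_b.npy", np.ones((50, 40, 40)))
events = combine_event_arrays(["part_a.npy", "part_b.npy"])
print(events.shape)  # (150, 40, 40)
```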
For the detailed PyTorch implementations, see `src/model_*.py`.
A convolutional architecture inspired by *VBF vs. GGF Higgs with Full-Event Deep Learning*, processing full-event images for binary classification.
A lighter implementation of the Particle Transformer (*Particle Transformer for Jet Tagging*), using attention mechanisms adapted for particle-wise inputs.
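The core operation such a particle-level model builds on is scaled dot-product self-attention over the set of particles in an event. A minimal NumPy sketch of that operation (not the repository's actual implementation, which uses PyTorch modules):

```python
import numpy as np

def particle_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over particles.

    x          : (n_particles, d) per-particle features
    wq, wk, wv : (d, d) query/key/value projection matrices
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])           # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ v                               # attention-weighted features

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                         # 8 particles, 16 features
wq, wk, wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = particle_attention(x, wq, wk, wv)
print(out.shape)  # (8, 16)
```

Because attention is permutation-equivariant, the model treats the particles as an unordered set, which is the natural symmetry of the input.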
All training pipelines rely on PyTorch and Lightning. Use `script/run.sh` to avoid manually activating Conda each time.
Example usage:

```bash
python ./training.py --channel diphoton --data_mode supervised
python ./training.py --channel diphoton --data_mode jet_flavor
python ./training.py --channel diphoton --data_mode jet_flavor --num_phi_augmentation 5
```

`--channel`
Selects the decay channel:

- `diphoton`: $H\to\gamma\gamma$
- `zz4l`: $H\to ZZ \to 4\ell$
- `za2l`: $H\to Z\gamma \to 2\ell\gamma$
`--data_mode`

- `supervised`: uses simulated truth labels.
- `jet_flavor` (CWoLa mode): generates mixed samples weighted by cross sections, branching ratios, and luminosities. Labels correspond to the sample origin (not true VBF/GGF).
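The CWoLa idea behind `jet_flavor` mode can be illustrated with a toy example. The fractions, sample sizes, and 1D Gaussian feature below are all invented for illustration; only the labeling scheme (sample origin, not truth) mirrors the mode's behavior:

```python
import numpy as np

rng = np.random.default_rng(42)

def draw_mixed_sample(n_events, f_vbf, rng):
    """Draw a mixture whose events are VBF with probability f_vbf.

    Returns (features, is_vbf). In CWoLa training the true origin
    is never used; only the sample-level label (which mixture) is.
    """
    is_vbf = rng.random(n_events) < f_vbf
    # Toy 1D feature: VBF and GGF drawn from shifted Gaussians.
    x = rng.normal(loc=np.where(is_vbf, 1.0, -1.0))
    return x, is_vbf

# Two mixtures with different (unknown to the classifier) VBF fractions.
x1, _ = draw_mixed_sample(1000, f_vbf=0.7, rng=rng)   # labeled y = 1
x0, _ = draw_mixed_sample(1000, f_vbf=0.2, rng=rng)   # labeled y = 0
x = np.concatenate([x1, x0])
y = np.concatenate([np.ones_like(x1), np.zeros_like(x0)])
```

The CWoLa result is that a classifier trained to separate the two mixtures `(x, y)` is, in the large-sample limit, monotonically related to the optimal VBF-vs-GGF classifier, even though no event-level truth labels were used.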
`--num_phi_augmentation`
Applies φ-rotation augmentation (the integer sets the number of rotations applied).
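Since physics is invariant under rotations in azimuthal angle φ, each event can be cheaply augmented with rotated copies. The sketch below is one plausible reading of the flag (random rotations, wrapped back into [−π, π)); the repository's exact scheme may differ:

```python
import numpy as np

def phi_augment(phi, num_phi_augmentation, rng):
    """Return the original event plus rotated copies in azimuthal angle.

    phi : (n_particles,) azimuthal angles of one event's particles
    """
    copies = [phi]
    for _ in range(num_phi_augmentation):
        shift = rng.uniform(0.0, 2.0 * np.pi)
        # Rotate every particle by the same angle and wrap to [-pi, pi).
        copies.append((phi + shift + np.pi) % (2.0 * np.pi) - np.pi)
    return np.stack(copies)

rng = np.random.default_rng(0)
phi = rng.uniform(-np.pi, np.pi, size=12)
augmented = phi_augment(phi, num_phi_augmentation=5, rng=rng)
print(augmented.shape)  # (6, 12): original event + 5 rotated copies
```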
Model checkpoints are saved to:

```
./output/<channel>/<data_mode>/<timestamp>/
```
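If the timestamp folder names use a zero-padded format (so that lexicographic order matches chronological order), the newest run can be located with a small helper. This helper is not part of the repository; it is a convenience sketch under that naming assumption:

```python
import tempfile
from pathlib import Path

def latest_checkpoint_dir(channel, data_mode, root="output"):
    """Return the newest <timestamp> folder under root/channel/data_mode, or None."""
    run_dir = Path(root) / channel / data_mode
    if not run_dir.is_dir():
        return None
    timestamps = sorted(p.name for p in run_dir.iterdir() if p.is_dir())
    return run_dir / timestamps[-1] if timestamps else None

# Demo with a synthetic output tree (hypothetical timestamp format):
root = Path(tempfile.mkdtemp())
for ts in ["2024-01-02_10-00-00", "2024-03-05_09-30-00"]:
    (root / "diphoton" / "jet_flavor" / ts).mkdir(parents=True)
latest = latest_checkpoint_dir("diphoton", "jet_flavor", root=root)
print(latest.name)  # 2024-03-05_09-30-00
```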
To run inference:

1. Identify the timestamp folder for your trained model.
2. Edit the `inference_info_list` inside `./script/inference.py`.
3. Run via `run.sh` or manually:

```bash
python ./inference.py
```
The inference script will load the specified checkpoint(s) and generate evaluation outputs.
If you use this repository or the accompanying datasets, please cite the corresponding paper.