Investigate combined honest trees + isotonic calibration #530

## Background
Honest decision trees build on conventional decision trees by splitting the training samples into two disjoint sets: one is used to learn the tree structure (the splits), and the other is used to estimate the class posterior probabilities in each leaf. In practice, this provides better calibration (i.e., the estimated probabilities are closer to the true probabilities). See [this paper for details](https://arxiv.org/abs/1907.00325).
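As a rough illustration of the honest split, the following sketch fits a scikit-learn `DecisionTreeClassifier` on one half of the data and then re-estimates each leaf's class posterior from the other half via `tree.apply`. The helper names `fit_honest_tree` and `honest_predict_proba`, the 50/50 split, and the uniform fallback for leaves that receive no estimation samples are assumptions made for this sketch; it is not the optimized implementation from the ProgLearn fork.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_honest_tree(X, y, n_classes, honest_fraction=0.5, random_state=0):
    """Hypothetical helper: fit splits on one subset, leaf posteriors on the other.

    Assumes y is already encoded as integers 0..n_classes-1.
    """
    X_struct, X_est, y_struct, y_est = train_test_split(
        X, y, test_size=honest_fraction, stratify=y, random_state=random_state
    )
    # 1) Structure: grow the tree using only the structure subset.
    tree = DecisionTreeClassifier(random_state=random_state).fit(X_struct, y_struct)

    # 2) Estimation: route the held-out subset to leaves and count class labels.
    leaf_ids = tree.apply(X_est)
    counts = np.zeros((tree.tree_.node_count, n_classes))
    np.add.at(counts, (leaf_ids, y_est), 1)

    # Leaves that receive no estimation samples fall back to a uniform posterior
    # (an assumption for this sketch; other finite-sample corrections exist).
    totals = counts.sum(axis=1, keepdims=True)
    posteriors = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_classes)
    return tree, posteriors


def honest_predict_proba(tree, posteriors, X):
    # Look up each sample's leaf and return that leaf's honest posterior.
    return posteriors[tree.apply(X)]


X, y = make_classification(n_samples=500, n_features=5, random_state=0)
tree, posteriors = fit_honest_tree(X, y, n_classes=2)
proba = honest_predict_proba(tree, posteriors, X[:5])
print(proba.shape)  # (5, 2)
```

Because the leaf posteriors are computed from samples the tree never saw while choosing its splits, they tend to be less extreme than the 0/1 probabilities a fully grown tree reports on its own training data.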

The [code](https://github.com/rflperry/ProgLearn/blob/UF/proglearn/forest.py#L564) and [experiments](https://github.com/rflperry/ProgLearn/tree/UF/benchmarks) for the above paper are located in a fork of ProgLearn. A minimal working example and tutorial are available in [this notebook](https://nbviewer.org/github/EYezerets/ProgLearn/blob/sklearnUF/docs/tutorials/honest_posteriorestimates_runtime.ipynb). This code is separate from the honest tree code used in ProgLearn because it does not need transfer/lifelong learning; as an upside, it has been optimized for efficiency and benchmarked.

## Request
An [issue was opened in scikit-learn](https://github.com/scikit-learn/scikit-learn/issues/19710), and the simulations and paper attracted developer interest. The paper compared honest decision forests against the traditional forest as well as two other [calibration methods, sigmoid and isotonic](https://scikit-learn.org/stable/modules/calibration.html). Since isotonic calibration appears to outperform honest posteriors alone, a developer asked how honest trees would perform when combined with isotonic calibration. **The request is thus to rerun the simulations and CC18 experiments from the [paper](https://arxiv.org/abs/1907.00325) with an added honest + isotonic forest method, to see whether this combined approach gives better calibration than either approach alone.**
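For reference, the two non-honest arms of this comparison can already be run with stock scikit-learn. The sketch below is a minimal illustration, assuming a synthetic `make_classification` dataset and using the Brier score as the calibration metric; both are illustrative choices, not necessarily the data or metrics used in the paper. The base estimator is passed positionally to `CalibratedClassifierCV` because its keyword name changed across scikit-learn versions (`base_estimator` vs. `estimator`).

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption; the paper uses its own simulations and CC18).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

methods = {
    # Uncalibrated forest: posteriors are averaged per-tree leaf frequencies.
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Isotonic calibration fitted with internal 3-fold cross-validation.
    "forest + isotonic": CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=100, random_state=0),
        method="isotonic",
        cv=3,
    ),
}

scores = {}
for name, model in methods.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    # Brier score: mean squared error of the predicted probability (lower is better).
    scores[name] = brier_score_loss(y_test, proba)
    print(f"{name}: Brier = {scores[name]:.4f}")
```

The honest and honest + isotonic arms would slot into the same `methods` dictionary once an sklearn-compatible honest estimator exists.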

## Proposed Workflow
As the current honest forest code and experiments live in a fork, it may be worthwhile to first create a new repository containing only the optimized honest forest code and experiments, separate from ProgLearn. Either way, the rough workflow would be:

1. Write an `HonestTreeClassifier` class, then rewrite the [`UncertaintyForest`](https://github.com/rflperry/ProgLearn/blob/UF/proglearn/forest.py#L564) class to use the new honest trees. Consider renaming `UncertaintyForest` to `HonestForestClassifier` for consistency. Currently there is no standalone honest decision tree code; instead, `UncertaintyForest` builds an ensemble of honest decision trees from regular decision trees.
2. Verify that this honest decision tree can be used as the base estimator for [sklearn's isotonic calibration](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html#sklearn.calibration.CalibratedClassifierCV), just as the regular sklearn decision tree can. This may require editing the honest tree class to conform to sklearn's estimator conventions. This is probably the hardest step.
3. Rerun the [overlapping Gaussian simulation](https://github.com/rflperry/ProgLearn/tree/UF/benchmarks/uf_experiments/overlapping_gaussians) with this method as well and compare the results.
4. If the method seems promising, run it on the real [CC18 data experiments](https://github.com/rflperry/ProgLearn/blob/5c22d6fcb1b41e4ba588bd8a48eae3ee3b468e43/benchmarks/uf_experiments/cc18/run_all.py).
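Steps 1 and 2 could be prototyped roughly as follows. This is a hypothetical `HonestTreeClassifier`: the name comes from step 1, but the implementation is only a sketch, not the ProgLearn-fork code. It follows scikit-learn's estimator conventions (hyperparameters stored unchanged in `__init__`, fitted attributes ending in `_`, a `classes_` attribute, and `predict_proba`) so that `CalibratedClassifierCV(method="isotonic")` can clone and refit it. The `honest_fraction` parameter and the uniform fallback for empty leaves are assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted


class HonestTreeClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical honest tree sketch following sklearn's estimator conventions."""

    def __init__(self, honest_fraction=0.5, max_depth=None, random_state=None):
        # sklearn convention: __init__ only stores hyperparameters unchanged,
        # which lets clone() (used by CalibratedClassifierCV) copy the estimator.
        self.honest_fraction = honest_fraction
        self.max_depth = max_depth
        self.random_state = random_state

    def fit(self, X, y):
        X = np.asarray(X)
        # classes_ is required by CalibratedClassifierCV and other meta-estimators;
        # y_enc maps labels to 0..n_classes-1 for counting.
        self.classes_, y_enc = np.unique(y, return_inverse=True)
        n_classes = len(self.classes_)
        X_struct, X_est, y_struct, y_est = train_test_split(
            X, y_enc, test_size=self.honest_fraction, random_state=self.random_state
        )
        # Structure subset: learn the splits.
        self.tree_ = DecisionTreeClassifier(
            max_depth=self.max_depth, random_state=self.random_state
        ).fit(X_struct, y_struct)
        # Estimation subset: count labels per leaf to form the posteriors.
        counts = np.zeros((self.tree_.tree_.node_count, n_classes))
        np.add.at(counts, (self.tree_.apply(X_est), y_est), 1)
        totals = counts.sum(axis=1, keepdims=True)
        # Empty leaves fall back to a uniform posterior (assumption).
        self.posteriors_ = np.where(
            totals > 0, counts / np.maximum(totals, 1), 1.0 / n_classes
        )
        return self

    def predict_proba(self, X):
        check_is_fitted(self, "posteriors_")
        return self.posteriors_[self.tree_.apply(np.asarray(X))]

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Honest tree as the base estimator of sklearn's isotonic calibration.
calibrated = CalibratedClassifierCV(
    HonestTreeClassifier(random_state=0), method="isotonic", cv=3
)
calibrated.fit(X_train, y_train)
proba = calibrated.predict_proba(X_test)
print(proba.shape)  # (250, 2)
```

A forest version (the renamed `HonestForestClassifier`) could then average `predict_proba` over many such trees fitted on subsamples; how closely that should mirror `UncertaintyForest`'s sampling scheme is left open here. Passing that forest to `CalibratedClassifierCV` in the same way would give the honest + isotonic arm for step 3.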
