Evaluation and agreement scripts for the DISCOSUMO project. Each evaluation script takes both manual annotations as automatic summarization output. The formatting of these files is highly project-specific. However, the evaluation functions for precision, recall, ROUGE, Jaccard, Cohen's kappa and Fleiss' kappa may be applicable to other domains too.
- The script
agreement_and_eval.plalso implements three baselines: random, position and length. - The script
eval.plis the only script that assumes that the ground truth labels are in the featurefile (also for the oracle ranking) and therefore only takes one argument. - The script
agreement_fleisskappa.pydoes not evaluate but only implements Fleiss' Kappa, and takes only the manual annotations as input.
The postids_per_thread.queries.txt, 106long20threads.postfeats.norm.out and 106long20threads.threadfeats.out are needed for some of the scripts to know the post ids per thread or number of posts per thread.
See the LICENSE file for license rights and limitations (GNU-GPL v3.0).