This repository contains the collected data and the crowdsourcing codebase for "What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?" (to appear at ACL-IJCNLP 2021).
- `data` contains all collected data for our four crowdsourcing protocols.
  - Each question has human validations and model predictions.
- `data/intermediate_stages` contains questions collected in the iterative feedback stages of the `crowd` and `expert` protocols.
- `interface` contains the codebase used in our data collection.
  - Refer to `interface/README.md` for running the application.
- `interface_screenshots` contains images showing the user interfaces for the tutorial, writing, and validation tasks used for all four protocols.
The collected data is released under the Creative Commons Attribution 4.0 International License.
@inproceedings{nangia-etal-2021-ingredients,
title={What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult {NLU} Data Collection Tasks?},
author={Nikita Nangia and Saku Sugawara and Harsh Trivedi and Alex Warstadt and Clara Vania and Samuel R. Bowman},
booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing},
month=aug,
year={2021},
address = {Online},
publisher = {Association for Computational Linguistics},