This repository was archived by the owner on Aug 18, 2021. It is now read-only.

Data Quality Score Algorithm - Potential Issue #20

@RicardoAReyes

The new Data Quality Score algorithm code and documentation appear to contain some potential errors. We should review the solution to ensure that all required fields are being scored and that optional fields carry less weight in point value.

Algorithm Documentation:

https://github.com/GSA/code-gov/blob/master/data_quality_scoring.md

Algorithm Rules:

https://github.com/GSA/code-gov-harvester/blob/master/libs/rules/index.js

The documentation shows the point value assigned to each required and optional field of Metadata Schema 2.0.0. The screenshot below shows the documented point assignment per field next to the source code list of fields that are evaluated on each repo.

[Screenshot: Screen Shot 2019-05-30 at 10 06 36 AM]

The solution is missing the following fields:

  agency
  measurementType
  releases

Keep in mind that "organization" is an optional field nested under "releases".

Also, the algorithm evaluates each repo on all required and all optional fields, not just the required fields. This means each repo is graded on a 158-point scale, not on the 71 points available from required fields alone.

 (Repo Total Points / 158 ) * 10 = repo score. 
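To make the effect concrete, here is a minimal Python sketch of the formula above. The function name `current_score` and the constant names are hypothetical and are not taken from the harvester code; the point totals (71 required, 158 overall) come from this issue. It shows what a repo earns when it fills in every required field and no optional ones.

```python
# Point totals as stated in this issue (names are illustrative only).
REQUIRED_MAX = 71                          # points available from required fields
TOTAL_MAX = 158                            # required + optional points
OPTIONAL_MAX = TOTAL_MAX - REQUIRED_MAX    # 87 points from optional fields


def current_score(required_points, optional_points):
    """Score on a 0-10 scale using the combined 158-point total."""
    return (required_points + optional_points) / TOTAL_MAX * 10


# A repo with every required field and no optional fields:
print(round(current_score(REQUIRED_MAX, 0), 2))  # 4.49
```

Under this formula a repo that is fully compliant on required fields still scores below 5 out of 10.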

Perhaps consider treating the optional fields as bonus points added on top of the required-fields score? For example:

 (Repo Required Fields Points / 71 ) * 10 + (Repo Optional Fields Points / 87)*10 = repo score. 

Or consider an alternative solution in which optional fields do not reduce the score earned from required fields. Not all agencies populate the optional fields for their releases/repos in their code.json metadata file, and that lowers their overall Data Quality Score.
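As a rough illustration of the bonus-style formula proposed above, the following Python sketch computes it directly. The function name `bonus_score` is hypothetical; the 71 and 87 point totals are the ones quoted in this issue.

```python
REQUIRED_MAX = 71   # points available from required fields (from this issue)
OPTIONAL_MAX = 87   # points available from optional fields (158 - 71)


def bonus_score(required_points, optional_points):
    """Required fields scored out of 10, with optional fields added as a bonus."""
    required_part = required_points / REQUIRED_MAX * 10
    optional_part = optional_points / OPTIONAL_MAX * 10
    return required_part + optional_part


print(bonus_score(71, 0))    # 10.0 -> all required fields, no optional fields
print(bonus_score(71, 87))   # 20.0 -> every required and optional field
```

With this formula, missing optional fields no longer pull the score below 10. Note that, as written, the result can reach 20, so the bonus term would probably need a smaller multiplier or a cap if the score is meant to stay on a 10-point scale.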
