
Add --optimized mode #55

@igorbrigadir

The current output favors preserving as much information as possible from the original JSON, but there is some duplication, and a number of columns can be removed because they are rarely useful.

The new --optimized mode will generate CSVs that drop a bunch of columns to save space:

edit_controls.edits_remaining
edit_controls.editable_until
entities.cashtags
entities.hashtags
entities.mentions
withheld.scope
withheld.copyright
author.id
author.entities.description.cashtags
author.entities.description.hashtags
author.entities.description.mentions
author.url
author.withheld.scope
author.withheld.copyright
geo.coordinates.coordinates
geo.coordinates.type
geo.country
geo.full_name
geo.geo.type
matching_rules
__twarc.retrieved_at
__twarc.url
__twarc.version

(exact list to be revised later)
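As a rough illustration of what dropping these columns could look like, here is a minimal sketch that filters an existing CSV using only the standard library. The function name `drop_optimized_columns` and the constant `OPTIMIZED_DROP_COLUMNS` are hypothetical and not part of the tool. The column list is copied from this issue and may still change.

```python
import csv
import io

# Columns proposed for removal in this issue (the exact list may be revised).
OPTIMIZED_DROP_COLUMNS = [
    "edit_controls.edits_remaining",
    "edit_controls.editable_until",
    "entities.cashtags",
    "entities.hashtags",
    "entities.mentions",
    "withheld.scope",
    "withheld.copyright",
    "author.id",
    "author.entities.description.cashtags",
    "author.entities.description.hashtags",
    "author.entities.description.mentions",
    "author.url",
    "author.withheld.scope",
    "author.withheld.copyright",
    "geo.coordinates.coordinates",
    "geo.coordinates.type",
    "geo.country",
    "geo.full_name",
    "geo.geo.type",
    "matching_rules",
    "__twarc.retrieved_at",
    "__twarc.url",
    "__twarc.version",
]


def drop_optimized_columns(src, dst, drop=OPTIMIZED_DROP_COLUMNS):
    """Copy a CSV from src to dst, omitting the columns named in drop.

    Hypothetical helper: columns in `drop` that are absent from the
    input are simply skipped. Returns the list of columns that were kept.
    """
    reader = csv.DictReader(src)
    drop_set = set(drop)
    kept = [name for name in (reader.fieldnames or []) if name not in drop_set]
    # extrasaction="ignore" lets us pass full rows; dropped keys are skipped.
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


if __name__ == "__main__":
    sample = io.StringIO("id,text,author.id,__twarc.version\n1,hello,42,2.0\n")
    out = io.StringIO()
    print(drop_optimized_columns(sample, out))
    print(out.getvalue())
```

Filtering on the header row means the same list works even when some of these columns are missing from a given file.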

These are the columns that are most often empty or duplicated. The missing data can be inferred from the remaining columns or, for cashtags, hashtags, and mentions, re-extracted from the tweet text with twitter-text, for example.
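To show how the dropped entity columns could be recovered from the tweet text, here is a simplified sketch. It is only an approximation: the regular expressions below are assumptions made for illustration and do not reproduce the full twitter-text extraction rules (Unicode handling, URL exclusions, and so on), and `extract_entities` is a hypothetical name.

```python
import re

# Simplified patterns, NOT the twitter-text rules: each tag must not be
# preceded by a word character (so "a@b.com" is not treated as a mention).
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})(?!\w)")
CASHTAG_RE = re.compile(r"(?<!\w)\$([A-Za-z]{1,6})\b")


def extract_entities(text):
    """Return approximate hashtags, mentions, and cashtags found in text."""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "cashtags": CASHTAG_RE.findall(text),
    }


if __name__ == "__main__":
    print(extract_entities("Thanks @jack for the #Python tips, long $TWTR"))
```

For real data, a proper twitter-text implementation would be the safer choice, since these patterns will miss or misread edge cases.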

#36 and #47 should probably be fixed before this.

Labels: enhancement (New feature or request)
