Optimise bitcoin export so it doesn't query all partitions every day #6

@medvedev1088

Right now load_dag scans data in all partitions every day. In particular, the enrich transactions SQL (https://github.com/blockchain-etl/bitcoin-etl-airflow/blob/master/dags/resources/stages/enrich/sqls/transactions.sql) has to join inputs with outputs, which requires scanning all past data.

An alternative is to enrich transactions in export_dag using https://github.com/blockchain-etl/bitcoin-etl#enrich_transactions.
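To illustrate the idea, here is a minimal sketch of what enriching at export time amounts to: each input is resolved against the output it spends, so no join over historical tables is needed later. This is not the bitcoin-etl implementation. The function name `enrich_transactions`, the `lookup_output` callback, and the dict field names (`spent_transaction_hash`, `spent_output_index`, `value`, `addresses`) are assumptions made for this example; in practice the lookup would be backed by a node or an index of past outputs.

```python
from typing import Callable, Optional


def enrich_transactions(
    transactions: list,
    lookup_output: Callable[[str, int], Optional[dict]],
) -> list:
    """Return copies of the transactions with spent-output data copied onto inputs.

    lookup_output(tx_hash, output_index) is a hypothetical callback that returns
    the referenced output as a dict, or None if it cannot be found.
    """
    enriched = []
    for tx in transactions:
        new_inputs = []
        for tx_input in tx.get("inputs", []):
            new_input = dict(tx_input)
            spent_hash = tx_input.get("spent_transaction_hash")
            # Coinbase inputs do not spend a previous output, so skip the lookup.
            if spent_hash is not None:
                spent = lookup_output(spent_hash, tx_input["spent_output_index"])
                if spent is not None:
                    new_input["value"] = spent["value"]
                    new_input["addresses"] = spent["addresses"]
            new_inputs.append(new_input)

        new_tx = dict(tx)
        new_tx["inputs"] = new_inputs
        # Totals are computed from whatever values are known after the lookup.
        new_tx["input_value"] = sum(i.get("value") or 0 for i in new_inputs)
        new_tx["output_value"] = sum(o.get("value") or 0 for o in tx.get("outputs", []))
        enriched.append(new_tx)
    return enriched
```

With enrichment done per exported batch like this, the load step would only need to append the day's already-enriched rows instead of joining against every past partition.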

This will reduce the BigQuery costs significantly.


This might also require changing the timestamp field type from int to ISO 8601 in export jobs (a breaking compatibility change, so the version will be bumped to 2.*), so that the raw tables can be partitioned by this field. Currently the raw tables are not partitioned, which makes the enrich job scan the whole table.
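As a rough sketch of the timestamp change, the conversion from an integer Unix timestamp to an ISO 8601 string could look like the following. The helper names `to_iso8601` and `partition_date` are hypothetical and not part of bitcoin-etl, and the sketch assumes the int field holds seconds since the epoch in UTC.

```python
from datetime import datetime, timezone


def to_iso8601(unix_seconds: int) -> str:
    """Format a Unix timestamp (seconds, UTC) as an ISO 8601 string."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def partition_date(unix_seconds: int) -> str:
    """Return the UTC calendar date (YYYY-MM-DD) that a timestamp falls on."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).date().isoformat()


# Example: the Bitcoin genesis block timestamp.
print(to_iso8601(1231006505))      # 2009-01-03T18:15:05Z
print(partition_date(1231006505))  # 2009-01-03
```

A string in this form can be loaded as a timestamp column, which is what makes date-based partitioning of the raw tables possible.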
