henry410213028 (Collaborator) commented Aug 2, 2025

Types of changes

  • New feature

Description

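Starting in 2025, the data can no longer be pulled directly through the API, so this PR adds a script for importing it instead: cli/load_kktix_csv.py

Based on the configuration files, the script converts the CSV data into the same format as the API output, so that the existing database loader, dags/ods/kktix_ticket_orders/udfs/kktix_loader.py, can be reused.

To run it (shown here with the test data in tests/kktix_load_csv):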

# on project root
# setup .venv python environment
source .venv/bin/activate
export PYTHONPATH=./dags

python cli/load_kktix_csv.py \
    --gcp_credential_file "$HOME/.config/gcloud/pycontw-225217_credentials.json" \
    --gcp_project_id pycontw-225217 \
    --year 2023 \
    --event_name "PyCon Example 2023" \
    --ticket_group reserved \
    --meta_field_mapping_file tests/kktix_load_csv/default_meta_field_mapping.json \
    --data_field_names_file tests/kktix_load_csv/default_data_field_names.txt \
    --attendees_csv_file tests/kktix_load_csv/example-attendees.csv \
    --orders_csv_file tests/kktix_load_csv/example-orders.csv \
    --is_dry_run

Run python cli/load_kktix_csv.py --help to see a description of the script's arguments.

Details are recorded in the related documentation.
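
For readers unfamiliar with the approach, here is a minimal sketch of the kind of conversion described above: CSV rows reshaped, via a field mapping, into nested API-like records. It is not the code in cli/load_kktix_csv.py. The mapping contents, the column names, the output shape, and the helper names `row_to_record` and `load_records` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical mapping: CSV column header -> key in the API-shaped record.
# The real mapping lives in the JSON file passed via --meta_field_mapping_file.
META_FIELD_MAPPING = {"Order Number": "order_no", "Ticket Name": "ticket_name"}

# Hypothetical list of columns to carry over under a "data" key.
# The real list lives in the text file passed via --data_field_names_file.
DATA_FIELD_NAMES = ["Nickname", "Email"]


def row_to_record(row: dict) -> dict:
    """Reshape one CSV row into a nested record (assumed output shape)."""
    record = {out_key: row.get(col, "") for col, out_key in META_FIELD_MAPPING.items()}
    record["data"] = [[name, row.get(name, "")] for name in DATA_FIELD_NAMES]
    return record


def load_records(csv_text: str) -> list:
    """Parse CSV text and convert every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_record(row) for row in reader]


SAMPLE = "Order Number,Ticket Name,Nickname,Email\n1001,Regular,alice,alice@example.com\n"
records = load_records(SAMPLE)
```

Keeping the mapping in configuration rather than in code means a change in the exported CSV headers only requires editing the mapping file, which appears to be the motivation for the two config-file arguments.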

Checklist

  • [v] Add test cases to all the changes you introduce
  • [v] Run make lint and make test locally to ensure all linter checks and testing pass
  • [v] Update the documentation if necessary

Lee-W (Member) left a comment

If this is no longer part of a DAG, should we move it out into a separate project, or put it somewhere like cli/...?

    """從文字檔載入 data 欄位名稱列表"""
    logging.info(f"Loading data field names from {file_path}")
    with file_path.open("r", encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]

Suggested change
- names = [line.strip() for line in f if line.strip()]
+ names = [name for line in f if (name := line.strip())]
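
As a quick check that the suggestion preserves behavior, the sketch below runs both forms on the same input. The function names and the sample text are illustrative only. The assignment expression (`:=`) requires Python 3.8 or later.

```python
import io


def load_names_double_strip(f):
    # Original form: strip() is called twice per non-blank line.
    return [line.strip() for line in f if line.strip()]


def load_names_walrus(f):
    # Suggested form: strip once, bind the result, and reuse it.
    return [name for line in f if (name := line.strip())]


SAMPLE = "id\n\n  name  \n   \nemail\n"
names_a = load_names_double_strip(io.StringIO(SAMPLE))
names_b = load_names_walrus(io.StringIO(SAMPLE))
```

Both calls skip blank and whitespace-only lines and return the stripped names in file order.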

Comment on lines 301 to 304
payload = []
for event_raw_data in event_raw_data_array:
    sanitized_event_raw_data = _sanitize_payload(event_raw_data)
    payload.append(sanitized_event_raw_data)

Suggested change
- payload = []
- for event_raw_data in event_raw_data_array:
-     sanitized_event_raw_data = _sanitize_payload(event_raw_data)
-     payload.append(sanitized_event_raw_data)
+ payload = [
+     _sanitize_payload(event_raw_data)
+     for event_raw_data in event_raw_data_array
+ ]

for record in payload[:5]:
    print(json.dumps(record, indent=2, ensure_ascii=False))

# save to payload.json

Suggested change
# save to payload.json
# save to payload.json

    logging.info("資料成功載入 BigQuery。")

except Exception as e:
    logging.error(f"執行過程中發生未預期的錯誤: {e}")

Suggested change
- logging.error(f"執行過程中發生未預期的錯誤: {e}")
+ logging.exception("執行過程中發生未預期的錯誤")
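
The practical difference behind this suggestion: `logging.exception` logs at ERROR level and appends the active traceback, whereas `logging.error` with the exception interpolated records only its message. A small self-contained sketch follows. The logger name `kktix_demo` and the `risky` function are made up for the demonstration.

```python
import io
import logging

# Capture log output in memory so the two calls can be compared.
stream = io.StringIO()
logger = logging.getLogger("kktix_demo")  # hypothetical logger name
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False


def risky():
    raise ValueError("bad row")


try:
    risky()
except Exception as e:
    # Only the exception message is recorded; the traceback is lost.
    logger.error(f"unexpected error: {e}")
error_output = stream.getvalue()

# Reset the buffer before the second run.
stream.seek(0)
stream.truncate(0)

try:
    risky()
except Exception:
    # Must be called from an except block; the traceback is appended.
    logger.exception("unexpected error")
exception_output = stream.getvalue()
```

Since the traceback already includes the exception type and message, the message passed to `logging.exception` does not need to interpolate `e`.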


- def _load_to_bigquery(payload: list[dict]) -> None:
+ def load_to_bigquery_ods(
+     payload: list[dict], project_id: Optional[str], credential_file: Optional[str]

Suggested change
-     payload: list[dict], project_id: Optional[str], credential_file: Optional[str]
+     payload: list[dict], project_id: str | None, credential_file: str | None


- def _load_to_bigquery_dwd(payload: list[dict]) -> None:
+ def load_to_bigquery_dwd(
+     payload: list[dict], project_id: Optional[str], credential_file: Optional[str], ticket_group: Optional[str] = None

Suggested change
-     payload: list[dict], project_id: Optional[str], credential_file: Optional[str], ticket_group: Optional[str] = None
+     payload: list[dict], project_id: str | None, credential_file: str | None, ticket_group: str | None = None

henry410213028 (Collaborator, Author) commented

Thanks for your review. I’ve moved this script into the ./cli folder.

henry410213028 merged commit ced111e into master on Sep 21, 2025 (2 checks passed)
henry410213028 deleted the load-kktix-csv branch on September 21, 2025 01:15