Commit ada523a

mattijn and dsmedia authored
feat: Adopt UAX-31 compliant dataset names (#702)
* update dataset names

* maxsplit 1

* feat: add uax-31 compliance to build_datapackage.py

This commit introduces a new naming strategy to ensure every resource has a unique, UAX-31 compliant identifier.

The new implementation works as follows:
- A preliminary scan of the `/data` directory identifies dataset basenames that have multiple file extensions.
- A new `make_uax31_name` function sanitizes the filename to create a valid Python identifier (replaces hyphens, prefixes numbers).
- For datasets with multiple formats, the file format is appended as a suffix to the name to guarantee uniqueness.
- Note: Adding a new format for an existing dataset will rename the original resource to include a suffix.

* feat: Enforce snake_case for dataset names in datapackage

Updates the `build_datapackage.py` script to ensure all generated dataset names are `snake_case`.

The changes include:
- A new `to_snake_case` helper function to convert camelCase strings.
- The `make_uax31_name` function now uses this helper to sanitize all dataset names before they are written to `datapackage.json`.
- This resolves issues where filenames like `londonBoroughs.json` would result in a non-standard `camelCase` identifier.

---------

Co-authored-by: dsmedia <[email protected]>
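The naming strategy described in this commit message might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from `build_datapackage.py`: the function names `to_snake_case` and `make_uax31_name` come from the message, but their signatures, the leading-underscore prefix for names that start with a digit, the regular expressions, and the `name_resources` helper are all hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePath


def to_snake_case(name: str) -> str:
    """Convert camelCase to snake_case (assumed behaviour, not the commit's code)."""
    # Put an underscore between a lowercase letter or digit and the
    # uppercase letter that follows it, then lowercase everything.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def make_uax31_name(stem: str, fmt: str = "") -> str:
    """Turn a file stem into a valid Python identifier (sketch only)."""
    name = to_snake_case(stem)
    # Replace hyphens, dots and any other non-word character with "_".
    name = re.sub(r"\W", "_", name)
    # Identifiers cannot start with a digit; the "_" prefix is an assumption.
    if name[:1].isdigit():
        name = "_" + name
    # Append the file format only when the caller asks for it.
    return f"{name}_{fmt}" if fmt else name


def name_resources(filenames: list[str]) -> dict[str, str]:
    """Hypothetical helper: map each filename to a unique resource name."""
    # Preliminary scan: collect every extension seen for each basename.
    extensions = defaultdict(set)
    for filename in filenames:
        path = PurePath(filename)
        extensions[path.stem].add(path.suffix)

    names = {}
    for filename in filenames:
        path = PurePath(filename)
        # Only basenames that exist in several formats get a format suffix.
        multi = len(extensions[path.stem]) > 1
        names[filename] = make_uax31_name(
            path.stem, path.suffix.lstrip(".") if multi else ""
        )
    return names


if __name__ == "__main__":
    print(name_resources(["londonBoroughs.json", "cars.json", "cars.csv"]))
```

Under these assumptions, `cars.json` is named `cars` while it is the only `cars` file and becomes `cars_json` once `cars.csv` is added, which is the renaming side effect the note in the message warns about.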
1 parent b387277 commit ada523a

3 files changed: +226 -149 lines changed

0 commit comments