Commit ada523a
feat: Adopt UAX-31 compliant dataset names (#702)
* update dataset names
* maxsplit 1
* feat: add uax-31 compliance to build_datapackage.py
This commit introduces a new naming strategy to ensure every resource has a unique, UAX-31 compliant identifier.
The new implementation works as follows:
- A preliminary scan of the `/data` directory identifies dataset basenames that have multiple file extensions.
- A new `make_uax31_name` function sanitizes the filename to create a valid Python identifier (it replaces hyphens and adds a prefix to names that begin with a digit).
- For datasets with multiple formats, the file format is appended as a suffix to the name to guarantee uniqueness.
- Note: Adding a new format for an existing dataset will rename the original resource to include a suffix.
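The steps above can be sketched roughly as follows. This is an illustration, not the code in `build_datapackage.py`: the helper `multi_format_stems`, the signature of `make_uax31_name`, the underscore used as replacement and prefix, and the sample filenames are all assumptions made for the example.

```python
from collections import defaultdict
from pathlib import PurePath


def multi_format_stems(filenames):
    """Return the basenames that occur with more than one file extension."""
    extensions = defaultdict(set)
    for filename in filenames:
        path = PurePath(filename)
        extensions[path.stem].add(path.suffix)
    return {stem for stem, exts in extensions.items() if len(exts) > 1}


def make_uax31_name(filename, multi_format):
    """Hypothetical sketch: derive an identifier-safe resource name."""
    path = PurePath(filename)
    # Assumption: hyphens are replaced with underscores.
    name = path.stem.replace("-", "_")
    # Assumption: names starting with a digit get an underscore prefix.
    if name[:1].isdigit():
        name = "_" + name
    # Datasets available in several formats get the format as a suffix.
    if path.stem in multi_format:
        name = f"{name}_{path.suffix.lstrip('.')}"
    return name


# Illustrative filenames, not taken from the repository.
files = ["sample-data.json", "2024-report.csv", "cities.json", "cities.csv"]
multi = multi_format_stems(files)
names = {f: make_uax31_name(f, multi) for f in files}
```

Under these assumptions, `cities.json` on its own would be named `cities`, but once `cities.csv` is present it becomes `cities_json`, which is the renaming effect the note above warns about.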
* feat: Enforce snake_case for dataset names in datapackage
Updates the `build_datapackage.py` script to ensure all generated dataset names are `snake_case`.
The changes include:
- A new `to_snake_case` helper function to convert camelCase strings.
- The `make_uax31_name` function now uses this helper to sanitize all dataset names before they are written to `datapackage.json`.
- This resolves issues where filenames like `londonBoroughs.json` would result in a non-standard `camelCase` identifier.
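A minimal sketch of what a `to_snake_case` helper could look like is shown here. The regular expression is an assumption, not necessarily the one used in the commit, and it only handles the simple lowercase-then-uppercase boundary found in names like `londonBoroughs`.

```python
import re


def to_snake_case(name):
    """Hypothetical sketch: convert a camelCase string to snake_case."""
    # Insert an underscore wherever a lowercase letter or digit
    # is immediately followed by an uppercase letter, then lowercase everything.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return spaced.lower()


converted = to_snake_case("londonBoroughs")
```

Names that are already `snake_case` pass through unchanged. Runs of capitals (acronyms) are not split by this pattern and would need extra handling.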
---------
Co-authored-by: dsmedia <[email protected]>
1 parent b387277 commit ada523a
3 files changed, +226 -149 lines changed