Commit c44b9f0

Weave generated reference improvements (#48)
* Fix Weave reference generation bugs
  Observed in wandb#1888
* Temporarily include fixes summary report
1 parent ec5b387 commit c44b9f0

8 files changed

+296
-132
lines changed


WEAVE-REFERENCE-FIXES-SUMMARY.md

Lines changed: 159 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,159 @@
1+
# Weave Reference Generation Script Fixes
2+
3+
This document summarizes the fixes applied to the Weave reference documentation generation scripts based on PR #1888 feedback.
4+
5+
## Issues Fixed
6+
7+
### 1. Models Reference Files Being Renamed (CRITICAL BUG)
8+
**Problem**: `fix_casing.py` was incorrectly targeting `models/ref/python/public-api` files instead of Weave reference docs.
9+
10+
**Fix**: Updated `fix_casing.py` to only target `weave/reference/python-sdk` files.
11+
- Changed path from `models/ref/python/public-api` to `weave/reference/python-sdk`
12+
- Removed the logic that was renaming Models API files (ArtifactCollection, etc.)
13+
- Added clear comments indicating this should NEVER touch Models reference docs
14+
15+
**Files Modified**:
16+
- `scripts/reference-generation/weave/fix_casing.py`
17+
18+
### 2. TypeScript SDK Using PascalCase Filenames
19+
**Problem**: TypeScript SDK files were being generated with PascalCase filenames (e.g., `Dataset.mdx`, `WeaveClient.mdx`), which causes Git case-sensitivity issues.
20+
21+
**Fix**: Updated generation scripts to use lowercase filenames throughout.
22+
- Modified `generate_typescript_sdk_docs.py` to convert filenames to lowercase when creating `.mdx` files
23+
- Updated function and type-alias extraction to use lowercase filenames
24+
- Updated internal links to use lowercase paths
25+
26+
**Files Modified**:
27+
- `scripts/reference-generation/weave/generate_typescript_sdk_docs.py` (lines 259, 319-320, 369-370, 379)
28+
- `scripts/reference-generation/weave/fix_casing.py` (simplified to just convert to lowercase)
29+
30+
### 3. H1 in service-api/index.mdx
31+
**Problem**: The generated `service-api/index.mdx` had both a frontmatter title and an H1, which is redundant in Mintlify.
32+
33+
**Fix**: Removed the H1 heading since Mintlify uses the frontmatter title.
34+
35+
**Files Modified**:
36+
- `scripts/reference-generation/weave/generate_service_api_spec.py` (line 31)
37+
38+
### 4. Duplicate H3 Headings in service-api.mdx
39+
**Problem**: The `service-api.mdx` file had duplicate category sections (e.g., "### Calls" appeared on both line 23 and line 158), listing the same endpoints twice.
40+
41+
**Fix**: Added deduplication logic to prevent duplicate categories and duplicate endpoints.
42+
- Track which categories have been written to prevent duplicate H3 headings
43+
- Deduplicate endpoints within each category by (method, path) tuple
44+
- This prevents the same endpoint from being listed multiple times if it appears in the OpenAPI spec with duplicate tags
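The deduplication described above can be sketched roughly as follows. This is an illustrative outline, not the code in `update_service_api_landing.py`: the function names `group_endpoints` and `render_landing`, the `(category, method, path)` input shape, and the example paths are all assumptions made for the sketch.

```python
def group_endpoints(endpoints):
    """Group (category, method, path) triples, skipping repeats.

    Hypothetical helper: the input shape is an assumption, not the
    real script's data model.
    """
    grouped = {}   # category -> list of (method, path), insertion-ordered
    seen = set()   # (category, method, path) triples already recorded
    for category, method, path in endpoints:
        key = (category, method.upper(), path)
        if key in seen:
            continue  # same endpoint listed again under the same category
        seen.add(key)
        grouped.setdefault(category, []).append((method.upper(), path))
    return grouped


def render_landing(grouped):
    """Emit one H3 per category, followed by that category's endpoints."""
    lines = []
    for category, items in grouped.items():
        lines.append(f"### {category}")
        lines.extend(f"- `{method} {path}`" for method, path in items)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up endpoints; "Calls" appears twice with a repeated entry.
    sample = [
        ("Calls", "post", "/example/calls"),
        ("Objects", "get", "/example/objects"),
        ("Calls", "POST", "/example/calls"),
    ]
    print(render_landing(group_endpoints(sample)))
```

Keying the dictionary by category means each H3 can only be written once, and the `seen` set handles the repeated `(method, path)` pairs within a category.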
45+
46+
**Files Modified**:
47+
- `scripts/reference-generation/weave/update_service_api_landing.py` (lines 99-118)
48+
49+
### 5. Markdown Table Formatting Errors (------ lines)
50+
**Problem**: Python SDK docs contained standalone lines with just dashes (`------`) which break markdown parsing.
51+
52+
**Example**: In `trace_server_interface.mdx`, lines 22, 30, 39, and others contained a bare `------` that created invalid table structures.
53+
54+
**Fix**: Added regex pattern to remove these malformed table separators.
55+
- Pattern: `\n\s*------+\s*\n` → `\n\n`
56+
- This removes lines that are just dashes with optional whitespace
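As a small illustration of what this substitution does and does not touch, the snippet below applies the same pattern to a few strings. The wrapper name `strip_dash_lines` and the sample inputs are assumptions for the example; only the pattern and replacement come from the fix itself.

```python
import re

# Same pattern as the fix: a newline, optional whitespace, six or more
# dashes, optional whitespace, and a closing newline.
DASH_LINE = re.compile(r'\n\s*------+\s*\n')


def strip_dash_lines(content: str) -> str:
    """Replace each standalone dash line with a blank line."""
    return DASH_LINE.sub('\n\n', content)


if __name__ == "__main__":
    # A bare dash line between two text lines is removed.
    print(repr(strip_dash_lines("intro\n------\nbody\n")))
    # A pipe-delimited table separator is left alone because it never
    # contains six consecutive dashes on a line of their own.
    print(repr(strip_dash_lines("| a | b |\n|---|---|\n| 1 | 2 |\n")))
```

One design consequence worth keeping in mind: in CommonMark, a line of dashes is otherwise a thematic break, or a setext heading underline when it directly follows a paragraph line, so this pattern also removes those whenever they are six or more dashes long.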
57+
58+
**Files Modified**:
59+
- `scripts/reference-generation/weave/generate_python_sdk_docs.py` (lines 258-260)
60+
61+
## Testing Recommendation
62+
63+
Before merging, test the fixes by running the reference generation locally:
64+
65+
```bash
66+
# From the docs repository root
67+
cd scripts/reference-generation/weave
68+
python generate_weave_reference.py
69+
```
70+
71+
Then verify:
72+
1. No files in `models/ref/python/public-api` were modified
73+
2. All TypeScript SDK files in `weave/reference/typescript-sdk/` have lowercase filenames
74+
3. `weave/reference/service-api/index.mdx` has no H1 heading
75+
4. `weave/reference/service-api.mdx` has no duplicate H3 category headings
76+
5. No `------` lines in `weave/reference/python-sdk/trace_server/trace_server_interface.mdx`
77+
6. In `docs.json`, modules under `weave/reference/python-sdk/trace/` are grouped as "Core" (not "Other")
78+
7. In `docs.json`, the Service API `openapi` configuration uses the local spec (not a GitHub URL) if `sync_openapi_spec.py` was run with `--use-local`
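Some of these checks (items 2, 3, and 5) lend themselves to a quick script. The helpers below are a rough sketch under stated assumptions: the function names are hypothetical, they are not part of the generation scripts, and the H1 check only recognizes ATX-style `# ` headings outside frontmatter and fenced code blocks.

```python
import re
from pathlib import Path

# A line consisting only of six or more dashes (optional surrounding spaces).
DASH_ONLY_LINE = re.compile(r"^[ \t]*-{6,}[ \t]*$", re.MULTILINE)


def non_lowercase_mdx(root):
    """Return .mdx files under root whose stem contains uppercase characters."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.mdx") if p.stem != p.stem.lower())


def has_dash_only_lines(text):
    """True if the text contains a standalone ------ line."""
    return DASH_ONLY_LINE.search(text) is not None


def has_h1(text):
    """True if an H1 appears outside frontmatter and fenced code blocks."""
    lines = text.splitlines()
    start = 0
    if lines and lines[0].strip() == "---":
        # Skip YAML frontmatter up to its closing delimiter.
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                start = i + 1
                break
    in_fence = False
    for line in lines[start:]:
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        elif not in_fence and line.startswith("# "):
            return True
    return False
```

For example, `non_lowercase_mdx("weave/reference/typescript-sdk")` should return an empty list after a clean generation run, and `has_h1(Path("weave/reference/service-api/index.mdx").read_text())` should be `False`.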
79+
80+
### 6. Incorrect Section Grouping ("Core" → "Other")
81+
**Problem**: Python SDK modules in the `trace/` directory were being incorrectly grouped as "Other" instead of "Core" in `docs.json` navigation.
82+
83+
**Root Cause**: The path checking logic in `update_weave_toc.py` was checking `if parts[0] == "weave"`, but paths are relative to `python-sdk/`, so `parts[0]` is actually the module subdirectory (`trace`, `trace_server`, etc.), not `weave`.
84+
85+
**Fix**: Corrected the path checking logic to check the actual first path component.
86+
- Changed from checking `parts[0] == "weave"` then `parts[1] == "trace"`
87+
- To directly checking `parts[0] == "trace"`, `parts[0] == "trace_server"`, etc.
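To make the root cause concrete, here is a minimal sketch of grouping by the first path component. It is not the code in `update_weave_toc.py`: the function name `section_for`, the mapping constant, and the "Trace Server" label are assumptions; only the `trace` to "Core" mapping and the "Other" fallback are taken from the description above.

```python
from pathlib import PurePosixPath

# Section per top-level directory, relative to python-sdk/.
# "Core" for trace/ comes from the fix description; the "Trace Server"
# label is a placeholder chosen for this sketch.
SECTION_BY_TOP_DIR = {
    "trace": "Core",
    "trace_server": "Trace Server",
}


def section_for(relative_path: str) -> str:
    """Pick a navigation section from a path relative to python-sdk/."""
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return "Other"
    # parts[0] is already the module subdirectory, not "weave".
    return SECTION_BY_TOP_DIR.get(parts[0], "Other")


if __name__ == "__main__":
    print(section_for("trace/weave_client"))                    # Core
    print(section_for("trace_server/trace_server_interface"))   # Trace Server
    print(section_for("weave/trace/weave_client"))              # Other
```

The last call shows why the old check misfired: a test that expects `parts[0] == "weave"` never matches paths that start at `trace/`, so everything falls through to "Other".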
88+
89+
**Files Modified**:
90+
- `scripts/reference-generation/weave/update_weave_toc.py` (lines 33-45)
91+
92+
### 7. OpenAPI Configuration Being Overwritten
93+
**Problem**: `update_weave_toc.py` was unconditionally overwriting the OpenAPI spec configuration in `docs.json` to use a remote URL, ignoring the local spec that `sync_openapi_spec.py` downloads and configures.
94+
95+
**Impact**: Even though `sync_openapi_spec.py` downloads the OpenAPI spec locally and can configure `docs.json` to use it, `update_weave_toc.py` would immediately overwrite that setting with a remote GitHub URL, defeating the purpose of the local spec.
96+
97+
**Fix**: Removed the Service API OpenAPI configuration code from `update_weave_toc.py`. This script should only manage Python/TypeScript SDK navigation, not the OpenAPI spec source.
98+
- Deleted lines 209-224 that were setting `page["openapi"]` to remote URLs
99+
- Added comment noting that OpenAPI configuration is managed by `sync_openapi_spec.py`
100+
101+
**Files Modified**:
102+
- `scripts/reference-generation/weave/update_weave_toc.py` (lines 206-207)
103+
104+
### 8. Missing Root Module Documentation (CRITICAL - WEAVE PACKAGE REGRESSION)
105+
**Problem**: The generated `python-sdk.mdx` file is only 8 lines (just frontmatter), completely missing all the important API documentation for functions like `init()`, `publish()`, `ref()`, `get()`, etc.
106+
107+
**Expected**: The current version (Weave 0.52.10) has 2074 lines documenting all the core Weave functions and classes.
108+
109+
**Root Cause**: **This is a WEAVE PACKAGE REGRESSION, not a script bug.**
110+
111+
Something changed in Weave between versions **0.52.10** (current docs) and **0.52.16** (PR version) that broke documentation generation for the root `weave` module. The generation scripts haven't changed, and lazydocs hasn't changed - so this is an upstream issue in the Weave package itself.
112+
113+
Possible causes:
114+
1. Changes to `weave/__init__.py` that affect how the module exports its public API
115+
2. Module structure refactoring that lazydocs can't handle
116+
3. New import patterns or lazy loading that breaks introspection
117+
118+
**Status**: **CRITICAL UPSTREAM BUG** - This makes the Python SDK documentation completely unusable for version 0.52.16.
119+
120+
**Action Required**: Report this to the Weave team immediately:
121+
1. File an issue: https://github.com/wandb/weave/issues
122+
2. Include: "Documentation generation broken in 0.52.16 - root module exports not discoverable by lazydocs"
123+
3. Mention: "Works fine in 0.52.10, broken in 0.52.16"
124+
4. Tag: @dbrian57 or relevant Weave maintainers
125+
126+
**Recommendation**:
127+
- **DO NOT MERGE PR #1888** - it will break Python SDK documentation
128+
- Either: Fix the Weave package and regenerate docs
129+
- Or: Stay on 0.52.10 documentation until the Weave package is fixed
130+
131+
**Files to Investigate** (in Weave repo):
132+
- `weave/__init__.py` between versions 0.52.10 and 0.52.16
133+
- Any structural changes to the weave package in that version range
134+
135+
### 9. OpenAPI Spec Validation (New Feature)
136+
**Enhancement**: Added validation to detect issues in the OpenAPI spec itself, which can help identify upstream problems.
137+
138+
**Features**:
139+
- Detects duplicate endpoint definitions (same method+path defined multiple times)
140+
- Identifies endpoints appearing in multiple categories/tags
141+
- Warns when critical issues like duplicate endpoints are found
142+
- Suggests reporting issues to the Weave team when spec problems are detected
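The commit does not show the body of `validate_spec()`, so the following is only a sketch of how such checks could be written, not the actual implementation. It assumes the spec is available as JSON text; the function name `validate_spec_text` and the returned dictionary shape are hypothetical. Because a parsed JSON object cannot hold the same key twice, the sketch uses `json.loads` with `object_pairs_hook` to notice repeated keys (such as a path defined more than once) before they are collapsed.

```python
import json

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def validate_spec_text(spec_text):
    """Report repeated JSON keys and operations carrying more than one tag."""
    duplicate_keys = []

    def record_duplicates(pairs):
        # Called for every JSON object with its raw (key, value) pairs,
        # so repeated keys are still visible at this point.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicate_keys.append(key)
            seen.add(key)
        return dict(pairs)

    spec = json.loads(spec_text, object_pairs_hook=record_duplicates)

    multi_tagged = []
    for path, item in spec.get("paths", {}).items():
        if not isinstance(item, dict):
            continue
        for method, operation in item.items():
            if method.lower() in HTTP_METHODS and isinstance(operation, dict):
                tags = sorted(set(operation.get("tags", [])))
                if len(tags) > 1:
                    multi_tagged.append((method.upper(), path, tags))

    return {"duplicate_keys": duplicate_keys, "multi_tagged": multi_tagged}
```

A caller could print a warning when either list is non-empty. Note that `duplicate_keys` reports any repeated key anywhere in the document, not only under `paths`, which is a simplification of this sketch.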
143+
144+
**Files Modified**:
145+
- `scripts/reference-generation/weave/sync_openapi_spec.py` (added `validate_spec()` function and integration in `main()`)
146+
147+
This will help identify if duplicate H3s or other issues originate from the OpenAPI spec rather than our generation scripts.
148+
149+
## Files Modified Summary
150+
151+
1. `scripts/reference-generation/weave/fix_casing.py`
152+
2. `scripts/reference-generation/weave/generate_typescript_sdk_docs.py`
153+
3. `scripts/reference-generation/weave/generate_service_api_spec.py`
154+
4. `scripts/reference-generation/weave/update_service_api_landing.py`
155+
5. `scripts/reference-generation/weave/generate_python_sdk_docs.py`
156+
6. `scripts/reference-generation/weave/update_weave_toc.py`
157+
7. `scripts/reference-generation/weave/sync_openapi_spec.py` (new validation feature)
158+
159+
All fixes are backward compatible and will take effect on the next reference documentation generation run.

scripts/reference-generation/weave/fix_casing.py

Lines changed: 24 additions & 85 deletions
@@ -12,108 +12,47 @@
1212
from pathlib import Path
1313

1414
def fix_typescript_casing(base_path):
15-
"""Fix TypeScript SDK file casing."""
16-
print("Fixing TypeScript SDK file casing...")
15+
"""Fix TypeScript SDK file casing - ensure all files use lowercase."""
16+
print("Fixing TypeScript SDK file casing to lowercase...")
1717

18-
ts_base = Path(base_path) / "weave/reference/typescript-sdk/weave"
18+
ts_base = Path(base_path) / "weave/reference/typescript-sdk"
1919
if not ts_base.exists():
2020
print(f" TypeScript SDK path not found: {ts_base}")
2121
return
2222

23-
# Define correct names for each directory
24-
casing_rules = {
25-
"classes": {
26-
"dataset": "Dataset",
27-
"evaluation": "Evaluation",
28-
"weaveclient": "WeaveClient",
29-
"weaveobject": "WeaveObject",
30-
},
31-
"interfaces": {
32-
"callschema": "CallSchema",
33-
"callsfilter": "CallsFilter",
34-
"weaveaudio": "WeaveAudio",
35-
"weaveimage": "WeaveImage",
36-
},
37-
"functions": {
38-
# Functions should be lowercase/camelCase
39-
"init": "init",
40-
"login": "login",
41-
"op": "op",
42-
"requirecurrentcallstackentry": "requireCurrentCallStackEntry",
43-
"requirecurrentchildsummary": "requireCurrentChildSummary",
44-
"weaveaudio": "weaveAudio",
45-
"weaveimage": "weaveImage",
46-
"wrapopenai": "wrapOpenAI",
47-
},
48-
"type-aliases": {
49-
"op": "Op", # Type alias Op is uppercase
50-
"opdecorator": "OpDecorator",
51-
"messagesprompt": "MessagesPrompt",
52-
"stringprompt": "StringPrompt",
53-
}
54-
}
23+
# All TypeScript SDK files should use lowercase filenames for consistency
24+
# This applies to classes, functions, interfaces, and type-aliases
25+
subdirs_to_check = ["classes", "functions", "interfaces", "type-aliases"]
5526

56-
for dir_name, rules in casing_rules.items():
57-
dir_path = ts_base / dir_name
27+
for subdir in subdirs_to_check:
28+
dir_path = ts_base / subdir
5829
if not dir_path.exists():
5930
continue
6031

6132
for file in dir_path.glob("*.mdx"):
62-
basename = file.stem.lower()
63-
if basename in rules:
64-
correct_name = rules[basename]
65-
if file.stem != correct_name:
66-
new_path = file.parent / f"{correct_name}.mdx"
67-
print(f" Renaming: {file.name} → {correct_name}.mdx")
68-
shutil.move(str(file), str(new_path))
33+
# Convert filename to lowercase
34+
lowercase_name = file.stem.lower()
35+
if file.stem != lowercase_name:
36+
new_path = file.parent / f"{lowercase_name}.mdx"
37+
print(f" Renaming: {file.name} → {lowercase_name}.mdx")
38+
shutil.move(str(file), str(new_path))
6939

7040
def fix_python_casing(base_path):
71-
"""Fix Python SDK file casing."""
72-
print("Fixing Python SDK file casing...")
41+
"""Fix Python SDK file casing for WEAVE reference docs only."""
42+
print("Fixing Weave Python SDK file casing...")
7343

74-
py_base = Path(base_path) / "models/ref/python/public-api"
44+
# IMPORTANT: This should ONLY touch Weave reference docs, never Models reference docs
45+
py_base = Path(base_path) / "weave/reference/python-sdk"
7546
if not py_base.exists():
76-
print(f" Python SDK path not found: {py_base}")
47+
print(f" Weave Python SDK path not found: {py_base}")
7748
return
7849

79-
# Python class files that should be uppercase
80-
uppercase_files = {
81-
"artifactcollection": "ArtifactCollection",
82-
"artifactcollections": "ArtifactCollections",
83-
"artifactfiles": "ArtifactFiles",
84-
"artifacttype": "ArtifactType",
85-
"artifacttypes": "ArtifactTypes",
86-
"betareport": "BetaReport",
87-
"file": "File",
88-
"member": "Member",
89-
"project": "Project",
90-
"registry": "Registry",
91-
"run": "Run",
92-
"runartifacts": "RunArtifacts",
93-
"sweep": "Sweep",
94-
"team": "Team",
95-
"user": "User",
96-
}
97-
98-
# Files that should remain lowercase
99-
lowercase_files = ["api", "artifacts", "automations", "files", "projects",
100-
"reports", "runs", "sweeps", "_index"]
50+
# For Weave Python SDK, we generally want lowercase filenames
51+
# Only specific files might need special casing - currently none known
52+
# Most Weave modules use lowercase with underscores (e.g., weave_client.mdx)
10153

102-
for file in py_base.glob("*.mdx"):
103-
basename = file.stem.lower()
104-
105-
if basename in uppercase_files:
106-
correct_name = uppercase_files[basename]
107-
if file.stem != correct_name:
108-
new_path = file.parent / f"{correct_name}.mdx"
109-
print(f" Renaming: {file.name} → {correct_name}.mdx")
110-
shutil.move(str(file), str(new_path))
111-
elif basename in lowercase_files:
112-
# Ensure these stay lowercase
113-
if file.stem != basename:
114-
new_path = file.parent / f"{basename}.mdx"
115-
print(f" Renaming: {file.name} → {basename}.mdx")
116-
shutil.move(str(file), str(new_path))
54+
print(f" Weave Python SDK files are generated with correct casing")
55+
print(f" No casing changes needed for Weave reference documentation")
11756

11857
def main():
11958
"""Main function to fix all casing issues."""

scripts/reference-generation/weave/generate_python_sdk_docs.py

Lines changed: 4 additions & 0 deletions
@@ -255,6 +255,10 @@ def generate_module_docs(module, module_name: str, src_root_path: str, version:
255255
# Remove <b>` at the start of lines that don't have a closing </b>
256256
content = re.sub(r'^- <b>`([^`\n]*?)$', r'- \1', content, flags=re.MULTILINE)
257257

258+
# Remove malformed table separators that lazydocs sometimes generates
259+
# These appear as standalone lines with just dashes (------) which break markdown parsing
260+
content = re.sub(r'\n\s*------+\s*\n', '\n\n', content)
261+
258262
# Fix parameter lists that have been broken by lazydocs
259263
# Strategy: Parse all parameters into a structured format, then reconstruct them properly
260264
def fix_parameter_lists(text):

scripts/reference-generation/weave/generate_service_api_spec.py

Lines changed: 0 additions & 2 deletions
@@ -28,8 +28,6 @@ def main():
2828
description: "REST API endpoints for the Weave service"
2929
---
3030
31-
# Weave Service API
32-
3331
The Weave Service API provides REST endpoints for interacting with the Weave tracing service.
3432
3533
## Available Endpoints

scripts/reference-generation/weave/generate_typescript_sdk_docs.py

Lines changed: 15 additions & 12 deletions
@@ -255,8 +255,9 @@ def convert_to_mintlify_format(docs_dir):
255255
# Just ensure .md extension is removed (already done above)
256256
pass
257257

258-
# Write as .mdx file
259-
mdx_file = md_file.with_suffix('.mdx')
258+
# Write as .mdx file with lowercase filename (avoid Git case sensitivity issues)
259+
lowercase_stem = md_file.stem.lower()
260+
mdx_file = md_file.parent / f"{lowercase_stem}.mdx"
260261
mdx_file.write_text(content)
261262

262263
# Remove original .md file
@@ -314,17 +315,18 @@ def extract_members_to_separate_files(docs_path):
314315
315316
{func_content.replace(f'### {func_name}', f'# {func_name}')}"""
316317

317-
# Write the function file
318-
func_file = functions_dir / f"{func_name}.mdx"
318+
# Write the function file with lowercase filename (avoid Git case sensitivity issues)
319+
func_filename = func_name.lower()
320+
func_file = functions_dir / f"{func_filename}.mdx"
319321
func_file.write_text(func_file_content)
320-
functions_found.append(func_name)
321-
print(f" ✓ Extracted {func_name}.mdx")
322+
functions_found.append(func_filename)
323+
print(f" ✓ Extracted {func_filename}.mdx")
322324

323325
if functions_found:
324326
# Remove the detailed function documentation from index
325327
content = function_pattern.sub('', content)
326328

327-
# Update the Functions section with links
329+
# Update the Functions section with links (functions_found already has lowercase names)
328330
functions_section = "\n### Functions\n\n"
329331
for func in functions_found:
330332
functions_section += f"- [{func}](functions/{func})\n"
@@ -363,17 +365,18 @@ def extract_members_to_separate_files(docs_path):
363365
364366
{alias_content.replace(f'### {alias_name}', f'# {alias_name}')}"""
365367

366-
# Write the type alias file
367-
alias_file = type_aliases_dir / f"{alias_name}.mdx"
368+
# Write the type alias file with lowercase filename (avoid Git case sensitivity issues)
369+
alias_filename = alias_name.lower()
370+
alias_file = type_aliases_dir / f"{alias_filename}.mdx"
368371
alias_file.write_text(alias_file_content)
369-
print(f" ✓ Extracted {alias_name}.mdx")
372+
print(f" ✓ Extracted {alias_filename}.mdx")
370373

371374
# Remove all extracted type aliases from index
372375
content = type_alias_pattern.sub('', content)
373376

374-
# Update Type Aliases section with links to all extracted type aliases
377+
# Update Type Aliases section with links to all extracted type aliases (use lowercase filenames)
375378
if type_aliases:
376-
type_aliases_links = [f"- [{name}](type-aliases/{name})" for _, name in type_aliases if _.startswith(f"### {name}\n\nƬ ")]
379+
type_aliases_links = [f"- [{name}](type-aliases/{name.lower()})" for _, name in type_aliases if _.startswith(f"### {name}\n\nƬ ")]
377380
if type_aliases_links:
378381
type_aliases_section = "\n### Type Aliases\n\n" + "\n".join(sorted(type_aliases_links)) + "\n"
379382
