feat: use msgspec for faster JSON encoding and decoding #493
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -7,7 +7,7 @@` |
| | | `from collections.abc import Callable` |
| | | `from typing import Any` |
| | | `-import orjson` |
| | | `+import msgspec` |
Fix bytes/str mixing in `load_json_str`.

When `json_str` is `bytes`, the error-path line

snippet = json_str[:200] + ("..." if len(json_str) > 200 else "")

evaluates `bytes + str`, which raises `TypeError` inside the `except` block. The failure is never logged, and callers receive a `TypeError` instead of the original `msgspec.DecodeError`.

You can keep the msgspec backend but make the snippet construction bytes-safe like this:

 def load_json_str(
     json_str: str | bytes, func: Callable = lambda x: x
 ) -> dict[str, Any]:
@@
-    try:
-        # Note: msgspec is strict on UTF-8 conformance and provides fast JSON parsing
-        return func(msgspec.json.decode(json_str))
-    except msgspec.DecodeError as e:
-        snippet = json_str[:200] + ("..." if len(json_str) > 200 else "")
-        _logger.exception(f"Failed to parse JSON string: '{snippet}' - {e!r}")
-        raise
+    try:
+        # Note: msgspec is strict on UTF-8 conformance and provides fast JSON parsing
+        return func(msgspec.json.decode(json_str))
+    except msgspec.DecodeError as e:
+        if isinstance(json_str, bytes):
+            snippet = json_str[:200].decode("utf-8", errors="replace")
+        else:
+            snippet = json_str[:200]
+        if len(json_str) > 200:
+            snippet += "..."
+        _logger.exception(f"Failed to parse JSON string: '{snippet}' - {e!r}")
+        raise

Separately, the current CLAUDE/guidelines still say "Always use orjson for JSON operations", but this helper (and the PR as a whole) intentionally standardizes on msgspec. Updating those docs would avoid future confusion and keep the documented guidance in sync with the new implementation.

Also applies to: 79-99

Analysis chain:

Script executed:
#!/bin/bash
# Inspect load_json_str implementation to verify bytes/str mixing issue
cat src/aiperf/common/utils.py | sed -n '1,120p'

Script executed:
#!/bin/bash
# Check CLAUDE.md for orjson guidance that may be outdated
fd -e md --type f | head -20 | xargs grep -l "orjson" 2>/dev/null || echo "No orjson references found in md files"

Script executed:
#!/bin/bash
# Verify msgspec API: check if msgspec.json.decode and msgspec.DecodeError are valid
python3 << 'EOF'
import msgspec
# Check if msgspec.json.decode exists
print("msgspec.json.decode exists:", hasattr(msgspec.json, 'decode'))
# Check if msgspec.DecodeError exists
print("msgspec.DecodeError exists:", hasattr(msgspec, 'DecodeError'))
# Test bytes + str behavior
test_bytes = b"test"[:3]
print(f"Type of bytes[:3]: {type(test_bytes)}")
try:
    result = test_bytes + "..."
except TypeError as e:
    print(f"bytes + str raises TypeError: {e}")
EOF

Script executed:
#!/bin/bash
# Check ATTRIBUTIONS-Python.md and other docs for orjson and messaging guidance
grep -r "orjson\|msgspec" --include="*.md" .

Script executed:
#!/bin/bash
# Look for coding guidelines or CLAUDE files mentioning JSON
fd -e md --type f -name "*CLAUDE*" -o -name "*guideline*" -o -name "*coding*" | xargs cat 2>/dev/null | head -100
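To make the failure mode and the suggested fix concrete, here is a stdlib-only sketch that factors the snippet logic into a helper. The name `make_json_snippet` and its `limit` parameter are hypothetical (they do not appear in the PR); the body mirrors the suggested `isinstance(json_str, bytes)` branch. No msgspec install is needed because only the error-path string handling is exercised.

```python
from __future__ import annotations


# Hypothetical helper (not part of the PR): builds the log preview used in
# load_json_str's except block, always returning str.
def make_json_snippet(json_str: str | bytes, limit: int = 200) -> str:
    """Return a str preview of json_str for log messages, safe for bytes input."""
    if isinstance(json_str, bytes):
        # Slicing bytes may cut a multi-byte UTF-8 sequence; errors="replace"
        # turns a partial sequence into U+FFFD instead of raising.
        snippet = json_str[:limit].decode("utf-8", errors="replace")
    else:
        snippet = json_str[:limit]
    if len(json_str) > limit:
        snippet += "..."
    return snippet


if __name__ == "__main__":
    raw = b'{"key": ' + b"x" * 300

    # The original one-liner fails for bytes input: bytes + str raises TypeError.
    try:
        raw[:200] + ("..." if len(raw) > 200 else "")
    except TypeError as exc:
        print(f"original expression: TypeError: {exc}")

    print(make_json_snippet(raw)[-10:])  # ends with "..."
    print(make_json_snippet('{"a": 1'))  # short str input passes through unchanged
```

Decoding with `errors="replace"` matters because a 200-byte cut can land in the middle of a multi-byte UTF-8 sequence; with the default strict handler, that would raise a second exception (`UnicodeDecodeError`) inside the `except` block.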
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `from aiperf.common import aiperf_logger` |
| | | `from aiperf.common.aiperf_logger import AIPerfLogger` |
| | | `@@ -90,11 +90,9 @@ def load_json_str(` |
| | | `run validation checks on the object. Defaults to identity function.` |
| | | `"""` |
| | | `try:` |
| | | `- # Note: orjson may not parse JSON the same way as Python's standard json library,` |
| | | `- # notably being stricter on UTF-8 conformance.` |
| | | `- # Refer to https://github.com/ijl/orjson?tab=readme-ov-file#str for details.` |
| | | `- return func(orjson.loads(json_str))` |
| | | `- except orjson.JSONDecodeError as e:` |
| | | `+ # Note: msgspec is strict on UTF-8 conformance and provides fast JSON parsing` |
| | | `+ return func(msgspec.json.decode(json_str))` |
| | | `+ except msgspec.DecodeError as e:` |
| | | `snippet = json_str[:200] + ("..." if len(json_str) > 200 else "")` |
| | | `_logger.exception(f"Failed to parse JSON string: '{snippet}' - {e!r}")` |
| | | `raise` |
Analysis chain:
Verify msgspec version compatibility.
The addition of msgspec~=0.20.0 looks good. Ensure this version is compatible with Python 3.10+ and provides the required JSON encoding/decoding APIs used throughout the codebase.
Web query result:
The official msgspec latest stable release is 0.19.0 (published Dec 27, 2024). [1]
msgspec 0.19.0 provides wheels for CPython 3.10 (and newer CPython versions listed on the release files), so the official msgspec release line supports Python 3.10+. [1]
There is a community fork named msgspec-x which has a 0.20.0 release (Jun 21, 2025). That fork’s 0.20.0 lists support for CPython 3.10+ in its PyPI release metadata. If you meant the official project, 0.20.0 is not an official msgspec release; 0.19.0 is the latest official stable. [2]
Sources:
[1] msgspec on PyPI — release files and metadata.
[2] msgspec-x on PyPI (version 0.20.0).
I need to check the codebase to determine which msgspec package is intended (official or the fork).
Fix invalid msgspec version constraint that will cause installation failures.
Official msgspec's latest stable version is 0.19.0, not 0.20.0. The current constraint `msgspec~=0.20.0` cannot be satisfied by the official package, causing `pip install` to fail. Version 0.19.0 supports Python 3.10+ as required by your project. Either downgrade to `msgspec~=0.19.0` or explicitly specify the msgspec-x fork if version 0.20.0 is intentional.
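The finding hinges on PEP 440's compatible-release operator: `~=0.20.0` means `>=0.20.0, ==0.20.*`, so if 0.19.x is the newest official release, `~=0.20.0` has nothing to match, while `~=0.19.0` admits only 0.19.x. The stdlib-only sketch below illustrates that rule for purely numeric versions; the helper names are hypothetical, and real tooling should rely on `packaging.specifiers.SpecifierSet`, which implements the full specification (pre-releases, post-releases, local versions).

```python
from __future__ import annotations


# Hypothetical helper: parse a purely numeric release such as "0.20.0".
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a numeric release like '0.20.0' into (0, 20, 0)."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical helper: simplified PEP 440 "~=" check (numeric release segments only).
def satisfies_compatible_release(candidate: str, spec: str) -> bool:
    """Check `candidate` against `~=spec`.

    ~=X.Y.Z is equivalent to >=X.Y.Z, ==X.Y.*: the candidate must be at least
    the spec version and share every release segment except the last.
    """
    spec_parts = parse_release(spec)
    if len(spec_parts) < 2:
        raise ValueError("~= requires at least two release segments")
    cand_parts = parse_release(candidate)
    prefix = spec_parts[:-1]
    # Zero-pad both versions to the same length so '0.20' compares like '0.20.0'.
    width = max(len(cand_parts), len(spec_parts))
    padded_cand = cand_parts + (0,) * (width - len(cand_parts))
    padded_spec = spec_parts + (0,) * (width - len(spec_parts))
    return padded_cand >= padded_spec and padded_cand[: len(prefix)] == prefix


if __name__ == "__main__":
    for candidate in ("0.19.0", "0.20.0", "0.20.4", "0.21.0"):
        print(candidate, satisfies_compatible_release(candidate, "0.20.0"))
```

Running it prints `False` for 0.19.0 and 0.21.0 and `True` for 0.20.0 and 0.20.4, which is why a 0.19.x-only index cannot satisfy `msgspec~=0.20.0`.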