-
Notifications
You must be signed in to change notification settings - Fork 57
E2E Tests Fixing - Ensuring User workflows work as intended/aren't broken #694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @rmusser01, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on enhancing the robustness and reliability of the End-to-End (E2E) test suite. By adjusting how test fixtures are imported and made available, it ensures that critical user workflow simulations are properly configured and can effectively act as canaries for detecting regressions. The changes aim to improve the foundational setup of E2E tests, making them more dependable in verifying core application functionalities. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
WalkthroughReorganizes e2e pytest fixtures into plugin modules, adds run-slow CLI option and rate_limits marker, centralizes sender-name sanitization, propagates HTTP redirect config, tightens HTML sanitization, updates Kokoro TTS pacing/validation, adjusts CI smoke flows, and adds many docs and test wiring changes. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as Chat Endpoint
participant ChatSvc as ChatService
participant DB
rect rgb(240,248,255)
Note over Client,API: Centralized sender sanitization for persistence and LLM payloads
end
Client->>API: POST /api/v1/chats (create/message)
API->>API: sanitized = sanitize_sender_name(character_name)
API->>ChatSvc: build_context_and_messages(..., assistant_name=sanitized)
ChatSvc->>DB: persist message(sender=sanitized, content)
DB-->>ChatSvc: ack
ChatSvc-->>API: assembled response/stream
API-->>Client: SSE / HTTP response
sequenceDiagram
participant Pytest
participant PluginMgr as Plugins
participant Test as E2E Test
rect rgb(245,255,240)
Note over Pytest,PluginMgr: Plugin-driven e2e fixture discovery and selective env toggles
end
Pytest->>PluginMgr: load pytest_plugins (e2e_fixtures, e2e_state_fixtures, chat_fixtures, media_fixtures)
Test->>Pytest: marked with `@pytest.mark.rate_limits`
Pytest->>PluginMgr: autouse `_rate_limit_env_toggle` clears TEST_MODE/TESTING for test scope
Pytest->>Test: run test with plugin fixtures and restored env
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested labels
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses an issue in the end-to-end test setup by ensuring the test_user_credentials fixture is properly exposed. By re-exporting this fixture in conftest.py, it becomes available across the entire test suite, which is crucial for fixtures like authenticated_client that depend on it. This change aligns with pytest best practices for managing shared fixtures, improves the robustness of the test configuration, and likely resolves fixture discovery errors. The implementation is correct and well-commented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tldw_Server_API/tests/e2e/conftest.py (1)
13-20: LGTM! Fixture dependency correctly addressed.The addition of
test_user_credentialsto the imports is necessary and correct, asauthenticated_clientdepends on it as a fixture parameter (see fixtures.py line 678). The explanatory comment also improves code clarity about pytest's fixture discovery mechanism.Optional: Remove unnecessary
noqadirective.The static analysis tool indicates that the
# noqa: F401directive on line 20 is unnecessary because the F401 check (unused imports) is not enabled in your Ruff configuration. You can safely remove it:-) # noqa: F401 +)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tldw_Server_API/tests/e2e/conftest.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tldw_Server_API/tests/e2e/conftest.py (1)
tldw_Server_API/tests/e2e/fixtures.py (4)
api_client(636-664)authenticated_client(679-718)data_tracker(819-824)test_user_credentials(668-675)
🪛 Ruff (0.14.4)
tldw_Server_API/tests/e2e/conftest.py
20-20: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (11)
Docs/Design/MCP-Unified-Extraction.md (1)
56-77: Add language specifier to code block for proper syntax highlighting.Line 56 begins a fenced code block without specifying a language. While this renders, adding a language specifier (e.g.,
tree,text, ornone) improves clarity for tools and syntax highlighting.-``` +```text mcp_unified/Docs/Audit/README.md (1)
186-186: Use American English consistently: "in future" → "in the future".Line 186 uses British English phrasing. If the guide targets American English (as most technical documentation does), update for consistency.
-- PII scanning can be tuned centrally in future; currently Audit and RAG use separate detectors. +- PII scanning can be tuned centrally in the future; currently Audit and RAG use separate detectors.Docs/Code_Documentation/Guides/Character_Chat_Code_Guide.md (1)
280-280: Fix list indentation inconsistency.Line 280 has incorrect indentation for a list continuation. Markdown list items at the same level must have consistent indentation.
- Message metadata/tool-calls: store via endpoints that accept `tool_calls` and retrieve with `db.get_message_metadata(message_id)` (see `character_messages.py`). - Rate limits: tune in `character_rate_limiter.py` or via env/settings (`CHARACTER_RATE_LIMIT_*`, `MAX_*`). - Provider integration: Character Chat builds standard OpenAI-style `messages` for `/api/v1/chat/completions`. Extend provider logic in the Chat module (`core/Chat/*`). - Persistence & ownership: API endpoints set `client_id` automatically for conversations/messages. If you insert via DB helpers directly, ensure `client_id` is populated (string user ID); ownership checks depend on it. + - Persistence & ownership: API endpoints set `client_id` automatically for conversations/messages. If you insert via DB helpers directly, ensure `client_id` is populated (string user ID); ownership checks depend on it..github/workflows/ci.yml (1)
132-143: Fixture discovery guard looks correct; consider richer failure outputThe step correctly asserts that
test_user_credentialsis discoverable usingpytest --fixtures -qplusrg/grep, and avoids failing on collection errors via|| true. If you ever need easier debugging when the fixture disappears, you could optionally print a small excerpt ofFIXTURES_OUTPUT(orpytest’s exit code) before exiting, but the current implementation is functionally solid.tldw_Server_API/app/core/Character_Chat/modules/character_utils.py (1)
133-147: Sender sanitization helper is solid; centralize its useThe implementation of
sanitize_sender_namematches the documented behavior and is safe for DB storage, and exporting it via__all__is helpful.To avoid drift, it would be good to replace the duplicated inline sanitization logic in caller code (e.g., in
character_chat_sessions.py) with calls to this helper so one place defines the rules for sender normalization.Also applies to: 184-191
Docs/Code_Documentation/Chatbook_Developer_Guide.md (1)
839-845: Confirm placeholder style matches actual DB adapter behaviorThe cross-backend note about preferring Postgres-style
$1, $2, ...placeholders is clear, and the example SQL was updated accordingly. Please just double-check that the underlyingCharactersRAGDB/adapter layer for Chatbooks actually accepts$Nplaceholders for the backend(s) you use here (or rewrites them for SQLite); if not, the snippet should mirror the real call site to avoid confusing future readers.tldw_Server_API/app/api/v1/endpoints/character_chat_sessions.py (2)
217-243: Avoid using the sanitized name in user-visible greeting placeholdersThe seeding block now derives:
raw_name = character.get('name') or 'Assistant' char_name = str(raw_name).replace(' ', '_') for _ch in ("<", ">", "|", "\\", "/"): char_name = char_name.replace(_ch, "") ... content = replace_placeholders(choice_text, char_name, 'User') ... 'sender': char_name,Using
char_namefor the DBsenderis reasonable, but feeding the same sanitized value intoreplace_placeholdersmeans user-visible text will show underscores (and stripped characters) instead of the character’s display name.Consider separating display vs. storage:
- Use the original display name for placeholder replacement.
- Use
sanitize_sender_name(display_name)(fromcharacter_utils) for thesenderfield.That keeps greetings readable while still normalizing DB sender values.
959-981: Deduplicate inline sender sanitization and reuse existing character lookupBoth the completion persistence block and
persist_streamed_assistant_messagerepeat this inline pattern:char_card = db.get_character_card_by_id(conversation.get('character_id')) or {} raw_name = (char_card.get('name') or 'Assistant') if isinstance(char_card, dict) else 'Assistant' assistant_sender = str(raw_name).replace(' ', '_') for _ch in ("<", ">", "|", "\\", "/"): assistant_sender = assistant_sender.replace(_ch, "")A couple of improvements:
- Use the shared
sanitize_sender_namehelper instead of open-coding the transformation in multiple places, so future changes to the sanitization rules remain centralized.- In
character_chat_completion, you already fetchedcharacter = db.get_character_card_by_id(...)earlier; you can reuse that instead of querying the DB again to derive the sender.Conceptually:
from tldw_Server_API.app.core.Character_Chat.modules.character_utils import sanitize_sender_name display_name = (character or {}).get("name") or "Assistant" assistant_sender = sanitize_sender_name(display_name)and the same pattern in the persist endpoint. Functionality today is correct; this just reduces duplication and extra DB calls.
Also applies to: 1464-1476
Docs/Development/Testing.md (1)
24-30: Add languages to fenced code blocks for linting and clarityThe
pytest_pluginsexample and the CI guard snippet are plain fenced blocks without a language (```). Consider tagging them as```pythonand```bash(orsh) to satisfy MD040 and improve syntax highlighting:-``` +```python pytest_plugins = [ ... ]-``` +```bash pytest --fixtures -q tldw_Server_API/tests/e2e | rg -q '^test_user_credentials$'Also applies to: 75-77
tldw_Server_API/tests/_plugins/e2e_fixtures.py (1)
11-38: Re-export surface looks good; consider minor lint cleanupsThe plugin correctly re-exports the core e2e fixtures and helpers. Two small follow-ups:
- Drop the unused
# noqa: F401on the import line since Ruff reports it as unnecessary.- Optionally sort
__all__to satisfyRUF022if you want to keep Ruff fully green.Functionally this module is fine as-is.
tldw_Server_API/app/core/Chat/chat_service.py (1)
30-38: Unify all assistantnamesanitization viasanitize_sender_nameThe new use of
sanitize_sender_namein history reconstruction and current-turn persistence looks correct and matches the helper’s contract. To avoid future drift, consider reusing the same helper in the two remaining paths that still hand-roll sanitization:
- In
save_callbackinsideexecute_streaming_call(lines ~1160–1172), replace the inlineasst_name.replace(...).replace(...)chain withsanitize_sender_name(character_card_for_context.get("name"))(with a sensible"Assistant"fallback).- In the non-streaming
should_save_responseblock (lines ~1863–1872), similarly deriveasst_nameviasanitize_sender_name.This keeps all DB/LLM assistant sender names going through a single, well-defined sanitizer.
Also applies to: 751-799, 1160-1172, 1863-1872
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
.github/workflows/ci.yml(1 hunks)Docs/API-related/API_Notes.md(1 hunks)Docs/Audit/README.md(4 hunks)Docs/AuthNZ/AUTHNZ_PERMISSION_MATRIX.md(2 hunks)Docs/Code_Documentation/Chatbook_Developer_Guide.md(1 hunks)Docs/Code_Documentation/Guides/Audit_Module_Code_Guide.md(1 hunks)Docs/Code_Documentation/Guides/AuthNZ_Code_Guide.md(1 hunks)Docs/Code_Documentation/Guides/Character_Chat_Code_Guide.md(1 hunks)Docs/Design/MCP-Unified-Extraction.md(1 hunks)Docs/Development/Testing.md(1 hunks)Docs/Published/API-related/API_Notes.md(1 hunks)Docs/Published/Code_Documentation/Chatbook_Developer_Guide.md(1 hunks)Docs/Published/User_Guides/User_Guide.md(1 hunks)New-User-Guide.md(2 hunks)tldw_Server_API/app/api/v1/endpoints/character_chat_sessions.py(3 hunks)tldw_Server_API/app/core/Character_Chat/modules/character_utils.py(2 hunks)tldw_Server_API/app/core/Chat/chat_service.py(3 hunks)tldw_Server_API/tests/_plugins/e2e_fixtures.py(1 hunks)tldw_Server_API/tests/_plugins/e2e_state_fixtures.py(1 hunks)tldw_Server_API/tests/_plugins/media_fixtures.py(1 hunks)tldw_Server_API/tests/conftest.py(2 hunks)tldw_Server_API/tests/e2e/conftest.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- Docs/API-related/API_Notes.md
🧰 Additional context used
🧬 Code graph analysis (4)
tldw_Server_API/app/core/Chat/chat_service.py (1)
tldw_Server_API/app/core/Character_Chat/modules/character_utils.py (1)
sanitize_sender_name(133-147)
tldw_Server_API/tests/_plugins/media_fixtures.py (1)
tldw_Server_API/tests/e2e/fixtures.py (3)
create_test_file(721-725)create_test_pdf(728-734)cleanup_test_file(764-769)
tldw_Server_API/app/api/v1/endpoints/character_chat_sessions.py (1)
tldw_Server_API/app/core/DB_Management/ChaChaNotes_DB.py (2)
get_character_card_by_id(3631-3656)add_message(4830-4978)
tldw_Server_API/tests/_plugins/e2e_fixtures.py (1)
tldw_Server_API/tests/e2e/fixtures.py (7)
APIClient(49-559)test_user_credentials(668-675)data_tracker(819-824)create_test_file(721-725)create_test_pdf(728-734)cleanup_test_file(764-769)AssertionHelpers(831-902)
🪛 Gitleaks (8.29.0)
Docs/Published/User_Guides/User_Guide.md
[high] 104-106: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.
(curl-auth-header)
🪛 LanguageTool
Docs/Audit/README.md
[locale-violation] ~186-~186: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: .... - PII scanning can be tuned centrally in future; currently Audit and RAG use separate d...
(IN_FUTURE)
Docs/Code_Documentation/Guides/Audit_Module_Code_Guide.md
[grammar] ~135-~135: Ensure spelling is correct
Context: ...on and risk scoring; high‑risk triggers ad‑hoc flush. - `await log_login(user_id, user...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~181-~181: Ensure spelling is correct
Context: ...sk events and buffer thresholds trigger ad‑hoc flushes (tracked futures are awaited du...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
Docs/Code_Documentation/Guides/AuthNZ_Code_Guide.md
[uncategorized] ~78-~78: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ck_rate_limit` uses a token bucket; see Rate Limiting below. ## Sessions, Tokens, Revocation...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
Docs/Code_Documentation/Guides/Character_Chat_Code_Guide.md
[style] ~48-~48: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...robabilities/cooldowns. Applied to text prior to sending to providers. - Rate Limiting: ...
(EN_WORDINESS_PREMIUM_PRIOR_TO)
Docs/Development/Testing.md
[grammar] ~67-~67: Ensure spelling is correct
Context: ...surface explicit. - Prefer composition: re‑use helpers from `tldw_Server_API.tests.e2e...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
Docs/Design/MCP-Unified-Extraction.md
56-56: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Docs/Code_Documentation/Guides/Character_Chat_Code_Guide.md
280-280: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1
(MD005, list-indent)
280-280: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
Docs/Development/Testing.md
24-24: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
75-75: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.4)
tldw_Server_API/tests/_plugins/media_fixtures.py
26-27: try-except-pass detected, consider logging the exception
(S110)
26-26: Do not catch blind exception: Exception
(BLE001)
48-52: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
tldw_Server_API/tests/_plugins/e2e_fixtures.py
11-11: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
25-38: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
tldw_Server_API/tests/_plugins/e2e_state_fixtures.py
73-73: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
111-111: Do not catch blind exception: Exception
(BLE001)
120-120: Unused function argument: call
(ARG001)
142-142: Unused function argument: exitstatus
(ARG001)
159-164: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
tldw_Server_API/tests/conftest.py (1)
7-12: Top-level plugin registration aligns with pytest plugin guidanceRegistering shared plugins via
pytest_pluginsin the top-level testsconftest.pyis a good fit with the pytest≥8 recommendation and ensures fixtures like your e2e helpers are visible across the suite (and to the CI fixture-discovery step). Just keep the fully qualified module paths in sync with any future_pluginspackage moves.Also applies to: 91-97
New-User-Guide.md (1)
87-115: Updated provider examples and default-provider docs read wellThe two request styles in 3.5 and the new 4.6 section align with the chat API’s provider/model semantics (explicit
api_provider,provider/modelprefix, and default provider behavior). The examples are clear and should be very helpful for new users.Also applies to: 178-218
tldw_Server_API/tests/e2e/conftest.py (1)
12-19: Plugin registration for e2e suite looks goodRegistering the shared plugins here is consistent with the new plugin architecture and keeps this
conftest.pyfocused on markers and options. This should make e2e fixtures easier to reuse and reason about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tldw_Server_API/tests/e2e/fixtures.py (2)
429-475: Fix the fallback endpoint call—it's incompatible with the import endpoint.The
/api/v1/characters/importendpoint expects a file upload via multipart form data, but line 471-472 sendsjson=character_data. This will fail at runtime.The import endpoint signature is:
character_file: UploadFile = File(...)Change lines 469-475 to send a file instead:
# Fallback (should rarely be used): try file-import endpoint if non-dict provided import json response = self.client.post( f"{API_PREFIX}/characters/import", files={"character_file": ("character.json", json.dumps(character_data).encode())} ) response.raise_for_status() return response.json()Note: The original review comment's premise about endpoint path inconsistency was incorrect—both endpoints correctly use
API_PREFIX. The actual issue is the incompatible request format in the fallback logic.
469-475: Fix the fallback to use correct multipart/form-data format for file import endpoint.The fallback at lines 471-475 sends
json=character_datato the/characters/importendpoint, but this endpoint expectscharacter_file: UploadFile = File(...)(multipart/form-data). The correct tests usefiles={"character_file": ...}format. This fallback will fail if ever triggered.Options:
- Convert to proper file upload:
files={"character_file": (None, json.dumps(character_data), "application/json")}- Raise an error for non-dict inputs instead of attempting the fallback
- Document what non-dict types are actually expected (if any)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
Docs/API-related/Chatbook_API_Documentation.md(2 hunks)Docs/Code_Documentation/Guides/AuthNZ_Code_Guide.md(1 hunks)Docs/Code_Documentation/Guides/Character_Chat_Code_Guide.md(1 hunks)Docs/Code_Documentation/Guides/Chatbooks_Code_Guide.md(1 hunks)tldw_Server_API/app/api/v1/endpoints/media.py(1 hunks)tldw_Server_API/app/core/Chatbooks/README.md(2 hunks)tldw_Server_API/tests/e2e/fixtures.py(1 hunks)tldw_Server_API/tests/e2e/test_custom_benchmark_real.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Docs/Code_Documentation/Guides/Character_Chat_Code_Guide.md
🧰 Additional context used
🧬 Code graph analysis (2)
tldw_Server_API/tests/e2e/fixtures.py (1)
tldw_Server_API/app/core/LLM_Calls/LLM_API_Calls.py (4)
post(131-150)json(85-86)json(101-102)raise_for_status(104-111)
tldw_Server_API/tests/e2e/test_custom_benchmark_real.py (1)
tldw_Server_API/tests/Evaluations/test_evaluations_unified.py (1)
client(53-56)
🪛 GitHub Actions: CI
tldw_Server_API/app/api/v1/endpoints/media.py
[error] 2280-2280: SyntaxError: invalid syntax
🪛 GitHub Actions: pre-commit
tldw_Server_API/app/api/v1/endpoints/media.py
[error] 2280-2280: SyntaxError: invalid syntax
[error] 2280-2280: pre-commit: debug statements hook failed due to AST parse error in this file.
🪛 LanguageTool
Docs/Code_Documentation/Guides/AuthNZ_Code_Guide.md
[uncategorized] ~119-~119: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ck_rate_limituses a token bucket; see Rate Limiting below. Also handy DI: -get_optional_...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
Docs/Code_Documentation/Guides/Chatbooks_Code_Guide.md
[grammar] ~15-~15: Ensure spelling is correct
Context: ...ook” ZIP with a JSON manifest. - Import chatbooks back into a user’s workspace with confl...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~96-~96: Ensure spelling is correct
Context: ...s, max concurrent jobs, file size caps, chatbook count. - Quota checks occur in the endp...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 Ruff (0.14.4)
tldw_Server_API/app/api/v1/endpoints/media.py
2280-2280: Expected class, function definition or async function definition after decorator
(invalid-syntax)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
Docs/API-related/Chatbook_API_Documentation.md (1)
7-7: LGTM! Clear documentation improvements.The Developer Code Guide reference and ASCII Request Flow diagram provide helpful context for developers working with the Chatbooks API. The flow diagrams clearly illustrate sync vs async export/import patterns.
Also applies to: 48-67
tldw_Server_API/tests/e2e/test_custom_benchmark_real.py (1)
63-89: LGTM! Clean migration to unified Evaluations API.The test has been properly updated to use the new unified API shape:
- Dataset structure with
input/expected/metadatafields aligns with the new schema- Safe name generation with underscores ensures schema compliance
- Updated endpoint path from nested to flat structure is correct
The changes maintain test coverage while adapting to the new API surface.
Also applies to: 178-180
Docs/Code_Documentation/Guides/AuthNZ_Code_Guide.md (1)
1-344: Excellent comprehensive developer guide.This guide provides clear, actionable documentation for developers working with the AuthNZ module. The structure is logical, code examples are practical, and coverage of both single-user and multi-user modes is thorough.
Key strengths:
- Clear separation of concerns (sessions, tokens, API keys, virtual keys)
- Practical code examples with proper dependency injection patterns
- Good coverage of cross-backend SQL placeholder handling (lines 267-273)
- Helpful troubleshooting section
tldw_Server_API/app/core/Chatbooks/README.md (1)
5-5: LGTM! Helpful documentation additions.The Developer Code Guide reference and detailed ASCII flow diagram enhance the technical documentation. The flow diagram clearly illustrates the job lifecycle for both sync and async operations, making it easier for developers to understand the system behavior.
Also applies to: 42-69
Docs/Code_Documentation/Guides/Chatbooks_Code_Guide.md (1)
1-170: Excellent comprehensive code guide for Chatbooks.This guide provides thorough coverage of the Chatbooks module architecture, data flows, and developer workflows. The structure is logical and the content is practical.
Highlights:
- Clear explanation of jobs storage separation (lines 64-66)
- Practical curl examples for API usage (lines 135-148)
- Helpful "Common Gotchas & Tips" section (lines 150-157)
- Good extension guidance for adding new content types
The guide will be valuable for developers working with or extending the Chatbooks functionality.
| except Exception: | ||
| pass | ||
|
|
||
| async with client.stream("GET", cur_url, follow_redirects=False, timeout=60.0) as get_resp: |
Check failure
Code scanning / CodeQL
Full server-side request forgery Critical
user-provided value
The full URL of this request depends on a
user-provided value
The full URL of this request depends on a
user-provided value
The full URL of this request depends on a
user-provided value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tldw_Server_API/tests/e2e/test_concurrent_operations.py (1)
349-355: Logic bug: race analysis dict used as booleandetect_race_condition returns a dict. This condition always truthy.
Use the actual indicator:
- has_duplicates = ConcurrentTestHelper.detect_race_condition(successful_results) - if has_duplicates: + analysis = ConcurrentTestHelper.detect_race_condition(successful_results) + if analysis.get('duplicate_ids'): print(f" ✓ Content deduplication working correctly (same IDs for same content)") else: print(f" ⚠ Warning: Expected deduplication for identical content uploads")tldw_Server_API/tests/e2e/test_external_services.py (1)
73-120: Non‑429 HTTP errors in LLM rate‑limit test are silently ignoredIn
test_llm_rate_limit_handling, anyhttpx.HTTPStatusErrorwith a status other than 429 just gets appended toerrors, and the test still passes if no 429 was seen. That means repeated 4xx/5xx responses (e.g., auth misconfig, server error) will not fail or skip the test.To keep the test honest while still focusing on 429 behavior, re‑raise or delegate non‑429 errors to
SmartErrorHandler:for i in range(5): # Send 5 rapid requests try: response = api_client.chat_completion( @@ except httpx.HTTPStatusError as e: - errors.append((e.response.status_code, e.response.json())) - if e.response.status_code == 429: + errors.append((e.response.status_code, e.response.json())) + if e.response.status_code == 429: @@ - break + break + else: + # Treat non-429 as real failures (or env issues) instead of silently passing + SmartErrorHandler.handle_error(e, "llm rate limit handling")This keeps the test focused on rate limits but ensures other HTTP failures are surfaced.
tldw_Server_API/app/api/v1/endpoints/media.py (3)
182-186: Duplicate GET "/" routes — ambiguous handlerTwo handlers register the same path/method ("/"): list_media_endpoint (with both "" and "/") and list_all_media. FastAPI will keep the last registration, leading to confusing behavior and brittle E2E tests.
Apply one of the following minimal fixes.
Option A (rename the newer route to a distinct path):
@@ -@router.get( - "/", +@router.get( + "/all", status_code=status.HTTP_200_OK, summary="List All Media Items", tags=["Media Management"], response_model=MediaListResponse )Option B (drop the duplicate slash alias on the older endpoint):
@@ -@router.get( - "/", - tags=["Media Management"], - summary="List Media (slash)", -)Pick one and update any client/tests accordingly.
Also applies to: 2161-2170
414-432: SSRF guard missing for process-code URL downloadsUnlike document-like paths, process-code schedules URL downloads without assert_url_safe pre‑check. Add a per‑URL policy check and record per‑item errors instead of batch‑failing.
Apply:
@@ - if urls: - # Use module-local httpx.AsyncClient so tests can monkeypatch it - async with httpx.AsyncClient() as client: - tasks = [ - _download_url_async( - client=client, url=u, target_dir=temp_dir, - allowed_extensions=CODE_ALLOWED_EXTENSIONS, check_extension=True - ) for u in urls - ] - results = await asyncio.gather(*tasks, return_exceptions=True) - for url, res in zip(urls, results): + if urls: + # Vet URLs for SSRF before scheduling downloads + vetted_urls: List[str] = [] + for u in urls: + try: + assert_url_safe(u) + vetted_urls.append(u) + except HTTPException as he: + batch["results"].append({ + "status": "Error", "input_ref": u, "processing_source": None, + "media_type": "code", "error": f"URL blocked by policy: {he.detail}", + "metadata": {}, "content": None, "chunks": None, "analysis": None, "keywords": None, + "warnings": None, "analysis_details": {}, "db_id": None, "db_message": "Processing only endpoint." + }) + batch["errors_count"] += 1 + if not vetted_urls: + # continue to final ordering/response + pass + else: + async with httpx.AsyncClient() as client: + tasks = [ + _download_url_async( + client=client, url=u, target_dir=temp_dir, + allowed_extensions=CODE_ALLOWED_EXTENSIONS, check_extension=True + ) for u in vetted_urls + ] + results = await asyncio.gather(*tasks, return_exceptions=True) + for url, res in zip(vetted_urls, results):
617-622: Redis cache key uses non‑deterministic Python hashhash(frozenset(...)) changes across processes/boots; cache keys won’t be stable between workers. Use a stable digest over a canonical encoding.
@@ - return f"cache:{request.url.path}:{hash(frozenset(params.items()))}" + raw = json.dumps(params, sort_keys=True, separators=(",", ":")) + digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16] + return f"cache:{request.url.path}:{digest}"
♻️ Duplicate comments (2)
tldw_Server_API/app/api/v1/endpoints/media.py (2)
1518-1535: Repeat: backend‑aware upsert for version metadataSame issue/pattern as above. Apply the backend‑aware ON CONFLICT variant and narrow/log the exception as shown in the previous diff.
1698-1715: Repeat: backend‑aware upsert for advanced upsert pathSame fix as for Lines 1434-1452: replace INSERT OR REPLACE with backend‑aware upsert and avoid broad except/pass.
🧹 Nitpick comments (20)
Docs/Code_Documentation/Guides/AuthNZ_Code_Guide.md (1)
125-125: Fix compound adjective phrasing.Line 125 uses "handy DI" which, if intended as a compound adjective, should be hyphenated. Consider rewording to "Also useful DI:" or "Additional helpful DI:" for better readability.
tldw_Server_API/tests/e2e/test_concurrent_operations.py (3)
37-43: Good: conditional sleep helper_maybe_sleep speeds CI by skipping sleeps under TEST_MODE. Consider centralizing in a shared test util to avoid duplication across files.
68-99: Fix per-task timing; current durations are near-zerostart_time is captured after each future completes, so durations/timing are misleading.
Apply:
@@ - with ThreadPoolExecutor(max_workers=max_workers) as executor: - # Submit all tasks - future_to_args = {} + with ThreadPoolExecutor(max_workers=max_workers) as executor: + # Submit all tasks + future_to_args = {} + future_start = {} for args in args_list: - future = executor.submit(func, *args) + future = executor.submit(func, *args) future_to_args[future] = args + future_start[future] = time.time() @@ - for future in as_completed(future_to_args): - start_time = time.time() + for future in as_completed(future_to_args): args = future_to_args[future] + started = future_start.get(future, None) @@ try: result = future.result(timeout=30) results['successful'].append({ 'args': args, 'result': result, - 'duration': time.time() - start_time, + 'duration': (time.time() - started) if started else None, 'timestamp': time.time() # Add timestamp for ordering }) except Exception as e: results['failed'].append({ 'args': args, 'error': str(e), 'error_type': type(e).__name__, - 'duration': time.time() - start_time, + 'duration': (time.time() - started) if started else None, 'timestamp': time.time() }) results['errors'].append(e) - - results['timing'].append(time.time() - start_time) + if started is not None: + results['timing'].append(time.time() - started)
436-436: Convert time.sleep rate limiting delays to _maybe_sleep for consistencyLines 989 and 1098 in
test_concurrent_operations.pyusetime.sleep(1.0)for rate limiting delays. These should be converted to_maybe_sleep(1.0)to align with the CI speed optimization pattern already established throughout the file. Line 288 (RPS pacing calculation) is correctly an exception and should remain astime.sleep().tldw_Server_API/tests/e2e/test_database_operations.py (3)
31-36: Good: conditional sleep helper_maybe_sleep aligns with TEST_MODE behavior and speeds CI.
Consider moving _maybe_sleep to a shared test util to avoid duplication.
41-41: Ruff F811 (fixture name redefinition) in testsLinters may flag pytest fixture parameters as redefinitions. If this is noisy in CI, add a file-level directive or per-line ignore.
Options:
- Add at top:
# ruff: noqa: F811(file scope), or- Add per-line:
# noqa: F811on the function line.
44-44: Replace remainingtime.sleep()calls with_maybe_sleep()for consistency and CI speed.The review is accurate. The
_maybe_sleep()function defined in the file (lines 31–35) conditionally sleeps only whenTEST_MODEis not set, reducing CI execution time. Lines 44, 131, and 139 correctly use it, but other directtime.sleep()calls remain at lines 63, 362, 431, 446, 484, 540, 544, 561, and 571. Replace these with_maybe_sleep()for consistency.tldw_Server_API/Config_Files/config.txt (1)
260-276: HTTP redirect controls: looks good; document semanticsSettings map cleanly to envs consumed by config.py. Recommend noting:
- Booleans accept true/false only; ints should be small (e.g., 0–20).
- Cross-host and scheme downgrade are off by default; keep that for safety.
tldw_Server_API/app/core/config.py (1)
1409-1435: Apply HTTP env propagation hardening refactor (typed parsing, normalization, clamping, narrower exceptions)Verification confirms HTTP_* environment variables are actively consumed in media.py (lines 8724–8730) and http_client.py (line 89). The proposed refactoring is compatible:
- Boolean normalization to "true"/"false" strings is compatible with consumer code that checks
str(value).strip().lower() in {"1", "true", "yes", "on"}- Integer parsing of max_redirects with clamping (0–20) works with consumer code that calls
int(os.getenv(...))- Narrowed exception handling from bare
Exceptionto(configparser.Error, ValueError, OSError)is more precise and saferApply the suggested diff to improve type safety and robustness.
tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py (1)
893-961: HTML sanitizer hardening looks good; consider narrowing/logging bare exceptionsThe additional BeautifulSoup/regex pre-pass and post-
bleachanchor handling are solid defense-in-depth. The innertry/except Exception: ... passblocks (e.g., when decomposing/extracting tags or comments, and in the regex fallback) make debugging failures harder and trigger lint warnings.If you want to quiet linters and keep observability without risking test flakiness, consider logging at debug level or catching more specific exceptions in those inner blocks while keeping the outer
try/fallback as-is.tldw_Server_API/tests/e2e/test_external_services.py (3)
267-279: Async polling fallback swallows all exceptions without signalThe new AsyncOperationHandler polling in
test_bulk_embedding_loadis a nice improvement over fixed sleeps, but theexcept Exception: passmakes it impossible to see why a media item wasn’t accessible.Consider at least logging a short warning so failures remain non‑fatal but visible:
- for media_id in media_ids[:3]: - try: - AsyncOperationHandler.wait_for_completion( - check_func=lambda m=media_id: api_client.get_media_item(m), - success_condition=lambda r: bool(r), - timeout=10, - poll_interval=0.5, - context=f"Media {media_id} accessibility", - ) - except Exception: - # Non-fatal in resilience test - pass + for media_id in media_ids[:3]: + try: + AsyncOperationHandler.wait_for_completion( + check_func=lambda m=media_id: api_client.get_media_item(m), + success_condition=lambda r: bool(r), + timeout=10, + poll_interval=0.5, + context=f"Media {media_id} accessibility", + ) + except Exception as exc: + # Non-fatal in resilience test, but log for visibility + print(f"Bulk embedding accessibility check failed for {media_id}: {type(exc).__name__}: {exc}")
633-651: Redirect handling is tolerant; consider logging parse failures explicitlyThe updated
test_redirect_handlinglogic is more flexible (accepts both successful processing and 207‑style error wrappers) and focuses on detecting redirect‑related error messages, which is reasonable.The inner
try/except Exceptionaround buildingerrsmeans any unexpected response structure silently falls back to treating the redirect as “handled successfully.” If you want slightly stronger diagnostics without making the test brittle, you could log the exception when parsing results fails, similar to:- errs = [] - try: - errs = [r for r in response.get("results", []) if isinstance(r, dict) and str(r.get("status")) == "Error"] - except Exception: - errs = [] + errs = [] + try: + errs = [ + r for r in response.get("results", []) + if isinstance(r, dict) and str(r.get("status")) == "Error" + ] + except Exception as exc: + print(f"Redirect test: unexpected results format: {type(exc).__name__}: {exc}") + errs = []This keeps the test forgiving but leaves breadcrumbs if the contract drifts.
459-566: Rate‑limit enforcement tests are observational rather than assertiveThe new
TestRateLimitingEnforcementsuite and@pytest.mark.rate_limitsmarker give nice visibility into how rate limiting behaves in different scenarios. Note that:
test_rate_limit_enforcementandtest_rate_limit_resetdon’t assert that a limit is ever hit; they just print a warning when it isn’t.test_authenticated_vs_anonymous_limitsreports counts but doesn’t assert on their relative sizes.If that’s intentional to keep CI “chill” and avoid environment‑dependent flakiness, this is fine. If you want these to guard behavior more strictly, you could turn the “⚠” paths into
pytest.skipor soft assertions (e.g., assert within a very wide expected range) so regressions are more visible in test results than only via logs.Docs/Code_Documentation/Guides/Chatbooks_Code_Guide.md (1)
13-178: Guide content is strong; consider using proper markdown headings for toolingThe structure and references in this guide are excellent. Markdown linters are flagging patterns like
**Scope & Goals**,**Quick Map (Where Things Live)**, etc., as “emphasis used instead of a heading” (MD036). If you want cleaner docs tooling and navigation, consider converting these to real headings:-**Scope & Goals** +## Scope & Goals @@ -**Quick Map (Where Things Live)** +## Quick Map (Where Things Live) @@ -**Key Endpoints** +## Key Endpoints…and similarly for the other major sections. The “chatbook” spelling looks intentional for this domain; I’d keep it as-is.
tldw_Server_API/tests/e2e/fixtures.py (2)
430-468: Character JSON create path is good; clarify or fix non‑dict fallbackThe new
import_characterbehavior that maps legacy sample keys into a JSON payload and POSTs to/characters/is a big improvement and aligns with the schema.Two minor points:
- The type hint
character_data: Dict[str, Any]plus theisinstance(character_data, dict)check makes the “non-dict” fallback effectively dead code. If you want to support other input types (e.g., raw JSON string or file path), consider changing the annotation toAnyorUnion[...], otherwise you could drop the fallback entirely.- The fallback branch posts
json=character_datato/characters/import, which is a file-oriented endpoint; if this ever executes with a non‑dict (e.g., string path), it’s likely to 4xx. Either keep it unreachable by design, or adjust it to send multipart files when you actually support that mode.The new
delete_chathelper used by cleanup is fine and correctly tolerates both 200 and 204 without assuming a JSON body.Also applies to: 505-510
864-912: Server-side cleanup wiring works; note notes delete semantics and silent failures
TestDataTracker.cleanup_resources+ the updateddata_trackerfixture give you a nice centralized teardown for prompts, notes, chats, characters, and media, with an environment override (E2E_PRESERVE_ARTIFACTS) for debugging. A couple of nuances:
- The delete helpers (
delete_prompt,delete_note,delete_character, etc.) can raise, but all exceptions are swallowed. That’s fine for “never break teardown,” but if you care about cleanup quality, consider logging the exception type/message (even truncated) instead of barepass.- The app’s
DELETE /notes/{note_id}endpoint, perapp/api/v1/endpoints/notes.py, expects anexpected_versionheader for optimistic locking; yourAPIClient.delete_notedoes not send one. In practice that means note deletions may consistently return 422 and be silently ignored here, leaving note rows behind. If you want notes to be cleaned, you could either:
- add an
expected-versionheader (e.g., defaulting to1or a value from metadata), or- call a dedicated admin/cleanup endpoint if available.
The overall pattern of doing file cleanup first and server-side cleanup second, guarded by
preserve, is solid.Also applies to: 915-929
tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py (1)
1438-1530: Well-structured function with minor refinement opportunity.The
get_embedding_config()function is well-designed, properly creating Pydantic model instances and handling configuration fallbacks. The provider:model key convention is consistent and clear.Consider adding a descriptive comment for the timeout override section (lines 1518-1528) to clarify its purpose for test environments:
- # Optional: test override for model unload timeout - # If TEST_EMBEDDINGS_UNLOAD_TIMEOUT_SECONDS (or EMBEDDINGS_UNLOAD_TIMEOUT_SECONDS) is set, - # apply it to all configured models. This is helpful to shorten timers during pytest runs. + # Apply test-time override for model unload timeout to all models. + # Useful for faster test execution by shortening unload timers. + # Respects: TEST_EMBEDDINGS_UNLOAD_TIMEOUT_SECONDS or EMBEDDINGS_UNLOAD_TIMEOUT_SECONDS try: timeout_env = os.getenv("TEST_EMBEDDINGS_UNLOAD_TIMEOUT_SECONDS") or os.getenv("EMBEDDINGS_UNLOAD_TIMEOUT_SECONDS") if timeout_env: timeout_val = int(timeout_env) for model_cfg in config["embedding_config"]["models"].values(): - # Pydantic models allow attribute mutation by default if hasattr(model_cfg, "unload_timeout_seconds"): + # Direct mutation is safe; Pydantic BaseModel allows it by default model_cfg.unload_timeout_seconds = timeout_val except Exception as _e: - # Do not fail configuration if env var is malformed; ignore silently in production path + # Silently ignore malformed env vars to avoid breaking production config passtldw_Server_API/tests/e2e/conftest.py (1)
145-167: Well-implemented rate limit toggle with minor simplification opportunity.The fixture correctly:
- Detects tests marked with
@pytest.mark.rate_limits- Temporarily disables TEST_MODE/TESTING env vars to enable rate limiting
- Safely restores original values in finally block, handling both set and unset cases
The else branch can be simplified since it just yields:
@pytest.fixture(autouse=True) def _rate_limit_env_toggle(request): if request.node.get_closest_marker("rate_limits"): original_test_mode = os.environ.get("TEST_MODE") original_testing = os.environ.get("TESTING") # Remove flags that disable rate limiting os.environ.pop("TEST_MODE", None) os.environ.pop("TESTING", None) try: yield finally: if original_test_mode is None: os.environ.pop("TEST_MODE", None) else: os.environ["TEST_MODE"] = original_test_mode if original_testing is None: os.environ.pop("TESTING", None) else: os.environ["TESTING"] = original_testing - else: - yield + return # yield happens implicitly in else case; but explicit is fine too + yieldActually, on reflection, the current code is clearer—the explicit
else: yieldmakes the dual-path logic obvious. Keep as-is.tldw_Server_API/app/api/v1/endpoints/media.py (2)
715-730: Dead/unreachable code in _validate_identifier_queryreturn True at Line 713 makes the subsequent “return None # Cache miss” and example block unreachable. Remove to reduce noise.
- return None # Cache miss -# --- How to call this function --- -# You would now need to call it from within another async function: -# -# async def some_other_async_function(): -# result = await get_cached_response("some_cache_key") -# if result: -# etag, data = result -# print(f"Got from cache: ETag={etag}, Data={data}") -# else: -# print("Cache miss or error processing cache.") -# -# # To run it: -# # import asyncio -# # asyncio.run(some_other_async_function())
1450-1452: Broad except/pass hides real failuresThese blocks suppress all exceptions (including DB bugs). At minimum, log at debug and restrict the exception where feasible.
- except Exception: - # Identifier table may not exist in older DBs; ignore - pass + except Exception as e: + logger.debug("Skipped identifier index sync (likely missing table or backend limitation): %s", e)Also applies to: 1534-1535, 1714-1715
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
CHANGELOG.md(1 hunks)Docs/Code_Documentation/Guides/AuthNZ_Code_Guide.md(1 hunks)Docs/Code_Documentation/Guides/Chatbooks_Code_Guide.md(1 hunks)tldw_Server_API/Config_Files/config.txt(1 hunks)tldw_Server_API/app/api/v1/endpoints/health.py(1 hunks)tldw_Server_API/app/api/v1/endpoints/media.py(13 hunks)tldw_Server_API/app/core/Chatbooks/README.md(3 hunks)tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py(4 hunks)tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py(2 hunks)tldw_Server_API/app/core/config.py(1 hunks)tldw_Server_API/tests/e2e/conftest.py(5 hunks)tldw_Server_API/tests/e2e/fixtures.py(4 hunks)tldw_Server_API/tests/e2e/pytest.ini(1 hunks)tldw_Server_API/tests/e2e/test_concurrent_operations.py(8 hunks)tldw_Server_API/tests/e2e/test_database_operations.py(2 hunks)tldw_Server_API/tests/e2e/test_external_services.py(5 hunks)tldw_Server_API/tests/e2e/test_full_user_workflow.py(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
- tldw_Server_API/app/core/Chatbooks/README.md
🧰 Additional context used
🧬 Code graph analysis (7)
tldw_Server_API/app/api/v1/endpoints/health.py (1)
tldw_Server_API/app/core/AuthNZ/settings.py (1)
get_settings(844-883)
tldw_Server_API/tests/e2e/test_external_services.py (1)
tldw_Server_API/tests/e2e/fixtures.py (5)
SmartErrorHandler(1022-1074)AsyncOperationHandler(1081-1161)wait_for_completion(1085-1137)api_client(683-711)get_media_item(302-306)
tldw_Server_API/tests/e2e/test_database_operations.py (1)
tldw_Server_API/tests/e2e/test_concurrent_operations.py (1)
_maybe_sleep(37-42)
tldw_Server_API/tests/e2e/fixtures.py (3)
tldw_Server_API/tests/Chat/integration/test_chat_completions_integration.py (1)
client(108-122)tldw_Server_API/app/core/LLM_Calls/LLM_API_Calls.py (5)
post(131-150)json(85-86)json(101-102)raise_for_status(104-111)status_code(74-75)tldw_Server_API/app/api/v1/endpoints/notes.py (1)
delete_note(572-602)
tldw_Server_API/tests/e2e/conftest.py (1)
tldw_Server_API/tests/_plugins/e2e_state_fixtures.py (1)
test_results(15-23)
tldw_Server_API/tests/e2e/test_concurrent_operations.py (1)
tldw_Server_API/tests/e2e/test_database_operations.py (1)
_maybe_sleep(31-35)
tldw_Server_API/app/api/v1/endpoints/media.py (6)
tldw_Server_API/app/core/DB_Management/ChaChaNotes_DB.py (3)
transaction(2118-2122)execute(180-191)execute(245-247)tldw_Server_API/app/core/DB_Management/backends/postgresql_backend.py (2)
transaction(657-696)execute(727-804)tldw_Server_API/tests/MediaDB2/test_metadata_endpoints_more.py (2)
transaction(25-31)execute(16-17)tldw_Server_API/app/core/DB_Management/Media_DB_v2.py (2)
transaction(1313-1399)get(155-156)tldw_Server_API/app/core/DB_Management/backends/sqlite_backend.py (2)
transaction(237-262)execute(272-316)tldw_Server_API/tests/MediaDB2/test_safe_metadata_endpoints.py (2)
transaction(83-91)execute(69-70)
🪛 LanguageTool
Docs/Code_Documentation/Guides/AuthNZ_Code_Guide.md
[uncategorized] ~123-~123: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ck_rate_limituses a token bucket; see Rate Limiting below. Also handy DI: -get_optional_...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
Docs/Code_Documentation/Guides/Chatbooks_Code_Guide.md
[grammar] ~15-~15: Ensure spelling is correct
Context: ...ook” ZIP with a JSON manifest. - Import chatbooks back into a user’s workspace with confl...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~99-~99: Ensure spelling is correct
Context: ...s, max concurrent jobs, file size caps, chatbook count. - Quota checks occur in the endp...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
Docs/Code_Documentation/Guides/Chatbooks_Code_Guide.md
13-13: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
19-19: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
38-38: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
50-50: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
69-69: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
79-79: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
89-89: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
98-98: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
102-102: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
117-117: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
123-123: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
135-135: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
141-141: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
158-158: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
167-167: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
173-173: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.4)
tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py
904-904: Do not catch blind exception: Exception
(BLE001)
907-908: try-except-pass detected, consider logging the exception
(S110)
907-907: Do not catch blind exception: Exception
(BLE001)
913-914: try-except-pass detected, consider logging the exception
(S110)
913-913: Do not catch blind exception: Exception
(BLE001)
916-916: Do not catch blind exception: Exception
(BLE001)
923-923: Do not catch blind exception: Exception
(BLE001)
951-953: try-except-pass detected, consider logging the exception
(S110)
951-951: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/api/v1/endpoints/health.py
143-145: try-except-pass detected, consider logging the exception
(S110)
143-143: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/tests/e2e/test_external_services.py
277-277: Do not catch blind exception: Exception
(BLE001)
641-641: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/tests/e2e/test_database_operations.py
41-41: Redefinition of unused api_client from line 22
(F811)
41-41: Redefinition of unused data_tracker from line 22
(F811)
tldw_Server_API/tests/e2e/fixtures.py
882-883: try-except-pass detected, consider logging the exception
(S110)
882-882: Do not catch blind exception: Exception
(BLE001)
889-890: try-except-pass detected, consider logging the exception
(S110)
889-889: Do not catch blind exception: Exception
(BLE001)
896-897: try-except-pass detected, consider logging the exception
(S110)
896-896: Do not catch blind exception: Exception
(BLE001)
903-904: try-except-pass detected, consider logging the exception
(S110)
903-903: Do not catch blind exception: Exception
(BLE001)
910-911: try-except-pass detected, consider logging the exception
(S110)
910-910: Do not catch blind exception: Exception
(BLE001)
927-927: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/core/config.py
1415-1415: Do not catch blind exception: Exception
(BLE001)
1425-1425: Do not catch blind exception: Exception
(BLE001)
1433-1433: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py
854-855: try-except-pass detected, consider logging the exception
(S110)
854-854: Do not catch blind exception: Exception
(BLE001)
858-859: try-except-pass detected, consider logging the exception
(S110)
858-858: Do not catch blind exception: Exception
(BLE001)
864-865: try-except-pass detected, consider logging the exception
(S110)
864-864: Do not catch blind exception: Exception
(BLE001)
866-868: try-except-pass detected, consider logging the exception
(S110)
866-866: Do not catch blind exception: Exception
(BLE001)
872-872: Do not catch blind exception: Exception
(BLE001)
1062-1063: try-except-pass detected, consider logging the exception
(S110)
1062-1062: Do not catch blind exception: Exception
(BLE001)
1064-1065: try-except-pass detected, consider logging the exception
(S110)
1064-1064: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/api/v1/endpoints/media.py
1450-1452: try-except-pass detected, consider logging the exception
(S110)
1450-1450: Do not catch blind exception: Exception
(BLE001)
1534-1535: try-except-pass detected, consider logging the exception
(S110)
1534-1534: Do not catch blind exception: Exception
(BLE001)
1714-1715: try-except-pass detected, consider logging the exception
(S110)
1714-1714: Do not catch blind exception: Exception
(BLE001)
8661-8662: Abstract raise to an inner function
(TRY301)
8661-8662: Avoid specifying long messages outside the exception class
(TRY003)
8738-8738: Do not catch blind exception: Exception
(BLE001)
8756-8757: try-except-pass detected, consider logging the exception
(S110)
8756-8756: Do not catch blind exception: Exception
(BLE001)
8761-8761: Avoid specifying long messages outside the exception class
(TRY003)
8764-8764: Avoid specifying long messages outside the exception class
(TRY003)
8767-8767: Do not catch blind exception: Exception
(BLE001)
8770-8770: Do not catch blind exception: Exception
(BLE001)
8771-8771: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
8771-8771: Avoid specifying long messages outside the exception class
(TRY003)
8773-8773: Avoid specifying long messages outside the exception class
(TRY003)
8776-8776: Avoid specifying long messages outside the exception class
(TRY003)
8784-8785: try-except-pass detected, consider logging the exception
(S110)
8784-8784: Do not catch blind exception: Exception
(BLE001)
8801-8802: Abstract raise to an inner function
(TRY301)
8801-8802: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (14)
Docs/Code_Documentation/Guides/AuthNZ_Code_Guide.md (3)
19-20: ✅ Duplication resolved.The previous "Guardrails/middleware" duplication has been successfully consolidated into a single, comprehensive bullet listing all relevant modules. Well done.
1-386: Comprehensive, well-structured developer guide.This new AuthNZ guide provides excellent coverage: clear conceptual explanation, practical code examples, dependency patterns, and troubleshooting guidance. The organization flows logically from "what is it" → "how to use it" → "recipes" → "testing" → "extending," making it easy for developers to find both reference material and implementation guidance.
279-292: All documentation references verified as accurate.External files (JWT_Rotation_Runbook.md, Env_Vars.md, Authentication_Setup.md) all present and accessible. All 13 line number references verified:
- Settings.py (370, 374, 378): Correct field definitions
- Auth.py (234, 622, 569, 857, 176): All endpoints correctly located
- Auth_enhanced.py (125, 228, 384, 439, 504): All enhanced endpoints correctly located
No drift detected between documentation and codebase.
tldw_Server_API/tests/e2e/pytest.ini (1)
26-26: LGTM: new marker registered under strict-markersrate_limits marker is correctly declared and will be enforced due to --strict-markers. No action needed.
tldw_Server_API/tests/e2e/test_full_user_workflow.py (1)
133-147: More flexible health/auth_mode assertions are appropriateAllowing multiple “healthy” statuses and both
single_user/multi_usermodes matches the new health endpoint behavior while still catching regressions, and avoids over-constraining E2E. Looks good.tldw_Server_API/tests/e2e/fixtures.py (1)
664-665: Defaultingauth_modein remote health check is helpfulThe
ensure_server_runningchange thatsetdefaultsauth_modefromAUTH_MODEwhen the remote/healthresponse doesn’t include it nicely aligns the behavior with the in-process branch and simplifies tests that rely onauth_mode. No issues here.tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py (2)
15-15: LGTM: weakref import supports finalizer-based cleanup.The addition of
weakrefis appropriate for the lifecycle management improvements in ONNXEmbedder.
828-832: LGTM: Early attribute initialization improves safety.Initializing critical attributes early ensures they exist even if setup fails later, preventing AttributeError in cleanup paths (del or finalizers).
tldw_Server_API/tests/e2e/conftest.py (5)
14-19: LGTM: Plugin registration enables centralized fixture management.The
pytest_pluginsregistration properly imports shared fixtures from dedicated plugin modules, aligning with the PR's goal of centralizing e2e fixture state. This keepsconftest.pyfocused on markers and CLI options while making fixtures discoverable across the test suite.
53-54: LGTM: rate_limits marker enables selective rate limit testing.The new marker correctly registers tests that verify rate limiting behavior, working in conjunction with the
_rate_limit_env_togglefixture to re-enable rate limits for specific tests.
63-63: LGTM: --run-slow option provides explicit control over slow tests.The logic correctly implements test selection:
--skip-slow: skips slow tests--run-slow: explicitly includes slow tests (overrides skip behavior)- The condition
skip_slow and not run_slowensures--run-slowtakes precedenceThis gives developers fine-grained control over test execution time.
Also applies to: 84-87
131-137: LGTM: CLI option properly defined.The
--run-slowoption is correctly configured with appropriate defaults and clear help text. It pairs well with the existing--skip-slowoption.
139-143: LGTM: Enables session-wide result aggregation.Attaching
test_resultstorequest.configallows pytest hooks (likepytest_sessionfinish) to access accumulated results for reporting. Thetype: ignore[attr-defined]appropriately handles dynamic attribute assignment.Based on learnings
tldw_Server_API/app/api/v1/endpoints/media.py (1)
115-116: Dynamic search rate‑limit: good changeModule‑scoped _SEARCH_RATE_LIMIT and its use in @limiter.limit are correct and avoid the decorator ordering issue seen earlier. Request is accepted by limiter via the request param in the handler.
Also applies to: 2324-2326
tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py
Outdated
Show resolved
Hide resolved
tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tldw_Server_API/app/api/v1/endpoints/media.py (3)
2198-2211: Duplicate GET "/" routes will shadow each otherYou define two GET handlers for "/" under the same router: list_media_endpoint (Lines 178-187) and list_all_media (Lines 2198-2211). This causes ambiguous routing and OpenAPI duplication. Consolidate into one handler (or give one a different path, e.g., "/list"), and remove the other.
-@router.get( - "/", - status_code=status.HTTP_200_OK, - summary="List All Media Items", - tags=["Media Management"], - response_model=MediaListResponse -) -@limiter.limit("50/minute") -async def list_all_media(... +# Option A: replace the earlier '' and '/' aliases with a single canonical route, +# or delete this duplicate block and keep the earlier pair.
8653-8661: Add SSRF guard in central downloader_user-controlled URLs reach download_url_async (e.g., process-code/doc/pdf/ebook), but no SSRF validation occurs here. Centralize the guard to protect all call sites.
async def _download_url_async( client: Optional[httpx.AsyncClient], url: str, target_dir: Path, allowed_extensions: Optional[Set[str]] = None, # Use a Set for faster lookups check_extension: bool = True, # Flag to enable/disable check disallow_content_types: Optional[Set[str]] = None, # Optional set of content-types to reject for inference allow_redirects: bool = True, ) -> Path: @@ if allowed_extensions is None: allowed_extensions = set() # Default to empty set if None + + # Enforce URL safety policy centrally to prevent SSRF across all endpoints + try: + assert_url_safe(url) + except HTTPException as he: + # Align with existing error handling paths that surface ValueError in batch results + raise ValueError(f"URL rejected by safety policy: {he.detail}") from he
9084-9105: Blocking bug: get_resp is undefined in the fallback branchThis path will raise at runtime. Delegate the body download to the standard streaming path with a fresh client (mirrors the earlier fix you applied above).
- # Stream bytes from the same GET response to disk to honor the patched client in tests - tmp_path = target_path.with_suffix(target_path.suffix + ".part") - try: - tmp_path.parent.mkdir(parents=True, exist_ok=True) - async with aiofiles.open(tmp_path, "wb") as f: - async for chunk in get_resp.aiter_bytes(): - if not chunk: - continue - await f.write(chunk) - # Atomic rename to final path - tmp_path.replace(target_path) - except Exception as _werr: - # Cleanup partial file on error - try: - if tmp_path.exists(): - tmp_path.unlink() - except Exception: - pass - raise - - logger.info(f"Successfully downloaded {url} to {target_path}") - return target_path + # Download body via the canonical path to avoid duplication and undefined variables + temp_client = _m_create_async_client() + try: + return await _download_url_async( + client=temp_client, + url=str(response.url), + target_dir=target_dir, + allowed_extensions=allowed_extensions, + check_extension=check_extension, + disallow_content_types=disallow_content_types, + allow_redirects=allow_redirects, + ) + finally: + try: + await temp_client.aclose() + except Exception: + pass
♻️ Duplicate comments (3)
tldw_Server_API/app/api/v1/endpoints/media.py (2)
1532-1560: Repeat narrowed exception handling here (identifier upsert for PUT)Apply the same narrowed-exception pattern and logging as suggested above for Lines 1435-1466.
1723-1751: Repeat narrowed exception handling here (identifier upsert for advanced update)Apply the same narrowed-exception pattern and logging as suggested above for Lines 1435-1466.
tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py (1)
918-923: Regex fallback for<script>/<style>blocks still misses spaced closing tagsThe fallback patterns
r"<script[^>]*>.*?</script>"andr"<style[^>]*>.*?</style>"won’t match closing tags like</script >/</style >(whitespace before>), which is the scenario flagged in the earlier code-scanning warning. When the BeautifulSoup path isn’t available, this regex branch is the main preprocessor, so it’s worth hardening.You can make the patterns more robust by allowing optional whitespace after the tag name and in the closing tag, e.g.:
- preprocessed = _re.sub(r"<script[^>]*>.*?</script>", " ", preprocessed, flags=_re.DOTALL | _re.IGNORECASE) - preprocessed = _re.sub(r"<style[^>]*>.*?</style>", " ", preprocessed, flags=_re.DOTALL | _re.IGNORECASE) + preprocessed = _re.sub(r"(?is)<script\b[^>]*>.*?</script\s*>", " ", preprocessed) + preprocessed = _re.sub(r"(?is)<style\b[^>]*>.*?</style\s*>", " ", preprocessed)This keeps behavior but closes the gap for spaced closing tags and should address the scanner finding.
🧹 Nitpick comments (7)
tldw_Server_API/app/api/v1/endpoints/media.py (1)
8786-8788: Reduce overhead of redirect resolution probeYou perform a full GET for header-only redirect resolution. Prefer HEAD, and on 405/501 fall back to GET with a Range header.
- while True: - r = await client.request("GET", cur_url, follow_redirects=False, timeout=30.0) + while True: + r = await client.request("HEAD", cur_url, follow_redirects=False, timeout=30.0) + if r.status_code in (405, 501): + r = await client.request("GET", cur_url, headers={"Range": "bytes=0-0"}, follow_redirects=False, timeout=30.0)tldw_Server_API/app/core/Ingestion_Media_Processing/Plaintext/Plaintext_Files.py (1)
196-217: HTML pre-sanitization is good; consider reducing fully silentexceptblocksThe BeautifulSoup-based removal of
script/style/noscriptand comments before html2text is a solid defense-in-depth step. The main drawback is that multipleexcept Exception: passblocks and the outerexcept Exceptionaround the soup creation make failures completely silent while falling back to unsanitized HTML.Consider:
- Logging a warning/debug when the BeautifulSoup sanitization path fails, and
- Narrowing the inner
exceptclauses to expected errors (or logging once) so unexpected parser issues are at least visible.This keeps behavior the same in success paths while improving diagnosability when sanitization unexpectedly degrades to the fallback.
tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py (1)
893-925: HTML sanitization overall looks strong; consider centralizing logic and improving observabilityThe new pipeline (BeautifulSoup preprocessing → bleach clean → anchor rel hardening → simple tag-stripping fallback) is a solid improvement for HTML ingestion. Two non-blocking refinements:
- The BeautifulSoup-based removal of
script/style/noscriptand comments here closely mirrors the logic inPlaintext_Files.convert_document_to_text. Extracting this into a shared helper (e.g.,_strip_scripts_styles_comments(html: str) -> str) would prevent future drift between upload-time and ingestion-time sanitization.- Several inner
except Exception: passblocks (around.decompose()/.extract()and comment removal) fully swallow errors; only bleach-level failures are logged. To aid debugging if the sanitizer ever degrades unexpectedly, consider at least logging once at debug/warn level or narrowing to expected exception types, while still falling back gracefully.Both changes are backward compatible and can be tackled opportunistically.
Also applies to: 951-967
Docs/Code_Documentation/Guides/Chunking_Code_Guide.md (4)
49-51: Consolidate repetitive "chunk_*" API listings for smoother reading.Lines 49–51 list three consecutive APIs all starting with "chunk_text_" or "chunk_file_", creating textual monotony. Restructure to group these by conceptual category (e.g., "Hierarchical APIs" or "Streaming & File APIs") with sub-bullets.
Example refactor:
Hierarchical & flattening methods: - chunk_text_hierarchical_tree(...), flatten_hierarchical(...) → section/block tree and flattening with ancestry. - chunk_text_hierarchical_flat(...) → convenience wrapper returning flat {text, metadata}. Streaming methods: - chunk_file_stream(file_path, method=None, max_size=None, ...) → Generator[str] - Streaming for large files; see "Streaming" below.
51-51: Replace overused "very large" with more precise descriptors.LanguageTool flagged repeated use of "very large" (lines 51, 91, 181) as weak intensification. Use stronger, more specific language:
- Line 51: "Streaming for very large files" → "Streaming for high-volume or multi-gigabyte files"
- Line 91: "Streaming a very large file" → "Streaming a large file in chunks" or "Streaming a multi-gigabyte file"
- Line 181: "avoiding very large overlaps" → "avoiding excessive overlaps" or "avoiding overlaps >50% of max_size"
This improves precision and readability.
Also applies to: 91-91, 181-181
135-135: Capitalize "Markdown" as proper noun.Line 135 reads "Markdown‑like features" which is correct, but ensure all references to the Markdown language use proper capitalization (already done here, but verify no other instances use lowercase "markdown").
1-244: Solid comprehensive module guide; address minor style & verification items.This is a well-structured developer orientation guide covering scope, APIs, usage patterns, extension points, testing, and best practices for the Chunking module. It provides excellent onboarding material for developers working on E2E test workflows or integrating chunking functionality.
Recommended next steps:
- Address LanguageTool's style suggestions (repetitive sentence structure at lines 49–51, overused "very large" at lines 51/91/181, capitalization at line 135).
- Verify that test markers and E2E fixtures referenced align with the pytest infrastructure changes in this PR.
- Verify all code example imports and method signatures are accurate.
- Confirm all cross-referenced documentation and API files exist.
No critical issues prevent merging; the above are quality and verification improvements.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
Docs/Code_Documentation/Guides/Chunking_Code_Guide.md(1 hunks)tldw_Server_API/app/api/v1/endpoints/media.py(14 hunks)tldw_Server_API/app/core/Ingestion_Media_Processing/Plaintext/Plaintext_Files.py(2 hunks)tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tldw_Server_API/app/api/v1/endpoints/media.py (6)
tldw_Server_API/app/core/DB_Management/backends/base.py (4)
BackendType(27-31)transaction(348-358)connection(283-285)backend_type(330-332)tldw_Server_API/app/core/DB_Management/ChaChaNotes_DB.py (2)
transaction(2118-2122)execute_query(1958-2040)tldw_Server_API/app/core/DB_Management/backends/postgresql_backend.py (3)
transaction(657-696)connection(202-207)backend_type(237-239)tldw_Server_API/tests/MediaDB2/test_metadata_endpoints_more.py (1)
transaction(25-31)tldw_Server_API/app/core/DB_Management/Media_DB_v2.py (3)
transaction(1313-1399)execute_query(1054-1158)get(155-156)tldw_Server_API/app/core/DB_Management/backends/sqlite_backend.py (3)
transaction(237-262)connection(141-148)backend_type(180-182)
🪛 LanguageTool
Docs/Code_Documentation/Guides/Chunking_Code_Guide.md
[style] ~49-~49: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ck tree and flattening with ancestry. - chunk_text_hierarchical_flat(...) → convenien...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~50-~50: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...pper returning flat {text, metadata}. - chunk_file_stream(file_path, method=None, max...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~51-~51: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ons) → Generator[str] - Streaming for very large files; see “Streaming” below. AsyncChu...
(EN_WEAK_ADJECTIVE)
[style] ~91-~91: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...lags": "im"}]}, }, ) Streaming a very large filepython ck = Chunker() for ch in...
(EN_WEAK_ADJECTIVE)
[uncategorized] ~135-~135: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ders (#), hrules, lists, code fences, markdown tables, blank lines, plus optional cust...
(MARKDOWN_NNP)
[style] ~181-~181: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...hing embeddings downstream and avoiding very large overlaps. - Limiting regex rules in t...
(EN_WEAK_ADJECTIVE)
🪛 Ruff (0.14.4)
tldw_Server_API/app/core/Ingestion_Media_Processing/Plaintext/Plaintext_Files.py
203-203: Do not catch blind exception: Exception
(BLE001)
206-207: try-except-pass detected, consider logging the exception
(S110)
206-206: Do not catch blind exception: Exception
(BLE001)
211-212: try-except-pass detected, consider logging the exception
(S110)
211-211: Do not catch blind exception: Exception
(BLE001)
214-214: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py
904-904: Do not catch blind exception: Exception
(BLE001)
907-908: try-except-pass detected, consider logging the exception
(S110)
907-907: Do not catch blind exception: Exception
(BLE001)
913-914: try-except-pass detected, consider logging the exception
(S110)
913-913: Do not catch blind exception: Exception
(BLE001)
916-916: Do not catch blind exception: Exception
(BLE001)
923-923: Do not catch blind exception: Exception
(BLE001)
951-953: try-except-pass detected, consider logging the exception
(S110)
951-951: Do not catch blind exception: Exception
(BLE001)
955-955: Consider moving this statement to an else block
(TRY300)
956-956: Do not catch blind exception: Exception
(BLE001)
962-962: Do not catch blind exception: Exception
(BLE001)
965-965: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/api/v1/endpoints/media.py
1464-1466: try-except-pass detected, consider logging the exception
(S110)
1464-1464: Do not catch blind exception: Exception
(BLE001)
1559-1560: try-except-pass detected, consider logging the exception
(S110)
1559-1559: Do not catch blind exception: Exception
(BLE001)
1750-1751: try-except-pass detected, consider logging the exception
(S110)
1750-1750: Do not catch blind exception: Exception
(BLE001)
8697-8698: Abstract raise to an inner function
(TRY301)
8697-8698: Avoid specifying long messages outside the exception class
(TRY003)
8774-8774: Do not catch blind exception: Exception
(BLE001)
8792-8793: try-except-pass detected, consider logging the exception
(S110)
8792-8792: Do not catch blind exception: Exception
(BLE001)
8797-8797: Avoid specifying long messages outside the exception class
(TRY003)
8800-8800: Avoid specifying long messages outside the exception class
(TRY003)
8803-8803: Do not catch blind exception: Exception
(BLE001)
8806-8806: Do not catch blind exception: Exception
(BLE001)
8807-8807: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
8807-8807: Avoid specifying long messages outside the exception class
(TRY003)
8809-8809: Avoid specifying long messages outside the exception class
(TRY003)
8812-8812: Avoid specifying long messages outside the exception class
(TRY003)
8820-8821: try-except-pass detected, consider logging the exception
(S110)
8820-8820: Do not catch blind exception: Exception
(BLE001)
8837-8838: Abstract raise to an inner function
(TRY301)
8837-8838: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
tldw_Server_API/app/api/v1/endpoints/media.py (1)
116-118: Good fix: dynamic test-mode rate limiting and correct decorator stackingModule-level _SEARCH_RATE_LIMIT and contiguous decorators resolve the SyntaxError and keep behavior intact. Looks good.
Also applies to: 2360-2362
tldw_Server_API/app/core/Ingestion_Media_Processing/Plaintext/Plaintext_Files.py (1)
12-12: bs4Commentimport is appropriate for new HTML sanitizationThe added
Commentimport matches the usage in the HTML branch for stripping comment nodes and looks correct assumingbeautifulsoup4is available in the runtime environment and declared as a dependency.Docs/Code_Documentation/Guides/Chunking_Code_Guide.md (3)
205-210: Verify test markers align with E2E pytest infrastructure from this PR.The "Testing Pointers" section lists markers:
unit, integration, external_api, local_llm_service. Ensure these match the pytest markers and fixtures introduced in this PR (e.g.,rate_limitsmarker and--run-slowCLI option mentioned in the PR summary).Additionally, if this PR adds centralized e2e fixture modules or pytest plugins, consider adding a brief forward reference here (e.g., "See E2E fixture discovery and shared plugins in tests/e2e/conftest.py") to help developers locate and use the new testing infrastructure.
5-8: All cross-referenced files exist; no action required.Verification confirms that all 7 files referenced in the guide are present in the repository. No missing files, [WIP] notes, or TODO comments are needed.
Likely an incorrect or invalid review comment.
63-72: All code example imports and method signatures verified as correct.Verification confirms:
Chunkerclass exists attldw_Server_API/app/core/Chunking/chunker.py:215chunk_text_with_metadata()exists (line 1441) with matching signatureprocess_text()exists (line 1907) with matching signaturechunk_file_stream()exists (line 2368) with matching signatureTemplateManagerclass exists attldw_Server_API/app/core/Chunking/templates.py:485TemplateManager.process()exists (line 685) with matching signatureAll import paths resolve correctly and method parameters align with the documented examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tldw_Server_API/app/api/v1/endpoints/media.py (5)
178-187: Duplicate GET "/" routes — one must be removed or renamedTwo handlers are bound to GET "/": list_media_endpoint and list_all_media. This causes route collision/overrides. Pick one path or rename the newer one (e.g., "/all").
Apply one of:
-@router.get( - "/", - status_code=status.HTTP_200_OK, - summary="List All Media Items", - tags=["Media Management"], - response_model=MediaListResponse -) +@router.get( + "/all", + status_code=status.HTTP_200_OK, + summary="List All Media Items", + tags=["Media Management"], + response_model=MediaListResponse +)Also applies to: 2238-2245
9124-9145: Undefined get_resp in fallback branch — runtime NameErrorWhen client is None, the branch streams from get_resp which is undefined here. Delegate to the main streaming path with a temp client (mirrors earlier fix).
- # Stream bytes from the same GET response to disk to honor the patched client in tests - tmp_path = target_path.with_suffix(target_path.suffix + ".part") - try: - tmp_path.parent.mkdir(parents=True, exist_ok=True) - async with aiofiles.open(tmp_path, "wb") as f: - async for chunk in get_resp.aiter_bytes(): - if not chunk: - continue - await f.write(chunk) - # Atomic rename to final path - tmp_path.replace(target_path) - except Exception as _werr: - # Cleanup partial file on error - try: - if tmp_path.exists(): - tmp_path.unlink() - except Exception: - pass - raise - - logger.info(f"Successfully downloaded {url} to {target_path}") - return target_path + # Download the body using the standard streaming path + temp_client = _m_create_async_client() + try: + return await _download_url_async( + client=temp_client, + url=str(response.url), + target_dir=target_dir, + allowed_extensions=allowed_extensions, + check_extension=check_extension, + disallow_content_types=disallow_content_types, + allow_redirects=allow_redirects, + ) + finally: + try: + await temp_client.aclose() + except Exception: + pass
416-425: Missing SSRF guard on URL downloads in /process-codeassert_url_safe isn’t called before scheduling downloads, unlike document-like flow. Guard each URL and surface a per-item error instead of issuing the request.
- tasks = [ - _download_url_async( - client=client, url=u, target_dir=temp_dir, - allowed_extensions=CODE_ALLOWED_EXTENSIONS, check_extension=True - ) for u in urls - ] + safe_tasks = [] + for u in urls: + try: + assert_url_safe(u) + safe_tasks.append( + _download_url_async( + client=client, url=u, target_dir=temp_dir, + allowed_extensions=CODE_ALLOWED_EXTENSIONS, check_extension=True + ) + ) + except HTTPException as he: + batch["results"].append({ + "status": "Error", "input_ref": u, "processing_source": None, + "media_type": "code", "error": f"URL rejected by policy: {he.detail}", + "metadata": {}, "content": None, "chunks": None, "analysis": None, + "keywords": None, "warnings": None, "analysis_details": {}, "db_id": None, + "db_message": "Processing only endpoint." + }) + batch["errors_count"] += 1 + tasks = safe_tasks
4951-4964: Typo: form_data.overlap → form_data.chunk_overlapBackground task uses a non-existent attribute overlap; embeddings jobs will fail.
- result = await generate_embeddings_for_media( + result = await generate_embeddings_for_media( media_id=media_id, media_content=media_content, embedding_model=embedding_model, embedding_provider=embedding_provider, chunk_size=form_data.chunk_size or 1000, - chunk_overlap=form_data.overlap or 200 + chunk_overlap=form_data.chunk_overlap or 200 )
27-28: Path used in annotations but not importedPath is referenced in annotations (e.g., target_dir: Path) but only Path as FilePath is imported. Import Path to avoid NameError when annotations are evaluated.
-from pathlib import Path as FilePath +from pathlib import Path, Path as FilePath
🧹 Nitpick comments (15)
Docs/Code_Documentation/Embeddings-Developer-Guide.md (1)
205-215: Fix style: Replace vague intensifier "very" with concrete descriptor.Chunking is essential to split information into smaller pieces for efficient storage and meaningful retrieval. The guidance provided here aligns well with established RAG practices. However, the phrasing on line 214 uses "very" as an intensifier, which weakens the specificity of the recommendation.
Suggested revision:
- Streaming vs. generator: for very large files on disk use `chunk_file_stream`; for whole in‑memory text without metadata, prefer `chunk_text_generator`. For normalized rows with metadata, use `process_text`. + Streaming vs. generator: for large files exceeding available memory, use `chunk_file_stream`; for complete in‑memory text without metadata, prefer `chunk_text_generator`. For normalized rows with metadata, use `process_text`.tldw_Server_API/app/core/DB_Management/Media_DB_v2.py (2)
6528-6542: Avoid blind except and silent pass when parsing created_atCatch specific exceptions and log at debug. Also drop the inner datetime import; module-level import already exists.
- try: - from datetime import datetime as _dt - created_at_val = _dt.fromisoformat(created_at_val.replace('Z', '+00:00')) - except Exception: - # Leave as-is; Pydantic may still coerce common formats - pass + try: + created_at_val = datetime.fromisoformat(created_at_val.replace('Z', '+00:00')) + except (ValueError, TypeError) as e: + logging.debug( + "get_full_media_details_rich: failed to parse created_at '%s': %s", + created_at_val, e + )
6551-6567: Tighten JSON decoding for timestamps and remove redundant importUse json.JSONDecodeError, log decode issues, and avoid re-importing json under a local alias.
- if isinstance(raw_timestamps, str): - try: - import json as _json - parsed_ts = _json.loads(raw_timestamps) - if isinstance(parsed_ts, list): - # Coerce items to strings for safety - raw_timestamps = [str(x) for x in parsed_ts] - else: - raw_timestamps = [] - except Exception: - raw_timestamps = [] + if isinstance(raw_timestamps, str): + try: + parsed_ts = json.loads(raw_timestamps) + except json.JSONDecodeError as e: + logging.debug("get_full_media_details_rich: invalid timestamps JSON: %s", e) + raw_timestamps = [] + else: + raw_timestamps = [str(x) for x in parsed_ts] if isinstance(parsed_ts, list) else []Docs/Code_Documentation/Chunking-Module.md (2)
89-89: Add language identifiers to code blocks. Lines 89 and 134 contain code blocks without language specifiers. Update```to```json(lines 89, 152) and```yamlor```openapi(line 134) to comply with markdown best practices and improve syntax highlighting.-``` +```json { "method_specific_options": {-``` +```yaml { "hierarchical_template": {Also applies to: 134-134
83-84: Clarify runtime method discovery scope. Line 83 notes that methods include "runtime-registered strategy keys," which is important context. Consider adding a brief note about the implications for API consumers (e.g., "Methods available may vary depending on server configuration and installed dependencies").Docs/Code_Documentation/Guides/Chunking_Code_Guide.md (1)
225-232: Gotchas section captures important practical concerns. The list of overlap constraints, offset preservation caveats, and regex safety guidance is helpful for production usage. However, consider expanding the "code_mode" gotcha note to remind developers that explicit mode selection is safer than relying on language hints.Docs/Published/API-related/Chunking_API_Documentation.md (1)
9-11: Add language identifiers to all code blocks. Multiple code blocks lack language specifiers violating MD040 (lines 9, 26, 45, 75, 88, 105, 128). Update all code fence declarations with appropriate languages:```jsonfor JSON examples,```bashfor curl examples, and```yamlfor OpenAPI schema.Example fixes:
- Line 9:
```→```bash(curl example) or```text(URL)- Lines 26, 45, 75, 105:
```→```json- Line 88:
```→```bash- Line 128:
```→```yaml-``` +```json { "text_content": "... your text ...",-``` +```bash curl -X POST "http://localhost:8000/api/v1/chunking/chunk_file" \-``` +```yaml openapi: 3.0.3Also applies to: 26-37, 45-72, 75-81, 88-96, 105-125, 128-163
tldw_Server_API/app/core/TTS/TTS-README.md (1)
101-109: Align wording with implementation (words vs tokens)Implementation inserts a pause after a configurable number of words (
text.split()), while the doc mentions “tokens”. Consider saying “words” everywhere for Kokoro to match the actual behavior and avoid confusion.tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py (1)
154-199: ONNX sync→async wrapping and SR propagation test is solid (minor lint-only nits)The fake writer and sync generator nicely exercise:
- wrapping a sync iterator into an async generator,
- honoring the stream-provided sample rate for
StreamingAudioWriter,- and ensuring a single writer instance is used.
Static analysis flags some unused parameters (
format,channels,audio_data,target_dtype, and generator args). If you want to quiet Ruff, you could prefix them with_or accept*args, **kwargs, but behavior-wise the test is correct.tldw_Server_API/app/core/TTS/tts_validation.py (3)
55-60: Kokoro text-length limit increased significantly; keep all layers consistentKokoro’s
max_text_lengthis now1_000_000both inProviderLimitsandMAX_TEXT_LENGTHS, which is consistent inside this module but:
KokoroAdapter.get_capabilities()still reportsmax_text_length=500.- The new test now expects a large “no-practical-cap” sentinel on capabilities.
It would be good to have a single source of truth (e.g.,
ProviderLimits.get_max_text_length("kokoro")) used both by validation and byKokoroAdapter.get_capabilities()so clients and tests all see the same limit. Also, double‑check that allowing 1M‑character inputs is acceptable from a latency/memory/DoS perspective for Kokoro.Also applies to: 178-185
367-368: Provider-aware parameter validation wiring is correct but public API is still provider-agnosticPassing
providerinto_validate_parametersfromvalidate_request()correctly enables provider-specific speed ranges. Note that the publicvalidate_parameters()wrapper still calls_validate_parameters(request)without a provider, so callers using that directly get the"default"limits, not provider-specific ones. If external code relies onvalidate_parameters(), consider either:
- adding an optional
providerargument tovalidate_parameters, or- documenting that provider-aware checks only occur via
validate_request.
442-456: Speed validation should guard against non-numeric/None values more explicitlyUsing
raw_speed = getattr(request, "_original_speed", request.speed)plus provider-specificmin_speed/max_speedis good, but ifspeedisNoneor a non-numeric type, the comparisonraw_speed < min_speedwill raiseTypeErrorand end up as a generic “Validation failed” instead of a cleanTTSInvalidInputError.You could tighten this by explicitly coercing and validating the type:
raw_speed = getattr(request, "_original_speed", request.speed) if raw_speed is not None: try: raw_speed = float(raw_speed) except (TypeError, ValueError) as exc: raise TTSInvalidInputError("Speed must be numeric", details={"value": raw_speed}) from exc if raw_speed < min_speed or raw_speed > max_speed: raise TTSInvalidInputError( f"Speed must be between {min_speed} and {max_speed}, got {raw_speed}", details={"min_speed": min_speed, "max_speed": max_speed}, )and skip the comparison when
raw_speedisNone.tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py (2)
165-181: Pause pacing configuration is robust; consider light tightening of boundsReading
pause_interval_words/pause_tagfrom either top-level config or nestedextra_paramsand falling back to sane defaults is good, and matches tests and YAML.If you want to prevent extreme configs from producing a pause tag on every word (e.g.,
pause_interval_words <= 0), you might clamp to a minimum of 1 here before storing the value, but the current behavior is functionally correct and defensive against parsing errors.
799-812: Preprocessing + pause insertion behavior matches configuration and tests
- Quote/apostrophe normalization in
preprocess_textmakes input more uniform without changing semantics._insert_pause_tagsinsertspause_tageverypause_interval_wordswords, and the recursive handling of existing tags preserves manual pacing while filling in long stretches.The try/except around
_insert_pause_tagsprevents preprocessing failures from breaking generation; if you ever want more observability, you could log the exception once, but behavior is sensible and matches the new tests.Also applies to: 814-838
tldw_Server_API/app/api/v1/endpoints/media.py (1)
8810-8862: Too many blind excepts and verbose inline messages in redirect resolverMultiple bare excepts (BLE001/S110) and long message strings (TRY003). Recommend narrowing exceptions (httpx.HTTPError, ValueError), log with context, and raise from e. Consider extracting redirect-policy checks into a helper for clarity.
Also applies to: 8871-8879, 8931-8952
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
Docs/Code_Documentation/Chunking-Module.md(2 hunks)Docs/Code_Documentation/Chunking_Templates_Developer_Guide.md(1 hunks)Docs/Code_Documentation/Embeddings-Developer-Guide.md(1 hunks)Docs/Code_Documentation/Guides/Chunking_Code_Guide.md(1 hunks)Docs/Code_Documentation/index.md(1 hunks)Docs/Published/API-related/Chunking_API_Documentation.md(1 hunks)Docs/Published/API-related/Endpoints_Feature_Map.md(1 hunks)Docs/Published/API-related/index.md(1 hunks)tldw_Server_API/app/api/v1/endpoints/media.py(16 hunks)tldw_Server_API/app/core/DB_Management/Media_DB_v2.py(3 hunks)tldw_Server_API/app/core/TTS/TTS-README.md(1 hunks)tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py(9 hunks)tldw_Server_API/app/core/TTS/tts_providers_config.yaml(1 hunks)tldw_Server_API/app/core/TTS/tts_validation.py(4 hunks)tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tldw_Server_API/app/core/TTS/tts_validation.py (2)
tldw_Server_API/app/core/TTS/adapters/base.py (1)
TTSRequest(107-180)tldw_Server_API/app/core/TTS/tts_exceptions.py (1)
TTSInvalidInputError(76-78)
tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py (1)
tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py (2)
preprocess_text(793-812)_stream_audio_kokoro(507-638)
tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py (3)
tldw_Server_API/app/core/TTS/adapters/base.py (2)
AudioFormat(30-40)close(405-414)tldw_Server_API/WebUI/js/tts.js (13)
text(470-470)text(582-582)request(410-410)request(474-479)format(471-471)format(583-583)format(961-961)error(433-433)v(226-226)v(231-231)id(1344-1344)id(1347-1347)out(1230-1230)tldw_Server_API/app/core/TTS/streaming_audio_writer.py (1)
StreamingAudioWriter(21-160)
tldw_Server_API/app/api/v1/endpoints/media.py (5)
tldw_Server_API/app/core/DB_Management/backends/base.py (4)
BackendType(27-31)transaction(348-358)connection(283-285)backend_type(330-332)tldw_Server_API/app/core/DB_Management/Media_DB_v2.py (3)
get(155-156)transaction(1313-1399)execute_query(1054-1158)tldw_Server_API/app/core/Utils/metadata_utils.py (1)
normalize_safe_metadata(18-82)tldw_Server_API/app/core/DB_Management/backends/postgresql_backend.py (3)
transaction(657-696)connection(202-207)backend_type(237-239)tldw_Server_API/app/core/DB_Management/backends/sqlite_backend.py (3)
transaction(237-262)connection(141-148)backend_type(180-182)
🪛 LanguageTool
Docs/Code_Documentation/Embeddings-Developer-Guide.md
[style] ~214-~214: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ... needed. - Streaming vs. generator: for very large files on disk use chunk_file_stream; ...
(EN_WEAK_ADJECTIVE)
Docs/Code_Documentation/Guides/Chunking_Code_Guide.md
[style] ~51-~51: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ck tree and flattening with ancestry. - chunk_text_hierarchical_flat(...) → convenien...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~52-~52: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...pper returning flat {text, metadata}. - chunk_file_stream(file_path, method=None, max...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~53-~53: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ons) → Generator[str] - Streaming for very large files; see “Streaming” below. AsyncChu...
(EN_WEAK_ADJECTIVE)
[style] ~78-~78: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...shion. - Prefer chunk_file_stream for very large files where you don’t want to load the ...
(EN_WEAK_ADJECTIVE)
[style] ~97-~97: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...lags": "im"}]}, }, ) Streaming a very large filepython ck = Chunker() for ch in...
(EN_WEAK_ADJECTIVE)
[uncategorized] ~141-~141: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ders (#), hrules, lists, code fences, markdown tables, blank lines, plus optional cust...
(MARKDOWN_NNP)
[style] ~194-~194: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...hing embeddings downstream and avoiding very large overlaps. - Limiting regex rules in t...
(EN_WEAK_ADJECTIVE)
🪛 markdownlint-cli2 (0.18.1)
Docs/Published/API-related/Chunking_API_Documentation.md
9-9: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
45-45: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
75-75: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
88-88: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
105-105: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
128-128: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Docs/Code_Documentation/Chunking-Module.md
89-89: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
134-134: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.4)
tldw_Server_API/app/core/TTS/tts_validation.py
450-450: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/core/DB_Management/Media_DB_v2.py
6534-6536: try-except-pass detected, consider logging the exception
(S110)
6534-6534: Do not catch blind exception: Exception
(BLE001)
6562-6562: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py
161-161: Unused method argument: format
(ARG002)
161-161: Unused method argument: channels
(ARG002)
164-164: Unused method argument: audio_data
(ARG002)
180-180: Unused lambda argument: target_dtype
(ARG005)
184-184: Unused function argument: text
(ARG001)
184-184: Unused function argument: voice
(ARG001)
184-184: Unused function argument: speed
(ARG001)
184-184: Unused function argument: lang
(ARG001)
tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py
172-172: Do not catch blind exception: Exception
(BLE001)
180-180: Do not catch blind exception: Exception
(BLE001)
601-601: Do not catch blind exception: Exception
(BLE001)
637-638: try-except-pass detected, consider logging the exception
(S110)
637-637: Do not catch blind exception: Exception
(BLE001)
669-669: Do not catch blind exception: Exception
(BLE001)
804-804: String contains ambiguous ‘ (LEFT SINGLE QUOTATION MARK). Did you mean ``` (GRAVE ACCENT)?
(RUF001)
804-804: String contains ambiguous ’ (RIGHT SINGLE QUOTATION MARK). Did you mean ``` (GRAVE ACCENT)?
(RUF001)
809-810: try-except-pass detected, consider logging the exception
(S110)
809-809: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/api/v1/endpoints/media.py
1443-1443: Do not catch blind exception: Exception
(BLE001)
1478-1480: try-except-pass detected, consider logging the exception
(S110)
1478-1478: Do not catch blind exception: Exception
(BLE001)
1550-1550: Do not catch blind exception: Exception
(BLE001)
1585-1586: try-except-pass detected, consider logging the exception
(S110)
1585-1585: Do not catch blind exception: Exception
(BLE001)
1755-1755: Do not catch blind exception: Exception
(BLE001)
1790-1791: try-except-pass detected, consider logging the exception
(S110)
1790-1790: Do not catch blind exception: Exception
(BLE001)
8737-8738: Abstract raise to an inner function
(TRY301)
8737-8738: Avoid specifying long messages outside the exception class
(TRY003)
8814-8814: Do not catch blind exception: Exception
(BLE001)
8832-8833: try-except-pass detected, consider logging the exception
(S110)
8832-8832: Do not catch blind exception: Exception
(BLE001)
8837-8837: Avoid specifying long messages outside the exception class
(TRY003)
8840-8840: Avoid specifying long messages outside the exception class
(TRY003)
8843-8843: Do not catch blind exception: Exception
(BLE001)
8846-8846: Do not catch blind exception: Exception
(BLE001)
8847-8847: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
8847-8847: Avoid specifying long messages outside the exception class
(TRY003)
8849-8849: Avoid specifying long messages outside the exception class
(TRY003)
8852-8852: Avoid specifying long messages outside the exception class
(TRY003)
8860-8861: try-except-pass detected, consider logging the exception
(S110)
8860-8860: Do not catch blind exception: Exception
(BLE001)
8877-8878: Abstract raise to an inner function
(TRY301)
8877-8878: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (24)
Docs/Code_Documentation/Embeddings-Developer-Guide.md (1)
205-215: All referenced code modules and functions exist in the codebase.Verification confirms:
Chunker.process_text✓ (tldw_Server_API/app/core/Chunking/chunker.py)DEFAULT_CHUNK_OPTIONS✓ (tldw_Server_API/app/core/Chunking/init.py)chunk_file_stream✓ (tldw_Server_API/app/core/Chunking/chunker.py)chunk_text_generator✓ (tldw_Server_API/app/core/Chunking/chunker.py)GET /api/v1/chunking/capabilities✓ (tldw_Server_API/app/api/v1/endpoints/chunking.py)Chunking_Code_Guide.md✓ (Docs/Code_Documentation/Guides/Chunking_Code_Guide.md)All documentation references are accurate and accessible.
tldw_Server_API/app/core/DB_Management/Media_DB_v2.py (1)
6589-6591: LGTM: normalized timestamps surfaced in responseExposing timestamps as a list of strings is a sensible, tolerant shape for clients.
Docs/Code_Documentation/index.md (1)
31-31: Link reference format matches existing patterns. The new entry integrating the Chunking Code Guide into the documentation index is properly formatted and positioned alongside related chunking documentation.Docs/Published/API-related/Endpoints_Feature_Map.md (1)
70-74: Verify table view completeness. The JSON map correctly adds the three/api/v1/chunkingendpoints; however, ensure the Table View section (lines 289+) has corresponding rows added for/api/v1/chunkingendpoints. A quick scan should confirm entries exist for POST /chunk_text, POST /chunk_file, and GET /capabilities.Docs/Code_Documentation/Chunking_Templates_Developer_Guide.md (1)
7-7: Cross-reference placement is well-positioned. The new line helpfully directs readers to the core Chunking code guide for implementation details, complementing this templates-focused guide.Docs/Published/API-related/index.md (1)
7-8: API index entry replacement is well-formatted. The transition from Audio Jobs API to Chunking API is clear and maintains consistent documentation structure. Note: Verify the Audio_Jobs_API.md file is not still needed by checking PR scope to confirm this is an intentional replacement rather than an accidental removal.Docs/Code_Documentation/Chunking-Module.md (1)
128-149: Hierarchical template safety documentation is strong. The boundary rule schema, regex safety guardrails (max 20 rules, 256-char limit), and configuration references provide good defensive guidance. This aligns well with the security-conscious approach needed for template-driven chunking.Docs/Code_Documentation/Guides/Chunking_Code_Guide.md (3)
1-20: Comprehensive developer guide with excellent scope and organization. The new guide clearly delineates module scope, provides a practical code map, and offers actionable examples. The emphasis on streaming behavior, hierarchical splitting, and security is well-placed for a chunking module that may process untrusted content.
51-73: Usage pattern examples are clear and practical. The progression from simple strategy calls through hierarchical and file streaming to end-to-end normalized processing gives developers a good mental model of complexity and when to choose each approach.
139-155: Streaming carry-forward overlap behavior is well-documented. The distinction between "carry-forward" (chunks not withheld) vs. "withholding" overlap is important for avoiding subtle bugs. This clarification will help developers avoid incorrect assumptions about overlap semantics.Docs/Published/API-related/Chunking_API_Documentation.md (4)
23-72: POST /chunk_text endpoint documentation is comprehensive. Request and response examples clearly show the metadata structure and option handling. The note about template support is important for users seeking more advanced capabilities.
100-125: GET /capabilities endpoint documentation provides helpful discovery. Documenting the method-specific options shape and noting that methods are a union of enum and runtime-registered strategies helps API consumers understand that availability may vary by deployment.
13-17: Authentication documentation clearly covers both modes. The distinction between single-user and multi-user authentication patterns is explicit and will help users correctly configure their requests.
170-175: Usage tips are practical and method-focused. The guidance on when to use each method (words for embedding windows, structure_aware for long docs, tokens for model alignment, code_mode routing) helps users make informed choices without having to read the full Chunking module documentation.tldw_Server_API/app/core/TTS/tts_providers_config.yaml (1)
39-41: Kokoro long-text pacing config looks consistent
pause_interval_words: 500andpause_tag: "[pause=1.1]"align withKokoroAdapterdefaults and the new tests; no issues from a config/behavior standpoint.tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py (2)
134-143: Config-driven pause interval coverage looks goodThis test correctly verifies that setting
pause_interval_wordsin the adapter config drives pause insertion frequency inpreprocess_text. It tightly couples to the default pause tag"[pause=1.1]", which is fine and intentional given the new Kokoro defaults.
144-152: extra_params-based pause interval is well coveredThis test ensures the adapter honors
pause_interval_wordswhen supplied via a nestedextra_paramsconfig block, matching the constructor’s lookup logic. The expectation of at least two pauses for five words at interval 2 aligns with_insert_pause_tagssemantics.tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py (5)
430-434: Kokoro supported_formats now aligned with validator and docsAdvertising
{MP3, WAV, OPUS}matchesTTSInputValidator.SUPPORTED_FORMATS["kokoro"]and the format preferences section. This keeps capability reporting consistent with validation and config.
523-526: Streaming path improvements (lazy writer, sync→async wrapping, safe finalization) look correct
- Deferring
StreamingAudioWritercreation until the first(samples_chunk, sr_chunk)allows you to honor the true stream sample rate.- The sync→async wrapper for ONNX (
_sync_source+_async_wrap) correctly handles non-async iterators without changing the external async interface.- Conditional finalization (
if writer is not None) and guardedwriter.close()avoid unnecessary work when no audio is produced and prevent cleanup errors from leaking.The logic reads clean and matches the new ONNX streaming test, with no obvious resource-leak or concurrency pitfalls.
Also applies to: 531-550, 597-608, 621-627, 634-638
571-573: Language code normalization for PyTorch pipelinesUsing
lang_code = lang.split('-')[0] if lang else 'en'before building/looking upKPipelineinstances is a sensible normalization step given_get_language_from_voicereturns tags likeen-us/en-gb. This should reduce duplicate pipelines while keeping behavior predictable.
666-676: Dynamic voice passthrough before mapping generic IDs is a good fixChecking both static
VOICESand dynamically loaded voice IDs before callingmap_voiceensures that custom/dynamic Kokoro voices aren’t accidentally remapped to a generic preset. This aligns with the dynamic voices loader and makes mixed deployments safer.
778-789: Historicalchild -> af_nicolemapping is clearly documentedThe explicit comment that
"af_nicole"isn’t in staticVOICESbut kept for compatibility is helpful. As long as the underlying Kokoro assets or dynamic voices provide that ID in the environments where it’s used, this mapping is fine.tldw_Server_API/app/api/v1/endpoints/media.py (2)
116-118: Good: centralized, test-aware rate limit constantModule-scope _SEARCH_RATE_LIMIT and decorator usage are clean and fix the earlier decorator ordering issue.
1308-1318: Good: canonical identifier normalizationConsistent lowercasing/order for doi/pmid/pmcid/arxiv_id/s2_paper_id improves search correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tldw_Server_API/app/api/v1/API_Deps/auth_deps.py (1)
779-808: Argument order bug inget_optional_current_useris confirmed — fix required before function is usedThe bug identified in the review is confirmed. The
get_current_userfunction signature expects(request, response, credentials, session_manager, db_pool, ...)butget_optional_current_usercalls it with(request, credentials, jwt_service, session_manager, db_pool). This misalignment will cause runtime failures when the function is invoked—specifically,credentials.credentialswill fail becausecredentialsreceives aJWTServiceinstance instead ofHTTPAuthorizationCredentials.While the function is not currently used in active code paths (searches found only its definition and documentation examples), it is documented as a public dependency for optional authentication endpoints. The proposed fix correctly aligns the function signatures by adding the missing
responseparameter and removing the unusedjwt_servicedependency.
♻️ Duplicate comments (1)
tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py (1)
76-78: Capabilities test matches enlarged Kokoro max_text_length, but could use canonical sourceAsserting
caps.max_text_length >= 100000correctly validates that Kokoro advertises a very large limit relative to typical providers and matches the adapter’s 1_000_000 sentinel.To avoid future drift, consider asserting against the canonical limit source (e.g.,
ProviderLimits.get_max_text_length("kokoro")) or haveKokoroAdapter.get_capabilities()derive its value from there so tests and limits share one source of truth.
🧹 Nitpick comments (12)
Docs/Design/Text2SQL.md (1)
14-14: Consider wrapping bare URLs in angle brackets for markdown linting compliance.The new URL on line 14 follows the existing pattern in the "Link Dump" section, where all URLs are bare. However, markdownlint flags this as a violation of MD034 (no-bare-urls). For consistent markdown formatting across the documentation, consider wrapping URLs in angle brackets.
If reformatting the section, apply this pattern to all URLs:
-https://arxiv.org/abs/2511.08245 +<https://arxiv.org/abs/2511.08245>Alternatively, to keep consistency with the existing style across the file, you could suppress this linting rule for this section in a
.markdownlintrcconfig.Docs/Design/RAG-Upgrades-PRD.md (1)
1-223: Well-structured PRD with comprehensive phased approach.The document effectively outlines the RAG upgrades roadmap with clear phases, functional requirements, acceptance criteria, and test plan. The emphasis on backward compatibility, feature flags, and graceful degradation is appropriate for a production system. Section 16 (Test Plan) ties the PRD to E2E testing coverage, making this a solid supporting document for the PR's testing improvements.
Consider cross-referencing specific API endpoints, config flags, and test scripts in follow-up PRs once implementation begins, to keep this document aligned with actual code changes.
tldw_Server_API/app/api/v1/API_Deps/auth_deps.py (1)
896-919: Verified: extract shared rate-limit bypass helper to eliminate duplicationBoth
check_rate_limit(lines 839–858) andcheck_auth_rate_limit(lines 904–919) implement identical TEST_MODE and single-user API-key/bearer-token bypass logic. Extracting this into a shared helper likemaybe_bypass_rate_limit(request) -> boolwill reduce maintenance risk and keep the two functions in sync. Additionally, both currently use bareexcept Exception: pass; adding a debug log statement would help surface configuration issues without changing control flow.tldw_Server_API/tests/e2e/fixtures.py (2)
430-468: Character JSON import path looks correct and preserves legacy fieldsSwitching dict-based imports to
POST /api/v1/characters/with a mapped payload (including legacy keys likefirst_mes,mes_example,avatar) is a solid improvement and avoids abusing the file-import endpoint. The_extract_image_base64helper correctly strips data URI prefixes when present.One minor improvement you could consider later is pruning
Nonefields frompayloadbefore sending to reduce 422 noise if the schema tightens, but the current approach is functionally sound.
882-930: Resource cleanup is safe but swallows per-item errors silentlyThe new
cleanup_resourcesplus the updateddata_trackerfixture give you automatic teardown of prompts, notes, chats, characters, and media, which is great for keeping the test environment clean. Right now, each per-item deletion wrapsExceptionand simplypasses, so individual cleanup failures are completely silent.Given this runs in teardown and you already guard against teardown failures at the fixture level, consider at least logging a short warning per failure (resource type + ID) so intermittent cleanup problems can be diagnosed without impacting test outcomes.
Also applies to: 933-948
tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py (7)
165-181: Consider more specific exception handling.The blind
Exceptioncatch provides safe fallback but could be more specific (e.g.,(ValueError, TypeError, KeyError)) to avoid masking unexpected errors during initialization.Apply this diff if you want to be more specific:
- except Exception: + except (ValueError, TypeError, KeyError, AttributeError): self.pause_interval_words = 500(Similarly for the pause_tag block)
597-608: Consider more specific exception handling for sample rate conversion.The blind
Exceptioncatch when convertingsr_chunkto int could mask unexpected errors. Consider catching specific exceptions like(ValueError, TypeError).- except Exception: + except (ValueError, TypeError): effective_sr = self.sample_rate
634-638: Consider logging cleanup exceptions for debugging.While silently catching exceptions in cleanup code prevents cascading errors, logging them would help diagnose resource cleanup issues during debugging.
finally: try: if writer is not None: writer.close() - except Exception: - pass + except Exception as e: + logger.debug(f"{self.provider_name}: Error closing writer during cleanup: {e}")
666-677: Consider more specific exception handling for dynamic voices.The blind
Exceptioncatch when buildingdynamic_idscould mask issues with the_dynamic_voicesattribute. Consider catchingAttributeErrorspecifically since that's the most likely error if the attribute is missing.- except Exception: + except (AttributeError, TypeError): dynamic_ids = set()
679-751: Consider logging dynamic voice loading failures for diagnostics.The method silently handles multiple exception paths (lines 719-721, 747-751) which could make it difficult to diagnose voice loading issues. While graceful degradation is appropriate, logging warnings would help users understand why certain voices weren't loaded.
Example for the directory parsing fallback:
except Exception: # Fall back to JSON parsing - pass + logger.debug(f"{self.provider_name}: Failed to parse voices directory, trying JSON format")And for the final JSON parsing:
except Exception: - return + logger.debug(f"{self.provider_name}: Failed to load dynamic voices from JSON") + return
802-810: Consider logging pause tag insertion failures.While silently handling pause tag insertion failures is acceptable (it's a non-critical feature), logging would help diagnose unexpected behavior with long texts.
try: text = self._insert_pause_tags(text, words_between=self.pause_interval_words, pause_tag=self.pause_tag) - except Exception: - pass + except Exception as e: + logger.debug(f"{self.provider_name}: Failed to insert pause tags: {e}")Note: The static analysis warning about ambiguous quotes on line 804 is a false positive - the code intentionally normalizes various quote characters to standard ASCII quotes.
814-838: Consider adding input validation for edge cases.The method doesn't validate edge cases that could cause issues:
- Negative or zero
words_between: Could cause infinite recursion or division errors- Empty
pause_tag: Would still process but with no effect- Recursive depth: Text with many existing pause tags could cause deep recursion
Add validation at the start:
def _insert_pause_tags(self, text: str, words_between: int = 500, pause_tag: str = '[pause=1.1]') -> str: """Ensure a pause tag appears at least every N words. - Splits on whitespace and inserts a pause marker every `words_between` tokens. - Respects existing pause markers by splitting and processing each section independently. """ + # Validate inputs + if words_between <= 0: + return text + if not pause_tag or not pause_tag.strip(): + return text + # If already contains pause tags, process sections separately so spacing is preserved if pause_tag in text:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
Docs/Design/RAG-Upgrades-PRD.md(1 hunks)Docs/Design/Text2SQL.md(1 hunks)tldw_Server_API/app/api/v1/API_Deps/auth_deps.py(2 hunks)tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py(9 hunks)tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py(3 hunks)tldw_Server_API/tests/TTS/test_tts_validation.py(1 hunks)tldw_Server_API/tests/e2e/fixtures.py(6 hunks)tldw_Server_API/tests/e2e/test_negative_scenarios.py(3 hunks)tldw_Server_API/tests/e2e/test_smoke_user_journeys.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tldw_Server_API/tests/TTS/test_tts_validation.py (1)
tldw_Server_API/app/core/TTS/tts_validation.py (2)
ProviderLimits(34-129)get_limits(97-107)
tldw_Server_API/app/api/v1/API_Deps/auth_deps.py (1)
tldw_Server_API/app/core/AuthNZ/settings.py (2)
is_single_user_mode(911-913)get_settings(844-883)
tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py (1)
tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py (3)
preprocess_text(793-812)_stream_audio_kokoro(507-638)get_capabilities(422-449)
tldw_Server_API/tests/e2e/test_negative_scenarios.py (3)
tldw_Server_API/tests/e2e/fixtures.py (2)
authenticated_client(744-783)health_check(599-603)tldw_Server_API/app/api/v1/endpoints/evaluations_unified.py (1)
health_check(990-1007)tldw_Server_API/app/main.py (1)
health_check(3517-3543)
tldw_Server_API/tests/e2e/fixtures.py (3)
tldw_Server_API/tests/Chat/integration/test_chat_completions_integration.py (1)
client(108-122)tldw_Server_API/tests/Chat/integration/test_chat_endpoint.py (1)
client(88-106)tldw_Server_API/app/api/v1/endpoints/notes.py (1)
delete_note(572-602)
tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py (3)
tldw_Server_API/app/core/TTS/adapters/base.py (2)
AudioFormat(30-40)close(405-414)tldw_Server_API/app/core/TTS/streaming_audio_writer.py (1)
StreamingAudioWriter(21-160)tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py (2)
write_chunk(164-165)close(166-167)
🪛 LanguageTool
Docs/Design/RAG-Upgrades-PRD.md
[grammar] ~51-~51: Use a hyphen to join words.
Context: ...ble enable_intent_routing=true to auto adjust hybrid_alpha for lexical vs sem...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
Docs/Design/Text2SQL.md
14-14: Bare URL used
(MD034, no-bare-urls)
Docs/Design/RAG-Upgrades-PRD.md
65-65: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
67-67: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
69-69: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
73-73: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
75-75: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
79-79: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
🪛 Ruff (0.14.4)
tldw_Server_API/app/api/v1/API_Deps/auth_deps.py
836-837: try-except-pass detected, consider logging the exception
(S110)
836-836: Do not catch blind exception: Exception
(BLE001)
854-856: try-except-pass detected, consider logging the exception
(S110)
854-854: Do not catch blind exception: Exception
(BLE001)
901-902: try-except-pass detected, consider logging the exception
(S110)
901-901: Do not catch blind exception: Exception
(BLE001)
917-918: try-except-pass detected, consider logging the exception
(S110)
917-917: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py
161-161: Unused method argument: format
(ARG002)
161-161: Unused method argument: channels
(ARG002)
164-164: Unused method argument: audio_data
(ARG002)
180-180: Unused lambda argument: target_dtype
(ARG005)
184-184: Unused function argument: text
(ARG001)
184-184: Unused function argument: voice
(ARG001)
184-184: Unused function argument: speed
(ARG001)
184-184: Unused function argument: lang
(ARG001)
tldw_Server_API/tests/e2e/fixtures.py
900-901: try-except-pass detected, consider logging the exception
(S110)
900-900: Do not catch blind exception: Exception
(BLE001)
907-908: try-except-pass detected, consider logging the exception
(S110)
907-907: Do not catch blind exception: Exception
(BLE001)
914-915: try-except-pass detected, consider logging the exception
(S110)
914-914: Do not catch blind exception: Exception
(BLE001)
921-922: try-except-pass detected, consider logging the exception
(S110)
921-921: Do not catch blind exception: Exception
(BLE001)
928-929: try-except-pass detected, consider logging the exception
(S110)
928-928: Do not catch blind exception: Exception
(BLE001)
945-945: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py
172-172: Do not catch blind exception: Exception
(BLE001)
180-180: Do not catch blind exception: Exception
(BLE001)
601-601: Do not catch blind exception: Exception
(BLE001)
637-638: try-except-pass detected, consider logging the exception
(S110)
637-637: Do not catch blind exception: Exception
(BLE001)
669-669: Do not catch blind exception: Exception
(BLE001)
804-804: String contains ambiguous ‘ (LEFT SINGLE QUOTATION MARK). Did you mean ``` (GRAVE ACCENT)?
(RUF001)
804-804: String contains ambiguous ’ (RIGHT SINGLE QUOTATION MARK). Did you mean ``` (GRAVE ACCENT)?
(RUF001)
809-810: try-except-pass detected, consider logging the exception
(S110)
809-809: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (16)
tldw_Server_API/tests/TTS/test_tts_validation.py (1)
226-229: Kokoro max_text_length assertion now matches enlarged limitsAllowing any large sentinel via
>= 100000aligns the test with the new provider limits semantics and avoids over-constraining the exact value. Looks good.tldw_Server_API/tests/e2e/test_negative_scenarios.py (1)
411-414: Broader health status set keeps tests stable across health implementationsRelaxing the health assertions to accept
"healthy","ok", or"degraded"is reasonable given multiple health endpoints and transient degraded states, while still catching outright failures. Change is consistent across all negative-scenario checks.Also applies to: 531-534, 594-595
tldw_Server_API/tests/e2e/test_smoke_user_journeys.py (1)
76-85: RAG response normalization in smoke test is robust and backward compatiblePreferring
documentsand falling back toresults/items, with a final list-type assertion, matches the unified RAG contract while staying compatible with older shapes. Nicely loosens coupling to exact payload structure.tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py (3)
134-153: Pause insertion interval tests align with adapter configuration behaviorThese two tests exercising
pause_interval_wordsfrom direct config vs nestedextra_paramsmirror the adapter’s configuration logic and validate that default[pause=1.1]tags are inserted at the expected word intervals. Good, focused coverage of pacing behavior.
154-199: ONNX streaming test nicely validates writer creation and sample-rate propagationThe ONNX-path test stub (FakeWriter + sync generator) accurately checks that
_stream_audio_kokoro:
- Wraps a sync iterator into an async stream,
- Creates the writer only once, and
- Honors the per-chunk sample rate (
12345).This gives strong behavioral coverage of a tricky streaming path without coupling to the real writer implementation.
274-287: Text-length behavior test correctly reflects pause-based pacing semanticsShifting the test to:
- Assert pause-tag insertion for >500 words, and
- Ensure
caps.max_text_lengthcomfortably exceeds the processed text length,matches the design where Kokoro uses pacing via pauses rather than a strict hard cap. This keeps the test meaningful while allowing very large inputs.
tldw_Server_API/tests/e2e/fixtures.py (3)
505-510: delete_chat helper is simple and consistent with other delete methodsThe new
delete_chatwrapper mirrors the pattern used for other delete helpers and tolerates both 200 and 204, which matches typical REST behavior for soft deletes.
554-563: RAG response normalization improves backward compatibilityThe normalization in
rag_simple_searchandrag_advanced_search—derivingdocsfromdocuments/results/items, then ensuring bothresultsanddocumentskeys exist and synthesizing asuccessflag—gives callers a stable shape across legacy and unified RAG implementations. This should noticeably reduce brittleness in e2e tests that assert on these fields.Also applies to: 587-596
681-683: ensure_server_running now guarantees auth_mode in health dataDefaulting
auth_modewhen the health payload omits it keeps downstream fixtures (likeapi_clientandauthenticated_client) from having to special-case missing auth metadata, while still preferring real values from the server when present.tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py (7)
430-434: LGTM! Format support expanded correctly.The addition of OPUS format aligns with the validator requirements and the AudioFormat enum definition.
524-525: LGTM! Deferred writer creation improves robustness.Deferring the StreamingAudioWriter initialization until the first chunk allows the effective sample rate to be determined dynamically, which is a good practice for handling variable source formats.
531-549: Verify async wrapper doesn't block the event loop.The
_async_wrap()function wraps synchronous iteration but doesn't yield control to the event loop between items. Ifbase_iterperforms blocking I/O or heavy computation, this could still block the event loop despite being an async generator.Consider adding
await asyncio.sleep(0)in the wrapper to yield control:async def _async_wrap(): for item in _sync_source(): + await asyncio.sleep(0) # Yield control to event loop yield itemHowever, verify whether
create_stream()is already non-blocking before making changes.
571-573: LGTM! Language code extraction is correct.The primary language tag extraction properly handles language codes like "en-us" → "en" and provides a sensible default.
622-627: LGTM! Finalization properly guarded.The check ensures finalization only occurs when a writer was created, correctly handling the edge case of empty or no audio chunks.
435-435: Add test coverage validating max_text_length=1,000,000 performance and memory safety.While pause insertion (every 500 words) and streaming support provide mitigations for handling long text, there is no explicit test validating that the full 1,000,000 character limit works correctly under realistic conditions. The existing test (
test_kokoro_adapter_mock.py) only asserts thatmax_text_length >= 100000, not that the system handles 1M characters without memory or latency issues. Add a test that generates audio with text near the 1M limit to confirm memory consumption and generation latency remain acceptable.
787-788: No action required — the mapping is correct and handles voice IDs appropriately.The
"af_nicole"voice ID is valid for Kokoro TTS and is supported by the Kokoro library directly, even though it's not included in the adapter's staticVOICESdictionary. The static dictionary serves as documentation and for voice discovery, not as validation. When the voice ID is passed tokokoro_instance.create_stream(), Kokoro handles it correctly. The comment accurately documents that this voice is not in the static list but is retained for backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tldw_Server_API/app/core/RAG/rag_service/unified_pipeline.py (1)
1188-1216: LGTM! Guard addition prevents iteration errors.The addition of the
if result.documents:guard on line 1198 is a good defensive improvement that prevents attempting to iterate over empty or None document lists. The metadata emission on line 1216 correctly remains outside this guard, ensuring that numeric intent detection is recorded even when no documents are retrieved, which aligns with test expectations.The logic is sound:
- The guard prevents iteration errors
- The
affectedcounter correctly starts at 0- Metadata is emitted for observability regardless of document presence
Optional: Consider logging the exception on line 1213.
The bare exception handling on line 1213 is defensively correct (it prevents one document's processing error from breaking the entire boost operation), but adding a debug-level log would improve debuggability:
except Exception: - continue + logger.debug(f"Failed to apply numeric/table boost to document {getattr(d, 'id', 'unknown')}") + continueThis is a minor enhancement and not required for correctness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tldw_Server_API/app/core/RAG/rag_service/unified_pipeline.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tldw_Server_API/app/core/RAG/rag_service/unified_pipeline.py (3)
tldw_Server_API/app/core/RAG/rag_service/vector_stores/pgvector_adapter.py (1)
search(638-691)tldw_Server_API/tests/RAG_NEW/unit/test_unified_pipeline_focused.py (1)
get(243-244)tldw_Server_API/app/core/RAG/rag_service/agentic_chunker.py (1)
looks_table(515-521)
🪛 Ruff (0.14.4)
tldw_Server_API/app/core/RAG/rag_service/unified_pipeline.py
1194-1194: Do not catch blind exception: Exception
(BLE001)
1213-1214: try-except-continue detected, consider logging the exception
(S112)
1213-1213: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tldw_Server_API/app/api/v1/endpoints/health.py (1)
141-141: Fix TEST_MODE truthy parsing to prevent unintended key exposure.The current check
os.getenv("TEST_MODE")is truthy for any non-empty string, including"false","0","no", etc. This could inadvertently expose the test API key in environments whereTEST_MODEis explicitly set to a falsy value.Apply this diff to align with the truthy parsing used elsewhere in the codebase (see
tldw_Server_API/app/core/AuthNZ/settings.pylines 875-877):if ( - os.getenv("TEST_MODE") + str(os.getenv("TEST_MODE", "")).strip().lower() in {"1", "true", "yes", "on"} and body["auth_mode"] == "single_user" and str(os.getenv("HEALTH_EXPOSE_TEST_API_KEY", "")).strip().lower() == "true" ):
🧹 Nitpick comments (7)
tldw_Server_API/tests/_plugins/media_fixtures.py (1)
49-53: Consider sorting__all__alphabetically for consistency.The export list could be alphabetically sorted to align with the RUF022 style guideline.
Apply this diff if you'd like to sort alphabetically:
__all__ = [ + "test_audio_file", - "test_text_file", "test_pdf_file", - "test_audio_file", + "test_text_file", ]tldw_Server_API/app/api/v1/endpoints/health.py (1)
148-150: Add debug logging to the exception handler for troubleshooting.Silent exception handling makes debugging difficult when settings enrichment fails. While the health endpoint should remain resilient, logging the exception at debug level would aid troubleshooting without impacting reliability.
Apply this diff to add debug logging:
- except Exception: + except Exception as exc: # Never fail health on settings import issues - pass + logger.debug(f"/health settings enrichment skipped: {exc}")tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py (2)
135-153: Consider usingadapter.pause_tagfor better maintainability.Both pause insertion tests hardcode
'[pause=1.1]'in the assertions (lines 143, 153). If the default pause tag changes, these tests will break unexpectedly.Apply this diff to make the tests more resilient:
async def test_pause_insertion_interval_from_config(self): """Pause tags should be inserted based on configurable interval""" adapter = KokoroAdapter({ "pause_interval_words": 3 }) text = "one two three four five six seven" processed = adapter.preprocess_text(text) # Expect pause after words 3 and 6 - assert processed.count('[pause=1.1]') >= 2 + assert processed.count(adapter.pause_tag) >= 2 async def test_pause_insertion_interval_from_extra_params(self): """Pause tags should respect nested extra_params configuration""" adapter = KokoroAdapter({ "extra_params": {"pause_interval_words": 2} }) text = "a b c d e" processed = adapter.preprocess_text(text) # Expect at least two pauses for 5 words with interval 2 - assert processed.count('[pause=1.1]') >= 2 + assert processed.count(adapter.pause_tag) >= 2
155-199: Well-structured test; consider underscore prefixes for unused stub parameters.This test comprehensively validates that the ONNX sync iterator is wrapped to async and that the sample rate is correctly propagated to the
StreamingAudioWriter. The test setup is clean and effective.Static analysis flags unused parameters in the
FakeWriterstub andsync_streamgenerator. While these are intentional interface matches, you can silence the warnings using underscore prefixes:class FakeWriter: constructed = 0 last_sr = None - def __init__(self, format: str, sample_rate: int, channels: int): + def __init__(self, _format: str, sample_rate: int, _channels: int): FakeWriter.constructed += 1 FakeWriter.last_sr = sample_rate - def write_chunk(self, audio_data=None, finalize: bool=False): + def write_chunk(self, _audio_data=None, finalize: bool=False): return b'D' if not finalize else b'F' def close(self): pass # ... # Sync generator that yields two chunks with a custom sample rate -def sync_stream(text, voice, speed, lang): +def sync_stream(_text, _voice, _speed, _lang): yield (np.array([0.0, 0.1], dtype=np.float32), 12345) yield (np.array([0.2, -0.2], dtype=np.float32), 12345)Alternatively:
adapter.audio_normalizer = type('N', (), { - 'normalize': staticmethod(lambda x, target_dtype: (np.clip(x, -1.0, 1.0) * 32767).astype(np.int16)) + 'normalize': staticmethod(lambda x, _target_dtype: (np.clip(x, -1.0, 1.0) * 32767).astype(np.int16)) })()Docs/Development/Testing.md (2)
26-33: Add language specifier to code block.The fenced code block should specify
pythonas the language for proper syntax highlighting and linting compliance.Apply this diff:
-``` +```python pytest_plugins = [ "tldw_Server_API.tests._plugins.e2e_fixtures", "tldw_Server_API.tests._plugins.e2e_state_fixtures", "tldw_Server_API.tests._plugins.chat_fixtures", "tldw_Server_API.tests._plugins.media_fixtures", ]--- `78-80`: **Add language specifier to code block.** The fenced code block should specify `bash` or `shell` as the language for proper syntax highlighting and linting compliance. Apply this diff: ```diff -``` +```bash pytest --fixtures -q tldw_Server_API/tests/e2e | rg -q '^test_user_credentials$'</blockquote></details> <details> <summary>tldw_Server_API/tests/_plugins/e2e_state_fixtures.py (1)</summary><blockquote> `178-183`: **Consider sorting `__all__` alphabetically (optional).** The `__all__` list is not sorted alphabetically. While the current order is logical, sorting alphabetically helps with maintainability and satisfies the Ruff RUF022 check. Apply this diff: ```diff __all__ = [ + "ensure_embeddings", + "shared_media_state", "test_results", - "shared_media_state", "test_workflow_state", - "ensure_embeddings", ]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
Docs/Development/Testing.md(1 hunks)tldw_Server_API/app/api/v1/endpoints/health.py(1 hunks)tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py(4 hunks)tldw_Server_API/tests/_plugins/e2e_state_fixtures.py(1 hunks)tldw_Server_API/tests/_plugins/media_fixtures.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py (2)
tldw_Server_API/app/core/TTS/tts_validation.py (2)
ProviderLimits(34-129)get_max_text_length(110-113)tldw_Server_API/app/core/TTS/adapters/kokoro_adapter.py (3)
KokoroAdapter(43-908)preprocess_text(793-812)_stream_audio_kokoro(507-638)
tldw_Server_API/tests/_plugins/media_fixtures.py (1)
tldw_Server_API/tests/e2e/fixtures.py (3)
create_test_file(786-790)create_test_pdf(793-799)cleanup_test_file(829-834)
tldw_Server_API/app/api/v1/endpoints/health.py (1)
tldw_Server_API/app/core/AuthNZ/settings.py (1)
get_settings(844-883)
🪛 LanguageTool
Docs/Development/Testing.md
[grammar] ~70-~70: Ensure spelling is correct
Context: ...surface explicit. - Prefer composition: re‑use helpers from `tldw_Server_API.tests.e2e...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
Docs/Development/Testing.md
26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
78-78: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.4)
tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py
162-162: Unused method argument: format
(ARG002)
162-162: Unused method argument: channels
(ARG002)
165-165: Unused method argument: audio_data
(ARG002)
181-181: Unused lambda argument: target_dtype
(ARG005)
185-185: Unused function argument: text
(ARG001)
185-185: Unused function argument: voice
(ARG001)
185-185: Unused function argument: speed
(ARG001)
185-185: Unused function argument: lang
(ARG001)
tldw_Server_API/tests/_plugins/media_fixtures.py
26-26: Do not catch blind exception: Exception
(BLE001)
49-53: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
tldw_Server_API/tests/_plugins/e2e_state_fixtures.py
30-30: Do not catch blind exception: Exception
(BLE001)
35-36: try-except-pass detected, consider logging the exception
(S110)
35-35: Do not catch blind exception: Exception
(BLE001)
125-125: Do not catch blind exception: Exception
(BLE001)
178-183: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
tldw_Server_API/app/api/v1/endpoints/health.py
148-150: try-except-pass detected, consider logging the exception
(S110)
148-148: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (10)
tldw_Server_API/tests/_plugins/media_fixtures.py (3)
1-17: LGTM! Clean plugin structure with appropriate imports.The module docstring clearly explains the purpose, and the imports are well-organized. Importing the test asset creators from the e2e fixtures module promotes code reuse.
20-28: LGTM! Defensive cleanup pattern is appropriate here.The broad exception catch in the cleanup finally block is intentional defensive coding. Cleanup failures are logged visibly but don't propagate, which prevents test result masking. This is a reasonable tradeoff for test fixtures where cleanup is best-effort.
31-46: LGTM! Generator fixture pattern is correct.The fixtures now properly use
yield from _yield_file(...), which ensures pytest manages the setup/teardown lifecycle correctly. Tests receive a string path (not a generator object), and cleanup executes reliably. The consistent pattern across all three fixtures is well-structured.tldw_Server_API/tests/TTS/adapters/test_kokoro_adapter_mock.py (2)
77-78: Excellent fix—now synced with the canonical limit source.This change directly addresses the previous review feedback by validating
max_text_lengthagainstProviderLimits.get_max_text_length("kokoro")instead of a hardcoded value. The test now reflects the same source of truth used throughout the codebase.
274-287: LGTM—test correctly validates pause-based pacing.The refactored test accurately reflects Kokoro's new behavior: no hard maximum length, but pacing enforced via periodic pause insertion. The test generates 550 words (exceeding the default 500-word interval) and verifies at least one pause tag is inserted.
Minor note: The assertion at line 287 (
caps.max_text_length >= len(processed)) will always pass given Kokoro's 1,000,000 character limit. The substantive validation is the pause insertion check at line 283, which correctly verifies the pacing behavior.tldw_Server_API/tests/_plugins/e2e_state_fixtures.py (5)
13-37: LGTM! Test results fixture properly wired.The session-scoped fixture correctly creates the results dictionary and wires it to the session config for access in hooks. The defensive exception handling ensures compatibility across different pytest versions and configurations.
40-49: LGTM! Clean shared state structure.The session-scoped state dictionary uses appropriate data structures (sets for embeddings tracking, dicts for resource mappings) and provides a clear contract for shared test state.
52-99: LGTM! Well-designed workflow state management.The
WorkflowStateclass provides a clean API for managing per-class test state while properly referencing session-wide shared resources. The separation between local state (media_items) and shared state (_shared) is well-designed, and the helper methods are intuitive.
102-132: LGTM! Embeddings fixture with appropriate error handling.The async helper function correctly checks for existing embeddings before attempting generation. The broad exception handling is well-justified by the comment explaining that tests should continue and record failures rather than crash the session in diverse environments.
135-175: LGTM! Hooks properly collect results and generate summary.The pytest hooks are correctly implemented:
pytest_runtest_makereportuses the hookwrapper pattern and properly extracts test outcomes into the results dict.pytest_sessionfinishgenerates a clear, user-friendly summary with visual indicators.Both hooks properly handle the unused parameters with explicit
delstatements as noted in previous reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tldw_Server_API/app/api/v1/endpoints/media.py (5)
716-716: Blocking SyntaxError: stray top-level ‘return’.There’s a bare “return None” at module scope. This will fail import.
Apply this diff to remove it:
- return None # Cache miss
178-188: Duplicate GET “/” routes cause ambiguity.Both list_media_endpoint (“” and “/”) and list_all_media (“/”) register the same path/method. This leads to ambiguous routing and confusing OpenAPI.
Options:
- Keep list_all_media as canonical; delete/rename the earlier two decorators or move them to a distinct path (e.g., “/legacy-list”), or set include_in_schema=False and change their path to avoid collisions.
- Alternatively, have list_media_endpoint call list_all_media and expose only one GET “/”.
Also applies to: 2238-2248
4949-4960: Wrong field name for embeddings chunk overlap.Uses form_data.overlap which doesn’t exist; should be form_data.chunk_overlap.
Apply:
- chunk_overlap=form_data.overlap or 200 + chunk_overlap=form_data.chunk_overlap or 200
8696-8703: Annotation bug: Path not imported; will raise at import time.Function annotates target_dir: Path but only FilePath is imported; elsewhere the file also uses Path in annotations.
Prefer aligning with the alias here:
- target_dir: Path, + target_dir: FilePath,Additionally, add a module import to cover other annotations using Path (outside this hunk):
+from pathlib import Path # for annotations elsewhere that reference PathOr add “from future import annotations” at the top to postpone evaluation.
8708-8736: Add SSRF guard inside downloader.Centralize URL safety by validating here; callers already rely on per-item failures for 207 behavior.
Apply:
if allowed_extensions is None: allowed_extensions = set() # Default to empty set if None - + # SSRF guard (enforces per-item failure rather than aborting a batch) + assert_url_safe(url) # Generate a safe filename (defer final naming until after we see headers) test_mode_active = _is_test_mode() or bool(os.getenv("PYTEST_CURRENT_TEST"))This also addresses the code scanning SSRF warnings.
♻️ Duplicate comments (2)
tldw_Server_API/app/api/v1/endpoints/media.py (2)
1772-1793: Same swallowed-exception pattern in advanced upsert path.Mirror the narrowed/logged handling to avoid hiding real errors.
Apply:
# Keep identifier index updated for metadata-search using backend-specific upsert try: _doi = merged_sm.get('doi') if isinstance(merged_sm, dict) else None _pmid = merged_sm.get('pmid') if isinstance(merged_sm, dict) else None _pmcid = merged_sm.get('pmcid') if isinstance(merged_sm, dict) else None _arxiv = (merged_sm.get('arxiv_id') if isinstance(merged_sm, dict) else None) _s2id = (merged_sm.get('s2_paper_id') if isinstance(merged_sm, dict) else None) if db.backend_type == BackendType.POSTGRESQL: ident_sql = ( "INSERT INTO DocumentVersionIdentifiers (dv_id, doi, pmid, pmcid, arxiv_id, s2_paper_id) " "VALUES (?, ?, ?, ?, ?, ?) " "ON CONFLICT (dv_id) DO UPDATE SET " "doi = EXCLUDED.doi, pmid = EXCLUDED.pmid, pmcid = EXCLUDED.pmcid, " "arxiv_id = EXCLUDED.arxiv_id, s2_paper_id = EXCLUDED.s2_paper_id" ) else: ident_sql = ( "INSERT OR REPLACE INTO DocumentVersionIdentifiers (dv_id, doi, pmid, pmcid, arxiv_id, s2_paper_id) " "VALUES (?, ?, ?, ?, ?, ?)" ) db.execute_query(ident_sql, (dv_id, _doi, _pmid, _pmcid, _arxiv, _s2id), connection=conn) - except Exception: - pass + except (sqlite3.OperationalError, DatabaseError) as e: + logger.debug("Identifier index update skipped (missing table/unsupported upsert): %s", e) + except Exception as e: + logger.error("Identifier index update failed for dv_id=%s: %s", dv_id, e, exc_info=True) + raiseAs per static analysis hints (BLE001/S110).
1548-1587: Don’t swallow exceptions when syncing identifier index.The bare “except Exception: pass” hides real failures and hurts diagnosability. Align with the narrowed handling you used in patch_metadata.
Apply:
# Sync identifier index for this version using backend-specific upsert try: _doi = new_meta.get('doi') or new_meta.get('DOI') _pmid = new_meta.get('pmid') or new_meta.get('PMID') _pmcid = new_meta.get('pmcid') or new_meta.get('PMCID') _arxiv = new_meta.get('arxiv_id') or new_meta.get('arxiv') or new_meta.get('ArXiv') _s2id = new_meta.get('s2_paper_id') or new_meta.get('paperId') if db.backend_type == BackendType.POSTGRESQL: ident_sql = ( "INSERT INTO DocumentVersionIdentifiers (dv_id, doi, pmid, pmcid, arxiv_id, s2_paper_id) " "VALUES (?, ?, ?, ?, ?, ?) " "ON CONFLICT (dv_id) DO UPDATE SET " "doi = EXCLUDED.doi, pmid = EXCLUDED.pmid, pmcid = EXCLUDED.pmcid, " "arxiv_id = EXCLUDED.arxiv_id, s2_paper_id = EXCLUDED.s2_paper_id" ) else: ident_sql = ( "INSERT OR REPLACE INTO DocumentVersionIdentifiers (dv_id, doi, pmid, pmcid, arxiv_id, s2_paper_id) " "VALUES (?, ?, ?, ?, ?, ?)" ) db.execute_query(ident_sql, (dv_id, _doi, _pmid, _pmcid, _arxiv, _s2id), connection=conn) - except Exception: - pass + except (sqlite3.OperationalError, DatabaseError) as e: + logger.debug("Identifier index update skipped (missing table/unsupported upsert): %s", e) + except Exception as e: + logger.error("Identifier index update failed for dv_id=%s: %s", dv_id, e, exc_info=True) + raiseAs per static analysis hints (BLE001/S110).
🧹 Nitpick comments (6)
Docs/RAG/RAG_Notes.md (1)
18-23: Consider wrapping bare URLs in angle brackets for Markdown compliance.The six new URLs follow the existing document style but violate Markdown linting rules (MD034). For better compliance and maintainability, wrap URLs in angle brackets:
<url>or format as Markdown links[description](url).This is a nice-to-have improvement since the additions maintain consistency with the rest of the file, but modernizing the entire reference section to follow Markdown standards would be beneficial for long-term maintainability.
Example fix for lines 18-23:
-https://docs.kiln.tech/docs/evaluations/evaluate-rag-accuracy-q-and-a-evals -https://arxiv.org/html/2507.06554v2#S5 -https://arxiv.org/abs/2507.07426 -https://arxiv.org/html/2507.04055v2#S4 -https://arxiv.org/html/2511.07328v1 -https://jxnl.co/writing/2025/09/11/the-rag-mistakes-that-are-killing-your-ai-skylar-payne/ +<https://docs.kiln.tech/docs/evaluations/evaluate-rag-accuracy-q-and-a-evals> +<https://arxiv.org/html/2507.06554v2#S5> +<https://arxiv.org/abs/2507.07426> +<https://arxiv.org/html/2507.04055v2#S4> +<https://arxiv.org/html/2511.07328v1> +<https://jxnl.co/writing/2025/09/11/the-rag-mistakes-that-are-killing-your-ai-skylar-payne/>Docs/Code_Documentation/Guides/Chunking_Code_Guide.md (2)
50-54: Optional style improvements for consistency and readability. Three API items in the list begin with the method name prefix, and "very large" is used multiple times across the document. Consider:
- Reduce repetition of "very" in lines 53, 79, 98, and 195 by using alternatives like "large", "substantial", or "extensive" depending on context.
- Consider minor rewording of lines 50–52 to reduce the parallel structure of beginning each item with a method name (e.g., consolidate related hierarchical methods or use varied introductions).
Apply this diff to address style concerns:
- chunk_text_hierarchical_tree(...), flatten_hierarchical(...) → section/block tree and flattening with ancestry. - chunk_text_hierarchical_flat(...) → convenience wrapper returning flat {text, metadata}. - chunk_file_stream(file_path, method=None, max_size=None, overlap=None, language=None, buffer_size=8192, encoding='utf-8', **options) → Generator[str] - - Streaming for very large files; see "Streaming" below. +- Hierarchical methods: chunk_text_hierarchical_tree/flatten_hierarchical (section/block tree and flattening with ancestry), chunk_text_hierarchical_flat (convenience wrapper returning flat {text, metadata}). +- chunk_file_stream(file_path, method=None, max_size=None, overlap=None, language=None, buffer_size=8192, encoding='utf-8', **options) → Generator[str] + - Streaming for large files; see "Streaming" below.Additional "very large" occurrences to simplify:
- - Prefer `chunk_file_stream` for very large files where you don't want to load the entire file; it reads buffers from disk and carries overlap forward between buffers. + - Prefer `chunk_file_stream` for large files where you don't want to load the entire file; it reads buffers from disk and carries overlap forward between buffers. - Streaming a very large file + Streaming large files - - Batching embeddings downstream and avoiding very large overlaps. + - Batching embeddings downstream and avoiding large overlaps.
142-142: Capitalize "Markdown" as a proper noun. Line 142 references "markdown tables"; this should be "Markdown tables" to follow standard capitalization for the language/format name.- - chunk_text_hierarchical_tree builds a light tree with sections and blocks. Markdown‑like features are recognized: ATX headers (`#`), hrules, lists, code fences, markdown tables, blank lines, plus optional custom "boundary" patterns from hierarchical_template. + - chunk_text_hierarchical_tree builds a light tree with sections and blocks. Markdown‑like features are recognized: ATX headers (`#`), hrules, lists, code fences, Markdown tables, blank lines, plus optional custom "boundary" patterns from hierarchical_template.tldw_Server_API/app/api/v1/API_Deps/auth_deps.py (1)
836-853: Consider extracting the duplicated single-user bypass logic.The single-user API key bypass logic is nearly identical in both
check_rate_limitandcheck_auth_rate_limit. Extracting this into a small helper function would improve maintainability and reduce the risk of drift between the two implementations.Example refactor:
def _should_bypass_rate_limit_for_single_user(request: Request) -> bool: """Check if request should bypass rate limiting in single-user mode.""" try: if not is_single_user_mode(): return False settings = get_settings() api_key_hdr = request.headers.get("X-API-KEY") if getattr(request, "headers", None) else None auth_hdr = request.headers.get("Authorization") if getattr(request, "headers", None) else None bearer_tok = auth_hdr.split(" ", 1)[1] if isinstance(auth_hdr, str) and auth_hdr.startswith("Bearer ") else None import os as _os env_key = _os.getenv("SINGLE_USER_API_KEY") configured_key = settings.SINGLE_USER_API_KEY or env_key return configured_key and (api_key_hdr == configured_key or bearer_tok == configured_key) except Exception: return FalseThen use it in both functions:
async def check_rate_limit(request: Request, rate_limiter: RateLimiter = Depends(get_rate_limiter_dep)): if _is_test_mode(): return - try: - from tldw_Server_API.app.core.AuthNZ.settings import is_single_user_mode, get_settings as _get_settings - if is_single_user_mode(): - settings = _get_settings() - api_key_hdr = request.headers.get("X-API-KEY") if getattr(request, "headers", None) else None - auth_hdr = request.headers.get("Authorization") if getattr(request, "headers", None) else None - bearer_tok = auth_hdr.split(" ", 1)[1] if isinstance(auth_hdr, str) and auth_hdr.startswith("Bearer ") else None - import os as _os - env_key = _os.getenv("SINGLE_USER_API_KEY") - configured_key = settings.SINGLE_USER_API_KEY or env_key - if configured_key and (api_key_hdr == configured_key or bearer_tok == configured_key): - return - except Exception: - pass + if _should_bypass_rate_limit_for_single_user(request): + returnAlso applies to: 897-911
tldw_Server_API/app/core/AuthNZ/database.py (1)
95-118: Consider logging when the warning itself fails.The defensive fallback logic is sound for handling misconfigured test environments. However, the
try-except-passblock at lines 115-116 silently swallows any exception that occurs during logging, which could hide issues if the logger itself is misconfigured or if string formatting fails.Apply this diff to at least log to stderr when the warning fails:
try: logger.warning( "Single-user mode: ignoring non-sqlite DATABASE_URL '%s'; using sqlite:///./Databases/users.db", _raw_url, ) - except Exception: - pass + except Exception as e: + import sys + print(f"Warning: Failed to log DATABASE_URL fallback: {e}", file=sys.stderr)tldw_Server_API/app/api/v1/endpoints/media.py (1)
2480-2486: ETag comparison may need quotes handling.If-None-Match often carries quotes (e.g., W/"..."). Consider normalizing before equality check to avoid false 200s.
Example:
- if if_none_match == current_etag: + inm = (if_none_match or "").strip().strip('W/').strip('"') + if inm == current_etag: return Response(status_code=status.HTTP_304_NOT_MODIFIED)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
Docs/Code_Documentation/Guides/Chunking_Code_Guide.md(1 hunks)Docs/Code_Documentation/Guides/Evaluations_Code_Guide.md(1 hunks)Docs/Design/RAG-Upgrades-PRD.md(1 hunks)Docs/Design/Security2.md(1 hunks)Docs/RAG/RAG_Notes.md(1 hunks)tldw_Server_API/app/api/v1/API_Deps/auth_deps.py(3 hunks)tldw_Server_API/app/api/v1/endpoints/media.py(17 hunks)tldw_Server_API/app/core/AuthNZ/database.py(2 hunks)tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py(4 hunks)
✅ Files skipped from review due to trivial changes (1)
- Docs/Code_Documentation/Guides/Evaluations_Code_Guide.md
🧰 Additional context used
🧬 Code graph analysis (3)
tldw_Server_API/app/core/AuthNZ/database.py (2)
tldw_Server_API/tests/DB_Management/test_media_db_fallback_warning.py (2)
warning(15-19)warning(72-73)tldw_Server_API/app/core/AuthNZ/settings.py (1)
get_settings(844-883)
tldw_Server_API/app/api/v1/endpoints/media.py (5)
tldw_Server_API/app/core/DB_Management/backends/base.py (5)
BackendType(27-31)transaction(348-358)connection(283-285)backend_type(330-332)DatabaseError(17-19)tldw_Server_API/app/core/DB_Management/Media_DB_v2.py (4)
get(155-156)transaction(1313-1399)execute_query(1054-1158)DatabaseError(100-102)tldw_Server_API/app/core/Utils/metadata_utils.py (1)
normalize_safe_metadata(18-82)tldw_Server_API/app/core/DB_Management/backends/postgresql_backend.py (3)
transaction(657-696)connection(202-207)backend_type(237-239)tldw_Server_API/app/core/DB_Management/backends/sqlite_backend.py (3)
transaction(237-262)connection(141-148)backend_type(180-182)
tldw_Server_API/app/api/v1/API_Deps/auth_deps.py (2)
tldw_Server_API/app/core/testing.py (1)
is_test_mode(23-33)tldw_Server_API/app/core/AuthNZ/settings.py (2)
is_single_user_mode(911-913)get_settings(844-883)
🪛 LanguageTool
Docs/Code_Documentation/Guides/Chunking_Code_Guide.md
[style] ~51-~51: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ck tree and flattening with ancestry. - chunk_text_hierarchical_flat(...) → convenien...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~52-~52: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...pper returning flat {text, metadata}. - chunk_file_stream(file_path, method=None, max...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~53-~53: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ons) → Generator[str] - Streaming for very large files; see “Streaming” below. - Note:...
(EN_WEAK_ADJECTIVE)
[style] ~79-~79: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...shion. - Prefer chunk_file_stream for very large files where you don’t want to load the ...
(EN_WEAK_ADJECTIVE)
[style] ~98-~98: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...lags": "im"}]}, }, ) Streaming a very large filepython ck = Chunker() for ch in...
(EN_WEAK_ADJECTIVE)
[uncategorized] ~142-~142: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ders (#), hrules, lists, code fences, markdown tables, blank lines, plus optional cust...
(MARKDOWN_NNP)
[style] ~195-~195: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...hing embeddings downstream and avoiding very large overlaps. - Limiting regex rules in t...
(EN_WEAK_ADJECTIVE)
🪛 markdownlint-cli2 (0.18.1)
Docs/Design/Security2.md
28-28: Bare URL used
(MD034, no-bare-urls)
58-58: Bare URL used
(MD034, no-bare-urls)
Docs/RAG/RAG_Notes.md
18-18: Bare URL used
(MD034, no-bare-urls)
19-19: Bare URL used
(MD034, no-bare-urls)
20-20: Bare URL used
(MD034, no-bare-urls)
21-21: Bare URL used
(MD034, no-bare-urls)
22-22: Bare URL used
(MD034, no-bare-urls)
23-23: Bare URL used
(MD034, no-bare-urls)
Docs/Design/RAG-Upgrades-PRD.md
65-65: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
67-67: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
69-69: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
73-73: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
75-75: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
79-79: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
🪛 Ruff (0.14.4)
tldw_Server_API/app/core/AuthNZ/database.py
102-102: Do not catch blind exception: Exception
(BLE001)
115-116: try-except-pass detected, consider logging the exception
(S110)
115-115: Do not catch blind exception: Exception
(BLE001)
160-160: Do not catch blind exception: Exception
(BLE001)
176-177: try-except-pass detected, consider logging the exception
(S110)
176-176: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/api/v1/endpoints/media.py
1443-1443: Do not catch blind exception: Exception
(BLE001)
1552-1552: Do not catch blind exception: Exception
(BLE001)
1587-1588: try-except-pass detected, consider logging the exception
(S110)
1587-1587: Do not catch blind exception: Exception
(BLE001)
1757-1757: Do not catch blind exception: Exception
(BLE001)
1792-1793: try-except-pass detected, consider logging the exception
(S110)
1792-1792: Do not catch blind exception: Exception
(BLE001)
8739-8740: Abstract raise to an inner function
(TRY301)
8739-8740: Avoid specifying long messages outside the exception class
(TRY003)
8816-8816: Do not catch blind exception: Exception
(BLE001)
8834-8835: try-except-pass detected, consider logging the exception
(S110)
8834-8834: Do not catch blind exception: Exception
(BLE001)
8839-8839: Avoid specifying long messages outside the exception class
(TRY003)
8842-8842: Avoid specifying long messages outside the exception class
(TRY003)
8845-8845: Do not catch blind exception: Exception
(BLE001)
8848-8848: Do not catch blind exception: Exception
(BLE001)
8849-8849: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
8849-8849: Avoid specifying long messages outside the exception class
(TRY003)
8851-8851: Avoid specifying long messages outside the exception class
(TRY003)
8854-8854: Avoid specifying long messages outside the exception class
(TRY003)
8862-8863: try-except-pass detected, consider logging the exception
(S110)
8862-8862: Do not catch blind exception: Exception
(BLE001)
8879-8880: Abstract raise to an inner function
(TRY301)
8879-8880: Avoid specifying long messages outside the exception class
(TRY003)
tldw_Server_API/app/api/v1/API_Deps/auth_deps.py
851-851: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py
844-844: Unused function argument: session_ref
(ARG001)
849-850: try-except-pass detected, consider logging the exception
(S110)
849-849: Do not catch blind exception: Exception
(BLE001)
857-857: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (10)
Docs/Code_Documentation/Guides/Chunking_Code_Guide.md (2)
41-61: Past review comment successfully addressed. Theencodingparameter is now properly documented in thechunk_file_streamsignature (line 52) with a helpful note explaining its purpose (line 54). This resolves the previous concern about API signature completeness and prevents developer confusion about available options.
1-257: Excellent comprehensive documentation guide. The guide is well-organized, covers all major aspects of the Chunking module (scope, APIs, usage patterns, methods, streaming, options, error handling, integrations, extensions, and testing), includes practical code examples, and provides clear guidance for developers at different experience levels. The cross-references to related documentation and the minimal endpoint wiring example are particularly helpful for onboarding.Docs/Design/Security2.md (1)
1-58: Verify alignment with PR objectives.This documentation file (security references) appears to be outside the scope of the stated PR objectives, which focus on E2E testing workflows. Please confirm whether this change is intentional or was included by accident. If it is intentional, consider updating the PR description to reflect the broader scope of documentation updates.
tldw_Server_API/app/api/v1/API_Deps/auth_deps.py (2)
36-36: Good: Centralized test-mode detection now in use.The import of
is_test_modefromcore.testingaddresses the previous review feedback to consolidate test-mode detection logic and avoid duplication.
832-834: Test-mode bypass correctly applied.The test-mode bypass using
_is_test_mode()is clean and ensures deterministic behavior during E2E tests. The early return pattern is clear and consistent across both rate-limiting functions.Also applies to: 893-895
tldw_Server_API/app/core/AuthNZ/database.py (1)
95-180: The defensive fallback approach is sound for E2E test environments.The logic to detect and handle misconfigured DATABASE_URL values in single_user mode is appropriate for preventing test failures when environment variables leak across test contexts. The fallback to a known-good SQLite path ensures test reliability.
tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py (2)
15-15: LGTM: weakref import added for finalizer support.The import is appropriate for implementing cleanup via
weakref.finalize.
828-833: LGTM: Early attribute initialization improves cleanup safety.Initializing critical attributes before potentially failing operations ensures that cleanup code (finalizers,
__del__, orunload_model) can safely access these attributes without riskingAttributeError.Docs/Design/RAG-Upgrades-PRD.md (1)
51-51: ✓ Hyphenation fix is correct.The compound modifier "auto-adjust" on line 51 is properly hyphenated. No changes needed.
tldw_Server_API/app/api/v1/endpoints/media.py (1)
116-118: Dynamic search rate limit wiring looks good.Constant is defined at module scope and used safely in the decorator; fixes prior decorator ordering issue.
| 10. Numeric grounding boost | ||
| - Unit‑normalization and token presence checks to lightly boost spans with matching normalized numerics. | ||
| 11. Temporal heuristics, clearer knobs | ||
| - Promote `auto_temporal_filters`; expose range in metadata and make behavior explicit. | ||
| 12. Corpus‑learned synonyms | ||
| - Batch miner to auto‑update `synonyms_registry` from co‑occurrence/PMI and headings; versioned per corpus. | ||
|
|
||
| ### Phase 3 — Ranking/Fusion Learning & Multi‑hop | ||
| 13. Learned fusion + abstention calibration | ||
| - Lightweight logistic calibrator on features (bm25 norm, vec sim, recency, CE score, MMR pos, source quality) to yield fused score and abstention threshold; train from feedback/eval logs. | ||
| 14. Guided query decomposition orchestration | ||
| - First‑class sub‑query workflow for “why/how/compare/timeline”; retrieve per sub‑goal, then synthesize + verify. | ||
|
|
||
| ### Phase 4 — Graph‑Augmented Retrieval (Optional) | ||
| 15. Graph neighborhoods retrieval | ||
| - Build per‑corpus lightweight entity/section graph; retrieve by communities and blend with text retrieval for multi‑hop questions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix list indentation inconsistency in Phase 2–4 sections.
Bullets under numbered items 10–15 are indented with only 2 spaces, while bullets under items 1–9 use 3 spaces. Markdownlint is flagging this inconsistency (MD007 errors on lines 65, 67, 69, 73, 75, 79). Adjust bullet indentation to align with the 3-space standard for numbered lists.
Apply this diff to fix the indentation:
-10. Numeric grounding boost
- - Unit‑normalization and token presence checks to lightly boost spans with matching normalized numerics.
+10. Numeric grounding boost
+ - Unit‑normalization and token presence checks to lightly boost spans with matching normalized numerics.
-11. Temporal heuristics, clearer knobs
- - Promote `auto_temporal_filters`; expose range in metadata and make behavior explicit.
+11. Temporal heuristics, clearer knobs
+ - Promote `auto_temporal_filters`; expose range in metadata and make behavior explicit.
-12. Corpus‑learned synonyms
- - Batch miner to auto‑update `synonyms_registry` from co‑occurrence/PMI and headings; versioned per corpus.
+12. Corpus‑learned synonyms
+ - Batch miner to auto‑update `synonyms_registry` from co‑occurrence/PMI and headings; versioned per corpus.
-13. Learned fusion + abstention calibration
- - Lightweight logistic calibrator on features (bm25 norm, vec sim, recency, CE score, MMR pos, source quality) to yield fused score and abstention threshold; train from feedback/eval logs.
+13. Learned fusion + abstention calibration
+ - Lightweight logistic calibrator on features (bm25 norm, vec sim, recency, CE score, MMR pos, source quality) to yield fused score and abstention threshold; train from feedback/eval logs.
-14. Guided query decomposition orchestration
- - First‑class sub‑query workflow for "why/how/compare/timeline"; retrieve per sub‑goal, then synthesize + verify.
+14. Guided query decomposition orchestration
+ - First‑class sub‑query workflow for "why/how/compare/timeline"; retrieve per sub‑goal, then synthesize + verify.
-15. Graph neighborhoods retrieval
- - Build per‑corpus lightweight entity/section graph; retrieve by communities and blend with text retrieval for multi‑hop questions.
+15. Graph neighborhoods retrieval
+ - Build per‑corpus lightweight entity/section graph; retrieve by communities and blend with text retrieval for multi‑hop questions.🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
65-65: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
67-67: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
69-69: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
73-73: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
75-75: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
79-79: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
🤖 Prompt for AI Agents
In Docs/Design/RAG-Upgrades-PRD.md around lines 64 to 79, the bullet items under
numbered entries 10–15 use 2-space indentation which is inconsistent with the
3-space indentation used for items 1–9 and triggers markdownlint MD007; update
each sub-bullet under lines 64–79 to use 3 spaces before the hyphen (aligning
with the existing 3-space standard for numbered list sub-items) so all list
indentation is consistent.
| https://arxiv.org/abs/2506.18032 | ||
| https://arxiv.org/abs/2504.07717v3 | ||
| https://arxiv.org/abs/2507.05093v1 | ||
| https://arxiv.org/abs/2402.03299 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Fix bare URL formatting violations (markdownlint MD034).
Lines 28 and 58 contain bare URLs that violate Markdown best practices. While the existing file already uses bare URLs (lines 3–27), static analysis flags this inconsistency. Consider wrapping these URLs in proper Markdown link syntax: [description](url) or reformatting the entire file for consistency.
Alternatively, if bare URLs are intentional for this reference list, add a .markdownlintignore rule or disable MD034 for this file.
Also applies to: 58-58
---
🏁 Script executed:
```shell
#!/bin/bash
# Inspect the security documentation file to verify bare URL issues
echo "=== Full file content ==="
cat Docs/Design/Security2.md
echo -e "\n=== Line count ==="
wc -l Docs/Design/Security2.md
echo -e "\n=== Lines 20-30 (around line 28) ==="
sed -n '20,30p' Docs/Design/Security2.md | cat -n
echo -e "\n=== Lines 50-65 (around line 58) ==="
sed -n '50,65p' Docs/Design/Security2.md | cat -n
echo -e "\n=== Check for markdownlint config ==="
find . -maxdepth 2 -type f \( -name ".markdownlintignore" -o -name ".markdownlint.json" -o -name ".markdownlint.yaml" -o -name ".markdownlint.yml" \) 2>/dev/null | head -20
Length of output: 2139
Fix bare URL formatting and remove excessive blank lines.
Line 28 contains a bare URL (https://arxiv.org/abs/2402.03299), and line 57 contains another (https://arxiv.org/html/2507.04055v2#S4). Both violate markdownlint MD034. While the entire file uses bare URLs consistently, the unusual 28-line gap (lines 29–56) between these two additions suggests a formatting error.
Recommended actions:
- Remove the blank lines (29–56) between lines 28 and 57, or clarify their intent
- Wrap URLs in markdown link syntax (
[description](url)) for consistency and compliance, or - Add a
.markdownlintignorerule to disable MD034 for this file if bare URLs are intentional
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
28-28: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In Docs/Design/Security2.md around lines 28 and 57, there is a bare URL at line
28 (https://arxiv.org/abs/2402.03299) and another at line 57
(https://arxiv.org/html/2507.04055v2#S4) and an unexpected large blank region
(lines 29–56); remove the unintended blank lines (29–56) or document their
intent, and replace each bare URL with markdown link syntax (e.g., [paper title
or short description](URL)) for consistency and to satisfy markdownlint MD034,
or if bare URLs are intentional add a .markdownlintignore rule to disable MD034
for this file.
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Clarify intent of extensive blank lines.
Lines 29–57 contain only blank lines between the two new URL additions (line 28 and line 58). This is unusual and may indicate a formatting error. Please confirm whether these blank lines are intentional or if they should be removed.
---
🏁 Script executed:
```shell
#!/bin/bash
# Display the file content around the disputed lines to verify the claim
cat -n "Docs/Design/Security2.md" | sed -n '25,60p'
Length of output: 441
Remove the 29 blank lines between URL entries.
Lines 29–57 in Docs/Design/Security2.md contain 29 consecutive blank lines between the URL on line 28 and the URL on line 58. This is excessive whitespace and should be removed or consolidated to a single blank line for proper markdown formatting.
🤖 Prompt for AI Agents
In Docs/Design/Security2.md around lines 29 to 57 there are 29 consecutive blank
lines between the URL on line 28 and the URL on line 58; remove the excess
whitespace so only a single blank line (or none if consistent with surrounding
formatting) separates the two URL entries, ensuring proper markdown spacing and
preserving the URL lines and their order.
tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
Docs/Design/Search.md (1)
16-16: Wrap the bare URL in markdown syntax or angle brackets.The markdown linter flagged the bare URL on line 16 (MD034). While the entire "Link Dump" section uses bare URLs, consider wrapping URLs in markdown link syntax (e.g.,
[title](url)) or angle brackets (<url>) for consistency with markdown best practices.-https://mburaksayici.com/blog/2025/10/12/information-retrieval-1.html +[Information Retrieval](https://mburaksayici.com/blog/2025/10/12/information-retrieval-1.html)Alternatively, if bare URLs are preferred in this section, consider suppressing the MD034 rule for the file or updating the entire "Link Dump" section to use markdown syntax.
Docs/Prompts/Programming/Simplification_Cascade.md (3)
6-6: Add title and/or description to the frontmatter link for accessibility.The Source link references an external skill but provides minimal context. Consider adding a brief description of what's at that link or why it's foundational to this prompt.
56-62: Process section is clear but could benefit from a worked example.The 5-step process is concise and actionable, but walking through one of the three Cascade examples (Stream Abstraction, Resource Governance, or Immutability) using this exact process would reinforce the methodology and make it concrete for practitioners.
64-70: Red Flags section is valuable; consider adding recovery tactics.These red flags are excellent diagnostic indicators, but the section would benefit from brief recovery guidance—e.g., "If you notice 'don't touch that, it's complicated,' try extracting the invariant that makes the complexity necessary." This bridges from problem recognition to action.
Docs/Design/UX.md (1)
82-82: Consider wrapping bare URLs in angle brackets for proper markdown rendering.Lines 82 and 86 contain bare URLs flagged by markdownlint (MD034). While these follow the established pattern in the Link Dump section, bare URLs are not rendered as clickable hyperlinks in standard markdown. Wrapping them in angle brackets (
<URL>) will ensure they render correctly across markdown renderers.This is a minor formatting issue and doesn't affect functionality. The entire Link Dump section (lines 51–87) could benefit from this improvement for consistency and proper rendering.
Apply this diff to improve markdown rendering:
- https://savolai.net/ux/product-and-framework-thinkers-when-developers-and-uxers-dont-get-each-other/ + <https://savolai.net/ux/product-and-framework-thinkers-when-developers-and-uxers-dont-get-each-other/> - https://savolai.net/ux/the-why-and-the-how-usability-testing/ + <https://savolai.net/ux/the-why-and-the-how-usability-testing/>Also applies to: 86-86
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
.github/workflows/ci.yml(2 hunks)Docs/Code_Documentation/Evaluations/index.md(1 hunks)Docs/Code_Documentation/Guides/Evaluations_Code_Guide.md(1 hunks)Docs/Design/Education.md(1 hunks)Docs/Design/Search.md(1 hunks)Docs/Design/UX.md(1 hunks)Docs/Prompts/Programming/Simplification_Cascade.md(1 hunks)Docs/Published/API-related/Chunking_API_Documentation.md(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- Docs/Design/Education.md
- Docs/Code_Documentation/Evaluations/index.md
🚧 Files skipped from review as they are similar to previous changes (1)
- Docs/Code_Documentation/Guides/Evaluations_Code_Guide.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
Docs/Design/UX.md
82-82: Bare URL used
(MD034, no-bare-urls)
86-86: Bare URL used
(MD034, no-bare-urls)
Docs/Published/API-related/Chunking_API_Documentation.md
9-9: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
45-45: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
75-75: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
147-147: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
160-160: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
177-177: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
200-200: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Docs/Design/Search.md
16-16: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (4)
Docs/Prompts/Programming/Simplification_Cascade.md (2)
1-8: Verify scope alignment with PR objectives.This documentation file about simplification cascades appears disconnected from the stated PR objectives, which focus on "E2E Tests Fixing—Ensuring User workflows work as intended." While this might be contextual guidance for refactoring work underlying the E2E tests, please clarify:
- Is this documentation directly supporting the E2E test improvements in this PR?
- If this is a general documentation addition, should it be in a separate PR to keep E2E test fixes focused?
1-77: Documentation is well-structured with clear, actionable guidance.The document provides a thoughtful framework for recognizing and applying simplification cascades. The structure progresses logically from concept to practice, the Quick Reference table enables rapid pattern matching, and the three concrete examples (Stream Abstraction, Resource Governance, Immutability) effectively illustrate the principle. The Red Flags section is particularly valuable for helping developers identify when they might be missing a cascade opportunity.
Docs/Published/API-related/Chunking_API_Documentation.md (1)
1-240: Well-structured and comprehensive API documentation.The documentation covers all three endpoints with clear descriptions, authentication modes, request/response schemas, OpenAPI definitions, and practical examples. The progression from overview to endpoints to tips is logical and reader-friendly.
.github/workflows/ci.yml (1)
250-297: Verify multi-user smoke test scope and DATABASE_URL availability.The new multi-user Postgres-backed smoke test block (lines 250–297) mirrors the single-user structure but runs only a health check, not full E2E tests. Confirm this aligns with the PR objective to verify user workflows aren't broken:
- Scope: The block starts the multi-user server and verifies it's healthy, but does not run actual E2E test cases. Is this intentional, or should it run e2e tests similar to the single-user block?
- DATABASE_URL propagation: Line 258 assumes
DATABASE_URLwas exported in the "Export PG env vars" step (line 152–159). Verify that step always runs before this block, or add explicit dependency/error handling ifDATABASE_URLis unset.If this is smoke-test-only and full E2E tests are run elsewhere, the comment should clarify the intent. If full E2E tests should run here, consider adding test steps after the health check.
| supports_request = hasattr(client, "request") and callable(getattr(client, "request", None)) | ||
| if supports_request: | ||
| while True: | ||
| r = await client.request("GET", cur_url, follow_redirects=False, timeout=30.0) |
Check failure
Code scanning / CodeQL
Full server-side request forgery Critical
user-provided value
The full URL of this request depends on a
user-provided value
The full URL of this request depends on a
user-provided value
The full URL of this request depends on a
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 1 day ago
To mitigate full SSRF, always restrict which domains can be accessed by user-submitted URLs—never use all user input directly in HTTP requests.
- Maintain a server-side whitelist of permitted hostnames/domains for code download requests.
- Before making an outbound request, parse and validate the URL to ensure its netloc/hostname is in the allowed set.
- Reject requests for any URL not in this allowlist, returning an error to the user.
- To implement:
- At the start of
_download_url_async, parse the providedurlstring and verify its hostname is within a definedCODE_ALLOWED_HOSTSset. - If not allowed, raise an exception (or return a structured error) and do not perform the HTTP fetch.
- Define
CODE_ALLOWED_HOSTSnear the function, or at module level. - Add relevant imports if needed (
urlparsefromurllib.parse).
- At the start of
- The changes should be restricted only to the shown function (
_download_url_async) and related lines.
-
Copy modified line R29 -
Copy modified lines R8706-R8711 -
Copy modified lines R8721-R8725
| @@ -26,7 +26,7 @@ | ||
| from datetime import datetime | ||
| from pathlib import Path as FilePath | ||
| from typing import Any, Dict, List, Optional, Tuple, Callable, Literal, Union, Set | ||
| # | ||
| from urllib.parse import urlparse | ||
| # 3rd-party imports | ||
| from fastapi import ( | ||
| APIRouter, | ||
| @@ -8703,6 +8703,12 @@ | ||
| DEFAULT_MAX_REDIRECTS as _DEFAULT_MAX_REDIRECTS, | ||
| ) | ||
|
|
||
| # ALLOWLIST: hosts permitted for code downloads to prevent SSRF | ||
| CODE_ALLOWED_HOSTS: Set[str] = { | ||
| "raw.githubusercontent.com", | ||
| "github.com", | ||
| # Add other trusted code hosts as needed | ||
| } | ||
| async def _download_url_async( | ||
| client: Optional[httpx.AsyncClient], | ||
| url: str, | ||
| @@ -8712,6 +8718,11 @@ | ||
| disallow_content_types: Optional[Set[str]] = None, # Optional set of content-types to reject for inference | ||
| allow_redirects: bool = True, | ||
| ) -> Path: | ||
| # SSRF mitigation: only allow URLs from permitted hosts | ||
| parsed_url = urlparse(url) | ||
| host = (parsed_url.hostname or "").lower() | ||
| if host not in CODE_ALLOWED_HOSTS: | ||
| raise ValueError(f"Download blocked: host '{host}' not permitted") | ||
| """ | ||
| Downloads a URL asynchronously and saves it to the target directory. | ||
| Optionally validates the file extension against a set of allowed extensions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tldw_Server_API/app/api/v1/endpoints/media.py (1)
178-188: Blocking: duplicate GET "/" routes collide; pick one pathTwo handlers register GET on ""/"/" (lines 178-188) and again on "/" (lines 2240-2253). FastAPI will register both; order-specific behavior and OpenAPI collisions can break E2E and docs. Keep one canonical route and move the other.
Apply either removal or move legacy list to a distinct path:
-@router.get( - "", +@router.get( + "/legacy-list", tags=["Media Management"], summary="List Media", ) -@router.get( - "/", +@router.get( + "/legacy-list/", tags=["Media Management"], summary="List Media (slash)", )Also applies to: 2239-2248
♻️ Duplicate comments (2)
Docs/Published/API-related/Chunking_API_Documentation.md (1)
9-11: Add language specification to all fenced code blocks for markdown linting compliance.This issue was flagged in the past review; the file still has 11 fenced code blocks missing language identifiers (MD040 violations), which violates markdown linting standards and prevents proper syntax highlighting.
Apply this comprehensive fix to add language specifiers to all code blocks:
## Base URL -``` +```url http://localhost:8000/api/v1/chunking -``` +``` ## Authentication @@ -23,7 +23,7 @@ Chunk raw text and return normalized chunks with metadata. Request body (JSON): -``` +```json { "text_content": "... your text ...", "file_name": "sample.txt", @@ -42,7 +42,7 @@ Notes Example response (truncated): -``` +```json { "chunks": [ { @@ -72,7 +72,7 @@ Example response (truncated): OpenAPI schema (request/response) -``` +```yaml openapi: 3.0.3 info: title: Chunk Text @@ -145,7 +145,7 @@ OpenAPI schema (request/response) Example (template-based): -``` +```json { "text_content": "# Title\n...", "file_name": "paper.md", @@ -158,7 +158,7 @@ Hierarchical splitting examples - Minimal template JSON with custom boundaries (for use with Templates Apply): -``` +```json { "name": "chapters_and_appendices", "description": "Chapters and appendices with headings", @@ -179,7 +179,7 @@ Hierarchical splitting examples Apply via Templates API: -``` +```bash POST /api/v1/chunking/templates/apply { "template_name": "chapters_and_appendices", @@ -195,7 +195,7 @@ Upload a file via multipart form-data and return chunks. Example request: -``` +```bash curl -X POST "http://localhost:8000/api/v1/chunking/chunk_file" \ -H "Authorization: Bearer <JWT>" \ -F file=@/path/to/large.txt \ @@ -208,7 +208,7 @@ Example request: Response shape matches `POST /chunk_text`. OpenAPI schema (multipart request) -``` +```yaml openapi: 3.0.3 info: title: Chunk File @@ -268,7 +268,7 @@ Response shape matches `POST /chunk_text`. Example response (trimmed): -``` +```json { "methods": ["words", "sentences", "paragraphs", "tokens", "semantic", "json", "xml", "ebook_chapters", "rolling_summarize", "structure_aware", "code", "code_ast"], "default_options": { @@ -291,7 +291,7 @@ Example response (trimmed): Minimal OpenAPI schema (response) for stubbing -``` +```yaml openapi: 3.0.3 info: title: Chunking CapabilitiesAlso applies to: 26-37, 45-72, 75-144, 147-153, 160-180, 182-189, 197-205, 210-264, 271-291, 294-329
tldw_Server_API/app/api/v1/endpoints/media.py (1)
1587-1588: Don’t swallow identifier upsert failures; narrow and log (repeat of earlier feedback)
except Exception: passhides real DB faults and breaks diagnosability. Narrow to operational/missing-table, log, and re-raise unexpected errors.- except Exception: - pass + except (sqlite3.OperationalError, DatabaseError) as e: + logger.debug("Identifier index update skipped (missing table/unsupported upsert): %s", e) + except Exception as e: + logger.error("Identifier index update failed for dv_id=%s: %s", dv_id, e, exc_info=True) + raiseMake this change in both blocks.
Also applies to: 1792-1793
🧹 Nitpick comments (3)
tldw_Server_API/app/api/v1/endpoints/media.py (2)
1442-1456: Deduplicate timestamp generation; use DB helper consistently
now_tsformatting is repeated. Prefer a single source of truth to avoid drift and locale issues.Example:
- now_ts = datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z' + now_ts = getattr(db, "_get_current_utc_timestamp_str", lambda: datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z')()Also applies to: 1551-1565, 1756-1770
8833-8863: Downloader: replace silent passes with debug logs; keep exceptions narrowIn
_download_url_async, several broadexcept Exception: passblocks suppress useful diagnostics during redirect probing/close and client cleanup. Replace with debug logs; keep operational exceptions narrow where possible.Example adjustments:
- except Exception: - pass + except Exception as _e: + logger.debug("Probe close suppressed: %s", _e) ... - except Exception: - pass + except Exception as _e: + logger.debug("Client close suppressed: %s", _e)This keeps E2E stability while improving observability.
Also applies to: 8950-8953, 9125-9140
tldw_Server_API/app/core/Resource_Governance/governor_redis.py (1)
1221-1296: Requests-only pre-check short-circuit looks good; consider simplifying denial/backoff condition.The new flow that:
- Short-circuits
reserve()only when the requests category is denied, and- Lets tokens proceed to reservation for partial saturation
is a nice behavioral improvement and aligns with the comments. The per-category backoff + requests-specific
_requests_deny_untilfloor based onretry_afteralso looks consistent with the early-guard logic.One small clarity/future‑proofing nit in the backoff loop:
for cat_name, cat_info in cats_bm.items(): try: if not bool(cat_info.get("allowed") is False): continue ra_b = int(cat_info.get("retry_after") or 0) ...
- This condition is correct today (it executes only when
allowed is False), but it’s harder to read than necessary and subtly different from the earlier denial detection (not bool((info or {}).get("allowed"))inhas_requests_denial).- If
allowedever becomes tri‑state (e.g.,Nonefor “unknown”),has_requests_denialwould treat that as a denial, but this loop would skip backoff for that category.You could make this more explicit and consistent with the intent by rewriting as:
for cat_name, cat_info in cats_bm.items(): try: if cat_info.get("allowed") is not False: continue ra_b = int(cat_info.get("retry_after") or 0) ...Optionally, if you want absolute alignment with
has_requests_denial, you could normalize on a single notion of “denied” (e.g., treat onlyallowed is Falseas denial everywhere).The extra
except Exception: pass/continueblocks around metrics/backoff and idempotency persistence are consistent with the rest of the module’s fail‑open posture; given the design goal to never break callers on instrumentation paths, I’d treat Ruff’s warnings here as low‑priority or something to suppress explicitly rather than change behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
Docs/Published/API-related/Chunking_API_Documentation.md(1 hunks)Docs/Published/User_Guides/Installation-Setup-Guide.md(1 hunks)tldw_Server_API/app/api/v1/endpoints/media.py(17 hunks)tldw_Server_API/app/core/Resource_Governance/governor_redis.py(1 hunks)tldw_Server_API/tests/scripts/server_lifecycle.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- Docs/Published/User_Guides/Installation-Setup-Guide.md
🧰 Additional context used
🧬 Code graph analysis (2)
tldw_Server_API/app/core/Resource_Governance/governor_redis.py (3)
tldw_Server_API/app/core/Resource_Governance/governor.py (2)
_parse_entity(208-213)_get_policy(197-205)tldw_Server_API/app/core/Metrics/metrics_manager.py (1)
increment(1303-1312)tldw_Server_API/app/core/Resource_Governance/metrics_rg.py (1)
_labels(109-130)
tldw_Server_API/app/api/v1/endpoints/media.py (4)
tldw_Server_API/app/core/DB_Management/backends/base.py (5)
BackendType(27-31)transaction(348-358)connection(283-285)backend_type(330-332)DatabaseError(17-19)tldw_Server_API/app/core/DB_Management/Media_DB_v2.py (3)
transaction(1313-1399)execute_query(1054-1158)DatabaseError(100-102)tldw_Server_API/app/core/Utils/metadata_utils.py (1)
normalize_safe_metadata(18-82)tldw_Server_API/app/core/DB_Management/backends/postgresql_backend.py (3)
transaction(657-696)connection(202-207)backend_type(237-239)
🪛 markdownlint-cli2 (0.18.1)
Docs/Published/API-related/Chunking_API_Documentation.md
9-9: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
45-45: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
75-75: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
147-147: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
160-160: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
182-182: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
197-197: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
210-210: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
271-271: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
294-294: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.4)
tldw_Server_API/app/core/Resource_Governance/governor_redis.py
1231-1232: try-except-pass detected, consider logging the exception
(S110)
1231-1231: Do not catch blind exception: Exception
(BLE001)
1245-1246: try-except-pass detected, consider logging the exception
(S110)
1245-1245: Do not catch blind exception: Exception
(BLE001)
1266-1266: Do not catch blind exception: Exception
(BLE001)
1279-1280: try-except-pass detected, consider logging the exception
(S110)
1279-1279: Do not catch blind exception: Exception
(BLE001)
1285-1286: try-except-pass detected, consider logging the exception
(S110)
1285-1285: Do not catch blind exception: Exception
(BLE001)
1287-1288: try-except-continue detected, consider logging the exception
(S112)
1287-1287: Do not catch blind exception: Exception
(BLE001)
1289-1290: try-except-pass detected, consider logging the exception
(S110)
1289-1289: Do not catch blind exception: Exception
(BLE001)
1294-1295: try-except-pass detected, consider logging the exception
(S110)
1294-1294: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/api/v1/endpoints/media.py
1443-1443: Do not catch blind exception: Exception
(BLE001)
1552-1552: Do not catch blind exception: Exception
(BLE001)
1587-1588: try-except-pass detected, consider logging the exception
(S110)
1587-1587: Do not catch blind exception: Exception
(BLE001)
1757-1757: Do not catch blind exception: Exception
(BLE001)
1792-1793: try-except-pass detected, consider logging the exception
(S110)
1792-1792: Do not catch blind exception: Exception
(BLE001)
8736-8737: Abstract raise to an inner function
(TRY301)
8736-8737: Avoid specifying long messages outside the exception class
(TRY003)
8813-8813: Do not catch blind exception: Exception
(BLE001)
8833-8834: try-except-pass detected, consider logging the exception
(S110)
8833-8833: Do not catch blind exception: Exception
(BLE001)
8838-8838: Avoid specifying long messages outside the exception class
(TRY003)
8841-8841: Avoid specifying long messages outside the exception class
(TRY003)
8844-8844: Do not catch blind exception: Exception
(BLE001)
8847-8847: Do not catch blind exception: Exception
(BLE001)
8848-8848: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
8848-8848: Avoid specifying long messages outside the exception class
(TRY003)
8850-8850: Avoid specifying long messages outside the exception class
(TRY003)
8853-8853: Avoid specifying long messages outside the exception class
(TRY003)
8861-8862: try-except-pass detected, consider logging the exception
(S110)
8861-8861: Do not catch blind exception: Exception
(BLE001)
8878-8879: Abstract raise to an inner function
(TRY301)
8878-8879: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
tldw_Server_API/tests/scripts/server_lifecycle.py (2)
148-148: No issues found; original implementation is correct and verified.Verification confirms the aggregate health endpoint (
/api/v1/health) returns 206 when in degraded state (perhealth.pyline 176). The logic at line 148 correctly restricts 206 acceptance to paths ending with/healthto avoid false positives from range requests, while other health endpoints (/healthz,/ready, etc.) remain unrestricted to 200. This semantics is consistently documented and applied throughout the test suite.
40-49: No issues found; endpoints are implemented and ordering is correct.Both
/readyzand/api/v1/readyzendpoints are properly implemented intldw_Server_API/app/api/v1/endpoints/health.py(lines 74–75) and registered inmain.py. The "lightweight first" ordering is sound: liveness checks (/healthz) probe only queue depth, readiness checks (/readyz) add DB connectivity, and aggregate health endpoints (/health) perform comprehensive dependency checks. The 206 status code is correctly used to indicate "degraded" state in aggregate health endpoints and properly validated in the health check logic (line 156).tldw_Server_API/app/api/v1/endpoints/media.py (1)
716-731: Remove unreachablereturn Noneand inline example from validator
_validate_identifier_queryalready returns/raises above. The strayreturn Noneand long inline “How to call” example inside the function body are dead/unreachable and confuse readers.- return None # Cache miss -# --- How to call this function --- -# You would now need to call it from within another async function: -# ... -# # asyncio.run(some_other_async_function())[ suggest_recommended_refactor ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/ci.yml (1)
132-143:⚠️ Fixture discovery check still suppresses pytest errors with|| trueon line 137.The previous review flagged this as a critical issue (marked addressed in commit 16e1ddc), but the error suppression persists in the current code. The
|| truemasks pytest collection failures (import errors, conftest issues, syntax errors), making it impossible to distinguish a genuinely missing fixture from a broken test infrastructure.Removing
|| truewill ensure pytest collection errors fail the step explicitly:- FIXTURES_OUTPUT="$(pytest --fixtures -q tldw_Server_API/tests || true)" + FIXTURES_OUTPUT="$(pytest --fixtures -q tldw_Server_API/tests)"The subsequent
rg/grepcheck will still catch a missing fixture, but now real pytest issues won't be silently masked.
🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)
206-223: Single-user smoke startup configuration relaxed for faster, less brittle testing.The changes introduce relaxation flags (DISABLE_HEAVY_STARTUP: '0' → DEFER_HEAVY_STARTUP: '1', ULTRA_MINIMAL_APP: '1', PRIVILEGE_METADATA_VALIDATE_ON_STARTUP: '0') to speed up startup. This is a reasonable optimization for smoke tests, but ensure that deferring and skipping validation steps doesn't mask real initialization errors that should be caught in CI.
Consider documenting why each flag is necessary (e.g., is PRIVILEGE_METADATA_VALIDATE_ON_STARTUP truly safe to skip in smoke tests?).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/ci.yml(4 hunks)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
.github/workflows/ci.yml (3)
91-91: PostgreSQL health check user corrected to match POSTGRES_USER.Line 91 now correctly uses
pg_isready -U tldw_userto match the POSTGRES_USER variable set on line 85. This ensures consistency between the health check and the service configuration.
256-310: Multi-user smoke test (Postgres-backed) added with independent port and JWT configuration.New smoke test for multi-user mode (lines 256–310) properly isolates from single-user on a separate port (8001) and includes:
- JWT_SECRET_KEY with sufficient length (>=32 chars as per security best practice)
- DATABASE_URL bound to the Actions Postgres service port
- Consistent startup relaxation flags
- 180s health-check timeout with configurable override via env var
Structure and configuration look sound. Health-check timeout aligns with single-user (default 180s vs single-user's implicit default), and lifecycle steps (start/health-check/log/stop) are consistent.
152-160: DATABASE_URL and TEST_DATABASE_URL exported for Postgres tests.Lines 158–159 export both DATABASE_URL and TEST_DATABASE_URL pointing to the Postgres service. Verify that all multi-user tests and fixtures correctly read from DATABASE_URL and that the TEST_DATABASE_URL export doesn't introduce confusion between single-user and multi-user modes downstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Docs/Published/API-related/Chunking_Templates_API_Documentation.md (1)
155-210: Resolve duplicate heading violation.The document contains two "#### Response" headings under the Create Template section (lines 155 and 191), which violates the markdown linting rule MD024 (no-duplicate-heading). The second occurrence at line 191 should be renamed to differentiate it from the first response block.
Suggested fix:
-#### Response +#### Response (hierarchical example)Alternatively, reorganize the hierarchical example as a separate subsection with its own heading structure.
♻️ Duplicate comments (1)
.github/workflows/ci.yml (1)
132-145:⚠️ The error-suppression issue from the previous review remains unaddressed.Line 137 still contains
|| true, which suppresses pytest collection failures (import errors, conftest issues, syntax errors). While the past review marked this as addressed in commit 16e1ddc, the suppression is still present. This masks real pytest issues and makes CI debugging harder.The
FIXTURES_OUTPUTvariable will contain empty or partial output if pytest fails, and the downstreamrg/grepcheck will only report "fixture not found" without revealing the underlying collection error.Recommend removing
|| trueto let pytest collection failures fail the step explicitly:- FIXTURES_OUTPUT="$(pytest --fixtures -q tldw_Server_API/tests || true)" + FIXTURES_OUTPUT="$(pytest --fixtures -q tldw_Server_API/tests)"This preserves the
set -euo pipefailbehavior and ensures CI fails loudly on pytest errors.
🧹 Nitpick comments (1)
Docs/API-related/Chunking_Templates_API_Documentation.md (1)
174-207: Excellent addition of hierarchical chunking example.The new example (lines 174–207) effectively demonstrates how to create a template with hierarchical boundaries for chapters and appendices. It's well-structured, includes clear boundary definitions, and the accompanying notes (lines 204–206) correctly align with the validation constraints documented elsewhere in the file (e.g., lines 414–417). The formatting and curl syntax are consistent with other examples in the documentation.
Optional enhancement: Consider briefly explaining why hierarchical chunking is useful (e.g., "preserves document structure by respecting chapter/appendix boundaries") to help users understand when to use this feature. This is a nice-to-have and could be deferred.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.github/workflows/ci.yml(4 hunks)Docs/API-related/Chunking_Templates_API_Documentation.md(1 hunks)Docs/Published/API-related/Chunking_API_Documentation.md(1 hunks)Docs/Published/API-related/Chunking_Templates_API_Documentation.md(2 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
Docs/Published/API-related/Chunking_API_Documentation.md
9-9: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
45-45: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
75-75: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
147-147: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
160-160: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
182-182: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
197-197: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
210-210: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
271-271: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
322-322: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Docs/Published/API-related/Chunking_Templates_API_Documentation.md
191-191: Multiple headings with the same content
(MD024, no-duplicate-heading)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (4)
.github/workflows/ci.yml (4)
91-91: Fix: PostgreSQL health check now correctly specifies user and database.The updated health check command properly aligns with the service configuration (matching
POSTGRES_USER: tldw_userandPOSTGRES_DB: tldw_content), improving reliability of the health probe.
139-145: Fixture pattern improvement: word boundary matching is more robust.The change from
'^test_user_credentials$'to'^test_user_credentials\b'correctly handles pytest's fixture output format, which includes scope and location info (e.g.,test_user_credentials [session scope] -- path:line). This is a sound improvement over the strict end-of-line anchor.
215-221: Smoke test environment relaxations improve speed and reduce brittleness.The changes to the single-user smoke test are well-motivated:
DISABLE_HEAVY_STARTUP: '0'(enable startup) +DEFER_HEAVY_STARTUP: '1'(but defer it) balances coverage with speed.ULTRA_MINIMAL_APP: '1'reduces surface area for CI smoke tests.PRIVILEGE_METADATA_VALIDATE_ON_STARTUP: '0'skips strict validation, appropriate for ephemeral smoke runs.MCP_DEBUG: '1'improves observability if smoke tests fail.Verify that these env vars are recognized and properly respected by the application startup code.
258-312: Multi-user smoke test adds valuable coverage for multi-user workflows.This new test section properly:
- Runs on a different port (8001) to avoid conflicts with single-user smoke.
- Uses
AUTH_MODE: multi_userand Postgres backend (DATABASE_URLpointing to the test Postgres service).- Includes a strong, ephemeral JWT secret (32+ chars as required).
- Mirrors single-user smoke structure (start → health-check → log-on-failure → stop) for consistency.
- Extends health-check timeout to 180s for multi-user setup robustness.
Verify that the
DATABASE_URLformat is correct for your application's connection string parser and that multi-user mode properly recognizes and uses theJWT_SECRET_KEY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/ci.yml (1)
132-147: Remove|| truefrom pytest fixture discovery to fail explicitly on collection errors.Line 137 still suppresses pytest exit codes via
|| true, masking actual collection failures (import errors, conftest issues, syntax errors). The temp file workaround on lines 139–147 addresses SIGPIPE concerns but doesn't solve the root issue. If pytest fails to collect fixtures,FIXTURES_OUTPUTbecomes empty/partial, and the subsequent grep/rg check reports "fixture not found" without revealing the real problem.Previous review flagged this as critical; removing
|| trueallows pytest errors to fail the step explicitly:- FIXTURES_OUTPUT="$(pytest --fixtures -q tldw_Server_API/tests || true)" + FIXTURES_OUTPUT="$(pytest --fixtures -q tldw_Server_API/tests)"The fixture check will still catch a genuinely missing fixture, but pytest collection errors won't be silently swallowed.
🧹 Nitpick comments (1)
tldw_Server_API/app/main.py (1)
3335-3348: Presence guards correctly prevent NameError in ULTRA_MINIMAL_APP mode.The guards checking
if 'router_name' in locals():before including routers are correct and necessary for ULTRA_MINIMAL_APP scenarios where these routers aren't imported. The pattern safely degrades when routers are unavailable.Minor suggestion for consistency and robustness: Consider following the pattern used for
tools_routerat line 3339-3340, which checks both existence and None:if 'router_name' in locals() and router_name is not None:. While not critical for these routers (since they don't explicitly get set to None on import failure), it would provide additional safety and consistency.Optional improvement: Add a brief comment explaining why these specific routers require runtime guards, e.g.:
# Guard optional routers that may not be imported in ULTRA_MINIMAL_APP if 'chat_router' in locals(): _include_if_enabled("chat", chat_router, prefix=f"{API_V1_PREFIX}/chat")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/ci.yml(5 hunks)tldw_Server_API/app/main.py(1 hunks)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
.github/workflows/ci.yml (4)
91-91: PostgreSQL health check command correctly updated to match credentials and database.The health check now explicitly verifies the intended user and database, improving reliability. Aligns with
POSTGRES_USER: tldw_userandPOSTGRES_DB: tldw_contentconfigured in the service.
217-224: Smoke startup flags updated for faster, lighter smoke testing.The changes strategically adjust startup behavior: enabling deferred startup (line 218), adding trace/debug output (lines 219–220), running an ultra-minimal app (line 222), and skipping strict privilege validation (line 224). These work together to reduce startup time and complexity for smoke tests while preserving diagnostics. This aligns with the PR goal of refining E2E workflows as canaries.
234-234: Smoke startup timeout explicitly configured.The
SMOKE_STARTUP_TIMEOUT_SECONDS: '240'(4 minutes) provides explicit control over health check tolerance. Consistent with the multi-user smoke setup (line 293) and sufficient for startup in CI environments.
262-319: Multi-user smoke (Postgres-backed) section well-structured and properly isolated.The new multi-user smoke tests correctly:
- Use separate port (8001) and label (smoke-mu) to avoid conflicts
- Configure Postgres for auth (DATABASE_URL, JWT_SECRET_KEY)
- Keep content backend on SQLite (line 278) to avoid PG content migrations
- Apply similar startup flags and trace/debug options for consistency
- Include complete lifecycle (start, health-check, log-on-failure, stop)
The 240-second timeout (line 293) matches single-user, and the strong JWT secret is appropriate for ephemeral CI use. This setup properly validates multi-user workflows as intended in the E2E test refinement goal.
tldw_Server_API/app/main.py (1)
3090-3090: Verify intentional disabling of minimal test app routing block.The condition
if _MINIMAL_TEST_APP and False:at line 3090 makes the entire block (lines 3090-3277) unreachable due to theand Falseclause. This appears to intentionally disable a large routing configuration block for minimal test mode.Is this
and Falseintentional (staged feature being developed) or leftover debug code that should be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tldw_Server_API/app/main.py (2)
3336-3374: Consider using feature flags instead oflocals()checks for consistency and maintainability.The pattern
if 'router_name' in locals():used to guard router inclusion is fragile for several reasons:
- String literals are refactoring-hostile: Variable renames won't update string checks, causing silent failures
- Inconsistent with established pattern: Lines 343-357 define
_HAS_*feature flags (e.g.,_HAS_AUDIO,_HAS_MEDIA) which are set during import; this segment mixeslocals()checks with flag checks (line 3353)- Redundant safety: Routers are already guarded at import time (lines 390-578) with try/except blocks; if import failed, the variable won't exist—
locals()checks provide minimal additional valueRecommend consolidating on the feature flag pattern used elsewhere:
-if 'chat_router' in locals(): - _include_if_enabled("chat", chat_router, prefix=f"{API_V1_PREFIX}/chat") +if _HAS_CHAT: # Set during import phase + _include_if_enabled("chat", chat_router, prefix=f"{API_V1_PREFIX}/chat")This requires defining additional flags (e.g.,
_HAS_CHAT = Falsenear line 357) and setting them during the import phase, but yields a more maintainable and consistent codebase.
3355-3370: Optionally narrow exception handling for better diagnostics.Static analysis flags broad
Exceptioncatching at lines 3359 and 3369. While this is likely intentional to prevent startup failures when optional routers (outputs, connectors) are unavailable, you could improve debuggability by catching expected exceptions:try: from tldw_Server_API.app.api.v1.endpoints.outputs import router as _outputs_router _include_if_enabled("outputs", _outputs_router, prefix=f"{API_V1_PREFIX}", tags=["outputs"]) -except Exception as _e: +except (ImportError, AttributeError) as _e: logger.warning(f"Outputs endpoint not available: {_e}")This preserves fail-open behavior for missing dependencies while allowing unexpected exceptions (e.g., logic errors) to surface during development. If you need to catch all exceptions for stability (common in production startup sequences), consider adding a comment justifying the broad catch to suppress linter warnings and document intent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/ci.yml(5 hunks)tldw_Server_API/app/main.py(3 hunks)
🧰 Additional context used
🪛 Ruff (0.14.4)
tldw_Server_API/app/main.py
3359-3359: Do not catch blind exception: Exception
(BLE001)
3369-3369: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
🔇 Additional comments (4)
.github/workflows/ci.yml (4)
91-91: Improved PostgreSQL readiness check with explicit database reference.The health check now validates the specific
tldw_contentdatabase is ready (via-d tldw_content), not just that the user can connect. This reduces false positives and aligns with the service configuration.
132-147: Fixture verification step uses temp file to prevent SIGPIPE, but error suppression remains.The improvements to the fixture discovery check (temp file handling, better regex) are good. However, Line 137 still uses
|| true, which suppresses actual pytest collection failures (import errors, conftest issues, syntax errors). If pytest fails during fixture collection,FIXTURES_OUTPUTbecomes empty or partial, and the subsequent check simply reports "fixture not found" without revealing the real problem.Consider removing
|| trueto fail CI explicitly on pytest collection errors:- FIXTURES_OUTPUT="$(pytest --fixtures -q tldw_Server_API/tests || true)" + FIXTURES_OUTPUT="$(pytest --fixtures -q tldw_Server_API/tests)"The temp file + regex check will still catch genuinely missing fixtures, but now pytest collection errors won't be silently swallowed.
217-234: Single-user smoke startup configuration enables deferred heavy startup with tracing and extended timeout.The changes enable more realistic startup behavior (
DISABLE_HEAVY_STARTUP: '0') while deferring heavy initialization (DEFER_HEAVY_STARTUP: '1'). Debug flags and extended timeout (240s) are appropriate for E2E smoke testing. Disabling privilege validation in smoke is reasonable since the smoke test isn't testing auth specifics. Step-level env overrides are correct rather than polluting the job-level environment.
262-319: Multi-user smoke tests (Postgres-backed) added with proper port isolation and SQLite content backend.The new multi-user smoke block is well-structured:
- Clean port separation: multi-user uses 8001, single-user uses 8000
- Ephemeral JWT secret (34+ chars) suitable for CI
- DATABASE_URL properly bound to Postgres service via job context
- SQLite content backend avoids PG content migrations, smart for ephemeral smoke
- Parallel startup pattern consistent with single-user smoke
Note: The health-check, log-printing, and stop steps are duplicated across single-user and multi-user blocks. This is maintainable for now, but consider centralizing common patterns if the blocks grow further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
New-User-Guide.md (1)
1-1: Resolve the WIP/NOT ACCURATE marking before merge.Documentation marked as incomplete should not be delivered to users. Either remove this marking after validation or defer this file's merge until the content is finalized and accurate.
♻️ Duplicate comments (1)
.github/workflows/ci.yml (1)
132-147: Critical: The|| trueon line 137 still suppresses pytest fixture collection errors.This issue was flagged in the prior review as a blocker. The pipe to
|| truemasks pytest errors (import errors, conftest issues, syntax errors) during fixture collection. If pytest fails to collect fixtures,FIXTURES_OUTPUTbecomes empty or partial, and the subsequentrg/grepcheck reports "fixture not found" without revealing the real problem.The temporary file approach (lines 139–147) handles SIGPIPE gracefully but does not address the root issue: pytest failures must fail the step explicitly.
Remove
|| trueso pytest collection errors propagate and fail the CI step:- FIXTURES_OUTPUT="$(pytest --fixtures -q tldw_Server_API/tests || true)" + FIXTURES_OUTPUT="$(pytest --fixtures -q tldw_Server_API/tests)" echo "Checking for 'test_user_credentials' fixture in e2e suite..." TMPFILE=$(mktemp) # Write to a temp file to avoid SIGPIPE issues with set -o pipefail when using -q matchers printf "%s\n" "$FIXTURES_OUTPUT" > "$TMPFILE"The
rg/grepcheck will still catch a genuinely missing fixture, but pytest collection errors will now fail loudly.This concern mirrors the past review comment; it appears the fix was not applied.
🧹 Nitpick comments (4)
New-User-Guide.md (2)
74-76: API key generation guidance is helpful; minor clarity nit on Option B.Option B mentions "once you have a working
.env" but a user following this guide for the first time may not yet have a working.env. Consider rephrasing to "re-run the initializer in future" or provide the command earlier as an alternative to manual generation.
244-244: Clarify Docker interactivity note for users unfamiliar with container shells.Line 244 notes the command is interactive and should be run "attached to the container," but doesn't explain what that means. Consider adding an example, e.g., "...attached to the container (the above
docker compose execcommand automatically handles this)."tldw_Server_API/tests/e2e/conftest.py (2)
136-158: LGTM! The rate limit toggle logic is correct.The fixture properly isolates tests marked with
rate_limitsby temporarily removing the environment variables that disable rate limiting, then restoring them afterward. The implementation handles missing variables gracefully.The cleanup logic could be slightly simplified since
os.environ.popwith a default is already safe:finally: - if original_test_mode is None: - os.environ.pop("TEST_MODE", None) - else: + if original_test_mode is not None: os.environ["TEST_MODE"] = original_test_mode - if original_testing is None: - os.environ.pop("TESTING", None) - else: + else: + os.environ.pop("TEST_MODE", None) + if original_testing is not None: os.environ["TESTING"] = original_testing + else: + os.environ.pop("TESTING", None)
130-134: Remove the redundant fixture—plugin already attaches results to config.Verification confirms the
_attach_results_to_configfixture is redundant. The plugin fixturetest_resultsalready attaches the results dict torequest.session.config._test_results(with error handling fallbacks). Sincerequest.configandrequest.session.configrefer to the same object in session scope, the conftest fixture merely re-assigns the same object to the same location.The
pytest_sessionfinishhook (line 163 ine2e_state_fixtures.py) reads fromsession.config._test_results, which is already populated by the plugin fixture. The conftest fixture can be safely removed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
.github/workflows/ci.yml(6 hunks)New-User-Guide.md(8 hunks)pyproject.toml(1 hunks)tldw_Server_API/app/core/Chunking/strategies/ebook_chapters.py(1 hunks)tldw_Server_API/app/core/Prompt_Management/prompt_studio/prompt_executor.py(1 hunks)tldw_Server_API/app/core/Web_Scraping/WebSearch_APIs.py(2 hunks)tldw_Server_API/tests/Resource_Governance/conftest.py(0 hunks)tldw_Server_API/tests/e2e/conftest.py(4 hunks)
💤 Files with no reviewable changes (1)
- tldw_Server_API/tests/Resource_Governance/conftest.py
✅ Files skipped from review due to trivial changes (1)
- tldw_Server_API/app/core/Chunking/strategies/ebook_chapters.py
🧰 Additional context used
🧬 Code graph analysis (1)
tldw_Server_API/tests/e2e/conftest.py (1)
tldw_Server_API/tests/_plugins/e2e_state_fixtures.py (1)
test_results(15-37)
🪛 Gitleaks (8.29.0)
New-User-Guide.md
[high] 97-99: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.
(curl-auth-header)
[high] 109-111: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.
(curl-auth-header)
[high] 211-213: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.
(curl-auth-header)
[high] 268-269: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.
(curl-auth-header)
🪛 Ruff (0.14.4)
tldw_Server_API/app/core/Web_Scraping/WebSearch_APIs.py
687-768: Possible SQL injection vector through string-based query construction
(S608)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (15)
.github/workflows/ci.yml (4)
91-91: Expand PostgreSQL health check command — approved.Adding the
-d tldw_contentflag topg_isreadyis sound; it checks connectivity to the specific database and makes the health check more comprehensive.
217-224: Clarify startup configuration pattern: DISABLE_HEAVY_STARTUP=0 + DEFER_HEAVY_STARTUP=1.The single-user smoke test now sets
DISABLE_HEAVY_STARTUP: '0'(enable heavy startup) while also settingDEFER_HEAVY_STARTUP: '1'(defer it). This is a different pattern than the earlierDISABLE_HEAVY_STARTUP: '1'. The addition ofMINIMAL_TEST_APP: '1'andPRIVILEGE_METADATA_VALIDATE_ON_STARTUP: '0'also changes startup behavior.Ensure this combination is intentional and that startup behaves as expected for the smoke test. If the intent is to defer heavy operations but still enable them, confirm that
DEFER_HEAVY_STARTUPis properly honored by the startup logic.Can you confirm:
- The purpose of enabling heavy startup (0) but deferring it (1)?
- Whether
MINIMAL_TEST_APPand the privilege validation toggle align with smoke test goals?- If similar changes are needed in the multi-user smoke section (lines 274–283)?
262-319: Multi-user smoke tests (Postgres-backed) — well-structured addition.The new multi-user smoke test block adds comprehensive coverage for multi-user auth + Postgres database backend. The structure mirrors the single-user path: start → health-check → log-on-failure → stop. A few observations:
- Port isolation: Multi-user runs on port 8001; single-user on 8000. Good separation.
- JWT secret: The hard-coded test secret (32+ chars) is appropriate for CI ephemeral environments.
- Content backend:
TLDW_CONTENT_DB_BACKEND: 'sqlite'keeps content (media/notes) on SQLite while using Postgres for auth, avoiding content migrations in smoke. Sensible trade-off.- Lifecycle management:
if: always()on health-check and stop ensures cleanup even on failure. Good practice.Verify that the multi-user startup logic respects all the environment variables (especially
JWT_SECRET_KEYandDATABASE_URL) without requiring additional manual setup or migrations.
1480-1485: Windows health check timeout — reasonable extension.Extended timeout from default to 900 seconds (15 minutes) for Windows runners is justified; Windows CI jobs often run slower. The conditional
if: ${{ always() && matrix.os == 'windows-latest' }}ensures this applies only to Windows.tldw_Server_API/app/core/Prompt_Management/prompt_studio/prompt_executor.py (1)
406-407: LGTM! Appropriate use of raw f-string for regex pattern.Using
rf"..."correctly ensures that backslash sequences like\sare treated as regex escapes rather than Python string escapes, preventingSyntaxWarningabout invalid escape sequences. This improves code clarity and maintainability.tldw_Server_API/app/core/Web_Scraping/WebSearch_APIs.py (2)
687-687: LGTM! Raw f-string correctly preserves LaTeX delimiters.Using
rf"..."for this prompt string is appropriate because it contains LaTeX markup (e.g.,\(,\),\[,\]) that requires literal backslashes. This prevents Python from interpreting these as escape sequences.Note: The static analysis hint (S608) flagging "SQL injection" for lines 687-768 is a false positive—this code constructs a prompt string for an LLM, not a SQL query.
749-750: LGTM! LaTeX formatting instructions clarified.The adjustments to the LaTeX examples improve clarity by showing the correct syntax for inline (
\(and\)) and block (\[and\]) formulas with proper citation formatting.pyproject.toml (1)
563-569: No issues found—all registered pytest plugins exist and are properly configured.The four newly registered plugins all exist in the codebase and do not create duplicate registrations:
tldw_Server_API.tests._plugins.e2e_fixturesand related plugins in_plugins/each contribute specific fixturestldw_Server_API.tests.AuthNZ.conftestdefines 10 distinct auth/JWT-related fixtures, whileauthnz_fixtures(line 556) defines a single schema fixture—they serve different purposesThe initial import test failure was due to the sandbox environment lacking the
logurudependency (which is a valid project dependency inpyproject.toml). In a properly configured environment with dependencies installed, all modules will import successfully.The naming pattern difference (three
_plugins/modules vs. oneAuthNZ/conftest) follows pytest convention for test suite–scoped configuration files and is intentional, not problematic.New-User-Guide.md (5)
97-99: Gitleaks flagged auth headers in curl examples; these are false positives.The static analysis tool detected API key placeholders in curl commands as potential leaked credentials. However, the guide explicitly instructs users to replace
CHANGE_ME_TO_SECURE_API_KEYwith actual keys. This is expected and safe for documentation. No action required, though you may consider using an even more obviously fake placeholder (e.g.,your-api-key-here) to reduce scanner noise in the future.Also applies to: 109-111, 211-213, 268-269
82-83: Interactive nature of AuthNZ.initialize is now well-documented; verify behavior matches.The guide states the command is "interactive" and advises users to "answer the prompts" and "run it again" if configuration issues occur. Ensure the actual implementation clearly indicates when input is needed and provides helpful prompts on re-runs. Consider adding an example of a typical prompt/response if the guide is meant for true beginners.
183-223: Documentation inaccuracy found: environment variable precedence is backwards.All three configuration mechanisms are correctly implemented and functional:
✅ 1.
default_apiin config.txt: Parsed inapp/core/config.pyandendpoints/llm_providers.pyfrom[API]section
✅ 2.DEFAULT_LLM_PROVIDERenv var: Actively used inendpoints/chat.pyandschemas/chat_request_schemas.py
✅ 3.GET /api/v1/llm/providersreturnsdefault_provider: Confirmed inendpoints/llm_providers.pyHowever, the documented priority is inverted. Section 4.6 claims the environment variable "overrides when
config.txtlacks a default," but the code reveals the opposite hierarchy in_get_default_provider_with_fallback()and the schema initialization:
- Config.txt
default_apitakes first priorityDEFAULT_LLM_PROVIDERenv var acts as fallback only (used when config.txt is absent)TEST_MODEas final fallbackThe documentation should state: "Alternatively, set an environment variable (used when
config.txtlacks a default)."
21-21: Python version requirement in New-User-Guide.md is accurate.The guide's statement (Python 3.10+ with 3.11–3.13 recommended/tested) aligns with the actual codebase:
- Minimum requirement
requires-python = ">=3.10"matches the guide's 3.10+- Classifiers list Python 3.11, 3.12, and 3.13 exactly match the guide's recommended/tested versions
- FastAPI 2025 guidance recommends Python >=3.10 with testing on 3.11 and 3.12, and the guide appropriately extends to include 3.13 based on the codebase's explicit classifier support
268-275: <!-- [scratchpad_start]
[task verify_media_endpoint_implementation COMPLETE]
[observations]
- AddMediaForm found in media_request_models.py
- Form fields verified:
- media_type: MediaType (required) ✓ matches guide
- title: Optional[str] ✓ matches guide
- keywords_str with alias="keywords" ✓ matches guide (accepts as "keywords" in form)
- perform_analysis: bool ✓ matches guide
- urls: Optional[List[str]] (not used in example but available)
- files: handled by FastAPI File() ✓ matches guide
- Endpoint path: /api/v1/media/add ✓ confirmed
- All field names in documentation are CORRECT
[analysis]
The guide documentation is ACCURATE. The curl example uses the correct:
- Endpoint path: /api/v1/media/add
- Field names: media_type, title, keywords, perform_analysis, files
- All parameters exist and match the implementation
[conclusion]
Review comment concern is INCORRECT. No issues found. Verification shows documentation is correct.[done]
Verification complete. No issues found with the endpoint or field names.
</thinking] -->No issues found — documentation is accurate.
I've verified the
/api/v1/media/addendpoint implementation against the guide documentation. The endpoint accepts all the parameters shown in the curl example:
- Endpoint path:
/api/v1/media/add✓- Field names:
media_type,title,keywords,perform_analysis,files✓- Form structure: Multipart form data with
-Fflags ✓The
AddMediaFormmodel intldw_Server_API/app/api/v1/schemas/media_request_models.pyconfirms these are valid parameters with the correct names. The fieldkeywords_struses an alias"keywords"to accept form data with the field namekeywords, matching the guide exactly.tldw_Server_API/tests/e2e/conftest.py (2)
44-45: LGTM! Clean marker registration.The
rate_limitsmarker is clearly documented and integrates well with the existing marker system and the_rate_limit_env_togglefixture below.
122-128: LGTM! The--run-slowoption integrates cleanly.The option pairs well with
--skip-slowto provide flexible control over slow test execution, with--run-slowcorrectly overriding--skip-slowwhen both are specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tldw_Server_API/tests/Notes_NEW/integration/test_notes_api.py (1)
273-273: Remove unusedmonkeypatchparameter.The
test_rate_limit_on_create_notefunction signature includesmonkeypatchbut doesn't use it anywhere in the test body (lines 274-287).Apply this diff:
-def test_rate_limit_on_create_note(client_with_notes_db: TestClient, monkeypatch): +def test_rate_limit_on_create_note(client_with_notes_db: TestClient):
♻️ Duplicate comments (2)
tldw_Server_API/tests/Collections/test_outputs_templates_api.py (1)
10-19: Same duplicate fixtures as test_items_and_outputs_api.py.These fixtures are identical to those in test_items_and_outputs_api.py and test_reading_highlights_api.py. See the review comment on test_items_and_outputs_api.py lines 18-27 for consolidation recommendation.
tldw_Server_API/tests/Collections/test_reading_highlights_api.py (1)
10-19: Same duplicate fixtures as other Collection test files.These fixtures are identical to those in test_items_and_outputs_api.py and test_outputs_templates_api.py. See the review comment on test_items_and_outputs_api.py lines 18-27 for consolidation recommendation.
🧹 Nitpick comments (6)
tldw_Server_API/tests/Resource_Governance/conftest.py (1)
18-18: Remove unnecessarynoqadirective.The
# noqa: F401comment is unnecessary because the F401 rule is not enabled in your Ruff configuration. While the logical intent (suppressing "imported but unused" warnings for re-exported fixtures) is sound, the directive has no effect.Apply this diff to clean up the unused directive:
-from tldw_Server_API.tests.AuthNZ.conftest import ( # noqa: F401 +from tldw_Server_API.tests.AuthNZ.conftest import ( setup_test_database, clean_database, test_db_pool,Based on static analysis.
tldw_Server_API/tests/Character_Chat/test_world_book_negatives_and_new_endpoint.py (1)
257-261: Consider tightening the sender verification to match the comment's claim.The comment states assistant messages are stored under the character's name, but the assertion only checks
sender != "user". This could match messages withNone, empty strings, or other unexpected sender values.To fully validate the behavior described, consider verifying that the sender matches the character's (sanitized) name.
You could fetch the character details and verify against its name:
# Fetch character to get expected sender name r_char = await client.get(f"/api/v1/characters/{character_id}", headers=headers) assert r_char.status_code == 200 expected_sender = r_char.json()["name"] # or sanitized version if needed # Then verify with specific sender name assert any( m.get("sender") == expected_sender and m.get("content") == assistant_content for m in msgs )tldw_Server_API/app/core/Resource_Governance/governor_redis.py (1)
1237-1294: Simplify boolean checks for readability.The per-category denial handling logic is correct, but line 1257 uses an unnecessarily convoluted boolean check:
if not bool(cat_info.get("allowed") is False): continueThis can be simplified to:
if cat_info.get("allowed") is not False: continueThe same pattern appears on line 1243. While functionally correct, the simpler form improves readability.
Apply this diff to simplify line 1257:
- if not bool(cat_info.get("allowed") is False): + if cat_info.get("allowed") is not False: continueConsider the same simplification on line 1243:
- if not bool(cat_info.get("allowed")): + if not cat_info.get("allowed"):tldw_Server_API/app/core/AuthNZ/jwt_service.py (1)
468-482: Duplicate JWTService singleton definitions should be consolidatedYou now define
_jwt_service,get_jwt_service, andreset_jwt_servicetwice (once here and again around lines 789–803). The later, module-level versions are the ones actually used, so this earlier block is redundant and risks future edits diverging.Consider removing this first block and keeping only the canonical definitions at the bottom of the file:
-_jwt_service: Optional["JWTService"] = None - - -def get_jwt_service() -> JWTService: - """Get JWT service singleton instance.""" - global _jwt_service - if not _jwt_service: - _jwt_service = JWTService() - return _jwt_service - - -def reset_jwt_service() -> None: - """Reset the cached JWTService singleton (used primarily in tests).""" - global _jwt_service - _jwt_service = NoneThis will avoid confusion and make future changes to the singleton behavior less error-prone.
tldw_Server_API/app/main.py (2)
506-538: Minimal app gating and collections router coverage look correct; adjust linting on broad exceptsThe new condition
if _MINIMAL_TEST_APP and not _ULTRA_MINIMAL_APP:keeps the “minimal” research/paper-search profile from running under ultra-minimal mode, which matches the intent of the _ULTRA_MINIMAL_APP gate above. The extended minimal-app router set (includingoutputs_templates,outputs,reading_highlights, anditems) is wired under/api/v1and guarded bytry/except, so minimal tests can now hit those endpoints without affecting full-app behavior.Ruff is flagging the new broad exception catches here (BLE001). Since these imports are intentionally best-effort, you might either narrow them to
ImportErroror explicitly mark them as intentional to keep lint noise down:- except Exception as _ot_min_err: + except Exception as _ot_min_err: # noqa: BLE001 @@ - except Exception as _outputs_min_err: + except Exception as _outputs_min_err: # noqa: BLE001 @@ - except Exception as _rh_min_err: + except Exception as _rh_min_err: # noqa: BLE001 @@ - except Exception as _items_min_err: + except Exception as _items_min_err: # noqa: BLE001Functionally this path looks consistent with the rest of the minimal-app design.
Also applies to: 3090-3193
3357-3396: Optional router guards are solid; keep tags for claims and align BLE001 handlingThe new
if '<router>' in locals()checks before_include_if_enabled(...)are a good safety net to avoidNameErrorwhen some routers weren’t imported (e.g., ULTRA/MINIMAL or optional modules). That pattern is consistent across chat, character, metrics, chunking, embeddings, vector-stores, claims, and media-embeddings.Two small polish items:
- Claims tag consistency – unlike neighboring calls, the
claims_routerinclusion no longer passes tags, which may drop these endpoints from the “claims” OpenAPI tag group:- if 'claims_router' in locals(): - _include_if_enabled("claims", claims_router, prefix=f"{API_V1_PREFIX}") + if 'claims_router' in locals(): + _include_if_enabled("claims", claims_router, prefix=f"{API_V1_PREFIX}", tags=["claims"])
- Ruff BLE001 on broad excepts – for the new/adjusted exception blocks in this area, you can mirror the rest of this file by annotating them as intentional or by narrowing to
ImportError:- except Exception as _e: + except Exception as _e: # noqa: BLE001 @@ - except Exception as _conn_e: + except Exception as _conn_e: # noqa: BLE001These tweaks keep docs grouping and static analysis aligned with the surrounding patterns.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
tldw_Server_API/app/core/AuthNZ/jwt_service.py(1 hunks)tldw_Server_API/app/core/Resource_Governance/governor_redis.py(2 hunks)tldw_Server_API/app/main.py(4 hunks)tldw_Server_API/tests/AuthNZ/conftest.py(2 hunks)tldw_Server_API/tests/Character_Chat/test_world_book_negatives_and_new_endpoint.py(1 hunks)tldw_Server_API/tests/Collections/test_items_and_outputs_api.py(1 hunks)tldw_Server_API/tests/Collections/test_outputs_templates_api.py(1 hunks)tldw_Server_API/tests/Collections/test_reading_highlights_api.py(1 hunks)tldw_Server_API/tests/Notes/test_auto_title_integration.py(1 hunks)tldw_Server_API/tests/Notes_NEW/integration/test_notes_api.py(3 hunks)tldw_Server_API/tests/Notes_NEW/integration/test_notes_graph_edges.py(3 hunks)tldw_Server_API/tests/Notes_NEW/integration/test_notes_graph_rbac.py(2 hunks)tldw_Server_API/tests/Resource_Governance/conftest.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
tldw_Server_API/tests/Collections/test_items_and_outputs_api.py (3)
tldw_Server_API/tests/Collections/test_outputs_templates_api.py (4)
preserve_app_state(11-13)reset_app_overrides(17-19)client_with_user(22-34)override_user(23-24)tldw_Server_API/tests/Collections/test_reading_highlights_api.py (4)
preserve_app_state(11-13)reset_app_overrides(17-19)client_with_user(22-34)override_user(23-24)tldw_Server_API/tests/Watchlists/test_watchlists_api.py (2)
client_with_user(18-32)override_user(19-20)
tldw_Server_API/tests/Collections/test_reading_highlights_api.py (2)
tldw_Server_API/tests/Collections/test_items_and_outputs_api.py (4)
preserve_app_state(19-21)reset_app_overrides(25-27)client_with_user(31-63)override_user(32-33)tldw_Server_API/tests/Collections/test_outputs_templates_api.py (4)
preserve_app_state(11-13)reset_app_overrides(17-19)client_with_user(22-34)override_user(23-24)
tldw_Server_API/tests/Collections/test_outputs_templates_api.py (2)
tldw_Server_API/tests/Collections/test_items_and_outputs_api.py (4)
preserve_app_state(19-21)reset_app_overrides(25-27)client_with_user(31-63)override_user(32-33)tldw_Server_API/tests/Collections/test_reading_highlights_api.py (4)
preserve_app_state(11-13)reset_app_overrides(17-19)client_with_user(22-34)override_user(23-24)
tldw_Server_API/tests/Resource_Governance/conftest.py (1)
tldw_Server_API/tests/AuthNZ/conftest.py (4)
setup_test_database(656-903)clean_database(907-947)test_db_pool(951-970)isolated_test_environment(322-652)
tldw_Server_API/tests/Notes/test_auto_title_integration.py (1)
tldw_Server_API/app/core/AuthNZ/User_DB_Handling.py (1)
get_request_user(416-708)
tldw_Server_API/tests/AuthNZ/conftest.py (1)
tldw_Server_API/app/core/DB_Management/Users_DB.py (1)
reset_users_db(664-667)
tldw_Server_API/app/core/AuthNZ/jwt_service.py (1)
tldw_Server_API/tests/AuthNZ_SQLite/test_quota_enforcement_http_sqlite.py (1)
_jwt_service(8-10)
tldw_Server_API/tests/Notes_NEW/integration/test_notes_graph_rbac.py (1)
tldw_Server_API/app/core/AuthNZ/User_DB_Handling.py (1)
get_request_user(416-708)
tldw_Server_API/app/core/Resource_Governance/governor_redis.py (5)
tldw_Server_API/app/core/Infrastructure/redis_factory.py (5)
evalsha(566-579)evalsha(713-715)get(511-513)get(681-683)get(779-781)tldw_Server_API/tests/Resource_Governance/test_governor_redis.py (1)
evalsha(153-154)tldw_Server_API/app/core/Resource_Governance/governor.py (2)
_parse_entity(208-213)_get_policy(197-205)tldw_Server_API/app/core/Metrics/metrics_manager.py (1)
increment(1303-1312)tldw_Server_API/app/core/Resource_Governance/metrics_rg.py (1)
_labels(109-130)
🪛 Ruff (0.14.4)
tldw_Server_API/tests/Collections/test_items_and_outputs_api.py
31-31: Unused function argument: tmp_path
(ARG001)
tldw_Server_API/tests/Resource_Governance/conftest.py
18-18: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
tldw_Server_API/app/main.py
3177-3177: Do not catch blind exception: Exception
(BLE001)
3182-3182: Do not catch blind exception: Exception
(BLE001)
3187-3187: Do not catch blind exception: Exception
(BLE001)
3192-3192: Do not catch blind exception: Exception
(BLE001)
3380-3380: Do not catch blind exception: Exception
(BLE001)
3390-3390: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/core/Resource_Governance/governor_redis.py
1235-1236: try-except-pass detected, consider logging the exception
(S110)
1235-1235: Do not catch blind exception: Exception
(BLE001)
1249-1250: try-except-pass detected, consider logging the exception
(S110)
1249-1249: Do not catch blind exception: Exception
(BLE001)
1270-1270: Do not catch blind exception: Exception
(BLE001)
1283-1284: try-except-pass detected, consider logging the exception
(S110)
1283-1283: Do not catch blind exception: Exception
(BLE001)
1289-1290: try-except-pass detected, consider logging the exception
(S110)
1289-1289: Do not catch blind exception: Exception
(BLE001)
1291-1292: try-except-continue detected, consider logging the exception
(S112)
1291-1291: Do not catch blind exception: Exception
(BLE001)
1293-1294: try-except-pass detected, consider logging the exception
(S110)
1293-1293: Do not catch blind exception: Exception
(BLE001)
1298-1299: try-except-pass detected, consider logging the exception
(S110)
1298-1298: Do not catch blind exception: Exception
(BLE001)
1319-1320: try-except-pass detected, consider logging the exception
(S110)
1319-1319: Do not catch blind exception: Exception
(BLE001)
1322-1324: try-except-pass detected, consider logging the exception
(S110)
1322-1322: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (14)
tldw_Server_API/tests/AuthNZ/conftest.py (1)
594-594: LGTM! Consistent singleton reset pattern.The addition of
reset_users_db()calls in theisolated_test_environmentfixture follows the established pattern for resetting singletons before and after each test. This ensures the UsersDB singleton is properly reset for per-test database isolation, consistent with other singleton resets already present in this fixture.Also applies to: 601-601, 629-629
tldw_Server_API/tests/Resource_Governance/conftest.py (2)
15-23: Good refactoring to explicit imports.Replacing
pytest_pluginswith explicit imports makes it clearer which fixtures are being re-exported from the AuthNZ test suite. This improves discoverability and maintainability.
28-28: LGTM! Improved docstring clarity.The simplified docstring is more concise while retaining the essential information about the fixture's purpose.
tldw_Server_API/tests/Character_Chat/test_world_book_negatives_and_new_endpoint.py (1)
248-250: LGTM: Improved clarity.Capturing
assistant_contentinto a variable before asserting improves readability and enables its reuse in the persistence verification below.tldw_Server_API/tests/Collections/test_items_and_outputs_api.py (1)
35-36: LGTM - Environment override ensures full app profile.Setting
MINIMAL_TEST_APP="0"before importing the app correctly ensures that collections/outputs endpoints are tested with the full app profile rather than a minimal configuration.tldw_Server_API/tests/Collections/test_outputs_templates_api.py (1)
26-29: LGTM - Deferred app import honors environment configuration.Setting the environment variable before the local import ensures the test uses the full app profile. The pattern of importing within the fixture scope is appropriate for test isolation.
tldw_Server_API/tests/Collections/test_reading_highlights_api.py (1)
26-29: LGTM - Consistent pattern with deferred app import.The environment override and local import pattern correctly ensures reading/collections endpoints are tested with the full app profile. The different user ID (321) and username ("reader") appropriately differentiate this test context from the other collection tests.
tldw_Server_API/app/core/Resource_Governance/governor_redis.py (3)
429-440: LGTM: Non-mutating check logic is sound.The change to only invoke the Lua helper when the window is full (count >= limit) ensures this path remains non-mutating for check-only scenarios. When the window is not full or when using the stub client, approximating retry_after via the oldest member's score is appropriate and avoids unnecessary script execution.
1225-1232: Correct logic: allows tokens to proceed to saturation.The change to only short-circuit on requests denials (rather than any denial) enables tokens category to continue to the reservation phase where saturation up to capacity is permitted. This handles the case where an initial over-limit token request can still be partially satisfied.
1302-1321: LGTM: Concurrency denial short-circuit is appropriate.This short-circuit path correctly avoids creating handles or acquiring leases when streams or jobs categories are denied. Since concurrency limits are binary (allow/deny with no partial fulfillment), returning early without a handle is the right approach.
tldw_Server_API/tests/Notes/test_auto_title_integration.py (1)
5-26: Fixture correctly forces full app profile and user overrideThe fixture looks well-structured: it forces
MINIMAL_TEST_APP=0/ULTRA_MINIMAL_APP=0, reloadsapp_mainso router gating re-evaluates with the new env, and overridesget_request_userwith an async stub user. UsingTestClientas a context manager and clearingdependency_overridesafterward prevents leakage into other tests. This setup should reliably exercise the full Notes API in integration tests.tldw_Server_API/tests/Notes_NEW/integration/test_notes_graph_edges.py (1)
7-8: Graph DB fixture correctly reloads full app profile and injects per-user DBThe updated
client_with_graph_dbfixture is well-structured:
- It forces
TEST_MODE,AUTH_MODE=multi_user, JWT settings, andMINIMAL_TEST_APP=0/ULTRA_MINIMAL_APP=0, then callsreset_settings()soget_settings()sees the new env.- It reloads
tldw_Server_API.app.mainafter the env tweaks so the router gating logic re-evaluates with the full app profile.- It overrides both
get_request_userandget_chacha_db_for_useron the reloadedfastapi_app, giving an authenticated user and a temp ChaChaNotes DB bound to this test.- It uses
TestClientas a context manager and clearsdependency_overridesplusreset_settings()afterward to avoid leaking state.This should give deterministic coverage of Notes + Notes Graph behavior against a temporary DB under the intended multi-user + JWT configuration.
Also applies to: 26-63
tldw_Server_API/tests/Notes_NEW/integration/test_notes_graph_rbac.py (1)
22-45: Focused Notes Graph RBAC app is isolated and consistent with AuthNZ settingsThe
client_with_user_overridefixture cleanly isolates the Notes Graph RBAC tests:
- It sets
TEST_MODE,AUTH_MODE=multi_user, and JWT settings, plusMINIMAL_TEST_APP=0/ULTRA_MINIMAL_APP=0, then callsreset_settings()soJWTService(get_settings())matches the env.- Instead of relying on the full
main.app, it constructs a smallFastAPI()instance and mounts onlynotes_graph_routerunderf"{API_V1_PREFIX}/notes", which keeps the surface minimal and focused.- It overrides
get_request_userwith a benign activeUser, lettingrequire_token_scopeenforce scope purely via the JWT claims.- It uses
TestClient(app)as a context manager, clearsdependency_overrides, and resets settings after the fixture, so later tests see a clean AuthNZ configuration.This pattern should give predictable, fast RBAC tests for the notes graph endpoints without coupling to the full app wiring.
tldw_Server_API/tests/Notes_NEW/integration/test_notes_api.py (1)
35-42: Module reload pattern is intentional and established—current isolation concerns are mitigated.Verification shows this pattern is used consistently across 18 test locations throughout the codebase (Notes_NEW, Notes, AuthNZ, Admin, sandbox, etc.) to force the app to reload with environment variables applied. While
importlib.reload()has side effects, pytest-xdist is not configured as a dependency and no parallel execution settings are found, so tests run sequentially by default.However, be aware: This pattern will break if parallel execution (pytest-xdist or similar) is enabled in the future. To future-proof this, consider adding a guard or warning if
pytestdetects concurrent execution, or documenting the limitation explicitly.
| @pytest.fixture(scope="session", autouse=True) | ||
| def preserve_app_state(): | ||
| # Override chat_fixtures.preserve_app_state to avoid early app import | ||
| yield | ||
|
|
||
|
|
||
| @pytest.fixture(autouse=True) | ||
| def reset_app_overrides(): | ||
| # Override chat_fixtures.reset_app_overrides for these tests | ||
| yield |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Consolidate duplicate fixtures into a shared conftest.
The preserve_app_state and reset_app_overrides fixtures are identical across test_items_and_outputs_api.py, test_outputs_templates_api.py, and test_reading_highlights_api.py. This violates DRY principles and makes maintenance harder.
Consider moving these fixtures to a shared conftest.py file in the Collections directory:
Create tldw_Server_API/tests/Collections/conftest.py:
import pytest
@pytest.fixture(scope="session", autouse=True)
def preserve_app_state():
# Override chat_fixtures.preserve_app_state to avoid early app import
yield
@pytest.fixture(autouse=True)
def reset_app_overrides():
# Override chat_fixtures.reset_app_overrides for these tests
yieldThen remove these fixtures from all three test files.
🤖 Prompt for AI Agents
In tldw_Server_API/tests/Collections/test_items_and_outputs_api.py lines 18-27
the fixtures preserve_app_state and reset_app_overrides are duplicated across
multiple test files; move these two fixtures into a shared
tldw_Server_API/tests/Collections/conftest.py (define the same fixtures there
with same scopes and autouse settings) and then remove the duplicate fixture
definitions from test_items_and_outputs_api.py, test_outputs_templates_api.py,
and test_reading_highlights_api.py so tests pick them up from the common
conftest.
tldw_Server_API/tests/Collections/test_items_and_outputs_api.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tldw_Server_API/tests/Evaluations/integration/test_api_endpoints.py (1)
296-298: Fix incorrect test marker.Line 296 marks the class with
@pytest.mark.unit, but the docstring at line 298 states "Integration tests for batch evaluation." This inconsistency could cause the tests to be incorrectly categorized or skipped.Since the module already has
pytestmark = [pytest.mark.integration, pytest.mark.evaluations]at line 12, you can either:Option 1 (recommended): Remove the decorator entirely and rely on the module-level marker:
-@pytest.mark.unit class TestBatchEvaluationEndpoint: """Integration tests for batch evaluation."""Option 2: Change to integration marker if you want it explicit:
-@pytest.mark.unit +@pytest.mark.integration class TestBatchEvaluationEndpoint: """Integration tests for batch evaluation."""tldw_Server_API/app/main.py (1)
3090-3193: Avoid double registration ofchat_routerin MINIMAL_TEST_APP and consider tightening minimal collections importsWithin the MINIMAL_TEST_APP routing block,
chat_routeris included twice:
- Line 3095:
app.include_router(chat_router, prefix=f"{API_V1_PREFIX}/chat")- Line 3123:
app.include_router(chat_router, prefix=f"{API_V1_PREFIX}/chat", tags=["chat"])This registers the same router twice on the same prefix, which is unnecessary and can lead to duplicate path operations in the app/router list and OpenAPI schema.
You can safely drop the first inclusion and rely on the tagged one:
@@ - # Include lightweight chat/character routes needed by tests - app.include_router(chat_router, prefix=f"{API_V1_PREFIX}/chat") - app.include_router(character_router, prefix=f"{API_V1_PREFIX}/characters", tags=["characters"]) + # Include lightweight chat/character routes needed by tests + app.include_router(character_router, prefix=f"{API_V1_PREFIX}/characters", tags=["characters"]) app.include_router(character_chat_sessions_router, prefix=f"{API_V1_PREFIX}/chats", tags=["character-chat-sessions"]) app.include_router(character_messages_router, prefix=f"{API_V1_PREFIX}", tags=["character-messages"])For the new minimal “collections” routers (
outputs_templates_router,outputs_router,reading_highlights_router,items_router):
- The try/except wrappers are consistent with the rest of the module, but they catch bare
Exception. If you want to align more closely with Ruff BLE001 and avoid swallowing unexpected runtime errors from those modules, consider catchingImportErrorinstead, while still logging and skipping missing optional endpoints.
🧹 Nitpick comments (6)
tldw_Server_API/tests/Notes_NEW/integration/test_notes_graph_rbac.py (2)
2-2: Remove unused import.The
importlibmodule is imported but never used in this test file.Apply this diff to remove the unused import:
-import importlib - import pytest
33-57: Improve stub consistency with actual implementation.The
_StubChaChaDB.create_manual_note_edgemethod has some inconsistencies with the actual implementation:
- The
metadataparameter uses typeobject, but the actual implementation usesOptional[Dict[str, Any]]- The return dict uses
"id"as the key instead of"edge_id"- Missing fields:
"created_at"and"type"While this stub may work for current tests, aligning it more closely with the actual implementation would improve maintainability and prevent issues if tests evolve to check response structure more thoroughly.
Apply this diff to improve consistency:
+from typing import Optional, Dict, Any + class _StubChaChaDB: def __init__(self, user_id: int) -> None: self.client_id = str(user_id) def create_manual_note_edge( self, *, user_id: str, from_note_id: str, to_note_id: str, directed: bool, weight: float, - metadata: object, + metadata: Optional[Dict[str, Any]], created_by: str, ) -> dict: return { - "id": "edge:test", + "edge_id": "edge:test", "user_id": user_id, "from_note_id": from_note_id, "to_note_id": to_note_id, + "type": "manual", "directed": directed, "weight": weight, + "created_at": "2025-01-01T00:00:00Z", "metadata": metadata, "created_by": created_by, }tldw_Server_API/tests/Evaluations/integration/test_ocr_pdf_points_backend_sglang_accuracy.py (1)
99-100: Consider using try-finally or a fixture for guaranteed cleanup.If an assertion fails before line 99, the dependency overrides won't be cleaned up, which could affect subsequent tests. While the app import pattern might mitigate this, using a try-finally block or a pytest fixture with yield would guarantee cleanup even when assertions fail.
Example with try-finally:
app.dependency_overrides[auth_deps.get_rate_limiter_dep] = _get_rl + try: - import pymupdf + import pymupdf - doc = pymupdf.open() + doc = pymupdf.open() # ... rest of test code ... - assert item["coverage"] >= thresholds["min_coverage"] - 1e-6 + assert item["coverage"] >= thresholds["min_coverage"] - 1e-6 + finally: + app.dependency_overrides.pop(eval_mod.verify_api_key, None) + app.dependency_overrides.pop(auth_deps.get_rate_limiter_dep, None) - - app.dependency_overrides.pop(eval_mod.verify_api_key, None) - app.dependency_overrides.pop(auth_deps.get_rate_limiter_dep, None)tldw_Server_API/tests/Evaluations/integration/test_ocr_pdf_dots_backend_integration.py (1)
13-103: Consider using try/finally for dependency override cleanup.The dependency overrides at lines 34 and 46 are cleaned up at lines 102-103, but if an exception occurs before that cleanup (e.g., during PDF generation or the API call), the overrides will persist and could affect subsequent tests.
Apply this pattern to ensure cleanup always happens:
def test_ocr_pdf_endpoint_with_dots_backend_integration(monkeypatch): # Skip unless dots_ocr is importable pytest.importorskip("dots_ocr") # Import app and endpoint module from tldw_Server_API.app.main import app from tldw_Server_API.app.api.v1.endpoints import evaluations_unified as eval_mod from tldw_Server_API.app.core.Evaluations.unified_evaluation_service import ( UnifiedEvaluationService, ) # Create a temporary DB for the service to avoid polluting the repo DB with tempfile.NamedTemporaryFile(suffix="_eval_test.db", delete=True) as dbf: service = UnifiedEvaluationService(db_path=dbf.name) # Force the endpoint module to use our service instance eval_mod._evaluation_service = service - # Override auth to bypass API key verification - async def _ok(*args, **kwargs) -> str: - return "test_user" - - app.dependency_overrides[eval_mod.verify_api_key] = _ok - - # Override rate limiter to always allow - from tldw_Server_API.app.api.v1.API_Deps import auth_deps - - class _DummyRateLimiter: - async def check_rate_limit(self, *args, **kwargs): - return True, {"retry_after": 0} - - async def _get_rl() -> Any: - return _DummyRateLimiter() - - app.dependency_overrides[auth_deps.get_rate_limiter_dep] = _get_rl - - # Generate a tiny single-page PDF in-memory with known text - import pymupdf - - doc = pymupdf.open() - page = doc.new_page() - page.insert_text((72, 72), "HELLO") - pdf_bytes = doc.tobytes() - doc.close() - - files = [ - ("files", ("test.pdf", pdf_bytes, "application/pdf")), - ] - - data = { - # Use JSON field to avoid multi-part list parsing complexities - "ground_truths_json": json.dumps(["HELLO"]), - "enable_ocr": "true", - "ocr_backend": "dots", - "ocr_mode": "always", - "ocr_dpi": "200", - "ocr_lang": "eng", - # keep metrics default - } - - # Exercise the endpoint - with TestClient(app) as client: - # Obtain CSRF cookie if middleware is active - headers = {} - try: - r0 = client.get("/api/v1/health") - token = r0.cookies.get("csrf_token") - if token: - headers["X-CSRF-Token"] = token - except Exception: - pass - - r = client.post( - "/api/v1/evaluations/ocr-pdf", - files=files, - data=data, - headers=headers, - ) - - assert r.status_code == 200, r.text - body = r.json() - assert isinstance(body, dict) - assert "evaluation_id" in body - assert "results" in body and isinstance(body["results"], dict) - # results["results"] should contain one entry for our single PDF - inner = body.get("results", {}) - assert isinstance(inner.get("results", []), list) - assert len(inner.get("results", [])) == 1 - - # Cleanup dependency overrides - app.dependency_overrides.pop(eval_mod.verify_api_key, None) - app.dependency_overrides.pop(auth_deps.get_rate_limiter_dep, None) + # Override auth to bypass API key verification + async def _ok(*args, **kwargs) -> str: + return "test_user" + + # Override rate limiter to always allow + from tldw_Server_API.app.api.v1.API_Deps import auth_deps + + class _DummyRateLimiter: + async def check_rate_limit(self, *args, **kwargs): + return True, {"retry_after": 0} + + async def _get_rl() -> Any: + return _DummyRateLimiter() + + try: + app.dependency_overrides[eval_mod.verify_api_key] = _ok + app.dependency_overrides[auth_deps.get_rate_limiter_dep] = _get_rl + + # Generate a tiny single-page PDF in-memory with known text + import pymupdf + + doc = pymupdf.open() + page = doc.new_page() + page.insert_text((72, 72), "HELLO") + pdf_bytes = doc.tobytes() + doc.close() + + files = [ + ("files", ("test.pdf", pdf_bytes, "application/pdf")), + ] + + data = { + # Use JSON field to avoid multi-part list parsing complexities + "ground_truths_json": json.dumps(["HELLO"]), + "enable_ocr": "true", + "ocr_backend": "dots", + "ocr_mode": "always", + "ocr_dpi": "200", + "ocr_lang": "eng", + # keep metrics default + } + + # Exercise the endpoint + with TestClient(app) as client: + # Obtain CSRF cookie if middleware is active + headers = {} + try: + r0 = client.get("/api/v1/health") + token = r0.cookies.get("csrf_token") + if token: + headers["X-CSRF-Token"] = token + except Exception: + pass + + r = client.post( + "/api/v1/evaluations/ocr-pdf", + files=files, + data=data, + headers=headers, + ) + + assert r.status_code == 200, r.text + body = r.json() + assert isinstance(body, dict) + assert "evaluation_id" in body + assert "results" in body and isinstance(body["results"], dict) + # results["results"] should contain one entry for our single PDF + inner = body.get("results", {}) + assert isinstance(inner.get("results", []), list) + assert len(inner.get("results", [])) == 1 + finally: + # Cleanup dependency overrides + app.dependency_overrides.pop(eval_mod.verify_api_key, None) + app.dependency_overrides.pop(auth_deps.get_rate_limiter_dep, None)tldw_Server_API/tests/Evaluations/integration/test_api_endpoints.py (1)
26-26: Remove redundant@pytest.mark.integrationdecorators.The class-level
@pytest.mark.integrationdecorators at lines 26, 244, 410, 466, 572, and 615 are redundant since the module-levelpytestmarkat line 12 already applies theintegrationmarker to all tests in this file.Apply this diff to remove the redundant decorators:
-@pytest.mark.integration class TestGEvalEndpoint:-@pytest.mark.integration class TestResponseQualityEndpoint:-@pytest.mark.integration class TestEvaluationHistoryEndpoint:-@pytest.mark.integration class TestWebhookEndpoints:-@pytest.mark.integration class TestRateLimitEndpoints:-@pytest.mark.integration class TestErrorHandling:Also applies to: 244-244, 410-410, 466-466, 572-572, 615-615
tldw_Server_API/app/main.py (1)
506-538: ULTRA_MINIMAL_APP still imports and exposes research/paper routes; verify intended behaviorThe
if _MINIMAL_TEST_APP and not _ULTRA_MINIMAL_APP:block means that whenULTRA_MINIMAL_APP=TrueandMINIMAL_TEST_APP=False, theelsebranch will still importresearch_router,paper_search_router, andprivileges_router, and later_include_if_enabled("research"/"paper-search", ...)will register those routes.This appears to conflict with the earlier “Ultra-minimal diagnostic mode: only import health endpoints” contract. If ULTRA mode is meant to truly expose only health endpoints (for fastest/least-side-effect diagnostics), consider gating this block (and the associated route inclusion) on
not _ULTRA_MINIMAL_APPas well, or adding a dedicatedif _ULTRA_MINIMAL_APP: ... else:structure here.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
tldw_Server_API/app/main.py(5 hunks)tldw_Server_API/tests/Evaluations/TEST_STRATEGY.md(1 hunks)tldw_Server_API/tests/Evaluations/integration/test_admin_cleanup.py(2 hunks)tldw_Server_API/tests/Evaluations/integration/test_api_endpoints.py(1 hunks)tldw_Server_API/tests/Evaluations/integration/test_batch_propositions_ocr.py(1 hunks)tldw_Server_API/tests/Evaluations/integration/test_ocr_pdf_dots_backend_integration.py(1 hunks)tldw_Server_API/tests/Evaluations/integration/test_ocr_pdf_dots_backend_vllm_accuracy.py(1 hunks)tldw_Server_API/tests/Evaluations/integration/test_ocr_pdf_points_backend_sglang_accuracy.py(1 hunks)tldw_Server_API/tests/Evaluations/integration/test_rate_limits_endpoint.py(1 hunks)tldw_Server_API/tests/Notes_NEW/integration/test_notes_graph_rbac.py(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- tldw_Server_API/tests/Evaluations/integration/test_rate_limits_endpoint.py
🧰 Additional context used
🧬 Code graph analysis (3)
tldw_Server_API/tests/Evaluations/integration/test_batch_propositions_ocr.py (2)
tldw_Server_API/tests/Evaluations/integration/test_admin_cleanup.py (1)
_setup_env(12-15)tldw_Server_API/tests/Evaluations/integration/test_rate_limits_endpoint.py (1)
_setup_env(12-15)
tldw_Server_API/tests/Notes_NEW/integration/test_notes_graph_rbac.py (3)
tldw_Server_API/app/core/AuthNZ/jwt_service.py (3)
JWTService(33-465)reset_jwt_service(479-781)reset_jwt_service(800-803)tldw_Server_API/app/api/v1/API_Deps/ChaCha_Notes_DB_Deps.py (1)
get_chacha_db_for_user(211-339)tldw_Server_API/app/core/DB_Management/ChaChaNotes_DB.py (1)
create_manual_note_edge(3254-3369)
tldw_Server_API/tests/Evaluations/integration/test_admin_cleanup.py (2)
tldw_Server_API/tests/Evaluations/integration/test_batch_propositions_ocr.py (1)
_setup_env(12-15)tldw_Server_API/tests/Evaluations/integration/test_rate_limits_endpoint.py (1)
_setup_env(12-15)
🪛 Ruff (0.14.4)
tldw_Server_API/app/main.py
3177-3177: Do not catch blind exception: Exception
(BLE001)
3182-3182: Do not catch blind exception: Exception
(BLE001)
3187-3187: Do not catch blind exception: Exception
(BLE001)
3192-3192: Do not catch blind exception: Exception
(BLE001)
3380-3380: Do not catch blind exception: Exception
(BLE001)
3390-3390: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (11)
tldw_Server_API/tests/Evaluations/TEST_STRATEGY.md (1)
259-273: Verify consistency between documented gating approach and actual implementation.The new "Evaluations Suite Gating" section references environment variables (
RUN_EVALUATIONS,MINIMAL_TEST_APP) and a marker (@pytest.mark.evaluations), but the PR's enriched summary indicates the actual implementation introduces a--run-slowCLI flag and arate_limitsmarker viaconftest.py.Please clarify and verify:
- Are the documented env vars (
RUN_EVALUATIONS,MINIMAL_TEST_APP) the actual mechanism, or should the docs reference--run-slowinstead?- Should the documented marker be
@pytest.mark.evaluationsor@pytest.mark.rate_limits?- If
@pytest.mark.evaluationsis correct, it should be added to the Test Markers list (lines 244–257) for consistency.tldw_Server_API/tests/Evaluations/integration/test_admin_cleanup.py (1)
8-9: Good addition of module-level test markers.The
pytestmarkdeclaration properly categorizes this integration test module, enabling selective test execution with-m integrationor-m evaluations. This aligns well with the PR's objective of organizing E2E and integration tests.tldw_Server_API/tests/Evaluations/integration/test_ocr_pdf_dots_backend_vllm_accuracy.py (1)
8-9: LGTM! Clean refactoring to module-level markers.The refactoring from per-test decorators to module-level
pytestmarkis correct and aligns with the PR's broader test marker reorganization. Since this file contains a single test, the behavior is semantically equivalent while improving consistency across the test suite.tldw_Server_API/tests/Evaluations/integration/test_batch_propositions_ocr.py (2)
14-15: LGTM! Consistent quote style with other test files.The change from single to double quotes aligns with the quote style used in other integration test files (e.g., test_admin_cleanup.py, test_rate_limits_endpoint.py).
8-9: LGTM! Module-level markers properly registered.The pytestmark applies
integrationandevaluationsmarkers to all tests in this module. Both markers are already registered in the pytest configuration:
integrationmarker:./tldw_Server_API/tests/Evaluations/conftest.pyevaluationsmarker:./tldw_Server_API/tests/conftest.pyThe code change is correct and aligns with the test reorganization objectives.
tldw_Server_API/tests/Evaluations/integration/test_ocr_pdf_points_backend_sglang_accuracy.py (2)
8-9: LGTM: Module-level markers improve consistency.The use of module-level
pytestmarkis a good practice that aligns this test with the pattern used in other Evaluations integration tests.
11-20: Helper function structure looks good.The
_sglang_available()function appropriately checks endpoint availability, and importinghttpxinside the function is acceptable for optional dependencies.tldw_Server_API/tests/Evaluations/integration/test_ocr_pdf_dots_backend_integration.py (1)
10-11: LGTM! Module-level markers correctly applied.The module-level
pytestmarkwith bothintegrationandevaluationsmarkers aligns with the PR's goal to properly categorize evaluation tests.tldw_Server_API/tests/Evaluations/integration/test_api_endpoints.py (1)
1-12: LGTM! Module-level markers and documentation updated correctly.The docstring now clearly indicates these are evaluation tests that require opt-in, and the
pytestmarklist properly includes bothintegrationandevaluationsmarkers.tldw_Server_API/app/main.py (2)
3357-3395: Guarded_include_if_enabledcalls for optional routers look correctThe new guards like:
if 'chat_router' in locals(): _include_if_enabled("chat", chat_router, prefix=f"{API_V1_PREFIX}/chat") ... if 'chunking_templates_router' in locals(): _include_if_enabled("chunking-templates", chunking_templates_router, prefix=f"{API_V1_PREFIX}", tags=["chunking-templates"]) ... if 'embeddings_router' in locals(): _include_if_enabled("embeddings", embeddings_router, prefix=f"{API_V1_PREFIX}", tags=["embeddings"]) ... if 'claims_router' in locals(): _include_if_enabled("claims", claims_router, prefix=f"{API_V1_PREFIX}")are a good safety improvement: they prevent
NameErrorwhen those routers weren’t imported in ULTRA/minimal deployments while preserving previous behavior when they are present.Using
locals()at module scope is fine here (equivalent toglobals()in practice); no functional issues spotted.
3420-3424: Notes graph routing order fix is appropriateIncluding the notes-graph router before the generic notes router to avoid
/graphbeing captured by a/{note_id}route is the right ordering:if _HAS_NOTES_GRAPH: _include_if_enabled("notes", notes_graph_router, prefix=f"{API_V1_PREFIX}/notes", tags=["notes"]) _include_if_enabled("notes", notes_router, prefix=f"{API_V1_PREFIX}/notes", tags=["notes"])This should prevent shadowing while keeping route gating consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tldw_Server_API/tests/conftest.py (1)
37-56: RUN_EVALUATIONS / MINIMAL_TEST_APP gating looks good, with a small robustness tweak possibleThe logic that:
- disables Evaluations routes and prefers a minimal app profile by default, and
- re-enables Evaluations routes and forces
MINIMAL_TEST_APP=0whenRUN_EVALUATIONSis setis consistent and matches the intent to keep heavy routes and imports out of general test runs.
One small edge case:
ROUTES_DISABLEmembership checks currently use substring search ("research" not in existing_disable,"evaluations" not in ",".join([_rd])). If you ever allow values where these names appear as substrings of other tokens, you could switch to token-based checks to avoid false positives. For example:- existing_disable = os.getenv("ROUTES_DISABLE", "") - if "research" not in existing_disable: - os.environ["ROUTES_DISABLE"] = (existing_disable + ",research").strip(",") + existing_disable = os.getenv("ROUTES_DISABLE", "") + disable_tokens = [p.strip().lower() for p in existing_disable.split(",") if p] + if "research" not in disable_tokens: + os.environ["ROUTES_DISABLE"] = (existing_disable + ",research").strip(",") @@ - if "evaluations" not in ",".join([_rd]): - os.environ["ROUTES_DISABLE"] = ((_rd + ",evaluations").strip(",")) + disable_tokens = [p.strip().lower() for p in _rd.split(",") if p] + if "evaluations" not in disable_tokens: + os.environ["ROUTES_DISABLE"] = ((_rd + ",evaluations").strip(","))Purely optional, but it makes the behavior more explicit if the list ever grows more complex.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tldw_Server_API/tests/conftest.py(3 hunks)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
tldw_Server_API/tests/conftest.py (3)
7-12: Clear rationale for centralizingpytest_pluginsThe expanded docstring nicely explains why shared plugins live in this top-level conftest and will help future maintainers understand the layout; no changes needed.
93-100: Suite-wide plugin registration is well-structuredDeclaring
pytest_pluginshere to cover the shared E2E/chat/media/postgres fixtures gives a single, obvious place for plugin wiring and avoids per-suite duplication; this structure looks solid.
233-290:bypass_api_limitsfixture correctly scopes and restores middleware & limiter stateThe fixture cleanly:
- Forces test-friendly env flags,
- Temporarily strips
RGSimpleMiddlewarefromapp.user_middlewareand rebuilds the stack, and- Disables any passed limiters while recording their prior
enabledstate for restoration infinally.Combined with
monkeypatch, this keeps side effects scoped to the test and avoids leaking changes across tests. The pattern looks appropriate for exercising rate-limited endpoints in E2E flows.
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py (2)
893-926: Strong defense-in-depth approach for HTML sanitization.The preprocessing step to remove script/style/noscript tags and comments before Bleach cleaning is excellent. Using BeautifulSoup as the primary approach with regex fallback provides good coverage.
However, consider logging exceptions instead of silently catching them. While graceful degradation is important for sanitization code, logging would help detect issues:
for tag_name in ('script', 'style', 'noscript'): for t in soup.find_all(tag_name): try: t.decompose() - except Exception: + except Exception as e: + logging.debug(f"decompose failed for {tag_name}, trying extract: {e}") try: t.extract() - except Exception: + except Exception as e2: + logging.warning(f"Failed to remove {tag_name} tag: {e2}") passBased on static analysis hints.
958-968: Consider simplifying fallback logic for clarity.The current fallback logic works but relies on checking if
preprocessedis in scope via try-except, which can be confusing. Consider initializingpreprocessedat the function start for clearer flow:def sanitize_html_content(self, html_content: str, config: Optional[Dict] = None) -> str: # Sanitizer using bleach with restricted protocols; otherwise strip tags. # Explicitly remove script/style/noscript contents and comments before cleaning to prevent code leakage. + preprocessed = html_content # Initialize with original content try: - preprocessed = html_content try: from bs4 import BeautifulSoup, Comment # type: ignore ... except Exception as e: logging.warning(f"Bleach not available or failed ({e}); falling back to tag stripping.") try: - # Prefer the preprocessed (script/style/comments removed) content if available - try: - _pre = preprocessed # type: ignore[name-defined] - except Exception: - _pre = html_content - return re.sub(r"<[^>]+>", " ", _pre) + return re.sub(r"<[^>]+>", " ", preprocessed) except Exception: - # As a last resort, return the original content return html_contentThis makes it explicit that
preprocessedis always defined and eliminates the nested try-except for variable access.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.4)
tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py
904-904: Do not catch blind exception: Exception
(BLE001)
907-908: try-except-pass detected, consider logging the exception
(S110)
907-907: Do not catch blind exception: Exception
(BLE001)
913-914: try-except-pass detected, consider logging the exception
(S110)
913-913: Do not catch blind exception: Exception
(BLE001)
916-916: Do not catch blind exception: Exception
(BLE001)
924-924: Do not catch blind exception: Exception
(BLE001)
952-954: try-except-pass detected, consider logging the exception
(S110)
952-952: Do not catch blind exception: Exception
(BLE001)
956-956: Consider moving this statement to an else block
(TRY300)
957-957: Do not catch blind exception: Exception
(BLE001)
963-963: Do not catch blind exception: Exception
(BLE001)
966-966: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
tldw_Server_API/app/core/Ingestion_Media_Processing/Upload_Sink.py (2)
928-931: LGTM: Expanded allowed tags support richer content safely.The addition of header elements (h1-h6), blockquote, code, and pre tags allows for richer HTML content while maintaining security. These semantic and formatting tags don't introduce risks when combined with the preprocessing that removes scripts and the attribute restrictions.
917-925: Regex fallback patterns verified and working correctly.Verification confirms all edge cases pass, including closing tags with spaces (
</script >), case variations, and tags with attributes. The regex patterns are properly constructed to handle common XSS vectors as a fallback when BeautifulSoup is unavailable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (3)
tldw_Server_API/app/core/AuthNZ/database.py (1)
38-76: Excellent consolidation of duplicated fallback logic!The new
_apply_single_user_fallbackfunction successfully eliminates the duplication flagged in the previous review. The defensive error handling is appropriate for ensuring robustness in this critical configuration path.Minor optimization: The local import on lines 49-50 is redundant since
get_settingsis already imported at the module level (line 20). Consider simplifying:- # Local import form retained for defensive use in non-pooled contexts. - from tldw_Server_API.app.core.AuthNZ.settings import ( # type: ignore - get_settings as _get_settings, - ) - - auth_mode_value = getattr(_get_settings(), "AUTH_MODE", "single_user") + auth_mode_value = getattr(get_settings(), "AUTH_MODE", "single_user")tldw_Server_API/app/api/v1/endpoints/media.py (2)
116-117: Dynamic search rate limit wiring is reasonable; consider future configurabilityDefining
_SEARCH_RATE_LIMITonce at import using_is_test_mode()and consuming it in@limiter.limit(_SEARCH_RATE_LIMIT)cleanly separates test vs non-test throttling and should avoid flaky 429s under load. If you later need per-environment tuning without redeploy, consider making this come from config instead of being evaluated at import time.Also applies to: 2405-2407
8691-9143: Redirect/URL download helper: good policy surface; minor simplifications possibleThe expanded
_download_url_asynclogic:
- cleanly adds
allow_redirectsand environment-driven redirect policy (HTTP_MAX_REDIRECTS,HTTP_ALLOW_REDIRECTS,HTTP_ALLOW_CROSS_HOST_REDIRECTS,HTTP_ALLOW_SCHEME_DOWNGRADE),- centralizes manual redirect resolution with scheme-downgrade and cross-host guards,
- keeps a clean separation between “caller-supplied client” vs “internal client” cases,
- and removes the previous
get_respTDZ bug by delegating body download back through_download_url_async.Two observations:
Redundant HEAD-path filename logic when
client is None
In theclient is Nonebranch you fully derivecandidate_name/target_pathfrom the lightweight_m_afetchresponse, then immediately call_download_url_asyncagain with a new client, which recomputes naming from scratch. That first derivation and the localtarget_pathare effectively dead code except for cleanup in the outerexcept. You could either:
- reuse
candidate_name/effective_suffixin the recursive call (e.g., pass a seed), or- drop the HEAD-derived naming and rely solely on the inner call, keeping the HEAD probe only for content-type checks.
Exception messages around redirect policy are quite verbose
SeveralValueErrorraises embed long user-facing messages (e.g., redirect policy explanations). If these bubble to API responses, they may be more detail than you want to expose and are harder to localize/standardize. Consider raising with shorter, standardized messages and logging the detailed reason.Neither is a blocker, but tightening these would simplify the function and address some of the static-analysis noise.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
Docs/Published/API-related/Chunking_API_Documentation.md(1 hunks)tldw_Server_API/app/api/v1/endpoints/media.py(17 hunks)tldw_Server_API/app/core/AuthNZ/database.py(3 hunks)tldw_Server_API/tests/Collections/conftest.py(1 hunks)tldw_Server_API/tests/Collections/test_items_and_outputs_api.py(1 hunks)tldw_Server_API/tests/Collections/test_outputs_templates_api.py(1 hunks)tldw_Server_API/tests/Collections/test_reading_highlights_api.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tldw_Server_API/tests/Collections/test_items_and_outputs_api.py (3)
tldw_Server_API/tests/Collections/test_outputs_templates_api.py (2)
client_with_user(11-23)override_user(12-13)tldw_Server_API/tests/Collections/test_reading_highlights_api.py (2)
client_with_user(11-23)override_user(12-13)tldw_Server_API/tests/Watchlists/test_watchlists_api.py (2)
client_with_user(18-32)override_user(19-20)
tldw_Server_API/app/api/v1/endpoints/media.py (4)
tldw_Server_API/app/core/DB_Management/backends/base.py (6)
BackendType(27-31)transaction(348-358)connection(283-285)backend(92-94)backend_type(330-332)DatabaseError(17-19)tldw_Server_API/app/core/DB_Management/Media_DB_v2.py (4)
get(155-156)transaction(1313-1399)execute_query(1054-1158)DatabaseError(100-102)tldw_Server_API/app/core/Utils/metadata_utils.py (1)
normalize_safe_metadata(18-82)tldw_Server_API/app/core/DB_Management/backends/postgresql_backend.py (3)
transaction(657-696)connection(202-207)backend_type(237-239)
tldw_Server_API/app/core/AuthNZ/database.py (2)
tldw_Server_API/app/core/AuthNZ/settings.py (1)
get_settings(844-883)tldw_Server_API/tests/DB_Management/test_media_db_fallback_warning.py (2)
warning(15-19)warning(72-73)
🪛 markdownlint-cli2 (0.18.1)
Docs/Published/API-related/Chunking_API_Documentation.md
271-271: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
322-322: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.4)
tldw_Server_API/app/api/v1/endpoints/media.py
1443-1443: Do not catch blind exception: Exception
(BLE001)
1553-1553: Do not catch blind exception: Exception
(BLE001)
1589-1590: try-except-pass detected, consider logging the exception
(S110)
1589-1589: Do not catch blind exception: Exception
(BLE001)
1759-1759: Do not catch blind exception: Exception
(BLE001)
1795-1796: try-except-pass detected, consider logging the exception
(S110)
1795-1795: Do not catch blind exception: Exception
(BLE001)
8739-8740: Abstract raise to an inner function
(TRY301)
8739-8740: Avoid specifying long messages outside the exception class
(TRY003)
8816-8816: Do not catch blind exception: Exception
(BLE001)
8836-8837: try-except-pass detected, consider logging the exception
(S110)
8836-8836: Do not catch blind exception: Exception
(BLE001)
8841-8841: Avoid specifying long messages outside the exception class
(TRY003)
8844-8844: Avoid specifying long messages outside the exception class
(TRY003)
8847-8847: Do not catch blind exception: Exception
(BLE001)
8850-8850: Do not catch blind exception: Exception
(BLE001)
8851-8851: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
8851-8851: Avoid specifying long messages outside the exception class
(TRY003)
8853-8853: Avoid specifying long messages outside the exception class
(TRY003)
8856-8856: Avoid specifying long messages outside the exception class
(TRY003)
8864-8865: try-except-pass detected, consider logging the exception
(S110)
8864-8864: Do not catch blind exception: Exception
(BLE001)
8881-8882: Abstract raise to an inner function
(TRY301)
8881-8882: Avoid specifying long messages outside the exception class
(TRY003)
tldw_Server_API/app/core/AuthNZ/database.py
57-57: Do not catch blind exception: Exception
(BLE001)
63-63: Do not catch blind exception: Exception
(BLE001)
72-73: try-except-pass detected, consider logging the exception
(S110)
72-72: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (11)
tldw_Server_API/tests/Collections/conftest.py (1)
1-14: LGTM! Fixture consolidation addresses past review feedback.The creation of this conftest.py successfully consolidates the
preserve_app_stateandreset_app_overridesfixtures that were previously duplicated across multiple test files. The emptyyieldstatements are intentional—these autouse fixtures override thechat_fixturesversions to prevent early app imports, ensuring proper test isolation for Collections tests.tldw_Server_API/tests/Collections/test_items_and_outputs_api.py (2)
23-24: LGTM! Full app profile enables comprehensive E2E testing.The addition of
MINIMAL_TEST_APP="0"ensures the full application profile is loaded for Collections/outputs endpoint testing, which is essential for end-to-end validation of user workflows.
26-51: LGTM! Proper fixture cleanup and state restoration.The fixture correctly:
- Sets up isolated test directories for DB writes
- Preserves and restores previous settings state
- Uses try-finally to ensure cleanup even on test failure
- Handles both cases where
prev_base_direxists or nottldw_Server_API/tests/Collections/test_reading_highlights_api.py (1)
15-23: LGTM! Consistent E2E test pattern with proper app initialization.The fixture correctly:
- Enables full app profile with
MINIMAL_TEST_APP="0"- Defers app import until after environment configuration
- Uses a distinct user ID (321) for test isolation
- Properly cleans up dependency overrides
tldw_Server_API/tests/Collections/test_outputs_templates_api.py (1)
15-23: LGTM! Consistent E2E test setup across Collections tests.The fixture follows the same pattern established in other Collections tests:
- Enables full app profile with
MINIMAL_TEST_APP="0"- Defers app import to respect environment configuration
- Applies user authentication override (User id=123)
- Ensures proper cleanup of dependency overrides
tldw_Server_API/app/core/AuthNZ/database.py (2)
136-144: LGTM! Fallback now applied in a single location.The initialization correctly applies
_apply_single_user_fallbackbefore passing the URL to_resolve_sqlite_paths, eliminating the duplication. The explicitauth_modeparameter ensures the fallback logic uses the correct configuration.
175-181: LGTM! Clear documentation of the updated contract.The updated docstring accurately documents that this method expects a URL that has already been processed by the single-user fallback logic. This clarity will help future maintainers understand the flow.
tldw_Server_API/app/api/v1/endpoints/media.py (4)
95-95: BackendType import usage looks correct and consistentImporting
BackendTypehere aligns with its use in backend-aware identifier upsert logic below; no issues.
1309-1318: Canonical identifier key normalization in metadata search is solidUsing
canonical_orderandnormalize_safe_metadatato prefer lowercase keys (doi,pmid,pmcid,arxiv_id,s2_paper_id) ensures filters query the canonical columns and behave consistently regardless of input casing/aliasing.
1437-1483: Identifier upsert inpatch_metadatais backend-aware with good error handlingThe updated in-place metadata patch path:
- bumps
versionandlast_modifiedinside a singledb.transaction()connection,- uses backend-aware SQL (
ON CONFLICTfor PostgreSQL,INSERT OR REPLACEfor SQLite-like backends),- and narrows the “skip index” path to
(sqlite3.OperationalError, DatabaseError)with a debug log, while treating other exceptions as real errors.This is a good balance between robustness and diagnosability.
1642-1645: Canonical ordering inget_by_identifiermatches metadata search behaviorReusing the same
canonical_orderapproach here keeps query semantics aligned with/metadata-search, which should reduce edge-case mismatches between these two lookup APIs.
| Discover runtime methods and defaults. | ||
| Example response (trimmed): | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add json language specifier to capabilities response code block.
Line 271 is missing a language identifier on the opening fence, violating markdown linting standard MD040. The block contains JSON response data and should be marked accordingly.
-```
+```json
{
"methods": ["words", "sentences", "paragraphs", "tokens", "semantic", "json", "xml", "ebook_chapters", "rolling_summarize", "structure_aware", "code", "code_ast"],🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
271-271: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In Docs/Published/API-related/Chunking_API_Documentation.md around line 271, the
fenced code block for the capabilities response is missing a language specifier
which triggers MD040; update the opening fence to include "json" (i.e., change
``` to ```json) so the block is properly highlighted and linted, ensuring the
JSON response block shows as JSON in rendered docs.
| ``` | ||
|
|
||
| Minimal OpenAPI schema (response) for stubbing | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add yaml language specifier to OpenAPI schema code block.
Line 322 is missing a language identifier on the opening fence, violating markdown linting standard MD040. The block contains an OpenAPI YAML schema and should be marked accordingly.
-```
+```yaml
openapi: 3.0.3🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
322-322: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In Docs/Published/API-related/Chunking_API_Documentation.md around line 322 the
fenced code block containing the OpenAPI YAML is missing a language specifier,
violating MD040; fix it by adding the yaml language identifier to the opening
fence (i.e., change the opening ``` to ```yaml) so the block is marked as YAML
and lints correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tldw_Server_API/app/api/v1/endpoints/media.py (2)
8876-8891: Don’t stream 4xx/5xx bodies; enforce status check before writingraise_for_status is currently swallowed, so 404/500 responses may be written to disk. Explicitly error on 4xx/5xx while preserving test stubs.
Apply:
- try: - get_resp.raise_for_status() - except Exception: - # Allow tests' minimal stubs that implement no-op raise_for_status - pass + # Enforce HTTP success while tolerating minimal stubs + if hasattr(get_resp, "raise_for_status"): + get_resp.raise_for_status() # Avoid hard dependency on status_code attribute for redirect detection in tests try: _sc = int(getattr(get_resp, "status_code", 200) or 200) except Exception: _sc = 200 + if 400 <= _sc: + raise ConnectionError(f"HTTP error {_sc} for {cur_url}") if 300 <= _sc < 400: # Should not happen because we resolved redirects just above, but guard anyway - raise ValueError( - f"Redirect encountered while downloading {cur_url}; redirects must be resolved pre-stream.") + raise ValueError(f"Redirect encountered while downloading {cur_url}; redirects must be resolved pre-stream.")
4964-4971: Fix attribute name fromoverlaptochunk_overlapat line 4970The form_data object inherits from ChunkingOptions which defines
chunk_overlap: int(notoverlap). Line 4970 references the non-existentform_data.overlapattribute, which will raiseAttributeErrorat runtime. The functiongenerate_embeddings_for_mediaexpects thechunk_overlapparameter.- chunk_overlap=form_data.overlap or 200 + chunk_overlap=form_data.chunk_overlap or 200
🧹 Nitpick comments (4)
tldw_Server_API/app/api/v1/endpoints/media.py (4)
1437-1483: Backend‑aware upsert + narrowed exceptions: nice hardeningCorrectly bumps version/last_modified, detects backend robustly, and avoids blind excepts. Consider extracting the repeated identifier upsert into a small helper to reduce duplication across endpoints.
1549-1593: Repeat of identifier upsert logic — factor outSame pattern as above; a shared helper like _upsert_identifiers(conn, dv_id, new_meta, db) would de‑duplicate and keep behavior consistent.
1758-1803: Repeat of identifier upsert logic — factor outSame refactor applies here for consistency and easier future changes.
8706-8714: Type hint nit: avoid FastAPI Path shadowingThis module imports FastAPI’s Path, so using Path in type hints can confuse readers/tools. Prefer the pathlib alias FilePath already used elsewhere.
Apply:
-async def _download_url_async( - client: Optional[httpx.AsyncClient], - url: str, - target_dir: Path, +async def _download_url_async( + client: Optional[httpx.AsyncClient], + url: str, + target_dir: FilePath, allowed_extensions: Optional[Set[str]] = None, # Use a Set for faster lookups check_extension: bool = True, # Flag to enable/disable check disallow_content_types: Optional[Set[str]] = None, # Optional set of content-types to reject for inference allow_redirects: bool = True, -) -> Path: +) -> FilePath:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tldw_Server_API/app/api/v1/endpoints/media.py(17 hunks)tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tldw_Server_API/app/api/v1/endpoints/media.py (5)
tldw_Server_API/app/core/DB_Management/backends/base.py (6)
BackendType(27-31)transaction(348-358)connection(283-285)backend(92-94)backend_type(330-332)DatabaseError(17-19)tldw_Server_API/app/core/DB_Management/Media_DB_v2.py (4)
get(155-156)transaction(1313-1399)execute_query(1054-1158)DatabaseError(100-102)tldw_Server_API/app/core/Utils/metadata_utils.py (1)
normalize_safe_metadata(18-82)tldw_Server_API/app/core/DB_Management/backends/postgresql_backend.py (3)
transaction(657-696)connection(202-207)backend_type(237-239)tldw_Server_API/app/core/DB_Management/backends/sqlite_backend.py (3)
transaction(237-262)connection(141-148)backend_type(180-182)
🪛 Ruff (0.14.4)
tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py
1028-1030: try-except-pass detected, consider logging the exception
(S110)
1028-1028: Do not catch blind exception: Exception
(BLE001)
1031-1033: try-except-pass detected, consider logging the exception
(S110)
1031-1031: Do not catch blind exception: Exception
(BLE001)
tldw_Server_API/app/api/v1/endpoints/media.py
1443-1443: Do not catch blind exception: Exception
(BLE001)
1553-1553: Do not catch blind exception: Exception
(BLE001)
1762-1762: Do not catch blind exception: Exception
(BLE001)
8747-8748: Abstract raise to an inner function
(TRY301)
8747-8748: Avoid specifying long messages outside the exception class
(TRY003)
8824-8824: Do not catch blind exception: Exception
(BLE001)
8844-8845: try-except-pass detected, consider logging the exception
(S110)
8844-8844: Do not catch blind exception: Exception
(BLE001)
8849-8849: Avoid specifying long messages outside the exception class
(TRY003)
8852-8852: Avoid specifying long messages outside the exception class
(TRY003)
8855-8855: Do not catch blind exception: Exception
(BLE001)
8858-8858: Do not catch blind exception: Exception
(BLE001)
8859-8859: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
8859-8859: Avoid specifying long messages outside the exception class
(TRY003)
8861-8861: Avoid specifying long messages outside the exception class
(TRY003)
8864-8864: Avoid specifying long messages outside the exception class
(TRY003)
8872-8873: try-except-pass detected, consider logging the exception
(S110)
8872-8872: Do not catch blind exception: Exception
(BLE001)
8889-8890: Abstract raise to an inner function
(TRY301)
8889-8890: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 140000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Full Suite (macos-latest / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.11)
- GitHub Check: Full Suite (Ubuntu / Python 3.12)
- GitHub Check: Full Suite (Ubuntu / Python 3.13)
- GitHub Check: Full Suite (windows-latest / Python 3.12)
- GitHub Check: Analyze (CodeQL) (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (6)
tldw_Server_API/app/core/Embeddings/Embeddings_Server/Embeddings_Create.py (2)
827-832: LGTM! Early attribute initialization prevents GC-time AttributeErrors.The explicit early initialization of
session,unload_timer,last_used_time, anddevice_providersensures that__del__can safely access these attributes even if initialization fails partway through. This addresses the timing issues flagged in previous reviews.
1021-1034: LGTM! Cleanup implementation correctly follows__del__best practices.This implementation properly addresses the finalizer issues raised in previous reviews by using a simple
__del__method that safely cancels the timer. The nestedtry-except-passblocks are appropriate here despite static analysis warnings because:
- Logging is unreliable during GC: The interpreter may be shutting down when
__del__runs, making logging facilities unavailable.- Broad exception handling is necessary: Any exception from
__del__can cause issues, so catchingExceptionand suppressing it is the correct pattern.- Session cleanup is intentionally omitted: ONNX Runtime's garbage collector handles session cleanup, as noted in previous reviews.
The static analysis warnings are false positives in this context.
tldw_Server_API/app/api/v1/endpoints/media.py (4)
116-118: Good fix: dynamic search rate limit at module scopeThis resolves the earlier decorator ordering SyntaxError and cleanly supports TEST_MODE. Looks good.
1309-1318: Canonical identifier normalization is solidPrefers lowercase keys in a deterministic order and reuses normalize_safe_metadata; reduces query skew.
2413-2415: Search endpoint limiter uses dynamic rate — goodDecorator now references _SEARCH_RATE_LIMIT and will be lenient in tests while staying tight in prod.
8703-8704: Expose DEFAULT_MAX_REDIRECTS via import aliasMakes redirect policy tunable and testable; good choice.
Title.
Point of this PR is to work through the end-to-end test workflows that simulate user workflows for common use-cases.
These act as advanced canaries for code quality/robustness, by ensuring that user workflows continue to work after changes to individual modules.
Summary by CodeRabbit
Tests
Documentation
New Features
Tweaks & Fixes