Refactor: Consolidate conversion code into marin/convert/ package #2008
This major refactoring consolidates all HTML/Markdown conversion functionality into a single, cohesive
marin.convert package, resolving architectural issues and improving code organization.
Changes
New Structure
- marin/convert/ package with a clean public API
  - config.py - Extraction configs (ExtractionConfig, TrafilaturaConfig, etc.)
  - html.py - HTML conversion functions (merged from web/convert.py + web/utils.py)
  - markdown.py - Markdown conversion utilities
  - _code_detection.py - Code language detection (internal)
  - data/ - Supporting files (model, xsl, languages.json)
Removed
- marin/schemas/web/ - Moved to convert/config.py
- marin/web/convert.py - Merged into convert/html.py
- marin/web/utils.py - Merged into convert/html.py
- marin/markdown/ - Moved to convert/markdown.py
Cleaned Up
- marin/web/ now only contains actual web utilities (rpv2.py, lookup_cc.py)
Updated Imports
Updated 23 files across:
All imports now use:
from marin.convert import ...
Benefits
Migration Guide
Old imports:
New imports:
Resolves discussion about marin/schemas organization and consolidates scattered conversion functionality.
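For downstream code that still imports from the removed paths, a small checker can help with the migration. The following is a minimal sketch, not part of this PR: the function name find_stale_imports and the MOVED_MODULES table are hypothetical, and the old-to-new mapping is inferred from the "Removed" list above. Whether each name is also re-exported from the top-level marin.convert package is an assumption to verify against the package's __init__.py.

```python
import ast

# Old module paths removed in this PR, mapped to the module that now holds
# their contents. Inferred from the "Removed" list; treat as an assumption.
MOVED_MODULES = {
    "marin.schemas.web": "marin.convert.config",
    "marin.web.convert": "marin.convert.html",
    "marin.web.utils": "marin.convert.html",
    "marin.markdown": "marin.convert.markdown",
}


def _match_moved(name):
    """Return (old, new) if `name` is a removed module or lives inside one."""
    for old, new in MOVED_MODULES.items():
        if name == old or name.startswith(old + "."):
            return old, new
    return None


def find_stale_imports(source):
    """Return sorted (line, old_module, suggested_module) for stale imports."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Cover both `from marin.web.convert import x`
            # and `from marin.web import convert`.
            candidates = [node.module]
            candidates += [f"{node.module}.{alias.name}" for alias in node.names]
        elif isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        else:
            continue
        for name in candidates:
            match = _match_moved(name)
            if match:
                hits.append((node.lineno, match[0], match[1]))
                break  # report each import statement at most once
    return sorted(hits)
```

Calling find_stale_imports on the text of a Python file returns one entry per import statement that still points at a removed path. Imports of modules that remain in marin/web/ (such as rpv2) are not reported. The sketch only inspects absolute imports and does not rewrite anything.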
Checklist
uv run python infra/pre-commit.py --all-files to lint/format your code