Inspired by issue 272. This issue proposes making the handling of <title> tags more robust when generating ZIM files.
Currently, the parseAndAdaptHtml method extracts the <title> from each HTML file and, if it is missing, falls back to generating one from the URL. However, a title can be present yet not meaningful (e.g. consisting only of symbols or whitespace), which could cause problems for tools such as the Kiwix suggestion system.
Proposed Improvements:
- Detect when the <title> is missing or consists of meaningless content and provide a fallback mechanism that generates a human-readable title from the URL or file name.
- Log a warning when a fallback title is generated.
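
To make the proposal concrete, here is a minimal sketch of the two improvements above. It is written in Python purely for illustration and is not the existing parseAndAdaptHtml code. The function names (`is_meaningful_title`, `title_from_url`, `resolve_title`) and the logger name are hypothetical. The rule that a title is "meaningful" if it contains at least one Unicode letter or digit is an assumption, since this issue does not yet define the criterion.

```python
import logging
import posixpath
import unicodedata
from urllib.parse import unquote, urlsplit

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("title_fallback")


def is_meaningful_title(title):
    """Return True if the title contains at least one letter or digit.

    Assumption: whitespace, punctuation and symbols alone do not make a
    usable title. Unicode general categories starting with "L" (letters)
    or "N" (numbers) are treated as meaningful.
    """
    if title is None:
        return False
    return any(unicodedata.category(ch)[0] in ("L", "N") for ch in title)


def title_from_url(url):
    """Derive a human-readable title from the last path segment of a URL."""
    # Drop query string and fragment, then decode percent-escapes.
    path = unquote(urlsplit(url).path)
    name = posixpath.basename(path.rstrip("/"))
    # Remove the file extension, if any (e.g. ".html").
    stem, _ext = posixpath.splitext(name)
    # Treat underscores and hyphens as word separators.
    words = stem.replace("_", " ").replace("-", " ").split()
    # If nothing usable remains, fall back to the raw URL.
    return " ".join(words) or url


def resolve_title(extracted_title, url):
    """Return the extracted title, or a logged fallback built from the URL."""
    if is_meaningful_title(extracted_title):
        return extracted_title.strip()
    fallback = title_from_url(url)
    logger.warning(
        "Missing or meaningless <title> for %s; using fallback %r", url, fallback
    )
    return fallback
```

With this sketch, `resolve_title("***", "articles/My_First-Post.html")` returns `"My First Post"` and logs a warning, while `resolve_title("  Hello  ", "a.html")` returns `"Hello"` without logging. Whether titles made only of digits, or very short titles, should also trigger the fallback is left open here.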