Possible enhancement for handling missing or invalid <title> tags in HTML front articles #463

@kazuhidelee

Inspired by issue 272. This issue proposes enhancing the robustness of how <title> tags are handled when generating ZIM files.

Currently, the parseAndAdaptHtml method extracts the <title> from HTML files and, if it is missing, falls back to generating one from the URL. However, a title can be technically present yet not meaningful (e.g. one consisting only of symbols or whitespace), which could cause issues with tools like the Kiwix suggestion system.

Proposed Improvements:

  1. Detect when the <title> is missing or consists of meaningless content and provide a fallback mechanism that generates a human-readable title from the URL or file name.
  2. Log a warning when a fallback title is generated.
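
To make the two proposed improvements concrete, here is a minimal sketch of the idea in Python. It is not the project's code: the function names (is_meaningful_title, title_from_url, resolve_title) and the logger name are hypothetical, and the rule "a title is meaningful if it contains at least one letter or digit" is only an assumption about what "meaningless" should mean here.

```python
import logging
import posixpath
from typing import Optional
from urllib.parse import unquote, urlsplit

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("title_fallback")


def is_meaningful_title(title: Optional[str]) -> bool:
    """Assumed rule: a title is meaningful if it has at least one letter or digit."""
    if title is None:
        return False
    # str.isalnum() is Unicode-aware, so non-Latin titles are accepted too.
    return any(ch.isalnum() for ch in title)


def title_from_url(url: str) -> str:
    """Build a readable title from the last path segment of a URL or file path."""
    path = unquote(urlsplit(url).path)               # decode %XX escapes
    segment = posixpath.basename(path.rstrip("/"))   # last path component
    stem, _ext = posixpath.splitext(segment)         # drop ".html" and similar
    words = stem.replace("_", " ").replace("-", " ").split()
    # If nothing usable is left, fall back to the raw URL.
    return " ".join(words) or url


def resolve_title(title: Optional[str], url: str) -> str:
    """Return the original title if meaningful, else a logged fallback."""
    if is_meaningful_title(title):
        return " ".join(title.split())  # collapse runs of whitespace
    fallback = title_from_url(url)
    logger.warning(
        "Missing or meaningless <title> for %s; using fallback %r", url, fallback
    )
    return fallback
```

With this sketch, a title such as "  ***  " for the path A/Some_article-name.html would be replaced by "Some article name" and a warning would be logged, while a title like "Hello World" would be kept. Whether symbol-only titles should always be rejected is a design choice that would need discussion.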