jsh9
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 9 additions & 0 deletions
diff --git a/docs/style_mismatch.md b/docs/style_mismatch.md
Lines changed: 36 additions & 33 deletions
diff --git a/muff.toml b/muff.toml
Lines changed: 3 additions & 0 deletions
diff --git a/pydoclint/utils/parse_docstring.py b/pydoclint/utils/parse_docstring.py
Lines changed: 129 additions & 33 deletions
@@ -1,5 +1,14 @@
 # Change Log
 
+## [0.8.1] - 2025-11-03
+
+- Changed
+  - The logic to detect docstring style mismatches, fixing a false positive
+    case where non-Sphinx style docstrings are detected as Sphinx style
+    (because there are some rST keywords in them)
+- Full diff
+  - https://github.com/jsh9/pydoclint/compare/0.8.0...0.8.1
+
 ## [0.8.0] - 2025-11-03
 
 - Added
 
@@ -7,11 +7,12 @@ ______________________________________________________________________
 **Table of Contents**
 
 - [1. How does _pydoclint_ detect the style of a docstring?](#1-how-does-pydoclint-detect-the-style-of-a-docstring)
-  - [1.1. Numpy-style pattern detection (enhanced detection)](#11-numpy-style-pattern-detection-enhanced-detection)
-  - [1.2. Fallback to size-based detection](#12-fallback-to-size-based-detection)
+  - [1.1. Keyword heuristics for each style](#11-keyword-heuristics-for-each-style)
+  - [1.2. Handling ambiguous or missing matches](#12-handling-ambiguous-or-missing-matches)
+  - [1.3. What happens after a mismatch is detected?](#13-what-happens-after-a-mismatch-is-detected)
 - [2. How accurate is this detection heuristic?](#2-how-accurate-is-this-detection-heuristic)
 - [3. Can I turn this off?](#3-can-i-turn-this-off)
-- [4. Is it much slower to parse a docstring in all 3 styles?](#4-is-it-much-slower-to-parse-a-docstring-in-all-3-styles)
+- [4. Is it much slower to parse a docstring with the heuristics?](#4-is-it-much-slower-to-parse-a-docstring-with-the-heuristics)
 - [5. What violation code is associated with style mismatch?](#5-what-violation-code-is-associated-with-style-mismatch)
 - [6. How to fix this violation code?](#6-how-to-fix-this-violation-code)
 
@@ -27,40 +28,40 @@ config option.
 
 _pydoclint_ detects the style of a docstring with this procedure:
 
-### 1.1. Numpy-style pattern detection (enhanced detection)
+### 1.1. Keyword heuristics for each style
 
-As of recent updates, _pydoclint_ first checks if the docstring contains
-numpy-style section headers with dashes. If it detects patterns like:
+We now rely on lightweight heuristics that look for style-specific keywords at
+the indentation level where the docstring begins:
 
-```
-Returns
--------
+- **NumPy**: section headers followed by dashed underlines (for example,
+  `Returns` + `-------`), using a curated list of keywords.
+- **Google**: top-level section headers such as `Args:`, `Returns:`, `Yields:`,
+  `Raises:`, `Examples:`, or `Notes:` with matching indentation.
+- **Sphinx/reST**: top-level field lists such as `:param`, `:type`, `:raises`,
+  `:return:`, `:rtype:`, `:yield:`, or `:ytype:`.
 
-Parameters
-----------
+Each helper only considers keywords that start at the same indentation level as
+the opening triple quotes to avoid counting inline roles or nested blocks.
 
-Examples
---------
-```
+### 1.2. Handling ambiguous or missing matches
 
-It immediately identifies the docstring as numpy-style and parses it
-accordingly, even if it may not be fully parsable as numpy style. This
-pattern-based detection looks for common section headers (Args, Arguments,
-Parameters, Returns, Yields, Raises, Examples, Notes, See Also, References)
-followed by 3 or more dashes on the next line.
+- **Exactly one match**: We parse the docstring using the detected style. If it
+  differs from the configured style, DOC003 is emitted. Google parse failures
+  are also treated as style mismatches because malformed Google sections almost
+  always indicate another style.
+- **No matches**: We assume the docstring uses the configured style and skip
+  style mismatch warnings entirely.
+- **Multiple matches**: The docstring appears to mix styles (for example, Google
+  `Args:` plus Sphinx `:param` fields), so we emit DOC003 regardless of the
+  configured style.
 
-### 1.2. Fallback to size-based detection
+### 1.3. What happens after a mismatch is detected?
 
-If no numpy-style patterns are detected, _pydoclint_ falls back to the original
-size-based detection:
-
-- It attempts to parse the docstring in all 3 styles: numpy, Google, and Sphinx
-- It then compares the "size" of the parsed docstring objects
-  - The "size" is a human-made metric to measure how "fully parsed" a docstring
-    object is. For example, a docstring object without the return section is
-    larger in "size" than that with the return section (all others being equal)
-- The style that yields the largest "size" is considered the style of the
-  docstring
+When DOC003 is triggered we still return a parsed docstring (in the detected
+style if exactly one style matched, otherwise in the configured style), but we
+suppress many follow-up checks that would otherwise generate cascading false
+positives (argument type-hint expectations, return/yield/raise consistency,
+etc.). This keeps the feedback focused on resolving the style mismatch first.
 
 ## 2. How accurate is this detection heuristic?
 
@@ -84,10 +85,12 @@ Actually, this style mismatch detection feature is by default _off_.
 You can turn this feature on by setting `--check-style-mismatch` (or `-csm`) to
 `True` (or `--check-style-mismatch=True`).
 
-## 4. Is it much slower to parse a docstring in all 3 styles?
+## 4. Is it much slower to parse a docstring with the heuristics?
 
-It is not. The authors of _pydoclint_ benchmarked some very large code bases,
-and here are the results (as of 2025/01/12):
+No. The new detection flow parses each docstring in a single style (the
+detected one, or the configured one as a fallback), so the cost is negligible.
+For reference, benchmarking large code bases (as of 2025/01/12) shows the
+overhead of style detection is only a few percent:
 
 |                              | numpy | scikit-learn | Bokeh | Airflow |
 | ---------------------------- | ----- | ------------ | ----- | ------- |
 
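The base-indentation rule described in section 1.1 of the updated docs can be illustrated in isolation. The following is a minimal sketch, not pydoclint's actual code: `base_indent`, `has_top_level_keyword`, and the two sample docstrings are hypothetical names introduced here, and the keyword tuples are copied from the constants added in this diff. The NumPy check is left out because its regular expression is not shown in this diff.

```python
# Keyword tuples copied from the constants added in this diff.
SPHINX_KEYWORDS = (
    ':param ', ':type ', ':raises ', ':return:', ':rtype:', ':yield:', ':ytype:',
)
GOOGLE_KEYWORDS = (
    'Args:', 'Returns:', 'Yields:', 'Raises:', 'Examples:', 'Notes:',
)


def base_indent(docstring: str) -> int:
    """Smallest indentation among non-blank lines (0 if there are none)."""
    indents = [
        len(line) - len(line.lstrip())
        for line in docstring.splitlines()
        if line.strip()
    ]
    return min(indents, default=0)


def has_top_level_keyword(docstring: str, keywords: tuple[str, ...]) -> bool:
    """Return True if any line at the base indentation starts with a keyword."""
    indent = base_indent(docstring)
    for line in docstring.splitlines():
        stripped = line.lstrip()
        if not stripped:
            continue  # skip blank and whitespace-only lines
        at_base = len(line) - len(stripped) == indent
        # str.startswith accepts a tuple of candidate prefixes.
        if at_base and stripped.startswith(keywords):
            return True
    return False


# Hypothetical Google-style docstring with a nested, Sphinx-looking line.
GOOGLE_DOC = """
    Add two numbers.

    Args:
        x (int): First operand.
            :param y: quoted from another docstring, nested deeper
    """

# Hypothetical Sphinx-style docstring.
SPHINX_DOC = """
    Add two numbers.

    :param x: First operand.
    :return: The sum.
    """

print(has_top_level_keyword(GOOGLE_DOC, GOOGLE_KEYWORDS))  # True
print(has_top_level_keyword(GOOGLE_DOC, SPHINX_KEYWORDS))  # False
print(has_top_level_keyword(SPHINX_DOC, SPHINX_KEYWORDS))  # True
print(has_top_level_keyword(SPHINX_DOC, GOOGLE_KEYWORDS))  # False
```

The nested `:param y:` line in `GOOGLE_DOC` sits deeper than the base indentation, so it is not counted as a Sphinx field. Treating such a line as a Sphinx marker is the kind of false positive the changelog entry describes.
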
@@ -1,9 +1,12 @@
 # Docs: https://docs.astral.sh/ruff/configuration
 
 exclude = ["tests/test_data"]
+fix = true
 line-length = 79
 output-format = "grouped"
+show-fixes = true
 target-version = "py310"
+unsafe-fixes = true
 
 [format]
 docstring-code-format = true
 
@@ -6,6 +6,25 @@
 
 from pydoclint.utils.doc import Doc
 
+_SPHINX_KEYWORDS = (
+    ':param ',
+    ':type ',
+    ':raises ',
+    ':return:',
+    ':rtype:',
+    ':yield:',
+    ':ytype:',
+)
+
+_GOOGLE_KEYWORDS = (
+    'Args:',
+    'Returns:',
+    'Yields:',
+    'Raises:',
+    'Examples:',
+    'Notes:',
+)
+
 
 def _containsNumpyStylePattern(docstring: str) -> bool:
     # Check if docstring contains numpy-style section headers with dashes.
@@ -31,6 +50,72 @@ def _containsNumpyStylePattern(docstring: str) -> bool:
     return bool(re.search(pattern, docstring, re.MULTILINE | re.IGNORECASE))
 
 
+def _containsSphinxStylePattern(docstring: str) -> bool:
+    """
+    Check if docstring contains Sphinx-style field lists at base indentation.
+
+    Only lines that have the same leading indentation as the docstring
+    definition (i.e., the opening triple quotes) count as valid Sphinx
+    directives. More deeply indented lines are ignored.
+    """
+    leadingIndent = _detectDocstringIndent(docstring)
+    for line in docstring.splitlines():
+        stripped = line.lstrip()
+        if stripped == '':
+            continue
+
+        currentIndent = len(line) - len(stripped)
+        if currentIndent != leadingIndent:
+            continue
+
+        for keyword in _SPHINX_KEYWORDS:
+            if stripped.startswith(keyword):
+                return True
+
+    return False
+
+
+def _containsGoogleStylePattern(docstring: str) -> bool:
+    """
+    Check if docstring contains Google-style section headers at base indent.
+    """
+    leadingIndent = _detectDocstringIndent(docstring)
+    for line in docstring.splitlines():
+        stripped = line.lstrip()
+        if stripped == '':
+            continue
+
+        currentIndent = len(line) - len(stripped)
+        if currentIndent != leadingIndent:
+            continue
+
+        for keyword in _GOOGLE_KEYWORDS:
+            if stripped.startswith(keyword):
+                return True
+
+    return False
+
+
+def _detectDocstringIndent(docstring: str) -> int:
+    """
+    Detect the leading indentation level of a docstring.
+
+    This approximates the column where the opening triple quotes are placed by
+    measuring the smallest indentation across non-empty lines.
+    """
+    indent: int | None = None
+    for line in docstring.splitlines():
+        stripped = line.lstrip()
+        if stripped == '':
+            continue
+
+        currentIndent = len(line) - len(stripped)
+        if indent is None or currentIndent < indent:
+            indent = currentIndent
+
+    return 0 if indent is None else indent
+
+
 def parseDocstring(
         docstring: str,
         userSpecifiedStyle: str,
@@ -39,40 +124,51 @@ def parseDocstring(
     Parse docstring in all 3 docstring styles and return the one that is parsed
     with the most likely style.
     """
-    # Check if docstring contains numpy-style section headers with dashes
-    if _containsNumpyStylePattern(docstring):
-        # Force numpy style parsing when numpy pattern is detected
-        docNumpy, excNumpy = parseDocstringInGivenStyle(docstring, 'numpy')
-        return docNumpy, excNumpy, userSpecifiedStyle != 'numpy'
-
-    docNumpy, excNumpy = parseDocstringInGivenStyle(docstring, 'numpy')
-    docGoogle, excGoogle = parseDocstringInGivenStyle(docstring, 'google')
-    docSphinx, excSphinx = parseDocstringInGivenStyle(docstring, 'sphinx')
-
-    docstrings: dict[str, Doc] = {
-        'numpy': docNumpy,
-        'google': docGoogle,
-        'sphinx': docSphinx,
-    }
-    docstringSizes: dict[str, int] = {
-        'numpy': docNumpy.docstringSize,
-        'google': docGoogle.docstringSize,
-        'sphinx': docSphinx.docstringSize,
-    }
-    parsingExceptions: dict[str, ParseError | None] = {
-        'numpy': excNumpy,
-        'google': excGoogle,
-        'sphinx': excSphinx,
+    isLikelyNumpy: bool = _containsNumpyStylePattern(docstring)
+    isLikelyGoogle: bool = _containsGoogleStylePattern(docstring)
+    isLikelySphinx: bool = _containsSphinxStylePattern(docstring)
+
+    if isLikelyNumpy:
+        # Numpy-style headers with dashes are strong indicators; ignore other
+        # potential matches when they appear alongside them.
+        isLikelyGoogle = False
+        isLikelySphinx = False
+
+    likelyStyles = {
+        'numpy': isLikelyNumpy,
+        'google': isLikelyGoogle,
+        'sphinx': isLikelySphinx,
     }
-    # Whichever style has the largest docstring size, we think that it is
-    # the actual style that the docstring is written in.
-    maxDocstringSize = max(docstringSizes.values())
-    styleMismatch: bool = docstringSizes[userSpecifiedStyle] < maxDocstringSize
-    return (
-        docstrings[userSpecifiedStyle],
-        parsingExceptions[userSpecifiedStyle],
-        styleMismatch,
-    )
+    matchedStyles = [
+        style for style, matched in likelyStyles.items() if matched
+    ]
+
+    styleMismatch: bool
+
+    if len(matchedStyles) == 1:
+        detectedStyle = matchedStyles[0]
+        if detectedStyle == userSpecifiedStyle:
+            doc, exc = parseDocstringInGivenStyle(docstring, detectedStyle)
+            # The Google parser raises hard errors when sections are malformed,
+            # which is a strong signal the docstring is effectively written in
+            # a different style. Numpy/Sphinx parsers are more permissive, so
+            # we surface only the parsing error (DOC001) without flagging a
+            # style mismatch in those cases.
+            styleMismatch = exc is not None and detectedStyle == 'google'
+            return doc, exc, styleMismatch
+
+        doc, exc = parseDocstringInGivenStyle(docstring, detectedStyle)
+        styleMismatch = True
+        return doc, exc, styleMismatch
+
+    if len(matchedStyles) == 0:
+        doc, exc = parseDocstringInGivenStyle(docstring, userSpecifiedStyle)
+        styleMismatch = False
+        return doc, exc, styleMismatch
+
+    doc, exc = parseDocstringInGivenStyle(docstring, userSpecifiedStyle)
+    styleMismatch = True
+    return doc, exc, styleMismatch
 
 
 def parseDocstringInGivenStyle(
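
The branching in the new `parseDocstring` can be restated as a small decision function. This is a sketch under stated assumptions rather than the project's code: `decide_style` is a hypothetical helper that only models which style is parsed and whether a mismatch is flagged before parsing. It does not model the extra rule that a Google parse failure also sets the mismatch flag. It does model the NumPy override, in which a NumPy match causes Google and Sphinx matches to be ignored.

```python
def decide_style(
        is_numpy: bool,
        is_google: bool,
        is_sphinx: bool,
        configured: str,
) -> tuple[str, bool]:
    """Return (style to parse with, mismatch flagged before parsing)."""
    if is_numpy:
        # NumPy headers with dashed underlines are treated as decisive.
        is_google = False
        is_sphinx = False

    flags = (('numpy', is_numpy), ('google', is_google), ('sphinx', is_sphinx))
    matched = [style for style, hit in flags if hit]

    if len(matched) == 1:
        # Exactly one match: parse in the detected style.
        return matched[0], matched[0] != configured

    if not matched:
        # No match: trust the configured style, no mismatch.
        return configured, False

    # Several matches (only Google + Sphinx can remain): mixed styles.
    return configured, True


print(decide_style(True, True, True, 'google'))     # ('numpy', True)
print(decide_style(False, True, True, 'google'))    # ('google', True)
print(decide_style(False, False, False, 'sphinx'))  # ('sphinx', False)
print(decide_style(False, False, True, 'sphinx'))   # ('sphinx', False)
```

Because of the NumPy override, the "multiple matches" branch is reachable only when Google and Sphinx keywords appear together in a docstring without NumPy-style headers.
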