Skip to content

Commit 8afcc8b

Browse files
committed
SARIF reporter
SARIF is a unified file format used to exchange information between static analysis tools (like pylint) and various types of formatters, meta-runners, broadcasters / alert system, ... This implementation is ad-hoc, and non-validating. Spec v Github ------------- Turns out Github both doesn't implement all of SARIF (which makes sense) and requires a bunch of properties which the spec considers optional. The [official SARIF validator][] (linked to by both oasis and github) was used to validate the output of the reporter, ensuring that all the github requirements it flags are fulfilled, and fixing *some* of the validator's pet issues. As of now the following issues are left unaddressed: - azure requires `run.automationDetails`, looking at the spec I don't think it makes sense for the reporter to inject that, it's more up to the CI - the validator wants a `run.versionControlProvenance`, same as above - the validator wants rule names in PascalCase, lol - the validator wants templated result messages, but without pylint providing the args as part of the `Message` that's a bit of a chore - the validator wants `region` to include a snippet (the flagged content) - the validator wants `physicalLocation` to have a `contextRegion` (most likely with a snippet) On URIs ------- The reporter makes use of URIs for artifacts (~files). Per ["guidance on the use of artifactLocation objects"][3.4.7], `uri` *should* capture the deterministic part of the artifact location and `uriBaseId` *should* capture the non-deterministic part. However as far as I can tell pylint has no requirement (and no clean way to require) consistent resolution roots: `path` is just relative to the cwd, and there is no requirement to have project-level files to use pylint. This makes the use of relative uris dodgy, but absolute uris are pretty much always broken for the purpose of *interchange* so they're not really any better. As a side-note, Github [asserts][relative-uri-guidance] > While this [nb: `originalUriBaseIds`] is not required by GitHub for > the code scanning results to be displayed correctly, it is required > to produce a valid SARIF output when using relative URI references. However per [3.4.4][] this is incorrect, the `uriBaseId` can be resolved through end-user configuration, `originalUriBaseIds`, external information (e.g. envvars), or heuristics. It would be nice to document the "relative root" via `originalUriBaseIds` (which may be omitted for that purpose per [3.14.14][], but per the above claiming a consistent project root is dodgy. We *could* resolve known project files (e.g. pyproject.toml, tox.ini, etc...) in order to find a consistent root (project root, repo root, ...) and set / use that for relative URIs but that's a lot of additional complexity which I'm not sure is warranted at least for a first version. Fixes #5493 [3.4.4]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540869 [3.4.7]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540872 [3.14.14]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540936 [relative-uri-guidance]: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#relative-uri-guidance-for-sarif-producers [official SARIF validator]: https://sarifweb.azurewebsites.net/
1 parent 7588243 commit 8afcc8b

File tree

6 files changed

+378
-1
lines changed

6 files changed

+378
-1
lines changed
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
Support for SARIF as an output format.
2+
3+
Closes #5493
4+
Closes #10647

pylint/lint/base_options.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -104,7 +104,8 @@ def _make_linter_options(linter: PyLinter) -> Options:
104104
"group": "Reports",
105105
"help": "Set the output format. Available formats are: 'text', "
106106
"'parseable', 'colorized', 'json2' (improved json format), 'json' "
107-
"(old json format), msvs (visual studio) and 'github' (GitHub actions). "
107+
"(old json format), msvs (visual studio), 'github' (GitHub actions), "
108+
"and 'sarif'. "
108109
"You can also give a reporter class, e.g. mypackage.mymodule."
109110
"MyReporterClass.",
110111
"kwargs": {"linter": linter},

pylint/reporters/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@
1414
from pylint.reporters.json_reporter import JSON2Reporter, JSONReporter
1515
from pylint.reporters.multi_reporter import MultiReporter
1616
from pylint.reporters.reports_handler_mix_in import ReportsHandlerMixIn
17+
from pylint.reporters.sarif_reporter import SARIFReporter
1718

1819
if TYPE_CHECKING:
1920
from pylint.lint.pylinter import PyLinter
@@ -31,4 +32,5 @@ def initialize(linter: PyLinter) -> None:
3132
"JSONReporter",
3233
"MultiReporter",
3334
"ReportsHandlerMixIn",
35+
"SARIFReporter",
3436
]

pylint/reporters/sarif_reporter.py

Lines changed: 255 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,255 @@
1+
# pylint: disable=wrong-spelling-in-comment
2+
from __future__ import annotations
3+
4+
import json
5+
from textwrap import shorten
6+
from typing import TYPE_CHECKING, Literal, TypedDict
7+
8+
import pylint
9+
import pylint.message
10+
from pylint.constants import MSG_TYPES
11+
from pylint.reporters import BaseReporter
12+
13+
if TYPE_CHECKING:
14+
from pylint.lint import PyLinter
15+
from pylint.reporters.ureports.nodes import Section
16+
17+
18+
def register(linter: PyLinter) -> None:
19+
linter.register_reporter(SARIFReporter)
20+
21+
22+
class SARIFReporter(BaseReporter):
23+
name = "sarif"
24+
extension = "sarif"
25+
linter: PyLinter
26+
27+
def display_reports(self, layout: Section) -> None:
28+
"""Don't do anything in this reporter."""
29+
30+
def _display(self, layout: Section) -> None:
31+
"""Do nothing."""
32+
33+
def display_messages(self, layout: Section | None) -> None:
34+
"""Launch layouts display."""
35+
output: Log = {
36+
"version": "2.1.0",
37+
"$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json",
38+
"runs": [
39+
{
40+
"tool": {
41+
"driver": {
42+
"name": "pylint",
43+
"fullName": f"pylint {pylint.__version__}",
44+
"version": pylint.__version__,
45+
# should be versioned but not all versions are kept so...
46+
"informationUri": "https://pylint.readthedocs.io/",
47+
"rules": [
48+
{
49+
"id": m.msgid,
50+
"name": m.symbol,
51+
"deprecatedIds": [
52+
msgid for msgid, _ in m.old_names
53+
],
54+
"deprecatedNames": [
55+
name for _, name in m.old_names
56+
],
57+
# per 3.19.19 shortDescription should be a
58+
# single sentence which can't be guaranteed,
59+
# however github requires it...
60+
"shortDescription": {
61+
"text": m.description.split(".", 1)[0]
62+
},
63+
# github requires that this is less than 1024 characters
64+
"fullDescription": {
65+
"text": shorten(
66+
m.description, 1024, placeholder="..."
67+
)
68+
},
69+
"help": {"text": m.format_help()},
70+
"helpUri": f"https://pylint.readthedocs.io/en/stable/user_guide/messages/{MSG_TYPES[m.msgid[0]]}/{m.symbol}.html",
71+
# handle_message only gets the formatted message,
72+
# so to use `messageStrings` we'd need to
73+
# convert the templating and extract the args
74+
# out of the msg
75+
}
76+
for checker in self.linter.get_checkers()
77+
for m in checker.messages
78+
if m.symbol in self.linter.stats.by_msg
79+
],
80+
}
81+
},
82+
"results": [self.serialize(message) for message in self.messages],
83+
}
84+
],
85+
}
86+
json.dump(output, self.out)
87+
88+
@staticmethod
89+
def serialize(message: pylint.message.Message) -> Result:
90+
region: Region = {
91+
"startLine": message.line,
92+
"startColumn": message.column + 1,
93+
"endLine": message.end_line or message.line,
94+
"endColumn": (message.end_column or message.column) + 1,
95+
}
96+
97+
location: Location = {
98+
"physicalLocation": {
99+
"artifactLocation": {
100+
"uri": message.path.replace("\\", "/"),
101+
},
102+
"region": region,
103+
},
104+
}
105+
if message.obj:
106+
logical_location: LogicalLocation = {
107+
"name": message.obj,
108+
"fullyQualifiedName": f"{message.module}.{message.obj}",
109+
}
110+
location["logicalLocations"] = [logical_location]
111+
112+
return {
113+
"ruleId": message.msg_id,
114+
"message": {"text": message.msg},
115+
"level": CATEGORY_MAP[message.category],
116+
"locations": [location],
117+
"partialFingerprints": {
118+
# encoding the node path seems like it would be useful to dedup alerts?
119+
"nodePath/v1": "",
120+
},
121+
}
122+
123+
124+
CATEGORY_MAP: dict[str, ResultLevel] = {
125+
"convention": "note",
126+
"refactor": "note",
127+
"statement": "note",
128+
"info": "note",
129+
"warning": "warning",
130+
"error": "error",
131+
"fatal": "error",
132+
}
133+
134+
135+
class Run(TypedDict):
136+
tool: Tool
137+
# invocation parameters / environment for the tool
138+
# invocation: list[Invocations]
139+
results: list[Result]
140+
# originalUriBaseIds: dict[str, ArtifactLocation]
141+
142+
143+
Log = TypedDict(
144+
"Log",
145+
{
146+
"version": Literal["2.1.0"],
147+
"$schema": Literal[
148+
"https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json"
149+
],
150+
"runs": list[Run],
151+
},
152+
)
153+
154+
155+
class Tool(TypedDict):
156+
driver: Driver
157+
158+
159+
class Driver(TypedDict):
160+
name: Literal["pylint"]
161+
# optional but azure wants it
162+
fullName: str
163+
version: str
164+
informationUri: str # not required but validator wants it
165+
rules: list[ReportingDescriptor]
166+
167+
168+
class ReportingDescriptorOpt(TypedDict, total=False):
169+
deprecatedIds: list[str]
170+
deprecatedNames: list[str]
171+
messageStrings: dict[str, MessageString]
172+
173+
174+
class ReportingDescriptor(ReportingDescriptorOpt):
175+
id: str
176+
# optional but validator really wants it (then complains that it's not pascal cased)
177+
name: str
178+
# not required per spec but required by github
179+
shortDescription: MessageString
180+
fullDescription: MessageString
181+
help: MessageString
182+
helpUri: str
183+
184+
185+
class MarkdownMessageString(TypedDict, total=False):
186+
markdown: str
187+
188+
189+
class MessageString(MarkdownMessageString):
190+
text: str
191+
192+
193+
ResultLevel = Literal["none", "note", "warning", "error"]
194+
195+
196+
class ResultOpt(TypedDict, total=False):
197+
ruleId: str
198+
ruleIndex: int
199+
200+
level: ResultLevel
201+
202+
203+
class Result(ResultOpt):
204+
message: Message
205+
# not required per spec but required by github
206+
locations: list[Location]
207+
partialFingerprints: dict[str, str]
208+
209+
210+
class Message(TypedDict, total=False):
211+
# needs to have either text or id but it's a PITA to type
212+
213+
#: plain text message string (can have markdown links but no other formatting)
214+
text: str
215+
#: formatted GFM text
216+
markdown: str
217+
#: rule id
218+
id: str
219+
#: arguments for templated rule messages
220+
arguments: list[str]
221+
222+
223+
class Location(TypedDict, total=False):
224+
physicalLocation: PhysicalLocation # actually required by github
225+
logicalLocations: list[LogicalLocation]
226+
227+
228+
class PhysicalLocation(TypedDict):
229+
artifactLocation: ArtifactLocation
230+
# not required per spec, required by github
231+
region: Region
232+
233+
234+
class ArtifactLocation(TypedDict, total=False):
235+
uri: str
236+
#: id of base URI for resolving relative `uri`
237+
uriBaseId: str
238+
description: Message
239+
240+
241+
class LogicalLocation(TypedDict, total=False):
242+
name: str
243+
fullyQualifiedName: str
244+
#: schema is `str` with a bunch of *suggested* terms, of which this is a subset
245+
kind: Literal[
246+
"function", "member", "module", "parameter", "returnType", "type", "variable"
247+
]
248+
249+
250+
class Region(TypedDict):
251+
# none required per spec, all required by github
252+
startLine: int
253+
startColumn: int
254+
endLine: int
255+
endColumn: int

tests/lint/unittest_expand_modules.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -151,6 +151,14 @@ def test__is_in_ignore_list_re_match() -> None:
151151
"name": "reporters.unittest_reporting",
152152
"isignored": False,
153153
},
154+
str(REPORTERS_PATH / "unittest_sarif_reporter.py"): {
155+
"basename": "reporters",
156+
"basepath": str(REPORTERS_PATH / "__init__.py"),
157+
"isarg": False,
158+
"path": str(REPORTERS_PATH / "unittest_sarif_reporter.py"),
159+
"name": "reporters.unittest_sarif_reporter",
160+
"isignored": False,
161+
},
154162
}
155163

156164

0 commit comments

Comments
 (0)