Commit 6f1aef0

SARIF reporter
SARIF is a unified file format used to exchange information between
static analysis tools (like pylint) and various types of formatters,
meta-runners, broadcasters / alert systems, ...

This implementation is ad-hoc, and non-validating.

Spec v Github
-------------

Turns out Github both doesn't implement all of SARIF (which makes
sense) and requires a bunch of properties which the spec considers
optional.

The [official SARIF validator][] (linked to by both oasis and github)
was used to validate the output of the reporter, ensuring that all the
github requirements it flags are fulfilled, and fixing *some* of the
validator's pet issues.

As of now the following issues are left unaddressed:

- azure requires `run.automationDetails`; looking at the spec I don't
  think it makes sense for the reporter to inject that, it's more up
  to the CI
- the validator wants a `run.versionControlProvenance`, same as above
- the validator wants rule names in PascalCase, lol
- the validator wants templated result messages, but without pylint
  providing the args as part of the `Message` that's a bit of a chore
- the validator wants `region` to include a snippet (the flagged
  content)
- the validator wants `physicalLocation` to have a `contextRegion`
  (most likely with a snippet)

On URIs
-------

The reporter makes use of URIs for artifacts (~files). Per ["guidance
on the use of artifactLocation objects"][3.4.7], `uri` *should*
capture the deterministic part of the artifact location and
`uriBaseId` *should* capture the non-deterministic part. However as
far as I can tell pylint has no requirement (and no clean way to
require) consistent resolution roots: `path` is just relative to the
cwd, and there is no requirement to have project-level files to use
pylint. This makes the use of relative uris dodgy, but absolute uris
are pretty much always broken for the purpose of *interchange* so
they're not really any better.

As a side-note, Github [asserts][relative-uri-guidance]

> While this [nb: `originalUriBaseIds`] is not required by GitHub for
> the code scanning results to be displayed correctly, it is required
> to produce a valid SARIF output when using relative URI references.

However per [3.4.4][] this is incorrect: the `uriBaseId` can be
resolved through end-user configuration, `originalUriBaseIds`,
external information (e.g. envvars), or heuristics.

It would be nice to document the "relative root" via
`originalUriBaseIds` (which may be omitted for that purpose per
[3.14.14][]), but per the above claiming a consistent project root is
dodgy. We *could* resolve known project files (e.g. pyproject.toml,
tox.ini, etc...) in order to find a consistent root (project root,
repo root, ...) and set / use that for relative URIs, but that's a lot
of additional complexity which I'm not sure is warranted, at least for
a first version.

Fixes #5493

[3.4.4]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540869
[3.4.7]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540872
[3.14.14]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540936
[relative-uri-guidance]: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#relative-uri-guidance-for-sarif-producers
[official SARIF validator]: https://sarifweb.azurewebsites.net/
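
The root-resolution idea that the message sets aside could look roughly like the sketch below. It is not part of this commit: `find_project_root`, `artifact_location`, `original_uri_base_ids`, the `PROJECTROOT` base id and the marker list are all hypothetical names chosen for illustration, and the marker list only contains the two files the message names.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical marker list; the commit message only names these two files.
PROJECT_MARKERS = ("pyproject.toml", "tox.ini")


def find_project_root(
    start: Path, markers: tuple[str, ...] = PROJECT_MARKERS
) -> Path | None:
    """Walk upwards from `start`; return the first directory holding a marker file."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).is_file() for marker in markers):
            return candidate
    return None


def artifact_location(
    path: Path, root: Path, base_id: str = "PROJECTROOT"
) -> dict[str, str]:
    """Build an artifactLocation whose `uri` is relative to `root`."""
    relative = path.resolve().relative_to(root.resolve())
    # as_posix() yields forward slashes on every platform.
    return {"uri": relative.as_posix(), "uriBaseId": base_id}


def original_uri_base_ids(
    root: Path, base_id: str = "PROJECTROOT"
) -> dict[str, dict[str, str]]:
    """Document the root as a file URI, with a trailing slash so it denotes a directory."""
    uri = root.resolve().as_uri()
    return {base_id: {"uri": uri if uri.endswith("/") else uri + "/"}}
```

When no marker file exists, `find_project_root` returns `None`, which is the case the message worries about: pylint can run without any project-level files, so a caller would still need a fallback such as the cwd-relative `path` the reporter uses today.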
1 parent 7588243 commit 6f1aef0

4 files changed

+320
-1
lines changed

pylint/lint/base_options.py

Lines changed: 2 additions & 1 deletion
@@ -104,7 +104,8 @@ def _make_linter_options(linter: PyLinter) -> Options:
                 "group": "Reports",
                 "help": "Set the output format. Available formats are: 'text', "
                 "'parseable', 'colorized', 'json2' (improved json format), 'json' "
-                "(old json format), msvs (visual studio) and 'github' (GitHub actions). "
+                "(old json format), msvs (visual studio), 'github' (GitHub actions), "
+                "and 'sarif'. "
                 "You can also give a reporter class, e.g. mypackage.mymodule."
                 "MyReporterClass.",
                 "kwargs": {"linter": linter},

pylint/reporters/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@
 from pylint.reporters.json_reporter import JSON2Reporter, JSONReporter
 from pylint.reporters.multi_reporter import MultiReporter
 from pylint.reporters.reports_handler_mix_in import ReportsHandlerMixIn
+from pylint.reporters.sarif_reporter import SARIFReporter
 
 if TYPE_CHECKING:
     from pylint.lint.pylinter import PyLinter
@@ -31,4 +32,5 @@ def initialize(linter: PyLinter) -> None:
     "JSONReporter",
     "MultiReporter",
     "ReportsHandlerMixIn",
+    "SARIFReporter",
 ]

pylint/reporters/sarif_reporter.py

Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@
+# pylint: disable=wrong-spelling-in-comment
+from __future__ import annotations
+
+import json
+from textwrap import shorten
+from typing import TYPE_CHECKING, TypedDict, Literal
+
+import pylint
+import pylint.message
+from pylint.constants import MSG_TYPES
+from pylint.reporters import BaseReporter
+
+if TYPE_CHECKING:
+    from pylint.lint import PyLinter
+    from pylint.reporters.ureports.nodes import Section
+
+
+def register(linter: PyLinter) -> None:
+    linter.register_reporter(SARIFReporter)
+
+
+class SARIFReporter(BaseReporter):
+    name = "sarif"
+    extension = "sarif"
+    linter: PyLinter
+
+    def display_reports(self, layout: Section) -> None:
+        """Don't do anything in this reporter."""
+
+    def _display(self, layout: Section) -> None:
+        """Do nothing."""
+
+    def display_messages(self, layout: Section | None) -> None:
+        """Launch layouts display."""
+        output: Log = {
+            "version": "2.1.0",
+            "$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json",
+            "runs": [
+                {
+                    "tool": {
+                        "driver": {
+                            "name": "pylint",
+                            "fullName": f"pylint {pylint.__version__}",
+                            "version": pylint.__version__,
+                            # should be versioned but not all versions are kept so...
+                            "informationUri": "https://pylint.readthedocs.io/",
+                            "rules": [
+                                {
+                                    "id": m.msgid,
+                                    "name": m.symbol,
+                                    "deprecatedIds": [msgid for msgid, _ in m.old_names],
+                                    "deprecatedNames": [name for _, name in m.old_names],
+                                    # per 3.19.19 shortDescription should be a
+                                    # single sentence which can't be guaranteed,
+                                    # however github requires it...
+                                    "shortDescription": {"text": m.description.split(".", 1)[0]},
+                                    # github requires that this is less than 1024 characters
+                                    "fullDescription": {"text": shorten(m.description, 1024, placeholder="...")},
+                                    "help": {"text": m.format_help()},
+                                    "helpUri": f"https://pylint.readthedocs.io/en/stable/user_guide/messages/{MSG_TYPES[m.msgid[0]]}/{m.symbol}.html",
+                                    # handle_message only gets the formatted message,
+                                    # so to use `messageStrings` we'd need to
+                                    # convert the templating and extract the args
+                                    # out of the msg
+                                }
+                                for checker in self.linter.get_checkers()
+                                for m in checker.messages
+                                if m.symbol in self.linter.stats.by_msg
+                            ],
+                        }
+                    },
+                    "results": [self.serialize(message) for message in self.messages],
+                }
+            ],
+        }
+        json.dump(output, self.out)
+
+    @staticmethod
+    def serialize(message: pylint.message.Message) -> Result:
+        region: Region = {
+            "startLine": message.line,
+            "startColumn": message.column + 1,
+            "endLine": message.end_line or message.line,
+            "endColumn": (message.end_column or message.column) + 1,
+        }
+
+        location: Location = {
+            "physicalLocation": {
+                "artifactLocation": {
+                    "uri": message.path.replace("\\", "/"),
+                },
+                "region": region,
+            },
+        }
+        if message.obj:
+            logical_location: LogicalLocation = {
+                "name": message.obj,
+                "fullyQualifiedName": f"{message.module}.{message.obj}",
+            }
+            location["logicalLocations"] = [logical_location]
+
+        return {
+            "ruleId": message.msg_id,
+            "message": {"text": message.msg},
+            "level": CATEGORY_MAP[message.category],
+            "locations": [location],
+            "partialFingerprints": {
+                # encoding the node path seems like it would be useful to dedup alerts?
+                "nodePath/v1": "",
+            },
+        }
+
+
+CATEGORY_MAP: dict[str, ResultLevel] = {
+    "convention": "note",
+    "refactor": "note",
+    "statement": "note",
+    "info": "note",
+    "warning": "warning",
+    "error": "error",
+    "fatal": "error",
+}
+
+
+class Run(TypedDict):
+    tool: Tool
+    # invocation parameters / environment for the tool
+    # invocation: list[Invocations]
+    results: list[Result]
+    # originalUriBaseIds: dict[str, ArtifactLocation]
+
+
+Log = TypedDict(
+    "Log",
+    {
+        "version": Literal["2.1.0"],
+        "$schema": Literal["https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json"],
+        "runs": list[Run],
+    },
+)
+
+
+class Tool(TypedDict):
+    driver: Driver
+
+
+class Driver(TypedDict):
+    name: Literal["pylint"]
+    # optional but azure wants it
+    fullName: str
+    version: str
+    informationUri: str  # not required but validator wants it
+    rules: list[ReportingDescriptor]
+
+
+class ReportingDescriptorOpt(TypedDict, total=False):
+    deprecatedIds: list[str]
+    deprecatedNames: list[str]
+    messageStrings: dict[str, MessageString]
+
+
+class ReportingDescriptor(ReportingDescriptorOpt):
+    id: str
+    # optional but validator really wants it (then complains that it's not pascal cased)
+    name: str
+    # not required per spec but required by github
+    shortDescription: MessageString
+    fullDescription: MessageString
+    help: MessageString
+    helpUri: str
+
+
+class MarkdownMessageString(TypedDict, total=False):
+    markdown: str
+
+
+class MessageString(MarkdownMessageString):
+    text: str
+
+
+ResultLevel = Literal["none", "note", "warning", "error"]
+
+
+class ResultOpt(TypedDict, total=False):
+    ruleId: str
+    ruleIndex: int
+
+    level: ResultLevel
+
+
+class Result(ResultOpt):
+    message: Message
+    # not required per spec but required by github
+    locations: list[Location]
+    partialFingerprints: dict[str, str]
+
+
+class Message(TypedDict, total=False):
+    # needs to have either text or id but it's a PITA to type
+
+    #: plain text message string (can have markdown links but no other formatting)
+    text: str
+    #: formatted GFM text
+    markdown: str
+    #: rule id
+    id: str
+    #: arguments for templated rule messages
+    arguments: list[str]
+
+
+class Location(TypedDict, total=False):
+    physicalLocation: PhysicalLocation  # actually required by github
+    logicalLocations: list[LogicalLocation]
+
+
+class PhysicalLocation(TypedDict):
+    artifactLocation: ArtifactLocation
+    # not required per spec, required by github
+    region: Region
+
+
+class ArtifactLocation(TypedDict, total=False):
+    uri: str
+    #: id of base URI for resolving relative `uri`
+    uriBaseId: str
+    description: Message
+
+
+class LogicalLocation(TypedDict, total=False):
+    name: str
+    fullyQualifiedName: str
+    #: schema is `str` with a bunch of *suggested* terms, of which this is a subset
+    kind: Literal["function", "member", "module", "parameter", "returnType", "type", "variable"]
+
+
+class Region(TypedDict):
+    # none required per spec, all required by github
+    startLine: int
+    startColumn: int
+    endLine: int
+    endColumn: int
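
The "required by github" comments in this file can be turned into a quick structural check. The sketch below is only an illustration: `missing_github_properties` is a hypothetical helper, and the set of properties it looks for is read off the comments in the diff, not taken from GitHub's documentation.

```python
from __future__ import annotations

from typing import Any

# Assumption: these are the properties the diff's comments call GitHub-required.
REGION_KEYS = ("startLine", "startColumn", "endLine", "endColumn")


def missing_github_properties(log: dict[str, Any]) -> list[str]:
    """Return path-like strings for absent properties flagged in the diff comments."""
    missing: list[str] = []
    for r, run in enumerate(log.get("runs", [])):
        driver = run.get("tool", {}).get("driver", {})
        for i, rule in enumerate(driver.get("rules", [])):
            if "shortDescription" not in rule:
                missing.append(f"runs[{r}].tool.driver.rules[{i}].shortDescription")
        for i, result in enumerate(run.get("results", [])):
            prefix = f"runs[{r}].results[{i}]"
            locations = result.get("locations")
            if not locations:
                missing.append(f"{prefix}.locations")
                continue
            for j, location in enumerate(locations):
                loc_prefix = f"{prefix}.locations[{j}].physicalLocation"
                physical = location.get("physicalLocation")
                if physical is None:
                    missing.append(loc_prefix)
                    continue
                region = physical.get("region")
                if region is None:
                    missing.append(f"{loc_prefix}.region")
                    continue
                # Report each absent coordinate individually.
                missing.extend(
                    f"{loc_prefix}.region.{key}"
                    for key in REGION_KEYS
                    if key not in region
                )
    return missing
```

An empty list means nothing on this short list is absent; it says nothing about full schema validity, which is what the official validator is for.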
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+import io
+import json
+
+from pylint import __version__, checkers
+from pylint.lint import PyLinter
+from pylint.reporters import SARIFReporter
+
+
+def test_simple_sarif() -> None:
+    output = io.StringIO()
+    reporter = SARIFReporter(output)
+    linter = PyLinter(reporter=reporter)
+    checkers.initialize(linter)
+    linter.config.persistent = 0
+    linter.open()
+    linter.set_current_module("0123")
+    linter.add_message("line-too-long", line=1, args=(1, 2), end_lineno=1, end_col_offset=4)
+    reporter.display_messages(None)
+    assert json.loads(output.getvalue()) == {
+        "version": "2.1.0",
+        "$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json",
+        "runs": [
+            {
+                "tool": {
+                    "driver": {
+                        "name": "pylint",
+                        "fullName": f"pylint {__version__}",
+                        "version": __version__,
+                        "informationUri": "https://pylint.readthedocs.io/",
+                        "rules": [
+                            {
+                                "id": "C0301",
+                                "deprecatedIds": [],
+                                "name": "line-too-long",
+                                "deprecatedNames": [],
+                                "shortDescription": {
+                                    "text": "Used when a line is longer than a given number of characters"
+                                },
+                                "fullDescription": {
+                                    "text": "Used when a line is longer than a given number of characters."
+                                },
+                                "help": {
+                                    "text": ":line-too-long (C0301): *Line too long (%s/%s)*\n  Used when a line is longer than a given number of characters."
+                                },
+                                "helpUri": "https://pylint.readthedocs.io/en/stable/user_guide/messages/convention/line-too-long.html",
+                            }
+                        ],
+                    },
+                },
+                "results": [
+                    {
+                        "ruleId": "C0301",
+                        "message": {"text": "Line too long (1/2)"},
+                        "level": "note",
+                        "locations": [
+                            {
+                                "physicalLocation": {
+                                    "artifactLocation": {
+                                        "uri": "0123",
+                                    },
+                                    "region": {
+                                        "startLine": 1,
+                                        "startColumn": 1,
+                                        "endLine": 1,
+                                        "endColumn": 5,
+                                    },
+                                },
+                            }
+                        ],
+                        "partialFingerprints": {"nodePath/v1": ""},
+                    }
+                ],
+            }
+        ],
+    }
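
For a sense of what a downstream consumer does with the structure asserted in this test, here is a minimal sketch of a reader for the reporter's output. `summarize` is a hypothetical helper, not part of pylint, and falling back to `"warning"` when a result carries no `level` is a simplification made for this example.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def summarize(log: dict[str, Any]) -> tuple[Counter[str], list[str]]:
    """Count results per level and render one `uri:line:col: id text` line per location."""
    levels: Counter[str] = Counter()
    lines: list[str] = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # Simplification: treat a missing level as "warning".
            levels[result.get("level", "warning")] += 1
            text = result.get("message", {}).get("text", "")
            rule_id = result.get("ruleId", "?")
            for location in result.get("locations", []):
                physical = location.get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri", "?")
                region = physical.get("region", {})
                line = region.get("startLine", "?")
                column = region.get("startColumn", "?")
                lines.append(f"{uri}:{line}:{column}: {rule_id} {text}")
    return levels, lines
```

Fed the log from `test_simple_sarif`, this would count a single `note` and render the location as `0123:1:1`, using the 1-based column that `serialize` emits.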