Commit 489e69f

docs: definitions provided for rules and rulesets
Signed-off-by: Carl Flottmann <[email protected]>
1 parent 0df29ba commit 489e69f

2 files changed: +15 -4 lines changed

src/macaron/config/defaults.ini

Lines changed: 8 additions & 0 deletions
@@ -601,6 +601,14 @@ epoch_threshold = 3
 # The number of days +/- the day of publish the calendar versioning day may be.
 day_publish_error = 4
 
+# ==== The following sections are for source code analysis using Semgrep ====
+# rulesets: a reference to a 'ruleset' in this section refers to a Semgrep .yaml file containing one or more rules.
+# rules: a reference to a 'rule' in this section refers to an individual rule ID, specified by the '- id:' field in
+# the Semgrep .yaml file.
+# default rulesets: these are a collection of rulesets provided with Macaron which are run by default with the sourcecode
+# analyzer. These live in src/macaron/resources/pypi_malware_rules.
+# custom rulesets: this is a collection of user-provided rulesets, living inside the path provided to 'custom_semgrep_rules_path'.
+
 # disable default semgrep rulesets here (i.e. all rule IDs in a Semgrep .yaml file) using ruleset names, the name
 # without the .yaml suffix. Currently, we disable the exfiltration rulesets by default due to a high false positive rate.
 # This list may not contain duplicated elements. Macaron's default ruleset names are all unique.
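
The naming convention described in these comments (a ruleset name is the .yaml file name without its extension, and a list of names may not contain duplicates) can be sketched roughly as follows. The helper names `ruleset_name` and `find_duplicates` are hypothetical and are not taken from Macaron; this is only an illustration of the convention, not the project's implementation.

```python
from pathlib import PurePosixPath


def ruleset_name(yaml_path: str) -> str:
    """Return the ruleset name: the file name without its .yaml extension.

    Hypothetical helper, not part of Macaron.
    """
    return PurePosixPath(yaml_path).stem


def find_duplicates(names: list[str]) -> list[str]:
    """Return each name that appears more than once, in first-repeat order.

    Hypothetical helper, not part of Macaron.
    """
    seen: set[str] = set()
    duplicates: list[str] = []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


path = "src/macaron/resources/pypi_malware_rules/obfuscation.yaml"
print(ruleset_name(path))
print(find_duplicates(["exfiltration", "obfuscation", "exfiltration"]))
```

A configuration loader could reject a list of disabled rulesets whenever `find_duplicates` returns a non-empty result.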

src/macaron/malware_analyzer/pypi_heuristics/sourcecode/pypi_sourcecode_analyzer.py

Lines changed: 7 additions & 4 deletions
@@ -238,10 +238,11 @@ def analyze(self, pypi_package_json: PyPIPackageJsonAsset) -> tuple[HeuristicRes
 if there is no source code available.
 """
 analysis_result: dict = {}
-disabled_results: dict = (
-    {}
-)  # since we have to run them anyway, return disabled rule findings for debug information
-# only run semgrep open-source features, and disable 'nosemgrep' ignoring so this does not bypass our scan
+# since we have to run them anyway, return disabled rule findings for debug information
+disabled_results: dict = {}
+# Here, we disable 'nosemgrep' ignoring so that this is not an evasion method of our scan (i.e. malware includes
+# 'nosemgrep' comments to prevent our scan detecting those code lines). Read more about the 'nosemgrep' feature
+# here: https://semgrep.dev/docs/ignoring-files-folders-code
 semgrep_commands: list[str] = ["semgrep", "scan", "--oss-only", "--disable-nosem"]
 result: HeuristicResult = HeuristicResult.PASS
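
As a rough illustration of how the base command in this hunk might be extended before it is run, the sketch below appends a rules location and a scan target. The function `build_semgrep_command` and its parameters are hypothetical, and the use of `--config` and `--json` here is an assumption about how the remaining arguments could be supplied, not a description of what Macaron does. The command is only constructed, not executed.

```python
def build_semgrep_command(rules_path: str, target_path: str) -> list[str]:
    """Build a Semgrep command line without running it (hypothetical helper)."""
    # Base command as shown in the diff: open-source features only, and
    # 'nosemgrep' comments are not honoured, so they cannot hide code from the scan.
    command = ["semgrep", "scan", "--oss-only", "--disable-nosem"]
    # Assumption: rules are passed via --config and JSON output is requested.
    command += ["--config", rules_path, "--json", target_path]
    return command


print(build_semgrep_command("src/macaron/resources/pypi_malware_rules", "./package_source"))
```

Keeping the command as a list of separate arguments, as the original code does, avoids shell quoting issues if it is later passed to a process runner.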

@@ -302,6 +303,8 @@ def analyze(self, pypi_package_json: PyPIPackageJsonAsset) -> tuple[HeuristicRes
 # only work if `--experimental` is also supplied to enable experimental features, which we do not use.
 # Semgrep provides a relative path separated by '.' to the rule ID, where the rule ID is always the
 # final element in that path, so we use that to match our rule IDs.
+# e.g. rule_id = src.macaron.resources.pypi_malware_rules.obfuscation_decode-and-execute, which comes from
+# the rule ID 'obfuscation_decode-and-execute' inside 'obfuscation.yaml'.
 if rule_id.split(".")[-1] in self.disabled_rule_ids:
     if rule_id not in self.disabled_rule_ids:
         disabled_results[rule_id] = {"message": message, "detections": []}
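
The matching step described in the comments above can be tried in isolation. The following minimal sketch takes the dotted rule ID from the example in the diff and extracts its final element; the function names `short_rule_id` and `is_disabled` are hypothetical and do not appear in the analyzer.

```python
def short_rule_id(semgrep_rule_id: str) -> str:
    """Return the final '.'-separated element, which is the rule's own ID."""
    return semgrep_rule_id.split(".")[-1]


def is_disabled(semgrep_rule_id: str, disabled_rule_ids: set[str]) -> bool:
    """Check the short rule ID against a set of disabled IDs (hypothetical helper)."""
    return short_rule_id(semgrep_rule_id) in disabled_rule_ids


reported = "src.macaron.resources.pypi_malware_rules.obfuscation_decode-and-execute"
print(short_rule_id(reported))
print(is_disabled(reported, {"obfuscation_decode-and-execute"}))
```

Note that this approach relies on rule IDs themselves not containing a '.', since the split would otherwise cut the ID short.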
