From b8b9b542bf6ef8bf71a93fa387e47f0c7bdb7b4c Mon Sep 17 00:00:00 2001
From: Carl Flottmann
Date: Wed, 11 Jun 2025 16:39:23 +1000
Subject: [PATCH 1/6] docs: included tutorial section explaining source code analysis

Signed-off-by: Carl Flottmann
---
 .../pages/tutorials/detect_malicious_package.rst | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/docs/source/pages/tutorials/detect_malicious_package.rst b/docs/source/pages/tutorials/detect_malicious_package.rst
index 22c236700..c69bdad35 100644
--- a/docs/source/pages/tutorials/detect_malicious_package.rst
+++ b/docs/source/pages/tutorials/detect_malicious_package.rst
@@ -232,6 +232,21 @@ Macaron also provides a confidence score for each check result, represented as a
     is_component(component_id, purl),
     match("pkg:pypi/django@.*", purl).
 
+''''''''''''''''''''
+Source Code Analysis
+''''''''''''''''''''
+
+Macaron supports static code analysis as a malware analysis heuristic. This can be enabled by supplying the command line argument ``--analyze-source``. Macaron uses the open-source static code analysis tool Semgrep to analyse the source code of a python package, looking for malicious code patterns defined in Macaron's own Semgrep rules. Currently supported are detection of attempts to obfuscate the source code, and detection of code that exfiltrates sensitive data to a remote connection.
+
+By default, the source code analyzer is run in conjunction with the other metadata heuristics. The source code heuristic is optimised such that it is not always required to be run to ensure a package is benign, so it will not always be run as part of the heuristic analysis, even when enabled. To force it to run regardless of the result of other heuristics, the command line argument ``--force-analyze-source`` must be supplied. To analyze ``django@5.0.6`` with source code analysis enabled and enforced, the following command may be run:
+
+.. code-block:: shell
+
+  ./run_macaron.sh analyze -purl pkg:pypi/django@5.0.6 --python-venv "/tmp/.django_venv" --analyze-source --force-analyze-source
+
+If any suspicious patterns are triggered, this will be identified in the ``mcn_detect_malicious_metadata_1`` result for the heuristic named ``suspicious_patterns``. The output database ``output/macaron.db`` can be used to get the specific results of the analysis by querying the :class:`detect_malicious_metadata_check.result field `. This will provide detailed JSON information about all data collected by the ``mcn_detect_malicious_metadata_1`` check, including, for source code analysis, any malicious code patterns detected, what Semgrep rule detected it, the file in which it was detected, and the line number for the detection.
+
+
 ***********
 Future Work
 ***********

From 343b65626f973c021a1261c301781b4270f7bc22 Mon Sep 17 00:00:00 2001
From: Carl Flottmann
Date: Thu, 12 Jun 2025 14:52:37 +1000
Subject: [PATCH 2/6] chore: addressing PR feedback

Signed-off-by: Carl Flottmann
---
 .../tutorials/detect_malicious_package.rst | 29 +++++++++----------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/docs/source/pages/tutorials/detect_malicious_package.rst b/docs/source/pages/tutorials/detect_malicious_package.rst
index c69bdad35..ace5d378b 100644
--- a/docs/source/pages/tutorials/detect_malicious_package.rst
+++ b/docs/source/pages/tutorials/detect_malicious_package.rst
@@ -122,6 +122,20 @@ Note that the ``match`` constraint applies a regex pattern and can be expanded t
     is_component(component_id, purl),
     match("pkg:pypi.*", purl).
 
+''''''''''''''''''''
+Source Code Analysis
+''''''''''''''''''''
+
+Macaron supports static code analysis as a malware analysis heuristic. This can be enabled by supplying the command line argument ``--analyze-source``. Macaron uses the open-source static code analysis tool Semgrep to analyse the source code of a python package, looking for malicious code patterns defined in Macaron's own Semgrep rules. Example detection patterns include identifying attempts to obfuscate source code and detecting code that exfiltrates sensitive data to remote connections.
+
+By default, the source code analyzer is run in conjunction with the other metadata heuristics. The source code heuristic is optimised such that it is not always required to be run to ensure a package is benign, so it will not always be run as part of the heuristic analysis, even when enabled. To force it to run regardless of the result of other heuristics, the command line argument ``--force-analyze-source`` must be supplied. To analyze ``django@5.0.6`` with source code analysis enabled and enforced, the following command may be run:
+
+.. code-block:: shell
+
+  ./run_macaron.sh analyze -purl pkg:pypi/django@5.0.6 --python-venv "/tmp/.django_venv" --analyze-source --force-analyze-source
+
+If any suspicious patterns are triggered, this will be identified in the ``mcn_detect_malicious_metadata_1`` result for the heuristic named ``suspicious_patterns``. The output database ``output/macaron.db`` can be used to get the specific results of the analysis by querying the :class:`detect_malicious_metadata_check.result field `. This will provide detailed JSON information about all data collected by the ``mcn_detect_malicious_metadata_1`` check, including, for source code analysis, any malicious code patterns detected, what Semgrep rule detected it, the file in which it was detected, and the line number for the detection.
+
 +++++++++++++++++++++++++++++++++++++++
 Verification Summary Attestation report
 +++++++++++++++++++++++++++++++++++++++
@@ -232,21 +246,6 @@ Macaron also provides a confidence score for each check result, represented as a
     is_component(component_id, purl),
     match("pkg:pypi/django@.*", purl).
 
-''''''''''''''''''''
-Source Code Analysis
-''''''''''''''''''''
-
-Macaron supports static code analysis as a malware analysis heuristic. This can be enabled by supplying the command line argument ``--analyze-source``. Macaron uses the open-source static code analysis tool Semgrep to analyse the source code of a python package, looking for malicious code patterns defined in Macaron's own Semgrep rules. Currently supported are detection of attempts to obfuscate the source code, and detection of code that exfiltrates sensitive data to a remote connection.
-
-By default, the source code analyzer is run in conjunction with the other metadata heuristics. The source code heuristic is optimised such that it is not always required to be run to ensure a package is benign, so it will not always be run as part of the heuristic analysis, even when enabled. To force it to run regardless of the result of other heuristics, the command line argument ``--force-analyze-source`` must be supplied. To analyze ``django@5.0.6`` with source code analysis enabled and enforced, the following command may be run:
-
-.. code-block:: shell
-
-  ./run_macaron.sh analyze -purl pkg:pypi/django@5.0.6 --python-venv "/tmp/.django_venv" --analyze-source --force-analyze-source
-
-If any suspicious patterns are triggered, this will be identified in the ``mcn_detect_malicious_metadata_1`` result for the heuristic named ``suspicious_patterns``. The output database ``output/macaron.db`` can be used to get the specific results of the analysis by querying the :class:`detect_malicious_metadata_check.result field `. This will provide detailed JSON information about all data collected by the ``mcn_detect_malicious_metadata_1`` check, including, for source code analysis, any malicious code patterns detected, what Semgrep rule detected it, the file in which it was detected, and the line number for the detection.
-
-
 ***********
 Future Work
 ***********

From dcd53c9870f71f0126ed61e0820958cf3be4a534 Mon Sep 17 00:00:00 2001
From: Carl Flottmann
Date: Thu, 12 Jun 2025 15:03:50 +1000
Subject: [PATCH 3/6] docs: added note to indicate new feature

Signed-off-by: Carl Flottmann
---
 docs/source/pages/tutorials/detect_malicious_package.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/source/pages/tutorials/detect_malicious_package.rst b/docs/source/pages/tutorials/detect_malicious_package.rst
index ace5d378b..af363c907 100644
--- a/docs/source/pages/tutorials/detect_malicious_package.rst
+++ b/docs/source/pages/tutorials/detect_malicious_package.rst
@@ -126,6 +126,8 @@ Note that the ``match`` constraint applies a regex pattern and can be expanded t
 Source Code Analysis
 ''''''''''''''''''''
 
+.. note:: This is a new feature recently added to Macaron in 2025.
+
 Macaron supports static code analysis as a malware analysis heuristic. This can be enabled by supplying the command line argument ``--analyze-source``. Macaron uses the open-source static code analysis tool Semgrep to analyse the source code of a python package, looking for malicious code patterns defined in Macaron's own Semgrep rules. Example detection patterns include identifying attempts to obfuscate source code and detecting code that exfiltrates sensitive data to remote connections.
 
 By default, the source code analyzer is run in conjunction with the other metadata heuristics. The source code heuristic is optimised such that it is not always required to be run to ensure a package is benign, so it will not always be run as part of the heuristic analysis, even when enabled. To force it to run regardless of the result of other heuristics, the command line argument ``--force-analyze-source`` must be supplied. To analyze ``django@5.0.6`` with source code analysis enabled and enforced, the following command may be run:

From ed89354ddbc8067acdb3f69156e5898d12833517 Mon Sep 17 00:00:00 2001
From: Carl Flottmann
Date: Thu, 12 Jun 2025 15:33:37 +1000
Subject: [PATCH 4/6] chore: PR change with conventional commit message

Signed-off-by: Carl Flottmann
---
 docs/source/pages/tutorials/detect_malicious_package.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/pages/tutorials/detect_malicious_package.rst b/docs/source/pages/tutorials/detect_malicious_package.rst
index af363c907..907d7827a 100644
--- a/docs/source/pages/tutorials/detect_malicious_package.rst
+++ b/docs/source/pages/tutorials/detect_malicious_package.rst
@@ -126,7 +126,7 @@ Note that the ``match`` constraint applies a regex pattern and can be expanded t
 Source Code Analysis
 ''''''''''''''''''''
 
-.. note:: This is a new feature recently added to Macaron in 2025.
+.. note:: This is a new feature recently added to Macaron.
 
 Macaron supports static code analysis as a malware analysis heuristic. This can be enabled by supplying the command line argument ``--analyze-source``. Macaron uses the open-source static code analysis tool Semgrep to analyse the source code of a python package, looking for malicious code patterns defined in Macaron's own Semgrep rules. Example detection patterns include identifying attempts to obfuscate source code and detecting code that exfiltrates sensitive data to remote connections.

From 21bfa9b1f4b412b2a1148de3e3877becd90ad766 Mon Sep 17 00:00:00 2001
From: Carl Flottmann
Date: Thu, 12 Jun 2025 16:00:08 +1000
Subject: [PATCH 5/6] docs: include force analyze source in cli docs

Signed-off-by: Carl Flottmann
---
 docs/source/pages/cli_usage/command_analyze.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/pages/cli_usage/command_analyze.rst b/docs/source/pages/cli_usage/command_analyze.rst
index a04f88bd2..2cea5b0e1 100644
--- a/docs/source/pages/cli_usage/command_analyze.rst
+++ b/docs/source/pages/cli_usage/command_analyze.rst
@@ -84,6 +84,9 @@ Options
 
     Allow the analysis to attempt to verify provenance files as part of its normal operations.
 
+.. option:: --force-analyze-source
+
+    Forces PyPI sourcecode analysis to run regardless of other heuristic results.
 
 -----------
 Environment

From 42f0c151ce23c92602e7785b6425dded325e7da6 Mon Sep 17 00:00:00 2001
From: Carl Flottmann
Date: Thu, 12 Jun 2025 16:10:14 +1000
Subject: [PATCH 6/6] docs: add analyze source to cli docs

Signed-off-by: Carl Flottmann
---
 docs/source/pages/cli_usage/command_analyze.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/source/pages/cli_usage/command_analyze.rst b/docs/source/pages/cli_usage/command_analyze.rst
index 2cea5b0e1..e5fa9b1db 100644
--- a/docs/source/pages/cli_usage/command_analyze.rst
+++ b/docs/source/pages/cli_usage/command_analyze.rst
@@ -86,7 +86,11 @@ Options
 
 .. option:: --force-analyze-source
 
-    Forces PyPI sourcecode analysis to run regardless of other heuristic results.
+    Forces PyPI sourcecode analysis to run regardless of other heuristic results. Requires '--analyze-source'.
+
+.. option:: --analyze-source
+
+    For improved malware detection, analyze the source code of the (PyPI) package using a textual scan and dataflow analysis.
 
 -----------
 Environment
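
The tutorial text in this series says the detailed results can be read from ``output/macaron.db`` by querying the ``detect_malicious_metadata_check.result`` field. The following is a minimal sketch of such a query, not part of Macaron itself. It assumes the database is SQLite, that ``detect_malicious_metadata_check`` is the table name and ``result`` the column name, and that each value is stored as JSON text; ``load_check_results`` is a hypothetical helper name.

```python
import json
import sqlite3


def load_check_results(db_path, table="detect_malicious_metadata_check", column="result"):
    """Return the JSON documents stored in table.column as Python objects.

    The default table and column names are assumptions taken from the
    tutorial wording; adjust them if the actual schema differs.
    """
    # Table and column names cannot be bound as SQL parameters, so they are
    # restricted to plain identifiers before being placed in the query.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")

    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(f'SELECT "{column}" FROM "{table}"').fetchall()
    finally:
        conn.close()

    # Skip rows with no stored result and decode the rest from JSON text.
    return [json.loads(raw) for (raw,) in rows if raw is not None]


# Example use after an analysis run (path taken from the tutorial text):
#   for entry in load_check_results("output/macaron.db"):
#       print(json.dumps(entry, indent=2))
```

Returning decoded objects rather than raw strings keeps the follow-up step simple: a caller can look for the ``suspicious_patterns`` heuristic in each entry without parsing JSON again.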