Description
This is an issue proposed by Kurt Hornik, an R core developer and member of the CRAN Team.
Background
The licenses of R packages are specified via the License field of the package DESCRIPTION file. This field uses a standardized format, documented in the Licensing section of "Writing R Extensions", which can be processed using code in the tools package (currently unexported, in file license.R).
A key element of the license info infrastructure is the license database (file share/licenses/license.db in the R sources) which provides DCF format entries like
Name: GNU General Public License
Abbrev: GPL
Version: 2
SSS: GPL-2
OSI: open (https://opensource.org/licenses/gpl-license)
FSF: free (https://www.gnu.org/licenses/license-list.html#GPLv2)
File: share/licenses/GPL-2
URL: https://www.r-project.org/Licenses/GPL-2
FOSS: yes
with "obvious" meanings.
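As a minimal sketch of how such entries can be processed, base R's read.dcf() parses DCF text into a character matrix with one row per entry and one column per field. The example reads the entry shown above from a character vector. The commented-out path for the full database on an installed R is an assumption, mirroring share/licenses/license.db in the sources.

```r
# One license.db entry, copied from the example above.
entry <- c(
  "Name: GNU General Public License",
  "Abbrev: GPL",
  "Version: 2",
  "SSS: GPL-2",
  "OSI: open (https://opensource.org/licenses/gpl-license)",
  "FSF: free (https://www.gnu.org/licenses/license-list.html#GPLv2)",
  "File: share/licenses/GPL-2",
  "URL: https://www.r-project.org/Licenses/GPL-2",
  "FOSS: yes"
)

# read.dcf() accepts a connection; each record becomes a row of a
# character matrix whose column names are the field tags.
con <- textConnection(entry)
db <- read.dcf(con)
close(con)

db[, "SSS"]
db[, "Version"]

# Assumed location of the full database on an installed R:
# db_all <- read.dcf(file.path(R.home("share"), "licenses", "license.db"))
```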
System Package Data Exchange (SPDX) is an open standard capable of representing systems with digital components, allowing the expression of components, licenses, copyrights, security references and other metadata relating to such systems. Originally intended to improve license compliance, SPDX provides a mechanism to "communicate license information in a simple, efficient, portable and machine-readable manner" using SPDX license expressions such as GPL-2.0-or-later or MIT OR Apache-2.0. These are built from license ids in the SPDX license list, a database of "commonly found licenses and exceptions used in free and open or collaborative software, data, hardware, or documentation" which "includes a standardized short identifier, the full name, the license text, and a canonical permanent URL for each license and exception".
See https://spdx.dev/learn/handling-license-info/ for more information about handling license info with SPDX.
It would clearly be very useful to have a way of turning R package license specs into the corresponding SPDX license ids, and perhaps also to take advantage of the SPDX license URLs when hyperlinking R package license information, as is done, for example, on the CRAN package web pages.
Task 1: Enhance the R license database with SPDX info
The basic idea is to start by enhancing the R license.db with SPDX info, e.g., add
SPDX: GPL-2.0-only
to the entry shown above. This assumes that the SPDX license URL can be obtained from the SPDX license short identifier by using https://spdx.org/licenses/ as the base URL and appending .html, which gives https://spdx.org/licenses/GPL-2.0-only.html for the case at hand. If not, one could consider adding separate SPDX fields for the identifier and the URL, or use something like
SPDX: GPL-2.0-only <https://spdx.org/licenses/GPL-2.0-only.html>
(which cannot hurt but adds some processing overhead).
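A small sketch of both options, assuming the URL rule described above holds. The helpers spdx_url() and parse_spdx_field() are hypothetical names used only for illustration: the first derives the URL from a bare identifier, the second accepts either form of the SPDX field and falls back to the derived URL when no explicit one is given.

```r
# Hypothetical helper: derive the SPDX license page URL from a short
# identifier by prefixing the base URL and appending ".html".
spdx_url <- function(id) {
  paste0("https://spdx.org/licenses/", id, ".html")
}

# Hypothetical helper: parse SPDX field values that are either a bare
# identifier ("GPL-2.0-only") or an identifier followed by a URL in
# angle brackets ("GPL-2.0-only <https://...>").
parse_spdx_field <- function(value) {
  value <- trimws(value)
  has_url <- grepl("<[^>]+>$", value)
  # Strip a trailing "<...>" part, if any, to get the identifier.
  id <- trimws(sub("<[^>]+>$", "", value))
  # Use the explicit URL when present, otherwise derive it.
  url <- ifelse(has_url, sub("^.*<([^>]+)>$", "\\1", value), spdx_url(id))
  data.frame(id = id, url = url, stringsAsFactors = FALSE)
}

parse_spdx_field(c(
  "GPL-2.0-only",
  "GPL-2.0-only <https://spdx.org/licenses/GPL-2.0-only.html>"
))
```

Both inputs yield the same identifier and URL, which is why the bracketed form "cannot hurt" but is only needed where the derived URL would be wrong.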
As the number of licenses in license.db is small, this could be done manually with reference to https://spdx.org/licenses/, or programmatically using https://github.com/spdx/license-list-data/blob/main/json/licenses.json.
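For the manual route, a sketch could look as follows. It uses a two-entry stand-in for license.db and a hand-written table from the SSS abbreviations to SPDX ids; the table sss_to_spdx is hypothetical and illustrative only, and a real one would cover every license.db entry and be checked against the SPDX license list. The enhanced entries are written back in DCF format with base R's write.dcf().

```r
# A tiny stand-in for license.db: two entries with their SSS field.
db <- cbind(
  Abbrev  = c("GPL", "GPL"),
  Version = c("2", "3"),
  SSS     = c("GPL-2", "GPL-3")
)

# Hypothetical hand-written mapping from SSS abbreviations to SPDX ids.
sss_to_spdx <- c(
  "GPL-2"    = "GPL-2.0-only",
  "GPL-3"    = "GPL-3.0-only",
  "LGPL-2.1" = "LGPL-2.1-only",
  "LGPL-3"   = "LGPL-3.0-only"
)

# Add an SPDX column; abbreviations missing from the mapping give NA
# and would need a manual decision.
db <- cbind(db, SPDX = unname(sss_to_spdx[db[, "SSS"]]))

# Write the enhanced entries to standard output in DCF format.
write.dcf(db)
```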
Task 2: Write a function to convert expanded R license info to SPDX license info
After Task 1, one should be able to write a function that uses the expansion of the R license info obtained by tools:::analyze_license() to obtain the corresponding SPDX license info. For example,
tools:::analyze_license("GPL-2")$expansions
#> [[1]]
#> [1] "GPL-2"
would be looked up to give GPL-2.0-only, and
tools:::analyze_license("GPL (>= 2)")$expansions
#> [[1]]
#> [1] "GPL-2" "GPL-3"
would be looked up and combined into GPL-2.0-only OR GPL-3.0-only.
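The lookup-and-combine step might be sketched as below, assuming an SPDX-enhanced license.db has been reduced to a named vector from R abbreviations to SPDX ids (the hypothetical sss_to_spdx). The function expansions_to_spdx() is likewise a hypothetical name; it takes a list shaped like the $expansions element above and joins everything with OR, on the assumption that the expanded licenses are alternatives.

```r
# Hypothetical lookup table, as it might be derived from an
# SPDX-enhanced license.db (SSS abbreviation -> SPDX id).
sss_to_spdx <- c(
  "GPL-2"    = "GPL-2.0-only",
  "GPL-3"    = "GPL-3.0-only",
  "LGPL-2.1" = "LGPL-2.1-only",
  "LGPL-3"   = "LGPL-3.0-only"
)

# Hypothetical converter: map every abbreviation in the expansions
# list to its SPDX id and join the alternatives with OR.
# Returns NA if any abbreviation has no SPDX mapping.
expansions_to_spdx <- function(expansions, lookup = sss_to_spdx) {
  abbrevs <- unique(unlist(expansions))
  ids <- lookup[abbrevs]
  if (anyNA(ids)) return(NA_character_)
  paste(unname(ids), collapse = " OR ")
}

expansions_to_spdx(list("GPL-2"))
expansions_to_spdx(list(c("GPL-2", "GPL-3")))
```

With the unexported helper, the input would be tools:::analyze_license("GPL (>= 2)")$expansions.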
Of course SPDX would also allow GPL-2.0-or-later, but this is not straightforward to get from the expansions. One could map some of the standardized components directly, in our case
tools:::analyze_license("GPL (>= 2)")$components
#> [1] "GPL (>= 2)"
and the function for converting could special-case such components.
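One way to special-case components is sketched below: a hypothetical table maps standardized components such as "GPL (>= 2)" directly to SPDX "-or-later" ids, and any other component falls back to looking up its expansions. Both tables and the function name license_to_spdx() are assumptions for illustration; the inputs are meant to be parallel, i.e. the i-th element of expansions belongs to the i-th component.

```r
# Hypothetical table of standardized components with a direct
# SPDX "-or-later" equivalent.
component_to_spdx <- c(
  "GPL (>= 2)"    = "GPL-2.0-or-later",
  "GPL (>= 3)"    = "GPL-3.0-or-later",
  "LGPL (>= 2.1)" = "LGPL-2.1-or-later"
)

# Hypothetical fallback table for expanded abbreviations.
sss_to_spdx <- c("GPL-2" = "GPL-2.0-only", "GPL-3" = "GPL-3.0-only")

# Convert each component: use the special-case table when it matches,
# otherwise map its expansions and join them with OR.
license_to_spdx <- function(components, expansions,
                            special = component_to_spdx,
                            lookup = sss_to_spdx) {
  parts <- vapply(seq_along(components), function(i) {
    comp <- components[[i]]
    if (comp %in% names(special)) {
      return(unname(special[comp]))
    }
    ids <- lookup[expansions[[i]]]
    if (anyNA(ids)) NA_character_ else paste(unname(ids), collapse = " OR ")
  }, character(1))
  if (anyNA(parts)) return(NA_character_)
  # Components are alternatives, so they are joined with OR as well.
  paste(unique(parts), collapse = " OR ")
}

license_to_spdx("GPL (>= 2)", list(c("GPL-2", "GPL-3")))
license_to_spdx("GPL-2", list("GPL-2"))
```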
Task 3: Write a function to convert SPDX license info to R license info
As an extension, one could look into the possibility of turning SPDX license info into the corresponding R license info.
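A first step in that direction could handle only the simplest SPDX expressions, plain identifiers joined by OR, and map them back through a reverse table, joining the results with R's "|" separator for alternatives. The table spdx_to_r and the function spdx_to_r_license() are hypothetical; expressions using AND, WITH or parentheses are rejected rather than guessed at.

```r
# Hypothetical reverse table: SPDX ids to R license specs.
spdx_to_r <- c(
  "GPL-2.0-only"     = "GPL-2",
  "GPL-3.0-only"     = "GPL-3",
  "GPL-2.0-or-later" = "GPL (>= 2)",
  "GPL-3.0-or-later" = "GPL (>= 3)",
  "MIT"              = "MIT + file LICENSE"
)

# Hypothetical converter for OR-combinations of plain SPDX ids only.
# Returns NA if any id has no R counterpart in the table.
spdx_to_r_license <- function(expr, lookup = spdx_to_r) {
  if (grepl("\\b(AND|WITH)\\b|[()]", expr, perl = TRUE)) {
    stop("only OR-combinations of plain SPDX identifiers are handled")
  }
  ids <- trimws(strsplit(expr, "\\s+OR\\s+", perl = TRUE)[[1]])
  specs <- lookup[ids]
  if (anyNA(specs)) return(NA_character_)
  paste(unname(specs), collapse = " | ")
}

spdx_to_r_license("GPL-2.0-only OR GPL-3.0-only")
spdx_to_r_license("MIT OR GPL-2.0-or-later")
```

The resulting R license specs could then be checked with tools:::analyze_license().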