Prepare for FOSS compliance #247
The proposed list of free licences should probably be wider than just the OSI-approved software licenses. Here's why:
Crucially, this broader definition of "free" still firmly excludes:
All packages in nltk_data/index.xml have been audited from a FOSS (Free and Open Source Software) compliance perspective. The resulting categorization of every package is recorded in two new files added to this pull request.
Together, these two lists give a complete overview of the licensing status of every package in the NLTK data collection.
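As a rough illustration of how such an audit could be partly automated, here is a minimal sketch that splits packages into free and nonfree lists based on a `license` attribute. It is not the method used for this PR. The sample XML, the package ids, the `FREE_LICENSES` allowlist, and the `classify` helper are all hypothetical, and the sketch assumes index.xml holds `<package>` elements carrying `id` and `license` attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for nltk_data/index.xml; the element and
# attribute layout is an assumption, and the package ids are made up.
SAMPLE_INDEX = """\
<nltk_data>
  <packages>
    <package id="example_a" license="Apache License, Version 2.0" />
    <package id="example_b" license="Creative Commons Attribution-NonCommercial" />
    <package id="example_c" />
  </packages>
</nltk_data>
"""

# Hypothetical allowlist of normalized license strings treated as free.
# A real audit would need a reviewed list and manual checks of edge cases.
FREE_LICENSES = {
    "apache license, version 2.0",
    "mit license",
    "public domain",
}


def classify(index_xml):
    """Return (free_ids, nonfree_ids) for the packages in an index document.

    Packages with a missing or unrecognized license are treated as nonfree,
    so ambiguous cases are never silently marked as compliant.
    """
    free, nonfree = [], []
    root = ET.fromstring(index_xml)
    for pkg in root.iter("package"):
        license_text = (pkg.get("license") or "").strip().lower()
        target = free if license_text in FREE_LICENSES else nonfree
        target.append(pkg.get("id"))
    return free, nonfree


free_ids, nonfree_ids = classify(SAMPLE_INDEX)
print("free:", free_ids)
print("nonfree:", nonfree_ids)
```

Exact matching against an allowlist is deliberately conservative: anything not recognized falls into the nonfree list and can then be reviewed by hand.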
Pull Request Overview
This PR addresses FOSS compliance by categorizing NLTK data packages into OSI-compliant and non-compliant groups. The primary purpose is to provide a clear framework for splitting nltk_data to support mainstream software distribution channels that require OSI-compliant licensing.
- Creates comprehensive documentation of license status for all NLTK data packages
- Establishes clear categories for FOSS-compliant vs. non-compliant packages
- Provides foundation for potential future redistribution strategy
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| free_packages_foss.md | Documents packages with OSI-approved, public domain, or FOSS-compatible licenses |
| nonfree_packages_foss.md | Documents packages with restrictive, non-commercial, or ambiguous licenses |
Co-authored-by: Copilot <[email protected]>
Marking this PR as "Ready for Review" to encourage broader feedback and community input. While I anticipate some modifications may be necessary, the current state provides a solid foundation for discussion and refinement regarding FOSS compliance. All feedback and suggestions are welcome!
This PR is intended to address Issue #102 by documenting a possible way to split nltk_data into OSI (Open Source Initiative)-compliant and nonfree parts.

Why use the OSI rather than the FSF definition of free?
The overwhelming majority of major software and data distributors (Linux distros, conda-forge, Homebrew, etc.) use the OSI definition as their primary standard. The FSF definition is important for the free software movement and documentation/content (e.g., GNU, Wikimedia), but is not the baseline for most mainstream software/data distribution channels.
Two markdown files are introduced:
- free_packages_osi.md: Packages with OSI-approved, public domain, or similarly permissive licenses.
- nonfree_packages_osi.md: Packages with more restrictive, ambiguous, or otherwise non-OSI-compliant licenses.

Every effort has been made to classify each package based on available license information, but feedback and corrections are very welcome, especially for any unclear or disputed cases.
Discussion is welcome and encouraged! If you spot anything that should be reviewed or improved, please join the conversation.