Proposal: remove unpopular licenses #33467

Closed
wxiaoguang opened this issue Feb 1, 2025 · 9 comments · Fixed by #33832
Labels
proposal/accepted: We have reviewed the proposal and agree that it should be implemented like that/at all.
type/proposal: The new feature has not been accepted yet but needs to be discussed first.

Comments

@wxiaoguang
Contributor

There are more than 720 licenses in Gitea: https://github.com/go-gitea/gitea/tree/main/options/license

Most of them are outdated, inactive, or unpopular.

We could keep just about 20-40 popular licenses.

Benefits:

  • reduce binary size
  • speed up execution time
  • save memory
    • especially for the License Detector, which consumes more than 100MB of memory at the moment
    • after removing most of the unnecessary licenses, the License Detector would consume only 3-5MB of memory
@wxiaoguang added the type/proposal label Feb 1, 2025
@delvh
Member

delvh commented Feb 2, 2025

Another benefit:

  • users are not overloaded with possible options anymore (I've often asked myself "which of these licenses could be the one I want")

@delvh added the proposal/accepted label Feb 2, 2025
@wxiaoguang
Contributor Author

So, as a first step, we need to stop this:

@lunny @techknowlogick

Image

@lunny
Member

lunny commented Feb 3, 2025

> So, as a first step, we need to stop this:
>
> @lunny @techknowlogick
>
> Image

A pull request could be sent to change cron-licenses.yml so that it only runs manually: #33486

@silverwind
Member

silverwind commented Feb 3, 2025

> especially the License Detector, it consumes more than 100MB memory at the moment

Is that 100MB persistent? Excluding the actual license data, it should ideally have close to zero persistent memory and only run when needed, e.g. when a repo was pushed to. It does not matter if the license update takes a few seconds in such cases. Ideally it would start a lazy goroutine, debounced on something like ~10s of repo push inactivity.
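
A minimal sketch of that debounce idea, written in Python with `threading.Timer` rather than as a Go goroutine, to match the other snippet in this thread. The names `LicenseUpdateScheduler`, `on_push`, and `detect` are hypothetical and are not part of Gitea; the only thing taken from the comment above is the behavior: wait for a quiet period after the last push to a repo, then run detection once.

```python
import threading


class LicenseUpdateScheduler:
    """Hypothetical helper: run `detect(repo_id)` once per burst of pushes."""

    def __init__(self, delay_seconds, detect):
        self._delay = delay_seconds      # quiet period, e.g. ~10 seconds
        self._detect = detect            # callback doing the actual license detection
        self._pending = {}               # repo_id -> (token, timer)
        self._lock = threading.Lock()

    def on_push(self, repo_id):
        # Every push cancels the pending run for this repo and restarts the countdown.
        with self._lock:
            pending = self._pending.get(repo_id)
            if pending is not None:
                pending[1].cancel()
            token = object()
            timer = threading.Timer(self._delay, self._run, args=(repo_id, token))
            timer.daemon = True
            self._pending[repo_id] = (token, timer)
            timer.start()

    def _run(self, repo_id, token):
        with self._lock:
            pending = self._pending.get(repo_id)
            # A timer that fired just before being cancelled is ignored here,
            # because a newer push has already replaced its token.
            if pending is None or pending[0] is not token:
                return
            del self._pending[repo_id]
        # Run outside the lock so a slow detection does not block new pushes.
        self._detect(repo_id)
```

With this shape nothing is kept in memory between bursts except the small `_pending` map, which is the property the comment asks for; the license data itself would be loaded inside `detect` only when it runs.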

@silverwind
Member

> 20-40 popular licenses

How about keeping the 50 most popular? Gitea's user base is diverse; 20 definitely sounds too low.

lafriks pushed a commit that referenced this issue Feb 3, 2025
Help #33467
The file can be changed or removed after that issue is resolved.
@eeyrjmr
Contributor

eeyrjmr commented Feb 7, 2025

> > 20-40 popular licenses
>
> How about keeping the 50 most popular? Gitea's user base is diverse; 20 definitely sounds too low.

Maybe...
There appear to be only ~30 that are really used on GitHub:

import pandas as pd

url = "https://raw.githubusercontent.com/github/innovationgraph/refs/heads/main/data/licenses.csv"
tables = pd.read_csv(url)
# Sum the pushers per SPDX license for 2024, sort by that total, and draw a horizontal bar chart.
yearly = tables[tables.year == 2024].pivot_table(values="num_pushers", index="spdx_license", aggfunc="sum")
yearly.sort_values("num_pushers", ascending=False).plot(kind="barh", figsize=(9, 9))

2024 data
Image

2023 data

Image

@silverwind
Member

silverwind commented Feb 11, 2025

If there is a reliable data source on license usage, we could define a threshold of, let's say, >= 0.1% usage above which a license is included. I would commit the popularity source data into the repo, with some instructions on how to update it.
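
A minimal sketch of such a threshold rule, assuming the usage data has already been reduced to a mapping from SPDX ID to a usage count (for example the per-license `num_pushers` totals from the snippet earlier in this thread). The function name `select_popular_licenses` and the counts in the usage example are made up for illustration.

```python
def select_popular_licenses(usage_counts, min_share=0.001):
    """Return the license IDs whose share of total usage is >= min_share.

    usage_counts maps an SPDX ID to a usage count; min_share=0.001 means 0.1%.
    The result is ordered from most used to least used.
    """
    total = sum(usage_counts.values())
    if total == 0:
        return []
    ranked = sorted(usage_counts.items(), key=lambda item: item[1], reverse=True)
    return [spdx_id for spdx_id, count in ranked if count / total >= min_share]


# Illustrative counts only, not real data.
example_counts = {"MIT": 9000, "Apache-2.0": 900, "GPL-3.0": 99, "Obscure-1.0": 1}
popular = select_popular_licenses(example_counts)
```

Here `Obscure-1.0` accounts for 0.01% of the total and is dropped, while the other three are kept. Committing the input counts next to a small script like this would make the list reproducible whenever the source data is refreshed.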

@yp05327
Contributor

yp05327 commented Mar 6, 2025

Another benefit: we can remove the code that detects identical license contents under similar license names. That code is really "magic code", so it is better to remove it.
I think this is a good solution. The only open point is how we maintain this list of popular licenses in the future: e.g. how do we define whether a license is popular, are there any third-party resources we can refer to, and are there any rules for when a license is no longer popular?

@wxiaoguang
Contributor Author

-> Only keep popular licenses #33832

wxiaoguang added a commit that referenced this issue Mar 9, 2025
hiifong pushed a commit to hiifong/gitea that referenced this issue Mar 10, 2025
6 participants