Exclude private repositories #60

Closed
yarikoptic opened this issue Feb 21, 2025 · 1 comment · Fixed by #61

Comments

@yarikoptic
Member

For the purposes of this service, we should avoid indexing private repositories. Since querying some services may require API keys, a service may return private repositories; if it does, we should exclude them explicitly.

From prior "research", it seems that Forgejo has a query option to avoid listing private repos, so it should be used when querying such services. GIN lacks such an option, so we might not be able to avoid listing private repos there, but we should be able to filter them out after the fact using the private attribute.
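
A minimal sketch of the after-the-fact filtering described above, assuming each repository record returned by the service is a dict carrying a boolean `private` field. The helper name `exclude_private` and the sample records are hypothetical and only illustrate the idea; this is not the project's actual implementation.

```python
from __future__ import annotations

from typing import Any


def exclude_private(repos: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only records whose "private" field is explicitly False.

    Assumption: a record with no "private" field is dropped as a
    precaution, since it cannot be confirmed to be public.
    """
    return [r for r in repos if r.get("private") is False]


if __name__ == "__main__":
    # Hypothetical sample listing; real records would come from the API.
    listing = [
        {"full_name": "alice/open-data", "private": False},
        {"full_name": "alice/secret-data", "private": True},
        {"full_name": "bob/unknown"},
    ]
    for repo in exclude_private(listing):
        print(repo["full_name"])
```

Dropping records that lack the attribute is a conservative choice; a service that omits the field for public repositories would need a different default.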

@jwodder
Member

jwodder commented Feb 24, 2025

@yarikoptic For the record, using the below script, I was unable to find any private DataLad-using repositories on GIN or hub.datalad.org that were listed in datalad-repos.json:

#!/usr/bin/env -S pipx run
# /// script
# requires-python = ">=3.11"
# dependencies = ["ghreq ~= 0.1", "python-dotenv ~= 1.0"]
# ///

from __future__ import annotations
import json
import os
from pathlib import Path
from dotenv import load_dotenv
import ghreq

REPO_DATABASE = Path("datalad-repos.json")


def main() -> None:
    load_dotenv()
    with REPO_DATABASE.open() as fp:
        db = json.load(fp)
    db["gin"] = rm_private(
        db["gin"], "https://gin.g-node.org/api/v1/", os.environ["GIN_TOKEN"]
    )
    db["hub_datalad_org"] = rm_private(
        db["hub_datalad_org"],
        "https://hub.datalad.org/api/v1/",
        os.environ["HUB_DATALAD_ORG_TOKEN"],
    )
    with REPO_DATABASE.open("w") as fp:
        print(json.dumps(db, indent=4), file=fp)


def rm_private(repos: list, api_url: str, token: str) -> list:
    public = []
    with ghreq.Client(
        api_url=api_url,
        accept=None,
        api_version=None,
        headers={"Authorization": f"token {token}"},
    ) as client:
        for r in repos:
            if r["status"] == "gone":
                public.append(r)
            else:
                data = client.get(f"/repos/{r['name']}")
                if data["private"]:
                    print(f"Found private repo {r['name']} on {api_url}")
                else:
                    public.append(r)
    return public


if __name__ == "__main__":
    main()

yarikoptic added a commit that referenced this issue Feb 24, 2025
GIN family: Exclude private repos