Rewrite repo_info_extractor #160

@peti2001

The current solution has several problems, the most important being that it is hard to use. It is written in Python, which makes it inconvenient for developers who have no Python experience.

Problems with the current solution

  • Hard to install: it requires Python.
  • The output is too big: JSON takes too much memory and CPU to process, and GRPCGateway and filechanges use a lot of memory.
  • It is slow, so we have to skip large repos.

Requirements:

Nice to have

  • Merge with multi_repo_info_extractor, so that multiple repos can be extracted by passing tokens or credentials.
  • Serverless compatibility (Go is available in Google Cloud).
  • Parse the code instead of using regex, to improve the accuracy of import detection.
  • Minimize disk IO: don't check out the code, process it in memory.
  • Support multiple outputs, easily extendable by the community.
  • Recognize squashed commits, not just merges.
  • GUI
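
To illustrate the "parse the code instead of using regex" item, here is a minimal sketch in Go that reads imports from a Go source file with the standard library's go/parser package. The function name extractImports and the sample source are hypothetical and are not part of repo_info_extractor. The sketch only covers Go files; other languages would need their own parsers.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// extractImports returns the import paths declared in a Go source file.
// The name and signature are hypothetical, chosen for this sketch.
func extractImports(filename string, src []byte) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations,
	// so the rest of the file is never processed.
	file, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	paths := make([]string, 0, len(file.Imports))
	for _, spec := range file.Imports {
		// Path.Value is a quoted string literal, so it has to be unquoted.
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, path)
	}
	return paths, nil
}

func main() {
	// The source is passed as bytes, so nothing has to be checked out to disk.
	src := []byte(`package demo

import (
	"fmt"
	str "strings"
)

// import "not/a/real/import"
var s = "import \"also/not/real\""
`)
	imports, err := extractImports("demo.go", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(imports) // prints [fmt strings]
}
```

A line-based regex could mistake the comment or the string literal in the sample for real imports, whereas the parser reports only the actual import declarations.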

If you have any suggestions or have run into problems with the current implementation, please share them.
