Rewrite repo_info_extractor #160

The current solution has several problems, the most important being that it is hard to use. It is written in Python, which makes it inconvenient for developers who have no Python experience.

# Problems with the current solution
- hard to install, requires Python
- the output is too big: JSON takes too much memory and CPU to process, and GRPCGateway and filechanges use a lot of memory
- it is slow, so we have to skip large repos
# Requirements:
- It must have the same features as the current solution
- Binary executable (Windows, Linux, OSX)
- The output must be parsable line by line to avoid loading everything into the memory
- Make it at least 10 times faster. It must be able to process these repos in less than 10 minutes:
  - https://github.com/aosp-mirror/platform_frameworks_base
  - https://github.com/eXpandFramework/eXpand
  - https://github.com/expand/eXpand
  - https://github.com/eXpandFramework/eXpand.lab
  - https://github.com/eXpandFramework/lab
  - https://github.com/laravel-enso/enso
  - https://github.com/fellipegpbotelho/odonto-uni
- Import the existing trained Python model into Go to detect similar emails. For example, if I have commits with test.peter@example.com, peti2001@example.com, and testpeter@codersrank.io, it has to recognize that they come from the same user.
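
To make the email-matching requirement concrete, here is a rough sketch of a rule-based baseline. It is not the trained model and does not replace it: `normalizeLocalPart` and `sameAuthorCandidate` are hypothetical helpers that only compare the letters before the `@`. The example shows why a learned model is still needed, since this rule links two of the three addresses but misses `peti2001@example.com`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeLocalPart is a hypothetical helper: it lowercases the part
// before '@' and keeps only letters, so "test.peter" and "testpeter"
// normalize to the same string.
func normalizeLocalPart(email string) string {
	local := email
	if i := strings.LastIndex(email, "@"); i >= 0 {
		local = email[:i]
	}
	var b strings.Builder
	for _, r := range strings.ToLower(local) {
		if unicode.IsLetter(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// sameAuthorCandidate reports whether two addresses share a non-empty
// normalized local part. This is a naive stand-in for the trained model.
func sameAuthorCandidate(a, b string) bool {
	na, nb := normalizeLocalPart(a), normalizeLocalPart(b)
	return na != "" && na == nb
}

func main() {
	// Different domains, same letters in the local part.
	fmt.Println(sameAuthorCandidate("test.peter@example.com", "testpeter@codersrank.io")) // prints true
	// A nickname plus digits is beyond this simple rule.
	fmt.Println(sameAuthorCandidate("test.peter@example.com", "peti2001@example.com")) // prints false
}
```

A baseline like this could also serve as a cheap pre-filter or as a regression check while the Python model is being ported.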

# Nice to have
- Merge with multi_repo_info_extractor, so that multiple repos can be extracted by passing tokens or credentials.
- Serverless compatibility (Go is available in Google Cloud)
- Parse the code instead of using regex, to improve the accuracy of the import detection
- Minimize disk IO: don't check out the code, do it in memory
- Support multiple outputs, easily extendable by the community.
- Recognize squashed commits, not just merges
- GUI
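
As an illustration of parsing instead of regex matching for import detection, the sketch below uses Go's standard `go/parser` package on a Go source file. It covers only Go; other languages would need their own parsers, which is outside this example. `extractImports` is a hypothetical helper name. The sample input contains the word `import` in a comment and in a string literal, which a naive regex could pick up by mistake.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// extractImports returns the import paths of a Go source file.
// parser.ImportsOnly stops parsing after the import declarations,
// so the rest of the file is never processed.
func extractImports(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	paths := make([]string, 0, len(f.Imports))
	for _, spec := range f.Imports {
		// Path.Value is a quoted string literal such as "\"fmt\"".
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

func main() {
	src := `package demo

// import "not/a/real/import" appears only in a comment
import (
	"fmt"
	str "strings"
)

var s = "import \"also/not/real\""
`
	imports, err := extractImports(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(imports) // prints [fmt strings]
}
```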

If you have any suggestions or have run into problems with the current implementation, please share them.
