Use go-enry for language detection #121
aleh-null
left a comment
Really nice! I wonder if we really need to pinpoint the main language in a repo; I like the way you count for each language.
cmd/run.go
lspServerExecutables[lsp.LanguageId("python")] = lsp.NewCommand("pyright-langserver", []string{"--stdio"})
lspServerExecutables[lsp.LanguageId("javascript")] = lsp.NewCommand("typescript-language-server", []string{"--stdio"})
lspServerExecutables[lsp.LanguageId("typescript")] = lsp.NewCommand("typescript-language-server", []string{"--stdio"})
lspServerExecutables[lsp.Go] = lsp.NewCommand("gopls", []string{})
I think we should move all this logic to the lsp pkg. Something like lsp.Executables() map[lsp.LanguageId]lsp.Command
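One possible shape for that suggestion, as a rough sketch only. The types below are simplified stand-ins for the project's lsp.LanguageId and lsp.Command, the NewCommand signature is inferred from the calls in cmd/run.go, and the value "go" for lsp.Go is an assumption; the code that actually landed in the PR may look different.

```go
package main

import "fmt"

// LanguageId and Command are simplified stand-ins for the project's
// lsp.LanguageId and lsp.Command; the real definitions may differ.
type LanguageId string

type Command struct {
	Name string
	Args []string
}

// NewCommand mirrors the lsp.NewCommand calls in cmd/run.go (assumed signature).
func NewCommand(name string, args []string) Command {
	return Command{Name: name, Args: args}
}

// Executables returns a fresh map from language to the command that starts
// its language server, so callers no longer build the map themselves.
func Executables() map[LanguageId]Command {
	return map[LanguageId]Command{
		"python":     NewCommand("pyright-langserver", []string{"--stdio"}),
		"javascript": NewCommand("typescript-language-server", []string{"--stdio"}),
		"typescript": NewCommand("typescript-language-server", []string{"--stdio"}),
		"go":         NewCommand("gopls", []string{}), // assumes lsp.Go == "go"
	}
}

func main() {
	// Look up the server for a detected language; ok is false if unsupported.
	cmd, ok := Executables()[LanguageId("python")]
	fmt.Println(cmd.Name, cmd.Args, ok)
}
```

Returning a new map on every call keeps callers from mutating shared state, at the cost of a small allocation.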
done
We don't really pin it; we just need to know which LSP server to start.
This PR fixes #115 where the project language was detected incorrectly. The previous language detection algorithm naively counted file extensions and returned the language corresponding to the most popular file extension.
The new algorithm is based on https://github.com/go-enry/go-enry, the state of the art for language detection. The language of each file is detected by enry, and the project language is defined as the language with the most bytes, not just the most files. This algorithm is still not ideal, but it's good enough and corresponds to how GitHub detects languages for repos. I tested it on the reported repo, django, and it was accurate.
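The byte-weighted selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: File, Detector, ProjectLanguage and extensionDetector are hypothetical names, and the toy extension-based detector only stands in for enry so the example runs with the standard library alone. In the real change, a function such as go-enry's enry.GetLanguage(filename, content) would presumably fill the detector role.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// File is a hypothetical in-memory stand-in for a file in the repository.
type File struct {
	Path    string
	Content []byte
}

// Detector returns the language of one file, or "" if unknown.
// In the PR this role is played by enry; the signature here is an assumption.
type Detector func(path string, content []byte) string

// ProjectLanguage sums content bytes per detected language and returns the
// language with the most bytes. Ties are broken alphabetically so the result
// is deterministic. It returns "" when no file has a known language.
func ProjectLanguage(files []File, detect Detector) string {
	bytesPerLang := make(map[string]int)
	for _, f := range files {
		lang := detect(f.Path, f.Content)
		if lang == "" {
			continue // skip files the detector cannot classify
		}
		bytesPerLang[lang] += len(f.Content)
	}

	langs := make([]string, 0, len(bytesPerLang))
	for lang := range bytesPerLang {
		langs = append(langs, lang)
	}
	sort.Strings(langs)

	best := ""
	for _, lang := range langs {
		if best == "" || bytesPerLang[lang] > bytesPerLang[best] {
			best = lang
		}
	}
	return best
}

// extensionDetector is a toy detector used only for this example.
func extensionDetector(path string, _ []byte) string {
	switch filepath.Ext(path) {
	case ".py":
		return "Python"
	case ".js":
		return "JavaScript"
	default:
		return ""
	}
}

func main() {
	files := []File{
		{Path: "static/a.js", Content: make([]byte, 10)},
		{Path: "static/b.js", Content: make([]byte, 10)},
		{Path: "static/c.js", Content: make([]byte, 10)},
		{Path: "app/models.py", Content: make([]byte, 100)},
	}
	// JavaScript has more files, but Python has more bytes.
	fmt.Println(ProjectLanguage(files, extensionDetector))
}
```

With these inputs a file-count heuristic would pick JavaScript (three files against one), while the byte-weighted version picks Python, which is the kind of misdetection the PR description attributes to the old algorithm.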