@artmoskvin
Collaborator

@artmoskvin artmoskvin commented Sep 30, 2024

This PR fixes #115 where the project language was detected incorrectly. The previous language detection algorithm naively counted file extensions and returned the language corresponding to the most popular file extension.
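For illustration, here is a minimal sketch of how counting extensions can go wrong. It is not the code that was removed: the extension table, the function name, and the file list are all made up for the example.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// extToLang is a tiny, hypothetical extension table used only for this example.
var extToLang = map[string]string{
	".py": "python",
	".js": "javascript",
	".go": "go",
}

// mostCommonExtensionLanguage returns the language whose extension appears on
// the most files. Ties are broken alphabetically so the result is deterministic.
func mostCommonExtensionLanguage(paths []string) string {
	counts := map[string]int{}
	for _, p := range paths {
		if lang, ok := extToLang[filepath.Ext(p)]; ok {
			counts[lang]++
		}
	}
	best := ""
	for lang, n := range counts {
		if best == "" || n > counts[best] || (n == counts[best] && lang < best) {
			best = lang
		}
	}
	return best
}

func main() {
	// A few large Python modules are outvoted by many small JS assets.
	paths := []string{
		"app/models.py",
		"app/views.py",
		"static/a.js",
		"static/b.js",
		"static/c.js",
	}
	fmt.Println(mostCommonExtensionLanguage(paths)) // javascript
}
```

File size never enters the count, so a handful of small asset files can outvote the code that makes up most of the project.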

The new algorithm is based on https://github.com/go-enry/go-enry, the state of the art for language detection. enry detects the language of each file, and the project language is defined as the language with the most bytes, not the most files. The algorithm is still not ideal, but it's good enough and matches how GitHub detects languages for repos.
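The byte-weighted aggregation described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `sourceFile`, `detectFunc`, `projectLanguage`, and `toyDetect` are hypothetical names, and the per-file detector is injected so the example runs without dependencies. In the real change that role is played by enry (its `GetLanguage(filename, content)` function has, to my understanding, a matching shape).

```go
package main

import (
	"fmt"
	"strings"
)

// sourceFile is a hypothetical in-memory stand-in for a file in the repo.
type sourceFile struct {
	Path    string
	Content []byte
}

// detectFunc maps a single file to a language name, or "" if unknown.
type detectFunc func(path string, content []byte) string

// projectLanguage sums the bytes of every file per detected language and
// returns the language with the largest total. Ties are broken alphabetically.
func projectLanguage(files []sourceFile, detect detectFunc) string {
	bytesPerLang := map[string]int{}
	for _, f := range files {
		if lang := detect(f.Path, f.Content); lang != "" {
			bytesPerLang[lang] += len(f.Content)
		}
	}
	best := ""
	for lang, n := range bytesPerLang {
		if best == "" || n > bytesPerLang[best] || (n == bytesPerLang[best] && lang < best) {
			best = lang
		}
	}
	return best
}

// toyDetect is a placeholder detector keyed on the file suffix; a real
// detector would also inspect the content.
func toyDetect(path string, _ []byte) string {
	switch {
	case strings.HasSuffix(path, ".py"):
		return "Python"
	case strings.HasSuffix(path, ".js"):
		return "JavaScript"
	}
	return ""
}

func main() {
	files := []sourceFile{
		{"app/models.py", make([]byte, 40_000)},
		{"static/a.js", make([]byte, 500)},
		{"static/b.js", make([]byte, 500)},
		{"static/c.js", make([]byte, 500)},
	}
	fmt.Println(projectLanguage(files, toyDetect)) // Python
}
```

With this input JavaScript wins on file count (3 vs 1) but Python wins on bytes (40,000 vs 1,500), which is the behaviour the byte-based rule is meant to give.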

I tested it on the reported repo (django) and the detected language was accurate.

@artmoskvin artmoskvin requested a review from aleh-null October 1, 2024 14:42
@artmoskvin artmoskvin marked this pull request as ready for review October 1, 2024 14:42
Collaborator

@aleh-null aleh-null left a comment

Really nice! I wonder if we really need to pinpoint the main language in a repo; I like the way you count for each language.

cmd/run.go Outdated
lspServerExecutables[lsp.LanguageId("python")] = lsp.NewCommand("pyright-langserver", []string{"--stdio"})
lspServerExecutables[lsp.LanguageId("javascript")] = lsp.NewCommand("typescript-language-server", []string{"--stdio"})
lspServerExecutables[lsp.LanguageId("typescript")] = lsp.NewCommand("typescript-language-server", []string{"--stdio"})
lspServerExecutables[lsp.Go] = lsp.NewCommand("gopls", []string{})
Collaborator

I think we should move all this logic to the lsp pkg. Something like lsp.Executables() map[lsp.LanguageId]lsp.Command
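A rough sketch of what that suggestion could look like, written as a single file so it runs on its own. `LanguageId`, `Command`, and `NewCommand` here are simplified stand-ins for the lsp package's types, the value of `Go` is assumed, and the server commands are the ones from the diff above. The merged code may differ.

```go
package main

import "fmt"

// LanguageId and Command are simplified stand-ins for the lsp package's types.
type LanguageId string

type Command struct {
	Name string
	Args []string
}

// NewCommand mirrors the constructor used in the diff (signature assumed).
func NewCommand(name string, args []string) Command {
	return Command{Name: name, Args: args}
}

// Go stands in for lsp.Go; the underlying string value is an assumption.
const Go LanguageId = "go"

// Executables returns the language server command for each supported language.
// A fresh map is built on every call so callers cannot mutate shared state.
func Executables() map[LanguageId]Command {
	return map[LanguageId]Command{
		"python":     NewCommand("pyright-langserver", []string{"--stdio"}),
		"javascript": NewCommand("typescript-language-server", []string{"--stdio"}),
		"typescript": NewCommand("typescript-language-server", []string{"--stdio"}),
		Go:           NewCommand("gopls", []string{}),
	}
}

func main() {
	// The detected project language selects which server to start.
	detected := LanguageId("python")
	cmd, ok := Executables()[detected]
	if !ok {
		fmt.Println("no LSP server configured for", detected)
		return
	}
	fmt.Println(cmd.Name, cmd.Args) // pyright-langserver [--stdio]
}
```

Keeping the table in one function means cmd/run.go only needs the detected language and a map lookup, with the missing-key case handled explicitly.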

Collaborator Author

done

@artmoskvin
Collaborator Author

> I wonder if we really need to pinpoint the main language in a repo; I like the way you count for each language.

We don't really pin it. We just need to know which LSP server to start.

@artmoskvin artmoskvin merged commit 8d6a754 into main Oct 1, 2024
@artmoskvin artmoskvin deleted the artm/better-lang-detection branch October 1, 2024 19:34
Development

Successfully merging this pull request may close these issues.

search_symbols fails with None is not iterable

3 participants