
Suggestion: improved process for language server setup #1489


Open
guibou opened this issue Feb 10, 2021 · 12 comments

@guibou
Contributor

guibou commented Feb 10, 2021

I'm configuring haskell-language-server on a huge codebase and I have run into many problems with the approach currently documented at https://rules-haskell.readthedocs.io/en/latest/haskell-use-cases.html#configuring-ide-integration-with-ghcide.

I'm opening this ticket to describe a different approach, which mostly only impacts documentation. Depending on the outcome of the discussion, I could open a documentation PR.

First of all, I have a huge codebase, with many Haskell packages and a deep dependency tree. I also have Haskell code generated at build time and dependencies on many shared libraries built in different languages. I'm not aware of a more complex codebase using rules_haskell.

The documentation proposes setting up a global haskell_repl which references all the haskell_library targets of the repository. The documentation does not spell it out, but you have two ways to reference your libraries:

  • using from_source, each Haskell module is loaded from source. This solution does not scale for me: haskell-language-server takes hours (and a lot of RAM) to start.
  • using from_binary, the packages linked in the repl are first compiled by Bazel and then added to the build information with -package. This solution works for me, but it forces the user to wait for a full build of the codebase, which takes more than an hour, or a quarter of an hour if the user has fast access to a remote cache.
  • Either way, it forces the creation of a global repl referencing all the targets of the repository. This global repl is difficult to maintain (you need to add / remove entries every time something changes in the repo). We used an automated process (based on bazel query) to generate the BUILD file for this repl. One problem was that some targets fail, for various reasons, so we had to tag all of our failing targets to ensure that they would not end up in the repl.

I tried a different approach, with success. Instead of building a global repl, I'm using the repl associated with the haskell_library which uses the file.

The hie-bios script accepts the path of the file being checked as its first argument. I'm then using the following query:

bazel build $(bazel query "kind(haskell_library, //...) intersect somepath(kind(haskell_library, //...), $(bazel query "$FILEPATH"))")@repl

where FILEPATH is the path to the file provided by HLS, and @repl is my repl attached to each haskell_library or haskell_binary.
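
For reference, a minimal sketch of the .hie-bios script this implies, run by hie-bios through a bios cradle in hie.yaml; the structure is an assumption based on the description above:

#!/usr/bin/env bash
# Sketch: hie-bios passes the path of the file being checked as $1.
set -euo pipefail
FILEPATH="$1"
# Locate the haskell_library whose transitive inputs include the file,
# then build its autogenerated @repl target.
TARGET=$(bazel query "kind(haskell_library, //...) intersect somepath(kind(haskell_library, //...), $FILEPATH)")
bazel build "${TARGET}@repl"

The GHC flags produced by the repl target still need to end up in the file named by $HIE_BIOS_OUTPUT; how to get them there depends on how the repl rule exposes them.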

There are a few positive points to this approach:

  • I can use from_binary on the repl without forcing a rebuild of the full codebase.
  • If I'm using from_source, it is much faster because it does not need to evaluate 1000 Haskell files.
  • I still have a global "wildcard" to feed from_binary and from_source for each repl, which I can tune.
  • No need for a global repl anymore. That's a huge improvement: I don't need to maintain a static file for it (or the linter scripts that ensure the static file is up to date).
  • More robust to failure. If one "target" is broken for any reason, it will only fail in the language server for that target, instead of crashing the repl for all targets. This was really painful in my former implementation and I had to manually tag all the "working" targets in the repo.
  • It now works for haskell_binary and haskell_test, because they can have a repl.

What are your thoughts about this approach? I can update the rules_haskell documentation with it if you think that's a good idea.

Haskell language server

On an unrelated note, I'm using haskell-language-server instead of ghcide as described in the documentation. I have no particular problem with it, except:

  • Haskell Language Server failure when rule have same package name #1482: it fails if there are two packages with the same name in the project (such as //third_party/hackage:lens and //my/project:lens, which share the same package name lens). This problem appears less often with the approach I just described.
  • I have a problem with PATH: haskell-language-server does not find the ghc used by my project (this ghc is provided by nixpkgs_package in Bazel). I had to write a wrapper for that (see the sketch after this list).
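
A minimal sketch of what such a wrapper could look like; GHC_FROM_NIXPKGS is a hypothetical placeholder for wherever nixpkgs_package materializes ghc:

#!/usr/bin/env bash
# Hypothetical wrapper: expose the project's GHC to
# haskell-language-server by prepending it to PATH.
export PATH="${GHC_FROM_NIXPKGS}/bin:${PATH}"
exec haskell-language-server-wrapper "$@"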

Considering that ghcide is now deprecated in favor of haskell-language-server, I propose to update the documentation to recommend haskell-language-server instead.

@aherrmann
Member

Thank you for raising this! I agree the docs on IDE integration could definitely use improvement. As they say at the beginning, the current status is preliminary. I'd be very happy to take PRs on this! I don't think I'll have time to work on hls support myself in the foreseeable future.

Considering that ghcide is now deprecated in favor of haskell-language-server, I propose to update the documentation to recommend haskell-language-server instead.

Agreed, the docs should be updated from ghcide to hls.

The documentation does not spell it out, but you have two ways to reference your libraries: [...]

Yes, this should be clarified in the use-case docs as well. The from_binary and from_source attributes are documented in the API docs. Just to clarify, the from_source and from_binary attributes don't directly reference targets; instead, they are patterns matched against targets in the transitive closure of deps. More technically, they are of type attr.string_list, i.e. they cannot incur a dependency.
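
For illustration, a minimal sketch of a haskell_repl target using these pattern attributes; the deps label is hypothetical, and the experimental_from_source / experimental_from_binary spelling is an assumption (adjust to the attribute names of your rules_haskell version):

load("@rules_haskell//haskell:defs.bzl", "haskell_repl")

haskell_repl(
    name = "hie-bios",
    # Patterns, not labels: targets in the transitive closure of deps
    # matching these patterns are loaded by source ...
    experimental_from_source = ["//..."],
    # ... while targets matching these are built by Bazel and loaded as
    # precompiled packages.
    experimental_from_binary = ["//third_party/..."],
    deps = ["//my/project:lib"],
)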

I'm then using the following query:

Could you clarify what motivates this query and how it works?
IIUC it aims to find the closest haskell_library to a given source file and picks its autogenerated @repl target. But, IIUC, it relies on somepath returning the shortest path. Is that guaranteed? I can't find it stated in the docs. Also, this only seems to support haskell_library; how do you treat haskell_binary targets?

Another note: query does not take configuration into account, so query may yield bogus results on cross-platform projects. In the past I have used this cquery to discover the haskell_library/binary/test corresponding to a source file. Any thoughts on this?

  • I can use from_binary on the repl without forcing a rebuild of the full codebase.

  • If I'm using from_source, it is much faster because it does not need to evaluate 1000 Haskell files.

Could you clarify how you set these attributes and what values you set them to?
The @repl targets are autogenerated with from_source set to only include the target at hand. I.e. all dependencies are loaded as packages and, therefore, have to be built. E.g. if the given target depends on all other Haskell targets in the project, then indeed all other targets would need to be built first.

I still have a global "wildcard" to feed from_binary and from_source for each repl, which I can tune.

Similar to above, it's not clear to me where you can define these global patterns.

No need for a global repl anymore. That's a huge improvement: I don't need to maintain a static file for it (or the linter scripts that ensure the static file is up to date).

Agreed, requiring a single global repl for hls integration is not what we want. The historical reason is that multi-cradle support was not available at the time when this was developed. See related discussion here.

Relatedly, I experimented in the past with generating hie-bios files on the fly directly from the aspect, see here. Unfortunately, I ran out of time on those experiments. An issue I encountered with that approach, which may be relevant here, was that ghcide (this was not on hls yet) ended up creating too many overlapping GHC sessions, which ate up too much memory.

More robust to failure. If one "target" is broken for any reason, it will only fail in the language server for that target, instead of crashing the repl for all targets. This was really painful in my former implementation and I had to manually tag all the "working" targets in the repo.

Build failure should only be an issue for from_binary dependencies. from_source dependencies should not be built and any errors in them can be reported by ghci or hls.

It now works for haskell_binary and haskell_test, because they can have a repl.

I don't understand this point. haskell_binary and haskell_test could have a repl target before; in fact, rules_haskell autogenerates one for them.

@guibou
Contributor Author

guibou commented Feb 11, 2021

Thank you for reading this and commenting; let me address your questions.

The documentation does not spell it out, but you have two ways to reference your libraries: [...]

[..] Just to clarify, the from_source and from_binary attributes don't directly reference targets; instead, they are patterns matched against targets in the transitive closure of deps. More technically, they are of type attr.string_list, i.e. they cannot incur a dependency.

Yes, thank you for the clarification. I am indeed using them as patterns; for example, I have //... in from_source and //third_party/... in from_binary, considering that third_party does not change much and may well have a weird build setup.

I'm then using the following query:

Could you clarify what motivates this query and how it works?
IIUC it aims to find the closest haskell_library to a given source file and picks its autogenerated @repl target. But, IIUC, it relies on somepath returning the shortest path. Is that guaranteed? I can't find it stated in the docs. Also, this only seems to support haskell_library; how do you treat haskell_binary targets?

somepath returning the shortest path is not guaranteed, but that's what I have observed. Actually, the semantics are correct with any path; it only influences performance.

bazel build $(bazel query "kind(haskell_library, //...) intersect somepath(kind(haskell_library, //...), $(bazel query "$FILEPATH"))")@repl

You are right about your understanding of the query. Let me detail it:

  • somepath does indeed select "one" haskell_library, based on the "target name" for $FILEPATH.
  • I'm getting the "target name" for $FILEPATH using bazel query $FILEPATH, but it also works by passing $FILEPATH directly in the main query.
  • The intersect is there to refine the final result, which contains the initial file. I don't really understand why, but the example usage of somepath in the documentation does include the intersect.
  • I treat haskell_binary similarly. One solution may be to do the same query for both kinds and join them with some, but then I have no guarantee about which one will be returned first. Another solution (which I currently use) is to do two queries, one with haskell_library and another with haskell_binary, and pick the first result returned (see the sketch after this list).
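
A minimal sketch of that two-query fallback, under the same assumptions as the earlier query (FILEPATH set by the caller, @repl the per-target repl; error handling elided):

# Try haskell_library first, then fall back to haskell_binary.
TARGET=$(bazel query "kind(haskell_library, //...) intersect somepath(kind(haskell_library, //...), $FILEPATH)")
if [ -z "$TARGET" ]; then
  TARGET=$(bazel query "kind(haskell_binary, //...) intersect somepath(kind(haskell_binary, //...), $FILEPATH)")
fi
bazel build "${TARGET}@repl"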

Another really good solution is:

(kind(haskell_library, //...) union kind(haskell_binary, //...)) intersect rdeps(kind(haskell_library, //...) union kind(haskell_binary, //...), build/rule/haskell/prelude/P.hs, 1)

It ensures that the shortest path is selected, thanks to rdeps(..., 1), and it picks both haskell_library and haskell_binary. However, there is a problem if the Haskell file is loaded through a pre-processing step, because the depth of the dependency tree until it reaches a haskell_library may then be more than 2.

Another note: query does not take configuration into account, so query may yield bogus results on cross-platform projects. In the past I have used this cquery to discover the haskell_library/binary/test corresponding to a source file. Any thoughts on this?

No. I just tried cquery and it fails for me (we may have a weird configuration with which it does not work). Fortunately, we do not have any configuration which impacts the dependency tree of Haskell files. Thank you for raising the issue; I'll try to make it work with cquery.

  • I can use from_binary on the repl without forcing a rebuild of the full codebase.
  • If I'm using from_source, it is much faster because it does not need to evaluate 1000 Haskell files.

Could you clarify how you set these attributes and what values you set them to?
The @repl targets are autogenerated with from_source set to only include the target at hand. I.e. all dependencies are loaded as packages and, therefore, have to be built. E.g. if the given target depends on all other Haskell targets in the project, then indeed all other targets would need to be built first.

I was unclear. The @repl I'm using in the example is our custom repl (actually called krepl), which has from_binary and from_source set to the union of global values and values set on each local haskell_library or haskell_binary. (We have a wrapper, k_haskell_xxx, which accepts these flags and dispatches to the official haskell_library, repl, ....)
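
For illustration, a sketch of what such a wrapper macro could look like; apart from haskell_library and haskell_repl, every name here is taken from the description above or invented, and the experimental_ attribute spelling is an assumption:

load("@rules_haskell//haskell:defs.bzl", "haskell_library", "haskell_repl")

# Global patterns, tunable in one place.
GLOBAL_FROM_SOURCE = ["//..."]
GLOBAL_FROM_BINARY = ["//third_party/..."]

def k_haskell_library(name, from_source = [], from_binary = [], **kwargs):
    """Defines the library plus its per-target custom repl ("krepl")."""
    haskell_library(name = name, **kwargs)
    haskell_repl(
        name = name + "@krepl",
        deps = [":" + name],
        experimental_from_source = GLOBAL_FROM_SOURCE + from_source,
        experimental_from_binary = GLOBAL_FROM_BINARY + from_binary,
    )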

Build failure should only be an issue for from_binary dependencies. from_source dependencies should not be built and any errors in them can be reported by ghci or hls.

Well, some from_source targets fail because they cannot find some C symbols, which does not happen when using from_binary. That's entirely a problem with our setup which I haven't taken the time to understand.

It now works for haskell_binary and haskell_test, because they can have a repl.

I don't understand this point. haskell_binary and haskell_test could have a repl target before; in fact, rules_haskell autogenerates one for them.

I was unclear; I meant that, afaik, you cannot have a "global" repl which references haskell_binary and haskell_library at the same time. I may be wrong, however.

Thank you for the links to your experiments, I'll have a look.

@aherrmann
Member

Thanks for clarifying and explaining the query.

Well, some from_source targets fail because they cannot find some C symbols, which does not happen when using from_binary. That's entirely a problem with our setup which I haven't taken the time to understand.

Just a hunch: Are these missing libraries coming from external repositories? For the hie-bios file, we don't prefix command-line library search paths with the execroot. This works fine for local library targets, but external libraries will have entries such as -Lexternal/some_workspace/.... These paths are only valid in the execroot, not in the repository root. To fix this, we'd need to add a prefix, e.g. $RULES_HASKELL_EXEC_ROOT, to external paths here, and resolve and replace it in the .hie-bios script. There was a similar issue here.
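
For concreteness, the substitution in the .hie-bios script could look something like this sketch, where generated-flags.txt is a hypothetical name for the flags file produced by the repl target:

# Replace the placeholder with the real execution root before handing
# the GHC flags to hie-bios.
EXEC_ROOT=$(bazel info execution_root)
sed "s|\$RULES_HASKELL_EXEC_ROOT|$EXEC_ROOT|g" generated-flags.txt > "$HIE_BIOS_OUTPUT"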

I was unclear; I meant that, afaik, you cannot have a "global" repl which references haskell_binary and haskell_library at the same time. I may be wrong, however.

That should work: haskell_repl can have multiple deps, and each can be any of haskell_library|binary|test.

@guibou
Contributor Author

guibou commented Feb 13, 2021

Just a hunch: Are these missing libraries coming from external repositories?

No, it's actually not missing libraries, but missing symbols.

@guibou
Contributor Author

guibou commented Feb 13, 2021

Good news: I made some progress by reading haskell/haskell-language-server#1160 (comment); compiling my haskell-language-server as dynamic solves most of my missing-symbol issues in a Template Haskell context.

@Anrock

Anrock commented May 7, 2021

Hello. Any progress on this?

@Xophmeister added the "P3 minor: not priorized" and "type: documentation" labels on Mar 3, 2022
@googleson78
Contributor

What's the status here? Is this only a docs issue, or are there some improvements we could make towards better compatibility with HLS?

@tonicebrian

Hi, I'm also interested in this. Currently I'm replicating Bazel in Stack just for language server support. Is there any way we can help? It seems like @guibou did all the work already.

@guibou
Contributor Author

guibou commented Oct 19, 2022

@tonicebrian sorry, I can't help you more here; at work I stopped using Bazel for HLS a few months ago, and last week I merged the removal of Bazel from the codebase at work.

@tonicebrian

@tonicebrian sorry, I can't help you more here; at work I stopped using Bazel for HLS a few months ago, and last week I merged the removal of Bazel from the codebase at work.

Oops, if you don't mind, could you explain why you abandoned Bazel? I'm going down the reverse path because I think it will help me with my polyglot PureScript + Haskell project. I don't want to find out 6 months in the future that it wasn't the right choice. What did you swap Bazel for?

@guibou
Contributor Author

guibou commented Oct 20, 2022

@tonicebrian Really long story short: Bazel never worked for me; it never delivered on any of its promises of being simple, fast, robust, composable, ... (lots of buzzwords).

We swapped Bazel for a builder written entirely in Nix. We got the same features in 600 lines of Nix, with a reproducible build, remote build, and remote cache, as well as features that Bazel was not providing (as far as I know) without patching it or the rulesets.

At a first company, I started experimenting with Nix as a build system and had promising results, but I was not in a position to propose a switch (I was hired as a Bazel consultant). At another company, I joined when Bazel was already the build system of choice. Given my Bazel "experience", other developers asked me whether we should switch away from Bazel. I decided not to: the situation was acceptable and I did not want to be too involved in build systems. However, after one year of fixing and working around a lot of problems similar to what had happened at the previous company, the idea of moving away from Bazel resurfaced. Things accelerated recently after we spent a few days tracking down a numerical error in simulation code which was actually due to a hermeticity problem in Bazel (in short, Bazel uses the system library loader in its test runner, which had an impact on our numerical code). I then restarted the Nix experiment. It took me a day to write a POC to push binaries built with Nix to production. We then decided to prioritize the transition.

Do I recommend this approach? It depends. The current benefit is that things work as intended. The main drawbacks are that Nix has a slower evaluation time (a no-op build is ~1.5s), which is not a problem for us, and that we have a highly specific piece of software that nobody other than me and a few colleagues understands. Is that a problem? Yes. Is it worse than the 3k lines of Bazel code + 5 forks of rulesets + 30 open bugs in our bug tracker? I don't know. Would this approach have been possible if I had not fought with Bazel for 4 years? Definitely not, because in that time I acquired a lot of knowledge about build systems, building, linking, GHC, GCC, Python, ... and how all these things interact.

Sorry, my answer is mostly feelings and buzzwords (we may not have the same definitions of what "composable" or "robust" mean). I could go on with countless examples of the nightmares I had to fight with Bazel, but unfortunately I don't yet have enough feedback on the new Nix-based build system to comment on it. Maybe we'll go back to Bazel in a few months, who knows. (Note that our Nix build system uses BUILD.bazel files as input, and I'm tempted to keep both build systems working in the codebase for a few months, so it should not be difficult to revert this change if we discover a blocker in the future.)

In short, try Bazel. If it works for you, be happy. If not, don't hesitate to move to something else.

@tonicebrian

Thanks for the thorough response. I've just realised that I asked you the same thing on Twitter a couple of weeks ago without noticing that both avatars were the same person 😀

My use case is a server in Haskell and a frontend in PureScript, with some code generation for DTOs and client calls, and lots of microservices. That was the reason for going with Bazel: multi-language support, dependencies between artifacts in different languages, and remote caching. BUT rules_purescript is far from satisfactory and now I'm more than doubtful. I need to learn Nix.
