-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New classpath and sourcepath implementation #4060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
// creates a test dir in a temporary dir containing compiled files of this test | ||
// root dir will be automatically deleted after the end of test | ||
private val rootDir = new JFile(getClass.getResource("Test.class").toURI).getParentFile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could just use sys.props("partest.output") to get a path to this rootDir.
Hi there! |
import scala.tools.nsc.util.ClassRepresentation | ||
import scala.collection.mutable.ArrayBuffer | ||
|
||
case class AggregateFlatClassPath(aggregates: Seq[FlatClassPath]) extends FlatClassPath { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs some basic docs. How aggregate works? What's the typical use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll add them later in all these places.
What do you exactly mean about a quick validation? If it's e.g. a build when it's merged into current 2.11.x master, I wouldn't expect that it will work without problems after so long time. |
No, this is just building your branch without merging to 2.11.x. It's a quick check whether all tests pass on Jenkins. |
I had a quick look at the code. First impression is pretty good but this will require a few passes. My normal workflow for reviewing such a large pull request is to look at individual commits, read their commit messages and then look at the code. I check if intention expressed in the message matches what's implemented in the code. For this method to work, we need good commit messages. For example, the commit message in befe2f7 is coming from squashing a few smaller commits. It will need to be rephrased so it becomes a coherent story behind this single commit. The good way to write commit messages is to start with why (why changes are needed) and then explain what changes got applied. If there are any tricky design decisions, those should be described as well. When looking at the commit, reader should not be guessing anything. For inspiration, please check sbt/sbt@b837169, sbt/sbt@aac19fd, sbt/sbt@304796b or any other PR linked from sbt/sbt#1010. You don't need to fix all commit messages in one go. I can review this PR piecemeal so ping me once you fix commit message of a single commit so I can provide you an early feedback. PS. I assign 2.11.5 as a milestone. That means we would need to finish reviewing by around 25th or 26th of November to make it into 2.11.5. |
|
Yes, please rebase on top of flatClasspath (that I updated to match 2.11.x branch). I think we should defer classpath invalidation questions until later to limit the scope of the initial PR. For now, meaningful crash (e.g. exception with good error message) is ok when combination of both flat classpath and invalidation is used. I'm happy to discuss ideas around integrating the two once initial flat classpath PR is merged. |
c41613b
to
7684b99
Compare
@gkossakowski I pushed the new version to review. "the commit message in befe2f7" Another thing - I think these changes are not very big and in general are quite simple. As for me they could be even squashed into one commit. But obviously I understand the idea of having more smaller commits so I did what you were asking for. You stated that we don't need tests of various approaches (with a description why one of them has been chosen) in the history. Therefore I just squashed all your commits into one your commit with backports of some cleanups and corrections from my further commits - to make also this initial commit compiling. This time I significantly reduced amount of code which has been added and after that modified in another commits. I know that probably you'd like to split the main changes into smaller commits but it doesn't provide any additional value from my perspective, could be error-prone and - post factum - would force me to think which changes could be safely applied without others, run a whole test suite for each new commit (with all.clean and a build each one would take like 40-50 minutes) etc. It just doesn't have sense - now, when it's already done and there's no the mess. Regarding iterations of improvements in commit messages - I think these messages are quite sensible and there's nothing more needed taking into account the fact that changes are not very complicated and there are also docs in the code which, by the way, itself should be quite well self-documenting. Of course in the case of such messages and documentation it's more important why something has been done - not just how (as everyone can see it in the code). And I think that it's enough what I've already done. I would consider many iterations of improving commit messages as wasting my time. In the last commit I set flat classpath as a default one so it can be tested on Jenkins. I know that finally it should be just an alternative so then I can push changes with -f to revert this one change in ScalaSettings. |
You don't write the commit messages just for the reviewer but for all people that work now or will work on the project in the future. I'll spin a jenkins job and have another pass on commits. |
Job is running here: https://scala-webapps.epfl.ch/jenkins/job/scala-checkin-manual/1361/console |
Thanks, I hope that the test, which failed previously as I was testing it only on Windows, will pass after corrections. |
Ok, everything passed. |
The goal of these changes is to create an alternative classpath implementation which would be possibly more efficient and would use less memory. It can be enabled using -YclasspathImpl:flat flag. This implementation is more low level than the old one as e.g. .NET backend has been dropped so we can perform operations directly on a few Java-specific file types. It was needed to abstract over the new and the old implementation. There's added hack for sbt incremental compiler interface. It depends on ClassPath.findClass method. Since there's no common interface between the old ClassPath and the new FlatClasspath, the old one has been changed to be aware of the new one, and dynamically dispatch to FlatClasspath if it's enabled. The old classpath is all lazy so if the only call to the old classpath is coming from incremental compiler it'll be dispatched to the new classpath and nothing in old classpath will get forced.
…tation It's mostly related to replacing hardcoded strings with some constant and also to creating utils containing methods which were used in several places as duplicated code.
7684b99
to
9656152
Compare
I just added a few sentences to the commit message in the third commit. There are no changes in a code. Edit: And by the way, as I see, I introduced some grammatical mistakes but I hope you'll forgive me them. :) // Obviously I can push it once again with corrections. |
Push corrected version please. I'm reading commit messages and cross-checking against code now. I might need to finish this after dinner, though. |
Okay, I'll do it in several minutes. |
Finished implementations for various classpath types - also using ManifestResources. In accordance with the idea and the initial implementation of @gkossakowski they try to make the best use of the specificity of a given file type instead of using AbstractFile everywhere. From now flat classpath really works and can be turned on. There are following types of flat classpath: for both source and class files: - for directories - it uses the real directory having representation on a disk - for zip and jar files - it uses FileZipArchive - Grzegorz was doing some experiments with different ways of handling such files (efficiency etc.) and stated that this one is the best. - aggregate - equivalent of MergedClassPath from the old implementation. It groups classpath instances created for different files. It ensures the distinction of cp entries and merges corresponding entries for class and source files into one entry as SymbolLoaders class makes use of that. only for class files: - for ManifestResources - it's closely related to the support for JSR-223 and to classes listed in the manifest file placed in the JAR. In general right now it's not possible to use it in Scala out of the box (without using additional tools such as jarlister) as this support is postponed. The old classpath has been properly prepared in the PR created by @rjolly scala#2238 so the new one also got this feature. ManifestResources is a ZipArchive without a real underlying file placed on a disk and in addition implementing some methods declared in AbstractFile as unsupported operations. Therefore the implementation has to use the iterator provided by AbstractFile. There's added caching for classpath for zip and jar files and also for ManifestResources. It can be turned off using a flag -YdisableFlatCpCaching. The cache is just a map AbstractFile -> FlatClassPath what should reduce the number of created cp and file instances e.g. in the case of many ScalaPresentationCompilers in Scala IDE. There's no added cache for loaded symbols etc. (honestly I didn't look at that - I'm even not sure how much sense would have such things). In SymbolLoaders there's added PackageLoaderUsingFlatClassPath that was needed as the new implementation uses relatively few cp objects which can return entries from many packages. In the case of old implementation there's the structure of nested classpath objects, where each such an object can return only entries from one level of hierarchy but it returns also another classpath objects for nested levels included in it. The sbt compiler interface hack is removed - instead of it there's introduced the dynamic dispatch in a Global's classPath method. The type of a classPath is changed from ClassPath[AbstractFile] to ClassFileLookup[AbstractFile] but it should be okay as ClassFileLookup defines needed methods. I tested it with sbt and everything seemed to work correctly. But it could be problematic if someone uses a classPath from a Global and requires exactly the type of the old implementation. I'm not aware of such cases. In the case of compatibility problems with some libraries using compiler's internals we could think about bringing back some hacks. But it would be better to avoid this. There's added PathResolverFactory which is used instead of a concrete implementation of a PathResolver - it dispatches to the proper classpath implementation so, in combination with matches in some other places, in all places using classpath we create/get the proper type. Scalap's Main.scala is changed to be able to use both implementations and also to use flags related to the classpath implementation. By the way the structure of its methods has been refactored - there's no change in a way scalap works. The classpath invalidation is modified to work properly for the old cp implementation after changes made in a Global. In the case of the attempt to use the invalidation for the flat cp it just throws exception with a message that the flat one currently doesn't support the invalidation. And also that's why the partest's test for the invalidation has been changed to use (always) the old implementation. There's added an adequate comment with TODO to this file. As I always do - I removed unused private methods, unused imports and commented out lines which I saw by the way and were obvious for me. In the case of less obvious commented out lines I added TODOs as someone should look at such places some day and clean them up.
Added: - Tests comparing entries created using the old and the new classpath implementations. - Tests for AggregateFlatClassPath which check whether it obtains correct entries from classpaths specified in a constructor and whether it preserves the ordering in the case of repeated entries. - Partest test generating various dependencies (directories, zips and jars with sources and class files) and testing whether the compilation and further running an application works correctly when there are these many types of entries specified as -classpath and -sourcepath. Sample JUnit test that shows that all pieces of JUnit infrastructure work correctly now uses JUnit assert as it should do from the beginning.
The goal of these changes is to add possibility to: - compare an efficiency and a content of both cp implementations (ClassPathImplComparator) - examine the memory consumption by creating a lot of globals using specified classpath (ClassPathMemoryConsumptionTester) - it can be considered as e.g. some approximation of ScalaPresentationCompilers in Scala IDE when working with many projects ClassPathMemoryConsumptionTester is placed in main (I mean not test) sources so thanks to that it has properly, out of the box configured boot classpath etc. and it's easy to use it, e.g.: scala scala.tools.nsc.ClassPathMemoryConsumptionTester -YclasspathImpl:<implementation_to_test> -cp <some_cp> -sourcepath <some_sp> -requiredInstances 50 SomeFileToCompile.scala At the end it waits for the "exit" command so there can be used a profiler like JProfiler to look how given implementation behaves. Also set flat classpath implementation as a default one (e.g. to test on Jenkins)
9656152
to
a9847dc
Compare
Done. Sorry for the delay. |
Hi, and thanks for following the boy scout rule, @mpociecha! As you know, classpath handling is a crucial part of the compiler -- regarding correctness, performance and interop (IDE, sbt, partest, other build tools)! So, thanks for your patience while we make sure this PR won't break anything. In general, when changing this much code, it would be easier to review if refactorings were done separately from the key additions. Rest assured, though, that we absolutely want to integrate your improvement (ideally shipping it in 2.11.5). The Scala team has been running on lower occupancy, with one young father and two recent changes of continents. I have two initial concerns after a brief look:
|
I do also have to underscore the importance of good commit messages, as asked by Grzegorz earlier. Imagine you're you one year ago and you started this project by expanding the commit messages in this PR -- which additions would you have hoped for? Would the awkward line wrapping have put you off? |
As for the "Copyright ..." - it's required by lawyers in our company to include it in files we contribute to open-source software. We've done the same in Scala IDE recently. |
"Copyright (c) 2014 Contributor. All rights reserved." Regarding other things: yes, I know, it was discussed. We're trying to minimize changes which could break compatibility. There are already comments about that. As I wrote, I tested these changes with sbt. Some time ago I tested it also with Scala IDE. And in general at the beginning the goal was to merge it to the separate branch (note that PR is created for flatClasspath, not 2.11.x) so at the beginning it could be safely tested. Edit: Jerzy was faster. :) |
Ok, thanks for explaining -- I was a bit confused by the "Contributor" "variable". |
Really? "All rights reserved" has zero legal significance. |
I was too at first and even asked them directly if we need to include our names there. No. |
"Imagine you're you one year ago and you started this project by expanding (...)" But being serious - it's not hard to understand this so it's not needed to repeat this all the time. If you think that some commit messages are still not enough detailed, please, just let me know. "it would be easier to review if refactorings were done separately from the key additions." Anyway thanks for looking at that! |
If it's really so crucial for you (and it really makes a difference), just let me know, and I will split it into smaller commits where there's added e.g. one containing only classes related to directories, other containing zips and so on. And it will be pushed tomorrow. FYI, at the beginning I wasn't very eager to do this (as I said - post factum and taking into account the mentioned deadline) as this PR was waiting for more than a month and it could be said much earlier. :) |
I remember reading the README for partest the first time. How depressing that was for some reason. I feel for you. That's aside from the Scala team's high bar for commit messages and copyright infringement notices. They've come a long way, baby. |
I apologize for the long wait, and I really appreciate both your technical contribution and your patience with us as we get this reviewed and merged. As I alluded to, the whole Scala team has had personal reasons recently for being a bit less responsive than we would like to be. |
I will defer to @gkossakowski for the review -- I was just trying to give a higher-level picture of why we ask for the things we ask for in these reviews. If you like, I'm happy to organize a hangout to motivate our requests in person. We're trying to raise the bar a little more every time we do a review (isn't that the whole point?). This time your PR is the target and you suffer, next time it will be someone else's and you benefit from nicer code in that area. |
Please, just let me know whether it will be more valuable for you if it will be divided into smaller commits that way I described above. I'll just do this and then it will be easier to describe more things in commit messages (even if in my opinion they are obvious). That should be enough and finish all unrelated discussions. Sorry for doing a mess. It was not needed. |
No need to apologize! I'm very happy with the improvements over all, and, again, we are excited to include them in the next release. I understand very well that writing commit messages is kind of boring, but the fact that they're obvious to you (now!) makes it crucial to write them down, because they are not obvious to me, nor, potentially, to your future self. The messages are already pretty good, so I don't think they need much more work (though some final polish would be nice). Mostly, I believe the shape of the PR (and thus the review process) could be improved by splitting it up in commits of two categories:
Regarding ordering, it's always good to get low-risk commits out of the way first, so that if we have to revert something, we can do so with fine granularity, and you don't have to redo the boring parts of the work :-) |
We'll need to get our validator validate all of commits. Initially we decided to work on this PR against a separate branch to not overload Jenkins with validating a large number of commits. However, at this point this PR could be easily handled by Jenkins. I think we should retarget it against 2.11.x branch. @adriaanm, WDYT? |
Thanks for the explanation, hovewer - once again - I'm not the first-year CS student and I would do that way. ;) // edit: it was supposed to be a joke What I was asking about (this time should be clearer): If we want to be meticulous I wanted to know whether it's okay for you to e.g. place a whole new class in one commit (so quite big changes but they rather don't require changes in other files) or it should simulate the whole process of a creation and add another bigger methods in separate commits etc (I'd prefer to avoid that but if you want, I could do even that). It's not an unfounded question as i saw in the history even a separate commit for removed unused import - what in normal situation I would just add to some other commit to avoid the mess in the history. In other words you can choose the level of the granularity - to do this right. I simply could do this reasonably (whatever it means for each of us) in my opinion, but it's better to ask before. FYI, I was squashing commits because, when some time ago I created WIP PR, there were remarks that Jenkins had much to do. I wanted to make it better but it looks that I didn't. |
Sorry, I didn't mean to imply anything about your background. We like slightly finer granularity, roughly along the categories I outlined above. For reference, here's a bigger PR that refactors stuff using the kind of granularity that we like: https://github.com/scala/scala/pull/3820/commits |
@gkossakowski, yes, let's target 2.11.x -- I didn't even see that wasn't the case. I'll make sure Jenkins has enough capacity. PS: Sorry if I am repeating obvious things, I haven't followed this PR closely and I just moved back to SF after being out of the country for 7 months, so I'm still catching up with stuff. |
Squashing commits was fine because previously a lot of those commits were canceling each other. E.g. one would introduce some experiment and another one would get rid of it. It was a lot of noise. From my past experience, work on a non-trivial change consists of three stages when you consider commit list:
None of those stages can be skipped. That's a law of nature :) |
@adriaanm Thanks, that's what I needed - the approved reference (right, Grzegorz pasted here some links previously but formerly I thought it will be possible to avoid changes in the structure of commits - as there's no garbage but they are just quite big :) ). By the way, the another reason to have this in a separate branch was - at least I thought so - to be able to test this with all those things which could be problematic. I mean problems with some lack of compatibility etc. Edit: If you want to change the target, by the way I will rebase it once again to current 2.11.x head. |
I'll create retargeted PR. |
This commit is a very crude port of the classpath handling as it exists in the 2.12.x branch of scalac (hash: 232d95a198c94da0c6c8393624e83e9b9ac84e81), this replaces the existing Classpath code that was adapted from scalac years ago. This code was written by Grzegorz Kossakowski, Michał Pociecha, Lukas Rytz, Jason Zaugg and other scalac contributors, many thanks to them! For more information on this implementation, see the description of the PR that originally added it to scalac: scala/scala#4060 Changes made to the copied code to get it to compile with dotty: - Rename scala.tools.nsc.util.ClassPath to dotty.tools.io.ClassPath - Rename scala.tools.nsc.classpath.* to dotty.tools.dotc.classpath.* - Replace "private[nsc]" by "private[dotty]" - Changed `isClass` methods in FileUtils to skip Scala 2.11 implementation classes (needed until we stop being retro-compatible with Scala 2.11) I also copied PlainFile.scala from scalac to get access to `PlainNioFile`.
This commit is a very crude port of the classpath handling as it exists in the 2.12.x branch of scalac (hash: 232d95a), this replaces the existing Classpath code that was adapted from scalac years ago. This code was written by Grzegorz Kossakowski, Michał Pociecha, Lukas Rytz, Jason Zaugg and other scalac contributors, many thanks to them! For more information on this implementation, see the description of the PR that originally added it to scalac: scala#4060 Changes made to the copied code to get it to compile with dotty: - Rename scala.tools.nsc.util.ClassPath to dotty.tools.io.ClassPath - Rename scala.tools.nsc.classpath.* to dotty.tools.dotc.classpath.* - Replace "private[nsc]" by "private[dotty]" - Changed `isClass` methods in FileUtils to skip Scala 2.11 implementation classes (needed until we stop being retro-compatible with Scala 2.11) I also copied PlainFile.scala from scalac to get access to `PlainNioFile`.
New classpath and sourcepath implementation
It's finished work started and abandoned by @gkossakowski
Related discussion: https://groups.google.com/forum/#!msg/scala-internals/xgb5YbKvQcw/7bcbsvKkzQkJ
The main goal of Grzegorz was to improve the efficiency of classpath. At the end it turned out that the efficiency improvement is not significant so Grzegorz stopped working on that.
On the other hand it was worth to continue these works due to several reasons:
The most of the code written by Grzegorz has been rewritten.
New (flat) classpath has dedicated classes for various file types characteristic for Scala with JVM backend - what allows us to create more efficient, more low-level operations.
To reduce memory consumption there's added caching for jar/zip files. So that we can reuse existing instances, when we create entries for the same files once again.
One of the most important things took into account during creating this implementation was adding it as an optional feature, which can be marked as experimental and turned on using flag. At the beginning users will use an old classpath implementation by default and they will be able to use the new implementation when:
I took into account also planned better support for JSR-223. So it supports for instance ManifestResources introduced here: #2238
There was no proper description what's going on with these changes and how to test them but @rjolly sent me some useful papers.
This particular type of classpath can be tested using e.g. command:
jrunscript -classpath scala-compiler.jar;scala-reflect.jar;scala-library.jar -l scala
where scala-library.jar was modified using https://github.com/rjolly/jarlister
I tested flat classpath in several ways:
The flat classpath with turned on caching allowed me to reduce allocated memory in old gen from about 6GB (for current classpath) to about 750MB. So in the case of IDE and presentation compilers the gain is really significant.
Tests using a computer with SSD:
Tests using other computer with the scala project directory moved to some external HDD connected via USB cable:
Note: I changed the classPath method in Global to the dynamic dispatch. And now there's returned a base class of the old and the new classpath with common interface based on a part of API of the old classpath. I tested it with sbt and there's no problem with compiling projects etc. Previously Grzegorz created certain hack for sbt's compiler interface where current classpath was calling the flat one, when it was needed. I changed this ugly solution to this mentioned dynamic dispatch and removed a hack. We can return to something like this, if it'd turn out that it's necessary.
Note2: FYI, commits of Grzegorz don't compile - as it was one big hack with some files which were not pushed. All my commits compile and for all of them JUnit tests pass. The bootstrapping and partest tests wouldn't work for some first ones when cp implementation type would be set to flat (firstly I had to make it working). But for several last ones they work without problems so finally I even commited a change where the flat classpath is set as a default one.
Anyway I squashed about 70 commits to just several ones trying by the way to preserve some way of thinking which is shown e.g. in the commit messages.
Lastly, the below disclaimer is required by the lawyers:
THIS PROGRAM IS SUBJECT TO THE TERMS OF THE BSD 3-CLAUSE LICENSE.
THE FOLLOWING DISCLAIMER APPLIES TO ALL SOFTWARE CODE AND OTHER MATERIALS CONTRIBUTED IN CONNECTION WITH THIS SOFTWARE:
THIS SOFTWARE IS LICENSED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OF NON-INFRINGEMENT, ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THIS SOFTWARE MAY BE REDISTRIBUTED TO OTHERS ONLY BY EFFECTIVELY USING THIS OR
ANOTHER EQUIVALENT DISCLAIMER IN ADDITION TO ANY OTHER REQUIRED LICENSE TERMS.
ONLY THE SOFTWARE CODE AND OTHER MATERIALS CONTRIBUTED IN CONNECTION WITH THIS SOFTWARE, IF ANY, THAT ARE ATTACHED TO (OR OTHERWISE ACCOMPANY) THIS SUBMISSION (AND ORDINARY
COURSE CONTRIBUTIONS OF FUTURES PATCHES THERETO) ARE TO BE CONSIDERED A CONTRIBUTION. NO OTHER SOFTWARE CODE OR MATERIALS ARE A CONTRIBUTION.