Skip to content

Dart uses raw URIs for library identity. #939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
lrhn opened this issue Apr 24, 2020 · 22 comments
Open

Dart uses raw URIs for library identity. #939

lrhn opened this issue Apr 24, 2020 · 22 comments
Labels
request Requests to resolve a particular developer problem

Comments

@lrhn
Copy link
Member

lrhn commented Apr 24, 2020

Dart currently identifies libraries by their import URI.
That causes troubles in some situations:

I suggest that we loosen the language specification to the point where the compiler is in charge of detecting when two URIs represent the same library, and it is allowed to canonicalize the URI of the library prior to resolving imports.

If the compiler can recognize that the underlying file system is case insensitive, then it is free (and encouraged) to consider Foo.dart and foo.dart the same library. It might still want to give a warning/hint that the code won't work on case-sensitive platforms.

If someone runs or imports lib/main.dart, the compiler is free to rewrite that to package:currentPackage/main.dart. We already do that for entry points (see also dart-lang/sdk#37978), but it sometimes fails for direct imports from the bin/ directory or other cases.

In general, allow the compiler to recognize that two URIs are actually referring the same library file using any available information, not just what the language specification can predict.
It still have to be in some way predictable. Generally, if it's the same underlying file, then it should be the same library.

@lrhn lrhn added the request Requests to resolve a particular developer problem label Apr 24, 2020
@eernstg
Copy link
Member

eernstg commented Apr 24, 2020

Generally sounds good!

However, hardlinks on UN*X-ish systems (and probably a bunch of similar mechanisms on same or other OS'es, and specialized file systems) will make it tricky to determine exactly when something is "the same underlying file".

But that shouldn't stop us from allowing an import/export URI normalization in tools, and trusting them to keep it so restricted that it doesn't hurt in practice.

@munificent
Copy link
Member

SGTM and agreed with Erik. There's probably some weird corner cases a user could jam themselves into if they tried to hard enough around symlinks, but they could just... not do that.

@lrhn
Copy link
Member Author

lrhn commented Apr 28, 2020

I probably wouldn't treat files with different paths in the same file system as the same file. It's mostly when we use package URIs as aliases for file paths, or have multiple representations of the same file path that I'd fix something. But if usability studies show that some other situation should be recognized as well, I don't want the specification to get in the way.

@lrhn
Copy link
Member Author

lrhn commented Sep 26, 2022

Having had more time, and more reasons, to think about this issue, I think we should actually ensure that resolution is mostly the same across platforms. At least, it should be predictable whether two imports denote the same library or different libraries.

  • I suggest we do "package canonicalization" of URIs such that any import(/export/path) file URI to a file in the same package, which denotes a file which also has a package: URI, is canonicalized to the package: URI.

To decide if files are in the same package, or if a file has a package: URI, we use the information in package_config.json to find the package root and library directories of available packages. It's already specified how to assign files to packages and lib/ directories of those.

This will mainly help people who do things like import '../lib/foo.dart'; from their own test/ or bin/ directories.
It matches what we currently do for the entry point.

  • Further, we reject any file path access to files in the lib/ directories of other packages. (Just use the damn package: URI.)

That is, it's a compile-time error to reach into another package's lib/ directory using a file path.

  • Further, we reject any program which (transitively) imports the same file using two different canonicalized URIs. With "same file", we mean equivalent OS path. (Don't bother checking contents or following links, but do be case insensitive on Windows) With different URIs, we mean textually different in any way. (Post canonicalization, including "package canonicalization" from above, so it's probably just going to be the different case letters on Windows).

  • We may also want to warn about any two files whose path differ only in case. You can write those on non-WIndows platforms and not refer to the same file, but it'll never work on Windows.

@mit-mit
Copy link
Member

mit-mit commented Sep 27, 2022

@lrhn can you add a few example of what we accept and what we reject?

@lrhn
Copy link
Member Author

lrhn commented Sep 27, 2022

Proposed behavior:

  • Package canonicalization, non-package URI references to own package's lib/ directory:
/// in package foo, bin/foo.dart   -- which has no `package:` URI, so must be file path.
import "../lib/foo.dart";  // Accepted, becomes "package:foo/foo.dart".
  • No cross package canonicalization of file: links
// In package `bar`, bin/bar.dart   -- which has no `package:` URI, so must be file path.
import "../../foo/lib/foo.dart"; // Rejected, now compile-time error.
//                      ^^^^^^^^^^
// This file has canonical URI 'package:foo/foo.dart'. Use it!
  • No two different URIs pointing to same file:
import "package:foo/foo.dart";
import "package:foo//foo.dart";
import "package:foo/Foo.dart";  // At least on Windows.

If any two of these occur in the same program, and they refer to the same file, we give an error because different URIs refer to the same file. We want the URI for a file to be canonical, so pick one. That allows us to safely conflate URI and file path in the compiler, which we already do unsafely.

(We should probably warn you that package:foo//foo.dart would be better without the extra /. URIs consider // different from /, and OS file systems don't)

For better or worse, it's trying to reintroduce the protection against importing the same library twice which was the original goal of the "no two libraries with the same name" rule (which also prevented different versions of the same library). That just didn't work because library names weren't good for anything else, and did not carry their own weight.

@jensjoha
Copy link

I'm not sure we can do the last part.

@jensjoha
Copy link

jensjoha commented Sep 30, 2022

How about we start with

  • Package canonicalization, non-package URI references to own package's lib/ directory.
  • No cross package canonicalization of file.

?

Both of these should be doable and bring value on their own.

@lrhn
Copy link
Member Author

lrhn commented Sep 30, 2022

I'd be fine with that. We'll probably need at least a comment in the language specification, saying that compilers are allowed to canonicalize URIs in some ways, without getting into too many details.

Any objections?
@leafpetersen @munificent @eernstg @stereotype441 @natebosch @rmacnak-google @kallentu

@natebosch
Copy link
Member

natebosch commented Sep 30, 2022

It would be better if we had not allowed imports like ../lib once we introduced packages - if there is a canonical representation it's less confusing for readers to see that canonical representation in the file. It might be too painful to break working imports, so I think it's OK to make those working imports work as close to how the user expects as possible.

@jensjoha
Copy link

jensjoha commented Oct 3, 2022

One thing to note is that dart test unconditionally uses the file uri, meaning that if one were to do dart test lib/file.dart this change would give an error. Also to note is that something like that is apparently done internally.

@stereotype441
Copy link
Member

I would actually prefer a different approach:

  • Treat the file path as the thing that identifies the file, rather than the URI. This fixes file-vs-package canonicalization problems once and for all, because it no longer matters whether the user reached a given file via a package URI or a file URI (and I think it matches user expectations well).
  • If an import, export, or part directive uses a path that resolves to a file, but the analyzer can detect that there will be problems if the package is published and then downloaded by someone else, it should issue a hint, and pub should refuse to publish the package. But otherwise the package should run without problems. This includes:
    • Using a relative path to reach outside of a package's lib directory
    • Using an absolute file path
    • On a case-insensitive file system, using a relative path whose capitalization doesn't match the capitalization returned by listing the containing directory/directories (e.g., if listing the current directory shows a file called Foo.dart, then importing Foo.dart is ok but importing foo.dart isn't).
  • When compiling or running a program on a platform with a case-insensitive filesystem, when chasing transitive imports, exports, and parts from the entry point, if any file is reached via multiple file paths, issue an error and abort compilation.

So, in terms of @lrhn's examples, this would mean:

/// in package foo, bin/foo.dart
import "../lib/foo.dart";  // Accepted, refers to file "lib/foo.dart" in package foo.  No hint.
// In package `bar`, bin/bar.dart   -- which has no `package:` URI, so must be file path.
import "../../foo/lib/foo.dart"; // Allowed, refers to the same file as `package:foo/foo.dart`
// However, the analyzer issues a hint suggesting that the import be changed to `package:foo/foo.dart`
// And `pub` refuses to publish the package.
// Assuming Windows, and that listing the `lib` directory of package `foo` yields a file called `foo.dart`: 
import "package:foo/foo.dart"; // (1)
import "package:foo//foo.dart"; // (2)
import "package:foo/Foo.dart"; // (3)
// (1), (2), and (3) are all permissible in isolation, but (3) causes a hint (and blocks publishing of the package).
// (1) and (2) can be used in the same program, because they both canonicalize to the same path
// Attempting to use (3) in the same program as (1) or (2) results in an error (even if the imports occur in different files).

@jensjoha
Copy link

jensjoha commented Oct 4, 2022

As before I'm not sure we can do that. E.g. the VM has a HttpAwareFileSystem and with that saying import 'http://whatnot//foo.dart'; might return something different from import 'http://whatnot/foo.dart'; (we don't control the server).

I'm all for not issuing an error when dotting from bin/bar.dart to package:foo (while letting the analyzer give a hint and pub refuse to publish such a package) though.

@jensjoha
Copy link

Any new thoughts and/or decisions on this one?

@jensjoha
Copy link

jensjoha commented Jan 6, 2023

ping.

@mraleph
Copy link
Member

mraleph commented Jan 7, 2023

FWIW I think it is fine to remove support for HTTP imports from Dart VM. I don't think it really pulls its weight these days.

@mit-mit
Copy link
Member

mit-mit commented Jan 9, 2023

@jensjoha if we do indeed de-support the HTTP-based imports, are all your concerns addressed?

@natebosch
Copy link
Member

Will the frontend server still be able to use http URIs?

@jensjoha
Copy link

@jensjoha if we do indeed de-support the HTTP-based imports, are all your concerns addressed?

Not really, that was just a good example of us not being able to do it 100%. I think we'll in practise have trouble "canonicalizing correctly" for all file systems.

I would also much prefer rules we could enforce without asking the filesystem (if nothing else, asking the filesystem is slow).
It might be something like "it's an error if the path part of a uri contains a double-slash" and "it's an error if two uris are used in a program where the only difference is in the capitilization of those uris".
These shouldn't really hit anyone anyway (considering it causes trouble like crashes or types not matching etc) and allow us to do a recovery (say rewrite the uri to remove double-slashes).

(It could even be "dart files (and as a consequence all imports and exports) users only lower case", but that might be too extreme.)

This on top of "package canonicalization, non-package URI references to own package's lib/ directory" and "no cross package canonicalization of files" would probably fix most issues I think -- and be doable.

@lrhn
Copy link
Member Author

lrhn commented Jan 11, 2023

If we don't want to look at the file system, then rules that would prevent conflict could be, as you mention:

  • It's a compile-time error if two normalized (as by the Dart Uri class) import/export/part URIs (after resolving relative URI references against the imported library's URI) occurring in the libraries in the transitive import/export closure of the same library, differ only in the case of ASCII letters.
  • It's a compile-time error if a normalized import/export/part URI path contains two consecutive / characters.
  • It's a compile-time error if a normalized import/export/part URI path ends in a / character. (That's probably an error today, but technically an HTTP server can send a file for such a path, but it makes it hard to reason about library roots if they can also be files.)
  • It's a compile-time error if an import/export/part URI has a query or fragment part.
  • Further, if an import/export/part URI is a non-package: URI, and it has a path inside precisely one configured package's package directory (packageUri field of the package config, typically the /lib dir), meaning that the URI has the package-URI as a prefix, the URI is package-normalized to a package: URI by replacing the prefix by package:<packageName>/.

WDYT?

@jensjoha
Copy link

Sounds good to me.
@johnniwinther any thoughts (as someone actually using dart on Windows)?

@johnniwinther
Copy link
Member

SGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
request Requests to resolve a particular developer problem
Projects
None yet
Development

No branches or pull requests

9 participants