
Add several undocumented flags (--bazel, --package-root and --cache-map) in support of Bazel #4759

Merged
55 commits merged on Jun 5, 2018

Commits (55)
13a52e1
New --bazel flag
Mar 14, 2018
4618115
Implement behavior changes for --bazel flag.
Mar 14, 2018
d188e9f
Don't raise OSError for makedirs('')
Mar 14, 2018
9d32192
Use zero mtimes in validate_meta() if bazel
Mar 14, 2018
f1c7834
More zero mtime, and don't compare size -> always believe cache file …
Mar 14, 2018
aff2f72
Different tack for cache under Bazel: use --cache-dir but omit pyversion
Mar 16, 2018
8048ee4
Add --cache-map option
Mar 17, 2018
7bc3a0b
Pass cache mapping on command line
Mar 17, 2018
3632c55
Tweak --cache-map arg parsing
Mar 17, 2018
24de3d5
Drop the -i requirement for --bazel; it's not strictly needed
Mar 19, 2018
ec719df
Fix lint
Mar 20, 2018
fcc9604
Somehow mypy/api.py had DOS line endings. Fix this. (#4768)
gvanrossum Mar 20, 2018
c62a22e
Add docstring for relpath()
Mar 22, 2018
72d2657
Merge remote-tracking branch 'upstream/master' into bazel-flag
Mar 26, 2018
4dd90b8
Monumental effort to support packages without __init__.py for Bazel (…
Mar 26, 2018
c667dfc
Fix mypy errors
Mar 27, 2018
a43ecdc
Fix tests
Mar 27, 2018
e17a702
Fix flake8
Mar 27, 2018
0f4c8af
Merge master (except typeshed; resolved one conflict in main.py)
Apr 4, 2018
cd10259
Merge branch 'master' into bazel-flag
Apr 4, 2018
eade116
Make --package-root a separate flag
Apr 5, 2018
e7ffae4
Merge master
Apr 17, 2018
f19892a
Sync typeshed (same as master)
Apr 18, 2018
a0072ab
Add tests for --package-root and fix behavior
Apr 18, 2018
0096ee0
Add rudimentary tests for --cache-map and --bazel.
Apr 18, 2018
cf9f6e0
Merge master
Apr 23, 2018
268b74c
Fix two type errors found by
Apr 23, 2018
dc1f262
Replace relpath() with os.path.relpath() (D'oh)
Apr 23, 2018
e103d63
Use os.path.relpath() in --package-root normalization
Apr 23, 2018
80daeae
Break long line
Apr 23, 2018
b7fcc9e
Another attempt at fixing the Windows test failures
Apr 23, 2018
363f4b6
Merge master
Apr 24, 2018
2314b54
Another attempt at fixing --package-root on Windows
Apr 24, 2018
605b963
Merge master
Apr 24, 2018
a89e7a4
Make Windows tests pass by judicious use of os.path.normpath()
Apr 24, 2018
29f3f59
Merge master (some tests broken)
Apr 25, 2018
052fa08
Change the way FileSystemCache gets the package_root
Apr 25, 2018
fb24a14
Merge remote-tracking branch 'upstream/master' into bazel-flag
Apr 25, 2018
05a448d
Avoid mypy error (see https://github.com/python/typeshed/issues/2081)
Apr 25, 2018
8b88cd2
Merge master
May 1, 2018
aa5facd
Fix fscache.py bug: only generate fake __init__.py, not .pyi
May 2, 2018
755b540
Merge master 211a65470079f9080881db1e069a7577a44c1bb1
May 4, 2018
7884ed4
Avoid updating hash in cache meta
May 7, 2018
01f3893
Merge master (22ee7137a5f21b398b18908cca405d104bc060d4)
May 21, 2018
1f26bdc
Merge upstream/master (464b553a4cf1e8b8a2b78d6cf6ebb6597f13af60)
May 24, 2018
3d7f633
Respond to code review
May 29, 2018
7559fc8
Merge master (0447473de2c00b5429ed0072249ecb4f134e2f38)
May 29, 2018
210de9c
Don't force tree.path absolute in non-bazel -- it breaks Windows
May 29, 2018
15b7d87
Add annotations to new functions, to make self-test pass
May 29, 2018
4135e63
Hrmpf
May 29, 2018
6cbcb08
Merge master
Jun 1, 2018
52649f5
merge master (again)
Jun 2, 2018
3bd329e
Fix bad merge
Jun 2, 2018
389c908
Hopefully fix flaky tests (don't use the fscache in getmtime())
Jun 3, 2018
4665933
Merge master one last time
Jun 5, 2018
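
The three flags are intentionally undocumented, but the commit list and the diffs below outline their roles: --bazel switches mypy to deterministic cache metadata (relative paths, zero mtimes), --package-root names directories whose packages may lack __init__.py files, and --cache-map pins each source file to an explicit pair of cache output files (see get_cache_names() below). As a rough sketch only, a Bazel-side wrapper might invoke mypy along these lines; the srcs/ layout and the flat source/meta/data triple grammar for --cache-map are assumptions for illustration, not something shown in this diff.

# Hypothetical wrapper invocation; a sketch, not part of this PR.
# Flag names come from the PR title; the srcs/ layout and the
# triple-based --cache-map grammar are illustrative assumptions.
import subprocess
import sys

srcs = ["srcs/pkg/mod.py", "srcs/pkg/util.py"]

cmd = [sys.executable, "-m", "mypy", "--bazel", "--package-root", "srcs/"]
cmd += srcs

# Map each source file to explicit meta/data cache outputs so the build
# tool can declare them as action outputs and replay them on later runs.
cmd += ["--cache-map"]
for src in srcs:
    cmd += [src, src + ".meta.json", src + ".data.json"]

subprocess.run(cmd, check=True)
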
84 changes: 63 additions & 21 deletions mypy/build.py
@@ -74,10 +74,6 @@
Graph = Dict[str, 'State']


def getmtime(name: str) -> int:
return int(os.path.getmtime(name))


# TODO: Get rid of BuildResult. We might as well return a BuildManager.
class BuildResult:
"""The result of a successful build.
@@ -230,7 +226,12 @@ def compute_lib_path(sources: List[BuildSource],
# to the lib_path
# TODO: Don't do this in some cases; for motivation see see
# https://github.com/python/mypy/issues/4195#issuecomment-341915031
lib_path.appendleft(os.getcwd())
if options.bazel:
dir = '.'
else:
dir = os.getcwd()
if dir not in lib_path:
lib_path.appendleft(dir)

# Prepend a config-defined mypy path.
lib_path.extendleft(options.mypy_path)
@@ -687,6 +688,31 @@ def maybe_swap_for_shadow_path(self, path: str) -> str:
def get_stat(self, path: str) -> os.stat_result:
return self.fscache.stat(self.maybe_swap_for_shadow_path(path))

def getmtime(self, path: str) -> int:
"""Return a file's mtime; but 0 in bazel mode.

(Bazel's distributed cache doesn't like filesystem metadata to
end up in output files.)
"""
if self.options.bazel:
return 0
else:
return int(os.path.getmtime(path))

def normpath(self, path: str) -> str:
"""Convert path to absolute; but to relative in bazel mode.

(Bazel's distributed cache doesn't like filesystem metadata to
end up in output files.)
"""
# TODO: Could we always use relpath? (A worry in non-bazel
# mode would be that a moved file may change its full module
# name without changing its size, mtime or hash.)
if self.options.bazel:
return os.path.relpath(path)
else:
return os.path.abspath(path)

def all_imported_modules_in_file(self,
file: MypyFile) -> List[Tuple[int, str, int]]:
"""Find all reachable import statements in a file.
@@ -1094,14 +1120,17 @@ def get_cache_names(id: str, path: str, manager: BuildManager) -> Tuple[str, str

Args:
id: module ID
path: module path (used to recognize packages)
path: module path
cache_dir: cache directory
pyversion: Python version (major, minor)

Returns:
A tuple with the file names to be used for the meta JSON, the
data JSON, and the fine-grained deps JSON, respectively.
"""
pair = manager.options.cache_map.get(path)
if pair is not None:
return (pair[0], pair[1], None)
prefix = _cache_dir_prefix(manager, id)
is_package = os.path.basename(path).startswith('__init__.py')
if is_package:
@@ -1232,22 +1261,23 @@ def validate_meta(meta: Optional[CacheMeta], id: str, path: Optional[str],
manager.log('Metadata abandoned for {}: errors were previously ignored'.format(id))
return None

bazel = manager.options.bazel
assert path is not None, "Internal error: meta was provided without a path"
# Check data_json; assume if its mtime matches it's good.
# TODO: stat() errors
data_mtime = getmtime(meta.data_json)
data_mtime = manager.getmtime(meta.data_json)
if data_mtime != meta.data_mtime:
manager.log('Metadata abandoned for {}: data cache is modified'.format(id))
return None
deps_mtime = None
if manager.options.cache_fine_grained:
assert meta.deps_json
deps_mtime = getmtime(meta.deps_json)
deps_mtime = manager.getmtime(meta.deps_json)
if deps_mtime != meta.deps_mtime:
manager.log('Metadata abandoned for {}: deps cache is modified'.format(id))
return None

path = os.path.abspath(path)
path = manager.normpath(path)
try:
st = manager.get_stat(path)
except OSError:
@@ -1272,12 +1302,14 @@ def validate_meta(meta: Optional[CacheMeta], id: str, path: Optional[str],
fine_grained_cache = manager.use_fine_grained_cache()

size = st.st_size
if size != meta.size and not fine_grained_cache:
Collaborator (review comment): Agreed, such a comment would be useful.
# Bazel ensures the cache is valid.
if size != meta.size and not bazel and not fine_grained_cache:
manager.log('Metadata abandoned for {}: file {} has different size'.format(id, path))
return None

mtime = int(st.st_mtime)
if mtime != meta.mtime or path != meta.path:
# Bazel ensures the cache is valid.
mtime = 0 if bazel else int(st.st_mtime)
if not bazel and (mtime != meta.mtime or path != meta.path):
try:
source_hash = manager.fscache.md5(path)
except (OSError, UnicodeDecodeError, DecodeError):
@@ -1317,7 +1349,7 @@ def validate_meta(meta: Optional[CacheMeta], id: str, path: Optional[str],
meta_str = json.dumps(meta_dict, indent=2, sort_keys=True)
else:
meta_str = json.dumps(meta_dict)
meta_json, _, _2 = get_cache_names(id, path, manager)
meta_json, _, _ = get_cache_names(id, path, manager)
manager.log('Updating mtime for {}: file {}, meta {}, mtime {}'
.format(id, path, meta_json, meta.mtime))
atomic_write(meta_json, meta_str, '\n') # Ignore errors, it's just an optimization.
@@ -1373,12 +1405,20 @@ def write_cache(id: str, path: str, tree: MypyFile,
corresponding to the metadata that was written (the latter may
be None if the cache could not be written).
"""
# Obtain file paths
path = os.path.abspath(path)
# For Bazel we use relative paths and zero mtimes.
bazel = manager.options.bazel

# Obtain file paths.
path = manager.normpath(path)
meta_json, data_json, deps_json = get_cache_names(id, path, manager)
manager.log('Writing {} {} {} {} {}'.format(
id, path, meta_json, data_json, deps_json))

# Update tree.path so that in bazel mode it's made relative (since
# sometimes paths leak out).
if bazel:
tree.path = path

# Make sure directory for cache files exists
parent = os.path.dirname(data_json)
assert os.path.dirname(meta_json) == parent
@@ -1390,7 +1430,8 @@ def write_cache(id: str, path: str, tree: MypyFile,

# Obtain and set up metadata
try:
os.makedirs(parent, exist_ok=True)
if parent:
os.makedirs(parent, exist_ok=True)
st = manager.get_stat(path)
except OSError as err:
manager.log("Cannot get stat for {}: {}".format(path, err))
@@ -1405,10 +1446,11 @@ def write_cache(id: str, path: str, tree: MypyFile,
return interface_hash, None

# Write data cache file, if applicable
# Note that for Bazel we don't record the data file's mtime.
if old_interface_hash == interface_hash:
# If the interface is unchanged, the cached data is guaranteed
# to be equivalent, and we only need to update the metadata.
data_mtime = getmtime(data_json)
data_mtime = manager.getmtime(data_json)
manager.trace("Interface for {} is unchanged".format(id))
else:
manager.trace("Interface for {} has changed".format(id))
@@ -1425,17 +1467,17 @@ def write_cache(id: str, path: str, tree: MypyFile,
# Both have the effect of slowing down the next run a
# little bit due to an out-of-date cache file.
return interface_hash, None
data_mtime = getmtime(data_json)
data_mtime = manager.getmtime(data_json)

deps_mtime = None
if deps_json:
deps_str = json_dumps(serialized_fine_grained_deps, manager.options.debug_cache)
if not atomic_write(deps_json, deps_str, '\n'):
manager.log("Error writing deps JSON file {}".format(deps_json))
return interface_hash, None
deps_mtime = getmtime(deps_json)
deps_mtime = manager.getmtime(deps_json)

mtime = int(st.st_mtime)
mtime = 0 if bazel else int(st.st_mtime)
size = st.st_size
options = manager.options.clone_for_module(id)
assert source_hash is not None
@@ -1475,7 +1517,7 @@ def delete_cache(id: str, path: str, manager: BuildManager) -> None:
This avoids inconsistent states with cache files from different mypy runs,
see #4043 for an example.
"""
path = os.path.abspath(path)
path = manager.normpath(path)
cache_paths = get_cache_names(id, path, manager)
manager.log('Deleting {} {} {}'.format(id, path, " ".join(x for x in cache_paths if x)))
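
Taken together, the build.py changes make everything recorded in a cache file deterministic under --bazel: get_cache_names() honors an explicit --cache-map entry when present, normpath() stores working-directory-relative paths instead of absolute ones, and getmtime(), validate_meta() and write_cache() record zero mtimes and skip the size/mtime checks, trusting Bazel to keep the cache valid. A standalone sketch of that normalization (illustrative only; the bazel argument stands in for manager.options.bazel, and these are not mypy's actual methods):

# Standalone sketch of the normalization used above; not mypy's BuildManager.
import os

def normpath_for_cache(path: str, bazel: bool) -> str:
    # Relative paths keep cache files identical across sandboxes and machines.
    return os.path.relpath(path) if bazel else os.path.abspath(path)

def mtime_for_cache(path: str, bazel: bool) -> int:
    # Zero mtimes keep filesystem timestamps out of Bazel's distributed cache.
    return 0 if bazel else int(os.path.getmtime(path))

print(normpath_for_cache("mypy/build.py", bazel=True))   # e.g. mypy/build.py
print(mtime_for_cache("mypy/build.py", bazel=True))      # always 0
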

2 changes: 2 additions & 0 deletions mypy/find_sources.py
@@ -143,6 +143,8 @@ def get_init_file(self, dir: str) -> Optional[str]:
f = os.path.join(dir, '__init__' + ext)
if self.fscache.isfile(f):
return f
if ext == '.py' and self.fscache.init_under_package_root(f):
return f
return None


117 changes: 109 additions & 8 deletions mypy/fscache.py
@@ -32,13 +32,19 @@
import hashlib
import os
import stat
from typing import Dict, List, Tuple
from typing import Dict, List, Optional, Set, Tuple


class FileSystemCache:
def __init__(self) -> None:
# The package root is not flushed with the caches.
# It is set by set_package_root() below.
self.package_root = [] # type: List[str]
self.flush()

def set_package_root(self, package_root: List[str]) -> None:
self.package_root = package_root

def flush(self) -> None:
"""Start another transaction and empty all caches."""
self.stat_cache = {} # type: Dict[str, os.stat_result]
@@ -49,6 +55,7 @@ def flush(self) -> None:
self.read_cache = {} # type: Dict[str, bytes]
self.read_error_cache = {} # type: Dict[str, Exception]
self.hash_cache = {} # type: Dict[str, str]
self.fake_package_cache = set() # type: Set[str]

def stat(self, path: str) -> os.stat_result:
if path in self.stat_cache:
@@ -58,16 +65,100 @@ def stat(self, path: str) -> os.stat_result:
try:
st = os.stat(path)
except OSError as err:
if self.init_under_package_root(path):
try:
return self._fake_init(path)
except OSError:
pass
# Take a copy to get rid of associated traceback and frame objects.
# Just assigning to __traceback__ doesn't free them.
self.stat_error_cache[path] = copy_os_error(err)
raise err
self.stat_cache[path] = st
return st

def init_under_package_root(self, path: str) -> bool:
Collaborator (review comment): Could use some documentation. I'm not sure what's the goal/what's going on here.

"""Is this path an __init__.py under a package root?

This is used to detect packages that don't contain __init__.py
files, which is needed to support Bazel. The function should
only be called for non-existing files.

It will return True if it refers to a __init__.py file that
Bazel would create, so that at runtime Python would think the
directory containing it is a package. For this to work you
must pass one or more package roots using the --package-root
flag.

As an exceptional case, any directory that is a package root
itself will not be considered to contain a __init__.py file.
This is different from the rules Bazel itself applies, but is
necessary for mypy to properly distinguish packages from other
directories.

See https://docs.bazel.build/versions/master/be/python.html,
where this behavior is described under legacy_create_init.
"""
if not self.package_root:
return False
dirname, basename = os.path.split(path)
if basename != '__init__.py':
return False
try:
st = self.stat(dirname)
except OSError:
return False
else:
if not stat.S_ISDIR(st.st_mode):
return False
ok = False
drive, path = os.path.splitdrive(path) # Ignore Windows drive name
path = os.path.normpath(path)
for root in self.package_root:
if path.startswith(root):
if path == root + basename:
# A package root itself is never a package.
ok = False
break
else:
ok = True
return ok

def _fake_init(self, path: str) -> os.stat_result:
"""Prime the cache with a fake __init__.py file.

This makes code that looks for path believe an empty file by
that name exists. Should only be called after
init_under_package_root() returns True.
"""
dirname, basename = os.path.split(path)
assert basename == '__init__.py', path
assert not os.path.exists(path), path # Not cached!
dirname = os.path.normpath(dirname)
st = self.stat(dirname) # May raise OSError
# Get stat result as a sequence so we can modify it.
# (Alas, typeshed's os.stat_result is not a sequence yet.)
tpl = tuple(st) # type: ignore
seq = list(tpl) # type: List[float]
seq[stat.ST_MODE] = stat.S_IFREG | 0o444
seq[stat.ST_INO] = 1
seq[stat.ST_NLINK] = 1
seq[stat.ST_SIZE] = 0
tpl = tuple(seq)
st = os.stat_result(tpl)
self.stat_cache[path] = st
# Make listdir() and read() also pretend this file exists.
self.fake_package_cache.add(dirname)
return st

def listdir(self, path: str) -> List[str]:
path = os.path.normpath(path)
if path in self.listdir_cache:
return self.listdir_cache[path]
res = self.listdir_cache[path]
# Check the fake cache.
if path in self.fake_package_cache and '__init__.py' not in res:
res.append('__init__.py') # Updates the result as well as the cache
return res
if path in self.listdir_error_cache:
raise copy_os_error(self.listdir_error_cache[path])
try:
@@ -77,6 +168,9 @@ def listdir(self, path: str) -> List[str]:
self.listdir_error_cache[path] = copy_os_error(err)
raise err
self.listdir_cache[path] = results
# Check the fake cache.
if path in self.fake_package_cache and '__init__.py' not in results:
results.append('__init__.py')
return results

def isfile(self, path: str) -> bool:
@@ -133,12 +227,19 @@ def read(self, path: str) -> bytes:
# earlier instant than the mtime reported by self.stat().
self.stat(path)

try:
with open(path, 'rb') as f:
data = f.read()
except Exception as err:
self.read_error_cache[path] = err
raise
dirname, basename = os.path.split(path)
dirname = os.path.normpath(dirname)
# Check the fake cache.
if basename == '__init__.py' and dirname in self.fake_package_cache:
data = b''
else:
try:
with open(path, 'rb') as f:
data = f.read()
except OSError as err:
self.read_error_cache[path] = err
raise

md5hash = hashlib.md5(data).hexdigest()
self.read_cache[path] = data
self.hash_cache[path] = md5hash
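
The fscache.py changes let mypy behave as if every directory under a configured package root contained an empty __init__.py, mirroring Bazel's legacy_create_init behavior without writing anything to disk: stat() fabricates a zero-byte regular file, and listdir() and read() play along via fake_package_cache. A hedged usage sketch follows; the srcs/ tree is hypothetical, and the package root is assumed to be pre-normalized with a trailing separator (the startswith()/concatenation checks in init_under_package_root() imply the caller does this).

# Sketch of the fake __init__.py behavior; the srcs/ layout is made up.
import os
from mypy.fscache import FileSystemCache

fscache = FileSystemCache()
fscache.set_package_root(["srcs" + os.sep])   # assumed pre-normalized root

# Suppose srcs/pkg/ exists on disk but contains no __init__.py.
fake_init = os.path.join("srcs", "pkg", "__init__.py")
st = fscache.stat(fake_init)                  # served from the fake cache
print(st.st_size)                             # 0 (an empty regular file)
print(fscache.read(fake_init))                # b''
print("__init__.py" in fscache.listdir(os.path.join("srcs", "pkg")))  # True

# The package root itself is never treated as a package:
print(fscache.isfile(os.path.join("srcs", "__init__.py")))  # False (if absent)
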