rustup appears to depend on new-ish system call in linux #2472

Closed
biork opened this issue Aug 30, 2020 · 17 comments

@biork

biork commented Aug 30, 2020

Using rustup version 1.22.1
"rustup update" fails and rolls back with the error message:
error: could not copy file from...

The source and destination don't matter. I straced it and found the failure happens in a copy_file_range system call which is somewhat new. I'm running on Linux 3.10 with glibc 2.17. The system call triggers an EOPNOTSUPP error.

This problem has been attributed to NFS in other bug reports. May or may not be related to NFS, though I don't believe it is.
Clearly copy_file_range is involved, and since it is a newer system call, that would explain the problem on an old Linux.
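
To pin down the versions in play, a small check along these lines could be run on the affected host. It is only a sketch: the helper names (`parse_major_minor`, `version_at_least`, `report_copy_file_range_support`) are made up for illustration, and the thresholds are the upstream ones (Linux 4.5 for the syscall, glibc 2.27 for the wrapper). A vendor kernel may backport the syscall, so a version comparison is a hint rather than proof.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

/* Parse "major.minor" from the start of a version string such as
 * "3.10.0-1127.19.1.el7.x86_64". Returns 0 on success, -1 otherwise. */
static int parse_major_minor(const char *s, int *major, int *minor) {
	assert(s && major && minor);
	return sscanf(s, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Returns 1 if major.minor >= want_major.want_minor, else 0. */
static int version_at_least(int major, int minor, int want_major, int want_minor) {
	return major > want_major || (major == want_major && minor >= want_minor);
}

/* Print what the running kernel and C library versions suggest. */
static void report_copy_file_range_support(void) {
	struct utsname u;
	int major, minor;

	if (uname(&u) == 0 && parse_major_minor(u.release, &major, &minor) == 0) {
		printf("kernel %s: syscall %s\n", u.release,
			version_at_least(major, minor, 4, 5)
				? "expected upstream"
				: "not in upstream kernels this old (may be backported)");
	}
#ifdef __GLIBC__
	const char *libc = gnu_get_libc_version();
	if (parse_major_minor(libc, &major, &minor) == 0) {
		printf("glibc %s: copy_file_range wrapper %s\n", libc,
			version_at_least(major, minor, 2, 27) ? "available" : "not available");
	}
#endif
}
```

Calling `report_copy_file_range_support()` from `main` prints one line for the kernel and, on glibc systems, one for the C library.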

Steps

  1. Install any not-most-recent version of rust (so that rustup has updating to do) on a Linux version older than 4.5 with glibc older than 2.27 (I'm not sure how old is required).
  2. Run "rustup update"
  3. watch it fail

Possible Solution(s)

The copy_file_range() system call first appeared in Linux 4.5, but glibc 2.27 provides a user-space emulation when it is not available.
Build rustup on the oldest possible Linux installation.

Notes

Output of rustup --version:
rustup 1.22.1 (b01adbb 2020-07-08)

Output of rustup show:
Default host: x86_64-unknown-linux-gnu
rustup home: /research/users/rogerk/.rustup

stable-x86_64-unknown-linux-gnu (default)
rustc 1.41.0 (5e1a79984 2020-01-27)

@biork biork added the bug label Aug 30, 2020
@rbtcollins
Contributor

copy_file_range is used by rust automatically on Linux - https://github.com/rust-lang/rust/blob/ac48e62db85e6db4bbe026490381ab205f4a614d/library/std/src/sys/unix/fs.rs#L1122

What you're seeing is just the automatic probing. Please provide the actual symptoms - run rustup with -v and provide the console output please.

@biork
Author

biork commented Aug 31, 2020 via email

@kinnison
Contributor

Since Rust is defined as needing 2.6.32/2.11 at minimum right now, it sounds like it should be working with 3.10/2.17.

Could you please do as Robert asked and run rustup -v update in the scenario you describe, and provide the console log in the issue?

@biork
Author

biork commented Oct 22, 2020

Hi Daniel. Sorry, think I replied with console log earlier without posting. Here it is.
Can provide strace output, too, if that helps.

rustup -v update 1> rustup.out 2> rustup.err
...gave...

rustup.out:
stable-x86_64-unknown-linux-gnu update failed - rustc 1.41.0 (5e1a79984 2020-01-27)

rustup.err:
verbose: read metadata version: '12'
verbose: updating existing install for 'stable-x86_64-unknown-linux-gnu'
verbose: toolchain directory: '/research/users/rogerk/.rustup/toolchains/stable-x86_64-unknown-linux-gnu'
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
verbose: creating temp root: /research/users/rogerk/.rustup/tmp
verbose: creating temp file: /research/users/rogerk/.rustup/tmp/w_rqlvte5yehd3y6_file
verbose: downloading file from: 'https://static.rust-lang.org/dist/channel-rust-stable.toml.sha256'
verbose: downloading with reqwest
verbose: deleted temp file: /research/users/rogerk/.rustup/tmp/w_rqlvte5yehd3y6_file
verbose: creating temp file: /research/users/rogerk/.rustup/tmp/62dmtsy3uskly_kp_file.toml
verbose: downloading file from: 'https://static.rust-lang.org/dist/channel-rust-stable.toml'
verbose: downloading with reqwest
verbose: checksum passed
verbose: creating temp file: /research/users/rogerk/.rustup/tmp/pjf52s8_mykf0jj9_file
verbose: downloading file from: 'https://static.rust-lang.org/dist/channel-rust-stable.toml.asc'
verbose: downloading with reqwest
verbose: deleted temp file: /research/users/rogerk/.rustup/tmp/pjf52s8_mykf0jj9_file
verbose: Good signature from on https://static.rust-lang.org/dist/channel-rust-stable.toml from:
verbose: from builtin Rust release key
verbose: RSA/85AB96E6-FA1BE5FE - Rust Language (Tag and Release Signing Key) [email protected]
verbose: Fingerprint: 108F 6620 5EAE B0AA A8DD 5E1C 85AB 96E6 FA1B E5FE
verbose: deleted temp file: /research/users/rogerk/.rustup/tmp/62dmtsy3uskly_kp_file.toml
info: latest update on 2020-10-08, rust version 1.47.0 (18bf6b4f0 2020-10-07)
info: downloading component 'cargo'
verbose: creating Download Directory directory: '/research/users/rogerk/.rustup/downloads'
verbose: downloading file from: 'https://static.rust-lang.org/dist/2020-10-08/cargo-0.48.0-x86_64-unknown-linux-gnu.tar.xz'
verbose: downloading with reqwest
verbose: checksum passed
info: downloading component 'clippy'
verbose: downloading file from: 'https://static.rust-lang.org/dist/2020-10-08/clippy-0.0.212-x86_64-unknown-linux-gnu.tar.xz'
verbose: downloading with reqwest
verbose: checksum passed
info: downloading component 'rust-docs'
verbose: downloading file from: 'https://static.rust-lang.org/dist/2020-10-08/rust-docs-1.47.0-x86_64-unknown-linux-gnu.tar.xz'
verbose: downloading with reqwest
verbose: checksum passed
info: downloading component 'rust-std'
verbose: downloading file from: 'https://static.rust-lang.org/dist/2020-10-08/rust-std-1.47.0-x86_64-unknown-linux-gnu.tar.xz'
verbose: downloading with reqwest
verbose: checksum passed
info: downloading component 'rustc'
verbose: downloading file from: 'https://static.rust-lang.org/dist/2020-10-08/rustc-1.47.0-x86_64-unknown-linux-gnu.tar.xz'
verbose: downloading with reqwest
verbose: checksum passed
info: downloading component 'rustfmt'
verbose: downloading file from: 'https://static.rust-lang.org/dist/2020-10-08/rustfmt-1.4.20-x86_64-unknown-linux-gnu.tar.xz'
verbose: downloading with reqwest
verbose: checksum passed
info: removing previous version of component 'cargo'
verbose: creating temp file: /research/users/rogerk/.rustup/tmp/5o90cquy3xa53n2__file
verbose: creating temp file: /research/users/rogerk/.rustup/tmp/c3mjwqx4y2xqy3vm_file
verbose: deleted temp file: /research/users/rogerk/.rustup/tmp/c3mjwqx4y2xqy3vm_file
verbose: deleted temp file: /research/users/rogerk/.rustup/tmp/5o90cquy3xa53n2__file
info: rolling back changes
error: could not copy file from '/research/users/rogerk/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components' to '/research/users/rogerk/.rustup/tmp/c3mjwqx4y2xqy3vm_file'
info: checking for self-updates
info: cleaning up downloads & tmp directories

@kinnison
Contributor

Okay, I don't think anything else is triggering, so perhaps another step is to try to create a minimal test case for using copy_file_range and replicate the failure on your system. Are you in a position to have a go at that?

@biork
Author

biork commented Oct 23, 2020

Sure and done. When I run the program inlined below I get the expected:
copy_file_range: Operation not supported

Other possibly pertinent facts:

  1. rpm -q glibc reports glibc-2.17-307.el7.1.x86_64 on the host with the trouble, so my glibc is well before the 2.27 that exports copy_file_range, but the kernel is ostensibly new enough.
  2. my login filesystem is an NFS mount of what I believe, but am not sure, is a Windows file server.
  3. I see NFS related comments in the kernel source, so (speculation) problem might be that the syscall is present but it is failing for host-local reasons (e.g. NFS) and failure is being mis-reported as "not supported." I'd need to do more kernel-diving...
/**
 * This program tests availability of copy_file_range on a system described
 * by uname -a:
 * Linux solu 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
 *
 * It was modeled on the following strace log:
 *
 * open("/research/users/rogerk/.rustup/tmp/e934jgtdi1tk6h56_file", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
 * close(4)                                = 0
 * lstat("/research/users/rogerk/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components", {st_mode=S_IFREG|0700, st_size=212, ...}) = 0
 * open("/research/users/rogerk/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components", O_RDONLY|O_CLOEXEC) = 4
 * fstat(4, {st_mode=S_IFREG|0700, st_size=212, ...}) = 0
 * open("/research/users/rogerk/.rustup/tmp/e934jgtdi1tk6h56_file", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0100700) = 10
 * fstat(10, {st_mode=S_IFREG|0700, st_size=0, ...}) = 0
 * fchmod(10, 0100700)                     = 0
 * copy_file_range(4, NULL, 10, NULL, 212, 0) = -1 EOPNOTSUPP (Operation not supported)
 * close(10)                               = 0
 * close(4)                                = 0
 *
 * Note that I have to define the syscall locally because otherwise
 * gcc main.c yields:
 * /tmp/ccJMhzM7.o: In function `main':
 * main.c:(.text+0x17b): undefined reference to `copy_file_range'
 * collect2: error: ld returned 1 exit status
 */

#define _GNU_SOURCE /* must come before every #include to take effect */
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <fcntl.h>

static loff_t
copy_file_range(int fd_in, loff_t *off_in, int fd_out,
	loff_t *off_out, size_t len, unsigned int flags) {
	return syscall(__NR_copy_file_range, fd_in, off_in, fd_out, off_out, len, flags);
}

int main( int argc, char *argv[] ) {

	const char *msg = NULL;
	int dst_fd = -1;
	int src_fd = -1;

	if( argc < 3 ) {
		printf( "%s <srcfile> <dstfile>\n", argv[0] );
		return EXIT_FAILURE;
	}

	do {

		const char *srcfile = argv[1];
		const char *tmpfile = argv[2];
		struct stat info;

		// Following seems to be a write test.
		dst_fd = open(tmpfile, O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666);
		if( dst_fd == -1 ) {
			msg = "open-ing dst #1";
			break; // without this, close(-1) below would clobber errno
		}
		close( dst_fd );
		dst_fd = -1;

		src_fd = open( srcfile, O_RDONLY|O_CLOEXEC );
		if( src_fd == -1 ) {
			msg = "open-ing src";
			break;
		}

		if( fstat( src_fd, &info ) == -1 ) {
			msg = "fstat-ing src";
			break;
		}

		dst_fd = open(tmpfile, O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0100700);
		if( dst_fd == -1 ) {
			msg = "open-ing dst #2";
			break;
		}
		if( fchmod(dst_fd, 0100700) == -1 ) {
			msg = "fchmod-ing dst";
			break;
		}

		if( copy_file_range( src_fd, NULL, dst_fd, NULL, info.st_size, 0) == -1 ) {
			msg = "copy_file_range";
		}

	} while( 0 );

	if( dst_fd >= 0 ) {
		close(dst_fd);
	}
	if( src_fd >= 0 ) {
		close(src_fd);
	}

	if( msg ) {
		perror( msg );
		return EXIT_FAILURE;
	}
	return EXIT_SUCCESS;
}

@rbtcollins
Contributor

rbtcollins commented Oct 23, 2020 via email

@biork
Author

biork commented Oct 23, 2020

@rbtcollins

  1. I was following up on (my understanding of) kinnison's suggestion.
  2. While the dogma you cite is usually true, in this case, given the failure of a minimal test of a syscall on which rustup apparently depends, it's not clear to me what wrapping the same call in Rust or any other code is going to show. The syscall is failing on a host that otherwise seems to meet Rust's minimal reqs.
    Perhaps this says it's "not Rust's problem," but it's at least a corner case that the reqs maybe should address.

@rbtcollins
Contributor

Because, as I linked earlier in #2472 (comment), rust is meant to automatically handle that call failing and fall back. We know the syscall is failing - that's not interesting. What is interesting is whether the fallback code is working or failing; if it is working then something else - like, for example, you've got read-only files in rustup's work area - is causing the issue. We can discriminate if a testcase using the rust wrapper of that codepath on some known good files fails.

I'm not trying to deny the problem or make you jump through hoops - I'm trying to discriminate between issues we can help you directly with, and issues that you'll need to work with the rust team on (e.g. the fallback code that I linked you to failing).

That said, it looks like master has changed - see https://github.com/rust-lang/rust/blob/9abf81afa8c20ea48c8515dc4bbc714118502f5e/library/std/src/sys/unix/fs.rs#L1235 now - so I think this will get fixed soon, but we'll need to build rustup with this, and I'm not sure if it's in a release etc. yet.
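
For readers unfamiliar with the pattern, the kind of fallback being discussed can be sketched in C roughly as follows. This is not the Rust standard library's code: the function names are hypothetical, and the set of errno values treated as "unsupported" is an assumption of this sketch. The idea is to try copy_file_range first and drop to a plain read/write loop only when the call fails before copying anything with an error that says it cannot be used.

```c
#define _GNU_SOURCE /* must come before every #include */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Errors taken to mean "copy_file_range is unusable here" rather than
 * a genuine I/O failure. The exact set is an assumption of this sketch. */
static int should_fall_back(int err) {
	return err == ENOSYS || err == EOPNOTSUPP || err == EXDEV ||
		err == EINVAL || err == EPERM;
}

/* Portable slow path: copy until EOF with read(2)/write(2). */
static ssize_t copy_read_write(int src_fd, int dst_fd) {
	char buf[65536];
	ssize_t total = 0;
	for (;;) {
		ssize_t n = read(src_fd, buf, sizeof buf);
		if (n == 0) return total;
		if (n < 0) {
			if (errno == EINTR) continue;
			return -1;
		}
		for (ssize_t off = 0; off < n;) {
			ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
			if (w < 0) {
				if (errno == EINTR) continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}

/* Try the in-kernel copy first, then fall back if it is unsupported. */
static ssize_t copy_fd_with_fallback(int src_fd, int dst_fd) {
	assert(src_fd >= 0 && dst_fd >= 0);
#ifdef __NR_copy_file_range
	ssize_t total = 0;
	for (;;) {
		long n = syscall(__NR_copy_file_range, src_fd, NULL, dst_fd, NULL,
			(size_t)1 << 30, 0u);
		if (n > 0) { total += n; continue; }
		if (n == 0) return total; /* reached EOF */
		if (errno == EINTR) continue;
		if (total == 0 && should_fall_back(errno)) break;
		return -1; /* genuine error, or failure after a partial copy */
	}
#endif
	return copy_read_write(src_fd, dst_fd);
}

/* Path-based wrapper; returns bytes copied, or -1 with errno set. */
static ssize_t copy_file_with_fallback(const char *src, const char *dst) {
	int src_fd = open(src, O_RDONLY | O_CLOEXEC);
	if (src_fd == -1) return -1;
	int dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0600);
	if (dst_fd == -1) {
		int saved = errno;
		close(src_fd);
		errno = saved;
		return -1;
	}
	ssize_t copied = copy_fd_with_fallback(src_fd, dst_fd);
	int saved = errno;
	close(src_fd);
	if (close(dst_fd) == -1 && copied >= 0) {
		copied = -1;
		saved = errno;
	}
	errno = saved;
	return copied;
}
```

On a host where the syscall returns EOPNOTSUPP, as in the strace log above, `copy_file_with_fallback(src, dst)` would be expected to succeed through the read/write path. If it still fails there, the problem lies somewhere other than the syscall.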

@biork
Author

biork commented Oct 23, 2020

Ah, ok, the first (fallback) part wasn't clear to me...in which case I'm not sure how to interpret @kinnison 's suggestion. ...extract the relevant fallback code from rustup and run that? I can do pretty much anything you suggest.

@rbtcollins
Contributor

I don't think anything is needed - ENOTSUPP is clearly added to the nightly library; if you wanted a working rustup for yourself you could build rustup from source using nightly and it should all just work from that point on.

@biork
Author

biork commented Oct 23, 2020

Ok. I'll give that a try.

@kinnison
Contributor

The commit which properly introduces the ENOTSUPP handling is rust-lang/rust@1316c78, which was on the 12th of August. In theory there's been enough time since then that the next rustup release ought to include it. So it would be very useful if you can confirm if building a rustup with current stable fixes it, or if it needs nightly.

If you clone rustup and run cargo run -- --no-modify-path -y then it will install the newly built rustup and then you can try updating your toolchain. If that still fails, then cargo +nightly run -- --no-modify-path -y will install a rustup built with nightly and you can try again.

Good luck.

@biork
Author

biork commented Oct 23, 2020

No joy from either path (builds of several crates failed).
FWIW, in the interest of moving forward on stalled (Rust-dependent) projects here, I tried the dumb and usually reliable strategy of hiding-by-renaming (not deleting!) the current .rustup and .cargo directories and doing a strictly fresh install using the curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh approach. Even this doesn't work and, from the similarity of error messages with the rustup update approach, for the same reasons.
Next I tried changing the default destinations by exporting non-default CARGO_HOME and RUSTUP_HOME paths. No joy.
Finally, I ran rustup update on another host I use with same kernel but no NFS (which was at rustc --version 1.41). It worked.
So it appears even fresh installs are failing on certain machine/filesystem configs.
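
Since the working and failing hosts differ mainly in the filesystem, it can help to confirm what a given directory is actually mounted on. The following sketch uses statfs(2) and compares `f_type` against a few well-known magic numbers (0x6969 is `NFS_SUPER_MAGIC` in `<linux/magic.h>`). The helper names and the `SKETCH_NFS_MAGIC` macro are hypothetical, and the code is Linux-specific.

```c
#include <assert.h>
#include <sys/vfs.h>

#define SKETCH_NFS_MAGIC 0x6969UL /* NFS_SUPER_MAGIC in <linux/magic.h> */

/* Map a few common filesystem magic numbers to readable names. */
static const char *fs_name_from_magic(unsigned long magic) {
	switch (magic) {
	case SKETCH_NFS_MAGIC: return "nfs";
	case 0xEF53UL:         return "ext2/3/4";
	case 0x58465342UL:     return "xfs";
	case 0x01021994UL:     return "tmpfs";
	default:               return "other";
	}
}

/* Store the filesystem magic for `path`; returns 0 on success, -1 on error. */
static int fs_magic_of(const char *path, unsigned long *magic) {
	struct statfs sfs;
	assert(path && magic);
	if (statfs(path, &sfs) == -1) return -1;
	*magic = (unsigned long)sfs.f_type;
	return 0;
}

/* Returns 1 if `path` is on NFS, 0 if not, -1 if statfs failed. */
static int is_on_nfs(const char *path) {
	unsigned long magic;
	if (fs_magic_of(path, &magic) == -1) return -1;
	return magic == SKETCH_NFS_MAGIC;
}
```

Running `is_on_nfs()` on the RUSTUP_HOME and CARGO_HOME directories of both hosts would show whether the failing configuration is the NFS-backed one.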

@kinnison
Contributor

kinnison commented Dec 3, 2020

Hi @biork

I'm interested in seeing if this is still the case with the new (1.23) rustup. Could you have another go for me? (Sorry for the ongoing pain)

D.

@biork
Author

biork commented Dec 8, 2020 via email

@kinnison
Contributor

kinnison commented Dec 8, 2020

Wonderful, thank you. I'm going to close this issue, but ping either here or on a fresh issue if problems recur.

@kinnison kinnison closed this as completed Dec 8, 2020