Skip to content

ssh: QA the implementation / migrate to fsspec #6064

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
isidentical opened this issue May 26, 2021 · 6 comments · Fixed by #6295
Closed

ssh: QA the implementation / migrate to fsspec #6064

isidentical opened this issue May 26, 2021 · 6 comments · Fixed by #6295
Assignees
Labels
fs: ssh Related to the SSH filesystem

Comments

@isidentical
Copy link
Contributor

fsspec itself comes with a naive implementation for ssh, though it might require a bit of extra work.

@isidentical isidentical self-assigned this May 26, 2021
@isidentical
Copy link
Contributor Author

Do we stil want to maintain SSHObjectDB? It depends on some custom methods (open_max_sftp_channels) and interacts with paramiko directly instead of doing stuff on the filesystem layer. We could, in theory, still support though it seems like an odd piece of code that will lock us to paramiko.

@skshetry skshetry added the fs: ssh Related to the SSH filesystem label May 27, 2021
@efiop
Copy link
Contributor

efiop commented May 27, 2021

@isidentical Depends on the research. The only reason for it is open_max_sftp_channels, which might not be needed with proper throttling for SFTPFileSystems.

@isidentical
Copy link
Contributor Author

A short wrap up on the progress;

We initially decided to go with parallel-ssh though it had a couple of major issues;

  • Heavily unstable (sometimes get stuck out of control and poll for hours) / open the race conditions (there are multiple (sometimes unreproducible) bugs that lead to segmentation faults / double frees / assertion errors on the C level)
  • No error handling for SFTP, so all errors are a generic SFTPProtocolError which is not good for us considering one of the major benefits of fsspec is the unification of protocol errors
  • For ssh2 (the faster / more supported alternative) authentication with kerberos is not supported. So we would have the need to break the backwards compatibility.

After experiencing these failures we've decided to give async-ssh a shot, which I have to state I quite liked (very well documented). Currently, it passes all the functional tests in the DVC test suite and handles some of the rather complicated parts within the FS itself (e.g batch_exists for sshdb is not needed anymore since exists() calls will now handle channelling and can work with the normal object db).

I'm in the progress of wrapping up some ends, and then will try to run benchmarks.

@isidentical
Copy link
Contributor Author

AsyncSSH implementation is nearly ready, I'd really love to see the callbacks first though for the speed-up on small files though if not we can still proceed. Some new features we depend are still in the develop branch of the asyncssh, so in either way we have to wait a release on there first.

@isidentical
Copy link
Contributor Author

isidentical commented Jul 6, 2021

Task list

  • migrate the actual implementation to using sshfs
  • migrate config parsing to use sshfs.config instead of paramiko (ensure all test suite passes)
  • tune config options to make the filesystem faster (e.g disabling compression)
  • migrate fsspec wrapper to use fsspec.callbacks when possible
  • benchmark possible stories and compare them with paramiko
  • benchmark basic datacloud transfers (only push/pull) and compare them with scp and rsync
  • check atomicity (e.g interrupts during push, and re-push)

@isidentical
Copy link
Contributor Author

isidentical commented Jul 8, 2021

Push

paramiko asyncssh scp rsync
1 x 16G blob 8m50.198s 3m12.544s 1m57.267s 1m59.435s
8 x 8G blobs 40m58.745s 11m16.484s 8m45.489s 8m45.268s
cats-dogs dataset 0m11.724s 0m17.866s 0m4.542s 0m1.854s
25000 x 512K blobs 12m40.561s 6m10.019s 3m10.292s 2m24.675s

Pull

paramiko asyncssh scp rsync
1 x 16G blob 4m45.821s 3m52.927s 1m53.512s 1m50.129s
8 x 8G blobs 26m20.726s 13m47.231s 8m44.878s 8m46.730s
cats-dogs dataset 0m13.190s 0m10.425s 0m4.126s 0m2.113s
25000 x 512K blobs 11m17.895s 6m14.551s 3m1.480s 2m21.527s

scp/rsync is pending.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs: ssh Related to the SSH filesystem
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants