webhdfs: implement support for basic auth and proxy #10075
This was already supported according to documentation, but missing in code Fixes: 10062
This key is needed to construct a potential rewrite dictionary in webhdfs to deal with HDFS behind High Availability Proxy servers
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files (Coverage Diff)

Metric     main     #10075   +/-
Coverage   90.54%   90.31%   -0.24%
Files      499      499
Lines      37950    37950
Branches   5514     5514
Hits       34363    34273    -90
Misses     2943     3011     +68
Partials   644      666      +22
I think we're in some kind of limbo right now. Apparently the changes have been merged into dvc-webhdfs and the documentation has been updated (it now states that we can use "user", "password" and "data_proxy_target"), but the main repo doesn't yet support any of these parameters. 😕
* remote: add user key to config schema webhdfs

  This was already supported according to documentation, but missing in code

  Fixes: 10062

* remote: add password key to support Basic Auth

* remote: add data_proxy_target for webhdfs remote

  This key is needed to construct a potential rewrite dictionary in webhdfs to deal with HDFS behind High Availability Proxy servers

* bump dvc-webhdfs

---------

Co-authored-by: skshetry <[email protected]>
The current documentation of the webhdfs remote states that you can provide a `user` parameter. However, this parameter was not explicitly defined in the `config_schema.json` file.

Since I wanted to add support for Basic Auth for WebHDFS (possible since my PR for fsspec, fsspec/filesystem_spec#1409, has been merged but not yet released), I also added support for a `password` parameter.

Finally, for cases where the edge node is behind an HA proxy, rewriting of the URLs might be needed. fsspec supports this through the `data_proxy` parameter (which can be either a dictionary or a callable). After a discussion on Discord, I opted to implement this using only the target of the rewrite instead of providing both source and target. This target (`data_proxy_target`) also needed to be added to the `config_schema.json`.

The relevant dvc-webhdfs PR for the change is: iterative/dvc-webhdfs#16
The relevant dvc.org PR for doc change is: iterative/dvc.org#4980
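To illustrate the target-only rewrite idea, here is a minimal sketch of a callable that could be handed to fsspec's `data_proxy` parameter. It is not the code from iterative/dvc-webhdfs#16; the helper name `make_data_proxy`, the host names, and the choice to replace only scheme and host are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def make_data_proxy(target: str):
    """Build a URL rewriter from a single target (hypothetical helper).

    The returned callable replaces the scheme and host[:port] of any URL
    with those of `target` and keeps path, query and fragment unchanged.
    Any path component in `target` is ignored in this sketch.
    """
    target_parts = urlsplit(target)

    def rewrite(url: str) -> str:
        parts = urlsplit(url)
        return urlunsplit(
            (
                target_parts.scheme,
                target_parts.netloc,
                parts.path,
                parts.query,
                parts.fragment,
            )
        )

    return rewrite


# Example host names are made up.
rewrite = make_data_proxy("https://proxy.example.com:8443")
print(rewrite("http://datanode-1.internal:9864/webhdfs/v1/data/file.bin?op=OPEN"))
```

Because only the target is configured, every data-node URL ends up pointing at the same proxy address, which is why no source host needs to be listed.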
The combination of the 3 code changes (fsspec / dvc / dvc-webhdfs) has been running without any errors in the situation where the basic auth + rewrite is required.
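For readers unfamiliar with what the `user` / `password` pair amounts to on the wire, the sketch below builds a standard HTTP Basic `Authorization` header (RFC 7617) with the standard library only. How fsspec actually attaches the credentials is not shown in this PR, so the helper `basic_auth_header` and the sample credentials are purely illustrative.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Return an HTTP Basic Authorization header (hypothetical helper).

    The credentials are joined as "user:password", UTF-8 encoded and
    base64 encoded, as described in RFC 7617.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Sample credentials are made up.
print(basic_auth_header("alice", "s3cret"))
```

Basic Auth only encodes the credentials and does not encrypt them, so it is normally combined with HTTPS.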
Fixes: #10062
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏