Skip to content

webhdfs: implement support for basic auth and proxy #10075

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gdiepen
Copy link
Contributor

@gdiepen gdiepen commented Nov 8, 2023

In the current documentation of the webhdfs remote it was stated you could provide a user parameter. However, this parameter was not explicitly defined in the config_schema.json file.

Since I wanted to add support for Basic Auth for WebHDFS (possible since my PR for fsspec has been merged fsspec/filesystem_spec#1409, but not yet released) I also added support for a password parameter.

Finally, for cases where the edgenode is behind a HA proxy, rewriting of the URLs might be needed. This is supported by fsspec using the data_proxy parameter (which can be either a dictionary or a callable). After a talk on discord opted to implement this using only the target of the rewrite instead of providing source and target.

This target (data_proxy_target) also needed to be added to the config_schema.json.

The relevant dvc-webhdfs PR for the change is: iterative/dvc-webhdfs#16
The relevant dvc.org PR for doc change is: iterative/dvc.org#4980

The combination of the 3 code changes (fsspec / dvc / dvc-webhdfs) has been running without any errors in the situation where the basic auth + rewrite is required.

Fixes: #10062

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

This was already supported according to documentation, but missing in
code

Fixes: 10062
This key is needed to construct a potential rewrite dictionary in
webhdfs to deal with HDFS behind High Availability Proxy servers
Copy link

codecov bot commented Nov 8, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (f2b56bb) 90.54% compared to head (ff3b85f) 90.31%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10075      +/-   ##
==========================================
- Coverage   90.54%   90.31%   -0.24%     
==========================================
  Files         499      499              
  Lines       37950    37950              
  Branches     5514     5514              
==========================================
- Hits        34363    34273      -90     
- Misses       2943     3011      +68     
- Partials      644      666      +22     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@efiop efiop changed the title Implement support for basic auth and proxy webhdfs: mplement support for basic auth and proxy Nov 8, 2023
@efiop efiop added the fs: webhdfs Related to the WebHDFS filesystem label Nov 8, 2023
efiop
efiop previously requested changes Nov 8, 2023
@efiop efiop added the feature is a feature label Nov 8, 2023
@efiop efiop changed the title webhdfs: mplement support for basic auth and proxy webhdfs: implement support for basic auth and proxy Nov 9, 2023
@s1lvester
Copy link

I think we're in some kind of limbo right now. Apparently the changes have been merged into dvc-webhdfs and the documentation has been updated (which now states that we can use "user", "password" and "data_proxy_target" - but the main repo doesn't yet support any of these parameters. 😕

skshetry added a commit to iterative/dvc-webhdfs that referenced this pull request Dec 26, 2023
@skshetry skshetry enabled auto-merge (squash) December 26, 2023 02:44
@skshetry skshetry disabled auto-merge December 26, 2023 02:52
@skshetry skshetry merged commit 27a8222 into iterative:main Dec 26, 2023
@s1lvester
Copy link

Thank you @skshetry and @gdiepen ! ❤️

BradyJ27 pushed a commit to BradyJ27/dvc that referenced this pull request Apr 22, 2024
* remote: add user key to config schema webhdfs

This was already supported according to documentation, but missing in
code

Fixes: 10062

* remote: add password key to support Basic Auth

* remote: add data_proxy_target for webhdfs remote

This key is needed to construct a potential rewrite dictionary in
webhdfs to deal with HDFS behind High Availability Proxy servers

* bump dvc-webhdfs

---------

Co-authored-by: skshetry <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature is a feature fs: webhdfs Related to the WebHDFS filesystem
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

push: can't use user key in webhdfs remote configuration
4 participants