HADOOP-17833. Improve Magic Committer performance (#3289) #4470
Merged: steveloughran merged 1 commit into apache:branch-3.3 from steveloughran:s3/HADOOP-17833-magic-committer-branch-3.3 on Jun 21, 2022
Speed up the magic committer. Key changes:
* Writes under __magic always retain directory markers
* File creation under __magic skips all overwrite checks,
including the LIST call intended to stop files being
created over dirs.
* mkdirs under __magic probes the path for existence
but does not look any further.
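The probe rules in the list above can be modelled as a small decision function. This is a hypothetical sketch, not the S3A source: the class `MagicPathModel`, the `Probe` enum, the rule that a key counts as "under __magic" when any path element equals `__magic`, and the ancestor walk for mkdirs outside `__magic` are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the existence probes issued by create() and mkdirs().
// Not the S3A implementation; names and the ancestor walk are assumptions.
final class MagicPathModel {

    /** Directory name used by the magic committer. */
    static final String MAGIC = "__magic";

    /** Kinds of probe a client might issue against the object store. */
    enum Probe { HEAD_FILE, LIST_AS_DIRECTORY, PROBE_PATH }

    private MagicPathModel() {
    }

    /** Assumption: a key is "under __magic" if any element equals "__magic". */
    static boolean isMagicPath(String key) {
        for (String element : key.split("/")) {
            if (element.equals(MAGIC)) {
                return true;
            }
        }
        return false;
    }

    /** Probes issued before creating a file at the given key. */
    static List<Probe> probesForCreate(String key) {
        List<Probe> probes = new ArrayList<>();
        if (!isMagicPath(key)) {
            probes.add(Probe.HEAD_FILE);          // overwrite check: object already there?
            probes.add(Probe.LIST_AS_DIRECTORY);  // would the file land on top of a directory?
        }
        // Under __magic every overwrite check, including the LIST, is skipped.
        return probes;
    }

    /** Probes issued by mkdirs, in the worst case where no ancestor exists yet. */
    static List<Probe> probesForMkdirs(String key) {
        List<Probe> probes = new ArrayList<>();
        probes.add(Probe.PROBE_PATH);             // does the path itself exist?
        if (!isMagicPath(key)) {
            // Assumption: outside __magic each ancestor is probed too, so that
            // a directory is never created underneath a file.
            String[] elements = key.split("/");
            for (int i = 1; i < elements.length; i++) {
                probes.add(Probe.PROBE_PATH);
            }
        }
        return probes;
    }
}
```

For example, creating `out/__magic/job1/part-0000` issues no probes in this model, while `out/data/part-0000` issues a HEAD and a LIST.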
Extra parallelism in task and job commit directory scanning.
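One common way to parallelise directory scanning is to submit one listing per directory to a bounded thread pool and merge the results. The sketch below is a generic illustration, not the committer's code: the in-memory `Map` stands in for object-store listings, and the class `ParallelScan`, its methods and the `.pending` suffix used in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Generic sketch: list several directories concurrently and merge the results.
final class ParallelScan {

    private ParallelScan() {
    }

    /** Lists one directory of the in-memory store, keeping names ending in suffix. */
    private static List<String> listMatching(Map<String, List<String>> store,
                                             String dir, String suffix) {
        List<String> matches = new ArrayList<>();
        for (String name : store.getOrDefault(dir, List.of())) {
            if (name.endsWith(suffix)) {
                matches.add(dir + "/" + name);
            }
        }
        return matches;
    }

    /** Scans all dirs on a fixed pool; the output keeps the order of dirs. */
    static List<String> scanInParallel(Map<String, List<String>> store,
                                       List<String> dirs, String suffix, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<List<String>>> futures = new ArrayList<>();
            for (String dir : dirs) {
                // One listing per directory; up to `threads` run at the same time.
                futures.add(CompletableFuture.supplyAsync(
                        () -> listMatching(store, dir, suffix), pool));
            }
            List<String> all = new ArrayList<>();
            for (CompletableFuture<List<String>> f : futures) {
                all.addAll(f.join());   // wait for each listing in submission order
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

Joining the futures in submission order keeps the merged output deterministic even though the listings complete in any order.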
Use of createFile and openFile with parameters which allow
HEAD checks to be skipped.
The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename.
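The report-location rule above can be sketched as a lookup that returns nothing when the option is unset. This is a hypothetical sketch: the class `SummaryReport` and a plain `Map` standing in for the Hadoop configuration are assumptions, and it uses the bare job id as the file name as the text describes; the real committer may decorate that name, which is not modelled here.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of choosing where the job summary report is written.
final class SummaryReport {

    static final String REPORT_DIR_KEY = "fs.s3a.committer.summary.report.directory";

    private SummaryReport() {
    }

    /** Returns the report path for a job, or empty when no directory is configured. */
    static Optional<String> reportPath(Map<String, String> conf, String jobId) {
        String dir = conf.getOrDefault(REPORT_DIR_KEY, "").trim();
        if (dir.isEmpty()) {
            return Optional.empty();   // option unset: no extra report is written
        }
        // The directory may be on another filesystem or bucket; it is used as given.
        String base = dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
        return Optional.of(base + "/" + jobId);   // job id used as the file name
    }
}
```

For example, with the option set to `s3a://reports-bucket/summaries/`, job `job_001` would report to `s3a://reports-bucket/summaries/job_001` in this model.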
Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`
Application code can set the createFile() option
`fs.s3a.create.performance` to true to skip the same
safety checks that are skipped when writing under magic directories.
Use with care.
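In real application code the option would be passed through the builder returned by Hadoop's `FileSystem.createFile(path)`, e.g. `.opt("fs.s3a.create.performance", true)`. The JDK-only sketch below models just how such an option might be resolved: the class `CreateOptions` is a hypothetical stand-in for the builder's option map, and the default of `false` reflects that the checks stay on unless the caller opts out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a createFile() builder's option map; not Hadoop's class.
final class CreateOptions {

    static final String CREATE_PERFORMANCE = "fs.s3a.create.performance";

    private final Map<String, String> options = new HashMap<>();

    /** Records a boolean option, mirroring a builder-style opt(key, value) call. */
    CreateOptions opt(String key, boolean value) {
        options.put(key, Boolean.toString(value));
        return this;
    }

    /** True only when the caller explicitly asked to skip the safety checks. */
    boolean skipSafetyChecks() {
        return Boolean.parseBoolean(options.getOrDefault(CREATE_PERFORMANCE, "false"));
    }
}
```

Defaulting to the safe behaviour means a caller that never sets the option keeps the overwrite and directory checks.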
The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.
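The header prefix works by naming convention: every createFile option whose key starts with `fs.s3a.create.header.` contributes one header. The sketch below shows only that prefix-stripping step; the class `CreateHeaders` is hypothetical, and how S3A then attaches the headers to the S3 object is not described here and is not modelled.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: collect custom headers from prefixed createFile() options.
final class CreateHeaders {

    static final String HEADER_PREFIX = "fs.s3a.create.header.";

    private CreateHeaders() {
    }

    /** Returns header name -> value for every option key carrying the prefix. */
    static Map<String, String> headersFrom(Map<String, String> options) {
        Map<String, String> headers = new TreeMap<>();   // sorted for stable output
        for (Map.Entry<String, String> e : options.entrySet()) {
            String key = e.getKey();
            // Skip unrelated options and a bare prefix with no header name.
            if (key.startsWith(HEADER_PREFIX) && key.length() > HEADER_PREFIX.length()) {
                headers.put(key.substring(HEADER_PREFIX.length()), e.getValue());
            }
        }
        return headers;
    }
}
```

For example, the option `fs.s3a.create.header.owner=etl` yields the header `owner` with value `etl`, while `fs.s3a.create.performance` is ignored.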
Contributed by Steve Loughran.
Change-Id: I9e086423f02eb25b6e70fc1c12a13e0a5afe9cb9
Contributor (author): Testing in progress; this is just a bounce past Yetus.
Contributor (author): Tested against S3 London. The S3 Select tests are failing (https://issues.apache.org/jira/browse/HADOOP-18292); this is unrelated.
💔 -1 overall
This message was automatically generated.
deepakdamri pushed a commit to acceldata-io/hadoop that referenced this pull request on Jan 21, 2025.