
@steveloughran
Contributor

Speed up the magic committer. The key changes are:

  • Writes under __magic always retain directory markers

  • File creation under __magic skips all overwrite checks,
    including the LIST call intended to stop files being
    created over dirs.

  • mkdirs under __magic probes the path for existence
    but does not look any further.
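As a rough illustration of the bullets above, the sketch below models which existence probes a create or mkdirs call might make, depending on whether the path is under __magic. Everything in it is hypothetical. The class `MagicProbePlanner`, its methods and the probe strings are made up. The behaviour outside __magic is a simplified assumption, not the actual S3A logic: a HEAD plus a LIST before a create, and a probe of every parent in mkdirs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the existence probes made before
 * create() and mkdirs(). Not the S3A implementation.
 */
final class MagicProbePlanner {

  /** True if any element of the path is the magic directory "__magic". */
  static boolean isUnderMagic(String path) {
    for (String element : path.split("/")) {
      if (element.equals("__magic")) {
        return true;
      }
    }
    return false;
  }

  /** Probes before creating a file (assumed: HEAD unless overwriting, then LIST). */
  static List<String> createProbes(String path, boolean overwrite) {
    List<String> probes = new ArrayList<>();
    if (isUnderMagic(path)) {
      return probes;                   // under __magic: every check skipped
    }
    if (!overwrite) {
      probes.add("HEAD " + path);      // is there already a file here?
    }
    probes.add("LIST " + path + "/");  // is there a directory here?
    return probes;
  }

  /** Probes made by mkdirs (assumed: the path, then each parent). */
  static List<String> mkdirsProbes(String path) {
    List<String> probes = new ArrayList<>();
    probes.add("PROBE " + path);       // does the path itself exist?
    if (isUnderMagic(path)) {
      return probes;                   // under __magic: look no further
    }
    // Elsewhere, walk up the parents, e.g. to check none of them is a file.
    for (int slash = path.lastIndexOf('/'); slash > 0;
         slash = path.lastIndexOf('/', slash - 1)) {
      probes.add("PROBE " + path.substring(0, slash));
    }
    return probes;
  }

  public static void main(String[] args) {
    System.out.println(createProbes("out/__magic/job1/part-0", false)); // []
    System.out.println(mkdirsProbes("out/a/b")); // [PROBE out/a/b, PROBE out/a, PROBE out]
  }
}
```

In this model a create under __magic makes no probes at all. An mkdirs outside it probes the path and then each of its parents.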

Other changes: extra parallelism in task and job commit directory
scanning, and use of createFile() and openFile() with parameters
which allow HEAD checks to be skipped.
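Directory scanning during task and job commit is a set of independent listings, so it parallelises naturally. Below is a minimal sketch of that general idea. It assumes an in-memory `lister` function in place of real filesystem listings, and the class `ParallelScan` and its `scanAll()` method are hypothetical, not the committer's code. Each listing is submitted to a fixed-size thread pool, and the results are merged in submission order so the output does not depend on thread scheduling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical sketch: list several directories concurrently. */
final class ParallelScan {

  /**
   * Lists every directory on a fixed-size pool and merges the results in
   * submission order. join() rethrows a failed listing as an unchecked
   * CompletionException, so one failure fails the whole scan.
   */
  static List<String> scanAll(List<String> dirs,
      Function<String, List<String>> lister, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<CompletableFuture<List<String>>> pending = new ArrayList<>();
      for (String dir : dirs) {
        pending.add(CompletableFuture.supplyAsync(() -> lister.apply(dir), pool));
      }
      List<String> files = new ArrayList<>();
      for (CompletableFuture<List<String>> listing : pending) {
        files.addAll(listing.join());
      }
      return files;
    } finally {
      pool.shutdown();   // stop accepting work; threads exit once idle
    }
  }

  public static void main(String[] args) {
    // In-memory stand-in for a store: directory -> files in it.
    Map<String, List<String>> tree = new HashMap<>();
    tree.put("task_00", Arrays.asList("task_00/part-0"));
    tree.put("task_01", Arrays.asList("task_01/part-1", "task_01/part-2"));
    List<String> found = scanAll(Arrays.asList("task_00", "task_01"),
        dir -> tree.getOrDefault(dir, Collections.<String>emptyList()), 4);
    System.out.println(found); // [task_00/part-0, task_01/part-1, task_01/part-2]
  }
}
```

Collecting through join() in submission order trades a little latency for deterministic output, which keeps commit results easy to compare between runs.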

The committer can write the summary _SUCCESS file to the directory
set in `fs.s3a.committer.summary.report.directory`, which can be in
a different filesystem or bucket if desired, using the job ID as
the filename.
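A sketch of the naming described above, under stated assumptions. The report directory is read from a plain key/value map standing in for the job configuration, and the report name is just the job ID. The class and method names are hypothetical, and the committer's actual filename format may differ. If the option is unset, no report location is returned.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: where a per-job summary report would be written. */
final class SummaryReportPath {

  static final String REPORT_DIR = "fs.s3a.committer.summary.report.directory";

  /** Returns reportDir/jobId, or empty if no report directory is configured. */
  static Optional<URI> reportPath(Map<String, String> conf, String jobId) {
    String dir = conf.get(REPORT_DIR);
    if (dir == null || dir.trim().isEmpty()) {
      return Optional.empty();        // option unset: no extra report
    }
    dir = dir.trim();
    // A trailing "/" makes resolve() append jobId instead of replacing
    // the last path element.
    if (!dir.endsWith("/")) {
      dir = dir + "/";
    }
    return Optional.of(URI.create(dir).resolve(jobId));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(REPORT_DIR, "s3a://reports-bucket/summaries");
    // Prints s3a://reports-bucket/summaries/job_0001
    System.out.println(reportPath(conf, "job_0001").get());
  }
}
```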

Also: HADOOP-15460. S3A FS to add fs.s3a.create.performance

Application code can set the createFile() option
`fs.s3a.create.performance` to true to disable, on any path, the
same safety checks which are skipped when writing under magic
directories. Use with care.
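In application code the option is passed through the createFile() builder, for example `fs.createFile(path).opt("fs.s3a.create.performance", true).build()`. The sketch below is a hypothetical model of the decision, not S3A code. The class and method names are made up, and createFile options are represented as a string map. It shows how either the flag or a __magic path could lead to the safety checks being skipped. Once those checks are skipped, nothing stops a file from being created over an existing directory, which is why the option needs care.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the create-time safety-check decision. */
final class CreatePerformanceOption {

  static final String CREATE_PERFORMANCE = "fs.s3a.create.performance";

  /**
   * True if the overwrite/directory safety checks would be skipped:
   * either the path is under __magic, or the caller set the option to true.
   */
  static boolean skipSafetyChecks(boolean underMagicPath,
      Map<String, String> createOptions) {
    String value = createOptions.get(CREATE_PERFORMANCE);
    // Missing or non-"true" values (case-insensitive) count as false.
    boolean performance = value != null && Boolean.parseBoolean(value.trim());
    return underMagicPath || performance;
  }

  public static void main(String[] args) {
    Map<String, String> opts = new HashMap<>();
    System.out.println(skipSafetyChecks(false, opts)); // false
    opts.put(CREATE_PERFORMANCE, "true");
    System.out.println(skipSafetyChecks(false, opts)); // true
  }
}
```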

Options passed to createFile() with the prefix
`fs.s3a.create.header.` add custom headers to the S3 objects
when they are created.
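Below is a hypothetical sketch of how options carrying the `fs.s3a.create.header.` prefix could be turned into header name/value pairs. It assumes the text after the prefix is used as the header name; how S3A actually attaches these headers to the object is not described here. The class name, the method name and the example header `owner` are made up. In application code such an option would be set like any other createFile() option, for example `.opt("fs.s3a.create.header.owner", "etl")`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: collect custom headers from createFile() options. */
final class CreateHeaderOptions {

  static final String HEADER_PREFIX = "fs.s3a.create.header.";

  /** Returns header name -> value for every option carrying the prefix. */
  static Map<String, String> headersFrom(Map<String, String> createOptions) {
    Map<String, String> headers = new TreeMap<>();   // sorted, for stable output
    for (Map.Entry<String, String> e : createOptions.entrySet()) {
      String key = e.getKey();
      // Ignore unrelated options and a bare prefix with no header name.
      if (key.startsWith(HEADER_PREFIX) && key.length() > HEADER_PREFIX.length()) {
        headers.put(key.substring(HEADER_PREFIX.length()), e.getValue());
      }
    }
    return headers;
  }

  public static void main(String[] args) {
    Map<String, String> opts = new HashMap<>();
    opts.put("fs.s3a.create.performance", "true");
    opts.put(HEADER_PREFIX + "owner", "etl");
    System.out.println(headersFrom(opts)); // {owner=etl}
  }
}
```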

Contributed by Steve Loughran.

Change-Id: I9e086423f02eb25b6e70fc1c12a13e0a5afe9cb9

Description of PR

How was this patch tested?

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran
Contributor Author

testing in progress; this is just a bounce past yetus

@steveloughran
Contributor Author

Tested against S3 London.

S3 Select tests are failing with "We do not support REDUCED_REDUNDANCY storage class."

That is https://issues.apache.org/jira/browse/HADOOP-18292 and is unrelated to this change.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:--------|
| +0 🆗 | reexec | 10m 44s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 2s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | markdownlint | 0m 0s | | markdownlint was not available. |
| +0 🆗 | xmllint | 0m 0s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 39 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +0 🆗 | mvndep | 13m 53s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 26m 54s | | branch-3.3 passed |
| +1 💚 | compile | 25m 14s | | branch-3.3 passed |
| +1 💚 | checkstyle | 3m 39s | | branch-3.3 passed |
| +1 💚 | mvnsite | 4m 23s | | branch-3.3 passed |
| +1 💚 | javadoc | 3m 3s | | branch-3.3 passed |
| +1 💚 | spotbugs | 6m 23s | | branch-3.3 passed |
| +1 💚 | shadedclient | 27m 39s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 27s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 2m 25s | | the patch passed |
| +1 💚 | compile | 18m 18s | | the patch passed |
| -1 ❌ | javac | 18m 18s | /results-compile-javac-root.txt | root generated 1 new + 1918 unchanged - 1 fixed = 1919 total (was 1919) |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 3m 15s | /results-checkstyle-root.txt | root: The patch generated 1 new + 30 unchanged - 6 fixed = 31 total (was 36) |
| +1 💚 | mvnsite | 4m 19s | | the patch passed |
| +1 💚 | javadoc | 2m 55s | | the patch passed |
| +1 💚 | spotbugs | 6m 43s | | the patch passed |
| +1 💚 | shadedclient | 27m 42s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 17m 38s | | hadoop-common in the patch passed. |
| +1 💚 | unit | 6m 35s | | hadoop-mapreduce-client-core in the patch passed. |
| +1 💚 | unit | 2m 36s | | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 1m 18s | | The patch does not generate ASF License warnings. |
| | | 219m 24s | | |


| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4470/1/artifact/out/Dockerfile |
| GITHUB PR | #4470 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint xmllint |
| uname | Linux 9d713761a210 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / 55714e4 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4470/1/testReport/ |
| Max. process+thread count | 3143 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-tools/hadoop-aws U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4470/1/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@steveloughran merged commit aeb2a2f into apache:branch-3.3 on Jun 21, 2022
deepakdamri pushed a commit to acceldata-io/hadoop that referenced this pull request Jan 21, 2025
…he#4470)
