HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID. #2399
Conversation
Tested OK with -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep -Ds3guard -Ddynamo -Dfs.s3a.directory.marker.audit=true -Dscale; now retesting with delete and unguarded.
FYI @dongjoon-hyun
Thank you for pinging me, @steveloughran. cc @sunchao, too.
Related to this, I'm going to include work related to SPARK-33230: if setupJob finds no "spark.sql.sources.writeJobUUID", a UUID will be set; staging committers can use this to be confident they are getting separate dirs for jobs even when job IDs are the same.
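A minimal sketch of the setupJob behaviour just described, not the committer's actual code: the property key comes from the comment above, while the class and the pickJobUuid helper are illustrative names only.

```java
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;

public final class JobUuidSketch {

  /** Key Spark uses to pass a per-query UUID down to committers. */
  public static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";

  private JobUuidSketch() {
  }

  /**
   * Return the Spark-supplied UUID if present, otherwise self-generate one,
   * so two jobs with identical job IDs still get distinct directories.
   */
  public static String pickJobUuid(Configuration conf) {
    String uuid = conf.getTrimmed(SPARK_WRITE_UUID, "");
    if (uuid.isEmpty()) {
      // no UUID passed down by the query engine: generate one for this job
      uuid = UUID.randomUUID().toString();
    }
    return uuid;
  }
}
```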
FYI, I merged SPARK-33230, @steveloughran.
@dongjoon-hyun thanks... doing a bit more on this, as the more tests I write, the more corner cases surface. Think I'm in control now.
d3dc4ed to 293c53f
New s3a test failing, and something in hadoop-common related to TestLdap. Filing a JIRA there.
#2427 HADOOP-17340. TestLdapGroupsMapping failing - string mismatch in exception validation, to cover the hadoop-common failure.
293c53f to 6cd4d83
Test run with: -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep -Ds3guard -Ddynamo -Dfs.s3a.directory.marker.audit=true -Dscale. My next big bit of work is to do tests in Spark itself.
Running integration tests on this with Spark + patch and the 3.4.0-SNAPSHOT builds. Ignoring compilation issues with spark trunk, hadoop-trunk, scala versions and scalatest, I'm running tests in cloud-integration; that is: Spark is setting the UUID and the committer is picking it up and using it as appropriate.
Thank you for sharing, @steveloughran!
Some more detail for the watchers from my testing (hadoop-trunk + CDP spark 2.4). I could not get spark master and hadoop trunk to build together this week.
I'm not going near that, other than to add a para in troubleshooting.md saying "you're in classloader hell". Will need to be testing against spark master before worrying about WTF is going on there.
I'm also now worried that if anyone does >1 job with the same dest dir and overwrite=true, then there's a risk of hitting the same duplicate-app-attempt-ID race condition. It's tempting just to do something ambitious like using a random number to generate a timestamp for the cluster launch, or some random(year-month-day) + seconds-of-day, so that this problem goes away almost completely.
Latest test run against S3 London, no S3Guard; markers deleted (classic config). Everything, even the flaky read() tests, passed! -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=delete -Dfs.s3a.directory.marker.audit=true -Dscale
b526e95 to 1f14f64
This is a big patch. To follow all the changes, I started from the doc and tests; after that I went back to how the code was changing. I guess I can follow it now. I'm +1 on this PR to the best of my knowledge, though a second review would be very much appreciated.
Thanks Steve! GREAT WORK.
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
nit: s/upload/upload ID/
I was thinking of consistent log keywords, so that for any retry log we can search for "upload ID" or "commit ID".
This is the S3 multipart upload ID, so I'll use "upload ID" for it; it's also used in BlockOutputStream.
nit: We can make it clear in the javadoc here that the default value is false. Same for the generate.uuid below.
+1. Adding two new constants and referring to them in the production code.
Is this SPARK app ID name constant still used, or did I miss something? 🤔
Cut it. This was a very old property passed down by Spark.
😄 👍
This is only for Java serialization, obviously. It's to make sure anyone (me) who might pass them around in Spark RDDs won't create serialization problems. FWIW I use the JSON format in those cloud committer tests, primarily to verify committer name correctness.
nit: just getUUID() without this.?
Relic of wrapping/pulling up the old code. Fixed. Also clarified the UUID javadocs, now that SPARK-33402 is generating more unique job IDs.
nit: s/./,
There is an empty line between the table header and this first row; I see the GitHub online viewer is not blessing this. Maybe we just remove LoC 552.
Done. Also reviewed both tables and removed the columns about which committer supports which option; they are now split into common and staging.
nit: maybe call this conf2, like jobConf2, to make it a bit clearer.
done
nit: add multipartInitiatedInWrite to the log message? Same below
done
mehakmeet left a comment
LGTM, just some small nits.
nit: Would this be "If disabled"? Also, what is the property we are talking about that is enabled or not? If it is FS_S3A_COMMITTER_GENERATE_UUID, then I think we should mention it here too.
added the extra details
nit: typo in "random"
Thanks. Will go through comments and apply before merging.
…t IDs
* ITests for the uuid generate/require logic
* Pendingset and _SUCCESS files add jobId field
* And on job commit, the jobID of each pendingset file is validated
The validation will detect and fail if a job with a different ID
has got a .pendingset file into the directory used by the current
job.
The _SUCCESS file is to aid auditing/testing
Change-Id: I07a6a2d00ac5598c8f961aebbc9c9fdbb70ab51a
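For readers following along, here is a hedged illustration of the jobId validation described in the commit message above. This is not the patch's code: verifySameJob and the class name are made-up, and the only grounded behaviour is "fail if a pendingset file's jobId differs from the committing job's ID".

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;

public final class JobIdValidationSketch {

  private JobIdValidationSketch() {
  }

  /**
   * Fail if a pendingset file in the job's directory was written by a
   * different job (e.g. a concurrent job with the same app attempt ID).
   */
  public static void verifySameJob(String currentJobId, String fileJobId, Path file)
      throws IOException {
    if (fileJobId != null && !fileJobId.isEmpty()
        && !fileJobId.equals(currentJobId)) {
      throw new IOException("Pending set file " + file
          + " belongs to job " + fileJobId
          + " but is being committed by job " + currentJobId);
    }
  }
}
```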
commit f2128bd1bf94de36ee1e3ed542de0a0e839307e3
Author: Steve Loughran <[email protected]>
Date: Tue Oct 27 17:05:12 2020 +0000
HADOOP-17318. Support concurrent S3A jobs with conflicting app attempt IDs.
Have a job UUID everywhere, which for spark must be passed in or
self-generated; for MR it will be the yarn app attempt.
The task attempts are still used under this where task work needs
to be differentiated.
Examples
* temp dir for staging
* magic path for magic committers
* HDFS dir for staging summary info
Change-Id: I17c641280d916ea1ad4ce4407215d07e488954af
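A hedged illustration of how a per-job UUID can be folded into the working paths listed in that commit message so that two jobs with the same app attempt ID never share directories. The exact path shapes and helper names here are my own examples, not the committer's real layout.

```java
import org.apache.hadoop.fs.Path;

public final class JobPathsSketch {

  private JobPathsSketch() {
  }

  /** Staging committer: cluster-FS temp dir made unique to this job. */
  public static Path stagingTempDir(Path tmpRoot, String user, String jobUuid) {
    // illustrative layout only: <tmpRoot>/<user>/staging-uploads/<uuid>
    return new Path(tmpRoot, user + "/staging-uploads/" + jobUuid);
  }

  /** Magic committer: job-specific directory under the destination's magic path. */
  public static Path magicJobDir(Path destDir, String jobUuid) {
    // illustrative layout only: <dest>/__magic/job-<uuid>
    return new Path(destDir, "__magic/job-" + jobUuid);
  }
}
```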
commit 0ae79187d6c821eef13872e4c384948729b9c72d
Author: Steve Loughran <[email protected]>
Date: Tue Oct 20 21:36:36 2020 +0100
HADOOP-17318. Support concurrent S3A commit jobs slightly better.
All S3A committers can have purging pending deletes on job commit
disabled. (new option; old one deprecated).
More logging of what is going on with individual file load/upload/commit
(including with duration)
Test of concurrent jobs designed to trigger the specific failure conditions
of Staging (job2 commit after task1 commit) and Magic (job2 commit before
task2 commit)
Change-Id: If560c7541c287dc6d4c2f1af395c93b838495139
Change-Id: I2374b904bfb65399e08084e6c2b78237ec1603cd
rdblue left a comment
Looks good to me. I looked through just the UUID-related parts.
Committers don't cancel just their own pending uploads?
In taskAbort, yes. JobAbort/cleanup is where things are more trouble, because the job doesn't know which specific task attempts have uploaded.
With the staging committer, there are no files uploaded until task commit. Tasks which fail before that moment don't have any pending uploads to cancel.
With the magic committer, because the files are written directly to S3, there is more risk of pending uploads collecting.
I'm not sure about Spark here, but on MR, when a task is considered to have failed, abortTask is called in the AM to abort that specific task; for the magic committer, the task's set of .pending files is determined by listing the task attempt dir, and those uploads are cancelled. If that operation is called reliably, only the current upload is pending.
Of course, if an entire job fails: no cleanup at all.
The best thing to do is simply to tell everyone to have a scheduled cleanup.
FWIW, the most leakage I see in the real world is actually from incomplete S3ABlockOutputStream writes as, again, they accrue bills. Everyone needs a lifecycle rule to delete old ones. The sole exception there is one which our QE team used and which (unknown to them) I'd use for testing the scalability of the "hadoop s3guard uploads" command - how well does it work when there are many, many incomplete uploads, can it still delete them all, etc. If they had a rule then it'd screw up my test runs.
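To make the taskAbort flow above concrete, a hedged sketch under my own assumptions, not the magic committer's actual code: it lists the .pending files under the task attempt dir and cancels each one. abortOneUpload is a placeholder; the real committer deserializes the pending commit data and aborts the recorded S3 multipart upload rather than just deleting the marker file.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class TaskAbortSketch {

  private TaskAbortSketch() {
  }

  /** Cancel every pending upload recorded under this task attempt's directory. */
  public static void abortPendingUploads(FileSystem fs, Path taskAttemptDir)
      throws IOException {
    FileStatus[] pending = fs.listStatus(taskAttemptDir,
        path -> path.getName().endsWith(".pending"));
    for (FileStatus st : pending) {
      abortOneUpload(fs, st.getPath());
    }
  }

  /** Placeholder for the committer's real abort of one recorded upload. */
  private static void abortOneUpload(FileSystem fs, Path pendingFile)
      throws IOException {
    // The real code aborts the S3 multipart upload described in the file;
    // here we only remove the marker file to keep the sketch self-contained.
    fs.delete(pendingFile, false);
  }
}
```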
Other places use this. as a prefix when setting fields. I find that helpful when reading to know that an instance field is being set, vs a local variable.
Makes sense in the constructor. Done
A lot of these changes don't seem related to the UUID change. I think it would be easier to review if only necessary changes were in this PR.
The IDE was whining about calling an override point in the constructor, so I turned it off at the same time. sorry
It seems odd to set the Spark property. Does anything else use this?
I was just trying to be rigorous; will roll back. While I'm there, I think I'll add the source attribute - I can then probe for it in the tests. I'm already saving it in the _SUCCESS file.
This is incorrect if this is self-generated but this method is called after setupJob. I think that method shouldn't set SPARK_WRITE_UUID.
Removed it there.
Why would a committer not want to generate a unique ID and use the job ID instead?
MR jobs where their updated config doesn't get through to the tasks: use a self-generated ID and things won't work. And as they know that the app ID is unique on that YARN cluster, that's all they need.
For my Spark integration tests I turned off auto-generate and enabled the fail-on-job-ID option, to verify that all operations (RDD, dataframe, dataset, SQL) were passing down the spark.sql option. Helped me find where it wasn't being set.
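A hedged sketch of the two test configurations just described. The property names fs.s3a.committer.generate.uuid and fs.s3a.committer.require.uuid are my reading of the options discussed in this thread, so treat them as assumptions rather than confirmed keys.

```java
import org.apache.hadoop.conf.Configuration;

public final class CommitterUuidOptionsSketch {

  private CommitterUuidOptionsSketch() {
  }

  /** Spark-style integration test: insist the query engine supplies the UUID. */
  public static Configuration requireEngineUuid(Configuration conf) {
    conf.setBoolean("fs.s3a.committer.generate.uuid", false);  // assumed key
    conf.setBoolean("fs.s3a.committer.require.uuid", true);    // assumed key
    return conf;
  }

  /** MR-style setup: fall back to the (cluster-unique) YARN app attempt ID. */
  public static Configuration allowJobIdFallback(Configuration conf) {
    conf.setBoolean("fs.s3a.committer.generate.uuid", false);  // assumed key
    conf.setBoolean("fs.s3a.committer.require.uuid", false);   // assumed key
    return conf;
  }
}
```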
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md
Yes, I did a bit more than was needed, because I had to also let more than one magic committer commit work side-by-side (all that active-upload warning), and the IDE was trying to keep me in check too, on a piece of code which hasn't been revisited for a while. While I had the files open in the IDE, I moved to passing FileStatus down to line up with the changes in #2168 - if you open a file through the JsonSerializer by passing in the FileStatus, that will be handed off to the FileSystem's implementation of openFile(status.path).withFileStatus(status), and so be used by the S3A FS to skip the initial HEAD request. That means if we are reading 1000 .pendingset files in S3A, we eliminate 1000 HEAD calls, which should have tangible benefits for committers using S3 as the place to keep those files.
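A hedged sketch of the openFile()/withFileStatus() pattern referred to above, assuming the Hadoop 3.3+ FutureDataInputStreamBuilder API; the wrapper class and method names are illustrative, not the patch's code.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;

public final class OpenWithStatusSketch {

  private OpenWithStatusSketch() {
  }

  /**
   * Open a file whose FileStatus is already known; S3A can then reuse the
   * status instead of issuing a HEAD request before the first read.
   */
  public static FSDataInputStream open(FileSystem fs, FileStatus status)
      throws IOException {
    try {
      return fs.openFile(status.getPath())
          .withFileStatus(status)    // lets the store skip its HEAD probe
          .build()                   // returns a CompletableFuture
          .get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted opening " + status.getPath(), e);
    } catch (ExecutionException e) {
      throw new IOException("failed to open " + status.getPath(), e.getCause());
    }
  }
}
```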
Also added config option fs.s3a.committer.uuid.source, which is set in the jobconf during job setup and used in tests to verify the source of the ID. Change-Id: I9eb44113bc6afd5826c8a51bdf16fb220f8fb111
1f14f64 to 9ba3f4e
Pushed up an iteration with all the feedback addressed. Testing: S3 London, unguarded, markers=keep.
Change-Id: Ic7106a43738a14eba59f81d892b7856e6596ad65
…t ID. (#2399) See also [SPARK-33402]: Jobs launched in same second have duplicate MapReduce JobIDs Contributed by Steve Loughran. Change-Id: Iae65333cddc84692997aae5d902ad8765b45772a
Merged to trunk, not yet 3.3. See #2473 for the test failure caused in code from a different PR, which this patch goes nowhere near.
Thank you, @steveloughran and guys!
…e app attempt ID. (apache#2399) See also [SPARK-33402]: Jobs launched in same second have duplicate MapReduce JobIDs Contributed by Steve Loughran. Change-Id: Iae65333cddc84692997aae5d902ad8765b45772a
This PR addresses concurrency issues in the S3A committers, mostly related to application attempt IDs not always being unique in Spark, but also some other issues.
Magic: concurrent jobs with the same attempt ID would conflict there
Staging: app attempt ID used for
All S3A committers can have purging pending deletes on job commit
disabled. (new option; old one deprecated).
New test of concurrent jobs designed to trigger the specific failure conditions
of Staging (job2 commit after task1 commit) and Magic (job2 commit before
task2 commit); it also verifies that paths for output are different.