
Commit 0af4011

ben-roling authored and steveloughran committed
HADOOP-16221. S3Guard: add option to fail operation on metadata write failure.
1 parent 7fbaa7d commit 0af4011

File tree

11 files changed: +312 −24 lines changed

hadoop-common-project/hadoop-common/src/main/resources/core-default.xml

Lines changed: 13 additions & 0 deletions
@@ -1522,6 +1522,19 @@
   </description>
 </property>

+<property>
+  <name>fs.s3a.metadatastore.fail.on.write.error</name>
+  <value>true</value>
+  <description>
+    When true (default), FileSystem write operations generate
+    org.apache.hadoop.fs.s3a.MetadataPersistenceException if the metadata
+    cannot be saved to the metadata store. When false, failures to save to
+    metadata store are logged at ERROR level, but the overall FileSystem
+    write operation succeeds.
+  </description>
+</property>
+
 <property>
   <name>fs.s3a.s3guard.cli.prune.age</name>
   <value>86400000</value>
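
For illustration, a minimal client-side sketch of overriding this new property programmatically. The property name comes from the entry above; the class name and bucket URI are hypothetical.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class MetadataFailureToggle {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Default is true: metadata-store write failures surface as
    // MetadataPersistenceException. Setting false downgrades them to
    // ERROR-level log entries.
    conf.setBoolean("fs.s3a.metadatastore.fail.on.write.error", false);
    try (FileSystem fs =
        FileSystem.get(URI.create("s3a://example-bucket/"), conf)) {
      // writes through fs now log metadata failures instead of throwing
    }
  }
}
```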

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java

Lines changed: 11 additions & 0 deletions
@@ -415,6 +415,17 @@ private Constants() {
   public static final String S3_METADATA_STORE_IMPL =
       "fs.s3a.metadatastore.impl";

+  /**
+   * Whether to fail when there is an error writing to the metadata store.
+   */
+  public static final String FAIL_ON_METADATA_WRITE_ERROR =
+      "fs.s3a.metadatastore.fail.on.write.error";
+
+  /**
+   * Default value ({@value}) for FAIL_ON_METADATA_WRITE_ERROR.
+   */
+  public static final boolean FAIL_ON_METADATA_WRITE_ERROR_DEFAULT = true;
+
   /** Minimum period of time (in milliseconds) to keep metadata (may only be
    * applied when a prune command is manually run).
    */
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/MetadataPersistenceException.java

Lines changed: 40 additions & 0 deletions

@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import org.apache.hadoop.fs.PathIOException;
+
+/**
+ * Indicates the metadata associated with the given Path could not be persisted
+ * to the metadata store (e.g. S3Guard / DynamoDB). When this occurs, the
+ * file itself has been successfully written to S3, but the metadata may be out
+ * of sync. The metadata can be corrected with the "s3guard import" command
+ * provided by {@link org.apache.hadoop.fs.s3a.s3guard.S3GuardTool}.
+ */
+public class MetadataPersistenceException extends PathIOException {
+
+  /**
+   * Constructs a MetadataPersistenceException.
+   * @param path path of the affected file
+   * @param cause cause of the issue
+   */
+  public MetadataPersistenceException(String path, Throwable cause) {
+    super(path, cause);
+  }
+}
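
A sketch of how a caller might react to this exception, on the assumption stated in the javadoc above: the object is already durable in S3 when it surfaces. The path used here is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.MetadataPersistenceException;

public class HandleMetadataFailure {
  public static void main(String[] args) throws IOException {
    Path path = new Path("s3a://example-bucket/data.csv"); // hypothetical
    FileSystem fs = path.getFileSystem(new Configuration());
    try (OutputStream out = fs.create(path, true)) {
      out.write("hello".getBytes());
    } catch (MetadataPersistenceException e) {
      // The bytes reached S3; only the S3Guard entry is missing or stale.
      // Record the path for a later "hadoop s3guard import" run rather
      // than treating the write as lost.
      System.err.println("Metadata not persisted for " + path + ": " + e);
    }
  }
}
```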

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Retries.java

Lines changed: 16 additions & 5 deletions
@@ -26,11 +26,22 @@
 import org.apache.hadoop.classification.InterfaceStability;

 /**
- * Declaration of retry policy for documentation only.
- * This is purely for visibility in source and is currently package-scoped.
- * Compare with {@link org.apache.hadoop.io.retry.AtMostOnce}
- * and {@link org.apache.hadoop.io.retry.Idempotent}; these are real
- * markers used by Hadoop RPC.
+ * <p>
+ * Annotations to inform the caller of an annotated method whether
+ * the method performs retries and/or exception translation internally.
+ * Callers should use this information to inform their own decisions about
+ * performing retries or exception translation when calling the method. For
+ * example, if a method is annotated {@code RetryTranslated}, the caller
+ * MUST NOT perform another layer of retries. Similarly, the caller shouldn't
+ * perform another layer of exception translation.
+ * </p>
+ * <p>
+ * Declaration for documentation only.
+ * This is purely for visibility in source and is currently package-scoped.
+ * Compare with {@link org.apache.hadoop.io.retry.AtMostOnce}
+ * and {@link org.apache.hadoop.io.retry.Idempotent}; these are real
+ * markers used by Hadoop RPC.
+ * </p>
 */
 @InterfaceAudience.Private
 @InterfaceStability.Unstable
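
To show how these markers read at a call site, a hypothetical method inside the org.apache.hadoop.fs.s3a package (the javadoc above says the annotations are package-scoped, so that is where such code would live). The class, method, and key are invented.

```java
package org.apache.hadoop.fs.s3a;

import java.io.IOException;

// Hypothetical user of the markers above: the annotation promises that
// retries and AmazonClientException-to-IOException translation already
// happen inside the method.
class ExampleStoreClient {

  @Retries.RetryTranslated
  long getObjectLength(String key) throws IOException {
    // would delegate to an invoker that retries and translates internally
    return 0;
  }

  void caller() throws IOException {
    // Per the contract, call directly; do NOT wrap this in another retry
    // loop or exception translator.
    long len = getObjectLength("some/key");
  }
}
```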

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java

Lines changed: 27 additions & 8 deletions
@@ -204,6 +204,7 @@ public class S3AFileSystem extends FileSystem implements StreamCapabilities,
       LoggerFactory.getLogger("org.apache.hadoop.fs.s3a.S3AFileSystem.Progress");
   private LocalDirAllocator directoryAllocator;
   private CannedAccessControlList cannedACL;
+  private boolean failOnMetadataWriteError;

   /**
    * This must never be null; until initialized it just declares that there
@@ -306,6 +307,9 @@ public void initialize(URI name, Configuration originalConf)
         onRetry);
     writeHelper = new WriteOperationHelper(this, getConf());

+    failOnMetadataWriteError = conf.getBoolean(FAIL_ON_METADATA_WRITE_ERROR,
+        FAIL_ON_METADATA_WRITE_ERROR_DEFAULT);
+
     maxKeys = intOption(conf, MAX_PAGING_KEYS, DEFAULT_MAX_PAGING_KEYS, 1);
     listing = new Listing(this);
     partSize = getMultipartSizeProperty(conf,
@@ -1784,10 +1788,13 @@ public UploadInfo putObject(PutObjectRequest putObjectRequest) {
    * @param putObjectRequest the request
    * @return the upload initiated
    * @throws AmazonClientException on problems
+   * @throws MetadataPersistenceException if metadata about the write could
+   * not be saved to the metadata store and
+   * fs.s3a.metadatastore.fail.on.write.error=true
    */
-  @Retries.OnceRaw("For PUT; post-PUT actions are RetriesExceptionsSwallowed")
+  @Retries.OnceRaw("For PUT; post-PUT actions are RetryTranslated")
   PutObjectResult putObjectDirect(PutObjectRequest putObjectRequest)
-      throws AmazonClientException {
+      throws AmazonClientException, MetadataPersistenceException {
     long len = getPutRequestLength(putObjectRequest);
     LOG.debug("PUT {} bytes to {}", len, putObjectRequest.getKey());
     incrementPutStartStatistics(len);
@@ -2710,11 +2717,14 @@ private void innerCopyFromLocalFile(boolean delSrc, boolean overwrite,
    * @param progress optional progress callback
    * @return the upload result
    * @throws InterruptedIOException if the blocking was interrupted.
+   * @throws MetadataPersistenceException if metadata about the write could
+   * not be saved to the metadata store and
+   * fs.s3a.metadatastore.fail.on.write.error=true
    */
-  @Retries.OnceRaw("For PUT; post-PUT actions are RetriesExceptionsSwallowed")
+  @Retries.OnceRaw("For PUT; post-PUT actions are RetryTranslated")
   UploadResult executePut(PutObjectRequest putObjectRequest,
       Progressable progress)
-      throws InterruptedIOException {
+      throws InterruptedIOException, MetadataPersistenceException {
     String key = putObjectRequest.getKey();
     UploadInfo info = putObject(putObjectRequest);
     Upload upload = info.getUpload();
@@ -3034,10 +3044,15 @@ private Optional<SSECustomerKey> generateSSECustomerKey() {
    * </ol>
    * @param key key written to
    * @param length total length of file written
+   * @throws MetadataPersistenceException if metadata about the write could
+   * not be saved to the metadata store and
+   * fs.s3a.metadatastore.fail.on.write.error=true
    */
   @InterfaceAudience.Private
-  @Retries.RetryExceptionsSwallowed
-  void finishedWrite(String key, long length) {
+  @Retries.RetryTranslated("Except if failOnMetadataWriteError=false, in which"
+      + " case RetryExceptionsSwallowed")
+  void finishedWrite(String key, long length)
+      throws MetadataPersistenceException {
     LOG.debug("Finished write to {}, len {}", key, length);
     Path p = keyToQualifiedPath(key);
     Preconditions.checkArgument(length >= 0, "content length is negative");
@@ -3053,8 +3068,12 @@ void finishedWrite(String key, long length) {
         S3Guard.putAndReturn(metadataStore, status, instrumentation);
       }
     } catch (IOException e) {
-      LOG.error("S3Guard: Error updating MetadataStore for write to {}:",
-          key, e);
+      if (failOnMetadataWriteError) {
+        throw new MetadataPersistenceException(p.toString(), e);
+      } else {
+        LOG.error("S3Guard: Error updating MetadataStore for write to {}",
+            p, e);
+      }
       instrumentation.errorIgnored();
     }
   }

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java

Lines changed: 4 additions & 0 deletions
@@ -172,6 +172,10 @@ protected Map<Class<? extends Exception>, RetryPolicy> createExceptionMap() {
     policyMap.put(FileNotFoundException.class, fail);
     policyMap.put(InvalidRequestException.class, fail);

+    // metadata stores should do retries internally when it makes sense
+    // so there is no point doing another layer of retries after that
+    policyMap.put(MetadataPersistenceException.class, fail);
+
     // once the file has changed, trying again is not going to help
     policyMap.put(RemoteFileChangedException.class, fail);
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java

Lines changed: 8 additions & 9 deletions
@@ -247,22 +247,21 @@ private CompleteMultipartUploadResult finalizeMultipartUpload(
       throw new IOException(
           "No upload parts in multipart upload to " + destKey);
     }
-    return invoker.retry("Completing multipart commit", destKey,
+    CompleteMultipartUploadResult uploadResult = invoker.retry("Completing multipart commit", destKey,
         true,
         retrying,
         () -> {
           // a copy of the list is required, so that the AWS SDK doesn't
           // attempt to sort an unmodifiable list.
-          CompleteMultipartUploadResult result =
-              owner.getAmazonS3Client().completeMultipartUpload(
-                  new CompleteMultipartUploadRequest(bucket,
-                      destKey,
-                      uploadId,
-                      new ArrayList<>(partETags)));
-          owner.finishedWrite(destKey, length);
-          return result;
+          return owner.getAmazonS3Client().completeMultipartUpload(
+              new CompleteMultipartUploadRequest(bucket,
+                  destKey,
+                  uploadId,
+                  new ArrayList<>(partETags)));
         }
     );
+    owner.finishedWrite(destKey, length);
+    return uploadResult;
   }

   /**
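
The restructuring above pulls finishedWrite() out of the retried lambda, so that a MetadataPersistenceException raised by the metadata store (which retries internally) is not retried again by the multipart-commit loop. A self-contained sketch of that pattern, with invented names and a toy retry helper:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch: only the remote call sits inside the retry loop;
// finishedWrite(), which retries internally and may now throw, runs
// exactly once after the loop succeeds.
public class RetryThenCommit {

  interface Store { String complete(String key) throws IOException; }
  interface Metadata { void finishedWrite(String key) throws IOException; }

  static <T> T retry(int attempts, Callable<T> op) throws IOException {
    IOException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return op.call();
      } catch (IOException e) {
        last = e;                      // transient failure: try again
      } catch (Exception e) {
        throw new IOException(e);
      }
    }
    throw last;
  }

  static String upload(Store store, Metadata md, String key)
      throws IOException {
    String etag = retry(3, () -> store.complete(key)); // retried
    md.finishedWrite(key);  // single attempt; its failure is not re-retried
    return etag;
  }
}
```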

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java

Lines changed: 2 additions & 0 deletions
@@ -29,6 +29,7 @@
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.Retries.RetryTranslated;

 /**
  * {@code MetadataStore} defines the set of operations that any metadata store
@@ -165,6 +166,7 @@ void move(Collection<Path> pathsToDelete,
    * @param meta the metadata to save
    * @throws IOException if there is an error
    */
+  @RetryTranslated
   void put(PathMetadata meta) throws IOException;

   /**

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java

Lines changed: 2 additions & 0 deletions
@@ -40,6 +40,7 @@
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.Retries.RetryTranslated;
 import org.apache.hadoop.fs.s3a.S3AFileStatus;
 import org.apache.hadoop.fs.s3a.S3AInstrumentation;
 import org.apache.hadoop.fs.s3a.Tristate;
@@ -144,6 +145,7 @@ static Class<? extends MetadataStore> getMetadataStoreClass(
    * @return The same status as passed in
    * @throws IOException if metadata store update failed
    */
+  @RetryTranslated
   public static S3AFileStatus putAndReturn(MetadataStore ms,
       S3AFileStatus status,
       S3AInstrumentation instrumentation) throws IOException {

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md

Lines changed: 53 additions & 2 deletions
@@ -98,7 +98,10 @@ This offers no metadata storage, and effectively disables S3Guard.

 More settings will may be added in the future.
 Currently the only Metadata Store-independent setting, besides the
-implementation class above, is the *allow authoritative* flag.
+implementation class above, are the *allow authoritative* and *fail-on-error*
+flags.
+
+#### Allow Authoritative

 The _authoritative_ expression in S3Guard is present in two different layers, for
 two different reasons:
@@ -183,6 +186,46 @@ removed on `S3AFileSystem` level.
 </property>
 ```

+#### Fail on Error
+
+By default, S3AFileSystem write operations will fail when updates to
+S3Guard metadata fail. S3AFileSystem first writes the file to S3 and then
+updates the metadata in S3Guard. If the metadata write fails,
+`MetadataPersistenceException` is thrown. The file in S3 **is not** rolled
+back.
+
+If the write operation cannot be programmatically retried, the S3Guard metadata
+for the given file can be corrected with a command like the following:
+
+```bash
+hadoop s3guard import [-meta URI] s3a://my-bucket/file-with-bad-metadata
+```
+
+Programmatic retries of the original operation would require overwrite=true.
+Suppose the original operation was FileSystem.create(myFile, overwrite=false).
+If this operation failed with `MetadataPersistenceException` a repeat of the
+same operation would result in `FileAlreadyExistsException` since the original
+operation successfully created the file in S3 and only failed in writing the
+metadata to S3Guard.
+
+Metadata update failures can be downgraded to ERROR logging instead of exception
+by setting the following configuration:
+
+```xml
+<property>
+  <name>fs.s3a.metadatastore.fail.on.write.error</name>
+  <value>false</value>
+</property>
+```
+
+Setting this false is dangerous as it could result in the type of issue S3Guard
+is designed to avoid. For example, a reader may see an inconsistent listing
+after a recent write since S3Guard may not contain metadata about the recently
+written file due to a metadata write error.
+
+As with the default setting, the new/updated file is still in S3 and **is not**
+rolled back. The S3Guard metadata is likely to be out of sync.
+
 ### 3. Configure the Metadata Store.

 Here are the `DynamoDBMetadataStore` settings. Other Metadata Store
@@ -1152,7 +1195,7 @@ java.io.IOException: Invalid region specified "iceland-2":

 The region specified in `fs.s3a.s3guard.ddb.region` is invalid.

-# "Neither ReadCapacityUnits nor WriteCapacityUnits can be specified when BillingMode is PAY_PER_REQUEST"
+### "Neither ReadCapacityUnits nor WriteCapacityUnits can be specified when BillingMode is PAY_PER_REQUEST"

 ```
 ValidationException; One or more parameter values were invalid:
@@ -1164,6 +1207,14 @@ ValidationException; One or more parameter values were invalid:
 On-Demand DynamoDB tables do not have any fixed capacity -it is an error
 to try to change it with the `set-capacity` command.

+### `MetadataPersistenceException`
+
+A filesystem write operation failed to persist metadata to S3Guard. The file was
+successfully written to S3 and now the S3Guard metadata is likely to be out of
+sync.
+
+See [Fail on Error](#fail-on-error) for more detail.
+
 ## Other Topics

 For details on how to test S3Guard, see [Testing S3Guard](./testing.html#s3guard)
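
As a worked companion to the "programmatic retries ... would require overwrite=true" paragraph in the Fail on Error section above, a hypothetical recovery loop; the path and payload are invented:

```java
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.MetadataPersistenceException;

public class CreateWithMetadataRetry {
  public static void main(String[] args) throws IOException {
    Path path = new Path("s3a://example-bucket/report.txt"); // hypothetical
    FileSystem fs = path.getFileSystem(new Configuration());
    boolean overwrite = false;  // first attempt: do not clobber existing data
    for (int attempt = 0; attempt < 2; attempt++) {
      try (OutputStream out = fs.create(path, overwrite)) {
        out.write("payload".getBytes());
        return;  // S3 write and S3Guard update both succeeded
      } catch (MetadataPersistenceException e) {
        // The file is already in S3, so repeating create(path, false) would
        // hit FileAlreadyExistsException; the retry must overwrite.
        overwrite = true;
      }
    }
    // still failing: leave the object in place and repair later with
    // "hadoop s3guard import"
  }
}
```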
