
Commit 9fcf45e

Merge branch 'apache:trunk' into hadoop-trunk
2 parents 93d18d5 + dae33cf

239 files changed: +1495 additions, -36742 deletions

NOTICE-binary

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ available from http://www.digip.org/jansson/.
 
 
 AWS SDK for Java
-Copyright 2010-2014 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+Copyright 2010-2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
 
 This product includes software developed by
 Amazon Technologies, Inc (http://www.amazon.com/).

hadoop-common-project/hadoop-common/src/main/resources/core-default.xml

Lines changed: 3 additions & 186 deletions
@@ -1229,7 +1229,7 @@
     com.amazonaws.auth.AWSCredentialsProvider.
 
     When S3A delegation tokens are not enabled, this list will be used
-    to directly authenticate with S3 and DynamoDB services.
+    to directly authenticate with S3 and other AWS services.
     When S3A Delegation tokens are enabled, depending upon the delegation
     token binding it may be used
     to communicate with the STS endpoint to request session/role
@@ -1686,180 +1686,18 @@
   </description>
 </property>
 
-<property>
-  <name>fs.s3a.metadatastore.authoritative</name>
-  <value>false</value>
-  <description>
-    When true, allow MetadataStore implementations to act as source of
-    truth for getting file status and directory listings. Even if this
-    is set to true, MetadataStore implementations may choose not to
-    return authoritative results. If the configured MetadataStore does
-    not support being authoritative, this setting will have no effect.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.metadatastore.metadata.ttl</name>
-  <value>15m</value>
-  <description>
-    This value sets how long an entry in a MetadataStore is valid.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.metadatastore.impl</name>
-  <value>org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore</value>
-  <description>
-    Fully-qualified name of the class that implements the MetadataStore
-    to be used by s3a. The default class, NullMetadataStore, has no
-    effect: s3a will continue to treat the backing S3 service as the one
-    and only source of truth for file and directory metadata.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.metadatastore.fail.on.write.error</name>
-  <value>true</value>
-  <description>
-    When true (default), FileSystem write operations generate
-    org.apache.hadoop.fs.s3a.MetadataPersistenceException if the metadata
-    cannot be saved to the metadata store. When false, failures to save to
-    the metadata store are logged at ERROR level, but the overall FileSystem
-    write operation succeeds.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.cli.prune.age</name>
-  <value>86400000</value>
-  <description>
-    Default age (in milliseconds) after which to prune metadata from the
-    metadatastore when the prune command is run. Can be overridden on the
-    command-line.
-  </description>
-</property>
-
 <property>
   <name>fs.s3a.impl</name>
   <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
   <description>The implementation class of the S3A Filesystem</description>
 </property>
 
-<property>
-  <name>fs.s3a.s3guard.ddb.region</name>
-  <value></value>
-  <description>
-    AWS DynamoDB region to connect to. An up-to-date list is
-    provided in the AWS Documentation: regions and endpoints. Without this
-    property, S3Guard will operate its table in the associated S3 bucket region.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table</name>
-  <value></value>
-  <description>
-    The DynamoDB table name to operate on. Without this property, the respective
-    S3 bucket name will be used.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.create</name>
-  <value>false</value>
-  <description>
-    If true, the S3A client will create the table if it does not already exist.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.capacity.read</name>
-  <value>0</value>
-  <description>
-    Provisioned throughput requirements for read operations in terms of capacity
-    units for the DynamoDB table. This config value will only be used when
-    creating a new DynamoDB table.
-    If set to 0 (the default), new tables are created with "per-request" capacity.
-    If a positive integer is provided for this and the write capacity, then
-    a table with "provisioned capacity" will be created.
-    You can change the capacity of an existing provisioned-capacity table
-    through the "s3guard set-capacity" command.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.capacity.write</name>
-  <value>0</value>
-  <description>
-    Provisioned throughput requirements for write operations in terms of
-    capacity units for the DynamoDB table.
-    If set to 0 (the default), new tables are created with "per-request" capacity.
-    Refer to the related configuration option fs.s3a.s3guard.ddb.table.capacity.read.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.sse.enabled</name>
-  <value>false</value>
-  <description>
-    Whether server-side encryption (SSE) is enabled or disabled on the table.
-    By default it is disabled, meaning SSE is set to the AWS-owned CMK.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.sse.cmk</name>
-  <value/>
-  <description>
-    The KMS Customer Master Key (CMK) used for the KMS encryption on the table.
-    To specify a CMK, this config value can be its key ID, Amazon Resource Name
-    (ARN), alias name, or alias ARN. Users only need to provide this config if
-    the key is different from the default DynamoDB KMS Master Key, which is
-    alias/aws/dynamodb.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.max.retries</name>
-  <value>9</value>
-  <description>
-    Max retries on throttled/incomplete DynamoDB operations
-    before giving up and throwing an IOException.
-    Each retry is delayed with an exponential
-    backoff timer which starts at 100 milliseconds and approximately
-    doubles each time. The minimum wait before throwing an exception is
-    sum(100, 200, 400, 800, ..., 100*2^(N-1)) == 100 * ((2^N)-1)
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.throttle.retry.interval</name>
-  <value>100ms</value>
-  <description>
-    Initial interval to retry after a request is throttled;
-    the back-off policy is exponential until the number of retries of
-    fs.s3a.s3guard.ddb.max.retries is reached.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.background.sleep</name>
-  <value>25ms</value>
-  <description>
-    Length (in milliseconds) of pause between each batch of deletes when
-    pruning metadata. Prevents prune operations (which can typically be low
-    priority background operations) from overly interfering with other I/O
-    operations.
-  </description>
-</property>
-
 <property>
   <name>fs.s3a.retry.limit</name>
   <value>7</value>
   <description>
     Number of times to retry any repeatable S3 client request on failure,
-    excluding throttling requests and S3Guard inconsistency resolution.
+    excluding throttling requests.
   </description>
 </property>

@@ -1868,7 +1706,7 @@
   <value>500ms</value>
   <description>
     Initial retry interval when retrying operations for any reason other
-    than S3 throttle errors and S3Guard inconsistency resolution.
+    than S3 throttle errors.
   </description>
 </property>

@@ -1891,27 +1729,6 @@
   </description>
 </property>
 
-<property>
-  <name>fs.s3a.s3guard.consistency.retry.limit</name>
-  <value>7</value>
-  <description>
-    Number of times to retry attempts to read/open/copy files when
-    S3Guard believes a specific version of the file to be available,
-    but the S3 request does not find any version of a file, or a different
-    version.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.consistency.retry.interval</name>
-  <value>2s</value>
-  <description>
-    Initial interval between attempts to retry operations while waiting for S3
-    to become consistent with the S3Guard data.
-    An exponential back-off is used here: every failure doubles the delay.
-  </description>
-</property>
-
 <property>
   <name>fs.s3a.committer.name</name>
   <value>file</value>
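
The surviving S3A settings in this file (the credential provider list and the retry options) are ordinary Hadoop configuration keys. A minimal sketch of reading them through the standard Configuration API, assuming hadoop-common is on the classpath; the defaults mirror the values shown in the diff:

    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.conf.Configuration;

    public class S3ARetrySettings {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Credential provider classes tried in order (may be empty, falling
        // back to the built-in default chain).
        String[] providers =
            conf.getTrimmedStrings("fs.s3a.aws.credentials.provider");
        // Retries for repeatable S3 requests, excluding throttling (default 7).
        int retryLimit = conf.getInt("fs.s3a.retry.limit", 7);
        // getTimeDuration parses suffixed values such as "500ms".
        long retryIntervalMs = conf.getTimeDuration(
            "fs.s3a.retry.interval", 500, TimeUnit.MILLISECONDS);
        System.out.println(providers.length + " providers, retries=" + retryLimit
            + ", initial interval=" + retryIntervalMs + "ms");
      }
    }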

hadoop-common-project/hadoop-common/src/site/markdown/AdminCompatibilityGuide.md

Lines changed: 2 additions & 1 deletion
@@ -137,7 +137,8 @@ internal state stores:
 
 * The internal MapReduce state data will remain compatible across minor releases within the same major version to facilitate rolling upgrades while MapReduce workloads execute.
 * HDFS maintains metadata about the data stored in HDFS in a private, internal format that is versioned. In the event of an incompatible change, the store's version number will be incremented. When upgrading an existing cluster, the metadata store will automatically be upgraded if possible. After the metadata store has been upgraded, it is always possible to reverse the upgrade process.
-* The AWS S3A guard keeps a private, internal metadata store that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.
+* The AWS S3A guard kept a private, internal metadata store.
+  Now that the feature has been removed, the store is obsolete and can be deleted.
 * The YARN resource manager keeps a private, internal state store of application and scheduler information that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.
 * The YARN node manager keeps a private, internal state store of application information that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.
 * The YARN federation service keeps a private, internal state store of application and cluster information that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.

hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md

Lines changed: 5 additions & 12 deletions
@@ -477,19 +477,12 @@ rolled back to the older layout.
 
 ##### AWS S3A Guard Metadata
 
-For each operation in the Hadoop S3 client (s3a) that reads or modifies
-file metadata, a shadow copy of that file metadata is stored in a separate
-metadata store, which offers HDFS-like consistency for the metadata, and may
-also provide faster lookups for things like file status or directory listings.
-S3A guard tables are created with a version marker which indicates
-compatibility.
+The S3Guard metastore used to store metadata in DynamoDB tables;
+as such it had to maintain a compatibility strategy.
+Now that S3Guard is removed, the tables are not needed.
 
-###### Policy
-
-The S3A guard metadata schema SHALL be considered
-[Private](./InterfaceClassification.html#Private) and
-[Unstable](./InterfaceClassification.html#Unstable). Any incompatible change
-to the schema MUST result in the version number of the schema being incremented.
+Applications configured to use an S3A metadata store other than
+the "null" store will fail.
 
 ##### YARN Resource Manager State Store
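
For illustration only, the kind of configuration that now fails: the store class named below, org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore, is the one S3Guard shipped before removal, and with the implementation gone, an S3A filesystem configured this way can no longer initialize. A hedged sketch:

    import org.apache.hadoop.conf.Configuration;

    public class ObsoleteS3GuardConfig {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Pointing at any metadata store other than the "null" store is now
        // a fatal misconfiguration; the class named here no longer exists.
        conf.set("fs.s3a.metadatastore.impl",
            "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore");
        // FileSystem.get(URI.create("s3a://bucket/"), conf) would fail here.
      }
    }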

hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md

Lines changed: 1 addition & 5 deletions
@@ -343,7 +343,7 @@ stores pretend that they are a FileSystem, a FileSystem with the same
 features and operations as HDFS. This is &mdash;ultimately&mdash;a pretence:
 they have different characteristics and occasionally the illusion fails.
 
-1. **Consistency**. Object stores are generally *Eventually Consistent*: it
+1. **Consistency**. Object stores may be *Eventually Consistent*: it
 can take time for changes to objects &mdash;creation, deletion and updates&mdash;
 to become visible to all callers. Indeed, there is no guarantee a change is
 immediately visible to the client which just made the change. As an example,
@@ -447,10 +447,6 @@ Object stores have an even vaguer view of time, which can be summarized as
 * The timestamp is likely to be in UTC or the TZ of the object store. If the
   client is in a different timezone, the timestamp of objects may be ahead or
   behind that of the client.
-* Object stores with cached metadata databases (for example: AWS S3 with
-  an in-memory or a DynamoDB metadata store) may have timestamps generated
-  from the local system clock, rather than that of the service.
-  This is an optimization to avoid round-trip calls to the object stores.
 + A file's modification time is often the same as its creation time.
 + The `FileSystem.setTimes()` operation to set file timestamps *may* be ignored.
 * `FileSystem.chmod()` may update modification times (example: Azure `wasb://`).
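
To make the consistency caveat concrete, here is a hedged sketch of a create-then-stat sequence against a hypothetical eventually consistent store; the bucket and path are illustrative:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ConsistencyProbe {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
            URI.create("s3a://example-bucket/"), new Configuration());
        Path p = new Path("/data/part-0000");
        try (FSDataOutputStream out = fs.create(p)) {
          out.writeBytes("payload");
        }
        // On an eventually consistent store this may briefly throw
        // FileNotFoundException even though the create above succeeded;
        // robust clients retry with backoff rather than failing outright.
        FileStatus st = fs.getFileStatus(p);
        System.out.println(st.getPath() + " len=" + st.getLen());
      }
    }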

hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java

Lines changed: 10 additions & 3 deletions
@@ -1664,9 +1664,16 @@ public BlocksWithLocations getBlocksWithLocations(final DatanodeID datanode,
     if(numBlocks == 0) {
       return new BlocksWithLocations(new BlockWithLocations[0]);
     }
+
+    // skip stale storage
+    DatanodeStorageInfo[] storageInfos = Arrays
+        .stream(node.getStorageInfos())
+        .filter(s -> !s.areBlockContentsStale())
+        .toArray(DatanodeStorageInfo[]::new);
+
     // starting from a random block
     int startBlock = ThreadLocalRandom.current().nextInt(numBlocks);
-    Iterator<BlockInfo> iter = node.getBlockIterator(startBlock);
+    Iterator<BlockInfo> iter = node.getBlockIterator(startBlock, storageInfos);
     List<BlockWithLocations> results = new ArrayList<BlockWithLocations>();
     List<BlockInfo> pending = new ArrayList<BlockInfo>();
     long totalSize = 0;
@@ -1685,8 +1692,8 @@ public BlocksWithLocations getBlocksWithLocations(final DatanodeID datanode,
       }
     }
     if(totalSize<size) {
-      iter = node.getBlockIterator(); // start from the beginning
-      for(int i=0; i<startBlock&&totalSize<size; i++) {
+      iter = node.getBlockIterator(0, storageInfos); // start from the beginning
+      for(int i = 0; i < startBlock && totalSize < size && iter.hasNext(); i++) {
         curBlock = iter.next();
         if(!curBlock.isComplete()) continue;
         if (curBlock.getNumBytes() < minBlockSize) {
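
The patch does two related things: it filters stale storages out before building the block iterator, and it guards the wrap-around loop with iter.hasNext() so the now possibly shorter iterator cannot throw NoSuchElementException. A self-contained sketch of that pattern, with plain strings standing in for BlockInfo and DatanodeStorageInfo:

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;

    public class WrapAroundIterationSketch {
      public static void main(String[] args) {
        // Stand-ins for DatanodeStorageInfo: drop "stale" entries first,
        // mirroring the areBlockContentsStale() check in the patch.
        String[] storages = {"s0:fresh", "s1:stale", "s2:fresh"};
        String[] live = Arrays.stream(storages)
            .filter(s -> !s.endsWith("stale"))
            .toArray(String[]::new);
        System.out.println("iterating storages: " + Arrays.toString(live));

        // Stand-ins for BlockInfo: start at a random block, then wrap around.
        List<String> blocks = List.of("b0", "b1", "b2", "b3");
        int startBlock = ThreadLocalRandom.current().nextInt(blocks.size());

        Iterator<String> iter = blocks.listIterator(startBlock);
        while (iter.hasNext()) {
          System.out.println("first pass: " + iter.next());
        }

        iter = blocks.iterator(); // start from the beginning
        // The hasNext() guard matters: once stale storages are filtered out,
        // the iterator can hold fewer blocks than startBlock implies, and an
        // unguarded loop would throw NoSuchElementException.
        for (int i = 0; i < startBlock && iter.hasNext(); i++) {
          System.out.println("wrap-around: " + iter.next());
        }
      }
    }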

hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java

Lines changed: 11 additions & 0 deletions
@@ -647,6 +647,17 @@ Iterator<BlockInfo> getBlockIterator(final int startBlock) {
     return new BlockIterator(startBlock, getStorageInfos());
   }
 
+  /**
+   * Get an iterator over the given storages, starting from the specified block.
+   *
+   * @param startBlock index of the block from which to start iterating
+   * @param storageInfos the storages to iterate over
+   */
+  Iterator<BlockInfo> getBlockIterator(
+      final int startBlock, final DatanodeStorageInfo[] storageInfos) {
+    return new BlockIterator(startBlock, storageInfos);
+  }
+
   @VisibleForTesting
   public void incrementPendingReplicationWithoutTargets() {
     pendingReplicationWithoutTargets++;

hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java

Lines changed: 5 additions & 0 deletions
@@ -168,6 +168,11 @@ public boolean areBlockContentsStale() {
     return blockContentsStale;
   }
 
+  @VisibleForTesting
+  public void setBlockContentsStale(boolean value) {
+    blockContentsStale = value;
+  }
+
   void markStaleAfterFailover() {
     heartbeatedSinceFailover = false;
     blockContentsStale = true;
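
The @VisibleForTesting setter exists so tests can put a storage into the stale state directly instead of simulating a failover. A hypothetical JUnit-style fragment; the datanode/blockManager fixtures and the concrete sizes are assumptions, not part of this commit:

    // Hypothetical test fragment, not from the patch:
    for (DatanodeStorageInfo storage : datanode.getStorageInfos()) {
      storage.setBlockContentsStale(true);  // the new test hook
    }
    // With every storage stale, the balancer RPC should return no blocks.
    BlocksWithLocations blocks =
        blockManager.getBlocksWithLocations(datanode, 1024L, 0L);
    assertEquals(0, blocks.getBlocks().length);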

hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java

Lines changed: 1 addition & 0 deletions
@@ -1271,6 +1271,7 @@ boolean isOutliersReportDue(long curTime) {
 
   void forceFullBlockReportNow() {
     forceFullBlockReport.set(true);
+    resetBlockReportTime = true;
   }
 
   /**
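
The added line pairs the force flag with a schedule reset: without it, a forced full block report (the path exercised by `hdfs dfsadmin -triggerBlockReport`) can still sit out the previously scheduled, randomized report time. A minimal, self-contained sketch of the pattern; the names are illustrative, not the actual BPServiceActor fields:

    import java.util.concurrent.atomic.AtomicBoolean;

    public class ForcedReportSketch {
      private final AtomicBoolean forceFullReport = new AtomicBoolean(false);
      private volatile boolean resetReportTime = false;

      void forceNow() {
        forceFullReport.set(true);
        // Without also resetting the schedule, the heartbeat loop honours
        // the old randomized deadline and the "forced" report is delayed.
        resetReportTime = true;
      }

      boolean reportDue(long nextReportTimeMs, long nowMs) {
        return forceFullReport.get() || resetReportTime || nowMs >= nextReportTimeMs;
      }

      public static void main(String[] args) {
        ForcedReportSketch s = new ForcedReportSketch();
        s.forceNow();
        System.out.println("due immediately: " + s.reportDue(Long.MAX_VALUE, 0L));
      }
    }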

hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java

Lines changed: 11 additions & 1 deletion
@@ -113,7 +113,7 @@ public class DNConf {
   final long outliersReportIntervalMs;
   final long ibrInterval;
   final long initialBlockReportDelayMs;
-  final long cacheReportInterval;
+  volatile long cacheReportInterval;
   final long datanodeSlowIoWarningThresholdMs;
 
   final String minimumNameNodeVersion;
@@ -484,4 +484,14 @@ void setBlockReportInterval(long intervalMs) {
   public long getBlockReportInterval() {
     return blockReportInterval;
   }
+
+  void setCacheReportInterval(long intervalMs) {
+    Preconditions.checkArgument(intervalMs > 0,
+        "dfs.cachereport.intervalMsec should be larger than 0");
+    cacheReportInterval = intervalMs;
+  }
+
+  public long getCacheReportInterval() {
+    return cacheReportInterval;
+  }
 }
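
Making cacheReportInterval volatile with a validated setter follows the shape DNConf already uses for the block report interval, and suggests dfs.cachereport.intervalMsec is being made runtime-reconfigurable (the caller is not part of this diff). A minimal sketch of the publish-safely pattern, using Guava's Preconditions as the patch does:

    import com.google.common.base.Preconditions;

    public class ReconfigurableIntervalSketch {
      // A single volatile write publishes the new value to reader threads
      // without locking; readers always see a fully written long.
      private volatile long cacheReportIntervalMs;

      ReconfigurableIntervalSketch(long initialMs) {
        this.cacheReportIntervalMs = initialMs;
      }

      void setCacheReportInterval(long intervalMs) {
        // Validate before publishing, as the new DNConf setter does.
        Preconditions.checkArgument(intervalMs > 0,
            "dfs.cachereport.intervalMsec should be larger than 0");
        cacheReportIntervalMs = intervalMs;
      }

      long getCacheReportInterval() {
        return cacheReportIntervalMs;
      }

      public static void main(String[] args) {
        ReconfigurableIntervalSketch c = new ReconfigurableIntervalSketch(10_000L);
        c.setCacheReportInterval(5_000L);
        System.out.println("interval=" + c.getCacheReportInterval() + "ms");
      }
    }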
