Commit 68484ed

HADOOP-19696. hadoop binary distribution to move cloud connectors to hadoop common/lib (#7980)
This moves all the cloud connector libraries to common/lib.

There are specific build options to control which libraries to include.
The hadoop-* JARs of the modules are included, but dependencies are only
included when the build-time options specify it.

Available package profiles:
  hadoop-aliyun-package
  hadoop-aws-package
  hadoop-azure-datalake-package
  hadoop-cos-package
  hadoop-gcp-package
  hadoop-huaweicloud-package
  hadoop-tos-package

This means that by default the AWS bundle.jar is no longer included in the
distribution: to add it, users must drop their chosen version of the SDK
into share/hadoop/common/lib.

Anyone building their own release now has a choice of which connectors to
bundle. The ASF ones will stay fairly lean to reduce the CVE attack surface
as well as keep package size under control.

Contributed by Steve Loughran
1 parent c5b2c34 commit 68484ed
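
Reviewer note: the commit message says users who want the AWS SDK must drop their chosen bundle JAR into share/hadoop/common/lib. A minimal sketch of that step follows. The function name `install_sdk_bundle` and both of its arguments are hypothetical and not part of Hadoop; only the target directory comes from the commit message.

```shell
# Sketch only: copy a user-chosen SDK bundle JAR into the common/lib
# directory of an unpacked Hadoop distribution.
# install_sdk_bundle is a hypothetical helper, not a Hadoop command.

install_sdk_bundle() {
  local hadoop_home="$1"   # root of the unpacked distribution (assumed layout)
  local bundle_jar="$2"    # path to the SDK bundle JAR picked by the user
  local lib_dir="${hadoop_home}/share/hadoop/common/lib"

  if [ ! -f "${bundle_jar}" ]; then
    echo "bundle JAR not found: ${bundle_jar}" >&2
    return 1
  fi
  if [ ! -d "${lib_dir}" ]; then
    echo "missing directory: ${lib_dir}" >&2
    return 1
  fi
  # -p preserves mode and timestamps of the JAR
  cp -p "${bundle_jar}" "${lib_dir}/"
}
```

The helper refuses to create the lib directory itself, so a mistyped distribution root fails loudly instead of producing a stray directory tree.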

16 files changed: 626 additions, 106 deletions
BUILDING.txt

Lines changed: 51 additions & 0 deletions
@@ -390,6 +390,57 @@ Create a local staging version of the website (in /tmp/hadoop-site)
 
 Note that the site needs to be built in a second pass after other artifacts.
 
+----------------------------------------------------------------------------------
+Including Cloud Connector Dependencies in Distributions:
+
+Hadoop distributions include the hadoop modules needed to work with data and services
+on cloud infrastructure.
+
+However, dependencies are omitted for all cloud connectors except hadoop-azure
+(abfs:// and wasb://) and possibly hadoop-gcp (gs://) and hadoop-tos (tos://).
+For the latter two modules, it depends on shading options.
+
+For hadoop-aws the AWS SDK bundle.jar is omitted, but everything else is included.
+
+Excluding the extra binaries:
+* Keeps release artifact size below the limit of the ASF distribution network.
+* Reduces download and size overhead in docker usage.
+* Reduces the CVE attack surface and audit-related complaints about those same CVEs.
+* Reduces the risk of classpath conflict.
+
+To produce a build with the specific desired dependencies, the build must be executed
+with the relevant profile of ${module}-package alongside the -Pdist profile.
+
+For example, to build with the hadoop-aws and hadoop-azure-datalake dependencies,
+run:
+
+mvn package -Pdist -DskipTests -Dhadoop-aws-package -Dhadoop-azure-datalake-package
+
+Available package profiles:
+hadoop-aliyun-package
+hadoop-aws-package
+hadoop-azure-datalake-package
+hadoop-cos-package
+hadoop-gcp-package
+hadoop-huaweicloud-package
+hadoop-tos-package
+
+To build a complete distribution with all cloud dependencies included:
+
+mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true \
+-Dhadoop-aliyun-package \
+-Dhadoop-aws-package \
+-Dhadoop-azure-datalake-package \
+-Dhadoop-cos-package \
+-Dhadoop-gcp-package \
+-Dhadoop-huaweicloud-package \
+-Dhadoop-tos-package
+
+The resulting tar file will be too large to be distributable through ASF infrastructure.
+
+The hadoop-gcp and hadoop-tos artifacts include their dependencies as shaded
+artifacts unless the distribution is built with -DskipShade.
+
 ----------------------------------------------------------------------------------
 Installing Hadoop
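
Reviewer note: the BUILDING.txt text pairs each connector with a `${module}-package` property passed alongside -Pdist. The sketch below assembles that command line from a list of module names and prints it without running Maven. The function name `dist_build_command` is hypothetical; the flags and the seven module names are the ones listed in BUILDING.txt.

```shell
# Sketch only: print (do not run) the mvn command for a distribution that
# bundles the dependencies of the named connector modules.
# dist_build_command is a hypothetical helper, not part of the Hadoop build.

dist_build_command() {
  local cmd="mvn package -Pdist -DskipTests"
  local module
  for module in "$@"; do
    case "${module}" in
      hadoop-aliyun|hadoop-aws|hadoop-azure-datalake|hadoop-cos|hadoop-gcp|hadoop-huaweicloud|hadoop-tos)
        # each listed module maps to a -D<module>-package property
        cmd="${cmd} -D${module}-package"
        ;;
      *)
        echo "no package profile for: ${module}" >&2
        return 1
        ;;
    esac
  done
  echo "${cmd}"
}
```

Calling `dist_build_command hadoop-aws hadoop-azure-datalake` prints the two-connector example command from BUILDING.txt. A name such as hadoop-azure is rejected because the list of package profiles has no entry for it.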

LICENSE-binary

Lines changed: 25 additions & 3 deletions
@@ -203,18 +203,23 @@
 
 --------------------------------------------------------------------------------
 This project bundles some components that are also licensed under the Apache
-License Version 2.0:
+License Version 2.0.
+Note: some of the listed artifacts may not be included in a given build of the binary
+distribution; it depends on the build options. This list is intended
+to cover everything that may be included:
 
 
 hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/nvd3-1.8.5.* (css and js files)
 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/checker/AbstractFuture.java
 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/checker/TimeoutFuture.java
 
 ch.qos.reload4j:reload4j:1.2.22
+com.aliyun:aliyun-java-core:0.2.11-beta
 com.aliyun:aliyun-java-sdk-core:4.5.10
 com.aliyun:aliyun-java-sdk-kms:2.11.0
 com.aliyun:aliyun-java-sdk-ram:3.1.0
 com.aliyun:aliyun-java-sdk-sts:3.0.0
+com.aliyun:java-trace-api:0.2.11-beta
 com.aliyun.oss:aliyun-sdk-oss:3.13.2
 com.cedarsoftware:java-util:1.9.0
 com.cedarsoftware:json-io:2.5.1
@@ -266,8 +271,13 @@ com.google.http-client:google-http-client-jackson2:1.46.3
 com.google.http-client:google-http-client:1.46.3
 com.google.j2objc:j2objc-annotations:3.0.0
 com.google.oauth-client:google-oauth-client:1.37.0
-com.microsoft.azure:azure-storage:7.0.0
+com.huaweicloud:esdk-obs-java:3.20.4.2
+com.jamesmurty.utils:java-xmlbuilder-1.2.jar
+com.microsoft.azure:azure-storage:7.0.1
 com.nimbusds:nimbus-jose-jwt:10.4
+com.squareup.okhttp3:okhttp:jar:3.14.2
+com.squareup.okio:okio:jar:1.17.2
+com.volcengine:ve-tos-java-sdk-hadoop:2.8.9.jar
 com.zaxxer:HikariCP:4.0.3
 commons-beanutils:commons-beanutils:1.9.4
 commons-cli:commons-cli:1.9.0
@@ -344,6 +354,9 @@ io.opentelemetry:opentelemetry-sdk-logs:1.47.0
 io.opentelemetry:opentelemetry-sdk-metrics:1.47.0
 io.opentelemetry:opentelemetry-sdk-trace:1.47.0
 io.opentelemetry.semconv:opentelemetry-semconv:1.29.0-alpha
+io.opentracing:opentracing-api:0.33.0.jar
+io.opentracing:opentracing-noop:0.33.0.jar
+io.opentracing:opentracing-util:0.33.0.jar
 io.reactivex:rxjava:1.3.8
 io.reactivex:rxjava-string:1.1.1
 io.reactivex:rxnetty:0.4.20
@@ -371,6 +384,8 @@ org.apache.htrace:htrace-core:3.1.0-incubating
 org.apache.htrace:htrace-core4:4.1.0-incubating
 org.apache.httpcomponents:httpclient:4.5.13
 org.apache.httpcomponents:httpcore:4.4.13
+org.apache.httpcomponents.client5:httpclient5:5.5
+org.apache.httpcomponents.core5:httpcore5:5.5
 org.apache.kafka:kafka-clients:3.9.0
 org.apache.kerby:kerb-admin:2.0.3
 org.apache.kerby:kerb-client:2.0.3
@@ -494,6 +509,7 @@ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanage
 bootstrap v3.3.6
 broccoli-asset-rev v2.4.2
 broccoli-funnel v1.0.1
+cos_api-bundle-5.6.19.jar
 datatables v1.11.5
 em-helpers v0.5.13
 em-table v0.1.6
@@ -539,7 +555,7 @@ com.microsoft.azure:azure-cosmosdb:2.4.5
 com.microsoft.azure:azure-cosmosdb-commons:2.4.5
 com.microsoft.azure:azure-cosmosdb-direct:2.4.5
 com.microsoft.azure:azure-cosmosdb-gateway:2.4.5
-com.microsoft.azure:azure-data-lake-store-sdk:2.3.3
+com.microsoft.azure:azure-data-lake-store-sdk:2.3.9
 com.microsoft.azure:azure-keyvault-core:1.0.0
 com.microsoft.sqlserver:mssql-jdbc:6.2.1.jre7
 org.bouncycastle:bcpkix-jdk18on:1.82
@@ -550,6 +566,7 @@ org.codehaus.mojo:animal-sniffer-annotations:1.24
 org.jruby.jcodings:jcodings:1.0.13
 org.jruby.joni:joni:2.1.2
 org.ojalgo:ojalgo:43.0
+org.reactivestreams:reactive-streams:1.0.3.jar
 org.slf4j:jul-to-slf4j:1.7.36
 org.slf4j:slf4j-api:1.7.36
 org.slf4j:slf4j-reload4j:1.7.36
@@ -620,3 +637,8 @@ Public Domain
 -------------
 
 aopalliance:aopalliance:1.0
+
+Dom4J license
+-------------
+
+org.dom4j:dom4j:2.1.4.jar

dev-support/bin/dist-layout-stitching

Lines changed: 4 additions & 0 deletions
@@ -130,6 +130,10 @@ run cp -p "${ROOT}/README.txt" .
 run copy "${ROOT}/hadoop-common-project/hadoop-common/target/hadoop-common-${VERSION}" .
 run copy "${ROOT}/hadoop-common-project/hadoop-nfs/target/hadoop-nfs-${VERSION}" .
 run copy "${ROOT}/hadoop-common-project/hadoop-registry/target/hadoop-registry-${VERSION}" .
+
+# cloud connectors go into common
+run copy "${ROOT}/hadoop-cloud-storage-project/hadoop-cloud-storage-dist/target/hadoop-cloud-storage-dist-${VERSION}" .
+
 run copy "${ROOT}/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-${VERSION}" .
 run copy "${ROOT}/hadoop-hdfs-project/hadoop-hdfs-nfs/target/hadoop-hdfs-nfs-${VERSION}" .
 run copy "${ROOT}/hadoop-hdfs-project/hadoop-hdfs-client/target/hadoop-hdfs-client-${VERSION}" .
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+      https://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.3"
+          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.3 https://maven.apache.org/xsd/assembly-1.1.3.xsd">
+  <id>hadoop-cloud-storage</id>
+  <formats>
+    <format>dir</format>
+  </formats>
+  <includeBaseDirectory>false</includeBaseDirectory>
+
+  <!--
+  This is executed in directory hadoop-cloud-storage-project/hadoop-cloud-storage-dist
+  All paths must be relative to that.
+  -->
+  <fileSets>
+    <fileSet>
+      <directory>../../hadoop-tools/hadoop-aws/src/main/bin</directory>
+      <outputDirectory>/bin</outputDirectory>
+      <fileMode>0755</fileMode>
+    </fileSet>
+    <fileSet>
+      <directory>../../hadoop-tools/hadoop-aws/src/main/shellprofile.d</directory>
+      <includes>
+        <include>*</include>
+      </includes>
+      <outputDirectory>/libexec/shellprofile.d</outputDirectory>
+      <fileMode>0755</fileMode>
+    </fileSet>
+  </fileSets>
+
+  <dependencySets>
+    <dependencySet>
+      <outputDirectory>/share/hadoop/common/lib</outputDirectory>
+      <unpack>false</unpack>
+      <scope>runtime</scope>
+      <useProjectArtifact>false</useProjectArtifact>
+      <!-- Stop some needless artifact propagation -->
+      <excludes>
+        <exclude>org.apache.hadoop:hadoop-annotations</exclude>
+        <exclude>org.apache.hadoop.thirdparty:hadoop-shaded-guava</exclude>
+      </excludes>
+    </dependencySet>
+  </dependencySets>
+</assembly>
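
Reviewer note: the descriptor above writes to three output locations: /bin, /libexec/shellprofile.d and /share/hadoop/common/lib. A quick way to sanity-check an assembled directory is sketched below. The helper `check_cloud_storage_layout` is hypothetical, and it assumes the caller passes the root of the assembled `dir` output; whether every directory is actually produced depends on the source directories and dependencies present in a given build.

```shell
# Sketch only: report which of the descriptor's output directories are
# missing under a given root. Returns 0 when all three are present.
# check_cloud_storage_layout is a hypothetical helper, not part of Hadoop.

check_cloud_storage_layout() {
  local root="$1"
  local missing=0
  local dir
  for dir in bin libexec/shellprofile.d share/hadoop/common/lib; do
    if [ ! -d "${root}/${dir}" ]; then
      echo "missing: ${dir}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

The function only inspects directories, so it is safe to run against a partially built tree.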

hadoop-assemblies/src/main/resources/assemblies/hadoop-src.xml

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@
         <exclude>**/file:/**</exclude>
         <exclude>**/SecurityAuth.audit*</exclude>
         <exclude>patchprocess/**</exclude>
+        <exclude>**/auth-keys.xml</exclude>
       </excludes>
     </fileSet>
   </fileSets>
