@hanishakoneru
Contributor

Increasing the number of client retries to 100 and adding a sleep of 500 ms between retries.
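
For illustration only, and not the code in this patch: a minimal sketch of a bounded retry loop with a fixed sleep between attempts, which is the behaviour the description asks for. The class name `FixedSleepRetry` and the method `callWithRetries` are hypothetical; the values 100 and 500 ms from the description would be passed in as `maxRetries` and `sleepMillis`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: retry an operation a bounded number of times with a fixed sleep. */
public class FixedSleepRetry {

  /**
   * Runs op once and, on failure, retries it up to maxRetries more times,
   * sleeping sleepMillis between consecutive attempts.
   * Rethrows the last failure once the retries are exhausted.
   */
  public static <T> T callWithRetries(Callable<T> op, int maxRetries, long sleepMillis)
      throws Exception {
    if (maxRetries < 0) {
      throw new IllegalArgumentException("maxRetries must be >= 0");
    }
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        // Do not retry an interrupted call; restore the flag and give up.
        Thread.currentThread().interrupt();
        throw e;
      } catch (Exception e) {
        last = e;
        if (attempt < maxRetries) {
          // Fixed pause so a failing caller does not retry in a tight loop.
          TimeUnit.MILLISECONDS.sleep(sleepMillis);
        }
      }
    }
    throw last;
  }
}
```

With the values from the description, a call would look like `FixedSleepRetry.callWithRetries(op, 100, 500L)`.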

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 27 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ trunk Compile Tests _ |
| 0 | mvndep | 71 | Maven dependency ordering for branch |
| +1 | mvninstall | 1039 | trunk passed |
| +1 | compile | 954 | trunk passed |
| +1 | checkstyle | 141 | trunk passed |
| +1 | mvnsite | 148 | trunk passed |
| +1 | shadedclient | 1026 | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 188 | trunk passed |
| +1 | javadoc | 119 | trunk passed |
| | | | _ Patch Compile Tests _ |
| 0 | mvndep | 61 | Maven dependency ordering for patch |
| +1 | mvninstall | 109 | the patch passed |
| +1 | compile | 928 | the patch passed |
| +1 | javac | 928 | the patch passed |
| +1 | checkstyle | 139 | the patch passed |
| +1 | mvnsite | 124 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | xml | 1 | The patch has no ill-formed XML file. |
| +1 | shadedclient | 649 | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 198 | the patch passed |
| +1 | javadoc | 106 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 80 | common in the patch passed. |
| +1 | unit | 39 | client in the patch passed. |
| +1 | unit | 34 | objectstore-service in the patch passed. |
| +1 | asflicense | 45 | The patch does not generate ASF License warnings. |
| | | 6231 | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-753/1/artifact/out/Dockerfile |
| GITHUB PR | #753 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux f69225dbaa0b 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / ef97a20 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/1/testReport/ |
| Max. process+thread count | 449 (vs. ulimit of 5500) |
| modules | C: hadoop-hdds/common hadoop-ozone/client hadoop-ozone/objectstore-service U: . |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/1/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |

This message was automatically generated.

@bshashikant
Contributor

bshashikant commented Apr 22, 2019

Thanks Hanisha for updating the patch. The patch adds a retry interval between retries of a client write request. However, this may not address the problem holistically: the client can still be allocated blocks from a container, and the container might be closed by the time the actual write reaches the datanode. The problem is aggravated when there is a large number of preallocated blocks but the client write happens much later.

Contributor

@arp7 arp7 left a comment

Hi @bshashikant, the retry is before going to the OM. This is to add a bit of throttling to protect the OM, since clients that are failing could keep spamming the OM in a tight loop.

@mukul1987 did suggest reducing the interval a bit. We could reduce it to 100ms.

What do you think?

Contributor

Don't hardcode the unit (ms). We can specify the unit with the config key. See Configuration#getTimeDuration.
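
As a rough illustration of this suggestion: Hadoop's `Configuration#getTimeDuration(String name, long defaultValue, TimeUnit unit)` lets the configured value carry its own unit suffix (for example `500ms` or `1s`) and returns it converted to the unit the caller asks for. The sketch below is a standard-library stand-in that only approximates that parsing so it can run without Hadoop on the classpath; the class `DurationSketch` and the method `parseDuration` are hypothetical and are not part of Hadoop or of this patch.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for unit-aware duration parsing. */
public class DurationSketch {

  // Two-letter suffixes come first so that "ms" is never matched as "s".
  private static final String[] SUFFIXES = {"ns", "us", "ms", "s", "m", "h", "d"};
  private static final TimeUnit[] UNITS = {
      TimeUnit.NANOSECONDS, TimeUnit.MICROSECONDS, TimeUnit.MILLISECONDS,
      TimeUnit.SECONDS, TimeUnit.MINUTES, TimeUnit.HOURS, TimeUnit.DAYS};

  /**
   * Parses values such as "500ms", "1s" or "100" and converts them to returnUnit.
   * A value without a suffix is interpreted in defaultUnit.
   */
  public static long parseDuration(String value, TimeUnit defaultUnit, TimeUnit returnUnit) {
    String v = value.trim().toLowerCase(Locale.ROOT);
    for (int i = 0; i < SUFFIXES.length; i++) {
      if (v.endsWith(SUFFIXES[i])) {
        String number = v.substring(0, v.length() - SUFFIXES[i].length()).trim();
        return returnUnit.convert(Long.parseLong(number), UNITS[i]);
      }
    }
    return returnUnit.convert(Long.parseLong(v), defaultUnit);
  }
}
```

With this shape the default can be written as `"500ms"` or `"100ms"` in the configuration file, and the client code asks for milliseconds without baking the unit into the key name.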

@bshashikant
Contributor

Thanks @arp7. The retry interval should be lower by default. Other than ContainerCloseExceptions, the Ozone client also retries in cases such as a request timing out or a leader election failing to complete, where Ratis itself already retries for around 10 minutes. This retryInterval will then be added to the total time between two successive calls to the OM in case of a failure. This is in the actual write path and will affect write throughput considerably.
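
To put rough numbers on this concern, here is a small sketch of the extra wait that the retry interval alone can add in the worst case. It assumes the limit of 100 retries from the PR description and that every retry sleeps for the full interval; it deliberately ignores the time Ratis spends on its own retries. The class `RetryDelayBudget` and its method are hypothetical.

```java
import java.time.Duration;

/** Hypothetical helper: upper bound on the sleep time added by a fixed retry interval. */
public class RetryDelayBudget {

  /** Total sleep if all maxRetries retries are used, each waiting the full interval. */
  public static Duration worstCaseAddedDelay(int maxRetries, Duration interval) {
    return interval.multipliedBy(maxRetries);
  }

  public static void main(String[] args) {
    // 100 retries at 500 ms each: 50 seconds of added sleep.
    System.out.println(worstCaseAddedDelay(100, Duration.ofMillis(500)).getSeconds());
    // 100 retries at 100 ms each: 10 seconds of added sleep.
    System.out.println(worstCaseAddedDelay(100, Duration.ofMillis(100)).getSeconds());
  }
}
```

Under these assumptions, lowering the interval from 500 ms to 100 ms cuts the worst-case added sleep from 50 seconds to 10 seconds.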

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 26 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ trunk Compile Tests _ |
| 0 | mvndep | 71 | Maven dependency ordering for branch |
| +1 | mvninstall | 1136 | trunk passed |
| +1 | compile | 966 | trunk passed |
| +1 | checkstyle | 143 | trunk passed |
| +1 | mvnsite | 185 | trunk passed |
| +1 | shadedclient | 1109 | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 231 | trunk passed |
| +1 | javadoc | 148 | trunk passed |
| | | | _ Patch Compile Tests _ |
| 0 | mvndep | 22 | Maven dependency ordering for patch |
| -1 | mvninstall | 18 | client in the patch failed. |
| -1 | mvninstall | 18 | objectstore-service in the patch failed. |
| +1 | compile | 894 | the patch passed |
| +1 | javac | 894 | the patch passed |
| +1 | checkstyle | 184 | the patch passed |
| -1 | mvnsite | 33 | objectstore-service in the patch failed. |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | xml | 1 | The patch has no ill-formed XML file. |
| +1 | shadedclient | 727 | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 27 | objectstore-service in the patch failed. |
| +1 | javadoc | 147 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 88 | common in the patch passed. |
| +1 | unit | 34 | client in the patch passed. |
| +1 | unit | 45 | common in the patch passed. |
| -1 | unit | 30 | objectstore-service in the patch failed. |
| +1 | asflicense | 42 | The patch does not generate ASF License warnings. |
| | | 6734 | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-753/2/artifact/out/Dockerfile |
| GITHUB PR | #753 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 9bb732da53b9 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / a703dae |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/2/artifact/out/patch-mvninstall-hadoop-ozone_client.txt |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/2/artifact/out/patch-mvninstall-hadoop-ozone_objectstore-service.txt |
| mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/2/artifact/out/patch-mvnsite-hadoop-ozone_objectstore-service.txt |
| findbugs | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/2/artifact/out/patch-findbugs-hadoop-ozone_objectstore-service.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/2/artifact/out/patch-unit-hadoop-ozone_objectstore-service.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/2/testReport/ |
| Max. process+thread count | 336 (vs. ulimit of 5500) |
| modules | C: hadoop-hdds/common hadoop-ozone/client hadoop-ozone/common hadoop-ozone/objectstore-service U: . |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/2/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |

This message was automatically generated.

Contributor

@arp7 arp7 left a comment

+1 with minor comment.

Contributor

Nitpick: By default

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 26 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ trunk Compile Tests _ |
| 0 | mvndep | 63 | Maven dependency ordering for branch |
| +1 | mvninstall | 1029 | trunk passed |
| +1 | compile | 983 | trunk passed |
| +1 | checkstyle | 139 | trunk passed |
| +1 | mvnsite | 192 | trunk passed |
| +1 | shadedclient | 1058 | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 253 | trunk passed |
| +1 | javadoc | 171 | trunk passed |
| | | | _ Patch Compile Tests _ |
| 0 | mvndep | 23 | Maven dependency ordering for patch |
| -1 | mvninstall | 20 | client in the patch failed. |
| -1 | mvninstall | 20 | objectstore-service in the patch failed. |
| +1 | compile | 918 | the patch passed |
| +1 | javac | 918 | the patch passed |
| +1 | checkstyle | 137 | the patch passed |
| -1 | mvnsite | 37 | objectstore-service in the patch failed. |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | xml | 2 | The patch has no ill-formed XML file. |
| +1 | shadedclient | 679 | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 38 | objectstore-service in the patch failed. |
| +1 | javadoc | 169 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 82 | common in the patch passed. |
| +1 | unit | 40 | client in the patch passed. |
| +1 | unit | 48 | common in the patch passed. |
| -1 | unit | 37 | objectstore-service in the patch failed. |
| +1 | asflicense | 50 | The patch does not generate ASF License warnings. |
| | | 6612 | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-753/3/artifact/out/Dockerfile |
| GITHUB PR | #753 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 16f011cc5d3b 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / b5dcf64 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/3/artifact/out/patch-mvninstall-hadoop-ozone_client.txt |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/3/artifact/out/patch-mvninstall-hadoop-ozone_objectstore-service.txt |
| mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/3/artifact/out/patch-mvnsite-hadoop-ozone_objectstore-service.txt |
| findbugs | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/3/artifact/out/patch-findbugs-hadoop-ozone_objectstore-service.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/3/artifact/out/patch-unit-hadoop-ozone_objectstore-service.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/3/testReport/ |
| Max. process+thread count | 445 (vs. ulimit of 5500) |
| modules | C: hadoop-hdds/common hadoop-ozone/client hadoop-ozone/common hadoop-ozone/objectstore-service U: . |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-753/3/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |

This message was automatically generated.

@hanishakoneru
Contributor Author

The test failures in CI are not related to this PR. Will merge the PR. Thank you @arp7, @bshashikant and @mukul1987 for the reviews.

@hanishakoneru hanishakoneru merged commit 3758270 into apache:trunk Apr 26, 2019