[SPARK-23366] Improve hot reading path in ReadAheadInputStream #20555

juliuszsompolski · 2018-02-09T03:59:59Z

What changes were proposed in this pull request?

ReadAheadInputStream was introduced in #18317 to optimize reading spill files from disk.
However, profiles suggest that the hot path of reading small amounts of data (like readInt) is inefficient: it involves taking locks and multiple checks.

Optimize locking: the lock is not needed when simply accessing the active buffer. Lock only when swapping buffers, triggering an async read, or getting information about the async state.

Optimize the short path for single-byte reads, which are used e.g. by the Java library's DataInputStream.readInt.

The async reader used to call "read" only once on the underlying stream, which never filled the buffer when it was wrapping an LZ4BlockInputStream. A buffer returned unfilled caused the async reader to be triggered to fill the read ahead buffer on each call, because the reader would always see the active buffer below the refill threshold. The async reader now keeps reading until the read ahead buffer is full.

However, always filling the whole buffer could increase latency, so also add an AtomicBoolean flag that lets the async reader return early if there is a reader waiting for data.
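The fill loop with early exit can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class `FillLoopSketch`, the method `fill`, and the helper `chunked` (a stand-in for a stream that returns short reads, as a block-decompressing stream may) are hypothetical names. Only the idea of an AtomicBoolean "reader is waiting" flag comes from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicBoolean;

final class FillLoopSketch {
  /**
   * Reads from `in` into `buf` until the buffer is full, the stream ends,
   * or `isWaiting` reports that a consumer is blocked waiting for data.
   * Always attempts at least one read.
   * Returns the number of bytes read, or -1 if the stream was already at EOF.
   */
  static int fill(InputStream in, byte[] buf, AtomicBoolean isWaiting) throws IOException {
    int off = 0;
    do {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        // EOF: report what we have, or -1 if nothing was read at all.
        return off == 0 ? -1 : off;
      }
      off += n;
    } while (off < buf.length && !isWaiting.get());
    return off;
  }

  /** Hypothetical test helper: a stream that returns at most `chunk` bytes per read call. */
  static InputStream chunked(byte[] data, int chunk) {
    return new ByteArrayInputStream(data) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, chunk));
      }
    };
  }
}
```

With the flag unset, `fill` keeps calling read until the buffer is full or the stream ends. With the flag set, it returns after the first read, so a blocked consumer gets data sooner at the cost of a partially filled buffer.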

Remove readAheadThresholdInBytes and instead immediately trigger an async read when switching the buffers. This simplifies the code paths, especially the hot one, which then only has to check whether data is available in the active buffer, without worrying about whether it needs to retrigger an async read. It seems to have a positive effect on performance.
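As a rough illustration of the resulting shape of the hot path, here is a deliberately simplified sketch. It is not the actual ReadAheadInputStream: the class name `HotPathSketch` and the helpers `readSlow` and `refill` are hypothetical, and the background reader thread is replaced by a synchronous refill to keep the example short. What it aims to show is that the single-byte read only checks the active buffer, while the lock, the buffer swap, and the next read ahead happen on the slow path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantLock;

final class HotPathSketch extends InputStream {
  private final InputStream underlying;
  private final ReentrantLock stateChangeLock = new ReentrantLock();
  private ByteBuffer activeBuffer;
  private ByteBuffer readAheadBuffer;
  private boolean endOfStream = false;

  /** bufferSize must be positive. */
  HotPathSketch(InputStream underlying, int bufferSize) throws IOException {
    this.underlying = underlying;
    this.activeBuffer = ByteBuffer.allocate(bufferSize);
    this.readAheadBuffer = ByteBuffer.allocate(bufferSize);
    activeBuffer.flip();      // the active buffer starts out empty
    refill(readAheadBuffer);  // first read ahead
  }

  @Override
  public int read() throws IOException {
    // Hot path: no lock, just a check of the buffer owned by the reader.
    if (activeBuffer.hasRemaining()) {
      return activeBuffer.get() & 0xFF;
    }
    return readSlow();
  }

  private int readSlow() throws IOException {
    stateChangeLock.lock();
    try {
      // A real implementation would first wait here for the async read to finish.
      if (!readAheadBuffer.hasRemaining()) {
        return -1;  // nothing was read ahead: end of stream
      }
      // Swap the read ahead buffer in place of the empty active buffer.
      ByteBuffer tmp = activeBuffer;
      activeBuffer = readAheadBuffer;
      readAheadBuffer = tmp;
      // Start the next read ahead right away; no threshold check is needed.
      refill(readAheadBuffer);
      return activeBuffer.get() & 0xFF;
    } finally {
      stateChangeLock.unlock();
    }
  }

  /** Synchronous stand-in for the async reader (an assumption of this sketch). */
  private void refill(ByteBuffer buf) throws IOException {
    buf.clear();
    int n = endOfStream ? -1 : underlying.read(buf.array(), 0, buf.capacity());
    if (n < 0) {
      endOfStream = true;
      n = 0;
    }
    buf.limit(n);
  }
}
```

Wrapping this stream in a DataInputStream and calling readInt issues four single-byte reads, most of which take only the lock-free branch.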

How was this patch tested?

It was noticed as a regression in some workloads after upgrading to Spark 2.3.

It was particularly visible on TPCDS Q95 running on instances with fast disk (i3 AWS instances).
Running with profiling:
* Spark 2.2 - 5.2-5.3 minutes 9.5% in LZ4BlockInputStream.read
* Spark 2.3 - 6.4-6.6 minutes 31.1% in ReadAheadInputStream.read
* Spark 2.3 + fix - 5.3-5.4 minutes 13.3% in ReadAheadInputStream.read - very slightly slower, practically within noise.

We didn't see other regressions, and many workloads in general seem to be faster with Spark 2.3 (we did not investigate whether that is thanks to the async reader or unrelated).

juliuszsompolski · 2018-02-09T04:01:46Z

cc @kiszk @sitalkedia @zsxwing

SparkQA · 2018-02-09T07:29:20Z

Test build #87243 has finished for PR 20555 at commit b26ffce.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2018-02-10T06:48:44Z

core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java

      while (readInProgress) {
+        isWaiting.set(true);
        asyncReadComplete.await();
+        isWaiting.set(false);


What happens if await() throws an exception? Is it ok not to update isWaiting?

Good catch, I added isWaiting.set(false) to the finally branch.
Actually, since the whole implementation assumes that there is only one reader, I removed the while() loop, since there is no other reader to race with us to trigger another read.

In practice I think not updating isWaiting would have been benign, as after the exception the query would be going down with an InterruptedException, or else anyone upstream handling that exception would most probably declare the stream unusable afterwards anyway.
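A minimal sketch of resetting the flag in a finally branch, assuming a lock and condition shaped like the ones in the diff hunk above. The class `WaitFlagSketch` and its accessor are hypothetical; the field names mirror the ones quoted in this thread. The wait is kept in a loop here, which is the conventional form for condition waits.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class WaitFlagSketch {
  private final ReentrantLock stateChangeLock = new ReentrantLock();
  private final Condition asyncReadComplete = stateChangeLock.newCondition();
  private final AtomicBoolean isWaiting = new AtomicBoolean(false);
  private boolean readInProgress = true;

  void waitForAsyncReadComplete() throws InterruptedException {
    stateChangeLock.lock();
    isWaiting.set(true);
    try {
      while (readInProgress) {
        // await() may throw InterruptedException; the finally block still runs.
        asyncReadComplete.await();
      }
    } finally {
      // Reset the flag and release the lock on both normal and exceptional exit.
      isWaiting.set(false);
      stateChangeLock.unlock();
    }
  }

  boolean isWaiting() {
    return isWaiting.get();
  }
}
```

If the calling thread is interrupted, await() throws, and the flag is still cleared before the exception propagates.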

kiszk · 2018-02-10T07:07:02Z

core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java

+            return -1;
+          }
        }
+        // Swap the newly read read ahead buffer in place of empty active buffer.


Would it be better to write read-ahead instead of read ahead in comments, for ease of reading?

Other existing places in comments in the file use read ahead.

SparkQA · 2018-02-12T18:53:43Z

Test build #87339 has finished for PR 20555 at commit eaa6b4e.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-02-12T18:56:42Z

Test build #87340 has finished for PR 20555 at commit 7238181.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-02-12T21:54:21Z

Also cc @jiangxb1987 @cloud-fan

SparkQA · 2018-02-12T22:22:29Z

Test build #87337 has finished for PR 20555 at commit ca45a88.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-02-12T23:42:04Z

Test build #87346 has finished for PR 20555 at commit d6d44fc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-02-13T10:43:18Z

LGTM

cloud-fan · 2018-02-13T10:44:12Z

core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java

-        if (isEndOfStream()) {
-          return -1;
+        if (!readAheadBuffer.hasRemaining()) {
+          // The first read or activeBuffer is skipped.


unrelated question: what does activeBuffer is skipped mean?

skipped using skip().
I moved the comment over from a few lines above, but looking at skip() now I don't think it can happen - the skip would trigger a readAsync read in that case.
I'll update the comment.

cloud-fan · 2018-02-13T12:11:54Z

cc @sameeragarwal @sitalkedia , shall we have this in Spark 2.3?

jiangxb1987

LGTM

jiangxb1987 · 2018-02-13T12:30:25Z

It would also be great if we can add some unit tests on the read ahead stream model.

sitalkedia

LGTM. Thanks for fixing this @juliuszsompolski .

juliuszsompolski · 2018-02-13T19:53:32Z

@jiangxb1987 there is ReadAheadInputStreamSuite that extends GenericFileInputStreamSuite.
I updated it and added more combination testing with different buffer sizes that should exercise more interactions between the wrapped and outer buffers.

SparkQA · 2018-02-13T22:20:38Z

Test build #87415 has finished for PR 20555 at commit 5273176.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-02-13T23:23:15Z

Test build #87416 has finished for PR 20555 at commit 62cefcd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zsxwing · 2018-02-13T21:31:57Z

core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java

+    isWaiting.set(true);
    try {
-      while (readInProgress) {
+      if (readInProgress) {


The while loop here is to handle spurious wakeup.

Good catch, thanks!
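To illustrate why the loop matters, here is a small sketch, not taken from the patch. The class `SpuriousWakeupSketch` and its methods are hypothetical. A signal sent without any state change stands in for a spurious wakeup: with a while loop the waiter rechecks readInProgress and goes back to waiting, whereas a plain if would let it proceed while the read is still in progress.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class SpuriousWakeupSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition asyncReadComplete = lock.newCondition();
  private boolean readInProgress = true;
  private int wakeups = 0;

  /** Blocks until readInProgress is false; returns how many times await() returned. */
  int awaitReadComplete() throws InterruptedException {
    lock.lock();
    try {
      // Loop, not if: await() may return without the condition being true.
      while (readInProgress) {
        asyncReadComplete.await();
        wakeups++;
      }
      return wakeups;
    } finally {
      lock.unlock();
    }
  }

  /** Wakes the waiter without changing state, simulating a spurious wakeup. */
  void wakeWithoutStateChange() {
    lock.lock();
    try {
      asyncReadComplete.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /** Marks the read as finished and wakes the waiter for real. */
  void completeRead() {
    lock.lock();
    try {
      readInProgress = false;
      asyncReadComplete.signalAll();
    } finally {
      lock.unlock();
    }
  }

  boolean hasWaiter() {
    lock.lock();
    try {
      return lock.hasWaiters(asyncReadComplete);
    } finally {
      lock.unlock();
    }
  }

  int wakeups() {
    lock.lock();
    try {
      return wakeups;
    } finally {
      lock.unlock();
    }
  }
}
```

After wakeWithoutStateChange() the waiting thread wakes once, sees that readInProgress is still true, and blocks again; only completeRead() lets it return.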

zsxwing · 2018-02-13T23:30:29Z

core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java

-  // we issue an async read from the underlying input stream.
-  private final int readAheadThresholdInBytes;
+  // whether there is a reader waiting for data.
+  private AtomicBoolean isWaiting = new AtomicBoolean(false);


You can just use volatile here

I'll leave it be - it should compile to basically the same thing, and with AtomicBoolean the intent seems more readable to me.

zsxwing · 2018-02-13T23:39:15Z

Looks pretty good. Left two minor comments.

SparkQA · 2018-02-13T23:41:38Z

Test build #87422 has finished for PR 20555 at commit 1b3e970.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zsxwing · 2018-02-14T00:45:40Z

LGTM

cloud-fan · 2018-02-14T02:52:13Z

core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java

    stateChangeLock.lock();
+    isWaiting.set(true);
    try {
      while (readInProgress) {


shall we add a comment about spurious wakeup? Otherwise someone else may still mistakenly remove it in the future.

SparkQA · 2018-02-14T04:20:11Z

Test build #87435 has finished for PR 20555 at commit 52f4a7c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-02-14T06:42:17Z

Test build #87438 has finished for PR 20555 at commit b6852aa.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jiangxb1987 · 2018-02-14T08:31:56Z

LGTM

cloud-fan · 2018-02-15T09:09:32Z

thanks, merging to master!

juliuszsompolski added 2 commits February 8, 2018 19:50

locking tweak

987f15c

fill the read ahead buffer

b26ffce

kiszk reviewed Feb 10, 2018

juliuszsompolski added 3 commits February 12, 2018 10:35

reset isWaiting after exception

ca45a88

remove waiting loop

eaa6b4e

add short path for skip

7238181

fix compilation

d6d44fc

cloud-fan reviewed Feb 13, 2018

jiangxb1987 approved these changes Feb 13, 2018

sitalkedia approved these changes Feb 13, 2018

juliuszsompolski added 3 commits February 13, 2018 11:16

update comment

5273176

update test suite with uneven buffer sizes

62cefcd

more testing combinations

1b3e970

zsxwing reviewed Feb 13, 2018

while loop against spurious wakeups

52f4a7c

cloud-fan reviewed Feb 14, 2018

add comment about spuriour wakeups

b6852aa

asfgit closed this in 7539ae5 Feb 15, 2018
