Commit 125152e

updates formatting
1 parent ba1d26a commit 125152e

1 file changed: 58 additions, 52 deletions

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md

@@ -17,39 +17,42 @@
This document explains the `S3PrefetchingInputStream` and the various components it uses.

This input stream implements prefetching and caching to improve read performance of the input
stream.
A high-level overview of this feature was published in
[Pinterest Engineering's blog post titled "Improving efficiency and reducing runtime using S3 read optimization"](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0).

With prefetching, the input stream divides the remote file into blocks of a fixed size, associates
buffers with these blocks and then reads data into these buffers asynchronously.
It may also cache these blocks.

### Basic Concepts

* **Remote File**: A binary blob of data stored on some storage device.
* **Block File**: Local file containing a block of the remote file.
* **Block**: A file is divided into a number of blocks.
The first n-1 blocks have the same size, and the last block may be the same size or smaller.
* **Block-based reading**: The granularity of a read is one block.
That is, either an entire block is read and returned or none at all.
Multiple blocks may be read in parallel.
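
The block arithmetic above can be sketched as follows. The `BlockLayout` class is hypothetical and not part of Hadoop. It assumes blocks are numbered from 0, and it computes the block count, the size of each block and the block that holds a given file offset.

```java
/**
 * Hypothetical helper (not a Hadoop class) showing how a remote file is split into
 * fixed-size blocks numbered from 0.
 */
final class BlockLayout {
  private final long fileSize;
  private final int blockSize;

  BlockLayout(long fileSize, int blockSize) {
    if (fileSize < 0 || blockSize <= 0) {
      throw new IllegalArgumentException("invalid file or block size");
    }
    this.fileSize = fileSize;
    this.blockSize = blockSize;
  }

  /** Number of blocks, i.e. ceil(fileSize / blockSize). */
  int blockCount() {
    return (int) ((fileSize + blockSize - 1) / blockSize);
  }

  /** The first n-1 blocks are full-sized; the last one may be shorter. */
  int sizeOf(int blockNumber) {
    if (blockNumber < 0 || blockNumber >= blockCount()) {
      throw new IndexOutOfBoundsException("block " + blockNumber);
    }
    long start = (long) blockNumber * blockSize;
    return (int) Math.min(blockSize, fileSize - start);
  }

  /** Number of the block that contains the given absolute file offset. */
  int blockFor(long offset) {
    if (offset < 0 || offset >= fileSize) {
      throw new IndexOutOfBoundsException("offset " + offset);
    }
    return (int) (offset / blockSize);
  }
}
```

For example, a 20MB file with the default 8MB block size gives three blocks of 8MB, 8MB and 4MB.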

### Configuring the stream

|Property |Meaning |Default |
|--- |--- |--- |
|`fs.s3a.prefetch.enabled` |Enable the prefetch input stream |`true` |
|`fs.s3a.prefetch.block.size` |Size of a block |`8M` |
|`fs.s3a.prefetch.block.count` |Number of blocks to prefetch |`8` |
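
The sketch below reads the three properties from plain key/value settings and falls back to the defaults in the table. `PrefetchSettings` is hypothetical. In Hadoop these values come from a `Configuration` object, and this sketch deliberately avoids that dependency. The suffix handling (`K`, `M` and `G` treated as powers of 1024) is an assumption meant to mirror values such as `8M`.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical holder for the prefetch settings; defaults mirror the table above.
 * Hadoop reads these from a Configuration object, which this sketch does not use.
 */
final class PrefetchSettings {
  static final String ENABLED = "fs.s3a.prefetch.enabled";
  static final String BLOCK_SIZE = "fs.s3a.prefetch.block.size";
  static final String BLOCK_COUNT = "fs.s3a.prefetch.block.count";

  final boolean enabled;
  final long blockSize;
  final int blockCount;

  PrefetchSettings(Map<String, String> props) {
    enabled = Boolean.parseBoolean(props.getOrDefault(ENABLED, "true").trim());
    blockSize = parseSize(props.getOrDefault(BLOCK_SIZE, "8M"));
    blockCount = Integer.parseInt(props.getOrDefault(BLOCK_COUNT, "8").trim());
  }

  /** Parses "8M", "512K", "1G" or a plain byte count; suffixes are assumed to be powers of 1024. */
  static long parseSize(String value) {
    String v = value.trim().toUpperCase(Locale.ROOT);
    long multiplier;
    switch (v.charAt(v.length() - 1)) {
      case 'K': multiplier = 1L << 10; break;
      case 'M': multiplier = 1L << 20; break;
      case 'G': multiplier = 1L << 30; break;
      default: multiplier = 1;
    }
    String digits = multiplier == 1 ? v : v.substring(0, v.length() - 1).trim();
    return Long.parseLong(digits) * multiplier;
  }
}
```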

### Key Components

`S3PrefetchingInputStream` - When prefetching is enabled, `S3AFileSystem` returns an instance of
this class as the input stream.
Depending on the remote file size, it uses either the `S3InMemoryInputStream` or the
`S3CachingInputStream` as the underlying input stream.

`S3InMemoryInputStream` - Underlying input stream used when the remote file size is smaller than the
configured block size.
It reads the entire remote file into memory.

`S3CachingInputStream` - Underlying input stream used when the remote file size is larger than the
configured block size.
It uses asynchronous prefetching of blocks and caching to improve performance.
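
The choice between the two underlying streams can be sketched as a size comparison. `StreamSelector` is hypothetical. Treating a file of exactly one block as in-memory is also an assumption, because the text above only covers the strictly smaller and strictly larger cases.

```java
/** Hypothetical sketch (not Hadoop API) of picking the underlying stream by file size. */
final class StreamSelector {
  enum Kind { IN_MEMORY, CACHING }

  static Kind choose(long remoteFileSize, long blockSize) {
    // Small files are read fully into memory (S3InMemoryInputStream); larger ones
    // are read block by block with prefetching and caching (S3CachingInputStream).
    // Assumption: a file of exactly one block is treated as in-memory here.
    return remoteFileSize <= blockSize ? Kind.IN_MEMORY : Kind.CACHING;
  }
}
```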
@@ -58,31 +61,31 @@ Uses asynchronous prefetching of blocks and caching to improve performance.

* Number of blocks in the remote file
* Block size
* State of each block (initially all blocks have state *NOT_READY*).
Other states are: Queued, Ready, Cached.
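
Per-block bookkeeping can be sketched as one state per block, using the state names listed above. `BlockStates` is hypothetical. Its `queueAfter()` rule, which queues any of the next `count` blocks that are still *NOT_READY*, is an illustrative assumption and not the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical per-file block bookkeeping: block count, block size and one state
 * per block. The queueing rule in queueAfter() is an illustrative assumption.
 */
final class BlockStates {
  enum State { NOT_READY, QUEUED, READY, CACHED }

  private final int blockSize;
  private final State[] states;

  BlockStates(long fileSize, int blockSize) {
    this.blockSize = blockSize;
    this.states = new State[(int) ((fileSize + blockSize - 1) / blockSize)];
    Arrays.fill(states, State.NOT_READY); // every block starts out NOT_READY
  }

  int blockCount() { return states.length; }

  int blockSize() { return blockSize; }

  State stateOf(int block) { return states[block]; }

  void setState(int block, State state) { states[block] = state; }

  /**
   * Looks at the `count` blocks after `current` and marks those still NOT_READY
   * as QUEUED; returns the numbers of the newly queued blocks.
   */
  List<Integer> queueAfter(int current, int count) {
    List<Integer> queued = new ArrayList<>();
    for (int b = current + 1; b <= current + count && b < states.length; b++) {
      if (states[b] == State.NOT_READY) {
        states[b] = State.QUEUED;
        queued.add(b);
      }
    }
    return queued;
  }
}
```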

`BufferData` - Holds the buffer and additional information about it, such as:

* The block number this buffer is for
* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done).
The initial state of a buffer is Blank.
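
A buffer and its metadata can be paired in a small holder class. `BufferEntry` is hypothetical and is not Hadoop's `BufferData`. Its compare-and-set style `transition()` method is an assumption about how a reader thread and a prefetch thread might update the state without overwriting each other.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical buffer-plus-metadata holder mirroring the list above; it is not
 * Hadoop's BufferData class, and transition() is an assumed update mechanism.
 */
final class BufferEntry {
  enum State { UNKNOWN, BLANK, PREFETCHING, CACHING, READY, DONE }

  private final int blockNumber;
  private final ByteBuffer buffer;
  private State state = State.BLANK; // a new buffer starts out blank

  BufferEntry(int blockNumber, ByteBuffer buffer) {
    this.blockNumber = blockNumber;
    this.buffer = buffer;
  }

  int blockNumber() { return blockNumber; }

  ByteBuffer buffer() { return buffer; }

  synchronized State state() { return state; }

  /** Moves to `next` only if the current state is `expected`; returns whether it moved. */
  synchronized boolean transition(State expected, State next) {
    if (state != expected) {
      return false;
    }
    state = next;
    return true;
  }
}
```

In this sketch, a prefetch thread can move a buffer from Blank to Prefetching only if no other thread has already claimed it.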

`CachingBlockManager` - Implements reading data into the buffer, prefetching and caching.

`BufferPool` - Manages a fixed-size pool of buffers.
It is used by `CachingBlockManager` to acquire buffers.

`S3File` - Implements operations to interact with S3, such as opening and closing the input stream to
the remote file in S3.

`S3Reader` - Implements reading from the stream opened by `S3File`.
It reads from this input stream in chunks of 64KB.
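
The chunked read performed by `S3Reader` can be sketched as follows. The sketch assumes the stream has already been opened at the right offset, which is the job of `S3File`. `ChunkedReader` is hypothetical. It copies `size` bytes into a `ByteBuffer` 64KB at a time and fails if the stream ends early.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of chunked reading: copy `size` bytes from an already-opened
 * stream into a buffer, 64KB at a time.
 */
final class ChunkedReader {
  static final int CHUNK_SIZE = 64 * 1024;

  static int readInto(InputStream in, ByteBuffer dest, int size) {
    byte[] chunk = new byte[CHUNK_SIZE];
    int total = 0;
    try {
      while (total < size) {
        int want = Math.min(CHUNK_SIZE, size - total); // never more than one chunk per call
        int n = in.read(chunk, 0, want);
        if (n < 0) {
          throw new EOFException("stream ended after " + total + " of " + size + " bytes");
        }
        dest.put(chunk, 0, n);
        total += n;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return total;
  }
}
```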

`FilePosition` - Tracks the current position in the file.
It also gives access to the current buffer in use.

`SingleFilePerBlockCache` - Responsible for caching blocks to the local file system.
Each cached block is stored on the local disk as a separate block file.
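
A one-file-per-block cache in the spirit of `SingleFilePerBlockCache` can be sketched with `java.nio.file`. `BlockFileCache`, the temporary directory and the `block-N.bin` file names are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical one-file-per-block cache: each cached block is written to its own
 * local file and read back on demand. Names and layout are assumptions.
 */
final class BlockFileCache implements AutoCloseable {
  private final Path dir;
  private final Map<Integer, Path> files = new HashMap<>();

  BlockFileCache() {
    try {
      dir = Files.createTempDirectory("prefetch-cache");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  synchronized boolean contains(int blockNumber) {
    return files.containsKey(blockNumber);
  }

  /** Writes the block's bytes to its own block file. */
  synchronized void put(int blockNumber, byte[] data) {
    try {
      Path file = dir.resolve("block-" + blockNumber + ".bin");
      Files.write(file, data);
      files.put(blockNumber, file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads a cached block back from its block file. */
  synchronized byte[] get(int blockNumber) {
    Path file = files.get(blockNumber);
    if (file == null) {
      throw new IllegalArgumentException("block " + blockNumber + " is not cached");
    }
    try {
      return Files.readAllBytes(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Deletes all block files and the cache directory. */
  @Override
  public synchronized void close() {
    try {
      for (Path file : files.values()) {
        Files.deleteIfExists(file);
      }
      files.clear();
      Files.deleteIfExists(dir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```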

### Operation

@@ -98,9 +101,9 @@ in.read(buffer, 0, 3MB);
in.read(buffer, 0, 2MB);
```

When the first read is issued, there is no buffer in use yet.
The `S3InMemoryInputStream` gets the data in this remote file by calling the `ensureCurrentBuffer()`
method, which ensures that a buffer with data is available to be read from.

The `ensureCurrentBuffer()` method then:

@@ -114,8 +117,8 @@ The `ensureCurrentBuffer()` then:

The read operation now just gets the required bytes from the buffer in `FilePosition`.

When the second read is issued, there is already a valid buffer which can be used.
Nothing else needs to be done; the required bytes are read directly from this buffer.
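
The in-memory path can be sketched as a lazily filled buffer. `InMemoryStreamSketch` is hypothetical, and `fetchWholeFile` stands in for the remote read. Only the method name `ensureCurrentBuffer()` is taken from the text above; its body here is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the in-memory path: the first read pulls the whole (small)
 * remote file into one buffer, and later reads are served from that buffer.
 */
final class InMemoryStreamSketch {
  private final Supplier<byte[]> fetchWholeFile; // stand-in for the remote read
  private ByteBuffer current;                     // null until the first read
  private int remoteFetches;

  InMemoryStreamSketch(Supplier<byte[]> fetchWholeFile) {
    this.fetchWholeFile = fetchWholeFile;
  }

  /** Makes sure a buffer with data is available, fetching the file only once. */
  private void ensureCurrentBuffer() {
    if (current == null) {
      current = ByteBuffer.wrap(fetchWholeFile.get());
      remoteFetches++;
    }
  }

  /** Copies up to `len` bytes into `dest`; returns -1 at end of file. */
  int read(byte[] dest, int off, int len) {
    ensureCurrentBuffer();
    if (!current.hasRemaining()) {
      return -1;
    }
    int n = Math.min(len, current.remaining());
    current.get(dest, off, n);
    return n;
  }

  int remoteFetches() {
    return remoteFetches;
  }
}
```

To replay the two reads above, use a 5-byte stand-in file and read 3 bytes, then 2. The remote data is fetched once, and the second read is served entirely from the buffer.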

#### S3CachingInputStream

@@ -131,37 +134,39 @@ in.read(buffer, 0, 5MB)
in.read(buffer, 0, 8MB)
```

For the first read call, there is no valid buffer yet.
`ensureCurrentBuffer()` is called, and for the first `read()`, the prefetch count is set to 1.

The current block (block 0) is read synchronously, while the block to be prefetched (block 1) is
read asynchronously.
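
The pattern of a synchronous read plus asynchronous prefetch can be sketched with `CompletableFuture`. `PrefetchSketch` and its `readBlock` function are hypothetical stand-ins. The real stream goes through `CachingBlockManager` and `BufferPool` rather than a plain map of futures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch: the requested block is read synchronously (or taken from an
 * in-flight prefetch) and the following blocks are read asynchronously.
 */
final class PrefetchSketch {
  private final IntFunction<byte[]> readBlock; // stand-in for the remote read of one block
  private final ExecutorService executor;
  private final int blockCount;
  private final Map<Integer, CompletableFuture<byte[]>> inFlight = new HashMap<>();

  PrefetchSketch(IntFunction<byte[]> readBlock, ExecutorService executor, int blockCount) {
    this.readBlock = readBlock;
    this.executor = executor;
    this.blockCount = blockCount;
  }

  /** Returns block `current` and starts prefetching up to `prefetchCount` following blocks. */
  synchronized byte[] getBlock(int current, int prefetchCount) {
    CompletableFuture<byte[]> pending = inFlight.remove(current);
    // Reuse an earlier prefetch if there is one; otherwise read synchronously.
    byte[] data = pending != null ? pending.join() : readBlock.apply(current);
    for (int b = current + 1; b <= current + prefetchCount && b < blockCount; b++) {
      inFlight.computeIfAbsent(b,
          k -> CompletableFuture.supplyAsync(() -> readBlock.apply(k), executor));
    }
    return data;
  }
}
```

With a prefetch count of 1, `getBlock(0, 1)` reads block 0 directly and starts block 1 in the background. A later `getBlock(1, 1)` then joins that in-flight read instead of fetching block 1 again. The caller owns the executor and should shut it down when done.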

The `CachingBlockManager` is responsible for getting buffers from the buffer pool and reading data
into them. The process of acquiring a buffer from the pool works as follows:

* The buffer pool keeps a map of allocated buffers and a pool of available buffers.
The size of this pool is the prefetch block count + 1.
If the prefetch block count is 8, the buffer pool has a size of 9.
* If the pool is not yet at capacity, create a new buffer and add it to the pool.
* If it is at capacity, check whether any buffers in the *DONE* state can be released.
Releasing a buffer means removing it from the allocated map and returning it to the pool of available
buffers.
* If no buffers are currently in the *DONE* state, nothing is released, so retry the
above step at a fixed interval a few times until a buffer becomes available.
* If after multiple retries there are still no available buffers, release a buffer in the *READY* state.
The buffer for the block furthest from the current block is released.
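
The acquisition steps above can be sketched as follows. `BufferPoolSketch` is hypothetical. The retry count and interval (`MAX_RETRIES`, `RETRY_INTERVAL_MS`) are assumed values, because the text only says the step is retried a few times at a fixed interval.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class BufferPoolSketch {
  enum State { BLANK, READY, DONE }

  /** A pooled buffer plus the block it currently holds and its state. */
  static final class Buffer {
    final ByteBuffer data;
    int block;
    State state = State.BLANK;
    Buffer(int size) { data = ByteBuffer.allocate(size); }
  }

  static final int MAX_RETRIES = 3;         // assumed value
  static final long RETRY_INTERVAL_MS = 10; // assumed value

  private final int capacity;
  private final int bufferSize;
  private final Deque<Buffer> available = new ArrayDeque<>();
  private final Map<Integer, Buffer> allocated = new HashMap<>();
  private int created;

  BufferPoolSketch(int prefetchBlockCount, int bufferSize) {
    this.capacity = prefetchBlockCount + 1; // e.g. 8 prefetch blocks -> 9 buffers
    this.bufferSize = bufferSize;
  }

  /** Gets a buffer for `block` (assumed not yet allocated) while reading `currentBlock`. */
  synchronized Buffer acquire(int block, int currentBlock) {
    Buffer b = tryAcquire(block);
    for (int retry = 0; b == null && retry < MAX_RETRIES; retry++) {
      try {
        wait(RETRY_INTERVAL_MS); // lets other threads mark buffers DONE meanwhile
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for a buffer", e);
      }
      b = tryAcquire(block);
    }
    if (b == null) {
      releaseFurthestReady(currentBlock); // last resort: give up a READY buffer
      b = tryAcquire(block);
    }
    if (b == null) {
      throw new IllegalStateException("no buffer available for block " + block);
    }
    return b;
  }

  synchronized void markReady(int block) { allocated.get(block).state = State.READY; }
  synchronized boolean isAllocated(int block) { return allocated.containsKey(block); }
  synchronized int created() { return created; }

  synchronized void markDone(int block) {
    allocated.get(block).state = State.DONE;
    notifyAll(); // wakes an acquire() that is waiting to retry
  }

  private Buffer tryAcquire(int block) {
    if (available.isEmpty() && created < capacity) {
      available.push(new Buffer(bufferSize)); // pool not yet at capacity
      created++;
    }
    if (available.isEmpty()) {
      // Release DONE buffers: drop them from `allocated`, return them to `available`.
      allocated.values().removeIf(buf -> buf.state == State.DONE && available.add(buf));
    }
    Buffer b = available.poll();
    if (b != null) {
      b.block = block;
      b.state = State.BLANK;
      b.data.clear();
      allocated.put(block, b);
    }
    return b;
  }

  /** Releases the READY buffer whose block is furthest from the current block. */
  private void releaseFurthestReady(int currentBlock) {
    Buffer victim = null;
    for (Buffer b : allocated.values()) {
      if (b.state == State.READY && (victim == null
          || Math.abs(b.block - currentBlock) > Math.abs(victim.block - currentBlock))) {
        victim = b;
      }
    }
    if (victim != null) {
      allocated.remove(victim.block);
      available.push(victim);
    }
  }
}
```

Assume each buffer holds one block. With the default prefetch block count of 8 and block size of 8MB, this policy then allows up to 9 buffers, or roughly 72MB of buffer memory per stream.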

Once a buffer has been acquired by `CachingBlockManager`, if the buffer is in the *READY* state, it is
returned.
This means that data was already read into this buffer asynchronously by a prefetch.
If its state is *BLANK*, data is read into it using
`S3Reader.read(ByteBuffer buffer, long offset, int size)`.

For the second read call, `in.read(buffer, 0, 8MB)`, the block size is 8MB and only 5MB
of block 0 has been read so far, so 3MB of the required data is read from the current block 0.
Once all data has been read from this block, `S3CachingInputStream` requests the next block
(block 1), which will already have been prefetched, so it can start reading from it immediately.
While reading from block 1, it also issues prefetch requests for the next blocks.
The number of blocks to be prefetched is determined by `fs.s3a.prefetch.block.count`.
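
The arithmetic for the second read can be sketched by splitting a read into per-block segments. `ReadSplitter` is hypothetical. The 20MB file size in the test is an assumption, since the file size is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a read into per-block segments. */
final class ReadSplitter {
  /** `length` bytes of `block`, starting at `offsetInBlock`. */
  record Segment(int block, long offsetInBlock, long length) {}

  static List<Segment> split(long position, long length, long blockSize, long fileSize) {
    List<Segment> segments = new ArrayList<>();
    long end = Math.min(position + length, fileSize); // never read past end of file
    for (long pos = position; pos < end; ) {
      int block = (int) (pos / blockSize);
      long offset = pos % blockSize;
      long n = Math.min(blockSize - offset, end - pos); // stop at the block boundary
      segments.add(new Segment(block, offset, n));
      pos += n;
    }
    return segments;
  }
}
```

Splitting an 8MB read at position 5MB yields two segments. The first is 3MB from block 0, starting at offset 5MB. The second is 5MB from block 1, starting at offset 0. This matches the description above.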

##### Random Reads

@@ -175,12 +180,13 @@ in.seek(2MB)
in.read(buffer, 0, 4MB)
```

The `S3CachingInputStream` also caches prefetched blocks.
This happens when a `seek()` is issued to a position outside the current block and the current block
has not yet been fully read.

For the above read sequence, when the `seek(10MB)` call is issued, block 0 has not been read
completely, so it is cached, as the caller will probably want to read from it again.

When `seek(2MB)` is called, the position is back inside block 0.
The next read can now be satisfied from the locally cached block file, which is typically orders of
magnitude faster than a network-based read.
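
The caching rule for random reads can be sketched as a simple predicate. `SeekPolicy` is hypothetical. The 4MB already consumed from block 0 in the test is an assumption, because the start of the read sequence is not shown here.

```java
/**
 * Hypothetical predicate for the random-read caching rule: cache the current block
 * when a seek leaves it before it has been fully read.
 */
final class SeekPolicy {
  static boolean shouldCacheOnSeek(long blockStart, long blockLength,
                                   long bytesReadFromBlock, long seekTarget) {
    boolean leavesBlock = seekTarget < blockStart || seekTarget >= blockStart + blockLength;
    boolean partiallyRead = bytesReadFromBlock < blockLength;
    return leavesBlock && partiallyRead;
  }
}
```

Under that assumption, `seek(10MB)` leaves a partially read block 0 and so triggers caching. A seek that stays inside block 0 does not, and neither does a seek that follows a fully consumed block.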
