You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/learn/documentation/versioned/aws/kinesis.md
+62-42Lines changed: 62 additions & 42 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,6 +1,6 @@
1
1
---
2
2
layout: page
3
-
title: Connecting to Kinesis
3
+
title: Kinesis Connector
4
4
---
5
5
<!--
6
6
Licensed to the Apache Software Foundation (ASF) under one or more
@@ -19,86 +19,106 @@ title: Connecting to Kinesis
19
19
limitations under the License.
20
20
-->
21
21
22
-
You can configure your Samza jobs to process data from [AWS Kinesis](https://aws.amazon.com/kinesis/data-streams), Amazon's data streaming service. A `Kinesis data stream` is similar to a Kafka topic and can have multiple partitions. Each message consumed from a Kinesis data stream is an instance of [Record](http://docs.aws.amazon.com/goto/WebAPI/kinesis-2013-12-02/Record).
22
+
## Overview
23
23
24
-
### Consuming from Kinesis:
24
+
The Samza Kinesis connector provides access to [Amazon Kinesis Data Streams](https://aws.amazon.com/kinesis/data-streams),
25
+
Amazon’s data streaming service. A Kinesis Data Stream is similar to a Kafka topic and can have multiple partitions.
26
+
Each message consumed from a Kinesis Data Stream is an instance of [Record](http://docs.aws.amazon.com/goto/WebAPI/kinesis-2013-12-02/Record).
wraps the Record into a [KinesisIncomingMessageEnvelope](https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisIncomingMessageEnvelope.java).
25
29
26
-
Samza's [KinesisSystemConsumer](https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisSystemConsumer.java) wraps the Record into a [KinesisIncomingMessageEnvelope](https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisIncomingMessageEnvelope.java). The key of the message is set to partition key of the Record. The message is obtained from the Record body.
30
+
## Consuming from Kinesis
27
31
28
-
To configure Samza to consume from Kinesis streams:
32
+
### Basic Configuration
33
+
34
+
You can configure your Samza jobs to process data from Kinesis Streams. To configure Samza job to consume from Kinesis
35
+
streams, please add the below configuration:
29
36
30
37
{% highlight jproperties %}
31
-
#define a kinesis system factory with your identifier. eg: kinesis-system
38
+
// define a kinesis system factory with your identifier. eg: kinesis-system
You can configure any [AWS client config](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html)
61
+
with the prefix **systems.system-name.aws.clientConfig.***
51
62
52
-
You can configure any [AWS client config](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html) with the prefix `system.system-name.aws.clientConfig.*`
(KCL) to access the Kinesis data streams. You can set any [Kinesis Client Lib Configuration](https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/software/amazon/kinesis/coordinator/KinesisClientLibConfiguration.java)
77
+
for a stream by configuring it under **systems.system-name.streams.stream-name.aws.kcl.***
64
78
65
-
Similarly, you can set any [Kinesis Client Library config](https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisClientLibConfiguration.java) for a stream by configuring it under `systems.system-name.streams.stream-name.aws.kcl.*`
As an example, to reset the checkpoint and set the starting position for a stream:
83
+
Obtain the config param from the public functions in [Kinesis Client Lib Configuration](https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/software/amazon/kinesis/coordinator/KinesisClientLibConfiguration.java)
84
+
by removing the *"with"* prefix. For example: config param corresponding to **withTableName()** is **TableName**.
85
+
86
+
### Resetting Offsets
87
+
88
+
The source of truth for checkpointing while using Kinesis Connector is not the Samza checkpoint topic but Kinesis itself.
89
+
The Kinesis Client Library (KCL) [uses DynamoDB](https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-ddb.html)
90
+
to store it’s checkpoints. By default, Kinesis Connector reads from the latest offset in the stream.
91
+
92
+
To reset the checkpoints and consume from earliest/latest offset of a Kinesis data stream, please change the KCL TableName
93
+
and set the appropriate starting position for the stream as shown below.
94
+
71
95
{% highlight jproperties %}
96
+
// change the TableName to a unique name to reset checkpoint.
To manipulate checkpoints to start from a particular position in the Kinesis stream, in lieu of Samza CheckpointTool,
103
+
please login to the AWS Console and change the offsets in the DynamoDB Table with the table name that you have specified
104
+
in the config above. By default, the table name has the following format:
105
+
"\<job name\>-\<job id\>-\<kinesis stream\>".
78
106
79
-
The following limitations apply for Samza jobs consuming from Kinesis streams using the Samza consumer:
80
-
* Stateful processing (eg: windows or joins) is not supported on Kinesis streams. However, you can accomplish this by chaining two Samza jobs where the first job reads from Kinesis and sends to Kafka while the second job processes the data from Kafka.
81
-
* Kinesis streams cannot be configured as [bootstrap](https://samza.apache.org/learn/documentation/latest/container/streams.html) or [broadcast](https://samza.apache.org/learn/documentation/latest/container/samza-container.html) streams.
82
-
* Kinesis streams must be used with the [AllSspToSingleTaskGrouperFactory](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/container/grouper/stream/AllSspToSingleTaskGrouperFactory.java). No other grouper is supported.
83
-
* A Samza job that consumes from Kinesis cannot consume from any other input source. However, you can send your results to any destination (eg: Kafka, EventHubs), and have another Samza job consume them.
107
+
### Known Limitations
84
108
85
-
### Producing to Kinesis:
109
+
The following limitations apply to Samza jobs consuming from Kinesis streams using the Samza consumer:
86
110
87
-
The KinesisSystemProducer for Samza is not yet implemented.
88
-
89
-
### How to configure Samza job to consume from Kinesis data stream ?
90
-
91
-
This tutorial uses [hello samza](../../../startup/hello-samza/{{site.version}}/) to illustrate running a Samza job on Yarn that consumes from Kinesis. We will use the [KinesisHelloSamza](https://github.com/apache/samza-hello-samza/blob/master/src/main/java/samza/examples/kinesis/KinesisHelloSamza.java) example.
92
-
93
-
#### Update properties file
111
+
- Stateful processing (eg: windows or joins) is not supported on Kinesis streams. However, you can accomplish this by
112
+
chaining two Samza jobs where the first job reads from Kinesis and sends to Kafka while the second job processes the
113
+
data from Kafka.
114
+
- Kinesis streams cannot be configured as [bootstrap](https://samza.apache.org/learn/documentation/latest/container/streams.html)
115
+
or [broadcast](https://samza.apache.org/learn/documentation/latest/container/samza-container.html) streams.
116
+
- Kinesis streams must be used ONLY with the [AllSspToSingleTaskGrouperFactory](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/container/grouper/stream/AllSspToSingleTaskGrouperFactory.java)
117
+
as the Kinesis consumer does the partition management by itself. No other grouper is supported.
118
+
- A Samza job that consumes from Kinesis cannot consume from any other input source. However, you can send your results
119
+
to any destination (eg: Kafka, EventHubs), and have another Samza job consume them.
94
120
95
-
Update the following properties in the kinesis-hello-samza.properties file:
The KinesisSystemProducer for Samza is not yet implemented.
103
124
104
-
Now, you are ready to run your Samza application on Yarn as described [here](../../../startup/hello-samza/{{site.version}}/). Check the log file for messages read from your Kinesis stream.
0 commit comments