-Optimizely is a world’s leading experimentation platform, enabling businesses to deliver continuous experimentation and personalization across websites, mobile apps and connected devices. At Optimizely, billions of events are tracked on a daily basis. Session metrics are among the key metrics provided to their end user in real time. Prior to introducing Samza for realtime computation, the engineering team at Optimizely used HBase to store and serve experimentation data, and Druid for personalization data including session metrics. As business requirements evolved, the Druid-based solution became more and more challenging.
+Optimizely is a world’s leading experimentation platform, enabling businesses to
+deliver continuous experimentation and personalization across websites, mobile
+apps and connected devices. At Optimizely, billions of events are tracked on a
+daily basis. Session metrics are among the key metrics provided to their end user
+in real time. Prior to introducing Samza for their realtime computation, the
+engineering team at Optimizely built their data-pipeline using a complex
+[Druid and Hbase] (https://medium.com/engineers-optimizely/building-a-scalable-data-pipeline-bfe3f531eb38).
+As business requirements evolve, this solution became more and more challenging.
 
-- Long delays in session metrics caused by M/R jobs
-- Reprocessing of events due to inability to incrementally update Druid index
-- Difficulties in scaling dimensions and cardinality
-- Queries expanding long time periods are expensive
-
-The engineering team at Optimizely decided to move away from Druid and focus on HBase as the store, and introduced stream processing to pre-aggregate and deduplicate session events. They evaluated multiple stream processing platforms and chose Samza as their stream processing platform. In their solution, every session event is tagged with an identifier for up to 30 minutes; upon receiving a session event, the Samza job updates session metadata and aggregates counters for the session that is stored in a local RocksDB state store. At the end of each one-minute window, aggregated session metrics are ingested to HBase. With the new solution
+The engineering team at Optimizely decided to move away from Druid and focus on
+HBase as the store, and introduced stream processing to pre-aggregate and
+deduplicate session events. In their solution, every session event is tagged
+with an identifier for up to 30 minutes; upon receiving a session event, the
+Samza job updates session metadata and aggregates counters for the session
+that is stored in a local RocksDB state store. At the end of each one-minute
+window, aggregated session metrics are ingested to HBase. With the new solution
 
 - The median query latency was reduced from 40+ ms to 5 ms
 - Session metrics are now available in realtime
@@ -44,15 +53,22 @@ The engineering team at Optimizely decided to move away from Druid and focus on
 
 Here is a testimonial from Optimizely
 
-“At Optimizely, we have built the world’s leading experimentation platform, which ingests billions of click-stream events a day from millions of visitors for analysis. Apache Samza has been a great asset to Optimizely's Event ingestion pipeline allowing us to perform large scale, real time stream computing such as aggregations (e.g. session computations) and data enrichment on a multiple billion events / day scale. The programming model, durability and the close integration with Apache Kafka fit our needs perfectly” said Vignesh Sukumar, Senior Engineering Manager at Optimizely”
+“At Optimizely, we have built the world’s leading experimentation platform,
+which ingests billions of click-stream events a day from millions of visitors
+for analysis. Apache Samza has been a great asset to Optimizely's Event
+ingestion pipeline allowing us to perform large scale, real time stream
+computing such as aggregations (e.g. session computations) and data enrichment
+on a multiple billion events / day scale. The programming model, durability
+and the close integration with Apache Kafka fit our needs perfectly” said
+Vignesh Sukumar, Senior Engineering Manager at Optimizely”
 
-In addition, stream processing is also applied to other use cases such as data enrichment, event stream partitioning and metrics processing at Optimizely.
+In addition, stream processing is also applied to other use cases such as
+data enrichment, event stream partitioning and metrics processing at Optimizely.
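
Note on the session pipeline described in the changed paragraph: the flow (tag events with a session identifier for up to 30 minutes, deduplicate and aggregate counters in local state, flush once per one-minute window) can be sketched roughly as below. This is a minimal Python simulation under stated assumptions, not Optimizely's code and not the Samza API. The class `SessionAggregator`, its `process` and `window` methods, the event fields, and the session-id format are all hypothetical; plain dicts stand in for the RocksDB state store and a list stands in for HBase. It also assumes the 30 minutes are counted from the first event of a session and that duplicates are detected by an event id, neither of which the case study specifies.

```python
from collections import defaultdict
from dataclasses import dataclass, field

SESSION_TTL_S = 30 * 60  # a session identifier is reused for up to 30 minutes
WINDOW_S = 60            # interval at which the caller is expected to invoke window()


@dataclass
class SessionState:
    session_id: str
    started_at: int
    seen_event_ids: set = field(default_factory=set)
    counters: dict = field(default_factory=lambda: defaultdict(int))
    dirty: bool = False  # True when counters changed since the last flush


class SessionAggregator:
    """Hypothetical stand-in for the stream job; not the Samza API."""

    def __init__(self, sink):
        self.current = {}  # visitor_id -> active session_id
        self.store = {}    # session_id -> SessionState (stands in for RocksDB)
        self.sink = sink   # list of flushed rows (stands in for HBase)

    def process(self, visitor_id, event_id, event_type, ts):
        """Tag one event with a session id and update that session's counters."""
        sid = self.current.get(visitor_id)
        state = self.store.get(sid)
        # Open a new session if there is none or the identifier has expired.
        if state is None or ts - state.started_at >= SESSION_TTL_S:
            sid = f"{visitor_id}:{ts}"
            state = SessionState(session_id=sid, started_at=ts)
            self.current[visitor_id] = sid
            self.store[sid] = state
        if event_id in state.seen_event_ids:
            return sid  # duplicate delivery: counters stay unchanged
        state.seen_event_ids.add(event_id)
        state.counters[event_type] += 1
        state.dirty = True
        return sid

    def window(self, now):
        """Flush changed sessions to the sink, then drop expired sessions."""
        for sid, state in list(self.store.items()):
            if state.dirty:
                # Rows carry cumulative counters, so a sink could overwrite by key.
                self.sink.append((sid, dict(state.counters)))
                state.dirty = False
            if now - state.started_at >= SESSION_TTL_S:
                del self.store[sid]


sink = []
job = SessionAggregator(sink)
job.process("v1", "e1", "click", ts=0)
job.process("v1", "e1", "click", ts=5)      # same event id again: ignored
job.process("v1", "e2", "pageview", ts=10)
job.window(now=WINDOW_S)
print(sink)  # [('v1:0', {'click': 1, 'pageview': 1})]
```

Keeping state keyed by session id rather than by visitor means a session that has not been flushed yet is not lost when the same visitor starts a new one; whether the real job is organised this way is not stated in the case study.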