S4 is a distributed stream processing platform from Yahoo. It is often seen as the real-time counterpart of Hadoop. Because S4 is fault tolerant and horizontally scalable, it helps you build very large stream processing applications that can do anything from detecting earthquakes to finding that perfect bit of advertising that a visitor to your website is most likely to click.
At its core, an S4 application consists of a number of Processing Elements (PEs) that are wired together with a Spring configuration file, which defines the PEs and the flow of events in the system. Events are produced by event producers, which send them to the client adapter for S4. From there, the S4 platform takes over and dispatches each event to the appropriate processing elements. After processing an event, a PE can dispatch it to other PEs for further processing, or it can produce output events. Thus, arbitrarily complex behavior can be built by wiring together a set of simple PEs.
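To make the idea of wiring PEs together concrete, here is a minimal sketch in plain Java. It does not use the S4 API or a Spring configuration file: the names `WordCountSketch`, `WordEvent`, `CountEvent`, `WordCountPE` and `ReportPE` are made up for illustration, and the wiring that S4 would read from configuration is done by hand in `run`. It only shows the shape of the flow: one PE keeps state and dispatches an event, and a second PE consumes it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java illustration only; none of these types come from S4.
public class WordCountSketch {

    // Hypothetical input event: one word seen in the stream.
    static final class WordEvent {
        final String word;
        WordEvent(String word) { this.word = word; }
    }

    // Hypothetical output event: the running count for one word.
    static final class CountEvent {
        final String word;
        final int count;
        CountEvent(String word, int count) { this.word = word; this.count = count; }
    }

    // A PE that keeps state (one counter per word) and dispatches
    // a CountEvent to whatever is wired in as its downstream.
    static final class WordCountPE {
        private final Map<String, Integer> counts = new HashMap<>();
        private final Consumer<CountEvent> downstream;

        WordCountPE(Consumer<CountEvent> downstream) { this.downstream = downstream; }

        void processEvent(WordEvent event) {
            int updated = counts.merge(event.word, 1, Integer::sum);
            downstream.accept(new CountEvent(event.word, updated));
        }
    }

    // A terminal PE that remembers the latest count it was sent per word.
    static final class ReportPE {
        final Map<String, Integer> latest = new HashMap<>();

        void processEvent(CountEvent event) { latest.put(event.word, event.count); }
    }

    // Stands in for both the event producer and the wiring:
    // splits the text into words and feeds them through the two PEs.
    static Map<String, Integer> run(String text) {
        ReportPE report = new ReportPE();
        WordCountPE counter = new WordCountPE(report::processEvent);
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counter.processEvent(new WordEvent(word));
            }
        }
        return report.latest;
    }

    public static void main(String[] args) {
        System.out.println(run("to be or not to be"));
    }
}
```

In a real S4 application the platform, not a `for` loop, would route each event to the right PE instance, and the connection between the two PEs would live in configuration rather than in a constructor argument.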
S4 comes with a few example applications, but here is a much simpler S4WordCount application that shows how to:
- Keep state in a PE.
- Dispatch events from a PE.
- Process multiple events from a single PE.
- Write a simple Java client for sending events to S4.
Java is known to be a verbose language, and the situation worsens when you step into the bloated world of enterprise Java. You need to write tons of code and configure a lot of JXXX to make your simple webapp work. The situation has improved in recent years, though, with the introduction of convention over configuration.
I’ve just started experimenting with Scheme, specifically to follow Structure and Interpretation of Computer Programs. Here is a Scheme procedure that takes three numbers and returns the sum of the squares of the two larger ones.