S4 is a distributed stream processing platform from Yahoo. It is often seen as the real-time counterpart of Hadoop. S4 being fault tolerant and horizontally scalable helps you in building very large stream processing application that can do anything from detecting earthquakes to finding that perfect bit of advertising that the visitor on your website is most likely to click.
At its core, an S4 application consists of a number of Processing Elements (PEs) that are wired together with the help of a spring configuration file that defines the PEs and the flow of events in the system. Also, events are produced by event producers that listen that sends these events to the client adapter for S4, from where, the S4 platform takes over and dispatch it to appropriate processing elements. After processing these events, PEs can choose to dispatch them to other PEs for further processing or they can choose to produce output events. Thus, arbitrarily complex behavior can be derived together by wiring a simple set of PEs.
S4 comes with a few example applications, but here is a much simpler S4WordCount application that shows how to:
- Keep state in a PE.
- Dispatch events from a PE.
- Process multiple events from a single PE.
- Write a simple java client for sending events to S4.