Talk by Tiziano De Matteis


Parallel and Elastic Operator for Data Stream Processing


Nowadays, sensors, social network interactions, and heterogeneous devices interconnected in the "Internet of Things" continuously produce unbounded streams of data. In Data Stream Processing applications, this flow of information must be gathered and analyzed "on the fly" to produce timely responses.

In this talk, we will discuss how to efficiently exploit parallelism and elasticity in Data Stream Processing operators, which are the building blocks of such applications.

Regarding parallelism, we will show how Parallel Patterns can be a viable solution for meeting the throughput and/or latency requirements imposed by the user. Parallel Patterns are parallelization schemes for recurrent problems that the programmer can easily instantiate. The programmer specifies only the functional logic of the operator, while all implementation details are hidden and encapsulated in the programming tool and its runtime support.
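To make the idea concrete, here is a minimal Python sketch of one classic pattern, the task farm, assuming a thread-pool runtime. The names `farm`, `operator_logic`, `n_workers`, and `window` are hypothetical and not taken from any particular DaSP framework. The user writes only the per-item function, while the pattern distributes items to replicas, bounds the number of in-flight items, and emits results in input order. Python threads are used here only for illustration. A real runtime would use native threads or processes to obtain actual speedup on CPU-bound logic.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import count, islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def farm(worker_fn: Callable[[T], R], stream: Iterable[T],
         n_workers: int = 4, window: int = 16) -> Iterator[R]:
    """Hypothetical task-farm pattern over a (possibly unbounded) stream.

    worker_fn is the only thing the user provides: the operator's functional
    logic. Replication over n_workers threads, the bound on in-flight items
    (window) and order-preserving result collection are hidden in here.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = deque()  # futures, kept in input order
        for item in stream:
            pending.append(pool.submit(worker_fn, item))
            if len(pending) >= window:
                # Back-pressure: emit the oldest result before accepting more input.
                yield pending.popleft().result()
        while pending:  # drain the remaining results once the stream ends
            yield pending.popleft().result()


def operator_logic(x: int) -> int:
    # Functional logic of the operator: all the programmer has to write.
    return x * x


if __name__ == "__main__":
    # Also works on an unbounded stream, since at most `window` items are in flight.
    print(list(islice(farm(operator_logic, count(), n_workers=3), 5)))  # prints [0, 1, 4, 9, 16]
```

Changing the parallelism degree only means passing a different `n_workers`. The operator logic itself is untouched, which is the separation of concerns the pattern approach aims for.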

Elasticity, in turn, is becoming mandatory because these applications face highly variable arrival rates and changing workload characteristics. Starting from a well-known parallelization structure, we can devise "predictive strategies" for an autonomous operator. We will discuss solutions based on the Model Predictive Control (MPC) approach, which offer important properties such as assurance of performance requirements, stability, and resource awareness.
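The receding-horizon idea behind MPC can be illustrated with a deliberately simplified controller that picks the parallelism degree (number of replicas) of an operator. Everything here is an assumption for illustration, not the controller discussed in the talk. The sketch assumes a per-replica service rate `mu` and an arrival-rate forecast supplied from outside. Its cost weighs three terms: resources used, unserved load (a proxy for QoS violations), and reconfiguration changes (a proxy for stability). It uses brute-force search over all plans within a short horizon. The function name `mpc_choose_replicas` and the weights `w_res`, `w_qos`, `w_switch` are hypothetical. As in MPC, the whole horizon is optimized but only the first decision is applied, and the optimization is repeated at the next step with fresh measurements.

```python
from itertools import product
from typing import Sequence


def mpc_choose_replicas(current_n: int, forecast: Sequence[float], mu: float,
                        n_max: int, w_res: float = 1.0, w_qos: float = 10.0,
                        w_switch: float = 0.5) -> int:
    """Pick the number of replicas for the next control step (hypothetical sketch).

    current_n: replicas currently in use.
    forecast:  predicted arrival rates (items/s), one per step of the horizon.
    mu:        assumed service rate of a single replica (items/s).
    n_max:     maximum number of replicas available.
    """
    if not forecast:
        return current_n
    best_cost, best_plan = float("inf"), (current_n,)
    # Exhaustive search over every replica plan across the horizon:
    # n_max ** len(forecast) plans, so the horizon must stay short.
    for plan in product(range(1, n_max + 1), repeat=len(forecast)):
        cost, prev = 0.0, current_n
        for n, rate in zip(plan, forecast):
            cost += w_res * n                             # resources used
            cost += w_qos * max(0.0, rate - n * mu) / mu  # unserved load (QoS violation)
            cost += w_switch * abs(n - prev)              # reconfiguration (stability)
            prev = n
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    # Receding horizon: apply only the first decision of the best plan.
    return best_plan[0]


if __name__ == "__main__":
    # 2 replicas at 100 items/s each, load forecast at 300 items/s for two steps.
    print(mpc_choose_replicas(current_n=2, forecast=[300, 300], mu=100, n_max=8))  # prints 3
```

The switching term penalizes oscillating between configurations, and the resource term keeps the controller from over-provisioning. Real MPC-based strategies would replace the brute-force search and the naive cost with a proper system model and optimizer.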


My research interests lie mainly in High-Performance Computing, in particular parallel programming methodologies and tools, runtime support, and Data Stream Processing. In my Ph.D. research, I am working on Data Stream Processing (DaSP). In such applications, parallelism is unavoidable due to the presence of multiple input streams characterized by high volume, high velocity, and high variability. Moreover, one of the defining aspects of these applications is their long-running nature (24/7). Their workload and input rate may vary widely, and these variations must be sustained to provide the required QoS without interruptions. The aim of my Ph.D. work is therefore to study these problems in an integrated way, providing programmers with a methodological framework for developing parallel and elastic DaSP applications through parallel patterns and/or algorithmic skeletons and Model Predictive Control techniques.