Data warehousing in the cloud: computer scientists needed

Principal Consultants Kennie Nybo Pontoppidan, Peter Hansen and Morten Lobedanz Sørensen from Rehfeld give this double talk. Everybody is welcome.

Warning: there might be bits and bytes…

Five years ago, not many of us dreamed that office applications would be available in the cloud. Now we all use Google Apps or Office 365 in one way or another. At Rehfeld we work with Business Intelligence, and the foundation for this is a sound data warehouse. We believe that data warehousing in the cloud will become the next big thing within the next five years, but before then we need your help. Traditional performance bottlenecks in data warehousing are usually I/O bound. Data warehousing in the cloud will suffer from different bottlenecks, probably network bandwidth and latency. Because of this we might also need new semantic models for eventual consistency, perhaps with quality-of-service (QoS) guarantees on when data becomes available after replication.

The first part of today's talk will be an introduction to a traditional data warehouse architecture, including ETL (Extract, Transform, Load) processes. In the second part we will introduce the challenges that ETL processes face when your source systems are in the cloud. We hope that the audience will join in discussing potential research projects and research areas for computer science students and staff. We believe that computer science can bring something interesting to the table in an area which, until recently, was very well understood.