PhD defence by Niels Dalum Hansen
Web data mining for public health purposes
For a long time, public health events, such as disease incidence or vaccination uptake, have been monitored to keep track of the health status of the population. This has allowed public health officials to evaluate the effect of public health initiatives and to decide where resources for improving public health are best spent. As more and more of our activities move online, it is natural to ask ourselves whether online resources can be used as tools to improve public health. In this thesis, we have combined access to online data with the excellent Danish national registries and used these resources to address some of the interesting research questions that this new domain poses. This interdisciplinary thesis combines computer science and public health and makes scientific contributions with respect to both methodology and novel applications. The contributions are as follows:
New approaches for predicting public health events from web-mined data. These include: (i) An online learning method that automatically adapts to sudden temporal changes in the underlying signal. Health events often remain temporally stable over many years, so historical data is often a good predictor; after a sudden change, however, this assumption no longer holds. Our online learning method aims to address this problem by automatically adjusting to such changes. (ii) Prediction models that factor in event seasonality. We show how the expected seasonal variation can be used to optimize the use of web-mined data. (iii) Novel web data mining strategies that make it possible to target different population groups and reduce spurious correlations.
Novel applications of web-mined data. These cover: (i) Using web-mined data to predict health events such as preventive measures and drug consumption. Prior research has primarily focused on predicting contagious diseases, but public health institutions are also responsible for monitoring several other types of health events. Our extensions to preventive measures and drug consumption show that the potential of web-mined data is far from fully utilized. (ii) Understanding the relationship between news media and vaccination uptake. With news constantly available, both online and in print, understanding the effect of news media on public health events is important for designing accurate health monitoring systems. This understanding can also be useful in a variety of other public health tasks, e.g., designing outreach campaigns.
- Chairman: Associate Professor Sune Darkner, Department of Computer Science, University of Copenhagen, Denmark
- Professor Lone Simonsen, The George Washington University, USA
- Assistant Professor Mauricio Santillana, Harvard Institute for Applied Computational Science, USA
- Associate Professor Christina Lioma, Department of Computer Science, University of Copenhagen, Denmark
- Professor Ingemar J. Cox, Department of Computer Science, University of Copenhagen, Denmark
- Director Kåre Mølbak, M.D., Statens Serum Institut, Copenhagen, Denmark
For an electronic copy of the thesis, please contact firstname.lastname@example.org