CSGB Seminar on 'Large-scale network-guided feature selection for precision medicine'

Seminar by Chloé-Agathe Azencott from MINES ParisTech at 1 pm on 'Large-scale network-guided feature selection for precision medicine'.


Differences in disease predisposition or response to treatment can be explained in great part by genomic differences between individuals. This realization has given birth to precision medicine, where treatment is tailored to each patient's genome. Precision medicine (also sometimes referred to as "personalized medicine") relies on identifying the similarities, at the molecular level, between patients who exhibit similar susceptibilities, prognoses, or responses to treatment. For this purpose, it depends on collecting considerable amounts of molecular data for large numbers of individuals, a task made possible by rapid advances in genome sequencing and other high-throughput experimental technologies.

Unfortunately, we still lack effective methods to reliably detect, from this data, which of these genomic descriptors (or features) determine a phenotype such as disease predisposition or response to treatment.

Indeed, the number of features that can be measured is very large (easily reaching tens of millions) relative to the number of samples for which they can be collected (typically in the hundreds or thousands), which poses both computational and statistical difficulties. As a result, feature selection tends to be highly unstable: the same algorithm often yields widely different sets of features when applied to different sets of samples relating to the same phenotype.

One way to address this problem is to reduce the dimensionality of the feature space by imposing structure on it. This structure can be derived from prior biological knowledge, and in particular from biological networks. I will present a formulation of network-constrained feature selection, based on the maximization of feature importance and on network regularization. Thanks to a maximum-flow reformulation, the resulting optimization problem can be solved exactly and efficiently, and this approach has been empirically shown to recover more causal features than its regression-based counterparts. I will then discuss several extensions of this framework.
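To make the maximum-flow idea concrete, here is a minimal Python sketch built on assumptions that the abstract does not state: each feature i has a precomputed relevance score c_i (for example an association statistic), selecting a feature costs a hypothetical penalty `eta`, and every network edge (i, j) of weight w_ij that is cut by the selection costs `lam * w_ij`. Under these assumptions the task is to find the subset S maximizing the sum over i in S of (c_i - eta), minus `lam` times the total weight of edges with exactly one endpoint in S, and this problem is equivalent to a minimum s-t cut. The function name `network_guided_selection`, its parameters, and the toy data are hypothetical; the max-flow routine is a plain Edmonds-Karp implementation, not the speaker's code.

```python
from collections import deque


def network_guided_selection(scores, edges, eta, lam):
    """Hypothetical sketch: return the feature subset S maximizing
    sum_{i in S} (scores[i] - eta) - lam * (sum of w over edges (i, j, w)
    with exactly one endpoint in S), solved as a minimum s-t cut."""
    n = len(scores)
    source, sink = n, n + 1
    # Residual capacities as a list of dicts: cap[u][v] is the capacity left on u -> v.
    cap = [dict() for _ in range(n + 2)]

    def add_arc(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse arc for the residual graph

    for i, c in enumerate(scores):
        gain = c - eta
        if gain > 0:
            add_arc(source, i, gain)   # leaving i out of S forfeits this gain
        elif gain < 0:
            add_arc(i, sink, -gain)    # putting i in S pays this net cost
    for i, j, w in edges:
        # An undirected network edge becomes two arcs of capacity lam * w.
        add_arc(i, j, lam * w)
        add_arc(j, i, lam * w)

    # Edmonds-Karp: augment along shortest paths until the sink is unreachable.
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Bottleneck capacity along the path found by BFS.
        flow, v = float("inf"), sink
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        # Push the flow and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # After the last BFS, `parent` holds every node reachable from the source in
    # the residual graph: the source side of a minimum cut, i.e. the selected set S.
    return sorted(i for i in parent if i < n)


scores = [3.0, 0.5, 2.5, 0.2]
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1)]
print(network_guided_selection(scores, edges, eta=1.0, lam=0.0))  # [0, 2]
print(network_guided_selection(scores, edges, eta=1.0, lam=1.0))  # [0, 1, 2]
```

With `lam = 0` the network is ignored and the sketch simply keeps the features whose score exceeds `eta`; with `lam = 1`, feature 1 is pulled into the selection because it connects two high-scoring neighbors, which illustrates how the network regularization favors connected sets of features. A real analysis would rely on a dedicated max-flow library rather than this simple Edmonds-Karp loop.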


Chloé-Agathe Azencott is a junior research faculty member at Mines ParisTech (Paris, France). She belongs to the Centre for Computational Biology, a joint research group between Mines ParisTech, Institut Curie and INSERM focusing on bioinformatics for cancer research. She holds a PhD in computer science from the University of California, Irvine (USA), which she obtained in 2010. From 2011 to 2013 she was a postdoctoral fellow in the Machine Learning for Computational Biology research group of the Max Planck Institutes for Developmental Biology and Intelligent Systems in Tuebingen (Germany). Her research interests revolve around developing machine learning approaches for therapeutic research. This work ranges from chemoinformatics methods for drug discovery to the analysis of large-scale, heterogeneous, whole-genome data for precision medicine.

For more details, see http://cazencott.info