Group and Pseudo-group Convolutional Neural Networks

Publication: Book/anthology/thesis/report, Ph.D. thesis

Convolutional neural networks (CNNs) have proven efficient and effective in image analysis tasks thanks to their built-in locality and weight sharing. Because convolutions are translation equivariant (a shift in the input produces the same shift in the output), these networks preserve translation symmetry. For the most common image analysis tasks, the space in which the data live, Euclidean space, is a principal homogeneous space of the translation group, so applying CNNs directly to this type of data is both generic and convenient. For data modalities that are not measured in Euclidean-like spaces, however, translation equivariance is not immediately satisfied. For example, if we translate a signal on a two-dimensional unit sphere, the orientation of the resulting signal at the destination depends on the path taken. In this thesis, we therefore focus on generalizing CNNs from translations to more general group actions. We follow the classical approach in the literature on generalized CNNs: we first lift the data to a group, then perform convolutions on the group via group actions, and finally project the resulting functions back to the original space to perform the task. For this to work, the data should be modeled as a function on a homogeneous space of the group to which it is lifted. The group itself is not arbitrary, however, which raises the question of which actions it should incorporate. We choose the group actions in the most natural way: they should correspond to the motions that the data can actually undergo. In other words, the group action should encode whatever motions the data might exhibit in reality, so that the model can capture these motions and thus be robust to variations in real-world data.
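The lift, group-convolve, project pipeline described above can be illustrated with a minimal NumPy sketch. It is not the construction used in the thesis: as a simplifying assumption it uses planar images and the discrete group of translations combined with 90-degree rotations, with wrap-around borders so that equivariance holds exactly. All function names (`correlate2d_circular`, `lift`, `group_conv`, `project`, `pipeline`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def correlate2d_circular(image, kernel):
    """Cross-correlate a 2D image with an odd-sized kernel, wrapping at the borders."""
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # After the roll, out[y, x] accumulates image[y + i - kh//2, x + j - kw//2].
            shift = (kh // 2 - i, kw // 2 - j)
            out += kernel[i, j] * np.roll(image, shift=shift, axis=(0, 1))
    return out

def lift(image, kernel):
    """Lift an image to a function on the group: one channel per rotated kernel."""
    return np.stack([correlate2d_circular(image, np.rot90(kernel, r)) for r in range(4)])

def group_conv(features, filters):
    """Convolve on the group: filters are rotated and their rotation axis is shifted."""
    out = np.zeros_like(features)
    for r in range(4):
        for s in range(4):
            rotated = np.rot90(filters[(s - r) % 4], r)
            out[r] += correlate2d_circular(features[s], rotated)
    return out

def project(features):
    """Project back to the original space by pooling over the rotation axis."""
    return features.max(axis=0)

def pipeline(image, kernel, filters):
    lifted = np.maximum(lift(image, kernel), 0.0)  # pointwise ReLU keeps equivariance
    return project(group_conv(lifted, filters))

# Illustrative check on random data (assumed shapes: 8x8 image, 3x3 kernels).
rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))
filters = rng.normal(size=(4, 3, 3))

# Rotating the input rotates the output in the same way.
assert np.allclose(pipeline(np.rot90(image), kernel, filters),
                   np.rot90(pipeline(image, kernel, filters)))
```

Pooling over the rotation axis is only one possible projection; a mean would preserve equivariance equally well. The sphere-valued DWI setting discussed in the abstract requires a different group and a different lifting, which this planar sketch does not attempt to reproduce.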
In this thesis, we choose diffusion-weighted magnetic resonance imaging (DWI) as the data and explore the group actions that are natural to it. DWI is a technique that captures anisotropies in the movement of molecules in tissue and is very useful in diagnosing vascular strokes in the brain, among other diseases. Its structure differs from that of regular images: at each voxel it provides three-dimensional diffusion information that can be encoded as a function on a unit sphere, which makes it a natural fit for generalized CNNs. The variations in the data, that is, its symmetries, are 3D rigid motions, which can easily be modeled mathematically, either fully or partly. Existing methods in the literature use irreducible representations that predefine a function basis or filter bank for the spherical convolution. In contrast, we do not impose predefined functions for the CNN task; we aim to perform lifting and group convolution in a generic and lightweight way, with the symmetries in the data reflected by the group actions that are most natural for this type of data. We gradually incorporate more of the symmetries associated with the data and evaluate on a segmentation task. As more symmetries are incorporated, performance on the task clearly increases. Furthermore, the more symmetries are reflected in the modeling, the more robust the model is to variations in the data.
Publisher: Department of Computer Science, Faculty of Science, University of Copenhagen
Number of pages: 91
Status: Published - 2022

ID: 312640181