Learning deep representations for ground-to-aerial geolocalization
Publication: Contribution to journal › Conference article › Research › peer-reviewed
Standard
Learning deep representations for ground-to-aerial geolocalization. / Lin, Tsung Yi; Cui, Yin; Belongie, Serge; Hays, James.
In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 14.10.2015, pp. 5007-5015.
RIS
TY - GEN
T1 - Learning deep representations for ground-to-aerial geolocalization
AU - Lin, Tsung Yi
AU - Cui, Yin
AU - Belongie, Serge
AU - Hays, James
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/10/14
Y1 - 2015/10/14
N2 - The recent availability of geo-tagged images and rich geospatial data has inspired a number of algorithms for image-based geolocalization. Most approaches predict the location of a query image by matching to ground-level images with known locations (e.g., street-view data). However, most of the Earth does not have ground-level reference photos available. Fortunately, more complete coverage is provided by oblique aerial or 'bird's eye' imagery. In this work, we localize a ground-level query image by matching it to a reference database of aerial imagery. We use publicly available data to build a dataset of 78K aligned cross-view image pairs. The primary challenge for this task is that traditional computer vision approaches cannot handle the wide baseline and appearance variation of these cross-view pairs. We use our dataset to learn a feature representation in which matching views are near one another and mismatched views are far apart. Our proposed approach, Where-CNN, is inspired by deep learning success in face verification and achieves significant improvements over traditional hand-crafted features and existing deep features learned from other large-scale databases. We show the effectiveness of Where-CNN in finding matches between street view and aerial view imagery and demonstrate the ability of our learned features to generalize to novel locations.
UR - http://www.scopus.com/inward/record.url?scp=84959245070&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2015.7299135
DO - 10.1109/CVPR.2015.7299135
M3 - Conference article
AN - SCOPUS:84959245070
SP - 5007
EP - 5015
JO - I E E E Conference on Computer Vision and Pattern Recognition. Proceedings
JF - I E E E Conference on Computer Vision and Pattern Recognition. Proceedings
SN - 1063-6919
T2 - IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Y2 - 7 June 2015 through 12 June 2015
ER -
ID: 301829041