Double Graph Attention Networks for Visual Semantic Navigation

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Double Graph Attention Networks for Visual Semantic Navigation. / Lyu, Yunlian; Talebi, Mohammad Sadegh.

In: Neural Processing Letters, Vol. 55, No. 7, 2023, p. 9019-9040.

Research output: Contribution to journal › Journal article › Research › peer-review

Harvard

Lyu, Y & Talebi, MS 2023, 'Double Graph Attention Networks for Visual Semantic Navigation', Neural Processing Letters, vol. 55, no. 7, pp. 9019-9040. https://doi.org/10.1007/s11063-023-11190-8

APA

Lyu, Y., & Talebi, M. S. (2023). Double Graph Attention Networks for Visual Semantic Navigation. Neural Processing Letters, 55(7), 9019-9040. https://doi.org/10.1007/s11063-023-11190-8

Vancouver

Lyu Y, Talebi MS. Double Graph Attention Networks for Visual Semantic Navigation. Neural Processing Letters. 2023;55(7):9019-9040. https://doi.org/10.1007/s11063-023-11190-8

Author

Lyu, Yunlian ; Talebi, Mohammad Sadegh. / Double Graph Attention Networks for Visual Semantic Navigation. In: Neural Processing Letters. 2023 ; Vol. 55, No. 7. pp. 9019-9040.

BibTeX

@article{6381a3a6ed0046179465c12e4051c167,
title = "Double Graph Attention Networks for Visual Semantic Navigation",
abstract = "Artificial Intelligence (AI) based on knowledge graphs has been invested in realizing human intelligence like thinking, learning, and logical reasoning. It is a great promise to make AI-based systems not only intelligent but also knowledgeable. In this paper, we investigate knowledge graph based visual semantic navigation using deep reinforcement learning, where an agent reasons actions against targets specified by text words in indoor scenes. The agent perceives its surroundings through egocentric RGB views and learns via trial-and-error. The fundamental problem of visual navigation is efficient learning across different targets and scenes. To obtain an empirical model, we propose a spatial attention model with knowledge graphs, DGVN, which combines both semantic information about observed objects and spatial information about their locations. Our spatial attention model is constructed based on interactions between a 3D global graph and local graphs. The two graphs we adopted encode the spatial relationships between objects and are expected to guide policy search effectively. With the knowledge graph and its robust feature representation using graph convolutional networks, we demonstrate that our agent is able to infer a more plausible attention mechanism for decision-making. Under several experimental metrics, our attention model is shown to achieve superior navigation performance in the AI2-THOR environment.",
keywords = "Deep reinforcement learning, Graph convolutional networks, Knowledge graph, Spatial attention, Visual navigation",
author = "Yunlian Lyu and Talebi, {Mohammad Sadegh}",
note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.",
year = "2023",
doi = "10.1007/s11063-023-11190-8",
language = "English",
volume = "55",
pages = "9019--9040",
journal = "Neural Processing Letters",
issn = "1370-4621",
publisher = "Springer Netherlands",
number = "7",

}

RIS

TY - JOUR

T1 - Double Graph Attention Networks for Visual Semantic Navigation

AU - Lyu, Yunlian

AU - Talebi, Mohammad Sadegh

N1 - Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

PY - 2023

Y1 - 2023

N2 - Artificial Intelligence (AI) based on knowledge graphs has been invested in realizing human intelligence like thinking, learning, and logical reasoning. It is a great promise to make AI-based systems not only intelligent but also knowledgeable. In this paper, we investigate knowledge graph based visual semantic navigation using deep reinforcement learning, where an agent reasons actions against targets specified by text words in indoor scenes. The agent perceives its surroundings through egocentric RGB views and learns via trial-and-error. The fundamental problem of visual navigation is efficient learning across different targets and scenes. To obtain an empirical model, we propose a spatial attention model with knowledge graphs, DGVN, which combines both semantic information about observed objects and spatial information about their locations. Our spatial attention model is constructed based on interactions between a 3D global graph and local graphs. The two graphs we adopted encode the spatial relationships between objects and are expected to guide policy search effectively. With the knowledge graph and its robust feature representation using graph convolutional networks, we demonstrate that our agent is able to infer a more plausible attention mechanism for decision-making. Under several experimental metrics, our attention model is shown to achieve superior navigation performance in the AI2-THOR environment.

AB - Artificial Intelligence (AI) based on knowledge graphs has been invested in realizing human intelligence like thinking, learning, and logical reasoning. It is a great promise to make AI-based systems not only intelligent but also knowledgeable. In this paper, we investigate knowledge graph based visual semantic navigation using deep reinforcement learning, where an agent reasons actions against targets specified by text words in indoor scenes. The agent perceives its surroundings through egocentric RGB views and learns via trial-and-error. The fundamental problem of visual navigation is efficient learning across different targets and scenes. To obtain an empirical model, we propose a spatial attention model with knowledge graphs, DGVN, which combines both semantic information about observed objects and spatial information about their locations. Our spatial attention model is constructed based on interactions between a 3D global graph and local graphs. The two graphs we adopted encode the spatial relationships between objects and are expected to guide policy search effectively. With the knowledge graph and its robust feature representation using graph convolutional networks, we demonstrate that our agent is able to infer a more plausible attention mechanism for decision-making. Under several experimental metrics, our attention model is shown to achieve superior navigation performance in the AI2-THOR environment.

KW - Deep reinforcement learning

KW - Graph convolutional networks

KW - Knowledge graph

KW - Spatial attention

KW - Visual navigation

U2 - 10.1007/s11063-023-11190-8

DO - 10.1007/s11063-023-11190-8

M3 - Journal article

AN - SCOPUS:85149465498

VL - 55

SP - 9019

EP - 9040

JO - Neural Processing Letters

JF - Neural Processing Letters

SN - 1370-4621

IS - 7

ER -

ID: 340545050
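
The abstract describes a double-graph design: object relationships from a 3D global graph and per-observation local graphs are encoded with graph convolutional networks, and the two resulting node embeddings are fused by an attention mechanism before guiding policy search. The sketch below is only an illustrative reconstruction of that general idea, not the authors' DGVN code; the PyTorch module names, the per-node sigmoid gating, and all dimensions are assumptions made for the example.

# Minimal, illustrative sketch only -- not the authors' DGVN implementation.
# It shows the general idea from the abstract: encode object relationships with
# graph convolutions over a global and a local graph, then fuse the two sets of
# node embeddings with a learned attention weight. Names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, adj: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # adj: (N, N) normalized adjacency; feats: (N, in_dim)
        return F.relu(adj @ self.linear(feats))


class DoubleGraphAttention(nn.Module):
    """Fuse global-graph and local-graph node embeddings with per-node attention."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.global_gcn = GCNLayer(feat_dim, hidden_dim)
        self.local_gcn = GCNLayer(feat_dim, hidden_dim)
        self.attn = nn.Linear(2 * hidden_dim, 1)

    def forward(self, adj_g, feats_g, adj_l, feats_l):
        h_g = self.global_gcn(adj_g, feats_g)   # (N, hidden_dim)
        h_l = self.local_gcn(adj_l, feats_l)    # (N, hidden_dim)
        # Per-node weight deciding how much to trust each graph's embedding.
        alpha = torch.sigmoid(self.attn(torch.cat([h_g, h_l], dim=-1)))
        return alpha * h_g + (1.0 - alpha) * h_l  # fused node embeddings


if __name__ == "__main__":
    n_objects, feat_dim = 8, 32
    adj = torch.eye(n_objects)                 # placeholder adjacency matrices
    feats = torch.randn(n_objects, feat_dim)   # placeholder object features
    model = DoubleGraphAttention(feat_dim, hidden_dim=64)
    fused = model(adj, feats, adj, feats)
    print(fused.shape)  # torch.Size([8, 64])

In the paper's setting, the fused node embeddings would presumably feed the reinforcement-learning policy alongside the egocentric visual features; the example above stops at producing the fused embeddings.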