Attention as activation

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Standard

Attention as activation. / Dai, Yimian; Oehmcke, Stefan; Gieseke, Fabian; Wu, Yiquan; Barnard, Kobus.

Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition. IEEE, 2020. pp. 4131-4136, Art. no. 9413020.

Harvard

Dai, Y, Oehmcke, S, Gieseke, F, Wu, Y & Barnard, K 2020, Attention as activation. in Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition, 9413020, IEEE, pp. 4131-4136, 25th International Conference on Pattern Recognition, ICPR 2020, Virtual, Milan, Italy, 10/01/2021. https://doi.org/10.1109/ICPR48806.2021.9413020

APA

Dai, Y., Oehmcke, S., Gieseke, F., Wu, Y., & Barnard, K. (2020). Attention as activation. In Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition (pp. 4131-4136). [9413020] IEEE. https://doi.org/10.1109/ICPR48806.2021.9413020

Vancouver

Dai Y, Oehmcke S, Gieseke F, Wu Y, Barnard K. Attention as activation. In Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition. IEEE. 2020. p. 4131-4136. 9413020 https://doi.org/10.1109/ICPR48806.2021.9413020

Author

Dai, Yimian ; Oehmcke, Stefan ; Gieseke, Fabian ; Wu, Yiquan ; Barnard, Kobus. / Attention as activation. Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition. IEEE, 2020. pp. 4131-4136

Bibtex

@inproceedings{ae15c9d8d68d44b9ba62557719f85ead,
title = "Attention as activation",
abstract = "Activation functions and attention mechanisms are typically treated as having different purposes and have evolved differently. However, both concepts can be formulated as a nonlinear gating function. Inspired by their similarity, we propose a novel type of activation units called attentional activation (ATAC) units as a unification of activation functions and attention mechanisms. In particular, we propose a local channel attention module for the simultaneous non-linear activation and element-wise feature refinement, which locally aggregates point-wise cross-channel feature contexts. By replacing the well-known rectified linear units by such ATAC units in convolutional networks, we can construct fully attentional networks that perform significantly better with a modest number of additional parameters. We conducted detailed ablation studies on the ATAC units using several host networks with varying network depths to empirically verify the effectiveness and efficiency of the units. Furthermore, we compared the performance of the ATAC units against existing activation functions as well as other attention mechanisms on the CIFAR-10, CIFAR-100, and ImageNet datasets. Our experimental results show that networks constructed with the proposed ATAC units generally yield performance gains over their competitors given a comparable number of parameters.",
author = "Yimian Dai and Stefan Oehmcke and Fabian Gieseke and Yiquan Wu and Kobus Barnard",
year = "2020",
doi = "10.1109/ICPR48806.2021.9413020",
language = "English",
pages = "4131--4136",
booktitle = "Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition",
publisher = "IEEE",
note = "25th International Conference on Pattern Recognition, ICPR 2020 ; Conference date: 10-01-2021 through 15-01-2021",
}
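
The abstract's central claim, that activation functions and attention mechanisms are both instances of one non-linear gating function, can be written compactly. A minimal formulation consistent with the abstract (the symbols $A$ and $\sigma$ below are generic placeholders, not the paper's own notation):

\mathrm{ReLU}(x) = x \cdot \mathbb{1}[x > 0],
\qquad
\mathrm{ATAC}(x) = x \odot \sigma\big(A(x)\big),

where $\mathbb{1}[\cdot]$ is the indicator function, $\sigma$ is the logistic sigmoid, and $A(\cdot)$ is a learned attention map; in the ATAC case, $A$ aggregates point-wise cross-channel context locally rather than through global pooling.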

RIS

TY - GEN

T1 - Attention as activation

AU - Dai, Yimian

AU - Oehmcke, Stefan

AU - Gieseke, Fabian

AU - Wu, Yiquan

AU - Barnard, Kobus

PY - 2020

Y1 - 2020

N2 - Activation functions and attention mechanisms are typically treated as having different purposes and have evolved differently. However, both concepts can be formulated as a nonlinear gating function. Inspired by their similarity, we propose a novel type of activation units called attentional activation (ATAC) units as a unification of activation functions and attention mechanisms. In particular, we propose a local channel attention module for the simultaneous non-linear activation and element-wise feature refinement, which locally aggregates point-wise cross-channel feature contexts. By replacing the well-known rectified linear units by such ATAC units in convolutional networks, we can construct fully attentional networks that perform significantly better with a modest number of additional parameters. We conducted detailed ablation studies on the ATAC units using several host networks with varying network depths to empirically verify the effectiveness and efficiency of the units. Furthermore, we compared the performance of the ATAC units against existing activation functions as well as other attention mechanisms on the CIFAR-10, CIFAR-100, and ImageNet datasets. Our experimental results show that networks constructed with the proposed ATAC units generally yield performance gains over their competitors given a comparable number of parameters.

AB - Activation functions and attention mechanisms are typically treated as having different purposes and have evolved differently. However, both concepts can be formulated as a nonlinear gating function. Inspired by their similarity, we propose a novel type of activation units called attentional activation (ATAC) units as a unification of activation functions and attention mechanisms. In particular, we propose a local channel attention module for the simultaneous non-linear activation and element-wise feature refinement, which locally aggregates point-wise cross-channel feature contexts. By replacing the well-known rectified linear units by such ATAC units in convolutional networks, we can construct fully attentional networks that perform significantly better with a modest number of additional parameters. We conducted detailed ablation studies on the ATAC units using several host networks with varying network depths to empirically verify the effectiveness and efficiency of the units. Furthermore, we compared the performance of the ATAC units against existing activation functions as well as other attention mechanisms on the CIFAR-10, CIFAR-100, and ImageNet datasets. Our experimental results show that networks constructed with the proposed ATAC units generally yield performance gains over their competitors given a comparable number of parameters.

U2 - 10.1109/ICPR48806.2021.9413020

DO - 10.1109/ICPR48806.2021.9413020

M3 - Article in proceedings

AN - SCOPUS:85110521482

SP - 4131

EP - 4136

BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition

PB - IEEE

T2 - 25th International Conference on Pattern Recognition, ICPR 2020

Y2 - 10 January 2021 through 15 January 2021

ER -
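
For readers who want to experiment, below is a minimal PyTorch sketch of an ATAC-style unit as described in the abstract: a local channel attention module whose sigmoid output gates the input element-wise, acting as a drop-in replacement for ReLU. The layer layout (1x1 convolutions with batch normalization), the reduction ratio of 4, and the name ATACUnit are illustrative assumptions, not the authors' exact published configuration.

import torch
import torch.nn as nn

class ATACUnit(nn.Module):
    """ATAC-style attentional activation (sketch).

    Replaces a ReLU with a learned element-wise gate produced by a
    *local* channel attention module: point-wise (1x1) convolutions
    aggregate cross-channel context at every spatial position, with no
    global pooling, so the gate stays local as the abstract describes.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)  # reduction ratio is an assumption
        self.gate = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),  # gate values in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise gating: ReLU computes x * 1[x > 0]; ATAC instead
        # computes x * sigmoid(A(x)) with a learned, input-dependent A.
        return x * self.gate(x)

# Drop-in usage where nn.ReLU would normally follow a convolution:
atac = ATACUnit(channels=64)
y = atac(torch.randn(2, 64, 32, 32))
assert y.shape == (2, 64, 32, 32)

Replacing every nn.ReLU in a host network (e.g. a ResNet) with such units is how the "fully attentional networks" mentioned in the abstract would be assembled, at the cost of the extra point-wise convolution parameters.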

ID: 286998008