TY - GEN
T1 - PT-CapsNet
T2 - 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
AU - Pan, Chenbin
AU - Velipasalar, Senem
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
AB - Capsule Networks (CapsNets) create internal representations by parsing inputs into various instances at different resolution levels via a two-phase process: part-whole transformation and hierarchical component routing. Since both of these internal phases are computationally expensive, CapsNets have not found wider use. Existing variations of CapsNets mainly focus on performance comparison with the original CapsNet, and have not outperformed CNN-based models on complex tasks. To address the limitations of the existing CapsNet structures, we propose a novel Prediction-Tuning Capsule Network (PT-CapsNet), and also introduce fully connected PT-Capsules (FC-PT-Caps) and locally connected PT-Capsules (LC-PT-Caps). Different from existing CapsNet structures, our proposed model (i) allows the use of capsules for more difficult vision tasks and provides wider applicability; and (ii) provides performance better than or comparable to CNN-based baselines on these complex tasks. In our experiments, we show robustness to affine transformations, as well as the lightweight nature and scalability of PT-CapsNet, by constructing larger and deeper networks and performing comparisons on classification, semantic segmentation and object detection tasks. The results show consistent performance improvement and significant parameter reduction compared to various baseline models. Code is available at https://github.com/Christinepan881/PT-CapsNet.git.
UR - http://www.scopus.com/inward/record.url?scp=85123421846&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123421846&partnerID=8YFLogxK
U2 - 10.1109/ICCV48922.2021.01178
DO - 10.1109/ICCV48922.2021.01178
M3 - Conference contribution
AN - SCOPUS:85123421846
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 11976
EP - 11985
BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 October 2021 through 17 October 2021
ER -