Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning

Chandhok, Shivam and Balasubramanian, Vineeth N (2021) Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning. In: 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021, 5 January 2021 through 9 January 2021, Virtual, Online.

Text
WACV_2021.pdf - Published Version
Available under License Creative Commons Attribution.

Download (1MB)


The performance of generative zero-shot methods depends mainly on the quality of the generated features and on how well the model facilitates knowledge transfer between the visual and semantic domains. The quality of the generated features is a direct consequence of the model's ability to capture the several modes of the underlying data distribution. To address these issues, we propose a two-level joint maximization scheme that augments the generative network with an inference network during training, which helps our model capture the several modes of the data and generate features that better represent the underlying data distribution. This provides strong cross-modal interaction for effective transfer of knowledge between the visual and semantic domains. Furthermore, existing methods train the zero-shot classifier either on generated synthetic image features or on latent embeddings produced by representation learning. In this work, we unify these paradigms in a single model which, in addition to synthesizing image features, also uses the representation learning capabilities of the inference network to provide discriminative features for the final zero-shot recognition task. We evaluate our approach on four benchmark datasets (CUB, FLO, AWA1 and AWA2) against several state-of-the-art methods and report its performance. We also perform ablation studies to analyze our method more closely on the Generalized Zero-shot Learning task. © 2021 IEEE.
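The unified pipeline described in the abstract can be illustrated with a minimal data-flow sketch. It is not the authors' implementation: the layer sizes, the random single-layer "networks", the names `synthesize` and `classifier_input`, and the choice to concatenate the synthesized feature with the inference network's intermediate representation are all assumptions made for illustration. Training, including the two-level adversarial objective, is omitted entirely.

```python
import random

random.seed(0)

# Toy dimensions chosen for illustration only; none come from the paper.
ATTR_DIM, NOISE_DIM, FEAT_DIM, HIDDEN_DIM = 4, 3, 6, 5


def linear(in_dim, out_dim):
    """Random (out_dim x in_dim) weights standing in for a trained layer."""
    return [[random.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, vec):
    """Matrix-vector product followed by a ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in weights]


# Hypothetical generator: (class attributes, noise) -> synthetic visual feature.
G = linear(ATTR_DIM + NOISE_DIM, FEAT_DIM)
# Hypothetical intermediate layer of the inference network: visual feature -> latent.
E_hidden = linear(FEAT_DIM, HIDDEN_DIM)


def synthesize(attributes):
    """Generate one synthetic visual feature for a class described by `attributes`."""
    noise = [random.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    return apply(G, attributes + noise)


def classifier_input(feature):
    """Pair a visual feature with its latent representation (concatenation is assumed)."""
    latent = apply(E_hidden, feature)
    return feature + latent


unseen_attributes = [1.0, 0.0, 0.5, 0.2]  # made-up attribute vector for an unseen class
x_fake = synthesize(unseen_attributes)
clf_in = classifier_input(x_fake)
print(len(x_fake), len(clf_in))  # prints: 6 11
```

In this sketch the zero-shot classifier would be trained on vectors such as `clf_in`, so it sees both the synthesized feature and the inference network's representation of it, which is one way to read the "unify these paradigms" statement above.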

IITH Creators: Balasubramanian, Vineeth N
Item Type: Conference or Workshop Item (Paper)
Additional Information: In order to study the usefulness of latent feature representations from intermediate layers of the inference network, we plot the representation before the softmax activation layer of the zero-shot classifier (recognition module) of our method and of f-VAEGAN-D2, a recent state-of-the-art method, for the FLO dataset in Fig. 2e. We visualize the representations for the unseen classes (20 classes), since a plot of the seen classes (82 classes) would be cluttered by their high number. Notice that the clusters for our method (right subfigure) are more compact than those of f-VAEGAN-D2 (left subfigure) for almost all classes. The clusters in f-VAEGAN-D2 show features from one class potentially leaking into other classes, which can result in misclassification. This is, however, improved in our approach.
6. Conclusions. In this work, we propose a unified approach for the generalized zero-shot learning problem that uses a two-level adversarial learning strategy for tight visual-semantic coupling. We use adversarial learning at the level of the individual generative and inference modules, and we also use a separate joint maximization constraint across the two modules. In addition, we show that using the latent representation of intermediate layers of the inference network improves recognition performance. This helps our model unify existing latent representation and generative approaches in a single pipeline. Our contributions in this framework enable us to better capture the several modes of the data distribution and improve GZSL performance by providing stronger visual-semantic coupling. We conduct extensive experiments on four benchmark datasets and demonstrate the value of the proposed method across these fine-grained and coarse-grained datasets. Our future work will include exploring other ways of performing the joint maximization, as well as considering alignments beyond Wasserstein alignment to improve GZSL performance.
7. Acknowledgement. This work is partly supported by funding from DST through the IMPRINT program (IMP/2019/000250). We are grateful to the Govt of India and Intel India for the support. We thank the Japan International Cooperation Agency and IIT-Hyderabad for the provision of GPU servers, and the anonymous reviewers for the valuable feedback.
Uncontrolled Keywords: Cross-modal interaction; Data distribution; Image features; Inference network; Knowledge transfer; Performance; Semantic couplings; Semantic domains; Transfer of knowledge; Visual semantics
Subjects: Computer science
Divisions: Department of Computer Science & Engineering
Depositing User: LibTrainee 2021
Date Deposited: 06 Oct 2022 13:20
Last Modified: 06 Oct 2022 13:20

Statistics for RAIITH ePrint 10824