Attentive Semantic Video Generation using Captions

Marwah, T and Mittal, G and Balasubramanian, Vineeth N (2017) Attentive Semantic Video Generation using Captions. arXiv. pp. 1-9.

Text (arXiv copy)
1708.05980.pdf - Accepted Version



This paper proposes a network architecture for variable-length semantic video generation using captions. We adopt a new perspective on video generation in which captions are combined with the long-term and short-term dependencies between video frames, allowing a video to be generated incrementally. Our experiments demonstrate the network's ability to distinguish between objects, actions, and interactions in a video and to combine them to generate videos for unseen captions. The network also exhibits the capability to perform spatio-temporal style transfer when asked to generate videos for a sequence of captions. We further show that the network's ability to learn a latent representation allows it to generate videos in an unsupervised manner and to perform other tasks such as action recognition.
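The abstract describes generating frames incrementally, with each step conditioned both on the caption and on the frames produced so far. The following is a minimal, purely illustrative sketch of that idea — randomly initialised parameters stand in for a trained model, and all dimensions, names, and the tanh recurrence are assumptions for illustration, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; not taken from the paper)
CAPTION_LEN, EMB_DIM, HID_DIM, FRAME_DIM = 5, 8, 16, 12

# Hypothetical word embeddings for a single caption
caption = rng.normal(size=(CAPTION_LEN, EMB_DIM))

# Randomly initialised parameters standing in for a trained model
W_att = rng.normal(size=(HID_DIM, EMB_DIM)) * 0.1   # attention scoring
W_h   = rng.normal(size=(HID_DIM, HID_DIM)) * 0.1   # short-term recurrence
W_c   = rng.normal(size=(EMB_DIM, HID_DIM)) * 0.1   # caption context -> state
W_out = rng.normal(size=(HID_DIM, FRAME_DIM)) * 0.1 # state -> frame

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generate(num_frames):
    """Incrementally generate `num_frames` frames, attending over the
    caption at every step so each frame depends on both the text and
    the frames generated so far (summarised by the hidden state h)."""
    h = np.zeros(HID_DIM)
    frames = []
    for _ in range(num_frames):
        # Attend over caption words given the current hidden state
        scores = caption @ W_att.T @ h        # one score per word
        alpha = softmax(scores)               # attention weights
        context = alpha @ caption             # caption context vector
        # Update the short-term state with the caption context
        h = np.tanh(W_h @ h + context @ W_c)
        frames.append(np.tanh(h @ W_out))     # decode a frame
    return np.stack(frames)

video = generate(num_frames=7)  # variable-length generation
print(video.shape)              # (7, 12): 7 frames of dimension 12
```

Because generation is a loop over a recurrent state rather than a fixed-size decoder, the same sketch produces videos of any requested length, which mirrors the variable-length property claimed in the abstract.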

IITH Creators: Balasubramanian, Vineeth N (ORCiD: UNSPECIFIED)
Item Type: Article
Uncontrolled Keywords: Computer Vision; Pattern Recognition
Subjects: Computer science > Computer programming, programs, data
Computer science > Special computer methods
Divisions: Department of Computer Science & Engineering
Depositing User: Team Library
Date Deposited: 28 Aug 2017 04:40
Last Modified: 25 Apr 2018 05:36
Publisher URL:
Related URLs:

RAIITH ePrint: 3467