From Recognition to Generation Using Deep Learning: A Case Study with Video Generation

Balasubramanian, Vineeth N (2018) From Recognition to Generation Using Deep Learning: A Case Study with Video Generation. Communications in Computer and Information Science, 844. pp. 25-36. ISSN 1865-0929

Full text not available from this repository.


This paper proposes two network architectures for generating videos from captions using Variational Autoencoders. We adopt a new perspective on video generation: we use attention and combine the captions with the long-term and short-term dependencies between video frames, generating each video incrementally. Our experiments demonstrate the architectures' ability to distinguish between objects, actions, and interactions in a video and to combine them to generate videos for unseen captions. Our second network also exhibits the capability to perform spatio-temporal style transfer when asked to generate videos for a sequence of captions. We further show that the network's learned latent representation allows it to generate videos in an unsupervised manner and to perform other tasks such as action recognition.
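The incremental, caption-conditioned generation described above can be illustrated with a minimal sketch of a conditional VAE sampling step. All dimensions, layer shapes, and function names below are illustrative assumptions, not details from the paper; the weights are random stand-ins for a trained model, and only the control flow (encode the previous frame with the caption, sample a latent via the reparameterization trick, decode the next frame) reflects the approach described:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions -- not taken from the paper.
CAPTION_DIM, FRAME_DIM, LATENT_DIM = 16, 64, 8

def linear(in_dim, out_dim):
    """Random affine layer (weights, bias) standing in for a trained one."""
    return rng.standard_normal((in_dim, out_dim)) * 0.1, np.zeros(out_dim)

# Encoder maps (previous frame, caption) to the latent mean and log-variance;
# decoder maps (latent sample, caption) back to the next frame.
W_mu, b_mu = linear(FRAME_DIM + CAPTION_DIM, LATENT_DIM)
W_lv, b_lv = linear(FRAME_DIM + CAPTION_DIM, LATENT_DIM)
W_dec, b_dec = linear(LATENT_DIM + CAPTION_DIM, FRAME_DIM)

def encode(prev_frame, caption):
    h = np.concatenate([prev_frame, caption])
    return h @ W_mu + b_mu, h @ W_lv + b_lv

def reparameterize(mu, log_var):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def decode(z, caption):
    return np.tanh(np.concatenate([z, caption]) @ W_dec + b_dec)

def generate_video(caption, num_frames=4):
    """Generate frames one at a time, conditioning each step on the
    previous frame (short-term dependency) and on the caption."""
    frames, prev = [], np.zeros(FRAME_DIM)
    for _ in range(num_frames):
        mu, log_var = encode(prev, caption)
        z = reparameterize(mu, log_var)
        prev = decode(z, caption)
        frames.append(prev)
    return np.stack(frames)

video = generate_video(rng.standard_normal(CAPTION_DIM))
print(video.shape)  # (num_frames, FRAME_DIM) -> (4, 64)
```

In a full model, the affine layers would be deep networks trained with the VAE objective (reconstruction loss plus a KL term), and attention over caption tokens would replace the simple concatenation used here.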

IITH Creators: Balasubramanian, Vineeth N (ORCiD: UNSPECIFIED)
Item Type: Article
Uncontrolled Keywords: Attention, Deep learning, Generative models, Variational Autoencoders, Video understanding
Subjects: Computer science
Divisions: Department of Computer Science & Engineering
Depositing User: Team Library
Date Deposited: 09 Oct 2018 09:43
Last Modified: 09 Oct 2018 09:43
Publisher URL:
OA policy:
Related URLs:

RAIITH ePrint: 4475