Submodular Importance Sampling for Neural Network Training

Singh, Krishna Kant and Balasubramanian, Vineeth N (2018) Submodular Importance Sampling for Neural Network Training. Masters thesis, Indian Institute of Technology Hyderabad.

Thesis_Mtech_CS_4163.pdf - Submitted Version
Restricted to Repository staff only until August 2019.

Abstract

Stochastic Gradient Descent (SGD) algorithms are the workhorse on which Deep Learning systems are built. The standard approach of uniform sampling in the SGD algorithm leads to high variance between the calculated gradient and the true gradient, consequently resulting in longer training times. Importance sampling methods sample mini-batches in a way that reduces this variance. Provable importance sampling techniques for variance reduction exist, but they generally do not fare well for Deep Learning models. Our work proposes sampling strategies that create diverse mini-batches, which consequently reduces the variance of the SGD algorithm. We pose the creation of such mini-batches as the maximization of a submodular objective function. The proposed submodular objective function samples mini-batches such that more uncertain and diverse sets of samples are selected with high probability. Submodular functions can be optimized easily using the GREEDY [1] algorithm, but even its newer variants suffer from performance issues when the dataset is large. We propose a new, faster submodular optimization method inspired by [2]. We prove theoretically that our sampling scheme reduces the variance of the SGD algorithm. We also show that Determinantal Point Process (DPP) sampling can be seen as a special case of our algorithm. We demonstrate the generality of our method by testing it on several deep learning datasets, including MNIST, FMNIST, and CIFAR-10. We study the effect of the learning rate, network architecture, and other factors on our proposed method, as well as how different features affect its performance. We also study transfer learning, with our algorithm used to select the dataset. In all experiments, we compare our algorithm with loss-based sampling and random sampling.
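
To illustrate the general idea described in the abstract, below is a minimal Python sketch of greedy mini-batch selection under a monotone submodular objective that trades off diversity against per-sample uncertainty. The specific objective (a facility-location diversity term plus a modular uncertainty term), the cosine-similarity feature kernel, and the weighting parameter `lam` are illustrative assumptions, not the thesis's exact formulation or its faster optimization method.

```python
# Hypothetical sketch: greedy maximization of
#   F(S) = sum_i max_{j in S} sim(i, j) + lam * sum_{j in S} uncertainty[j],
# a monotone submodular (facility location + modular) objective, so the
# standard GREEDY algorithm gives a (1 - 1/e) approximation guarantee.
import numpy as np

def greedy_minibatch(features, uncertainty, batch_size, lam=1.0):
    """Greedily select a diverse, uncertain mini-batch of `batch_size` indices."""
    n = features.shape[0]
    # Cosine similarity between all pairs of samples.
    normed = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = normed @ normed.T
    selected = []
    best_cover = np.zeros(n)  # max similarity of each point to the current set S
    remaining = set(range(n))
    for _ in range(batch_size):
        best_gain, best_j = -np.inf, None
        for j in remaining:
            # Marginal gain of adding j: improved coverage plus its uncertainty.
            gain = np.maximum(best_cover, sim[:, j]).sum() - best_cover.sum()
            gain += lam * uncertainty[j]
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        remaining.remove(best_j)
        best_cover = np.maximum(best_cover, sim[:, best_j])
    return selected

# Usage example with random features and per-sample losses as uncertainty.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))
losses = rng.random(100)
print(greedy_minibatch(feats, losses, batch_size=8))
```

In a training loop, `features` might be network activations and `uncertainty` the per-sample loss, so the selected batch favors high-loss samples that are dissimilar to one another; the naive greedy loop here is O(n * batch_size) per batch, which is exactly the scaling issue the thesis's faster optimization method targets.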

IITH Creators: Balasubramanian, Vineeth N (ORCiD: UNSPECIFIED)
Item Type: Thesis (Masters)
Uncontrolled Keywords: Machine Learning, Variance Reduction, Sampling
Subjects: Computer science
Divisions: Department of Computer Science & Engineering
Depositing User: Team Library
Date Deposited: 04 Jul 2018 05:06
Last Modified: 04 Jul 2018 05:06
URI: http://raiith.iith.ac.in/id/eprint/4163
Publisher URL:
Related URLs:
