Subjective and Objective Methods for Stereoscopic Video Quality Assessment

Appina, Balasubramanyam and Channappayya, Sumohana (2019) Subjective and Objective Methods for Stereoscopic Video Quality Assessment. PhD thesis, Indian Institute of Technology Hyderabad.

Thesis_Phd_EE_4778.pdf - Submitted Version
Restricted to Repository staff only until February 2022.



Stereoscopic 3D (3D or S3D) digital technology has received considerable attention because of its ability to render depth. Consequently, industries such as film, gaming, and education have invested significant research resources in 3D visualization. Developments and advancements in S3D technology have facilitated content creation, and these improvements have led to widespread consumer acceptance. S3D content refers to both stereoscopic images and videos; this thesis focuses exclusively on S3D videos. An S3D video comprises spatial, temporal, and depth components, along with the dependencies among these components. Like 2D digital video, 3D content undergoes several processing stages, such as sampling, quantization, synchronization, and visualization/rendering, during its creation and use. These stages can degrade the quality of the S3D video, which in turn results in a poor user experience. Subjective and objective quality assessment techniques provide a systematic framework for assessing perceptual quality. In subjective assessment, human subjects (observers) perform the quality assessment task, which is cumbersome and time-consuming for many video quality assessment (VQA) applications. Nevertheless, subjective assessment is essential: most content is intended for human consumption, so subjective scores serve as the benchmark for objective assessment algorithms. Objective assessment methods are typically classified as full-reference (FR), reduced-reference (RR), or no-reference (NR). FR QA methods use the complete pristine (reference) content, RR QA methods use partial information about the pristine content, and NR QA models use no information about the pristine content. NR QA models are further classified into supervised and unsupervised algorithms. This thesis presents both subjective studies and objective algorithms for stereoscopic VQA.
The subjective studies explore the effect of spatial, temporal, and depth distortions on perceived video quality. For objective assessment, the dependencies between the motion and depth (disparity) components of an S3D video are explored and used as primitive features for estimating S3D video quality. Beyond solutions to the S3D VQA problem, this thesis also contributes to the objective QA of super-multiview content with high-angular-resolution images. S3D video display is classified into anaglyph 3D (color-coded display) and polarized 3D display; the subjective studies in this thesis cover both. The dataset for the anaglyph 3D subjective study consists of 6 pristine and 144 distorted videos. We limit our attention to H.264 compression artifacts when generating the test stimuli. The reference video sequences contain a good mix of texture, motion, and depth information, and we divide them into two groups based on their depth content. Nineteen subjects participated in this subjective assessment task. Based on the study, we formulate a conditional relationship between the 2D and stereoscopic subjective scores as a function of compression rate and depth range. We call this database the LFOVIAPh1 S3D video database. In the polarized 3D subjective study, we conduct a subjective evaluation of full high-definition (full HD) stereoscopic video content. This study is comprehensive in terms of the variety of video content, the types of distortions considered, and the number of test stimuli used. Specifically, we consider 12 reference videos that cover a wide range of texture, motion, and depth. These reference videos are subjected to four distortions: H.264 compression, H.265 compression, blur, and a new temporal distortion called ‘frame freeze’.
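To make the ‘frame freeze’ distortion concrete, the Python sketch below holds one frame in place for a fixed number of frames. This is an assumption, not the thesis's procedure: the abstract does not state where freezes start, how long they last, or whether frozen frames replace content (as here, keeping the clip length unchanged) or are inserted as a stall that lengthens the clip. The function `apply_frame_freeze` and its parameters are hypothetical.

```python
import numpy as np


def apply_frame_freeze(frames: np.ndarray, start: int, duration: int) -> np.ndarray:
    """Simulate a frame freeze by holding frame `start` for `duration` frames.

    Hypothetical helper. Frozen frames overwrite the frames that would have
    been shown, so the clip length stays the same.
    frames: array of shape (T, H, W) or (T, H, W, C).
    """
    if not 0 <= start < len(frames):
        raise ValueError("start must index an existing frame")
    if duration < 1:
        raise ValueError("duration must be at least 1 frame")
    out = frames.copy()
    stop = min(start + duration, len(frames))  # clip the freeze at the end of the video
    out[start:stop] = frames[start]  # broadcast the held frame over the freeze interval
    return out


# Usage: a 10-frame toy "video" whose pixel value equals the frame index.
video = np.arange(10).reshape(10, 1, 1)
frozen = apply_frame_freeze(video, start=3, duration=4)
print(frozen.ravel().tolist())  # [0, 1, 2, 3, 3, 3, 3, 7, 8, 9]
```

For a stereo pair, applying the same call to both views would give a symmetric freeze, while applying it to one view only would give an asymmetric one.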
We generated a total of 288 symmetrically and asymmetrically distorted test stimuli by applying varying levels of these distortions to the pristine videos. Twenty subjects participated in this study. We call this database the LFOVIAPh2 S3D video database. For objective assessment, the thesis presents FR algorithms as well as supervised and unsupervised NR algorithms. For FR S3D VQA, we propose two algorithms: FLOSIM3D and DeMo3D. In FLOSIM3D, we exploit the separable representation of motion and binocular disparity in the visual cortex and develop a four-stage algorithm to measure S3D video quality. First, we compute temporal features using an existing 2D VQA metric that measures temporal annoyance from patch-level statistics (mean, variance, and minimum eigenvalue) and pools them with a nonlinear pooling strategy based on frame categorization. Second, a structure-based 2D image quality assessment (IQA) metric computes the spatial quality of the frames. Third, the loss in depth cues is measured using a structure-based metric. Finally, the features of both stereo views are pooled to obtain the final stereo video quality score. Because our algorithm extends the 2D VQA metric FLOSIM [1], we call it FLOSIM3D. The generalized Gaussian density (GGD) and the Gaussian scale mixture (GSM) density are two popular models of 2D natural scene statistics [2, 3] that have been widely employed in 2D IQA. Inspired by these approaches, our previous 3D IQA work [4] modeled the joint dependencies of luminance and depth subband coefficients using a bivariate GGD (BGGD), showed that the BGGD captures these dependencies well, and used the BGGD coefficients to estimate the quality of an S3D image.
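For reference, the Python sketch below evaluates one common parameterization of the BGGD used in the natural-scene-statistics literature: an elliptical density with scale α, shape β, and a 2 × 2 positive-definite matrix M (formula in the docstring). Whether the thesis uses exactly this parameterization is an assumption, and `bggd_pdf` is a hypothetical helper. Setting β = 1 recovers a bivariate Gaussian with covariance αM, which makes a convenient sanity check.

```python
import math

import numpy as np


def bggd_pdf(x, M, alpha: float, beta: float) -> np.ndarray:
    """Bivariate generalized Gaussian density (assumed parameterization).

    p(x) = beta * Gamma(N/2)
           / ((2**(1/beta) * pi * alpha)**(N/2) * Gamma(N/(2*beta)) * sqrt(det M))
           * exp(-0.5 * (x^T M^{-1} x / alpha)**beta),   with N = 2.

    x: array of shape (..., 2); M: 2x2 symmetric positive-definite matrix;
    alpha > 0: scale; beta > 0: shape (beta = 1 is Gaussian, beta < 1 heavier-tailed).
    """
    n = 2
    x = np.asarray(x, dtype=float)
    M = np.asarray(M, dtype=float)
    # Quadratic form x^T M^{-1} x for every point in the batch.
    q = np.sum((x @ np.linalg.inv(M)) * x, axis=-1)
    norm = (beta * math.gamma(n / 2)) / (
        (2 ** (1 / beta) * math.pi * alpha) ** (n / 2)
        * math.gamma(n / (2 * beta))
        * math.sqrt(np.linalg.det(M))
    )
    return norm * np.exp(-0.5 * (q / alpha) ** beta)


# Usage: with beta = 1 and M = I, the value at the origin equals 1 / (2 * pi).
print(bggd_pdf([0.0, 0.0], np.eye(2), alpha=1.0, beta=1.0))
```

The normalizing constant follows from integrating the radial profile in polar coordinates, so the density integrates to 1 for any β > 0.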
Motivated by these statistical studies, we extend the BGGD model to S3D video quality estimation. In this thesis, we propose FR and NR (supervised and unsupervised) QA algorithms for S3D videos that use the BGGD parameters as primitive features. In DeMo3D, we rely on an empirical model for the joint statistics of the motion and depth subband coefficients of an S3D video frame; specifically, we model these joint statistics with a BGGD. We compute coherence scores (Ψ) from the eigenvalues of the BGGD covariance matrix to estimate the amount of directional dependency between the motion and depth components, and we show that these scores discriminate between distortion types and levels. To estimate the overall spatial quality score, we apply off-the-shelf 2D FR IQA metrics frame by frame to both views and average the frame-wise scores. Finally, we pool the coherence and spatial quality scores to derive the overall quality of the S3D video. The resulting algorithm is called the Depth and Motion based 3D video quality evaluator (DeMo3D). The performance of the proposed algorithms is evaluated on popular S3D video databases, where they are shown to be robust and competitive with state-of-the-art QA algorithms. Performance is measured using the linear correlation coefficient (LCC), Spearman’s rank order correlation coefficient (SROCC), and root mean squared error (RMSE) between the difference mean opinion scores (DMOS) and the estimated objective quality scores. For NR S3D VQA, we propose a supervised algorithm, VQUEMODES, and an unsupervised algorithm, MoDi3D. Both are motivated by our earlier empirical finding that the statistical dependencies between motion and depth can be accurately modeled using a BGGD.
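The abstract does not give the formula for the coherence score Ψ. The Python sketch below assumes the structure-tensor-style definition Ψ = ((λ1 - λ2)/(λ1 + λ2))², where λ1 ≥ λ2 are the eigenvalues of the 2 × 2 covariance matrix of (motion, depth) coefficient pairs. It uses the sample covariance instead of a fitted BGGD matrix M; for an elliptical model such as the BGGD, the covariance is a scalar multiple of M, so the eigenvalue ratio, and hence this Ψ, is unchanged. The name `coherence_score` is hypothetical.

```python
import numpy as np


def coherence_score(motion_coeffs, depth_coeffs) -> float:
    """Directional-dependency (coherence) score for paired subband coefficients.

    Assumed definition: Psi = ((lam1 - lam2) / (lam1 + lam2))**2, with lam1 >= lam2
    the eigenvalues of the 2x2 covariance of (motion, depth) pairs.
    Psi is near 0 for an isotropic joint distribution and near 1 when the
    coefficients are concentrated along a single direction.
    """
    data = np.vstack([np.ravel(motion_coeffs), np.ravel(depth_coeffs)])
    cov = np.cov(data)  # 2x2 sample covariance (rows are variables)
    lam_small, lam_large = np.linalg.eigvalsh(cov)  # eigvalsh returns ascending order
    total = lam_large + lam_small
    if total == 0:
        return 0.0  # degenerate case: both coefficient sets are constant
    return float(((lam_large - lam_small) / total) ** 2)


# Usage: independent coefficients give a low score, strongly dependent ones a high score.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(20000), rng.standard_normal(20000)
print(coherence_score(a, b), coherence_score(a, 2 * a + 0.01 * b))
```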
VQUEMODES is a supervised NR S3D VQA algorithm. We demonstrate that the BGGD model parameters (α, β) can discern quality variations in S3D videos, and we therefore employ them as motion and depth quality features. In addition, we use a frame-level spatial quality feature computed with a robust off-the-shelf NR IQA algorithm. These frame-level motion, depth, and spatial features are combined and paired with the corresponding S3D video’s DMOS labels for supervised learning with support vector regression (SVR). The overall quality of an S3D video is computed by averaging the frame-level quality predictions of its constituent frames. The name VQUEMODES stands for Video QUality Evaluation using MOtion and DEpth Statistics. MoDi3D is an unsupervised (completely blind) NR S3D VQA algorithm. As in VQUEMODES, we model the joint statistical dependencies between motion and disparity components with a BGGD, and compute the BGGD parameters α and β together with the coherence measure Ψ, which is derived from the eigenvalues of the BGGD covariance matrix. We then model these features (α, β, and Ψ) of pristine S3D videos with a multivariate Gaussian (MVG) distribution. The likelihood that a test video’s model parameters arise from the pristine MVG model is computed and shown to play a key role in the overall quality estimation. We also estimate the global motion content of each video by averaging the SSIM scores between pairs of successive frames. To estimate the spatial quality of a test S3D video, we apply the popular unsupervised 2D NR IQA model NIQE frame by frame to both views. Finally, the overall quality of a test S3D video is computed by pooling its likelihood estimates, global motion strength, and spatial quality scores.
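The MVG step of MoDi3D can be sketched as follows: fit a mean vector and covariance matrix to feature vectors (for example, per-frame (α, β, Ψ) triples) extracted from pristine videos, then score a test video's features by their log-likelihood under that model. This is a minimal Python sketch under assumptions: the feature layout, the helper names `fit_mvg` and `mvg_log_likelihood`, and the choice of log-likelihood as the score are illustrative, and the pooling with global motion strength and NIQE scores is not shown.

```python
import numpy as np


def fit_mvg(features):
    """Fit a multivariate Gaussian to pristine feature vectors (rows are samples)."""
    features = np.asarray(features, dtype=float)
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)  # columns are variables
    return mu, sigma


def mvg_log_likelihood(x, mu, sigma):
    """Log-density of one or more feature vectors x under N(mu, sigma)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    d = mu.size
    diff = x - mu
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    # Mahalanobis terms via a linear solve instead of an explicit inverse.
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


# Usage with synthetic (alpha, beta, Psi)-like features; the values are made up.
rng = np.random.default_rng(1)
pristine = rng.normal([1.0, 0.8, 0.3], [0.1, 0.05, 0.02], size=(500, 3))
mu, sigma = fit_mvg(pristine)
print(mvg_log_likelihood([[1.0, 0.8, 0.3], [1.5, 0.5, 0.6]], mu, sigma))
```

A lower log-likelihood indicates features that lie farther from the pristine model, which is the intuition behind using this quantity as a quality cue.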
The proposed algorithm, which is unsupervised (or ‘completely blind,’ requiring neither reference videos nor training on subjective scores), is called the Motion and Disparity based 3D video quality evaluator (MoDi3D). Both proposed NR S3D VQA algorithms perform robustly on popular databases and are competitive with state-of-the-art FR and NR algorithms. This thesis also contributes FR QA for super-multiview content with high-angular-resolution images. Super-multiview content combines spatial and depth information at a given 3D view. Because multiview 3D content is perceived differently from regular 2D content, existing 2D FR IQA algorithms do not perform robustly on these images. To fill this gap, we propose an FR objective quality metric. For every 3D view, the metric combines spatial information from each constituent image with angular information (depth cues) from consecutive images. We show that the proposed metric correlates significantly with subjective scores and outperforms existing 2D metrics. The effectiveness of pooling spatial and angular information highlights the crucial role that angular information plays in 3D perception.

IITH Creators:
Channappayya, Sumohana (ORCiD: UNSPECIFIED)
Item Type: Thesis (PhD)
Uncontrolled Keywords: Subjective, Objective, Assessment, Algorithms
Subjects: Electrical Engineering
Divisions: Department of Electrical Engineering
Depositing User: Team Library
Date Deposited: 31 Jan 2019 06:46
Last Modified: 04 Feb 2019 04:24