Panos Nasiopoulos

Professor


Graduate Student Supervision

Doctoral Student Supervision

Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.

Light field spatial and angular super-resolution (2023)

Light field (LF) technology offers a truly immersive experience, having the potential to revolutionize entertainment, education, virtual and augmented reality, autonomous driving, and digital health. There are two common techniques for capturing LF content: multiple-camera arrays and microlens arrays (i.e., plenoptic cameras). However, LF capturing techniques face a trade-off between spatial and angular resolution. That is, camera arrays capture high-spatial-resolution LF images with sparse angular sampling (i.e., fewer views), while plenoptic cameras capture dense angular sampling (i.e., more views) with low-spatial-resolution LF images due to the size of the plenoptic camera sensors. Since constructing an LF camera array is expensive, time-consuming, and often impractical, many efforts have been made to improve the spatial resolution of LF images captured by plenoptic cameras. In this thesis, we propose two novel methods for LF spatial super-resolution (SR). First, we propose a learning-based model for spatial SR which takes advantage of the epipolar plane image (EPI) information to ensure smooth disparity between the generated views and in turn construct high-spatial-resolution LF images. In our second contribution, we exploit the full four-dimensional (4D) LF images by proposing a deep-learning spatial SR approach that considers the spatial and angular information (i.e., information within each view and information among other views) and progressively reconstructs high-resolution LF images at different upscaling levels. Another challenge when dealing with LF images is the enormous amount of data generated, as they require a significant increase in bandwidth. A possible solution is to drop specific views at the transmitting end and effectively synthesize them at the receiver end, thus minimizing the amount of data that needs to be transferred or stored. Accordingly, our third contribution focuses on LF angular SR by synthesizing virtual LF views from a sparse set of input views using two novel approaches. First, a deep recursive residual network is applied using the EPI information to generate one in-between view. Second, a generative adversarial network approach is proposed, which generates up to five in-between views, using LF spatial and angular information for efficient angular SR with minimal impact on the visual quality of the generated LF content.
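
To make the EPI idea concrete, here is a minimal sketch (in Python, with illustrative array sizes and names, not the thesis's networks) of how a 4D light field is indexed and how an epipolar-plane image is extracted from it:

```python
import numpy as np

V, U, Y, X = 7, 7, 434, 625          # angular (V, U) and spatial (Y, X) sizes
lf = np.random.rand(V, U, Y, X, 3)   # stand-in for a captured light field

# A horizontal EPI: fix the vertical view index v0 and image row y0, then
# stack that row across all horizontal views. Scene depth shows up as the
# slope of lines in this (U, X) slice, which is why EPI structure can be
# used to keep disparity consistent across synthesized views.
v0, y0 = V // 2, Y // 2
epi = lf[v0, :, y0, :, :]            # shape (U, X, 3)

# Spatial SR upscales (Y, X) within every view; angular SR adds new entries
# along U (or V), i.e., synthesizes in-between views.
print(epi.shape)                     # (7, 625, 3)
```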

Video-based human fall detection in indoor spaces for health monitoring (2023)

The full abstract for this thesis is available in the body of the thesis, and will be available when the embargo expires.

Improving the perceived high dynamic range uniformity across viewing environments (2022)

High Dynamic Range (HDR) technology drastically improves the visual quality of image and video content. However, with increased promise come higher expectations, as HDR content is expected to look the same across various viewing environments. This is typically achieved by content re-targeting algorithms, which aim to compensate for the changes in human perception between the different environments. The challenge of uniform presentation across all viewing environments has never been addressed in the past, as the limitations of the Standard Dynamic Range (SDR) technology prohibited such a possibility. In this thesis, we propose methods and metrics to improve the uniform presentation of HDR content across various viewing environments. First, we propose two brightness quantification metrics that are specifically designed for HDR technology, as the current metrics used during content delivery are designed for SDR and thus have limited accuracy. Our first metric is based on the properties of the Human Visual System (HVS) and content contrast, while our second metric utilizes pixel color intensity and spatial location. Both proposed metrics outperform the state-of-the-art solutions on brightness quantification. In our second contribution, we assess viewers' tolerance to HDR high luminance values in dim cinema environments. The findings of this work form guidelines submitted to the committee responsible for the standardization of HDR cinema display technologies. Our third contribution addresses the perceived color inconsistencies between the two upcoming cinema HDR projection technologies, RGB laser and laser phosphor. The Spectral Power Distribution (SPD) differences of the two projector types, along with the outdated colorimetry standard, lead to metameric failure, a phenomenon where measurably identical colors do not visually match. In this work, we analyze in depth the observed metameric failure by designing and conducting two new subjective experiments and propose a method to reduce the metameric effect. Our final contribution is a new tone mapping algorithm that ensures high visual fidelity between the original and processed (tone-mapped) versions. Our method is a lightweight Generative Adversarial Network that efficiently adapts to various scenes and delivers the same high-quality visual results across them.
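
For context, the sketch below shows a naive brightness measure of the kind such metrics improve upon: it decodes PQ-encoded (SMPTE ST 2084) luma to display luminance and averages it. The measure itself, the frame data, and the function names are illustrative, not the thesis's proposals:

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def pq_to_nits(e):
    """ST 2084 EOTF: map non-linear PQ code values in [0, 1] to cd/m^2."""
    p = np.power(np.clip(e, 0.0, 1.0), 1.0 / M2)
    return 10000.0 * np.power(np.maximum(p - C1, 0.0) / (C2 - C3 * p), 1.0 / M1)

def mean_brightness_nits(pq_luma):
    """Toy baseline: average displayed luminance, ignoring HVS, contrast,
    and pixel location, which is exactly what the proposed metrics model."""
    return float(np.mean(pq_to_nits(pq_luma)))

frame = np.random.rand(1080, 1920)   # stand-in PQ-encoded luma plane
print(f"{mean_brightness_nits(frame):.1f} cd/m^2")
```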

Improving perceptual quality of high dynamic range video (2020)

With the real-life viewing experience of High Dynamic Range (HDR) videos and the growing availability of HDR displays and video content, an efficient HDR video delivery pipeline is required for applications such as broadcasting. The existing pipeline has been designed for Standard Dynamic Range (SDR) signals and displays. Using this pipeline for HDR content will result in visible quality degradation, as HDR bears fundamental differences with SDR technology, such as higher brightness levels and a wider color gamut (WCG). As a result, various HDR delivery pipelines are under development, supporting varying bitrates and visual quality. In this thesis, we improve the visual quality and hence quality of experience (QoE) of delivered HDR videos without increasing the bitrate. First, we investigate the existing transmission pipelines' efficiency in delivering HDR through an extensive set of subjective experiments. The unprecedented analysis of each pipeline presented in this work, while considering their backward compatibility with SDR displays, provides valuable information for broadcasters to identify the most efficient pipeline in terms of required bitrate and visual quality for viewers. Next, we evaluate the effect that the identified HDR delivery pipeline has on color accuracy. These evaluations help determine the colors that need improvement. By considering certain characteristics of the human visual system (HVS), we propose two processing techniques that improve the perceptual fidelity of these colors. The proposed techniques are shown to outperform the existing methods in terms of maintaining the color information of HDR signals, first subjectively through a set of visual evaluations and second objectively by using color difference evaluation metrics. Additionally, for cases where delivered HDR signals are received by an SDR display, we propose two novel color mapping methods that result in the least perceptual color differences compared to the original HDR signal. The proposed color mapping techniques are compatible with the current pipeline infrastructure with minimal implementation cost. The work presented in this thesis improves the visual quality of transmitted HDR videos, either viewed directly on HDR displays or through a mapping process on SDR displays, while the transmission bitrate is not affected.

Inverse tone mapping of standard dynamic range content for high dynamic range applications (2020)

High Dynamic Range (HDR) technology has revolutionized the field of digital media, affecting different aspects such as capturing, compression, transmission and display. By modeling the behavior of the Human Visual System (HVS) when perceiving brightness and color, HDR technology offers a life-like viewing experience that is far superior to what Standard Dynamic Range (SDR) technology could achieve. While HDR technology has a disruptive impact in different fields, it also opens new revenue sources for SDR content owners and broadcasters that will continue producing real-time events in SDR format for the near future. For the latter case, SDR content needs to be efficiently converted to HDR format, taking advantage of the superior visual quality of HDR displays. Over the years, several attempts have aimed at converting SDR content to HDR format, a process known as inverse Tone Mapping (iTM). The design of inverse Tone Mapping Operators (iTMOs) is considered a difficult task, as it tries to expand the brightness and color information to ranges not originally captured by SDR cameras. In this thesis, we propose novel iTMOs that can effectively deal with all types of SDR content, from dark to normal and bright scenes, producing high-visual-quality HDR content. Our proposed methods work in the perceptual domain, which allows us to take advantage of the sensitivity of the human eye to brightness changes in different areas of the scene during the mapping process. To preserve the overall artistic impression, we developed methods that divide the SDR frame into dark, normal (average brightness), and bright regions, allowing us to keep dark and bright areas intact, without darkening or brightening up the frame. We also address the issue of color shift in SDR-to-HDR mapping by proposing a perception-based color adjustment method that preserves the hue of colors with insignificant changes in brightness, producing HDR colors that are faithful to their SDR counterparts. Subjective and objective evaluations have shown that our proposed iTMOs outperform the state-of-the-art methods in terms of the overall visual quality of the generated HDR video and in generating HDR colors that closely follow their SDR counterparts.
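
A minimal sketch of the region-splitting idea, assuming percentile thresholds and a toy expansion function (the thesis's actual operators work in the perceptual domain and are considerably more elaborate):

```python
import numpy as np

def split_regions(luma, dark_pct=20, bright_pct=80):
    """Label each pixel of a normalized SDR luma frame as dark/normal/bright."""
    lo, hi = np.percentile(luma, [dark_pct, bright_pct])
    dark, bright = luma < lo, luma > hi
    return dark, ~(dark | bright), bright

def toy_expand(luma, peak_sdr=100.0, boost=4.0, gamma=2.4):
    """Expand only the mid-tones, leaving dark and bright areas at their
    original linearized levels, echoing the stated goal of preserving
    artistic intent at the extremes (a real operator would additionally
    keep the overall mapping monotone)."""
    linear = peak_sdr * np.power(luma, gamma)   # approximate SDR display light
    dark, normal, bright = split_regions(luma)
    out = linear.copy()
    out[normal] *= boost                        # illustrative expansion only
    return out

hdr_nits = toy_expand(np.random.rand(720, 1280))
```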

Complexity Reduction Schemes for Video Compression (2017)

With consumers having access to a plethora of video-enabled devices, efficient transmission of video content with different quality levels and specifications has become essential. The primary way of achieving this task is the simulcast approach, where different versions of the same video sequence are encoded and transmitted separately. This approach, however, requires significantly large amounts of bandwidth. Another solution is to use Scalable Video Coding (SVC), where a single bitstream consists of a base layer (BL) and one or more enhancement layers (ELs). At the decoder side, based on bandwidth or type of application, the appropriate part of an SVC bitstream is used/decoded. While SVC enables delivery of different versions of the same video content within one bitstream at a reduced bitrate compared to the simulcast approach, it significantly increases coding complexity. However, the redundancies introduced between the different versions of the same stream allow for complexity reduction, which in turn will result in simpler hardware and software implementation and facilitate the wide adoption of SVC. This thesis addresses complexity reduction for spatial scalability, SNR/quality/fidelity scalability, and multiview scalability for the High Efficiency Video Coding (HEVC) standard. First, we propose a fast method for motion estimation in spatial scalability, followed by a probabilistic method for predicting block partitioning for the same scalability. Next, we propose a content-adaptive complexity reduction method, a mode prediction approach based on statistical studies, and a Bayesian mode prediction method, all for quality scalability. An online-learning-based mode prediction method is also proposed for quality scalability. For the same bitrate and quality, our methods reduce the coding complexity of the original SVC approach by 39% for spatial scalability and by 45% for quality scalability. Finally, we propose a content-adaptive complexity reduction scheme and a Bayesian mode prediction scheme. Then, an online-learning-based complexity reduction scheme is proposed for 3D scalability, which incorporates the two other schemes. Results show that our methods reduce the complexity by approximately 23% compared to the original 3D approach for the same quality/bitrate. In summary, our methods can significantly reduce the complexity of SVC, enabling its market adoption.

Compression efficiency improvement for 2D and 3D video (2017)

Advances in video compression technologies have resulted in high visual quality at constrained amounts of bitrate. This is crucial in video transmission and storage, considering the limited bandwidth of communication channels and storage media with limited capacities. In this thesis, we propose new methods for improving the compression efficiency of High Efficiency Video Coding (HEVC) and its 3D extension for stereo and multiview video content. To achieve high video quality while keeping the bitrate within certain constraints, the characteristics of the human visual system (HVS) play an important role. The utilization of video quality metrics that are based on the human visual system, and their integration within the video encoder, can improve compression efficiency. We, therefore, propose to measure the distortion using a perceptual video quality metric (instead of the sum of squared errors) inside the coding unit structure and for mode selection in the rate-distortion optimization process of HEVC. Experiments show that our method improves HEVC compression efficiency by 10.21%. Next, we adjust the trade-off between the perceptual distortion and the bitrate based on the characteristics of the video content. The value of the Lagrange multiplier is estimated from the first frame of every scene in the video. Experimental results show that the proposed approach further improves the compression efficiency of HEVC (up to 2.62%, with an average of 0.60%). Furthermore, we extend our work to address the HEVC extension for 3D video. First, we integrate the perceptual video quality metric in the rate-distortion optimization process of stereo video coding, where the dependencies between the two views are exploited to improve coding efficiency. Next, we extend our approach to multiview video coding for auto-stereoscopic displays (where 3D content can be viewed without using 3D glasses). In this case, two or three views and their corresponding depth maps need to be coded. Our proposed perceptual 3D video coding increases the compression efficiency of multiview video coding by 2.78%. Finally, we show that the compression efficiency of stereoscopic videos improves if we take advantage of asymmetric video coding. The proposed approach reduces the amount of bitrate required for transmitting stereoscopic video while maintaining the stereoscopic quality.
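
The change can be summarized in a few lines: encoder mode decision minimizes the Lagrangian cost J = D + lambda * R, and the proposal replaces the usual sum-of-squared-errors D with a perceptual score. This sketch (names and callables are placeholders, not HEVC reference code) shows where the metric plugs in:

```python
def rd_select(candidates, lagrange_mult, perceptual_distortion):
    """Pick the coding mode minimizing J = D + lambda * R.

    candidates: iterable of (mode, reconstructed_block, bits).
    perceptual_distortion: callable scoring a reconstruction (lower is
    better); it stands in for the HVS-based metric replacing SSE.
    """
    best_mode, best_cost = None, float("inf")
    for mode, recon, bits in candidates:
        cost = perceptual_distortion(recon) + lagrange_mult * bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```

The second contribution then amounts to choosing lagrange_mult per scene, estimated from its first frame, instead of using a fixed QP-derived value.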

Energy efficient video sensor networks for surveillance applications (2016)

Video sensor networks (VSNs) provide rich sensing information and coverage, both beneficial for applications requiring visual information such as smart homes, traffic control, healthcare systems and monitoring/surveillance systems. Since a VSN-based surveillance application is usually assumed to have limited resources, energy efficiency has become one of the most important design aspects of such networks. However, unlike common sensor network platforms, where power consumption mostly comes from the wireless transmission, the encoding process in a video sensor network contributes a significant portion of the overall power consumption. There is a trade-off between encoding complexity and bitrate in the sense that in order to increase compression performance, i.e., achieve a lower bitrate, a more complex encoding process is necessary. The coding complexity and video bitrate determine the overall encoding and transmission power consumption of a VSN. Thus, choosing the right configuration and setting parameters that lead to optimal encoding performance is of primary importance for controlling power consumption in VSNs. The coding complexity and bitrate also depend on the video content complexity, as spatial details and high motion tend to lead to higher computation costs or increased bitrates. In a video surveillance network, each node captures an event from a different point of view, such that each captured video stream has unique spatial and temporal information. This thesis investigates the trade-off between encoding complexity and communication power consumption in a video surveillance network, where the effects of video encoding parameters, content complexity, and network topology are taken into consideration. In order to take into account the effect of content complexity, we create a video surveillance dataset consisting of a large number of captured videos with different levels of spatial information and motion. Then, we design an algorithm that minimizes the video surveillance network's power consumption for different scene settings. We propose models that estimate the coding complexity and bitrate. Finally, these models are used to minimize the video surveillance network's power consumption and to estimate the encoding parameters for each node that yield the minimum possible power consumption for the entire network.
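
The final selection step can be pictured with a small sketch: for each node, evaluate candidate encoder settings under fitted complexity and rate models and keep the one with the lowest estimated power. Everything here (names, the linear energy weights, the toy numbers) is an assumed simplification of the thesis's models:

```python
def pick_config(configs, complexity_model, rate_model, e_cycle, e_bit):
    """configs: candidate encoder parameter sets for one node.
    complexity_model / rate_model: fitted callables estimating encoding
    cycles and output bits for a config on this node's content.
    e_cycle, e_bit: energy spent per CPU cycle and per transmitted bit."""
    def power(cfg):
        return e_cycle * complexity_model(cfg) + e_bit * rate_model(cfg)
    return min(configs, key=power)

# Toy usage: three candidate settings trading complexity against bitrate.
best = pick_config(
    configs=["fast", "medium", "slow"],
    complexity_model=lambda c: {"fast": 1e9, "medium": 3e9, "slow": 9e9}[c],
    rate_model=lambda c: {"fast": 4e6, "medium": 2e6, "slow": 1e6}[c],
    e_cycle=1e-9, e_bit=2e-7)
print(best)
```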

3D Video Quality Assessment (2015)

A key factor in designing 3D systems is to understand how different visual cues and distortions affect the perceptual quality of 3D video. The ultimate way to assess video quality is through subjective tests. However, subjective evaluation is time consuming, expensive, and in most cases not even possible. An alternative solution is objective quality metrics, which attempt to model the Human Visual System (HVS) in order to assess perceptual quality. The potential of 3D technology to significantly improve the immersiveness of video content has been hampered by the difficulty of objectively assessing Quality of Experience (QoE). A no-reference (NR) objective 3D quality metric, which could help determine capturing parameters and improve playback perceptual quality, would be welcomed by camera and display manufacturers. Network providers would embrace a full-reference (FR) 3D quality metric, as they could use it to ensure efficient QoE-based resource management during compression and Quality of Service (QoS) during transmission. In this thesis, we investigate the objective quality assessment of stereoscopic 3D video. First, we propose a full-reference Human-Visual-System-based 3D (HV3D) video quality metric, which efficiently takes into account the fusion of the two views as well as depth map quality. Subjective experiments verified the performance of the proposed method. Next, we investigate the no-reference quality assessment of stereoscopic video. To this end, we investigate the importance of various visual saliency attributes in 3D video. Based on the results gathered from our study, we design a learning-based visual saliency prediction model for 3D video. Eye-tracking experiments helped verify the performance of the proposed 3D Visual Attention Model (VAM). A benchmark dataset containing 61 captured stereo videos, their eye fixation data, and performance evaluations of 50 state-of-the-art VAMs is created and made publicly available online. Finally, we incorporate the saliency maps generated by our 3D VAM in the design of state-of-the-art no-reference (NR) and full-reference (FR) 3D quality metrics.

Capturing and post-processing of stereoscopic 3D content for improved quality of experience (2013)

3D video can offer a real-life viewing experience by providing depth impression. However, 3D technology has not yet been widely adopted due to challenging 3D-related issues, ranging from capturing to post-processing and display. At the capturing side, lack of guidelines may lead to artifacts that cause viewers headaches and nausea. At the display side, not having 3D content customized to a certain aspect ratio, display size, or display technology may result in reduced quality of experience. Combining 3D with high-dynamic-range imaging technology adds exciting features towards a real-life experience, whereas conventional low-dynamic-range content often suffers from color saturation distortion when shown on high-dynamic-range displays. This thesis addresses three important issues in capturing and post-processing 3D content to achieve improved quality of experience. First, we provide guidelines for capturing and displaying 3D content. We build a 3D image and video database with content captured at various distances from the camera lenses and under different lighting conditions. We conduct comprehensive subjective tests on 3D displays of different sizes to determine the influence of these parameters on the quality of 3D images and videos before and after horizontal parallax adjustment. Next, we propose a novel and complete pipeline for automatic content-aware 3D video reframing. We develop a bottom-up 3D visual attention model that identifies the prominent regions in a 3D video frame. We further provide a dynamic bounding box that crops the video and avoids annoying problems, such as jittering and window violation. Experimental results show that our algorithm is both effective and robust. Finally, we propose two algorithms for correcting saturation in color images and videos. One algorithm uses a fast Bayesian approach that utilizes images' strong spatial correlation and the correlations between the R, G, and B color channels. The other algorithm takes advantage of the strong correlation between the chroma of the saturated pixels and their surrounding unsaturated pixels. Experimental results show that our methods effectively correct saturated 2D and 3D images and videos. Our algorithms significantly outperform the existing state-of-the-art method in both objective and subjective quality, resulting in plausible content that resembles real-world scenes.

Correcting capturing and display distortions in 3D video (2012)

3D video systems provide a sense of depth by showing slightly different images to the viewer's left and right eyes. 3D video is usually generated by capturing a scene with two or more cameras, and 3D displays need to be able to concurrently display at least two different images. The use of multiple cameras and multiple display channels creates problems that are not present in 2D video systems. At the capturing side, there can be inconsistencies in the videos captured with the different cameras; for example, the videos may differ in brightness, colour, sharpness, etc. At the display side, crosstalk is a major problem. Crosstalk is an effect where there is incomplete separation of the images intended for the two eyes, so the left eye sees a portion of the image intended for the right eye and vice versa. In this thesis, we develop methods for correcting these capturing and display distortions in 3D video systems through new digital video processing algorithms. First, we propose a new method for correcting the colour of multiview video sets. Our method modifies the colour of all the input videos to match the average colour of the original set of views. Experiments show that applying our method greatly improves the efficiency of multiview video coding. We present a modification of our colour correction algorithm which also corrects vignetting (darkening of an image near its corners), which is useful when images are stitched together into a panorama. Next, we present a method for making stereo images match in sharpness, based on scaling the discrete cosine transform coefficients of the images. Experiments show that our method can greatly increase the accuracy of depth maps estimated from two images that differ in sharpness, which is useful in 3D systems that use view rendering. Finally, we present a new algorithm for crosstalk compensation in 3D displays. Our algorithm selectively adds local patches of light to regions that suffer from visible crosstalk, while considering temporal consistency to prevent flickering. Results show our method greatly reduces the appearance of crosstalk, while preserving image contrast.
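
Crosstalk has a standard additive model, sketched below with an assumed leakage ratio; the naive subtractive fix clips to zero in dark regions, which is precisely why the thesis adds local patches of light instead:

```python
import numpy as np

ALPHA = 0.05   # assumed fraction of the unintended view that leaks through

def perceived(left, right, a=ALPHA):
    """Additive crosstalk model: each eye sees its view plus leakage."""
    return left + a * right, right + a * left

def naive_cancel(left, right, a=ALPHA):
    """Subtractive compensation inverts the model exactly, but the clip at
    zero fails in dark regions where negative light would be required;
    that failure motivates raising the local black level with patches of
    light instead."""
    comp_l = np.clip((left - a * right) / (1 - a * a), 0.0, 1.0)
    comp_r = np.clip((right - a * left) / (1 - a * a), 0.0, 1.0)
    return comp_l, comp_r
```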

Tone-mapping high dynamic range images and videos for bit-depth scalable coding and 3D displaying (2012)

High dynamic range (HDR) images and videos provide superior picture quality by allowing a larger range of brightness levels to be captured and reproduced than their traditional 8-bit low dynamic range (LDR) counterparts. Even with existing 8-bit displays, picture quality can be significantly improved if the content is first captured in HDR format and then converted to LDR format. This conversion process is called tone-mapping. In this thesis, we address different aspects of tone-mapping. HDR video formats are unlikely to be broadly accepted without backward compatibility with LDR devices. We first consider the case where only the tone-mapped LDR content is transmitted and the HDR video is reconstructed at the receiver by inversely tone-mapping the encoded-decoded LDR video. We show that the appropriate choice of a tone-mapping operator can result in a reconstructed HDR video with good quality. We develop a statistical model of the distortion resulting from tone-mapping, compressing, de-compressing and inverse tone-mapping the HDR video. This model is used to formulate an optimization problem that finds the tone-curve that minimizes the distortion in the reconstructed HDR video. We also derive a simplified version of the model that leads to a closed-form solution for the optimization problem. Next, we consider the case where the HDR content is transmitted using an LDR layer and an enhancement layer. We formulate an optimization problem that minimizes the transmitted bit-rate of a video sequence and also results in a tone-mapped video that satisfies some desired perceptual appearance. The problem formulation also contains a constraint that suppresses temporal flickering artifacts. We also propose a technique that tone-maps an HDR video directly in a compression-friendly color space (e.g., YCbCr) without the need to convert it to the RGB domain. Finally, we study the design of 3D HDR-LDR tone-mapping operators. To find the appropriate tone-mapping characteristics that contribute to good 3D representation, subjective psychophysical experiments are performed for i) evaluating existing tone-mapping operators on 3D HDR images and ii) investigating how the preferred level of brightness and details differ between 3D and 2D images. The results are analyzed to identify the desired attributes.
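
A sketch of what a histogram-driven, piecewise-linear tone curve of this kind looks like in code; the bin count and the 1/3 exponent are illustrative assumptions, not the thesis's exact closed form:

```python
import numpy as np

def tone_curve(hdr_log_luma, n_bins=64, ldr_max=255.0, exponent=1.0 / 3.0):
    """Assign each histogram bin a slope that grows with its pixel count,
    then integrate the slopes into a monotone HDR-to-LDR mapping."""
    hist, edges = np.histogram(hdr_log_luma, bins=n_bins)
    slopes = (hist / hist.sum()) ** exponent
    width = np.diff(edges)
    slopes *= ldr_max / np.sum(slopes * width)      # use the full LDR range
    ldr_at_edges = np.concatenate([[0.0], np.cumsum(slopes * width)])
    return edges, ldr_at_edges

def apply_curve(hdr_log_luma, edges, ldr_at_edges):
    return np.interp(hdr_log_luma, edges, ldr_at_edges)

log_luma = np.log10(np.random.lognormal(0.0, 2.0, 100_000))  # toy HDR data
edges, curve = tone_curve(log_luma)
ldr = apply_curve(log_luma, edges, curve)
```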

3D-TV Content generation and multi-view video coding (2010)

The success of 3D technology and the speed at which it will penetrate the entertainment market will depend on how well the challenges faced by the 3D-broadcasting system are resolved. The three main 3D-broadcasting system components are 3D content generation, 3D video transmission and 3D display. One obvious challenge is the unavailability of a wide variety of 3D content. Thus, besides generating new 3D-format videos, it is equally important to convert existing 2D material to the 3D format. This is because the generation of new 3D content is highly demanding and, in most cases, involves post-processing correction algorithms. Another major challenge is that of transmitting a huge amount of data. This problem becomes much more severe in the case of multiview video content. This thesis addresses three aspects of the 3D-broadcasting system challenges. Firstly, the problem of converting 2D acquired video to a 3D format is addressed. Two new and efficient methods are proposed, which exploit the existing relationship between the motion of objects and their distance from the camera to estimate the depth map of the scene in real time. These methods can be used at the transmitter and receiver ends. It is especially advantageous to employ them at the receiver end since they do not increase the transmission bandwidth requirements. Performance evaluations show that our methods outperform the existing technique by providing better depth approximation and thus a better 3D visual effect. Secondly, we studied one of the problems caused by unsynchronized zooming in stereo-camera video acquisition. We developed an effective algorithm for correcting unsynchronized zoom in 3D videos. The proposed scheme finds corresponding pairs of pixels between the left and right views and the relationship between them. This relationship is used to estimate the amount of scaling and translation needed to align the views. Experimental results show our method produces videos with negligible scale difference and vertical parallax. Lastly, the problem of transmitting 3D content is addressed and two schemes for multiview video coding (MVC) are proposed. While both methods outperform the current MVC standard, one of them introduces significantly less random access delay compared to the MVC standard.

Advances in medical image compression: novel schemes for highly efficient storage, transmission and on demand scalable access for 3D and 4D medical imaging data (2010)

Three-dimensional (3D) and four-dimensional (4D) medical images are increasingly being used in many clinical and research applications. Due to their huge file size, 3D and 4D medical images pose heavy demands on storage and archiving resources. Lossless compression methods usually facilitate the access and reduce the storage burden of such data, while avoiding any loss of valuable clinical data. In this thesis, we propose novel methods for highly efficient storage and scalable access of 3D and 4D medical imaging data that outperform the state-of-the-art. Specifically, we propose (1) a symmetry-based technique for scalable lossless compression of 3D medical images; (2) a 3D scalable medical image compression method with optimized volume of interest (VOI) coding; (3) a motion-compensation-based technique for lossless compression of 4D medical images; and (4) a lossless functional magnetic resonance imaging (fMRI) compression method based on motion compensation and customized entropy coding. The proposed symmetry-based technique for scalable lossless compression of 3D medical images employs wavelet transform technology and a prediction method to reduce the energy of the wavelet sub-bands based on a set of axes of symmetry. We achieve VOI coding by employing an optimization technique that maximizes the reconstruction quality of a VOI at any bit-rate, while incorporating partial background information and allowing for a gradual increase in peripheral quality around the VOI. The proposed lossless compression method for 4D medical imaging data employs motion compensation and estimation to exploit the spatial and temporal correlations of 4D medical images. Similarly, the proposed fMRI lossless compression method employs a motion compensation process that uses a 4D search, bi-directional prediction and variable-size block matching for motion estimation, and a new context-based adaptive binary arithmetic coder to compress the residual and motion vector data generated by the motion compensation process. We demonstrate that the proposed methods achieve superior compression performance compared to the state-of-the-art, including JPEG2000 and 3D-JPEG2000.

Computationally efficient techniques for H.264/AVC transcoding applications (2010)

Providing universal access to end-users is the ultimate goal of the communications, entertainment and broadcasting industries. H.264/AVC has become the coding choice for broadcasting and entertainment (i.e., DVD/Blu-ray), meaning that the latest set-top boxes and playback devices support this new video standard. Since many existing videos had been encoded using previous video coding standards (e.g., MPEG-2), playing them back on the new devices will be possible only if they are converted or transcoded into the H.264/AVC format. In addition, even when videos are compressed using H.264/AVC, transmitting them over different networks for different user applications (e.g., mobile phones, TV) will require transcoding in order to adapt them to different bandwidth and resolution requirements. This thesis tackles the H.264/AVC transcoding problem in three aspects. First, we propose algorithms that improve the resultant video quality of the transform-domain MPEG-2 to H.264/AVC transcoding structure. Transform-domain transcoding offers the least complexity. However, it produces transcoded videos suffering from some inherent video distortions. We provide a theoretical analysis of these distortions and propose algorithms that compensate for them. Performance evaluation shows that the proposed algorithms greatly improve the resultant transcoded video quality with reasonable computational complexity. Second, we develop an algorithm that speeds up the process of pixel-domain MPEG-2 to H.264/AVC transcoding. Motion re-estimation is the most time-consuming process for this type of transcoding. The proposed algorithm accelerates the motion re-estimation process by predicting the H.264/AVC block-size partitioning. Performance evaluation shows the proposed algorithm significantly reduces the computational complexity compared to the existing state-of-the-art method, while maintaining the same compression efficiency. Finally, we propose an algorithm that accelerates the process of downscaling a coded H.264/AVC video into its downscaled version using arbitrary downscaling ratios. To accelerate the process of encoding the downscaled video, the proposed algorithm derives accurate initial motion vectors for the downscaled video, thus greatly reducing the computational complexity of the motion re-estimation process. Compared to other state-of-the-art downscaling methods, the proposed method requires the least computation while yielding the best compression efficiency.

Fast motion estimation methods for H.264 video coding on mobile devices (2010)

Digital video is becoming an increasingly widespread application on a multitude of devices ranging from mobile devices to digital cinema. Technological advancements in processing speed and available bandwidth, along with substantial improvements in compression techniques, enable completely new applications and services for digital video content. The most demanding task in video encoding is the motion estimation process, which aims to identify similarities to previously transmitted video frames. Up to 90% of the processing requirements are attributable to this element. In this thesis, we present three methods for encoding new and transcoding existing video content with significantly reduced computational complexity while maintaining both quality and bitrate. The first method reduces the number of steps required to perform motion estimation by adaptively adjusting the search accuracy needed in distortion measurement. The second method addresses the topic of mode decision in video encoding and provides an algorithm that allows an early decision about the most probable modes without the need to evaluate all 259 different combinations of block sizes. The third method provides a multi-dimensional measure that facilitates evaluating only the most likely modes for efficiently transcoding existing pre-encoded content to lower resolutions with an arbitrary downscaling ratio. This is an important factor for the ever-growing number of devices and application scenarios that access existing pre-encoded content. Our method supplements existing fast transcoding schemes that primarily focus on efficiently determining motion vectors in transcoding.

Modeling of scalable video content for multi-user wireless transmission (2009)

This thesis addresses different aspects of wireless video transmission of scalable video content to multiple users over lossy and under-provisioned channels. Modern wireless video transmission systems, such as the Third Generation Partnership Project (3GPP)'s high speed packet access (HSPA) networks and IEEE 802.11-based wireless local area networks (WLANs), allow sharing common bandwidth resources among multiple video users. However, the unreliable nature of the wireless link results in packet losses and fluctuations in the available channel capacity. This calls for flexible encoding, error protection, and rate control strategies implemented at the video encoder or base station. The scalable video coding (SVC) extension of the H.264/AVC video standard delivers quality-scalable video bitstreams that help define and provide quality of service (QoS) guarantees for wireless video transmission applications. We develop real-time rate and distortion estimation models for the coarse/medium granular scalability (CGS/MGS) features in SVC. These models allow mobile video encoders to predict the packet size and corresponding distortion of a video frame using only the residual mean absolute difference (MAD) and the quantization parameter (QP). This thesis employs different cross-layer resource allocation techniques that jointly optimize the video bit-rate, error protection, and latency control algorithms in pre-encoded and real-time streaming scenarios. In the first scenario, real-time multi-user streaming with dynamic channel throughput and packet losses is solved by controlling the base and enhancement layer quality as well as the unequal erasure protection (UXP) overhead to minimize the frame-level distortion. The second scenario considers pre-encoded scalable video streaming in capacity-limited wireless channels suffering from latency problems and packet losses. We develop a loss distortion model for hierarchical predictive coders and employ dynamic UXP allocation with a delay-aware non-stationary rate-allocation streaming policy. The third scenario addresses the problem of efficiently allocating multi-rate IEEE 802.11-based network resources among multiple scalable video streams using temporal fairness constraints. We present a joint link adaptation at the physical (PHY) layer and a dynamic packet dropping mechanism in the network or medium access control (MAC) layer for multi-rate wireless networks. We demonstrate that these methods result in significant performance gains over existing schemes.
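
The flavor of such rate and distortion models can be sketched in a few lines: the H.264/AVC quantizer step doubles every 6 QP, and a classic linear source model ties coded bits to MAD/Qstep. The coefficients below are placeholders to be fit per sequence, not values from the thesis:

```python
def qstep(qp):
    """H.264/AVC quantization step size, which doubles every 6 QP."""
    return 2.0 ** ((qp - 4) / 6.0)

def predict_bits(mad, qp, a=1.0, b=0.0):
    """Linear source model: coded bits grow with MAD / Qstep."""
    return a * mad / qstep(qp) + b

def predict_distortion(mad, qp, c=1.0):
    """Toy companion model: distortion grows with Qstep, scaled by the
    residual energy proxy MAD."""
    return c * mad * qstep(qp)

print(predict_bits(mad=6.0, qp=30), predict_distortion(mad=6.0, qp=30))
```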

Master's Student Supervision

Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.

Detection and localization of individual free street parking spaces using artificial intelligence and motion estimation (2023)

The full abstract for this thesis is available in the body of the thesis, and will be available when the embargo expires.

Efficient street parking sign detection and recognition using artificial intelligence (2023)

Traffic congestion in urban centers presents a pressing challenge for mobility and quality of life. Addressing this issue requires innovative solutions, with a key focus on leveraging the capabilities of autonomous and human-driven vehicles. A crucial aspect of this effort involves integrating parking sign detection technology to alleviate congestion. Despite its promising potential to improve environmental conditions and productivity, the domain of parking sign detection faces substantial challenges stemming from the diversity of sign types, complex detection requirements, and environmental variables. This thesis introduces an innovative approach to the precise detection and recognition of street parking signs, with the aim of integration into vehicle systems. Using our unique and extensive dataset, we conducted a comparative analysis of various object detection networks, aiming to select a model that balances computational efficiency and performance accuracy. Our evaluations revealed the superior performance of the You Only Look Once (YOLO) object detection models, particularly YOLOv7-X at the time of this work, in terms of accuracy and computational complexity. Initially, the YOLOv7-X deep learning network is employed for the detection of parking signs in a dataset comprising videos captured by car cameras in Vancouver. Subsequently, a matching network utilizing the Triplet Loss function is applied for precise identification, while we leverage temporal information to further enhance the accuracy of detection and recognition. Performance evaluation demonstrated the robustness of our approach, yielding a mean Average Precision (mAP) of 97.4% for parking sign detection and a remarkable 91% accuracy in parking sign identification on a dataset of 43 different classes.
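
For reference, the matching objective mentioned above has a compact form; this sketch (with placeholder embeddings and margin) shows the triplet loss on sign-crop embeddings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor/positive/negative: embedding vectors of sign crops, e.g.
    L2-normalized CNN features. Same-class pairs are pulled together and
    different-class pairs pushed apart by at least `margin`."""
    d_ap = np.sum((anchor - positive) ** 2)   # squared distance, same class
    d_an = np.sum((anchor - negative) ** 2)   # squared distance, other class
    return max(0.0, d_ap - d_an + margin)

# At inference, a detected sign can be assigned the class of its nearest
# reference embedding, one way to frame identification over the 43 classes.
```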

Subjective and objective image and video quality assessment methodologies and metrics (2023)

Impressive advancements in capturing, display, and delivery technologies significantly elevate image and video quality and, with that, the need for designing new subjective and objective image and video quality metrics as well as coding methods. However, the delivered quality is affected by many factors, such as limited transmission bandwidth and compression distortions, which may cause degradation in image/video quality and in turn reduce the users' quality of experience (QoE). On the one hand, technological advances tend to increase consumer expectations, while on the other, the explosive use of social media and entertainment playback on a plethora of devices, ranging from Virtual Reality displays to TVs, has given rise to many new challenges in evaluating the quality of captured and delivered content. Service providers would like to have an accurate way of assessing the perceptual quality of decoded video streams at the receiver end. Although subjective quality assessment is the best way to evaluate delivered content, it is rarely practical for most applications. It is, thus, of great significance to develop effective image and video quality metrics as well as compression schemes that address the above-mentioned challenges. In this thesis, we first propose a foveated compression approach for images rendered on head-mounted displays (HMD) for virtual reality (VR) applications and a subjective scheme for measuring the quality of the generated images. Then, we propose deep-learning-based no-reference quality metrics that evaluate the quality of high-definition (HD) images and videos that have been compressed by the HEVC standard. We have also created comprehensive and representative ground truth datasets that are publicly available and may become a benchmark for research in related areas.

An efficient middle-out prediction structure for light field video compression using MV-HEVC (2019)

Light field imaging has emerged as a technology that enables the capture of richer visual information. While traditional photography captures just a 2D projection of the light in the scene, a light field camera collects the radiance from rays in all directions and extracts the angular information that is otherwise lost in conventional photography. This angular information can be used to substantially improve immersiveness, focus, depth, color, intensity and perspective, opening up new market opportunities. Nevertheless, the high dimensionality of light fields also brings with it new challenges, such as the size of the captured data. Research in light field image compression is becoming increasingly popular, but light field video compression remains a relatively under-explored field. State-of-the-art solutions attempt to apply existing multi-view coding (MVC) methods to encode light field videos. While these solutions show potential, they do not manage to address the bandwidth problem imposed by the size of the data involved. Hence, there is a real need for improvement, taking advantage of the additional redundancies of light field video and the intricacies of this data. In this thesis, we propose a three-dimensional prediction structure for efficiently coding light field video using the MV-HEVC standard. First, we modify the inter-view structure in order to exploit the higher similarity found around the central set of views compared to those around the edges. In addition, the selection of which views start with a P-frame takes into consideration maximizing their utilization as references by other views. Second, we build upon this structure by expanding the GOP size and creating a more efficient temporal structure that better utilizes the higher-fidelity sequences as references for subsequent frames. The scheme contains various temporal structures for compressing the views, based on their encoding order. This allows the later views to rely more heavily on frames from other views as references, in order to compensate for the fact that their preceding frames are more heavily quantized by comparison.

High quality virtual view synthesis for immersive video applications (2018)

Advances in image and video capturing technologies, coupled with the introduction of innovative multiview displays, present new opportunities and challenges to content providers and broadcasters. New technologies that allow multiple views to be displayed to the end-user, such as Super Multiview (SMV) and Free Viewpoint Navigation (FN), aim at creating an immersive experience by offering additional degrees of freedom to the user. Since transmission bitrates are proportional to the number of cameras used, reducing the number of capturing devices and synthesizing/generating intermediate views at the receiver end is necessary for decreasing the required bandwidth and paving the path toward practical implementation. View synthesis is the common approach for creating new virtual views, either for expanding the coverage or closing the gap between existing real camera views, depending on the type of Free Viewpoint TV application, i.e., SMV or 2D walk-around-scene-like (FN) immersive experience. In these implementations, it is common for the majority of the cameras to have dissimilar characteristics and different viewpoints, often yielding significant luminance and chrominance discrepancies among the captured views. As a result, synthesized views may have visual artifacts, caused by incorrect estimation of missing texture in occluded areas and possible brightness and color differences between the original real views. In this thesis, we propose unique view synthesis methods that address the inefficiencies of conventional view synthesis approaches by eliminating background leakage and using edge-aware background warping and inter-pixel color interpolation techniques to avoid deformation of foreground objects. Improved occlusion filling is achieved by using information from a temporally constructed background. We also propose a new view synthesis method specifically designed for FN applications, addressing the challenge of brightness and color transition between consecutive virtual views. Subjective and objective evaluations showed that our methods significantly improve the overall objective and subjective quality of the synthesized videos.
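
Below is a bare-bones sketch of the disparity-based warping step that view synthesis builds on; the thesis's contributions (edge-aware background warping, inter-pixel color interpolation, temporal occlusion filling) are not reproduced here:

```python
import numpy as np

def forward_warp(view, disparity, alpha=0.5):
    """Shift each pixel horizontally by alpha * disparity to synthesize a
    virtual view between two cameras. Pixels that receive no source value
    are the holes that occlusion-filling methods must complete."""
    h, w = disparity.shape
    out = np.zeros_like(view)
    filled = np.zeros((h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs - alpha * disparity).astype(int), 0, w - 1)
    out[ys, xt] = view          # on collisions the last-written pixel wins
    filled[ys, xt] = True
    return out, ~filled         # the hole mask marks where inpainting acts
```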

Tone mapping of high dynamic range video for video gaming applications (2018)

High Dynamic Range (HDR) technology is regarded as the latest revolution in digital multimedia, as it aims at capturing, distributing and displaying a range of luminance and color values that better corresponds to what the human eye can perceive. Inevitably, physically based rendering in HDR has recently gained a lot of interest in the video gaming industry. However, the limited availability of commercial HDR displays on one hand and the large installed base of Standard Dynamic Range (SDR) displays on the other have imposed the need for techniques to efficiently display HDR content on SDR TVs. Several such techniques, known as Tone-Mapping Operators (TMOs), have been proposed, but all of them are specifically designed for natural content. As such, these TMOs fail to address the unique characteristics of HDR gaming content, causing loss of details and introducing visual artifacts such as brightness and color inconsistencies. In this thesis, we propose an automated, low-complexity and content-adaptive video TMO specifically designed for video gaming applications. The proposed method uses the distribution of HDR light information in the perceptual domain and takes advantage of the unique properties of rendered HDR gaming content to calculate a global piecewise-linear tone-mapping curve that efficiently preserves the global contrast and texture details of the original HDR scene. A unique flickering reduction method is also introduced that eliminates brightness inconsistencies caused by the tone-mapping process while successfully detecting scene changes. Subjective and objective evaluations have shown that our method outperforms existing TMOs, offering better overall visual quality for video gaming content.

Improving non-constant luminance color encoding efficiency for high dynamic range video applications (2017)

Non-Constant Luminance (NCL) and Constant Luminance (CL) are the two common methods for converting RGB values to luma and chroma for compression efficiency. The CL coefficients have been derived from the luminous efficacy of the used gamut color primaries in the light-linear domain. NCL applies the same coefficients but on non-linear inputs, i.e., perceptually encoded values obtained using a proper transfer function, thus leading to reduced compression efficiency and color shifts. However, since legacy cameras capture perceptually encoded values of light, it is common practice to use NCL in the existing video distribution pipelines. Although color distortion was not a serious problem with legacy Standard Dynamic Range (SDR) systems, this is not the case with High Dynamic Range (HDR) applications, where color shifts become much more visible and prohibitive to delivering high-quality HDR. In this thesis, we propose methods that address the inefficiencies of the conventional NCL method by optimizing NCL luma values to be as close as possible to those of CL, thus improving compression performance and color accuracy, while maintaining the current pipeline infrastructure. First, we develop a global optimization method for deriving new optimum coefficients that approximate NCL values to those of the CL approach. Then, we improve upon this approach by conducting content-based optimization. This adaptive optimization method takes content pixel density into consideration and optimizes only based on these color distributions. Finally, we propose a weighted global optimization method, which separates chromaticity into three categories (red, green, and blue) and assigns weights based on their contributions to luminance. Evaluations show that the proposed methods improve color quality and compression efficiency over NCL.
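
The NCL/CL distinction is easy to see in code. The sketch below uses the BT.2020 luminance weights, with a simple power function standing in for the actual perceptual transfer function (PQ); the pixel values are illustrative:

```python
import numpy as np

KR, KG, KB = 0.2627, 0.6780, 0.0593    # BT.2020 luminance weights

def oetf(x, gamma=2.4):                 # placeholder transfer function
    return np.power(x, 1.0 / gamma)

def luma_ncl(r, g, b):
    """Non-constant luminance: weight the *encoded* components R', G', B'."""
    return KR * oetf(r) + KG * oetf(g) + KB * oetf(b)

def luma_cl(r, g, b):
    """Constant luminance: compute true luminance first, then encode it."""
    return oetf(KR * r + KG * g + KB * b)

r, g, b = 0.9, 0.1, 0.05                # linear-light values of one pixel
print(luma_ncl(r, g, b), luma_cl(r, g, b))  # the gap is the NCL luma error
```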

Tone mapping operator for high dynamic range video (2017)

High Dynamic Range (HDR) technology is emerging as the new revolution in digital media and has recently been adopted by industry as the new standard for capturing, transmitting and displaying video content. However, as the majority of existing commercial displays are still limited to Standard Dynamic Range (SDR) technology, backward compatibility of HDR with these legacy displays is a topic of high importance. Over the years, several Tone Mapping Operators (TMOs) have been proposed to adapt HDR content, mainly images, to the SDR format. With the recent emergence of HDR video content, the need for video TMOs has become essential. Direct application of image TMOs to HDR video content is not an efficient solution, as they yield visual artifacts such as flickering, ghosting, and brightness and color inconsistencies. In this thesis, we propose an automated, low-complexity, content-adaptive video TMO which delivers high-quality, natural-looking SDR content. The proposed method is based on histogram equalization of perceptually quantized light information and smart distribution of HDR values in the limited SDR domain. Flickering introduced by the mapping process is reduced by our proposed flickering reduction method, while scene changes are detected by our approach, thus successfully maintaining the original HDR artistic intent. The low complexity of the proposed method, along with the fact that it does not require any user interaction, makes it a suitable candidate for real-time applications, such as live broadcasting.
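
A minimal illustration of tone mapping by histogram equalization on perceptually quantized luma (the PQ decode, the smart redistribution of values, and the flicker-reduction step are omitted; the bin count and SDR range are assumptions):

```python
import numpy as np

def equalize(pq_luma, n_bins=256, sdr_max=255.0):
    """Map HDR PQ luma in [0, 1] to SDR codes via the empirical CDF, so
    that densely populated luminance ranges receive more of the limited
    SDR code space."""
    hist, edges = np.histogram(pq_luma, bins=n_bins, range=(0.0, 1.0))
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]
    return np.interp(pq_luma, edges[1:], cdf) * sdr_max

frame = np.clip(np.random.randn(1080, 1920) * 0.1 + 0.4, 0, 1)  # toy frame
sdr = equalize(frame)
```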

Automatic Real-Time 2D-to-3D Video Conversion (2016)

The generation of three-dimensional (3D) videos from monoscopic two-dimensional (2D) videos has received a lot of attention in the last few years. Although the concept of 3D has existed for a long time, research on converting 2D to 3D in real time is still ongoing. Current conversion techniques are based on generating an estimated depth map for each frame from different depth cues, and then using Depth Image Based Rendering (DIBR) to synthesize additional views. Efficient interactive techniques have been developed in which multiple depth factors (monocular depth cues) are utilized to estimate the depth map using machine-learning algorithms. The challenge with such methods is that they cannot be used for real-time conversion. We address this problem by proposing an effective scheme that generates high-quality depth maps for indoor and outdoor scenes in real time. In our work, we classify the 2D videos into indoor or outdoor categories using machine-learning-based scene classification. Subsequently, we estimate the initial depth maps for each video frame using different depth cues based on the classification results. Then, we fuse these depth maps, and the final depth map is computed in two steps. First, depth values are estimated at edges. Then, these depth values are propagated to the rest of the image using an edge-aware interpolation method. Performance evaluations show that our method outperforms the existing state-of-the-art 2D-to-3D conversion methods.

A Visual Attention Model for High Dynamic Range (HDR) Video Content (2015)

High dynamic range (HDR) imaging is gaining widespread acceptance in the computer graphics, photography and multimedia industries. Representing scenes with values corresponding to real-world light levels, HDR images and videos provide superior picture quality and a more life-like visual experience than traditional 8-bit Low Dynamic Range (LDR) content. In this thesis, we present several attempts to assess and improve the quality of HDR content using subjective and objective approaches. We first conducted in-depth studies of HDR compression and HDR quality metrics. We show that High Efficiency Video Coding (HEVC) outperforms the previous generation of the compression standard on HDR content and could be used as a platform for HDR compression if provided with some necessary extensions. We also find that, compared to other quality metrics, the Visual Information Fidelity (VIF) quality metric has the highest correlation with subjective opinions on HDR videos. These findings contributed to the development of methods that optimize existing video compression standards for HDR applications. Next, the viewing experience of HDR content is evaluated both subjectively and objectively. The study shows a clear subjective preference for HDR content when individuals are given a choice between HDR and LDR displays. Eye-tracking data were collected from individuals viewing HDR content in a free-viewing task. These eye-tracking data are utilized in the development of a visual attention model for HDR content. Last but not least, we propose a computational approach to predict visual attention for HDR video content, the only one of its kind, as all existing visual attention models are designed for HDR images. The proposed approach simulates the characteristics of the Human Visual System (HVS) and makes predictions by combining spatial and temporal visual features. Analysis using eye-tracking data affirms the effectiveness of the proposed model. Comparisons employing three well-known quantitative metrics show that the proposed model substantially improves predictions of visual attention for HDR video.

Perceptually Based Compression of Emerging Digital Media Content (2014)

Digital video has become ubiquitous in our everyday lives; everywhere we look, there are devices that can display, capture, and transmit video. Recent advances in technology have made it possible to capture and display HD stereoscopic (3D) and High Dynamic Range (HDR) videos. However, the current broadcasting networks do not even have sufficient capacity to transmit large amounts of HD content, let alone 3D and HDR. The limitations of the current compression technologies are the motivation behind this thesis, which proposes novel methods for further improving the efficiency of compression techniques when used on emerging digital media formats. As a first step, we participated in the standardization efforts of High Efficiency Video Coding (HEVC), the latest video compression standard. The knowledge gained from this study became the foundation for the research that followed. We first propose a new method for encoding stereoscopic videos asymmetrically. In traditional asymmetric stereoscopic video coding, the quality of one of the views is reduced while the other view is kept at the original quality. However, this approach is not fair to people with one dominant eye. We address this problem by reducing the quality of horizontal slices in both views. Subjective tests show that the quality, sharpness and depth of the videos encoded by our method are close to those of the original, and that the proposed method is an effective technique for stereoscopic video coding. In this thesis, we also focus on HDR video technology and modify the HEVC standard to better characterize HDR content. We first identify a quality metric whose performance on compressed HDR content is highly correlated with subjective results. We then propose a new Lagrange multiplier that uses this quality metric to strike the best balance between the bit-rate and distortion of the HDR video inside the rate-distortion optimization process of the encoder. The updated Lagrange multiplier is implemented in the HEVC reference software. Our experimental results show that, for the same bitrate, the subjective quality scores of the videos encoded by the HDR-adapted encoder are higher than those encoded with the reference encoder.

Improvements of interpolation and extrapolation view synthesis rendering for 3D and multiview displays (2013)

To display video content in 3D, traditional stereoscopic televisions require two views of the same scene filmed at a small distance from one another. Unfortunately, having the required number of views is not always possible due to the complexity of obtaining them and the required bandwidth for transmission. In cases where more advanced auto-stereoscopic televisions require more than two views, the issue of obtaining and transmitting those additional views becomes even more complex. These issues led to the idea of having a small number of real views and their corresponding depth maps, showing the distance of each object from the viewing plane, which together can be used to generate virtual intermediate views. These virtual synthesized views are generated by moving different objects in the real views a specific number of pixels based on their distance from the viewing plane. The need for synthesizing virtual views is more pronounced with the introduction of stereoscopic and auto-stereoscopic (multiview) displays to the consumer market. In this case, as it is not practical to capture all the required views for different multiview display technologies, a limited number of views are captured and the remaining views are synthesized using the available views. View synthesis is also important in converting existing 2D content to 3D, a development that is necessary in the quest for 3D content, which has been deemed a vital factor for faster adoption of 3D technology. In this thesis, a new hybrid approach for synthesizing views for stereoscopic and multiview applications is presented. This approach utilizes a unique and effective hole filling method that generates high-quality 3D content. First, we present a new method for view interpolation where missing-texture areas are filled with data from the other available view, along with a unique warping approach that stretches background objects to fill in these areas. Second, a view extrapolation method is proposed where small areas of the image are filled using nearest-neighbor interpolation and larger areas are filled with the same unique image warping approach. Subjective evaluations confirm that this approach outperforms current state-of-the-art pixel-interpolation-based as well as existing warping-based techniques.
