Zhen Wang
Relevant Thesis-Based Degree Programs
Affiliations to Research Centres, Institutes & Clusters
Graduate Student Supervision
Doctoral Student Supervision
Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.
Neural networks (NNs) have reached remarkable performance in computer vision. However, numerous parameters and complex structures make NNs opaque to humans. The failure to comprehend NNs may raise serious issues in real-world applications. My research aims to explore NN interpretability in diverse visual tasks from post-hoc explanation and intrinsic interpretability perspectives. Convolutional neural networks (CNNs) have outperformed humans in image classification. However, the logic of network decisions remains a puzzle. As such, we propose concept-harmonized hierarchical inference, a post-hoc explanation framework, to explain the decision-making process of CNNs. Firstly, we interpret layered feature representations of NNs with hierarchical visual semantics. Then we explain the NN feature learning as a bottom-up decision logic from low to high semantic levels in which a deep-layer decision is decomposed as a sequence of shallow-layer sub-decisions. With the evolution of virtual reality, researchers are focusing increasingly on inverse rendering: reconstructing a 3D scene from multi-view 2D images. In this field, NNs have achieved superior performance in novel view synthesis and 3D reconstruction. For both tasks, learning a 3D representation from input views is the key process, where prior methods separately designed a CNN-based single-view feature extraction and a pooling-based multi-view fusion. This incoherent design damages their intrinsic interpretability and performance. Therefore, we aim to design coherent, interpretable NNs that can adequately exploit knowledge of relationships from data. For novel view synthesis, we propose a unified Transformer-based neural radiance field (TransNeRF) conditioned on source views to learn a generic 3D-scene representation. TransNeRF explores deep relationships between the target-rendering view and source views. TransNeRF also improves intrinsic interpretability by enhancing the shape and appearance consistency of a 3D scene.
In experiments, TransNeRF outperforms prior neural rendering methods, and the interpretation results are consistent with human perception. We reformulate 3D reconstruction as a sequence-to-sequence prediction and propose an end-to-end Transformer-based framework (EVolT). EVolT jointly explores multi-level associations between input views and the output volume-based 3D representation within our encoder-decoder structure. EVolT achieves state-of-the-art accuracy in multi-view reconstruction with 70% fewer parameters than prior methods. Experimental results also suggest the strong scaling capability of EVolT.
View record
Deep learning, especially through Convolutional Neural Networks (CNNs), has revolutionized image analysis. Image classification, which involves assigning labels to images based on content, has seen significant advancements due to CNNs' ability to autonomously extract image features. Real-world images, spanning natural scenes, social media, medical imaging, and aerial views, often contain multiple objects, necessitating multiple annotations for precise deep learning-based classification. However, comprehensive image annotation is challenging. The sheer volume of images makes exhaustive labeling impractical. Specialized fields like medical imaging or remote sensing demand expert knowledge for accurate annotation, making the process lengthy and expensive. Given these constraints, there's a pressing need for learning with limited annotations, where only a few labels are available per image. To tackle this, researchers are gravitating towards semi-supervised and weakly supervised methods. These techniques utilize available labels to predict missing ones, ensuring minimal performance degradation due to absent labels. This thesis delves into multi-label learning for image analysis under limited annotation, offering insights into this crucial research area. Firstly, we explore multi-label image classification with partial annotations. We introduce an innovative method where annotators label only the most prominent features they're confident about in multi-label images. This reduces potential annotation errors and speeds up the process. Our research suggests that using partial labels can be beneficial, especially in areas where full annotation is challenging or costly. Next, we tackle the challenge of learning from incomplete annotations by examining scenarios where only one positive label is annotated per image. We assess the effects of various label selection strategies and offer practical annotation guidelines.
Furthermore, considering that aerial images often cover vast areas with multiple labels, but available datasets are single-labeled, we introduce a self-correction integrated domain adaptation technique. This method leverages abundant single-label images for weakly supervised learning. Lastly, we extend partial annotation learning to hand pose estimation, highlighting that more annotations don't always equate to better results. Annotation quality and the balance between image count and annotation number are pivotal factors.
View record
Modern robots are generally armed with diverse modalities of sensors, for various functionalities and safety redundancy. The recent breakthrough of deep learning (DL) technologies demonstrates superior performance on many high-level tasks, especially when using multi-sensor fusion. While the majority of multi-modal DL methods assume that sensors are well calibrated, synchronized and denoised, such efforts at the sensor layer are non-trivial and get increasingly expensive with the growing complexity of robotic systems. Currently dominant approaches rely heavily on specific hardware setups or high-end sensors, which generally are not cost-effective. This cost concern could be a bottleneck for the potential wide adoption of low-cost robots in the near future. Even though DL has huge potential at the sensor layer, the difficulty of acquiring sufficient and accurate annotations for related tasks remains a major challenge. This thesis first formulates key problems at the robot sensor layer from the machine learning perspective, and then systematically proposes efficient self-supervised learning approaches. In our work, the popular and representative LiDAR-camera-inertial system is used as the study target. Firstly, we address the challenging LiDAR-camera online extrinsic calibration task and investigate a self-supervised learning approach equipped with Riemannian metrics and trained on synthetic data. This was the first work in the literature to demonstrate competitive performance of data-driven methods compared with conventional approaches at the sensor layer. It lays the foundation and shows the potential for later, deeper explorations. Secondly, we address several overlooked limitations of the conventional synchronization pipelines and propose the first DL-based LiDAR-camera synchronization framework, which is an innovative self-supervised learning schema.
Thirdly, the problem of Inertial Measurement Unit (IMU) denoising for navigation is studied, and we propose a self-supervised multi-task framework. This work demonstrates the superiority of data-driven approaches on IMU denoising and presents one realistic self-supervised learning implementation. These explorations initiate the adoption of deep learning for robot sensor layer tasks and showcase how self-supervised learning can be applied. Our work helps push the boundary of self-supervised learning at the sensor layer to a usable stage, demonstrates the potential of this direction, and sheds light on future research.
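As an illustration of what a Riemannian metric looks like for extrinsic calibration, the sketch below computes the geodesic angle between a predicted and a reference rotation on SO(3). This is a standard rotation distance and is only assumed to be related to the metrics used in the thesis; the helper names (`matmul3`, `transpose3`, `rot_z`, `rotation_geodesic`) are hypothetical.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose3(a):
    """Transpose a 3x3 matrix given as nested lists."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rotation_geodesic(r_pred, r_true):
    """Geodesic angle in radians between two rotation matrices on SO(3)."""
    rel = matmul3(transpose3(r_pred), r_true)  # relative rotation
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)

def rot_z(theta):
    """Rotation by theta radians about the z axis (illustrative helper)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

Unlike an element-wise difference of matrix entries, this distance depends only on the relative rotation, which is why it is a natural candidate for a calibration error or training loss.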
View record
This thesis focuses on the automatic analysis of radiological and pathological images for cancer detection and classification. It addresses ultrasound imaging for prostate and breast cancer, and histopathology analysis for prostate cancer, where we propose several classification approaches based on novel features and deep learning. The goal of this thesis is to develop machine learning methods to assist clinicians in diagnosis, prognosis, and treatment planning for patients. To tackle the inherent data heterogeneity in prostate cancer research, we develop a novel framework based on the generative adversarial network to discard extraneous information. For benign vs. malignant classification, it achieves an area-under-the-curve of 93.4%, sensitivity of 95.1%, and specificity of 87.7%, representing significant improvements of 5.0%, 3.9%, and 6.0%, respectively, compared to using heterogeneous data. We propose novel methods that improve prostate cancer classification and risk stratification using multi-stain digital histopathology. For classification, we demonstrate that: (1) other stain types (Ki67, P63) improve classification performance over H&E alone; (2) even when Ki67 and P63 are not available, H&E stain can better report the presence and severity of prostate cancer by mimicking these stain types. For risk stratification, our proposed pipeline, integrating clinicopathologic data and learned image features from multi-stain digital histopathology, outperforms the most commonly used grading system, the Gleason grading system, in predicting clinical outcomes such as metastasis-free and overall survival. Using our risk models, 3.9% of low-risk patients are reclassified as high-risk and 21.3% of high-risk patients are reclassified as low-risk.
These results demonstrate our risk stratification pipeline’s potential to guide the administration of adjuvant therapy after radical prostatectomy. For breast cancer, we propose a novel automatic pipeline for data processing, feature extraction, feature selection, and classification using ultrasound data. Our best results (95% confidence interval, area-under-the-curve = 95%±1.45%, sensitivity = 95%, and specificity = 93%) outperform the state-of-the-art results using shear wave absolute vibro-elastography. Moreover, our study proposes novel directions in the field of elasticity imaging for tissue classification. All the proposed methods have been tested on held-out sets and have demonstrated promising results, which would be useful in future cancer diagnosis, prognosis, and patient management.
View record
Skin disorders are among the most prevalent human diseases, affecting a vast population and posing a significant financial burden on the global healthcare system. Although early treatment may considerably improve the cure rate of skin disorders, limited access to dermatologists and a lack of training among other healthcare workers make early detection difficult. In recent years, deep learning has produced exceptional success in the computer-aided detection of skin disorders. However, some high-performance methods are difficult to apply in complex clinical scenarios because of uneven distribution of data categories, multiple data types with dimensional differences, and a lack of support from clinical experience. Therefore, researchers have begun to combine efficient algorithms with clinical knowledge to better utilize skin lesion information from multiple modalities in various application scenarios rather than chasing prediction accuracy. Herein, we investigate how to combine clinical expertise with sophisticated algorithms and integrate several common imaging modalities and demographic data to make deep learning more persuasive and acceptable in clinical applications. First, we used deep learning to incorporate clinical domain knowledge. By combining a deep feature extractor with a clinically restricted classifier chain, we offered a unique multi-modal framework for skin cancer diagnosis that mimicked one of the most widely used dermoscopy algorithms, the 7-point checklist. Then, using colour photos, symptom characteristics, and demographic data, we created a multi-modal content-based image retrieval system from an existing skin image database using similarity network fusion and deep community analysis. This research addressed the dimensional discrepancies that arise when several data sources are fused. Then, we investigated the viability of utilizing deep learning in the clinic with limited resources.
We proposed a new method for skin disease classification that unified diverse knowledge into a generic knowledge distillation framework. This method significantly improved the performance of lightweight models for portable embedded devices. Finally, considering the small sample size resulting from the high collection cost, we offered a framework based on the patching technique and decision level fusion that completely exploits the features of the polarisation speckle data set with a small number of samples, completing the application of deep learning on small-scale data sets.
View record
Data-driven deep learning tasks for security-related applications are gaining increasing popularity and achieving impressive performance. This thesis investigates adversarial vulnerabilities of such tasks in order to establish secure and reliable machine learning systems. Adversarial attacks aim to extract private data from a model of a task, or to misguide the model so that it yields wrong results or an answer desired by the attacker. This thesis studies potential adversarial attacks that may affect an existing deep learning model of a specific task. Novel approaches that expose security vulnerabilities of four typical deep learning models in three dominant tasks (i.e., matching, classification and regression) are developed. These models include image hashing for image authentication and retrieval, fake face imagery forensic detection, image classification and single object tracking. In the first model, image hashing converts images into codes that are supposed to be non-invertible. However, we prove that this can pose image privacy concerns, and propose two deep learning de-hashing neural networks to show that we can obtain high quality images that are inverted from given image hashes. In the second model, we address fake face image detection. Fake images that can escape an adversarially attacked detector are usually degraded versions of original images. We analyze the visual degradation in such face images, and show how to design attacks that result in visually imperceptible adversarial images. For the image classification model, instead of the conventionally employed visual distortion metric, we propose the use of perceptual models as a novel measure for adversarial example generation. We then propose two sets of attack methods that can generally be incorporated into all existing gradient-based attacks. Lastly, for the single object tracking model, we propose the concept of universally and physically feasible attacks on visual object tracking in real-world settings.
We develop a novel attack framework and experimentally demonstrate the feasibility of the proposed concept. The adversarial explorations and examples provided in this thesis show how existing deep learning tasks and their models could be vulnerable to malicious attacks. This would help researchers design more secure and trustworthy models for digital media security and forensics.
View record
Conditional generative adversarial networks (cGANs) are state-of-the-art models for synthesizing images dependent on some conditions. These conditions are usually categorical variables such as class labels. cGANs with class labels as conditions are also known as class-conditional GANs. Some modern class-conditional GANs such as BigGAN can even generate photo-realistic images. The success of cGANs has been shown in various applications. However, two weaknesses of cGANs still exist. First, image generation conditional on continuous, scalar variables (termed regression labels) has never been studied. Second, low-quality fake images still appear frequently in image synthesis with state-of-the-art cGANs, especially when training data are limited. This thesis aims to resolve the above two weaknesses of cGANs and explore the applications of cGANs in improving a lightweight model with the knowledge from a heavyweight model (i.e., knowledge distillation). First, existing empirical losses and label input mechanisms of cGANs are not suitable for regression labels, making cGANs fail to synthesize images conditional on regression labels. To solve this problem, this thesis proposes the continuous conditional generative adversarial network (CcGAN), including novel empirical losses and label input mechanisms. Moreover, even the state-of-the-art cGANs may produce low-quality images, so a subsampling method to drop these images is necessary. In this thesis, we propose a density ratio based subsampling framework for unconditional GANs. Then, we introduce its extension to the conditional image synthesis setting called cDRE-F-cSP+RS, which can effectively improve the image quality of both class-conditional GANs and CcGAN. 
Finally, we propose a unified knowledge distillation framework called cGAN-KD suitable for both image classification and regression (with a scalar response), where the synthetic data generated from class-conditional GANs and CcGAN are used to transfer knowledge from a teacher net to a student net, and cDRE-F-cSP+RS is applied to filter out bad-quality images. Compared with existing methods, cGAN-KD has many advantages, and it achieves state-of-the-art performance in both image classification and regression tasks.
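To make the idea of density-ratio-based subsampling concrete, the following minimal sketch filters generated samples by rejection sampling on an estimated density ratio. It assumes the ratio (real density over fake density) has already been estimated for each sample; the function name and the example ratio values are hypothetical, and this is not claimed to be the exact cDRE-F-cSP+RS procedure.

```python
import random

def rejection_subsample(samples, density_ratios, rng=None):
    """Keep each sample with probability ratio / max(ratio).

    Samples whose estimated density ratio is low (likely poor quality)
    are dropped more often than samples with a high ratio.
    """
    rng = rng or random.Random(0)  # seeded for reproducibility
    max_ratio = max(density_ratios)
    if max_ratio <= 0:
        return []
    kept = []
    for sample, ratio in zip(samples, density_ratios):
        # rng.random() is in [0, 1), so ratio == max_ratio is always kept
        # and ratio == 0 is never kept.
        if rng.random() < ratio / max_ratio:
            kept.append(sample)
    return kept
```

For example, `rejection_subsample(["a", "b", "c"], [0.0, 2.0, 2.0])` always discards `"a"` and always keeps `"b"` and `"c"`.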
View record
Community detection is an emerging topic in modern network science. This thesis focuses on developing data-driven network generation and community extraction tools targeted at biomedical research, an area that has hardly been studied in the literature because existing approaches struggle to overcome resolution limitations without prior information. In the first part of this thesis, a novel community detection approach is proposed to detect unknown community structure from both binary and non-binary networks. The method overcomes the resolution limitation of current approaches, even when no prior information regarding the community structure is available. In the later parts of the thesis, three common biomedical scenarios are identified where domain-dependent network community extraction frameworks are proposed to solve open research challenges. The first setting represents the scenario when no prior information regarding the community structure is available, and as an illustrative example, we consider the functional segmentation of the brainstem from fMRI timecourses. We propose a framework to extract functional communities within the brainstem, based on a data-driven generation and clustering of the functional network, and a consensus-based group model development approach. In the second scenario, in the presence of additional information, a domain-specific framework is proposed to incorporate prior information in the network community extraction pipeline. As an illustrative example, we consider the parcellation of putaminal sub-regions, and propose a robust community extraction framework where a primary brain region is parcellated into functional sub-regions, incorporating prior information regarding the number of communities, the connectivity differences both within and outside the primary network, and a constraint on spatial contiguity.
In the final setting, community extraction from a large-scale dataset is considered, and a unified approach is proposed that combines network community detection with a deep learning-based framework to form deep communities. Its potential as a deep-clustering tool is illustrated on a chest X-ray based image retrieval study. We propose a framework that integrates a deep learning-based image-network generation approach and a weighted modularity-based network community detection technique to form communities of similar images. A region-growing based community formation framework is then applied to extract similar images when a new image is presented.
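The weighted modularity mentioned above can be illustrated with the standard Newman modularity score of a given partition. The sketch below is only an assumption about the kind of objective involved, not the thesis's detection algorithm; the function name and input layout are hypothetical, and the graph is assumed to be undirected without self-loops.

```python
from collections import defaultdict

def weighted_modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight), each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    total = 0.0                    # m: total edge weight
    strength = defaultdict(float)  # weighted degree of each node
    internal = defaultdict(float)  # edge weight inside each community
    for u, v, w in edges:
        total += w
        strength[u] += w
        strength[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if total == 0:
        return 0.0
    comm_strength = defaultdict(float)
    for node, s in strength.items():
        comm_strength[community[node]] += s
    # Q = sum over communities of (internal share) - (expected share)^2.
    return sum(
        internal[c] / total - (comm_strength[c] / (2.0 * total)) ** 2
        for c in comm_strength
    )
```

A higher Q means more edge weight falls inside communities than would be expected if edges were placed at random with the same node strengths; a partition that puts every node in one community scores 0.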
View record
Several machine learning tasks rely on the availability of large amounts of data. To obtain robust machine learning systems, the employment of annotated data samples is crucial. For computer vision tasks, the shortage of annotated training data has been a hindrance. To address this problem, one of the most popular approaches is deep transfer learning (DTL). DTL methods transfer information from large annotated datasets to scarce un-annotated datasets. This transfer employs deep learning to find the features of all annotated and un-annotated image datasets. The labels of un-annotated datasets are determined by finding the labels of the annotated ones that share similar features. This thesis proposes different deep transfer learning models for problems with three types of image data: aerial images, satellite images, and ground-view images. Based on these image datasets, our transfer learning tasks include the transfer learning between the different types of regular images, between the different types of remote sensing images, and between the remote sensing and regular images. The underlying relationships are obtained by setting up a correlation between the deep transfer learning models corresponding to the different types of images. The proposed models address three research tasks. The first task addresses the "what to transfer" problem, i.e., finding the appropriate content for transfer. For this task, we propose an active learning incorporated deep transfer learning model which explores the relationships among different remote sensing images. The second task studies the "where to transfer" problem, and finds the correlation between the deep learning networks of the annotated and the un-annotated images. For this task, we considered regular images.
The third task investigates the "how to transfer" problem for three types of images (aerial, satellite and ground-view), and involves finding the image relationships and the best deep learning neural network models for knowledge transfer. Several models, including the Dual Space structure preserving Transfer Learning model, the Xnet, and the Dual Adversarial Network (DuAN), are proposed.
View record
Deep Convolutional Neural Networks (DCNNs) have achieved superior performance in many computer vision tasks. However, it has been challenging to understand their internal mechanisms and explain how they make their predictions. The inability to understand how DCNNs reach their decisions could result in serious problems in real-life applications. The objective of this thesis is to design a framework that interprets the internal mechanisms of DCNNs, explains how they make decisions, and builds trust in their predictions. To recognize an object in an image, humans consider both the entirety and the local details of the object. We developed a method that shows that the deep layers of the network learn about the object as a whole, and the shallow layers learn the object’s fine features. Thus, DCNNs reach their decisions in a way similar to humans. DCNNs are formed of layers. Each layer contains dozens or hundreds of channels that represent image features and are propagated to the subsequent layer. To understand the process of how a network makes decisions, we developed two methods. The first determines which channels are important for deciding the class of an object in an image. This method disables different channels in a layer to determine the influence of each channel on the network’s class-discriminative capability. While this method gives insight into the role of different channels (in a layer) in the classification prediction, the second method explains the role of every layer in this decision. It finds the important information every layer learns about a certain class of objects. It starts at the deepest layer and finds the most important feature-maps that affect the DCNN's final decision. It then decomposes this information into feature-maps belonging to the adjacent shallower layer, using sparse representation. The feature-maps with non-zero coefficients in this decomposition form the significant information in the shallower layer.
This process is repeated for all layers in the network. Experiments have shown that the significant feature-maps of all layers result in interpretable patterns that offer a better explanation about the network learning (of intra-class images and inter-class images) than the complicated data in the network.
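The channel-disabling idea described above can be sketched on a toy model. In the example below, the "network head" is assumed to be a weighted sum of each channel's mean activation, which is a deliberate simplification; the function names and the scoring rule are hypothetical and stand in for a real DCNN forward pass.

```python
def class_score(feature_maps, class_weights):
    """Toy classifier head: weighted sum of each channel's mean activation."""
    return sum(
        w * (sum(fmap) / len(fmap))
        for fmap, w in zip(feature_maps, class_weights)
    )

def channel_importance(feature_maps, class_weights):
    """Drop in the class score when each channel is zeroed out in turn."""
    baseline = class_score(feature_maps, class_weights)
    drops = []
    for idx in range(len(feature_maps)):
        # Disable one channel by replacing its activations with zeros.
        ablated = [
            [0.0] * len(fmap) if i == idx else fmap
            for i, fmap in enumerate(feature_maps)
        ]
        drops.append(baseline - class_score(ablated, class_weights))
    return drops
```

Channels with the largest score drop are the ones the toy model relies on most for that class; with a real network, the same loop would rerun the forward pass with one channel masked each time.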
View record
Neuroimaging with magnetic resonance imaging (MRI) has made great contributions to our understanding of neurological diseases. Among the many different imaging techniques, myelin water imaging (MWI) appears to be particularly promising for investigating white matter microstructure, especially in terms of its myelin content. MWI has shown great success in identifying and characterising alterations of myelin content in neurological diseases but is still only available in research settings. In order to bring it closer to clinical practice, its utility, efficacy, and robustness need to be examined. In this work, we investigated the utility of MWI by applying it to Parkinson's disease (PD), a neurodegenerative disease with typically unremarkable changes in the white matter in a clinical setting. We show that MWI and data-driven multivariate analysis methods can predict distinct PD symptom domains. Furthermore, we have demonstrated a robust relation between myelin, cognitive performance and clinical characteristics in Multiple Sclerosis (MS) with a data fusion analysis that finds joint patterns of covariation among the different modalities. Additionally, we have devised new methods to analyse MWI images that not only offer more information about the white matter microstructure, but also make use of complementary information of multimodal MRI experiments. We have demonstrated a characteristic myelin pattern along major white matter fibre bundles that shows superior accuracy in classification of sex compared with traditional analysis. We have also shown that MWI can be linked to the topological organisation of functional brain networks, either on its own or in combination with other parameters characterising the white matter microstructure. Lastly, we have devised a novel method that makes use of spatiotemporal similarity of white matter voxels in order to denoise MWI data.
This method leads to spatially smoother myelin maps and proves to be more robust in the presence of noise, ultimately leading to more accurate in vivo measurements of myelin in the brain. In summary, we have shown the utility of MWI by applying it to neurodegenerative diseases, developed methods to leverage joint information of multimodal white matter imaging techniques, and proposed a novel method to denoise T2 relaxation data.
View record
Inferring brain functional connectivity from functional magnetic resonance imaging (fMRI) data extends our understanding of systems-level functional organization of the brain. Functional connectivity can be assessed at the individual voxel or Regions of Interest (ROI) level, with pros and cons of each approach. This thesis focuses on addressing fundamental problems associated with ROI-based brain functional connectivity inference, including regional signal representation, brain functional connectivity modelling and brain functional connectivity analysis. Functional connectivity involving brainstem ROIs has been rarely studied. We propose a novel framework for brainstem-cortical functional connectivity modelling where the regional signal of brainstem nuclei is estimated by Partial Least Squares and connections between brainstem nuclei and other cortical/subcortical brain regions are reliably estimated by partial correlation. We then apply the proposed framework to assess functional connectivity of one particular brainstem nucleus - the pedunculopontine nucleus (PPN), which is important for ambulation, and is affected in diseases putting people at risk for falls (e.g., Parkinson’s Disease). A key issue for ROI-based brain functional connectivity assessment is how to summarize the information contained in the voxels of a given ROI. Currently, the signals from the same ROI voxels are simply averaged, neglecting any inhomogeneity in each ROI and assuming that the same voxels will interact with different ROIs in a similar manner. In this thesis, we develop a novel method of representing ROI activity and estimating brain functional connectivity that takes the regionally-specific nature of brain activity, the spatial location of concentrated activity, and activity in other ROIs into account. 
Finally, to facilitate the interpretation of the estimated brain functional connectivity networks, we propose the use of dynamic graph theoretical measures (e.g., the newly introduced graph spectral metric, Fiedler value) as potential MRI-related biomarkers. The proposed methods were applied to real fMRI datasets, with a primary focus on Parkinson’s disease. The proposed methods demonstrated enhanced robustness of brain functional connection estimation, with potential use in disease assessment and treatment evaluation. More broadly, this thesis suggests that brain functional connectivity offers a promising avenue for non-invasive and quantitative assessment of neurological diseases.
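The abstract above estimates connections by partial correlation. As a minimal illustration of why this differs from plain correlation, the sketch below uses the closed-form partial correlation between two signals given a single third signal. Real connectivity analyses condition on many regions at once, so this three-signal version and its function names are simplifying assumptions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

When two regional signals are correlated only because both follow a common third signal, their plain correlation is non-zero while their partial correlation given that signal is close to zero, which is the property that makes partial correlation useful for separating direct from indirect connections.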
View record
Parkinson's disease (PD) is a progressive movement disorder characterized by degeneration of dopaminergic neurons and abnormal brain oscillations. While invasive deep brain stimulation can improve some motor deficits by disrupting pathological brain oscillations, achieving comparable results with non-invasive brain stimulation (NIBS) remains elusive. Previous studies have suggested that electrical vestibular stimulation (EVS) may ameliorate some motor symptoms in PD. However, the investigated effects are limited to a few domains, only a handful of stimulation waveforms have been explored, and neuroimaging studies capable of probing the mechanisms are greatly lacking. The overarching objective of this thesis is to utilize biomedical engineering approaches to fully explore the EVS technique as a potential therapeutic intervention for PD. This involves development of new stimuli, development of new artifact rejection methods, and thorough investigations of brain and behavioural responses, as outlined below. To achieve the objective, noisy EVS is first revisited and tested with PD and healthy subjects to investigate effects on visuomotor tracking behaviours. Next, novel EVS stimuli are developed using multisine signals in distinct frequency bands and tested in an experiment where the stimuli are applied to PD and healthy subjects during rest and task conditions while EEG is being recorded. This simultaneous EVS-EEG study aims to provide insights into the modulatory effects of EVS on brain oscillations and motor behaviours altered in PD, and into whether the effects depend on the stimulation type. One critical challenge involved with EVS-EEG studies is that EEG recordings are severely corrupted by the stimulation artifacts. To resolve this, a method based on quadrature regression followed by independent vector analysis is developed, and its superior denoising performance over conventional methods is demonstrated.
Finally, underlying mechanisms of EVS effects in PD are investigated in a resting-state functional MRI study. The results from this thesis suggest that sub-threshold EVS in PD induces widespread motor changes and brain activities that are stimulus-dependent, suggesting subject-specific stimuli may ultimately be desirable to achieve a clinically meaningful effect.
View record
Epilepsy is a common neurological disorder that affects over 90 million people globally, 30-40% of whom do not respond to medication. Electroencephalogram (EEG) is the prime tool that has been widely used for the diagnosis and management of epilepsy. As the visual inspection of long-term EEG is tedious, expensive, and time-consuming, research into EEG-based methods to automatically detect and predict epileptic seizures has been very active. This thesis studies how to leverage the temporal, spectral, and spatial information in the EEG data to accurately detect and also predict seizures. To automatically detect epileptic seizures, we first introduce a computationally efficient method that detects a seizure within a very short time of its onset. It relies on a computationally simple feature extraction method based on LASSO regression and extracts the prominent EEG seizure-associated features in a time-efficient manner, achieving high seizure detection accuracy with very short detection latency. Subsequently, we propose two novel methods for robust detection of epileptic seizures where the main question addressed is: Can we identify the seizure pattern(s) hidden in the contaminated EEG data? We first present a novel feature learning algorithm based on L1-penalized robust regression. This algorithm extracts the most distinguishable EEG spectral features pertinent to epileptic seizures. We then propose a deep learning method that achieves more robust performance under real-life conditions. This method uses long short-term memory recurrent networks to exploit the temporal dependencies in the EEG data and accurately recognize the seizure patterns. Both methods are shown to maintain robust performance in the presence of common signal contaminants and ambient noise. The thesis then addresses the seizure prediction problem using intracranial EEG (iEEG) data.
A novel architecture of multi-scale convolutional neural networks is proposed to learn the discriminative pre-seizure iEEG features that could potentially help predict impending seizures. Experiments on clinical data show that this method achieves high seizure prediction sensitivity and maintains reliable performance against inter- and intra-patient variations.
View record
Skin disorders are among the most common healthcare referrals in Canada, affecting a large population and imposing high healthcare costs. Early detection plays an essential role in efficient management and better outcomes. However, restricted access to dermatologists and lack of education for other healthcare professionals pose a major challenge for early detection. Computer-aided systems have great potential as viable tools to identify early skin abnormalities. The initial clinical diagnosis phase of most skin disorders involves visual inspection of the lesion for specific features associated with certain abnormalities. Among the main cutaneous features are vascular structures, which are significantly involved in pathogenesis, diagnosis, and treatment outcome of skin abnormalities. The presence and morphology of cutaneous vessels are suggestive clues for specific abnormalities. However, there has been no systematic approach for comprehensive analysis of skin vasculature. In this thesis, we propose a three-level framework to systematically detect, quantify and analyze the characteristics of superficial cutaneous blood vessels. First, we investigate the vessels at pixel-level. We propose novel techniques for detection (absence/presence) and segmentation of vascular structures in pigmented and non-pigmented lesions and evaluate the performance quantitatively. We develop a fully automatic vessel segmentation framework based on decomposing the skin into its component chromophores and accounting for shape. Furthermore, we design a deep learning framework based on stacked sparse auto-encoders for detection and localization of skin vasculature. Compared to previous studies, we achieve higher detection performance, while preserving clinical feature interpretability. Next, we analyze the vessels at lesion-level. We propose a novel set of architectural, geometrical and topological features to differentiate vascular morphologies.
The defined feature set can effectively differentiate four major classes of vascular patterns. Finally, we investigate the vessels at disease-level. We analyze the relationship between vascular characteristics and disease diagnosis. We design and deploy novel features to evaluate total blood content and vascular characteristics of the lesion to differentiate cancerous lesions from benign ones. We also build a system that integrates the patient’s clinical information and the lesion’s visual characteristics using deep feature learning, which achieves superior cancer classification performance compared to current techniques without the need for handcrafted high-level features.
View record
Deep learning is a data-driven technique for developing intelligent systems using a large amount of training data. Amongst the deep learning applications, this thesis focuses on problems in health informatics. Compared to the general deep learning applications, health informatics problems are complex, unique and pose problem-specific challenges. Many of these problems, however, face a common challenge: the lack of labeled data. In this thesis, we explore the following three ways to overcome this challenge in three specific image-based health informatics problems: 1) The use of image patches instead of whole images as the input for deep learning. To increase the data size, each image is partitioned into non-overlapping, mid-level patches. This approach is illustrated by addressing the food image recognition problem. Automatic food recognition could be used for nutrition analysis. We propose a novel deep framework for mid-level food image patches. Evaluations on 3 benchmark datasets demonstrate that the proposed approach achieves superior performance over baseline convolutional neural network (CNN) methods. 2) The use of prior knowledge to reduce the high dimensionality and complexity of raw data: We illustrate this idea on magnetic resonance imaging (MRI) images, for diagnosing a common mental-health disorder, Attention Deficit Hyperactivity Disorder (ADHD). MRI has been increasingly used in analyzing ADHD with machine learning algorithms. We propose a multi-channel 3D CNN based automatic ADHD diagnosis approach using MRI scans. Evaluations on the ADHD-200 Competition dataset show that the proposed approach achieves state-of-the-art accuracy. 3) The use of synthetic data pre-training along with real data domain adaptation to increase the available labeled data during training: We illustrate this idea on 2-D/3-D image registration problems. We propose a fully automatic and real-time CNN-based 2-D/3-D image registration system.
Evaluations on Transesophageal Echocardiography (TEE) X-ray images from clinical studies demonstrate that the proposed system outperforms existing methods in accuracy and speed. We further propose a pairwise domain adaptation module (PDA module), designed to be flexible for different deep learning-based 2-D/3-D registration frameworks with improved performance. Evaluations on two clinical applications demonstrate the PDA module's advantages for 2-D/3-D medical image registration with limited data.
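The patch-based idea in item 1) of this abstract, partitioning each image into non-overlapping patches so that one labeled image yields several training inputs, can be sketched as follows. This is a minimal illustration only, not the thesis framework: the function name `split_into_patches`, the list-of-rows image format, and the choice to drop ragged borders are all assumptions made for the example.

```python
def split_into_patches(image, patch_h, patch_w):
    """Partition a 2-D image (a list of rows) into non-overlapping patches.

    Border rows/columns that do not fill a whole patch are dropped
    (an assumption of this sketch, not a detail taken from the thesis).
    """
    height, width = len(image), len(image[0])
    patches = []
    # Step by the patch size so that patches never overlap.
    for top in range(0, height - patch_h + 1, patch_h):
        for left in range(0, width - patch_w + 1, patch_w):
            patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
            patches.append(patch)
    return patches


# Toy 4x6 "image" whose pixel value encodes its position.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = split_into_patches(image, 2, 3)
```

Here one 4x6 image yields four 2x3 patches, each of which would inherit the label of the source image during training.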
View record
Blind Source Separation (BSS) methods have been attracting increasing attention for their promising applications in signal processing. Despite recent progress on the research of BSS, there are still remaining challenges. Specifically, this dissertation focuses on developing novel Underdetermined Blind Source Separation (UBSS) methods that can deal with several specific challenges in real applications, including a limited number of observations, self/cross dependence information and source inference in the underdetermined case. First, by taking advantage of the Noise Assisted Multivariate Empirical Mode Decomposition (NAMEMD) and Multiset Canonical Correlation Analysis (MCCA), we propose a novel BSS framework and apply it to extract the heartbeat signal from noisy nano-sensor signals. Furthermore, we generalize the idea of (over)determined joint BSS to that of the underdetermined case. We explore the dependence information between two datasets and propose an underdetermined joint BSS method for two datasets, termed UJBSS-2. In addition, by exploiting the cross correlation between each pair of datasets, we develop a novel and effective method to jointly estimate the mixing matrices from multiple datasets, referred to as Underdetermined Joint Blind Source Separation for Multiple Datasets (UJBSS-M). In order to improve the time efficiency and relax the sparsity constraint, we recover the latent sources based on subspace representation when the mixing matrices are estimated. As an example application for noise enhanced signal processing, the proposed UJBSS-M method also can be utilized to solve the single-set UBSS problem when suitable noise is added to the observations. Finally, considering the recent increasing need for biomedical signal processing in the ambulatory environment, we propose a novel UBSS method for removing electromyogram (EMG) artifacts from electroencephalography (EEG) signals.
The proposed method for recovering the underlying sources is also applicable to other artifact removal problems. Simulation results demonstrate that the proposed methods yield superior performances over conventional approaches. We also evaluate the proposed methods on real physiological data, and the proposed methods are shown to effectively and efficiently recover the underlying sources.
View record
Functional magnetic resonance imaging (fMRI) is one of the most popular non-invasive neuroimaging technologies, which examines the human brain at relatively good spatial resolution in both normal and disease states. In addition to the investigation of local neural activity in isolated brain regions, brain connectivity estimated from fMRI has provided a system-level view of brain functions. Despite recent progress on brain connectivity inference, there are still several challenges. Specifically, this thesis focuses on developing novel brain connectivity modeling approaches that can deal with particular challenges of real biomedical applications, including group pattern extraction from a population, false discovery rate control, incorporation of prior knowledge and time-varying brain connectivity network modeling. First, we propose a multi-subject, exploratory brain connectivity modeling approach that allows incorporation of prior knowledge of connectivity and determination of the dominant brain connectivity patterns among a group of subjects. Furthermore, to integrate the genetic information at the population level, a framework for genetically-informed group brain connectivity modeling is developed. We then focus on estimating the time-varying brain connectivity networks. The temporal dynamics of brain connectivity assess the brain in the additional temporal dimension and provide a new perspective on the understanding of brain functions. In this thesis, we develop a sticky weighted time-varying model to investigate the time-dependent brain connectivity networks. As the brain must strike a balance between stability and flexibility, purely assuming that brain connectivity is static or dynamic may be unrealistic. We therefore further propose making joint inference of time-invariant connections and time-varying coupling patterns by employing a multitask learning model.
The proposed methods have been applied to real fMRI data sets, and disease-induced changes in the brain connectivity networks have been observed. The brain connectivity study is able to provide deeper insights into neurological diseases, complementing the traditional symptom-based diagnostic methods. Results reported in this thesis suggest that brain connectivity patterns may serve as potential disease biomarkers in Parkinson's Disease.
View record
Motion estimation is a key enabler for many advanced medical imaging / image analysis applications, and hence is of significant clinical interest. In this thesis, we study image registration for motion estimation in medical imaging environments, and focus on two clinically interesting problems: 1) deformable respiratory motion estimation from dynamic Magnetic Resonance Imaging (MRI), and 2) rigid-body object motion estimation (e.g., surgical devices, implants) from fluoroscopic images. Respiratory motion is a major complicating factor in many image acquisition applications and image-guided interventions. Existing respiratory motion estimation methods typically rely on motion models learned from retrospective data, and therefore are vulnerable to unseen respiratory motion patterns. To address this limitation, we propose to use a dynamic MRI acquisition protocol to monitor respiratory motion, and a scatter-to-volume registration method that can directly recover the dense motion fields from the dynamic MRI data without explicitly modeling the motion. The proposed method achieves significantly higher motion estimation accuracy than the state-of-the-art methods in addressing varying respiratory motion patterns. Object motion estimation from fluoroscopic images is an enabling technology for advanced image guidance applications for Image-Guided Therapy (IGT). Complex and time-critical clinical procedures typically require the motion estimation to be accurate, robust and real-time, which cannot be achieved by existing methods at the same time. We study 2-D/3-D registration for rigid-body object motion estimation to address the above challenges, and propose two new approaches to significantly improve the robustness and computational efficiency of 2-D/3-D registration.
We first propose to use pre-generated canonical form Digitally Reconstructed Radiographs (DRRs) to accelerate the DRR generation during intensity-based 2-D/3-D registration, which boosts the computational efficiency tenfold with little degradation in registration accuracy and robustness. We further demonstrate that the widely adopted intensity-based formulation for 2-D/3-D registration is ineffective, and propose a more effective regression-based formulation, solved using a Convolutional Neural Network (CNN). The proposed regression-based approach achieves significantly higher robustness, capture range and computational efficiency than state-of-the-art intensity-based approaches.
View record
Big data is an increasingly attractive concept in many fields both in academia and in industry. The increasing amount of information builds an illusion that we are going to have enough data to solve all data-driven problems. Unfortunately, this is not true, especially for areas where machine learning methods are heavily employed, since sufficient high-quality training data doesn't necessarily come with the big data, and it is not easy or sometimes impossible to collect sufficient training samples, which most computational algorithms depend on. This thesis mainly focuses on dealing with situations with limited training data in visual object recognition, by developing novel machine learning algorithms to overcome the limited training data difficulty. We investigate three issues in object recognition involving limited training data: 1. one-shot object recognition, 2. cross-domain object recognition, and 3. object recognition for images with different picture styles. For Issue 1, we propose an unsupervised feature learning algorithm by constructing a deep structure of the stacked Hierarchical Dirichlet Process (HDP) auto-encoder, in order to extract "semantic" information from unlabeled source images. For Issue 2, we propose a Domain Adaptive Input-Output Kernel Learning algorithm to reduce the domain shifts in both input and output spaces. For Issue 3, we introduce a new problem involving images with different picture styles, successfully formulate the relationship between pixel mapping functions and gradient-based image descriptors, and also propose a multiple kernel based algorithm to learn an optimal combination of basis pixel mapping functions to improve the recognition accuracy. For all the proposed algorithms, experimental results on publicly available data sets demonstrate the performance improvements over the previous state of the art.
View record
Backscatter RFID systems are the most popular RFID systems deployed due to low cost and low complexity. However, they pose many design challenges due to their querying-fading-signaling-fading structure, which experiences deeper fading than conventional one-way channels. Recently, by simulations and measurements, researchers found that the MIMO setting can improve the performance of backscatter RFID systems. These simulations and measurements were based on simple signaling schemes and no rigorous mathematical analysis has been provided. In this thesis, we explore querying, space-time coding (STC), and diversity combining schemes over the three ends of the backscatter RFID systems and provide generalized performance analysis and design criteria. At the tag end, we show that the identical signaling scheme, which cannot improve the BER performance in conventional one-way channels, can significantly improve the BER performance of backscatter RFID. We also analytically study the performances of orthogonal STCs, with different sub-channel fading assumptions, and show that the diversity order depends only on the number of tag antennas. More interestingly, we show that the performance is more sensitive to the channel condition of the forward link than that of the backscattering link. In previous literature, the understanding of the query end is that the designs of query signals have no potential to improve the system performance. However, we show that some well-designed query signals can improve the system performance significantly. We propose a novel unitary query method in this thesis. Conventional measures of the physical layer performance cannot be obtained analytically in backscatter RFID channels when our unitary query is employed. We thus provide a new performance measure to overcome the difficulty of conventional measures, and show why the unitary query has superior performance. The multi-keyhole channel is another type of cascaded channel.
The backscatter RFID channel and the multi-keyhole channel look similar, but are essentially different, and their difference has not been clearly studied in previous literature. In the final part of this thesis, by investigating general STCs and revealing a few interesting properties of this channel in the MISO case, we show that the two channels achieve completely different diversity order and BER performance.
View record
Corticomuscular coupling analysis using multiple data sets such as electroencephalogram (EEG) and electromyogram (EMG) signals provides a useful tool for understanding human motor control systems. A popular conventional method to assess corticomuscular coupling is the pair-wise magnitude-squared coherence (MSC). However, there are certain limitations associated with MSC, including the difficulty in robustly assessing group inference, only dealing with two types of data sets simultaneously and the biologically implausible assumption of pair-wise interactions. In this thesis, we propose several novel signal processing techniques to overcome the disadvantages of current coupling analysis methods. We propose combining partial least squares (PLS) and canonical correlation analysis (CCA) to take advantage of both techniques to ensure that the extracted components are maximally correlated across two data sets and meanwhile can well explain the information within each data set. Furthermore, we propose jointly incorporating response-relevance and statistical independence into a multi-objective optimization function, meaningfully combining the goals of independent component analysis (ICA) and PLS under the same mathematical umbrella. In addition, we extend the coupling analysis to multiple data sets by proposing a joint multimodal group analysis framework. Finally, to acquire independent components but not just uncorrelated ones, we improve the multimodal framework by exploiting the complementary property of multiset canonical correlation analysis (M-CCA) and joint ICA. Simulations show that our proposed methods can achieve superior performance over conventional approaches. We also apply the proposed methods to concurrent EEG, EMG and behavior data collected in a Parkinson's disease (PD) study. The results reveal highly correlated temporal patterns among the multimodal signals and corresponding spatial activation patterns.
In addition to the expected motor areas, the corresponding spatial activation patterns demonstrate enhanced occipital connectivity in PD subjects, consistent with previous medical findings.
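For reference, the pair-wise magnitude-squared coherence that this abstract takes as its conventional baseline is usually written as below. The symbols are assumptions introduced here, not notation from the thesis: $S_{xy}(f)$ denotes the cross-spectral density between an EEG channel $x$ and an EMG channel $y$, and $S_{xx}(f)$, $S_{yy}(f)$ their auto-spectral densities.

```latex
\mathrm{MSC}_{xy}(f) \;=\; \frac{\lvert S_{xy}(f) \rvert^{2}}{S_{xx}(f)\, S_{yy}(f)},
\qquad 0 \le \mathrm{MSC}_{xy}(f) \le 1 .
```

Because the measure is defined for one pair of signals at a time, it reflects the two limitations named in the abstract: it handles only two data sets simultaneously and assumes pair-wise interactions.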
View record
The massive production and easy use of digital media pose new challenges for protecting intellectual property of digital media. Digital data hiding, which can be defined as the procedure of embedding information into an original media host signal, is a promising technique for digital intellectual property protection. A data hiding system generally contains two major components: the encoder for embedding the hidden information and the decoder for extracting the hidden information. This thesis focuses on spread spectrum (SS) watermarking schemes for data hiding. Watermarking techniques for data hiding can be broadly categorized into two classes: quantization index modulation (QIM) based and spread spectrum based approaches. Being robust against distortions and having a simple decoder structure make SS attractive for data hiding. First, we investigate the decoding performance of the traditional SS schemes in the DCT and DFT domains. To obtain more practical decoders, we propose using suboptimal decoders which do not need side information. Secondly, since the interference effect of the host signal causes decoding performance degradation in the additive SS scheme, to remove this host effect efficiently, we propose the correlation-and-bit-aware concept for data hiding by exploiting the side information at the encoder side and propose two improved SS-based schemes, the correlation-aware SS (CASS) and the correlation-aware improved SS (CAISS) embedding schemes. Thirdly, we analyze the decoding error probability and capacity of the multiplicative spread spectrum (MSS) embedding scheme, and show that the content-based MSS still suffers from the interference effect of the host signal. We then propose an improved MSS-based scheme by efficiently removing the host interference effect. Lastly, we present the security analysis of the SS-based data hiding schemes under the Known Message Attack (KMA) scenario.
Each data hiding scheme has some secret parameters, and here the security of a data hiding scheme represents the difficulty of estimating the secret parameters. We employ the mutual information between the observations and the secret parameters as a security measure. Some practical estimators for estimating the signature code are also introduced, and their performances are reported to illustrate the security results.
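The host interference effect mentioned in this abstract can be seen in a minimal sketch of the textbook additive SS scheme, in which one bit b in {-1, +1} is embedded as y = x + A·b·s and decoded from the sign of the correlation with the signature s. This is a generic illustration under stated assumptions, not the CASS, CAISS or MSS schemes of the thesis: the Gaussian host, the signal length, the amplitude value and all function names are hypothetical.

```python
import random


def embed(host, signature, bit, amplitude):
    """Additive SS embedding: y = x + A * b * s, with bit b in {-1, +1}."""
    return [x + amplitude * bit * s for x, s in zip(host, signature)]


def correlate(signal, signature):
    """Normalised correlation <signal, s> / N, used as the decoding statistic."""
    return sum(v * s for v, s in zip(signal, signature)) / len(signature)


def decode(received, signature):
    """Decoder without host side information: sign of the correlation."""
    return 1 if correlate(received, signature) >= 0 else -1


rng = random.Random(0)                                # fixed seed for repeatability
n = 1024                                              # hypothetical signal length
host = [rng.gauss(0.0, 1.0) for _ in range(n)]        # stand-in for host coefficients
signature = [rng.choice((-1, 1)) for _ in range(n)]   # secret +/-1 signature code
amplitude = 0.5                                       # hypothetical embedding strength

marked = embed(host, signature, -1, amplitude)
statistic = correlate(marked, signature)
# The host itself leaks into the statistic: statistic = <x, s>/N + A * b.
host_interference = correlate(host, signature)
```

The decoding statistic equals the embedded term A·b plus the term `host_interference`, which depends only on the host; the decoder errs whenever that term outweighs A·b, which is why removing the host effect at the encoder improves decoding.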
View record
High-dimensional datasets, where the number of measured variables is larger than the sample size, are not uncommon in modern real-world applications such as functional Magnetic Resonance Imaging (fMRI) data. Conventional statistical signal processing tools and mathematical models could fail at handling those datasets. Therefore, developing statistically valid models and computationally efficient algorithms for high-dimensional situations is of great importance in tackling practical and scientific problems. This thesis mainly focuses on the following two issues: (1) recovery of sparse regression coefficients in linear systems; (2) estimation of high-dimensional covariance matrix and its inverse matrix, both subject to additional random noise. In the first part, we focus on the Lasso-type sparse linear regression. We propose two improved versions of the Lasso estimator when the signal-to-noise ratio is low: (i) to leverage adaptive robust loss functions; (ii) to adopt a fully Bayesian modeling framework. In solution (i), we propose a robust Lasso with convex combined loss function and study its asymptotic behaviors. We further extend the asymptotic analysis to the Huberized Lasso, which is shown to be consistent even if the noise distribution is Cauchy. In solution (ii), we propose a fully Bayesian Lasso by unifying discrete prior on model size and continuous prior on regression coefficients in a single modeling framework. Since the proposed Bayesian Lasso has variable model sizes, we propose a reversible-jump MCMC algorithm to obtain its numeric estimates. In the second part, we focus on the estimation of large covariance and precision matrices. In high-dimensional situations, the sample covariance is an inconsistent estimator. To address this concern, regularized estimation is needed.
For the covariance matrix estimation, we propose a shrinkage-to-tapering estimator and show that it has attractive theoretical properties for estimating general and large covariance matrices. For the precision matrix estimation, we propose a computationally efficient algorithm that is based on the thresholding operator and Neumann series expansion. We prove that the proposed estimator is consistent in several senses under the spectral norm. Moreover, we show that the proposed estimator is minimax in a class of precision matrices that are approximately inversely closed.
View record
Image hashing has been a popular alternative to digital watermarking for copyright protection and content authentication of digital images, due to its two critical properties: robustness and security. Also, its uniqueness and compactness make image hashing attractive for efficient image indexing and retrieval applications. In this thesis, novel image hashing algorithms are proposed to improve the robustness of digital image hashing against various perceptually insignificant manipulations and distortions on image content. Furthermore, the image hashing concept is extended to the content-based fingerprinting concept, which combines various hashing schemes efficiently to achieve superior robustness and higher identification accuracy. The first contribution of the thesis is the novel FJLT image hashing, which applies a recently proposed low-distortion dimension reduction technique, referred to as the Fast Johnson-Lindenstrauss Transform (FJLT), to image hash generation. FJLT shares the low distortion characteristics of random projections, but requires less computational cost, which are desirable properties to generate robust and secure image hashes. The Fourier-Mellin transform can also be incorporated into FJLT hashing to improve its performance under rotation attacks. Further, the content-based fingerprinting concept is proposed, which combines different FJLT-based hashes to achieve better overall robustness and identification capability. The second contribution of the thesis is the novel shape contexts based image hashing (SCH) using robust local feature points. The robust SIFT-Harris detector is proposed to select the most stable feature points under various content-preserving distortions, and compact and robust image hashes are generated by embedding the detected feature points into the shape contexts based descriptors.
The proposed SCH approach yields better identification performances under geometric attacks and brightness changes, and provides comparable performances under classical distortions. The third contribution of this thesis addresses an important issue of compressing the real-valued image hashes into robust short binary image hashes. By exploring prior information from the virtual prior attacked hash space (VPAHS), the proposed semi-supervised spectral embedding (SSE) approach could compress real-valued hashes into compact binary signatures, while the robustness against different attacks and distortions is preserved. Moreover, the proposed SSE framework could be easily generalized to combine different types of image hashes to generate a robust, fixed-length binary signature.
View record
The concept of brain connectivity provides a new perspective on the understanding of the mechanism underlying brain functions, complementing the traditional approach of analyzing neural activity of isolated regions. Among the existing connectivity analysis techniques, multivariate autoregressive (mAR)-based measures are of great interest for their ability to characterize both directionality and spectral property of cortical interactions. Yet, the direct estimation of mAR-based connectivity from scalp electroencephalogram (EEG) is confounded by volume conduction, statistical instability and inter-subject variability. In this thesis, we propose novel signal processing methods to enhance the existing mAR-based connectivity methods. First, we explore incorporating sparsity constraints into the mAR formulation at both subject level and group level using LASSO-based regression. We show by simulation that sparse mAR yields more stable and accurate connectivity estimates compared to the traditional, non-sparse approach. Furthermore, the group-wise sparsity simplifies the inference of group-level connectivity patterns from multi-subject data. To mitigate the effect of volume conduction, we investigate source-level connectivity and propose a state-space generalized mAR framework to jointly model the mixing effect of volume conduction and causal relationships between underlying neural sources. By jointly estimating the mixing process and mAR model parameters, the proposed technique demonstrates improved connectivity estimation performance. Finally, we expand our connectivity analysis to the cortico-muscular level by modeling the relationships between EEG and simultaneously recorded electromyography (EMG) data using a multiblock partial least squares (mbPLS) framework.
The hierarchical construction of the mbPLS framework provides a natural way to model multi-subject, multi-modal data, enabling the identification of maximally covarying common patterns from EEG and EMG across subjects. Applications of the proposed techniques to EEG and EMG data of healthy and Parkinson's disease (PD) subjects demonstrate that directional connectivity analysis is a more sensitive technique than traditional univariate spectral analysis in revealing complex effects of motor tasks and disease. Moreover, alterations in connectivity accurately predict disease severity in PD. These new analysis tools allow a better understanding of brain function and provide a basis for developing objective measures to assess progression of neurological diseases.
View record
Studying interactions between different brain regions or neural components is crucial in understanding neurological disorders. Dynamic Bayesian networks, a type of statistical graphical model, have been suggested as a promising tool to model neural communication systems. This thesis investigates the employment of dynamic Bayesian networks for analyzing neural connectivity, with a focus on three topics: structural feature extraction, group analysis, and error control in learning network structures. Extracting interpretable features from experimental data is important for clinical diagnosis and improving experiment design. A framework is designed for discovering structural differences, such as the pattern of sub-networks, between two groups of Bayesian networks. The framework consists of three components: Bayesian network modeling, statistical structure-comparison, and structure-based classification. In a study on stroke using surface electromyography, this method detected several coordination patterns among muscles that could effectively differentiate patients from healthy people. Group analyses are widely conducted in neurological research. However, for dynamic Bayesian networks, the performances of different group-analysis methods had not been systematically investigated. To provide guidance on selecting group-analysis methods, three popular methods, i.e. the virtual-typical-subject, the common-structure and the individual-structure methods, were compared in a study on Parkinson's disease, from the aspects of their statistical goodness-of-fit to the data, and more importantly, their sensitivity in detecting the effect of medication. The three methods led to considerably different group-level results, and the individual-structure approach was more sensitive to the normalizing effect of medication. Controlling errors is a fundamental problem in applying dynamic Bayesian networks to discovering neural connectivity.
An algorithm is developed for this purpose, particularly for controlling the false discovery rate (FDR). It is proved that the algorithm is able to curb the FDR under user-specified levels (for example, conventionally 5%) at the limit of large sample size, and meanwhile recover all the true connections with probability one. Several extensions are also developed, including a heuristic modification for moderate sample sizes, an adaptation to prior knowledge, and a combination with Bayesian inference.
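To make concrete what controlling the FDR at a user-specified level means, the sketch below applies the standard Benjamini-Hochberg step-up procedure to a list of p-values, one per candidate connection. It is a generic illustration only and is not the network-learning algorithm developed in the thesis; the function name and the example p-values are assumptions made for the example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten candidate connections.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
keep = benjamini_hochberg(p_values, q=0.05)
```

With these ten p-values and q = 0.05, only the first two connections pass their rank-dependent thresholds (0.005 and 0.01), so only those two are declared present.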
View record
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
Modern Autonomous Vehicles (AVs) rely on sensory data often acquired by cameras and LiDARs to perceive the world around them. To operate the vehicle safely and effectively, Artificial Intelligence (AI) is used to process this data to detect objects of interest around the vehicle. For 3D object detection, recent advances in deep learning have resulted in the development of state-of-the-art multi-modal models which are built using Deep Neural Nets (DNNs) to process camera images and LiDAR point clouds. While DNN-based models are very powerful and accurate, they may be vulnerable to adversarial attacks which introduce a small change to a model’s input and can result in great errors in its output. These attacks have been heavily investigated for models that operate on camera image input only, and recently for point cloud processing models; however, they have rarely been investigated in models that utilize both modalities, as is often the case in modern AVs. To address this gap, we propose a realistic adversarial attack on such multi-modal 3D detection models. We place a 3D adversarial object on a vehicle with the aim of hiding this object’s host vehicle from detection by powerful multi-modal 3D detectors. This object’s shape and texture are trained so that it can be used to prevent a specific model from detecting any host vehicle in any scene. 3D detection models are often based on either a cascaded architecture where each input modality is processed consecutively, or a fusion architecture where multi-input features are extracted and fused simultaneously. We use our attack to study the vulnerabilities of representative models of these architectures to realistic adversarial attacks and to understand the effects of multi-modal learning on the robustness of a model. Our experiments show that a single adversarial object is capable of hiding its host vehicle 55.6% and 63.19% of the time from the cascaded model and from the fusion model, respectively.
This vulnerability was found to be mainly due to RGB image features which were much less robust to adversarial scene changes compared to the point cloud features.
View record
Crowd analysis has been widely used in everyday life. Among different crowd analysis tasks, crowd counting is the most basic but essential one: it measures the number of people in a particular area. Such counting and crowd density information is crucial for determining the maximum occupancy of a room or public area to address safety concerns. Counting by hand can yield an accurate number, but it is a tedious job and may not meet the time requirements of the analysis. Therefore, automated crowd counting that accurately estimates the number of people in crowded scenes is needed. Deep learning has exceeded human-level accuracy in many computer vision tasks. Recently, researchers have shown that deep learning methods can achieve state-of-the-art performance in the task of crowd counting in images. The majority of these methods are based on density regression, where a density map is predicted and its integral is then calculated to obtain the final count. However, this learned density map is uninterpretable and could deviate largely from the actual person distribution even if the final count is accurate. In addition, existing crowd counting methods are mainly based on cumbersome feature extraction networks and cannot be deployed on edge devices with limited computational power. In this thesis, we propose two models that tackle the above interpretability and computational efficiency concerns, respectively: the joint crowd counting and localization (JCCL) model extends traditional counting-only methods and provides precise localization results without additional model complexity; ShuffleCount is a computationally efficient and accurate model that is trained through a specially designed knowledge distillation process and can satisfy the real-time crowd analysis requirement on edge devices. We evaluated both models by training and testing on public crowd counting benchmark datasets.
Both quantitative and qualitative results are obtained and compared with existing state-of-the-art methods. Our superior results show the potential of deploying the proposed methods in real-life applications for efficient crowd analysis. The methods proposed in both JCCL and ShuffleCount can be generalized to improve the interpretability and computational efficiency of other models.
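The density-regression idea mentioned in this abstract (the count is the integral of a density map) can be illustrated with a minimal, standard-library sketch. It is not the thesis's model: the function names, the Gaussian width `sigma`, and the head coordinates are assumptions made only for illustration.

```python
import math

def density_map(points, height, width, sigma=2.0):
    """Build a density map with one unit-mass Gaussian per annotated head."""
    dmap = [[0.0] * width for _ in range(height)]
    for (row, col) in points:
        # Unnormalized Gaussian weights over the whole image.
        weights = [
            [math.exp(-((r - row) ** 2 + (c - col) ** 2) / (2 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        total = sum(sum(line) for line in weights)
        # Normalize so each person contributes exactly 1 to the integral,
        # even when the kernel is clipped by the image border.
        for r in range(height):
            for c in range(width):
                dmap[r][c] += weights[r][c] / total
    return dmap

def count_from_density(dmap):
    """The predicted count is the integral (here, the sum) of the density map."""
    return sum(sum(line) for line in dmap)

# Hypothetical head annotations given as (row, col) pixel coordinates.
heads = [(3, 4), (10, 12), (0, 0)]
dmap = density_map(heads, height=16, width=16)
estimated_count = count_from_density(dmap)
```

Because the sum only recovers the total, two very different maps can give the same count, which is the interpretability concern the abstract raises.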
View record
Major depressive disorder (MDD) is a highly prevalent psychiatric disorder that affects millions of people. Repetitive transcranial magnetic stimulation (rTMS) has been recommended as a safe, reliable, non-invasive, neurostimulation therapy option for treatment-resistant depression (TRD). The effectiveness of rTMS treatment varies among individuals; thus, predicting the responsiveness to rTMS treatment can reduce unnecessary expenses and improve treatment capacity. In this study, we combined machine learning models with depression rating scales, clinical variables, and demographic data to predict the outcomes and effectiveness of rTMS treatment for TRD patients. Using the clinical data of 356 TRD patients who each received 20 to 30 sessions of rTMS treatment over a 4-6-week period, we examined the predictive value of different depression rating scales and models for various prediction outcomes, at multiple time points. Our optimal baseline models achieved area under the curve (AUC) values of 0.634 and 0.735 for treatment response and remission prediction, respectively, using the Elastic Net. In the longitudinal analysis, using baseline data and early treatment outcomes for 1–3 weeks, all predictive values improved compared with baseline models. In addition, predicting the percentage of symptom improvement was also feasible using longitudinal treatment outcomes, achieving coefficients of determination of 0.277, at the end of week 1, and 0.464, at the end of week 3. We found that the use of depression rating subscales, combined with clinical and demographic data, including anxiety severity, employment status, age, gender, and education level, may produce higher accuracy at baseline. In the longitudinal analysis, the total scores of depression rating scales were the most significant predictors, allowing prediction models to be built using only the total scores, which resulted in high predictive value and interpretability. 
This work presented a convenient and economical approach for the prediction of rTMS treatment outcomes in TRD patients, using pre-treatment clinical and demographic data alone, without requiring expensive biomarker data. The predictive value was further enhanced by adding longitudinal treatment outcomes. This method is a plausible approach that could be utilized in clinical practice for individualized treatment selection, leading to better treatment outcomes for rTMS in TRD patients.
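The two metrics reported in this abstract, area under the ROC curve (AUC) for classification and the coefficient of determination for predicting percentage improvement, can be computed as in the sketch below. The labels, scores, and values are made-up illustrations, not data from the study, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only: 1 = responder, 0 = non-responder.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
auc = roc_auc(labels, scores)

# Illustrative symptom-improvement percentages and model predictions.
improvement = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2 = r_squared(improvement, predicted)
```

An AUC of 0.5 corresponds to chance-level ranking, which gives context for the baseline values of 0.634 and 0.735 quoted above.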
View record
There is increasing recognition that different preprocessing approaches for functional magnetic resonance imaging (fMRI) data may have a profound impact on downstream analyses. A critical element of standard preprocessing of fMRI data is motion correction, as head motion during fMRI scanning can induce changes in blood oxygenation level dependent (BOLD) signals that may confound estimations of brain activity, particularly connectivity estimates between brain regions. In this thesis, we propose an approach that explicitly decouples the changes due to brain activation from the changes due to motion. Independent Vector Analysis (IVA) is used to determine the optimal combination of basis volumes to match reference volumes. Such an approach, which we call Motion correction with IVA (McIVA), is amenable to wide-scale parallelization. The mutual information between the first volume and all subsequent volumes declined more slowly after preprocessing with McIVA compared to the raw data and other motion correction schemes. Remarkably, since the final motion-corrected volume in McIVA is based on a combination of images, we show that McIVA's error can actually be less than the interpolation error. Moreover, McIVA resulted in the lowest connectivity across a range of spatial distances between region of interest (ROI) pairs. When a volume is severely corrupted, the IVA fails to converge, providing a principled way to determine whether a given time point is too corrupted for recovery. Finally, we assessed the effects of McIVA on two popular denoising methods, aCompCor and ICA-AROMA, applied to resting-state fMRI data. McIVA reduced inflated connectivity estimates while still retaining adequate degrees of freedom. Though the proposed method is based on resting-state fMRI data, it could be applied to task-related fMRI data as well.
We conclude that the proposed approach is superior for removing motion-related artifacts and reducing biases in functional connectivity estimates induced by head movement.
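The abstract uses mutual information between the first volume and later volumes as a quality measure. A minimal histogram-based estimate is sketched below, assuming each volume has been flattened to a list of voxel intensities. The bin count and the toy data are assumptions, and the thesis may use a different estimator.

```python
import math
from collections import Counter

def mutual_information(a, b, bins=8):
    """Histogram estimate of mutual information (in nats) between two intensity lists."""
    if len(a) != len(b):
        raise ValueError("volumes must contain the same number of voxels")

    def quantize(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for constant volumes
        return [min(int((v - lo) / span * bins), bins - 1) for v in values]

    qa, qb = quantize(a), quantize(b)
    n = len(a)
    joint = Counter(zip(qa, qb))
    count_a, count_b = Counter(qa), Counter(qb)
    mi = 0.0
    for (i, j), c in joint.items():
        p_ij = c / n
        # p_ij / (p_i * p_j) rewritten in terms of raw counts.
        mi += p_ij * math.log(c * n / (count_a[i] * count_b[j]))
    return mi

# Toy "volumes": flattened voxel intensities (illustrative values only).
reference = [0, 1, 2, 3, 4, 5, 6, 7] * 4
mi_identical = mutual_information(reference, reference)
mi_constant = mutual_information(reference, [5] * len(reference))
```

A volume compared with itself gives the highest value (its own entropy), while a volume that carries no information about the reference gives zero, so a slower decline across time points suggests better preserved alignment.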
View record
Major Depressive Disorder (MDD) is a severe medical condition that affects thousands of people every year. Therapy in MDD includes medication and psychotherapy, and is prescribed on the basis of the type and severity of depressive episodes. Treatment resistance is common among MDD patients. Repetitive Transcranial Magnetic Stimulation (rTMS) is a form of non-invasive brain stimulation used for relieving depressive symptoms. Due to its high cost and lengthy procedure, it is reserved for patients showing treatment resistance to at least 2 trials of antidepressants. Of all MDD patients, only 50% show response to rTMS, which leads to unnecessary patient frustration and additional costs. Prediction of resistance to rTMS treatment can thus help physicians decide on the best treatment course for each patient. This thesis presents a machine learning-based clinical assistive tool that predicts the probability that a patient will respond to rTMS treatment and, if so, the probability that they will achieve remission. The most relevant clinical and sociodemographic variables associated with predicting treatment outcomes were selected on the basis of importance scores ranked using a Random Forest (RF) algorithm, and a detailed analysis of their significance is presented. The most important variables were fed into a Deep Artificial Neural Network (DANN) for classification of patients who will respond to rTMS treatment. Two DANN variants were designed, trained, optimized and tested, one to predict rTMS treatment response and one to predict remission. Our model is based on pre-treatment clinical and sociodemographic data collected from 414 patients diagnosed with treatment-resistant MDD. Results show that our DANN model outperforms existing clinical procedures and yields an accuracy of 84.4% in predicting remission and 73.8% in distinguishing responders from non-responders.
Additionally, a thorough evaluation and comparison with other methods that have used machine learning algorithms to predict rTMS treatment outcome was carried out and discussed in detail. The findings in this thesis signify the potential of individual-based assessments that can improve the rTMS treatment procedure.
View record
Translucency, defined as a jelly-like appearance, is a common clinical feature of basal cell carcinoma (BCC), the most common skin cancer. This feature plays an important role in diagnosing basal cell carcinoma at an early stage because translucency can be observed readily in clinical examinations with high specificity. Therefore, translucency detection is a critical component of computer-aided systems that aim at early detection of basal cell carcinoma. In this thesis, we propose two deep learning methods to automatically detect translucency. First, we develop a convolutional neural network-based framework to detect translucency of basal cell carcinoma. Second, we propose a sparse auto-encoder-based framework for translucency detection on BCC images. Since doctors currently use mainly two types of skin images for diagnosis of basal cell carcinoma, dermoscopy images and clinical images, we evaluate the two proposed methods on both types of skin images. Our results show that the two proposed methods yield similar detection performance. For detecting translucency in dermoscopy images, both proposed methods achieve comparable accuracy, though the accuracy is not as good as we expected. For detecting translucency in clinical images, both methods achieve good performance. Comparing performance across both image types, the proposed deep learning-based methods seem more suitable for translucency detection in clinical images than in dermoscopy images.
View record
Parcellation of brain imaging data is desired for proper neurological interpretation of resting-state functional magnetic resonance imaging (rs-fMRI) data. Some methods require specifying a number of parcels and using model selection to determine the number of parcels with rs-fMRI data. However, this generalization does not fit all subjects in a given dataset. A method has been proposed that uses parametric formulas for the distribution of modularity in random networks to determine the statistical significance between parcels. In this thesis, we propose an agglomerative clustering algorithm using parametric formulas for the distribution of modularity in random networks, coupled with a false discovery rate (FDR) controller, to parcellate rs-fMRI data. The proposed method controls the FDR to reduce the number of false positives and incorporates spatial information to ensure the regions are spatially contiguous. Simulations demonstrate that our proposed FDR-controlled agglomerative clustering algorithm yields more accurate results than existing methods. We applied our proposed method to an rs-fMRI dataset and found that it obtained higher reproducibility than the Ward hierarchical clustering method. Lastly, we compared the normalized total connectivity degree of each region within the motor network between normal subjects and Parkinson’s disease (PD) subjects using sub-regions defined by our proposed method and the entire region. We found that PD subjects without medication had a significant increase in functional connectivity compared to normal subjects in the right primary motor cortex using our sub-regions within the right primary motor cortex, whereas this significant increase was not found using the entire right primary motor cortex.
These sub-regions are of great interest in studying the differences in functional connectivity between different neurological diseases, which can be used as biomarkers and may provide insight into the severity of the disease.
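The abstract does not say which FDR controller is used. As a point of reference, the sketch below implements the Benjamini-Hochberg step-up procedure, one standard way to control the FDR over a set of p-values; treating it as the thesis's controller would be an assumption. The p-values are illustrative only.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Illustrative p-values, e.g. one per candidate split between two parcels.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p_values, q=0.05)
```

In a clustering setting, each decision could indicate whether a boundary between two candidate parcels is kept as statistically significant, which limits the expected share of falsely separated parcels.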
View record
Functional magnetic resonance imaging (fMRI) has shown great potential in studying the underlying neural systems. Functional connectivity measured by fMRI provides an efficient approach to study the interactions and relationships between different brain regions. However, functional connectivity studies require accurate definition of brain regions, which is often difficult and may not be achieved through anatomical landmarks. In this thesis, we present a novel framework for parcellation of a brain region into functional subunits based on their connectivity patterns with other reference brain regions. The proposed method takes prior neurological information into consideration and aims at finding spatially continuous and functionally consistent sub-regions in a given brain region. The proposed framework relies on a sparse, spatially regularized fused lasso regression model for feature extraction. The usual lasso model is a linear regression model commonly applied to high-dimensional data such as fMRI signals. Compared with lasso, the proposed model further considers the spatial order of each voxel and thus encourages spatially and functionally adjacent voxels to share similar regression coefficients despite possible spatial noise. To achieve accurate parcellation results, we propose a process that iteratively merges voxels (groups) and tunes the parameters adaptively. In addition, a Graph-Cut optimization algorithm is adopted for assigning overlapping voxels to separate sub-regions. With spatial information incorporated, spatially continuous and functionally consistent subunits can be obtained, which are desired for subsequent brain connectivity analysis. The simulation results demonstrate that the proposed method can reliably yield spatially continuous and functionally consistent subunits. When applied to real resting-state fMRI datasets, two consistent functional subunits could be obtained in the putamen region for all normal subjects.
Comparisons between the results of the Parkinson’s disease group and the normal group suggest that the obtained results are in accordance with our medical assumption. The extracted functional subunits themselves are of great interest in studying the influence of aging and of a given disease, and they may provide deeper insights and serve as a biomarker in our future Parkinson’s disease study.
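For orientation, the standard one-dimensional fused lasso objective is shown below. The thesis describes a spatially regularized variant whose exact formulation is not given in this abstract, so the notation here is an assumption: $y$ is the reference signal, $X$ holds the voxel time series as columns, $\beta$ is the coefficient vector over $p$ spatially ordered voxels, and $\lambda_1$, $\lambda_2$ are tuning parameters.

```latex
\hat{\beta} = \arg\min_{\beta} \;
  \frac{1}{2} \lVert y - X\beta \rVert_2^2
  + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  + \lambda_2 \sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert
```

The first penalty encourages sparsity, as in the usual lasso, while the second penalizes differences between neighbouring coefficients, which is what pushes adjacent voxels toward similar values.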
View record
Non-contact measurements of human cardiopulmonary physiological parameters based on photoplethysmography (PPG) can lead to efficient and comfortable medical assessment. It has been shown that human facial blood volume variation during the cardiac cycle can be indirectly captured by regular Red-Green-Blue (RGB) cameras. However, few attempts have been made to incorporate data from different facial sub-regions to improve remote measurement performance. In this thesis, we propose a novel framework for non-contact video-based human heart rate (HR) measurement by exploring correlations among facial sub-regions via joint blind source separation (J-BSS). In an experiment involving video data collected from 16 subjects, we compare the non-contact HR measurement results obtained from a commercial digital camera to results from a Health Canada and Food and Drug Administration (FDA) licensed contact blood volume pulse (BVP) sensor. We further test our framework on a large public database, which provides subjects' left-thumb plethysmograph signal as ground truth. Experimental results show that the proposed framework outperforms the state-of-the-art independent component analysis (ICA)-based methodologies. Driver physiological monitoring in vehicles is of great importance for providing a comfortable driving environment and preventing road accidents. Contact sensors can be placed on the driver's body to measure various physiological parameters. However, such sensors may cause discomfort or distraction. The development of non-contact techniques can provide a promising solution. In this thesis, we employ our proposed non-contact video-based HR measurement framework to monitor the driver's heart rate and perform heart rate variability analysis using a simple consumer-level webcam. Experiments in real-world road driving demonstrate that the proposed non-contact framework is promising even in the presence of unstable illumination variation and head movement.
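Once a pulse-like signal has been recovered from the video, a common final step is to read the heart rate from the strongest spectral peak in a plausible frequency band. The sketch below shows only that step and does not implement J-BSS. The frame rate, the band limits, and the synthetic signal are assumptions for illustration.

```python
import math

def estimate_heart_rate(signal, fs, lo_hz=0.7, hi_hz=4.0):
    """Return the heart rate in bpm at the strongest spectral peak within [lo_hz, hi_hz]."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC component
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if freq < lo_hz or freq > hi_hz:
            continue
        # Plain DFT bin; O(n^2) overall, which is fine for a short illustration.
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    if best_freq is None:
        raise ValueError("signal too short for the requested frequency band")
    return best_freq * 60.0

fs = 30.0       # assumed camera frame rate in frames per second
n_frames = 300  # 10 seconds of video
# Synthetic trace: a 1.2 Hz pulse (72 bpm) plus a slower 0.25 Hz drift.
pulse = [
    math.sin(2 * math.pi * 1.2 * t / fs) + 0.3 * math.sin(2 * math.pi * 0.25 * t / fs)
    for t in range(n_frames)
]
bpm = estimate_heart_rate(pulse, fs)
```

Restricting the search to roughly 0.7 to 4 Hz (42 to 240 bpm) keeps slow components such as the 0.25 Hz drift in this example from being mistaken for the pulse.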
View record
Neural recording technologies such as functional magnetic resonance imaging (fMRI) and surface electromyography (sEMG) provide great potential for studying the underlying neural systems and related diseases. A broad range of statistical methods have been developed to model interactions between neural components. In this thesis, a false discovery rate (FDR)-controlled exploratory group modeling approach is introduced to model interaction/cooperation between neural components. Group network modeling for comparison between populations is of great common interest in biomedical signal processing, particularly when there might be considerable heterogeneity within one or more groups, such as disease populations. A group-level network modeling process, the group PCfdr algorithm, which takes inter-subject variances into account, is proposed. The group PCfdr algorithm combines group inference with a graphical modeling approach for discovering statistically significant structure connectivity. Simulation results demonstrate that the group PCfdr algorithm can accurately recover the underlying group network structures and robustly control the FDR at user-specified levels. To further extract informative features and compare the connectivity patterns across groups at the network level, network analysis methods including graph theoretical analysis and lesion and perturbation analysis are applied to examine the inferred networks. This provides great potential for investigating the connectivity patterns as well as the particular changes associated with certain disease states. The proposed network modeling and analysis approach is applied to fMRI data collected from control and Parkinson's Disease (PD) groups. The network analysis results of the PD groups before and after L-dopa medication support the hypothesis that PD subjects could be ameliorated by the medication.
In addition, based on the comparison between PD subtypes, we observe that the learned brain effective networks across PD subtypes display different connectivity patterns. In another sEMG study in low back pain, significant findings of muscle coordination networks are found to be associated with low back pain. The results indicate that the networks representing the normal group clearly exhibit globally symmetrical patterns between the left and right sEMG channels, while the connections between sEMG channels for the patient group are more likely to cluster locally and the learned group networks show the loss of global symmetrical patterns.
View record