Date of Award:


Document Type:


Degree Name:

Doctor of Philosophy (PhD)


Computer Science

Committee Chair(s)

Xiaojun Qi


Xiaojun Qi


Vladimir Kulyukin


Haitao Wang


John Edwards


Ziqi Song


Visual tracking is the process of estimating the states of a moving object in a dynamic frame sequence. It is considered one of the most important and challenging topics in computer vision. Although numerous tracking methods have been introduced, developing a robust algorithm that can handle different challenges remains an unsolved problem. In this dissertation, we introduce four different trackers and evaluate their tracking accuracy on challenging frame sequences. Each of these trackers aims to address the drawbacks of its peers. The first method is a structured multi-task multi-view tracking (SMTMVT) method, which exploits the sparse appearance model in the particle filter framework to track targets under different challenges. Specifically, we extract features of the target candidates from different views and sparsely represent them by a linear combination of templates of different views. Unlike conventional sparse trackers, SMTMVT not only jointly considers the relationship between different tasks and different views but also retains the structures among different views in a robust multi-task multi-view formulation. The second method is a structured group local sparse tracker (SGLST), which exploits local patches inside target candidates in the particle filter framework. Unlike conventional local sparse trackers, the optimization model in SGLST not only adopts local and spatial information of the target candidates but also attains the spatial layout structure among them by employing a group-sparsity regularization term. To solve the optimization model, we propose an efficient numerical algorithm consisting of two subproblems with closed-form solutions. The third tracker is a robust structured tracker using local deep features (STLDF).
This tracker exploits the deep features of local patches inside target candidates and sparsely represents them by a set of templates in the particle filter framework. The proposed STLDF utilizes a new optimization model, which employs a group-sparsity regularization term to adopt local and spatial information of the target candidates and attain the spatial layout structure among them. To solve the optimization model, we adopt the alternating direction method of multipliers (ADMM) to design a fast and parallel numerical algorithm, splitting the minimization of the augmented Lagrangian of the optimization model into two subproblems with closed-form solutions: a quadratic problem and a Euclidean projection onto the probability simplex. The fourth tracker is an appearance variation adaptation (AVA) tracker, which aligns the feature distributions of target regions over time by learning an adaptation mask in an adversarial network. The proposed adversarial network consists of a generator and a discriminator network that compete with each other over a discriminator loss in a minimax optimization problem. Specifically, the discriminator network aims to distinguish recent target regions from earlier ones by minimizing the discriminator loss, while the generator network aims to produce an adaptation mask that maximizes the discriminator loss. We incorporate a gradient reversal layer in the adversarial network to solve this minimax optimization in an end-to-end manner. We compare the performance of the four proposed trackers with recent state-of-the-art trackers through extensive experiments on publicly available frame sequences, including the OTB50, OTB100, VOT2016, and VOT2018 tracking benchmarks.
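
As a rough illustration of the second ADMM subproblem named in the abstract, the Euclidean projection of a vector onto the probability simplex can be computed in closed form after sorting. The sketch below uses the widely known sort-based procedure. It is an assumption that this matches the dissertation's own derivation, and the function name `project_onto_simplex` is hypothetical.

```python
def project_onto_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}.

    Illustrative sketch only: the standard sort-based method, which is
    not necessarily the exact update used in the dissertation.
    """
    if not v:
        raise ValueError("v must be non-empty")

    # Sort the entries in decreasing order.
    u = sorted(v, reverse=True)

    # Find the shift theta from the largest k whose k-th sorted entry
    # stays positive after subtracting (sum of top-k entries - 1) / k.
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate

    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


if __name__ == "__main__":
    print(project_onto_simplex([0.2, 0.2]))
    print(project_onto_simplex([1.0, 2.0, 3.0]))
```

Each projection is independent of the others, so under this sketch the projections for different target candidates could be run in parallel, which is consistent with the abstract's description of a fast and parallel algorithm.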