Date of Award:

5-2017

Document Type:

Dissertation

Degree Name:

Doctor of Philosophy (PhD)

Department:

Computer Science

Committee Chair(s)

Xiaojun Qi

Committee

Xiaojun Qi

Committee

Haitao Wang

Committee

Vicki Allan

Committee

Stephen Clyde

Committee

Adele Cutler

Abstract

With mobile devices and social networks growing faster than ever, online image and video content has become ubiquitous. Understanding these images and videos, called vision, is one of the primary ways humans perceive the world. Computer vision, the study of enabling machines to see and understand the visual world, is fundamental to advancing Artificial Intelligence.

Object recognition, defined as the task of locating and recognizing object categories in images and videos, is a major research field in computer vision. Recent research in object recognition has achieved significant improvements by using larger labeled datasets (e.g., ImageNet) and deep neural network architectures (e.g., Convolutional Neural Networks, Restricted Boltzmann Machines). However, object recognition research using deep architectures has focused mainly on images. Little has been done on videos, one of the fastest-growing types of multimedia content. Video understanding, especially large-scale object detection in video, has applications in brand awareness, autonomous cars, augmented reality, and other areas.

The research presented in this dissertation proposes and demonstrates a novel system that automatically recognizes objects in videos by combining tracking, object detection, and classification using deep neural networks. By exploiting temporal and spatial information, the proposed approach achieves better object recognition performance than prior state-of-the-art methods in terms of average precision.

Checksum

cf1e56e2aee313e6a4bcdeaf727d6a5e
