Date of Award:
5-2011
Document Type:
Dissertation
Degree Name:
Doctor of Philosophy (PhD)
Department:
Computer Science
Committee Chair(s)
Xiaojun Qi
Committee
Xiaojun Qi
Committee
Changhui Yan
Committee
Minghui Jiang
Committee
Adele Cutler
Committee
Vicki Allan
Abstract
Machine learning techniques are now widely used to extract knowledge from data in a large number of bioinformatics problems. In many such problems, data observations can be naturally represented by discrete structures such as graphs, networks, trees, or sequences. For example, a protein can be seen as a cloud of interconnected atoms in three-dimensional space. The focus of this dissertation is on the development and application of machine learning techniques to bioinformatics problems in which the data can be represented by graphs. In particular, we focus on proteins, which are essential elements in the life process. The study of their underlying structure and function is one of the most important subjects in bioinformatics. As proteins can be naturally represented by graphs, we consider kernel functions that operate directly on data observations in the form of graphs. Kernel functions are the basic building block of a powerful family of machine learning algorithms called kernel methods.
Concretely, we propose a novel approach for predicting the function of proteins. We model proteins as graphs, and we predict function using support vector machines and graph kernels. We evaluate our approach on two function prediction tasks: discriminating enzymes from non-enzymes, and recognizing DNA-binding proteins. In both cases, the resulting performance is higher than that of existing methods.
In addition, given that ontologies have become a popular topic in biomedical research, we propose two novel semantic similarity measures between pairs of proteins. First, we introduce a novel semantic similarity method between pairs of Gene Ontology terms. Second, we propose an instance of the shortest path graph kernel for calculating the semantic similarity between proteins. This latter approach yields improved performance when compared with state-of-the-art methods.
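As a rough illustration of the shortest path graph kernel mentioned above, the following Python sketch compares two small node-labeled graphs by counting matching shortest paths. It is a generic, minimal version and not the instance proposed in the dissertation: the graph encoding (a list of node labels plus an undirected edge list), the exact-match comparison of labels and path lengths, and all function and variable names are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

INF = float("inf")


def shortest_path_lengths(n, edges):
    """All-pairs shortest path lengths (Floyd-Warshall) for an
    unweighted, undirected graph with nodes 0..n-1."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def path_features(labels, edges):
    """Count connected node pairs by (label, label, shortest path length).
    The two labels are sorted so that the pair is unordered."""
    n = len(labels)
    dist = shortest_path_lengths(n, edges)
    feats = Counter()
    for i, j in combinations(range(n), 2):
        if dist[i][j] != INF:
            a, b = sorted((labels[i], labels[j]))
            feats[(a, b, dist[i][j])] += 1
    return feats


def shortest_path_kernel(g1, g2):
    """Kernel value: number of pairs of shortest paths, one from each graph,
    whose endpoint labels and lengths match exactly (an assumed delta
    comparison; other base kernels are possible)."""
    f1 = path_features(*g1)
    f2 = path_features(*g2)
    return sum(count * f2[key] for key, count in f1.items())


# Hypothetical toy graphs: (node labels, undirected edges).
g_triangle = (["A", "A", "B"], [(0, 1), (1, 2), (0, 2)])
g_path = (["A", "A", "B"], [(0, 1), (1, 2)])

k_cross = shortest_path_kernel(g_triangle, g_path)
k_self = shortest_path_kernel(g_triangle, g_triangle)
```

Because the kernel value is an inner product of feature counts, a matrix of such values over a set of graphs is symmetric and positive semidefinite, which is what allows it to be used with kernel methods such as support vector machines.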
Recommended Citation
Alvarez Vega, Marco, "Graph Kernels and Applications in Bioinformatics" (2011). All Graduate Theses and Dissertations, Spring 1920 to Summer 2023. 1185.
https://digitalcommons.usu.edu/etd/1185
Copyright for this work is retained by the student. If you have any questions regarding the inclusion of this work in the Digital Commons, please email us at .
Comments
This work was made publicly available electronically on April 12, 2012.