Date of Award:
5-2013
Document Type:
Dissertation
Degree Name:
Doctor of Philosophy (PhD)
Department:
Mathematics and Statistics
Committee Chair(s)
Adele Cutler
Committee
Adele Cutler
Committee
Donald Cooley
Committee
Christopher Corcoran
Committee
Daniel Coster
Committee
Jürgen Symanzik
Abstract
Statistical classification is widely used in many areas where there is a need to make a data-driven decision, or to classify complicated cases or objects. For instance: disease diagnostics (is a patient sick or healthy, based on the blood test results?); weather forecasting (will there be a storm tomorrow, based on today's atmospheric pressure, air temperature, and wind velocity?); speech recognition (what was said over the phone, based on the caller's voice level and articulation?); spam detection (can the unsolicited commercial e-mails be identified by their content?); and so on.
Classification trees help to answer such questions by constructing a tree-like structure, where the features of the objects are analyzed sequentially, one at a time, in a step-by-step fashion, e.g., if a patient is coughing – measure his/her temperature, if the temperature is above 100.4°F (38.0°C) – listen to the lungs, if there are crackles or rattling noises – suspect pneumonia. The classification results become more reliable if the decision is made by aggregating many trees created from randomly sampled data into a Random Forest, similarly to consulting several doctors with different training backgrounds before stating a subtle diagnosis.
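The step-by-step questioning and the aggregation of several trees can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `make_tree` and `majority_vote`, the dictionary keys, and the three fever thresholds are hypothetical, and the trees are written by hand rather than grown from randomly sampled data as in an actual Random Forest.

```python
from collections import Counter


def make_tree(fever_threshold_f):
    """Build a toy tree that examines one feature at a time (hypothetical helper)."""
    def tree(patient):
        # Step 1: is the patient coughing?
        if not patient["coughing"]:
            return "no pneumonia suspected"
        # Step 2: is the temperature above this tree's threshold?
        if patient["temperature_f"] <= fever_threshold_f:
            return "no pneumonia suspected"
        # Step 3: are there crackles or rattling noises in the lungs?
        if patient["crackles"]:
            return "suspect pneumonia"
        return "no pneumonia suspected"
    return tree


def majority_vote(trees, patient):
    """Aggregate the answers of several trees by taking the most common one."""
    votes = Counter(tree(patient) for tree in trees)
    return votes.most_common(1)[0][0]


# Three "doctors" with slightly different fever thresholds (illustrative values).
trees = [make_tree(t) for t in (100.0, 100.4, 101.0)]
patient = {"coughing": True, "temperature_f": 100.6, "crackles": True}
diagnosis = majority_vote(trees, patient)
print(diagnosis)
```

Here two of the three trees vote for pneumonia and the strictest one does not, so the aggregated answer follows the majority.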
In this work the tree classification algorithm was enhanced with the ability to consider the objects' features in pairs, similarly to considering a patient's body mass index (weight together with height) before diagnosing obesity; or considering a customer's debt-to-income ratio (income together with debt) before approving him/her for a loan. The trees created with the new method are called oblique, because they separate the objects with oblique lines when looking at the pairwise feature plots.
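To make the difference between a single-feature split and an oblique split concrete, here is a small sketch based on the debt-to-income example. It is not the dissertation's algorithm: the function names, the 0.4 ratio limit, and the 20,000 debt limit are assumptions chosen for illustration. The point is only that a ratio rule such as debt / income ≤ 0.4 is the same as the linear condition debt − 0.4 · income ≤ 0, which is an oblique line in the (income, debt) plot.

```python
def axis_aligned_split(applicant, max_debt=20_000):
    """Single-feature split: looks at debt only (threshold is illustrative)."""
    return applicant["debt"] <= max_debt


def oblique_split(applicant, w_debt=1.0, w_income=-0.4, threshold=0.0):
    """Two-feature split: w_debt * debt + w_income * income <= threshold.

    With the default weights this is equivalent to debt / income <= 0.4.
    """
    score = w_debt * applicant["debt"] + w_income * applicant["income"]
    return score <= threshold


# Two hypothetical applicants.
low_income = {"income": 30_000, "debt": 15_000}    # debt-to-income ratio 0.5
high_income = {"income": 100_000, "debt": 30_000}  # debt-to-income ratio 0.3

for name, applicant in (("low_income", low_income), ("high_income", high_income)):
    print(name, axis_aligned_split(applicant), oblique_split(applicant))
```

The single-feature rule approves the first applicant and rejects the second, while the oblique rule does the opposite, because it weighs debt against income instead of looking at debt alone.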
Since the new method is able to focus on pairs of features, it can be used to determine which of the pairs are more useful for classification (chosen more often than others), and how the features relate to and interact with each other.
This work contains theoretical arguments for the new method, as well as a detailed description of the classification algorithm, which was implemented in a computer software package (download links are provided). The properties and performance of oblique trees were investigated using numerical simulations and real data examples. A comparison with other popular classification methods was also performed.
Checksum
0549aac85e01712d9f9b808e8c0f5f18
Recommended Citation
Parfionovas, Andrejus, "Enhancement of Random Forests Using Trees with Oblique Splits" (2013). All Graduate Theses and Dissertations, Spring 1920 to Summer 2023. 1508.
https://digitalcommons.usu.edu/etd/1508
Copyright for this work is retained by the student. If you have any questions regarding the inclusion of this work in the Digital Commons, please email us at .