Probabilistic Wrapper Approach to Predictor Subset Selection in Local Learning Algorithm
Location
Space Dynamics Laboratory
Event Website
http://water.usu.edu/
Start Date
3-26-2004 2:15 PM
End Date
3-26-2004 2:30 PM
Description
Local learning algorithms are plagued by the curse of dimensionality. Locality is introduced through a distance-based definition of “similarity”. The presence of marginally relevant or irrelevant input dimensions significantly degrades the performance of local learning algorithms with respect to model output, and high input dimensionality also impairs an algorithm’s generalization ability. A theoretically sound wrapper approach to dimensionality reduction (feature subset selection) is therefore introduced and applied to a water resource management problem involving the prediction of daily canal diversions. The general argument in favor of wrapper methods is that using the basic induction algorithm (learning algorithm) in feature subset selection incorporates the induction bias. Their major disadvantage, however, is that they are time consuming, because the evaluation of candidate feature subsets is generally done by cross-validation. Incorporating locally weighted cross-validation error as the evaluation measure (to retain the local nature of feature subset selection), a fast cross-validation method, “Local Schemata”, is introduced. This evaluation measure is “estimated” in the sense that it measures utility (usefulness) over a finite data sample. Because the environment is always uncertain, there is always a discrepancy between the estimated utility and the true utility; and because utilities are estimated from a finite number of samples, any algorithm can be expected to return only an approximately correct feature subset. “Local Schemata” is therefore modified so that it selects a feature subset that is epsilon-close to the true (underlying but unknown) optimum with some probability of error. This epsilon-optimal feature subset can be viewed as one of the epsilon-equivalent best feature subsets, and the probability of error provides an upper bound on the chance that this conclusion is wrong.
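To make the idea concrete, the following Python sketch illustrates a wrapper search of this general kind; it is an assumed, minimal illustration, not the presentation's "Local Schemata" implementation. A distance-weighted k-nearest-neighbour regressor stands in for the local learner, each candidate subset is scored by locally weighted leave-one-out cross-validation error, and a Hoeffding-style union bound is used to check whether the best subset found is epsilon-close to the best candidate with probability at least 1 - delta. All function names and parameters (loo_weighted_error, epsilon_optimal_subset, k, epsilon, delta, max_size) are hypothetical.

# Illustrative sketch only; not the authors' "Local Schemata" algorithm.
from itertools import combinations
import numpy as np

def loo_weighted_error(X, y, k=5):
    """Locally weighted leave-one-out CV error of a k-NN regressor."""
    n = len(y)
    err = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the held-out point
        nn = np.argsort(d)[:k]
        w = 1.0 / (d[nn] + 1e-12)          # closer neighbours weigh more
        pred = np.dot(w, y[nn]) / w.sum()
        err[i] = (pred - y[i]) ** 2
    return err.mean()

def epsilon_optimal_subset(X, y, epsilon=0.05, delta=0.05, max_size=3):
    """Exhaustive wrapper over small feature subsets. Returns the subset
    with the lowest estimated utility (CV error) and whether a Hoeffding
    union bound certifies it epsilon-optimal at confidence 1 - delta."""
    n, p = X.shape
    candidates = [s for r in range(1, max_size + 1)
                  for s in combinations(range(p), r)]
    scores = {s: loo_weighted_error(X[:, s], y) for s in candidates}
    best = min(scores, key=scores.get)
    # Hoeffding: assuming errors scaled into [0, 1], each estimate deviates
    # from its true utility by more than epsilon/2 with probability at most
    # 2 * exp(-n * epsilon**2 / 2); union-bound over all candidates.
    risk = len(candidates) * 2 * np.exp(-n * epsilon ** 2 / 2)
    return best, scores[best], min(risk, 1.0) <= delta

# Toy usage: only features 0 and 2 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2 * X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=200)
print(epsilon_optimal_subset(X, y))

Note the trade-off the abstract describes: the probabilistic stopping criterion accepts an epsilon-equivalent subset rather than insisting on the exact optimum, which is what makes the wrapper search tractable on finite, noisy samples.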
https://digitalcommons.usu.edu/runoff/2004/AllAbstracts/15