Document Type: Article
Journal/Book Title/Conference: Communications in Statistics - Simulation and Computation
Publisher: Taylor & Francis Inc.
Publication Date: 2024
Journal Article Version: Accepted Manuscript
First Page: 1
Last Page: 24
Abstract
Analyzing soft interval data for uncertainty quantification has attracted much attention recently, and regression methods for interval data have been studied extensively. Most existing work, however, focuses on linear models, whereas many practical problems are nonlinear in nature, so developing nonlinear regression tools for interval data is crucial. This paper proposes an interval-valued random forests model whose splitting criterion is variance reduction measured by an L2-type metric on the space of compact intervals. The model simultaneously accounts for the centers and ranges of the interval data as well as their possible interactions. Unlike most linear models, which require additional constraints to ensure mathematical coherence, the proposed random forests model estimates the regression function nonparametrically, so the predicted interval length is naturally nonnegative without any constraints. Simulation studies show that the new method outperforms typical existing regression methods across various linear, semi-linear, and nonlinear data archetypes and under different error measures. To demonstrate its applicability, a real-data example analyzes the price range data of the Dow Jones Industrial Average index and its component stocks.
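The splitting idea described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's exact metric or weights. As an assumption, each response interval is encoded as a (center, half-range) pair, and the squared distance between two intervals is a quadratic form with a hypothetical symmetric positive-definite weight matrix `K`; a nonzero off-diagonal entry in `K` lets centers and ranges interact. Under such a metric, the mean interval is the componentwise average of the pairs, and that average has a nonnegative half-range. The functions `interval_variance`, `best_split`, and `leaf_prediction` are illustrative names, and the split search is a generic CART-style exhaustive threshold scan on a single scalar feature, not the full forest.

```python
import numpy as np


def interval_variance(Y, K):
    """Mean squared K-distance of intervals Y (n x 2: center, half-range) to their mean.

    Assumption: d^2(a, b) = (a - b)^T K (a - b) with K symmetric positive definite,
    so the minimizing "mean interval" is the componentwise mean of the rows of Y.
    """
    D = Y - Y.mean(axis=0)
    # Row-wise quadratic form D[i] @ K @ D[i]
    return float(np.mean(np.einsum("ij,jk,ik->i", D, K, D)))


def best_split(x, Y, K, min_leaf=1):
    """Return (threshold, reduction) maximizing variance reduction for a scalar feature x."""
    order = np.argsort(x)
    xs, Ys = x[order], Y[order]
    n = len(xs)
    parent = interval_variance(Ys, K)
    best = (None, 0.0)
    # i = size of the left child; both children keep at least min_leaf points
    for i in range(min_leaf, n - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # no threshold separates equal feature values
        left, right = Ys[:i], Ys[i:]
        child = (i * interval_variance(left, K) + (n - i) * interval_variance(right, K)) / n
        gain = parent - child
        if gain > best[1]:
            best = (float((xs[i - 1] + xs[i]) / 2.0), gain)
    return best


def leaf_prediction(Y):
    """Predict the leaf interval as [c - r, c + r] from the mean (center, half-range)."""
    c, r = Y.mean(axis=0)
    # r averages nonnegative half-ranges, so the predicted length 2r is never negative
    return (float(c - r), float(c + r))


if __name__ == "__main__":
    x = np.array([0.0, 1.0, 2.0, 3.0])
    Y = np.array([[0.0, 1.0], [0.0, 1.0], [5.0, 2.0], [5.0, 2.0]])
    thr, gain = best_split(x, Y, np.eye(2))
    print(f"threshold={thr}, reduction={gain}")  # threshold=1.5, reduction=6.5
```

In this toy example, splitting at x = 1.5 separates the two identical low intervals from the two identical high ones, so both children have zero variance and the whole parent variance (6.5 with `K` set to the identity) is removed. Replacing `np.eye(2)` with, say, `np.array([[1.0, 0.5], [0.5, 1.0]])` (still positive definite) would add a center/range interaction term to the criterion. A random forest would grow many such trees on bootstrap samples with random feature subsets and average their leaf (center, half-range) predictions, which also keeps the predicted half-range nonnegative.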
Recommended Citation
Gaona-Partida, Paul; Yeh, Chih-Ching; Sun, Yan; and Cutler, Adele, "Random Forests Regression for Soft Interval Data" (2024). Mathematics and Statistics Faculty Publications. Paper 285.
https://digitalcommons.usu.edu/mathsci_facpub/285