A Comparison of Random Forest-Based Methods for Racial/Ethnic-Specific Classification of Obesity

Sun Young Jeon, Utah State University

Abstract

Obesity is typically defined using body mass index (BMI) and its established cut-off. However, some studies have highlighted the importance of developing racial/ethnic-specific classifications of obesity that reflect differences in body composition and fat distribution. Using National Health and Nutrition Examination Survey (NHANES) data and Random Forest classification, this paper attempts to identify the body measures and cut-offs that are important for predicting obesity-related health risks among White, Hispanic, and Black male populations in the U.S. In particular, it compares the performance of three Random Forest-based methods for dealing with class imbalance: weighted Random Forest (WRF), Random Forest with down-sampling (DS), and Random Forest with SMOTE. The best-performing method differed across populations in the given dataset. Accordingly, WRF for Whites, Random Forest with SMOTE for Hispanics, and Random Forest with DS for Blacks are used as the final classification models. The results show that BMI is indeed an important body measure for predicting obesity-related health risks among White males, but it is considerably less informative for Hispanic and Black males. For these two populations, waist circumference combined with population-specific cut-offs proved more useful in predicting obesity-related health risks.
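The three approaches to class imbalance named above differ only in how the training data are rebalanced before the forest is grown. The following is a minimal sketch of those rebalancing steps alone, assuming a binary outcome and numeric body measures; it is not the paper's implementation, and the forest itself is omitted. The function names (`balanced_class_weights`, `down_sample`, `smote_like`), the inverse-frequency weighting formula, and the toy (BMI, waist circumference) values are all illustrative assumptions, and `smote_like` is a simplified stand-in for SMOTE rather than the reference algorithm.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (the idea behind a weighted forest)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def down_sample(rows, labels, rng):
    """Randomly keep only as many rows per class as the smallest class has."""
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


def smote_like(rows, labels, minority, k, rng):
    """Add synthetic minority rows by interpolating between a minority row
    and one of its k nearest minority neighbours (simplified, binary case)."""
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    # Number of synthetic rows needed to match the majority class size.
    n_needed = max(0, len(rows) - 2 * len(minority_rows))
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        others = [r for j, r in enumerate(minority_rows) if j != i]
        # Rank the remaining minority rows by squared Euclidean distance.
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return rows + synthetic, labels + [minority] * len(synthetic)


# Hypothetical toy data: (BMI, waist circumference in cm); 1 = at-risk (minority class).
toy_rows = [(22.0, 80.0), (24.5, 86.0), (23.1, 84.0), (26.0, 92.0),
            (21.4, 78.0), (25.2, 90.0), (31.0, 108.0), (33.5, 115.0)]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1]

rng = random.Random(0)
print(balanced_class_weights(toy_labels))
print(Counter(down_sample(toy_rows, toy_labels, rng)[1]))
print(Counter(smote_like(toy_rows, toy_labels, minority=1, k=1, rng=rng)[1]))
```

In practice, each rebalanced training set (or, for the weighted variant, the class weights) would be passed to a Random Forest learner, and the resulting models compared on held-out data for each population separately.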