Date of Award
Master of Science (MS)
Mathematics and Statistics
Daniel C. Coster
Objective: To study estimation and inference in factor analysis when the data have normal or non-normal noise distributions.
Methods: Population data were created in R with a specified number of factors, factor structure, and observable variables with known loadings. Repeated simple random samples (SRSs) were then drawn independently from the population. The maximum likelihood method with varimax rotation was used to perform factor analysis and inference on each sampled dataset. Factor loadings were estimated to determine whether their estimation was (approximately) unbiased and/or efficient for each specified population, and chi-square (χ²) statistics were obtained to test hypotheses about the correct number of factors in simulated settings where the true number of factors was known. In this project, the number of true factors varied between 1 and 2, the number of observed variables was 6 for 1 factor and 3 each for 2 factors, and non-normal noise distributions were used to create actual observations. These non-normal distributions included exponential, lognormal, gamma, Poisson, and discrete uniform. Different loading matrices were tried in combination with different standard deviation values for each noise distribution in three types of factor models and two sample sizes, 25 and 100.
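The data-generation and sampling steps described above can be sketched roughly as follows. This is an illustrative outline in Python using only the standard library, not the report's R code: the loading values, noise standard deviation, population size, seed, and the function names `make_population` and `draw_samples` are all assumptions chosen for the example. It covers a one-factor population with centred exponential noise and omits the maximum likelihood fit and varimax rotation.

```python
import random
import statistics


def make_population(loadings, noise_sd, size, rng):
    """Generate rows x_j = loading_j * f + e_j with f ~ N(0, 1).

    The noise e_j is exponential, shifted to have mean zero (an assumption
    made here so that the noise only adds variance, not a location shift).
    """
    population = []
    for _ in range(size):
        factor = rng.gauss(0.0, 1.0)  # one common factor per observation
        row = []
        for loading in loadings:
            # An exponential with mean noise_sd also has sd noise_sd;
            # subtracting the mean centres the noise at zero.
            noise = rng.expovariate(1.0 / noise_sd) - noise_sd
            row.append(loading * factor + noise)
        population.append(row)
    return population


def draw_samples(population, n, replications, rng):
    """Draw independent simple random samples (without replacement) of size n."""
    return [rng.sample(population, n) for _ in range(replications)]


rng = random.Random(2005)                   # hypothetical seed
loadings = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]   # hypothetical loadings, 6 variables
population = make_population(loadings, noise_sd=0.5, size=10_000, rng=rng)
samples = draw_samples(population, n=100, replications=1000, rng=rng)

# Under this model the variance of variable 1 is loading^2 + noise_sd^2.
implied_var = loadings[0] ** 2 + 0.5 ** 2
observed_var = statistics.pvariance([row[0] for row in population])
print(f"implied {implied_var:.3f}, observed {observed_var:.3f}")
```

Each element of `samples` would then be passed to a factor analysis routine; comparing the estimated loadings with the known `loadings` across replications gives the bias and variability summaries the study reports.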
Results and Conclusions: For the larger sample size of n = 100, with standard normal factor populations and normal errors added to each variable, the standard maximum likelihood estimates of the factor loadings shifted, on average, from underestimated to overestimated as the standard deviation of the noise distribution decreased, but the chi-square tests gave the expected results, in terms of false rejection (Type I error) rate or power of the model, using 1000 replications. For normal common factors with non-normal noise added to the observed variables, the robustness of the loading estimation was excellent for all the noise distributions tried. Overall, the robustness of the chi-square tests was remarkable, except that the Type I error rates were greater than 100 out of 1000 (10%) in the 1-factor model with one true factor for the gamma noise distribution when the standard deviation was small.
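To put a rejection count such as 100 out of 1000 in context, it can help to compare it with the Monte Carlo standard error of a rejection rate. The short calculation below is a sketch under an assumed nominal level of 5%, which the abstract does not state; the function name `rejection_rate_summary` is hypothetical.

```python
import math


def rejection_rate_summary(rejections, replications, nominal_level):
    """Return the observed rate, its Monte Carlo SE under the nominal level, and a z-score."""
    rate = rejections / replications
    # Binomial standard error of a proportion if the true rate equals the nominal level.
    mc_se = math.sqrt(nominal_level * (1.0 - nominal_level) / replications)
    z = (rate - nominal_level) / mc_se
    return rate, mc_se, z


# 100 rejections in 1000 replications, against an assumed nominal level of 0.05.
rate, mc_se, z = rejection_rate_summary(100, 1000, 0.05)
print(f"rate={rate:.3f}, Monte Carlo SE={mc_se:.4f}, z={z:.1f}")
```

With 1000 replications the standard error is about 0.007, so under this assumed level a rate of 10% or more lies well outside simulation noise.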
When the sample size was decreased from 100 to 25, the estimation of the loadings became less accurate (larger estimated standard errors and greater bias), especially with larger values of the standard deviation of the noise distribution. The robustness of the estimated power function was, in some cases, poor when one-factor models were fitted in simulations from populations with two true common factors. However, Type I error rates were not greatly affected whenever the correct number of factors was fitted. The same conclusions held for a normal population to which either normal or non-normal noise was added, and also for the discrete uniform population with discrete uniform noise.
Zhang, Ping, "Simulation Study of Estimation and Inference in Factor Analysis: Normal and Non-normal Noise Distributions" (2005). All Graduate Plan B and other Reports. 1284.