Effect of PCA Centering and Scaling on Classification of Mycobacteria from Raman Spectra
Raman spectroscopy has been used for decades to detect and identify biological substances because it provides specific molecular information. Spectra collected from biological samples are often complex and require data reduction techniques such as principal component analysis (PCA) together with multivariate classification methods. Classification results depend on which principal components (PCs) are selected and on how PCA is performed (with or without centering and scaling). Several guidelines exist for choosing the number of PCs, such as the scree plot, the Kaiser criterion, and cumulative percent variance. The goal of this research is to evaluate these options and identify the best way to implement PCA and select PCs when classifying Raman spectra of bacteria. Raman spectra of three isolates of mycobacteria (Mycobacterium sp. JLS, Mycobacterium sp. KMS, and Mycobacterium sp. MCS) were collected, reduced with PCA, and classified with linear discriminant analysis. PCA implementation and PC selection were evaluated by comparing, for each centering and scaling option, the highest achievable classification accuracy against the accuracy obtained with each PC selection method. Centered, unscaled data provided the best results when PCs were selected by cumulative percent variance.
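The preprocessing options and two of the PC selection rules named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The function names, the 0.95 variance threshold, and the toy values are assumptions made for illustration only. The eigenvalues are assumed to come from a PCA of the preprocessed spectra, which is not shown here, and the Kaiser criterion is written in its generalized form (keep PCs whose eigenvalue exceeds the mean eigenvalue), which reduces to the familiar "eigenvalue greater than 1" rule when the data are centered and scaled.

```python
from statistics import mean, stdev


def preprocess(spectra, center=True, scale=False):
    """Center and/or scale each channel (column) of a list of spectra.

    `spectra` is a list of equal-length lists, one per measured spectrum.
    The default (centered, unscaled) mirrors the option the abstract
    reports as giving the best results.
    """
    columns = list(zip(*spectra))  # one tuple per wavenumber channel
    out_columns = []
    for col in columns:
        mu = mean(col) if center else 0.0
        sd = stdev(col) if scale else 1.0
        if sd == 0:
            sd = 1.0  # constant channel: skip scaling to avoid dividing by zero
        out_columns.append([(x - mu) / sd for x in col])
    return [list(row) for row in zip(*out_columns)]


def n_pcs_cumulative_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading PCs whose cumulative share of the
    total variance reaches `threshold` (0.95 is an assumed value)."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= threshold:
            return k
    return len(eigenvalues)


def n_pcs_kaiser(eigenvalues):
    """Generalized Kaiser criterion: count PCs whose eigenvalue
    exceeds the mean eigenvalue."""
    avg = mean(eigenvalues)
    return sum(1 for ev in eigenvalues if ev > avg)


# Toy values, not data from the study.
toy_spectra = [[1.0, 10.0], [3.0, 30.0]]
toy_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]

centered = preprocess(toy_spectra, center=True, scale=False)
k_variance = n_pcs_cumulative_variance(toy_eigenvalues, threshold=0.85)
k_kaiser = n_pcs_kaiser(toy_eigenvalues)
```

The selected number of PCs would then determine how many score columns are passed to the classifier. A scree plot, the third guideline mentioned, is a visual inspection of the same sorted eigenvalues and is therefore not automated here.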
Hanson C, Sieverts M, Vargis E. Effect of PCA centering and scaling on classification of mycobacteria from Raman spectra. Applied Spectroscopy, 71(6), 1249-1255.