The data was first standardized using z-score scaling, which rescales each feature to zero mean and unit variance. Because PCA is sensitive to the scale of the input variables, this step keeps features measured in large units from dominating the components simply because of their raw magnitudes. Principal Component Analysis (PCA) was then applied to the standardized data. PCA transforms the original, possibly correlated features into a new set of linearly uncorrelated variables, the principal components, ordered by the amount of variance each one captures.
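The standardization and PCA steps described above can be sketched in NumPy as follows. This is a minimal illustration under assumptions, not the procedure actually used: the helper names `zscore` and `pca` are hypothetical, the standard deviation uses the population form (`ddof=0`), and PCA is computed via the singular value decomposition of the centered data, one standard way of obtaining the principal axes.

```python
import numpy as np

def zscore(X: np.ndarray) -> np.ndarray:
    """Hypothetical helper: rescale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)   # population standard deviation (ddof=0)
    std[std == 0] = 1.0   # constant columns become all zeros instead of NaN
    return (X - mean) / std

def pca(Z: np.ndarray):
    """Hypothetical helper: PCA of a column-centered matrix via SVD.

    Returns (Vt, scores): rows of Vt are the principal axes, and the columns
    of `scores` are the data projected onto those axes (the components).
    """
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S        # equivalent to Z @ Vt.T
    return Vt, scores
```

Because the centered data satisfies `Z = scores @ Vt` and the columns of `U` are orthogonal, the component scores are mutually uncorrelated, which is the property the text describes.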
The explained variance ratio of each principal component was then computed. This ratio is the proportion of the dataset's total variance captured by that component, that is, the component's variance (the corresponding eigenvalue of the covariance matrix) divided by the sum of all component variances. It indicates how much each principal component contributes to representing the dataset.
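As a sketch of how the explained variance ratio follows from the decomposition, the hypothetical helper below assumes its input is already column-centered (for example, z-scored) and derives each component's variance from the singular values, `S**2 / (n - 1)`, which equal the eigenvalues of the sample covariance matrix. It is an illustration under those assumptions, not the original analysis code.

```python
import numpy as np

def explained_variance_ratio(Z: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fraction of total variance per principal component.

    Assumes Z is column-centered. Ratios are returned in descending order.
    """
    S = np.linalg.svd(Z, compute_uv=False)   # singular values, largest first
    component_variance = S**2 / (Z.shape[0] - 1)
    return component_variance / component_variance.sum()
```

The ratios always sum to 1 and are non-increasing, so the first few entries show directly how much of the data's variability the leading components account for.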
Finally, the cumulative explained variance was plotted against the number of principal components retained. This plot guides the choice of how many components to keep for dimensionality reduction: enough components should be retained to capture most of the variance in the data, while the remaining components are discarded, balancing a simpler representation against information loss.
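One common way to turn the cumulative curve into a concrete choice is a variance threshold. The sketch below assumes that rule and an illustrative threshold of 0.95; the source does not state which criterion or cutoff was used, and the helper names `n_components_for` and `plot_cumulative` are hypothetical. Matplotlib is imported only inside the plotting helper, so the selection logic runs with NumPy alone.

```python
import numpy as np

def n_components_for(ratios, threshold=0.95):
    """Hypothetical helper: smallest number of leading components whose
    cumulative explained variance reaches `threshold` (0.95 is illustrative).

    Returns (k, cumulative).
    """
    cumulative = np.cumsum(ratios)
    # First index where cumulative >= threshold; +1 converts index to a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative

def plot_cumulative(cumulative, threshold=0.95):
    """Hypothetical helper: cumulative explained variance vs. component count."""
    import matplotlib.pyplot as plt  # optional dependency, needed only here
    ks = np.arange(1, len(cumulative) + 1)
    plt.plot(ks, cumulative, marker="o")
    plt.axhline(threshold, linestyle="--", color="gray")
    plt.xlabel("Number of principal components")
    plt.ylabel("Cumulative explained variance")
    plt.show()
```

For example, with ratios `[0.6, 0.25, 0.1, 0.05]` and a threshold of 0.9, the cumulative values are roughly `[0.6, 0.85, 0.95, 1.0]`, so three components would be kept. An elbow read off the plot is an alternative criterion when no fixed threshold is preferred.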