PCA
Principal Component Analysis (PCA) is a method for analysing data; it is used for dimensionality reduction, data visualisation, and more. The purpose of PCA is to find the principal components that describe the most variation in the data. Another way of saying this is that PCA represents the data along new axes in a coordinate system.
Suppose we have a data set of cells. We don't know what type of cells they are, but we know how many types of cells there are. In this example we give them some sugar. Each column describes how much a cell grows. If we have only measured two cells, the data is easy to visualize in a graph. If we have three cells, we could either visualize it with a 3D graph or draw three 2D graphs to see if there is any correlation. If there are more than three cells, it becomes difficult to plot the data in one graph, and the number of 2D graphs needed to show all the data would be too many to look at. This is where PCA becomes handy. To do PCA there are some steps to follow.
Here we find the average of each column and subtract it from that column, so that every column ends up with an average of zero.
N is the number of columns.
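The centering step can be sketched in NumPy as follows. This is a minimal illustration, not the original author's code: the `data` array is made up, and it assumes that each row is one measurement and each column is one cell, as in the example above.

```python
import numpy as np

# Hypothetical growth measurements (invented for illustration):
# 4 rows (measurements) and 3 columns (cells).
data = np.array([
    [2.0, 1.0, 4.0],
    [4.0, 3.0, 0.0],
    [6.0, 4.0, 2.0],
    [8.0, 8.0, 2.0],
])

# Average of each column (axis=0 averages down the rows).
column_means = data.mean(axis=0)

# Subtract each column's average from that column (NumPy broadcasting).
centered = data - column_means

print(column_means)           # average of each original column
print(centered.mean(axis=0))  # each centered column now averages to zero
```

After this step the data cloud is centered on the origin, which the covariance and eigenvector steps rely on.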
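Here we calculate the covariance matrix of the centered data.
Each entry of the matrix measures how two columns vary together: it is positive when they tend to grow together, negative when one grows while the other shrinks, and close to zero when they are unrelated.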
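As a rough sketch of the covariance step, the matrix can be computed by hand from the centered data and compared with NumPy's built-in `np.cov`. The `data` array is invented for illustration, and dividing by the number of rows minus one is an assumption; dividing by the number of rows is the other common convention (`bias=True` in `np.cov`).

```python
import numpy as np

# Hypothetical measurements: rows are measurements, columns are cells.
data = np.array([
    [2.0, 1.0, 4.0],
    [4.0, 3.0, 0.0],
    [6.0, 4.0, 2.0],
    [8.0, 8.0, 2.0],
])

# Center the columns first.
centered = data - data.mean(axis=0)
n_rows = centered.shape[0]

# Covariance matrix: one row and one column per cell.
# Assumption: normalise by (n_rows - 1), the sample covariance.
cov = centered.T @ centered / (n_rows - 1)

# Cross-check against NumPy; rowvar=False means "columns are the variables".
cov_numpy = np.cov(data, rowvar=False)

print(cov)
print(np.allclose(cov, cov_numpy))
```

The diagonal holds the variance of each column and the off-diagonal entries hold the covariance between pairs of columns, so the matrix is always symmetric.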
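We calculate the eigenvectors and eigenvalues of the covariance matrix. Each eigenvector is a direction that the covariance matrix only scales, and its eigenvalue is the scaling factor, which equals the variance of the data along that direction.
This step is calculated multiple times to find the most suitable eigenvectors.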
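One possible way to carry out this step is with `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix. The sketch below uses invented data and checks the defining property of each eigenvector; it is an illustration, not necessarily the procedure the original formulas described.

```python
import numpy as np

# Hypothetical measurements: rows are measurements, columns are cells.
data = np.array([
    [2.0, 1.0, 4.0],
    [4.0, 3.0, 0.0],
    [6.0, 4.0, 2.0],
    [8.0, 8.0, 2.0],
])

# Covariance matrix with columns as variables.
cov = np.cov(data, rowvar=False)

# eigh returns the eigenvalues in ascending order and the
# eigenvectors as the COLUMNS of the second return value.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Defining property: multiplying by cov only scales each eigenvector.
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    assert np.allclose(cov @ v, eigenvalues[i] * v)

print(eigenvalues)
print(eigenvectors)
```

Because the covariance matrix is symmetric, the eigenvectors come out orthogonal to each other with unit length, so they can serve directly as new coordinate axes.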
Sort the eigenvectors in decreasing order of their eigenvalues, choose the number M of components to keep, and put the first M eigenvectors into a matrix as its columns.
Note: if we don't want to reduce the data, this step is unnecessary.
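A minimal sketch of the sorting and selection step is shown below. The data and the choice `M = 2` are made up for illustration, and `explained` is a helper quantity introduced here (the share of total variance kept), not something from the original text.

```python
import numpy as np

# Hypothetical measurements: rows are measurements, columns are cells.
data = np.array([
    [2.0, 1.0, 4.0],
    [4.0, 3.0, 0.0],
    [6.0, 4.0, 2.0],
    [8.0, 8.0, 2.0],
])

cov = np.cov(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order

# Indices that sort the eigenvalues from largest to smallest.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]  # reorder the columns to match

# Keep the M directions with the most variance (M = 2 is an arbitrary choice).
M = 2
components = eigenvectors[:, :M]  # shape: (number of columns, M)

# Share of the total variance captured by the kept components.
explained = eigenvalues[:M].sum() / eigenvalues.sum()

print(components.shape)
print(explained)
```

Looking at the share of variance kept is one common way to decide on M: a value close to 1 suggests little information is lost by dropping the remaining directions.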
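We can now transform the data into the new coordinates by projecting the centered data onto the chosen eigenvectors, so that each new column is a linear combination of the original columns.
If the data should not be reduced, we use all N eigenvectors instead of only M.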
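The whole procedure, ending with the transformation, might look like the sketch below. The data and `M = 2` are invented, and the names `reduced` and `full` are hypothetical labels for the reduced and non-reduced results.

```python
import numpy as np

# Hypothetical measurements: rows are measurements, columns are cells.
data = np.array([
    [2.0, 1.0, 4.0],
    [4.0, 3.0, 0.0],
    [6.0, 4.0, 2.0],
    [8.0, 8.0, 2.0],
])

# Center, compute the covariance matrix, and find its eigenvectors.
centered = data - data.mean(axis=0)
cov = np.cov(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort from largest to smallest eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Reduced transform: project onto the first M eigenvectors.
M = 2
reduced = centered @ eigenvectors[:, :M]  # shape: (rows, M)

# Non-reduced transform: use all N eigenvectors (a pure rotation of the axes).
full = centered @ eigenvectors            # shape: (rows, N)

# With all eigenvectors kept, the centered data can be recovered exactly.
recovered = full @ eigenvectors.T

print(reduced)
print(np.allclose(recovered, centered))
```

In the transformed data, the variance of each new column equals the corresponding eigenvalue, which is why the first columns carry the most variation.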
If the description above is unclear, this video goes over PCA using a more graphical approach:
https://www.youtube.com/watch?v=FgakZw6K1QQ
Christopher M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. New York: Springer, 2006. 738 pp. isbn: 978-0-387-31073-2.