Wednesday, 25 February 2015

Computing Correlation Coefficients in R

Correlation Coefficients


The general formula for the correlation coefficient between two variables A and B is:

$$r_{AB} = \frac{\operatorname{cov}(A,B)}{S_A \, S_B}$$

where cov(A,B) is the covariance between A and B, and S_A and S_B are their standard deviations.
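
For reference, the covariance and standard deviation terms can be written out in full. The block below is a sketch that assumes the usual sample (N-1) definitions, which is the convention R's cov() and sd() follow. The symbols a_i, b_i, \bar{a}, \bar{b} and N are notation introduced here for illustration, not taken from the original post.

```latex
\begin{align}
\operatorname{cov}(A,B) &= \frac{1}{N-1}\sum_{i=1}^{N}(a_i-\bar{a})(b_i-\bar{b}) \\
S_A &= \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(a_i-\bar{a})^2}, \qquad
S_B = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(b_i-\bar{b})^2} \\
\frac{\operatorname{cov}(A,B)}{S_A \, S_B} &=
\frac{\sum_{i=1}^{N}(a_i-\bar{a})(b_i-\bar{b})}
     {\sqrt{\sum_{i=1}^{N}(a_i-\bar{a})^2}\,\sqrt{\sum_{i=1}^{N}(b_i-\bar{b})^2}}
\end{align}
```

The 1/(N-1) factors cancel in the last line, so the correlation coefficient depends only on the deviations from the means.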

Manual Way in R


> #Define 2 vectors
> 
> A <- c(1,2,4,5)
> B <- c(5,6,8,1)
> 
> #Finding Covariance between A & B 
> A_diff <- A - mean(A)
> B_diff <- B - mean(B)
> 
> #Print both the variables created above
> A_diff
[1] -2 -1  1  2
> B_diff
[1]  0  1  3 -4
> 
> #Do the summation and divide by N-1 to get the covariance between the two vectors
> #N = 4 in this case (each vector has 4 elements)
> cov <- sum(A_diff*B_diff)/(4-1)
> 
> #Finding the squared difference w.r.t. the mean for the vectors 
> A_sq <- A_diff^2
> B_sq <- B_diff^2
> 
> #Using the standard deviation formula 
> 
> A_sd <- sqrt(sum(A_sq)/(4-1))
> 
> B_sd <- sqrt(sum(B_sq)/(4-1))
> 
> #Print the standard deviation 
> A_sd
[1] 1.825742
> B_sd
[1] 2.94392
> 
> #Plugging in values to find the correlation coefficient
> 
> corr <- cov/(A_sd*B_sd)
> 
> #Printing the correlation obtained - Manual way
> corr
[1] -0.3721042
> 
> #Using the built-in cor() function for direct computation 
> 
> corr_test <- cor(A,B)
> corr_test
[1] -0.3721042
The manual calculation and the built-in cor() function give the same result, -0.3721042.
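
The manual steps can also be collected into a small reusable function. The sketch below is one possible way to do that; the name manual_cor and its internal variable names are hypothetical and do not appear in the original post. It assumes two numeric vectors of equal length with no missing values, and uses the N-1 (sample) divisor.

```r
# Hypothetical helper (not part of the original post):
# Pearson correlation computed from first principles.
manual_cor <- function(x, y) {
  # Assumption: equal-length numeric vectors with at least 2 elements, no NAs
  stopifnot(length(x) == length(y), length(x) > 1)
  n <- length(x)

  # Deviations from the mean
  x_diff <- x - mean(x)
  y_diff <- y - mean(y)

  # Sample covariance and sample standard deviations (divide by n - 1)
  cov_xy <- sum(x_diff * y_diff) / (n - 1)
  sd_x <- sqrt(sum(x_diff^2) / (n - 1))
  sd_y <- sqrt(sum(y_diff^2) / (n - 1))

  cov_xy / (sd_x * sd_y)
}

A <- c(1, 2, 4, 5)
B <- c(5, 6, 8, 1)

r_manual <- manual_cor(A, B)
r_builtin <- cor(A, B)

print(r_manual)
# Compare with the built-in result, allowing for floating-point tolerance
print(isTRUE(all.equal(r_manual, r_builtin)))
```

Comparing with all.equal() rather than == avoids spurious mismatches caused by floating-point rounding. The intermediate values can be cross-checked the same way against R's built-in cov(A, B), sd(A) and sd(B).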
