Wednesday, 14 January 2015

All about Correlation - Part 1

What is Correlation?

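Correlation indicates how strongly two variables are associated. Correlation, however, does not imply causation. The correlation coefficient ranges from -1 to 1: values near 1 indicate a strong positive association, values near -1 a strong negative association, and values near 0 little or no linear association.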
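
To make the -1 to 1 range concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The helper name pearson_r is hypothetical (it is not part of base R), and the sketch assumes two numeric vectors of equal length with no missing values. In practice the built-in cor() does the same job.

```r
# Pearson correlation computed by hand.
# pearson_r is a hypothetical helper name used only for illustration.
pearson_r <- function(x, y) {
  dx <- x - mean(x)   # deviations of x from its mean
  dy <- y - mean(y)   # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

x <- c(1, 2, 3, 4, 5)
pearson_r(x, 2 * x + 1)           # perfectly increasing relationship: 1
pearson_r(x, -3 * x)              # perfectly decreasing relationship: -1
pearson_r(mtcars$wt, mtcars$mpg)  # negative: heavier cars tend to have lower mpg
```

The last call should agree with cor(mtcars$wt, mtcars$mpg), since cor() uses the Pearson method by default.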

Correlation does not mean causation


Let's use mtcars, a dataset that ships with R, to understand correlation in more depth.

> # Show the first few rows of the dataset
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> 
> # Generate a correlation matrix for columns 2 to 6 (cyl, disp, hp, drat, wt)
> correlations <- cor(mtcars[2:6])
> # round() can also be used to limit the output to 2 decimal places:
> # correlations <- round(cor(mtcars[2:6]), 2)
> # Print the result
> correlations
            cyl       disp         hp       drat         wt
cyl   1.0000000  0.9020329  0.8324475 -0.6999381  0.7824958
disp  0.9020329  1.0000000  0.7909486 -0.7102139  0.8879799
hp    0.8324475  0.7909486  1.0000000 -0.4487591  0.6587479
drat -0.6999381 -0.7102139 -0.4487591  1.0000000 -0.7124406
wt    0.7824958  0.8879799  0.6587479 -0.7124406  1.0000000
> 
> # Create a visual plot of the correlations (requires the corrplot package)
> library(corrplot)
> corrplot(correlations)
> 

Correlation matrix
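
The matrix is symmetric, so every pair of variables appears twice. As a rough sketch of one way to read it, the snippet below lists each pair once and sorts the pairs by the absolute value of the correlation. The names idx and pairs are illustrative choices, not anything from the original session.

```r
correlations <- cor(mtcars[2:6])

# Keep each pair once by taking only the upper triangle (diagonal excluded).
idx <- which(upper.tri(correlations), arr.ind = TRUE)

pairs <- data.frame(
  var1 = rownames(correlations)[idx[, "row"]],
  var2 = colnames(correlations)[idx[, "col"]],
  r    = correlations[idx]
)

# Sort by absolute strength, strongest association first.
pairs <- pairs[order(-abs(pairs$r)), ]
head(pairs, 3)
```

With the values printed above, the strongest pair is cyl and disp (about 0.90). Sorting on the absolute value keeps strong negative correlations, such as drat and wt, from being pushed to the bottom.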




