Wednesday, 21 January 2015

R Basics: Graphics in R

R is a powerful tool for making graphics that represent your data. This post only scratches the surface, covering the most commonly used plot types.

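BOXPLOT: A boxplot is used for quantitative variables, and its goal is to give you an idea of the distribution of the data. The upper and lower edges of the box mark the 75th and 25th percentiles respectively, and the thick black line inside the box marks the median. The whiskers extend to the most extreme values lying within 1.5 times the box height (the interquartile range); points beyond them are drawn individually as outliers. In the plot below the median line sits towards the top of the box, which means the distribution is asymmetric (skewed).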
Dataset Source: http://datamarket.com/data/set/12sc/infant-mortality-rate#!
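To see the numbers a boxplot is built from, boxplot.stats() returns them directly. The sketch below uses a small made-up vector rather than the infant mortality data. One subtlety: boxplot() draws the box edges at Tukey's hinges (computed by fivenum()), which are close to, but not always identical to, the 25th and 75th percentiles that quantile() reports.

```r
# Made-up sample for illustration (not from the dataset); 20 is an outlier
x <- c(2, 3, 3, 4, 5, 6, 8, 20)

# boxplot.stats() returns what boxplot() draws:
# lower whisker, lower hinge, median, upper hinge, upper whisker
bs <- boxplot.stats(x)
print(bs$stats)   # 2.0 3.0 4.5 7.0 8.0
print(bs$out)     # 20, drawn as a separate point beyond the upper whisker

# The hinges come from fivenum(); quantile() uses a different rule,
# so the upper hinge (7) differs from the 75th percentile (6.5) here
print(fivenum(x))                         # 2.0 3.0 4.5 7.0 20.0
print(quantile(x, c(0.25, 0.5, 0.75)))    # 3.0 4.5 6.5

boxplot(x, col = "red")
```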

#Graphical Plots in R
#knitr is not needed for the plots themselves; base R graphics work without it
install.packages('knitr')
library(knitr)
infant_mortality<-read.csv("infant-mortality-rate.csv")
dim(infant_mortality)
[1] 92  4
#Reading the first few lines of the data
head(infant_mortality)
  Year Australia India Pakistan
1 2010     4.437 50.58    71.35
2 2011     4.437 50.58    71.35
3 2012     4.437 50.58    71.35
4 2013     4.437 50.58    71.35
5 2014     4.437 50.58    71.35
6 2015     4.437 50.58    71.35
#Boxplot
boxplot(infant_mortality$India,col="red")
[Figure: boxplot of the infant mortality rate in India]
#Summary statistics of the dataset
summary(infant_mortality)
      Year      Australia        India         Pakistan   
 2010   : 1   Min.   :4.44   Min.   :50.6   Min.   :71.3  
 2011   : 1   1st Qu.:4.44   1st Qu.:50.6   1st Qu.:71.3  
 2012   : 1   Median :4.44   Median :50.6   Median :71.3  
 2013   : 1   Mean   :4.44   Mean   :50.6   Mean   :71.3  
 2014   : 1   3rd Qu.:4.44   3rd Qu.:50.6   3rd Qu.:71.3  
 2015   : 1   Max.   :4.44   Max.   :50.6   Max.   :71.3  
 (Other):86   NA's   :1      NA's   :1      NA's   :1  
#Boxplot for comparisons
#Compares mortality rate vs. year
#Each year has a single observation, so each "box" collapses to a single line
boxplot(infant_mortality$Pakistan ~ as.factor(infant_mortality$Year),col="blue")
[Figure: boxplots of Pakistan's infant mortality rate by year]
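The comparison above looks at one country across years. To compare the three countries side by side instead, boxplot() also accepts a data frame and draws one box per column. This is a sketch on a small made-up data frame that only mimics the layout of infant_mortality; with the real data you would pass infant_mortality[, c("Australia", "India", "Pakistan")].

```r
# Made-up values with the same column layout as infant_mortality
rates <- data.frame(
  Year      = 2001:2006,
  Australia = c(5.0, 4.9, 4.8, 4.7, 4.6, 4.5),
  India     = c(62, 60, 58, 56, 54, 52),
  Pakistan  = c(80, 79, 77, 76, 74, 73)
)

# One box per country column on a shared y-axis;
# boxplot() also returns the statistics it drew (invisibly)
b <- boxplot(rates[, c("Australia", "India", "Pakistan")],
             col = c("green", "red", "blue"),
             ylab = "Infant mortality rate")

print(b$names)      # "Australia" "India" "Pakistan"
print(b$stats[3, ]) # medians: 4.75 57.00 76.50
```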

Dataset Source: http://datamarket.com/data/set/vih/retail-prices-of-some-commodities-and-services-1996-2013#!
#Barplot
retail_prices<-read.csv("retail-prices-of-some-commoditie.csv")
head(retail_prices)
    Month Dairy.cheese Dark.chocolate Eggs Grapes
1 1996-11          701            190  348    447
2 1997-02          736            188  360    593
3 1997-05          742            189  360    381
4 1997-08          747            191  359    375
5 1997-11          758            194  360    479
6 1998-02          786            192  341    378
#table() counts how often each distinct egg price occurs, so the bars show frequencies
barplot(table(retail_prices$Eggs),col="red")

[Figure: barplot of egg price frequencies]
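Note that barplot(table(...)) plots how often each price occurs, not the prices themselves. The sketch below uses the six egg prices from the head() output above to show the difference. The las = 2 setting, which turns the axis labels vertical, is just a cosmetic choice.

```r
# First six Eggs values and months from the head(retail_prices) output
eggs <- c(348, 360, 360, 359, 360, 341)
sample_months <- c("1996-11", "1997-02", "1997-05", "1997-08", "1997-11", "1998-02")

# table() counts each distinct value: 341, 348 and 359 once, 360 three times
counts <- table(eggs)
print(counts)
barplot(counts, col = "red", xlab = "Price", ylab = "Number of observations")

# To plot the prices themselves, pass the numeric vector and label each bar
barplot(eggs, names.arg = sample_months, col = "red", las = 2)
```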
#Histogram
hist(retail_prices$Eggs,col="blue",breaks=50)

[Figure: histogram of egg prices]
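The breaks argument deserves a caveat: when it is a single number, R treats it only as a suggestion and picks "pretty" cut points, so breaks=50 will not necessarily produce exactly 50 bars. Passing a vector of cut points fixes the bins exactly. A sketch with a made-up vector:

```r
# Made-up sample for illustration
x <- c(1.2, 1.8, 2.5, 2.7, 3.1, 3.3, 3.9, 4.4)

# A single number is a suggestion; plot = FALSE just returns the bins
h1 <- hist(x, breaks = 3, plot = FALSE)
print(h1$breaks)

# A vector of cut points is used as given: bins [1,2], (2,3], (3,4], (4,5]
h2 <- hist(x, breaks = c(1, 2, 3, 4, 5), col = "blue")
print(h2$counts)   # 2 2 3 1
```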
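#Density Plots

#Density plots are smoothed histograms
#na.rm=TRUE drops missing values, which density() otherwise rejects
dens<-density(retail_prices$Eggs,na.rm=TRUE)
#The y-axis shows estimated probability density (the area under the curve is 1), not raw counts
#Density plots - Multiple Distributions
dens_grapes<-density(retail_prices$Grapes,na.rm=TRUE)
#lwd sets the line thickness; xlim and ylim cover both curves,
#otherwise the Grapes curve can fall outside the axes drawn for Eggs
plot(dens,lwd=3,col="blue",xlim=range(dens$x,dens_grapes$x),ylim=range(dens$y,dens_grapes$y))
lines(dens_grapes,lwd=3,col="orange")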
[Figure: density plots of egg and grape prices]
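Two properties of density() are worth checking by hand: the estimated curve encloses an area of roughly 1, and its smoothness is controlled by the bandwidth, which the adjust argument scales. The sketch below uses a made-up vector of prices.

```r
# Made-up prices for illustration
x <- c(341, 348, 359, 360, 360, 360, 352, 355, 358, 362)

# density() evaluates a kernel density estimate on an evenly spaced grid
d <- density(x)

# Approximate the area under the curve with a Riemann sum: close to 1
area <- sum(d$y) * diff(d$x[1:2])
print(area)

# adjust multiplies the automatically chosen bandwidth:
# smaller values give a wigglier curve, larger values a smoother one
d_rough  <- density(x, adjust = 0.5)
d_smooth <- density(x, adjust = 2)
plot(d_smooth, lwd = 3, col = "blue",
     ylim = range(0, d_rough$y, d_smooth$y),
     main = "Effect of the bandwidth")
lines(d_rough, lwd = 3, col = "orange")
```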
#Time series

#ts() and plot() are part of base R, so no extra package is strictly needed;
#the astsa package offers additional time series tools if you want them
install.packages("astsa")
library(astsa)

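#The Month column shows one observation every three months starting in November 1996,
#so use frequency=4 and start=c(1996,4): the 4th quarter of 1996 is the first point on the x-axis
retail_prices_timeseries<-ts(retail_prices$Eggs,start=c(1996,4),frequency=4)
#Set the y-axis range to cover all four series so the overlaid lines are not clipped
price_range<-range(retail_prices[,c("Eggs","Dark.chocolate","Grapes","Dairy.cheese")],na.rm=TRUE)
plot(retail_prices_timeseries,col="blue",lwd=3,ylim=price_range)

#To plot multiple time series on the same graph for comparison
retail_prices_timeseries1<-ts(retail_prices$Dark.chocolate,start=c(1996,4),frequency=4)
#lines() overlays additional series on the existing graph
lines(retail_prices_timeseries1,col="red",lwd=3)
retail_prices_timeseries2<-ts(retail_prices$Grapes,start=c(1996,4),frequency=4)
lines(retail_prices_timeseries2,col="green",lwd=3)
retail_prices_timeseries3<-ts(retail_prices$Dairy.cheese,start=c(1996,4),frequency=4)
lines(retail_prices_timeseries3,col="yellow",lwd=3)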
[Figure: time series of egg, dark chocolate, grape and dairy cheese prices]
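For a quick comparison plot, base R's ts.plot() draws several ts objects on shared axes and chooses a y-range that covers all of them. This sketch builds quarterly series from the first six Eggs and Dark.chocolate values in the head() output, assuming one observation per quarter starting in Q4 1996 (the Month column runs 1996-11, 1997-02, 1997-05, ...). time() and window() then show how the frequency shapes the time index.

```r
# First six values from head(retail_prices), assumed quarterly from Q4 1996
eggs      <- ts(c(348, 360, 360, 359, 360, 341), start = c(1996, 4), frequency = 4)
chocolate <- ts(c(190, 188, 189, 191, 194, 192), start = c(1996, 4), frequency = 4)

# time() gives each observation's position on the x-axis: quarters are 0.25 apart
print(time(eggs))   # 1996.75 1997.00 ... 1998.00

# ts.plot() puts both series on one set of axes with a y-range covering both
ts.plot(eggs, chocolate, col = c("blue", "red"), lwd = 3, ylab = "Price")
legend("topright", legend = c("Eggs", "Dark chocolate"),
       col = c("blue", "red"), lwd = 3)

# window() extracts a sub-period, here the four quarters of 1997
print(window(eggs, start = c(1997, 1), end = c(1997, 4)))
```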
