R is a powerful tool for producing graphics that represent your data. This post only scratches the surface, covering a few of the most commonly used plot types.
BOXPLOT: A boxplot is used for quantitative variables and gives you an idea of how the data are distributed. The upper and lower bounds of the box represent the 75th and 25th percentiles respectively, and the black line inside the box is the median. Here the black line sits towards the top of the box, which means the distribution is asymmetric.
Dataset Source: http://datamarket.com/data/set/12sc/infant-mortality-rate#!
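To make the bounds of the box concrete, here is a minimal sketch on a small made-up vector (the values are illustrative and not taken from the dataset). It uses base R's fivenum() and boxplot.stats(), which report the numbers that boxplot() draws. Note that R draws the box at the hinges, which are close to, though not always identical to, the 25th and 75th percentiles.

```r
# Made-up values for illustration only (not from the infant mortality data)
x <- c(1, 5, 8, 9, 10, 10, 11, 11, 12)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)

# boxplot.stats() returns what boxplot() draws: whisker ends, hinges, median
bs <- boxplot.stats(x)
lower_hinge <- bs$stats[2]
med <- bs$stats[3]
upper_hinge <- bs$stats[4]

# The median is nearer the upper hinge than the lower one,
# so the line sits towards the top of the box (an asymmetric distribution)
median_near_top <- (upper_hinge - med) < (med - lower_hinge)

# Points more than 1.5 box heights away from the box are reported as outliers
outliers <- bs$out

boxplot(x, col = "red")
```

In this sketch the whiskers stop at the most extreme values that are not flagged as outliers, so they need not reach the minimum and maximum of the data.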
#Graphical Plots in R
install.packages('knitr')
library(knitr)
infant_mortality<-read.csv("infant-mortality-rate.csv")
dim(infant_mortality)
[1] 92 4
#Reading first few lines of data
head(infant_mortality)
  Year Australia India Pakistan
1 2010     4.437 50.58    71.35
2 2011     4.437 50.58    71.35
3 2012     4.437 50.58    71.35
4 2013     4.437 50.58    71.35
5 2014     4.437 50.58    71.35
6 2015     4.437 50.58    71.35
#Boxplot
boxplot(infant_mortality$India,col="red")
#Seeing the summary statistics of the dataset
summary(infant_mortality)
      Year      Australia        India         Pakistan
 2010   : 1   Min.   :4.44   Min.   :50.6   Min.   :71.3
 2011   : 1   1st Qu.:4.44   1st Qu.:50.6   1st Qu.:71.3
 2012   : 1   Median :4.44   Median :50.6   Median :71.3
 2013   : 1   Mean   :4.44   Mean   :50.6   Mean   :71.3
 2014   : 1   3rd Qu.:4.44   3rd Qu.:50.6   3rd Qu.:71.3
 2015   : 1   Max.   :4.44   Max.   :50.6   Max.   :71.3
 (Other):86   NA's   :1      NA's   :1      NA's   :1
#Boxplot for comparisons
#Compares mortality rate VS Year
boxplot(infant_mortality$Pakistan ~ as.factor(infant_mortality$Year),col="blue")
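The same formula interface (values ~ group) can also be used to compare the countries side by side. The sketch below assumes a small made-up data frame whose column names mirror the dataset; the numbers are invented for illustration. stack() is a base R function that reshapes the wide columns into one value column and one grouping column.

```r
# Made-up rates for illustration only; column names mirror the dataset
rates <- data.frame(
  Australia = c(4.4, 4.6, 4.9, 5.1),
  India     = c(50.6, 52.3, 54.1, 56.0),
  Pakistan  = c(71.3, 72.8, 74.2, 75.9)
)

# stack() returns a data frame with columns `values` and `ind` (the column name)
long <- stack(rates)

# One box per country; boxplot() invisibly returns the statistics it drew
bp <- boxplot(values ~ ind, data = long, col = "blue",
              xlab = "Country", ylab = "Infant mortality rate")
```

Passing the data frame through the data argument keeps the formula short and avoids repeating the `$` prefix on every column.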
Dataset Source: http://datamarket.com/data/set/vih/retail-prices-of-some-commodities-and-services-1996-2013#!
#Barplot
retail_prices<-read.csv("retail-prices-of-some-commoditie.csv")
head(retail_prices)
    Month Dairy.cheese Dark.chocolate Eggs Grapes
1 1996-11          701            190  348    447
2 1997-02          736            188  360    593
3 1997-05          742            189  360    381
4 1997-08          747            191  359    375
5 1997-11          758            194  360    479
6 1998-02          786            192  341    378
barplot(table(retail_prices$Eggs),col="red")
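Because table() counts how often each distinct value occurs, the bars show frequencies of prices rather than the prices themselves. As a small sketch, here are the six Eggs values from the head() output above; the variable names are only for illustration.

```r
# The six Eggs prices shown in the head() output
eggs <- c(348, 360, 360, 359, 360, 341)

# table() counts occurrences of each distinct value
counts <- table(eggs)

# One bar per distinct price; bar height is how many times it occurs
barplot(counts, col = "red", xlab = "Eggs price", ylab = "Count")
```

If the goal were instead to show the price in each period, one option would be to pass the values directly to barplot() along with names.arg for the labels.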
#Density Plots
#Density plots are smoothed histograms
dens<-density(retail_prices$Eggs)
#lwd is for line thickness
#A density plot shows the proportion of observations (as a density) instead of absolute counts
plot(dens,lwd=3,col="blue")
#Density plots - Multiple Distributions
dens_grapes<-density(retail_prices$Grapes)
lines(dens_grapes,lwd=3,col="orange")
#Timeseries
#ts() and plot() come with base R; the astsa package adds extra time-series tools
install.packages("astsa")
library(astsa)
#start and frequency set the time labels on the x-axis
#Per the head() output, the data begin in 1996-11 with one observation every three months
retail_prices_timeseries<-ts(retail_prices$Eggs,start=c(1996,4),frequency=4)
plot(retail_prices_timeseries,col="blue",lwd=3)
#To plot multiple timeseries on same graph for comparison
retail_prices_timeseries1<-ts(retail_prices$Dark.chocolate,start=c(1996,4),frequency=4)
#Lines command is used to overlay multiple plots on same graph
lines(retail_prices_timeseries1,col="red",lwd=3)
retail_prices_timeseries2<-ts(retail_prices$Grapes,start=c(1996,4),frequency=4)
lines(retail_prices_timeseries2,col="green",lwd=3)
retail_prices_timeseries3<-ts(retail_prices$Dairy.cheese,start=c(1996,4),frequency=4)
lines(retail_prices_timeseries3,col="yellow",lwd=3)
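One thing to watch when overlaying series with lines(): the first plot() call fixes the axis limits, so a series with much larger or smaller values can fall outside the plotting area. The minimal sketch below computes a shared y-range first. It uses only the six rows shown in the head() output and assumes one observation every three months starting in late 1996; the object names are illustrative.

```r
# First six rows from the head() output above
eggs      <- c(348, 360, 360, 359, 360, 341)
chocolate <- c(190, 188, 189, 191, 194, 192)
cheese    <- c(701, 736, 742, 747, 758, 786)

# Assumption: quarterly spacing, first observation labelled 1996 Q4
eggs_ts      <- ts(eggs, start = c(1996, 4), frequency = 4)
chocolate_ts <- ts(chocolate, start = c(1996, 4), frequency = 4)
cheese_ts    <- ts(cheese, start = c(1996, 4), frequency = 4)

# Shared y-axis range so that every series is visible
y_range <- range(eggs, chocolate, cheese)

plot(eggs_ts, col = "blue", lwd = 3, ylim = y_range, ylab = "Retail price")
lines(chocolate_ts, col = "red", lwd = 3)
lines(cheese_ts, col = "yellow", lwd = 3)
legend("topleft", legend = c("Eggs", "Dark.chocolate", "Dairy.cheese"),
       col = c("blue", "red", "yellow"), lwd = 3)
```

The same idea applies to overlaid density curves, where xlim and ylim would need to cover every curve being drawn.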
#Histogram
hist(retail_prices$Eggs,col="blue",breaks=50)
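The breaks argument is only a suggestion: hist() rounds the bin edges to "pretty" values, so the number of bars may differ from the number requested. A small sketch, again using the six Eggs values from the head() output, inspects the bins without drawing them and then overlays a density curve on a histogram drawn on the density scale. The variable names are illustrative.

```r
# The six Eggs prices shown in the head() output
eggs <- c(348, 360, 360, 359, 360, 341)

# plot = FALSE returns the bins instead of drawing them
h <- hist(eggs, breaks = 5, plot = FALSE)
bin_edges <- h$breaks
bin_counts <- h$counts

# freq = FALSE puts the histogram on the density scale,
# so a density curve can be drawn over it
hist(eggs, col = "blue", breaks = 5, freq = FALSE)
lines(density(eggs), lwd = 3, col = "orange")
```

Looking at bin_edges and bin_counts is a quick way to check which bins R actually chose before settling on a value for breaks.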