Sunday, 12 April 2015

Data manipulation in R - Filter & Arrange

The filter function selects the rows we want to display, and the arrange function puts those rows in order.

Syntax for filter function

#1
b <- hflights %>%
        filter(Distance >= 2000)
View(b)
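This keeps only the flights whose Distance is at least 2000.

#2
> hflights %>%
+ filter(UniqueCarrier %in% c("American", "Alaska"))
Source: local data frame [3,609 x 21]

Notice how %in% lets us test against more than one character value in this code: a row is kept if its UniqueCarrier matches any of the listed names. This assumes the carrier codes in UniqueCarrier have already been replaced with full names; the raw hflights data stores two-letter codes such as "AA".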

You can also combine tests with Boolean operators. If you pass several tests separated by commas, filter ANDs them together, so a row is kept only if it passes all of them. Alternatively, you can write the combination explicitly with the &, | and ! operators.
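
As a rough sketch of the point above, the snippet below builds a tiny made-up data frame (toy_flights is hypothetical; only its column names are borrowed from hflights) and compares comma-separated tests with the explicit & form, then shows | and !.

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror hflights.
toy_flights <- data.frame(
  UniqueCarrier = c("American", "American", "Alaska", "AirTran"),
  DepDelay      = c(15, -3, 20, 5),
  Cancelled     = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Comma-separated tests are ANDed together ...
with_comma <- toy_flights %>%
  filter(UniqueCarrier == "American", DepDelay > 0)

# ... so this is the same test written with &.
with_and <- toy_flights %>%
  filter(UniqueCarrier == "American" & DepDelay > 0)

# | keeps rows that pass either test; ! negates a test.
either <- toy_flights %>%
  filter(UniqueCarrier == "Alaska" | Cancelled == 1)
not_cancelled <- toy_flights %>%
  filter(!(Cancelled == 1))
```

Here with_comma and with_and should hold the same rows, which is why the comma form is usually the more readable choice when every test must pass.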

Question -

1. Find all the American Airlines flights that were cancelled after being delayed.

Sample Code
> a <- hflights %>%
 filter(UniqueCarrier == "American", DepDelay > 0, Cancelled == 1)
> head(a)
Source: local data frame [2 x 21]


2. Display a table that contains only the flights that were cancelled on weekdays (DayOfWeek 1 to 5).
> f3 <- hflights %>%
 filter(DayOfWeek %in% c(1, 2, 3, 4, 5), Cancelled == 1)
> head(f3)
Source: local data frame [6 x 21]

3. Display a table that contains only the flights whose TaxiIn time is greater than their TaxiOut time and whose DepDelay is not NA.
> a <- hflights %>%
 filter(TaxiIn > TaxiOut, !is.na(DepDelay))
> head(a)
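
One detail worth knowing when reading the answer to question 3: filter() keeps a row only when the test is TRUE, so rows where the comparison itself evaluates to NA are dropped as well. A minimal sketch on made-up data (toy_taxi is hypothetical, with column names borrowed from hflights):

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror hflights.
toy_taxi <- data.frame(
  TaxiIn   = c(10, 4, NA, 12),
  TaxiOut  = c(5, 9, 7, 8),
  DepDelay = c(3, 0, 1, NA)
)

# Row 1 passes both tests.
# Row 2 fails TaxiIn > TaxiOut.
# Row 3 has TaxiIn = NA, so the comparison is NA and the row is dropped.
# Row 4 has DepDelay = NA, so !is.na(DepDelay) is FALSE.
kept <- toy_taxi %>%
  filter(TaxiIn > TaxiOut, !is.na(DepDelay))
```

Only the first row should survive in kept, which shows that the explicit !is.na() test is needed for DepDelay but not for the columns used in the comparison.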

Syntax for arrange function

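Let's understand the role of the arrange function through a question.

Question -

1. Arrange the flights by unique carrier name in alphabetical order and, within each carrier, by DepDelay. Do not include rows where DepDelay is NA. (The code below reuses the filter from question 3 of the filter section, so it also keeps only flights with TaxiIn > TaxiOut.)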

> a <- hflights %>%
         filter(TaxiIn > TaxiOut, !is.na(DepDelay))
> head(a)
 
> a <- a %>%
          arrange(UniqueCarrier, DepDelay)
> select(a, UniqueCarrier, DepDelay)
 
  UniqueCarrier DepDelay
1        AirTran      -14
2        AirTran      -13
3        AirTran      -12
4        AirTran      -11
5        AirTran      -11
6        AirTran      -11

Note: Wrapping a variable in the desc() function arranges its values in descending order.
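
To illustrate the note, here is a small sketch on made-up data (toy_delays is hypothetical) that sorts carriers alphabetically while ordering DepDelay from largest to smallest within each carrier.

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror hflights.
toy_delays <- data.frame(
  UniqueCarrier = c("Alaska", "AirTran", "Alaska", "AirTran"),
  DepDelay      = c(5, -2, 30, 12),
  stringsAsFactors = FALSE
)

# Ascending by carrier name, then descending by delay within each carrier.
sorted <- toy_delays %>%
  arrange(UniqueCarrier, desc(DepDelay))
```

Because desc() wraps only DepDelay, the carrier names stay in ascending order; each variable passed to arrange gets its own direction.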

We can also perform simple operations directly inside the arrange function.

Example -

> a <- hflights %>%
         arrange(desc(TaxiIn + TaxiOut))
> head(a)
 

This arranges the data in descending order of total taxi time (TaxiIn + TaxiOut).
