Samstag, 3. Januar 2015

Excel like PivotTable and more with R


A table contains values as below (download from link under figure):


Item Category Price Profit Actual Profit Calories
Beer Beverages  $4.00 50%  $2.00 200
Soda Beverages  $2.50 80%  $2.00 120
Chocolate Bar Candy  $2.00 75%  $1.50 255
Ice Cream Sandwich Frozen Treats  $3.00 67%  $2.00 240
Bottled Water Beverages  $3.00 83%  $2.50 0
Gummy Bears Candy  $2.00 50%  $1.00 300
Soda Beverages  $2.50 80%  $2.00 120
Hamburger Hot Food  $3.00 67%  $2.00 320
Popcorn Hot Food  $5.00 80%  $4.00 500
Licorice Rope Candy  $2.00 50%  $1.00 280
Hot Dog Hot Food  $1.50 67%  $1.00 265
Licorice Rope Candy  $2.00 50%  $1.00 280
Popcorn Hot Food  $5.00 80%  $4.00 500
Popcorn Hot Food  $5.00 80%  $4.00 500

It is very easy to summarize the data using the Excel PivotTable generator (even for multiple variables like "Profit" and "Actual Profit" simultaneously):
Example taken from the Data Smart book.
































It probably doesn't make a lot of sense to summarize the data by percentage, but this is just for illustration purposes.







A similar operation is the aggregate function in R (the Concessions.xlsx data frame is loaded as con):
> aggregate(cbind(Profit, Actual.Profit) ~ Category, data=con, FUN=sum)
       Category   Profit Actual.Profit
1     Beverages 31.23333          98.5
2         Candy 23.25000          46.5
3 Frozen Treats 23.00000          69.0
4      Hot Food 45.21667         142.0




The same is possible using the xtabs function:
> xtabs(cbind(Profit, Actual.Profit) ~ Category, data=con)
              
Category           Profit Actual.Profit
  Beverages      31.23333      98.50000
  Candy          23.25000      46.50000
  Frozen Treats  23.00000      69.00000
  Hot Food       45.21667     142.00000


Note, no special operation FUN is specified in xtabs.

Occurence of items can be done when specifing "Count" in the PivotTable builder:

R can do this with the plyr package and its count function.

> count(con$Item)
                       x freq
1                   Beer   20
2          Bottled Water   13
3          Chocolate Bar   13
4  Chocolate Dipped Cone   11
5            Gummy Bears   14
6              Hamburger   16
7                Hot Dog   15
8     Ice Cream Sandwich   10
9          Licorice Rope   13
10                Nachos   15
11                 Pizza   17
12               Popcorn   16
13              Popsicle   13
14                  Soda   13





This can also be done using the extremely powerful built-in functions of the data.table type:

> con[, .N, by=Item]
                     Item  N
 1:                  Beer 20
 2:         Bottled Water 13
 3:                  Soda 13
 4:         Chocolate Bar 13
 5:           Gummy Bears 14
 6:         Licorice Rope 13
 7:              Popsicle 13
 8:    Ice Cream Sandwich 10
 9: Chocolate Dipped Cone 11
10:               Popcorn 16
11:             Hamburger 16
12:                Nachos 15
13:                 Pizza 17
14:               Hot Dog 15


Here, .N is a built-in count function applied to the data.table object.

Or with a call to table where one can break down the items by profit:

> table(con$Item, con$Profit)
                      
                        0.25 0.5 0.67 0.75 0.8 0.83
  Beer                     0  20    0    0   0    0
  Bottled Water            0   0    0    0   0   13
  Chocolate Bar            0   0    0   13   0    0
  Chocolate Dipped Cone    0  11    0    0   0    0
  Gummy Bears              0  14    0    0   0    0
  Hamburger                0   0   16    0   0    0
  Hot Dog                  0   0   15    0   0    0
  Ice Cream Sandwich       0   0   10    0   0    0
  Licorice Rope            0  13    0    0   0    0
  Nachos                   0  15    0    0   0    0
  Pizza                   17   0    0    0   0    0
  Popcorn                  0   0    0    0  16    0
  Popsicle                 0   0    0    0   0   13
  Soda                     0   0    0    0  13    0


I.e. 13 items of "Bottled Water" giving a profit of 83% each have been sold.

Excel can then apply eg. summation when breaking down items by category:





















The operation to carry out (sum, count, ...) is defined when clicking on the small "i" symbol in the Values field. So in total, beer earned 80 $.

Another plyr built-in function of the data.table object allows to quickly obtain the summarized prices of the category sales, broken down by category.

> con[, sum(Price), by=list(Category, Item)]
         Category                  Item   V1
 1:     Beverages                  Beer 80.0
 2:     Beverages         Bottled Water 39.0
 3:     Beverages                  Soda 32.5
 4:         Candy         Chocolate Bar 26.0
 5:         Candy           Gummy Bears 28.0
 6:         Candy         Licorice Rope 26.0
 7: Frozen Treats              Popsicle 39.0
 8: Frozen Treats    Ice Cream Sandwich 30.0
 9: Frozen Treats Chocolate Dipped Cone 33.0
10:      Hot Food               Popcorn 80.0
11:      Hot Food             Hamburger 48.0
12:      Hot Food                Nachos 45.0
13:      Hot Food                 Pizza 34.0
14:      Hot Food               Hot Dog 22.5