## Samstag, 3. Januar 2015

### Excel like PivotTable and more with R

 Item Category Price Profit Actual Profit Calories Beer Beverages $4.00 50%$2.00 200 Soda Beverages $2.50 80%$2.00 120 Chocolate Bar Candy $2.00 75%$1.50 255 Ice Cream Sandwich Frozen Treats $3.00 67%$2.00 240 Bottled Water Beverages $3.00 83%$2.50 0 Gummy Bears Candy $2.00 50%$1.00 300 Soda Beverages $2.50 80%$2.00 120 Hamburger Hot Food $3.00 67%$2.00 320 Popcorn Hot Food $5.00 80%$4.00 500 Licorice Rope Candy $2.00 50%$1.00 280 Hot Dog Hot Food $1.50 67%$1.00 265 Licorice Rope Candy $2.00 50%$1.00 280 Popcorn Hot Food $5.00 80%$4.00 500 Popcorn Hot Food $5.00 80%$4.00 500

It is very easy to summarize the data using the Excel PivotTable generator (even for multiple variables like "Profit" and "Actual Profit" simultaneously):
 Example taken from the Data Smart book.

It probably doesn't make a lot of sense to summarize the data by percentage, but this is just for illustration purposes.

A similar operation is the aggregate function in R (the Concessions.xlsx data frame is loaded as con):
> aggregate(cbind(Profit, Actual.Profit) ~ Category, data=con, FUN=sum)
Category   Profit Actual.Profit
1     Beverages 31.23333          98.5
2         Candy 23.25000          46.5
3 Frozen Treats 23.00000          69.0
4      Hot Food 45.21667         142.0

The same is possible using the xtabs function:
> xtabs(cbind(Profit, Actual.Profit) ~ Category, data=con)

Category           Profit Actual.Profit
Beverages      31.23333      98.50000
Candy          23.25000      46.50000
Frozen Treats  23.00000      69.00000
Hot Food       45.21667     142.00000

Note, no special operation FUN is specified in xtabs.

Occurence of items can be done when specifing "Count" in the PivotTable builder:

R can do this with the plyr package and its count function.

> count(con$Item) x freq 1 Beer 20 2 Bottled Water 13 3 Chocolate Bar 13 4 Chocolate Dipped Cone 11 5 Gummy Bears 14 6 Hamburger 16 7 Hot Dog 15 8 Ice Cream Sandwich 10 9 Licorice Rope 13 10 Nachos 15 11 Pizza 17 12 Popcorn 16 13 Popsicle 13 14 Soda 13 This can also be done using the extremely powerful built-in functions of the data.table type: > con[, .N, by=Item] Item N 1: Beer 20 2: Bottled Water 13 3: Soda 13 4: Chocolate Bar 13 5: Gummy Bears 14 6: Licorice Rope 13 7: Popsicle 13 8: Ice Cream Sandwich 10 9: Chocolate Dipped Cone 11 10: Popcorn 16 11: Hamburger 16 12: Nachos 15 13: Pizza 17 14: Hot Dog 15 Here, .N is a built-in count function applied to the data.table object. Or with a call to table where one can break down the items by profit: > table(con$Item, con$Profit) 0.25 0.5 0.67 0.75 0.8 0.83 Beer 0 20 0 0 0 0 Bottled Water 0 0 0 0 0 13 Chocolate Bar 0 0 0 13 0 0 Chocolate Dipped Cone 0 11 0 0 0 0 Gummy Bears 0 14 0 0 0 0 Hamburger 0 0 16 0 0 0 Hot Dog 0 0 15 0 0 0 Ice Cream Sandwich 0 0 10 0 0 0 Licorice Rope 0 13 0 0 0 0 Nachos 0 15 0 0 0 0 Pizza 17 0 0 0 0 0 Popcorn 0 0 0 0 16 0 Popsicle 0 0 0 0 0 13 Soda 0 0 0 0 13 0 I.e. 13 items of "Bottled Water" giving a profit of 83% each have been sold. Excel can then apply eg. summation when breaking down items by category: The operation to carry out (sum, count, ...) is defined when clicking on the small "i" symbol in the Values field. So in total, beer earned 80$.

Another plyr built-in function of the data.table object allows to quickly obtain the summarized prices of the category sales, broken down by category.

> con[, sum(Price), by=list(Category, Item)]
Category                  Item   V1
1:     Beverages                  Beer 80.0
2:     Beverages         Bottled Water 39.0
3:     Beverages                  Soda 32.5
4:         Candy         Chocolate Bar 26.0
5:         Candy           Gummy Bears 28.0
6:         Candy         Licorice Rope 26.0
7: Frozen Treats              Popsicle 39.0
8: Frozen Treats    Ice Cream Sandwich 30.0
9: Frozen Treats Chocolate Dipped Cone 33.0
10:      Hot Food               Popcorn 80.0
11:      Hot Food             Hamburger 48.0
12:      Hot Food                Nachos 45.0
13:      Hot Food                 Pizza 34.0
14:      Hot Food               Hot Dog 22.5