Web Analytics Assignment 2: Hotel Clickstream Analysis
– In this assignment you will analyze the data (HotelClickStream.xls) and interpret the results. The dataset contains clickstream data on online hotel-booking transactions in 2011. The Appendix describes the variables in detail.
– Please follow the instructions carefully. Run the analyses below and answer the corresponding questions. For each question, copy or summarize your key results into a Word file along with your answers to produce the final report for submission.
1. Please first create the following 2 additional variables in your data:
1) REF_D: a dummy variable indicating whether the transaction was referred from another website (REF_D = 1) or the final booking website was accessed directly (REF_D = 0). If no information is provided for the variable REF_DOMAIN_NAME, REF_D = 0; otherwise REF_D = 1.
2) LOG_PRICE: the log transformation of the variable PROD_TOTPRICE, computed with the LOG function in Excel.
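For readers working outside Excel, the two derived variables can be sketched in Python as below. This is a minimal sketch, not part of the assignment: the helper name add_derived_variables and the dictionary-per-row layout are assumptions, and the two sample rows are reduced to the three columns needed here. Excel's LOG function uses base 10 when no base is given, so the sketch uses math.log10; it also assumes PROD_TOTPRICE is strictly positive.

```python
import math

def add_derived_variables(row):
    """Return a copy of `row` with REF_D and LOG_PRICE added.

    `row` is assumed to be a dict keyed by the column names of the data file.
    """
    out = dict(row)
    # REF_D = 0 when REF_DOMAIN_NAME is missing or blank, otherwise 1.
    ref = (row.get("REF_DOMAIN_NAME") or "").strip()
    out["REF_D"] = 1 if ref else 0
    # Excel's LOG(x) defaults to base 10; prices are assumed to be > 0.
    out["LOG_PRICE"] = math.log10(float(row["PROD_TOTPRICE"]))
    return out

# Two rows from the data excerpt, cut down to the relevant columns.
rows = [
    {"DOMAIN_NAME": "hyatt.com", "PROD_TOTPRICE": "2168", "REF_DOMAIN_NAME": "google.com"},
    {"DOMAIN_NAME": "starwoodhotels.com", "PROD_TOTPRICE": "1520", "REF_DOMAIN_NAME": ""},
]
enriched = [add_derived_variables(r) for r in rows]
for r in enriched:
    print(r["DOMAIN_NAME"], r["REF_D"], round(r["LOG_PRICE"], 4))
```

If natural logs are preferred, math.log would replace math.log10 (the Excel equivalent is LN); the choice rescales the LOG_PRICE coefficient in later regressions but does not change its sign or significance.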
a) Please provide a summary table showing the top 10 domain names (DOMAIN_NAME) that generated the highest volume of transactions. The report should look like the following table. (Hint: one way to do this is to use the COUNTIF function in Excel.) Please briefly summarize your observations from the results.
Rank    Domain Names    # of Transactions
1       marriott        524
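The COUNTIF approach in the hint amounts to counting how often each name occurs and sorting by count. A minimal Python sketch of the same idea follows; the helper name top_domains is an assumption, and the short input list reuses a few DOMAIN_NAME values from the data excerpt for illustration only. Blank values are skipped, which matters when the same helper is applied to REF_DOMAIN_NAME, where many rows have no entry.

```python
from collections import Counter

def top_domains(values, n=10):
    """Return (rank, name, count) tuples for the n most frequent names."""
    # Normalize case and whitespace; ignore missing or blank entries.
    counts = Counter(v.strip().lower() for v in values if v and v.strip())
    return [
        (rank, name, count)
        for rank, (name, count) in enumerate(counts.most_common(n), start=1)
    ]

# A few DOMAIN_NAME values from the excerpt (illustrative, not the full data).
sample = ["expedia.com", "hyatt.com", "expedia.com", "orbitz.com",
          "starwoodhotels.com", "expedia.com", "starwoodhotels.com"]
for rank, name, count in top_domains(sample, n=3):
    print(rank, name, count)
```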
b) Please provide a summary table showing the top 10 reference domain names (REF_DOMAIN_NAME) that generated the highest volume of transactions. The report should look like the following table. Please briefly summarize your observations from the results.
Rank    Reference Domain Names    # of Transactions
1       google                    620
c) Please provide summary statistics (N, Max, Min, Mean, and Std.) for variables: DIRECTP_D; REF_D; DURATION; PAGES_VIEWED; LOG_PRICE; and TRANS_FREQ. Please report your summary statistics table and provide short descriptions (a few bullet points) of your observations.
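The five requested statistics can be computed per variable as sketched below. This is an illustration under assumptions: the helper name summarize is hypothetical, and the input is the PAGES_VIEWED value from the first five rows of the data excerpt rather than the full column. The standard deviation is the sample version (dividing by N - 1), which matches Excel's STDEV.S.

```python
import statistics

def summarize(values):
    """Return N, Max, Min, Mean, and sample Std for a list of numbers."""
    return {
        "N": len(values),
        "Max": max(values),
        "Min": min(values),
        "Mean": statistics.mean(values),
        "Std": statistics.stdev(values),  # sample standard deviation (N - 1)
    }

# PAGES_VIEWED from the first five rows of the excerpt (illustrative only).
pages_viewed = [13, 17, 19, 39, 19]
for label, value in summarize(pages_viewed).items():
    print(label, value)
```

For the dummy variables DIRECTP_D and REF_D, the mean is the share of transactions coded 1, which is usually the most useful figure to comment on.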
2. Please use the Binary Outcome (Logistic/Logit) regression technique to answer the question “what are the factors that influence people’s decision on whether to book directly on a hotel website or through a third-party website?” Please use DIRECTP_D as your Dependent Variable (DV), and REF_D, LOG_PRICE, TRANS_FREQ, DURATION, HOUSEHOLD_SIZE, CHILDREN_D, and CONNECTIONSPEED_D as your Independent Variables (IVs). Please report and interpret your regression results, including an interpretation of the regression coefficients.
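To make the coefficient interpretation concrete, the sketch below fits a one-predictor logit model by plain gradient ascent on the log-likelihood and converts the slope into an odds ratio. Everything here is an assumption for illustration: the function fit_logit is hypothetical, the eight observations are made up and are not taken from the dataset, and a real analysis would use a statistics package with all seven IVs and report standard errors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, lr=0.5, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # gradient of the log-likelihood
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Made-up toy data: x plays the role of REF_D, y the role of DIRECTP_D.
ref_d = [0, 0, 0, 0, 1, 1, 1, 1]
directp_d = [1, 1, 1, 0, 1, 0, 0, 0]

b0, b1 = fit_logit(ref_d, directp_d)
odds_ratio = math.exp(b1)
print("slope:", round(b1, 3), "odds ratio:", round(odds_ratio, 3))
```

A logit coefficient is a change in log-odds, so exp(b1) is the factor by which the odds of a direct booking are multiplied when the predictor rises by one unit. An odds ratio below 1 means the predictor is associated with lower odds of booking directly; above 1, higher odds.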
3. a) Please use the Count Data (Poisson) regression model to answer the question on “what are the factors that influence people’s booking frequencies?” Please use TRANS_FREQ as your DV; and REF_D, LOG_PRICE, PAGES_VIEWED, HOUSEHOLD_SIZE, CHILDREN_D, and CONNECTIONSPEED_D as your IVs. Please report and interpret your regression results, which should include the interpretation of the regression coefficients.
b) Please repeat the analysis in question a) using the Negative Binomial Regression model. Please report and interpret your regression results and coefficients.
c) Please summarize your observations by comparing the results from a) and b).
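One point worth checking when comparing the two models is overdispersion. The Poisson model assumes the variance of the count equals its mean, while the Negative Binomial model adds a dispersion parameter that relaxes this. The sketch below computes a rough variance-to-mean ratio on the raw counts. The helper name dispersion_ratio is an assumption, the input is the TRANS_FREQ value from the 20 rows visible in the data excerpt (which are not a random sample), and the check ignores the covariates, so it is only a first screen and not a formal test.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by mean; about 1 under a Poisson assumption."""
    return statistics.variance(counts) / statistics.mean(counts)

# TRANS_FREQ from the rows shown in the excerpt (illustrative only).
trans_freq = [1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 4, 24, 1, 2, 4, 2, 1, 1, 1, 1]

ratio = dispersion_ratio(trans_freq)
print("mean:", statistics.mean(trans_freq))
print("variance / mean:", round(ratio, 2))
```

A ratio well above 1 suggests the Negative Binomial results deserve more weight; in that case the Poisson standard errors tend to be too small even when the coefficient estimates are similar. In both models exp(coefficient) reads as a multiplicative change in the expected count per unit change in the IV.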
4. a) Please use the linear regression technique to answer the question “what are the factors that influence how much time people spend on a website?” Please use DURATION as your DV; you may decide on the IVs by conducting exercises similar to those in Assignment #1. Please ONLY report and interpret your final regression results.
b) Please use the linear regression technique to answer the question “what are the factors that influence how many pages people view when visiting a website?” Please use PAGES_VIEWED as your DV; you may decide on the IVs by conducting exercises similar to those in Assignment #1. Please ONLY report and interpret your final regression results.
c) Alternatively, you can use a count data model (Poisson or Negative Binomial), since PAGES_VIEWED takes discrete, non-negative integer values. Using a similar set of IVs, do you see significantly different results between linear regression and count data models?
d) Please summarize your observations by comparing the results from a), b), and c).
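As a reference point for the linear models in question 4, the sketch below fits an ordinary least squares line with a single predictor and reports R-squared. It is only an illustration: the function ols_simple is hypothetical, using DURATION as the sole IV for PAGES_VIEWED is an assumption (the assignment leaves the choice of IVs open), and the eight data pairs are the first rows of the data excerpt, not the full dataset.

```python
def ols_simple(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns slope, intercept, R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# DURATION and PAGES_VIEWED from the first eight rows of the excerpt.
duration = [23.328125, 47.109375, 20.05859375, 47.546875,
            14.599609375, 20.15625, 16.15625, 7.6298828125]
pages_viewed = [13, 17, 19, 39, 19, 2, 9, 13]

slope, intercept, r2 = ols_simple(duration, pages_viewed)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3), "R^2:", round(r2, 3))
```

In the linear model the slope is an additive change in pages viewed per unit of DURATION, whereas a Poisson or Negative Binomial coefficient acts multiplicatively on the expected count, so the two sets of coefficients should be compared by sign and significance and not by magnitude.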
Attachment: HotelClickStream.rar
all_session_clean
ID DOMAIN_ID MACHINE_ID SITE_SESSION_ID TRANS_FREQ DOMAIN_NAME DIRECTP_D PROD_NAME PROD_QTY PROD_TOTPRICE REF_DOMAIN_NAME DURATION PAGES_VIEWED HOUSEHOLD_SIZE CHILDREN_D CONNECTIONSPEED_D
1525 13877604970862366012 85643811 4447900536932 1 ichotelsgroup.com 1 FT. LAUDERDALE AIRPORT/CRUISE - CROWNE PLAZA HOTEL MON 27 JUN 2011~FRI 29 JUL 2011 32 2847.0399932861 23.328125 13 6 1 1
402 7101213156062330967 76460408 71774258860245 1 orbitz.com 0 WALT DISNEY WORLD MAGIC YOUR WAY TICKETS! N/A 1 2406.939994812 yahoo.com 47.109375 17 2 1 1
233 7772350535129410931 74286590 3825866182640 1 hyatt.com 1 HYATT REGENCY MAUI RESORT SPA FRI 11 MAR 2011~WED 16 MAR 2011 5 2168 google.com 20.05859375 19 1 0 1
2362 9530952911301729568 90015830 70000481538306 1 expedia.com 0 HOTEL - THE ADDRESS DUBAI MARINA ~SAT DEC/10/2011TOTHU DEC/15/2011 5 1958.6999969482 47.546875 39 1 0 1
2738 4024709573451844450 91435029 5158448795791 2 starwoodhotels.com 1 HOTEL-W NEW YORK - TIMES SQUARE 08/18~08/21 3 1797 whotels.com 14.599609375 19 1 0 1
569 3010609366849421442 78515126 3893423575098 1 jetblue.com 0 SAN DIEGO, CA, (SAN)~BOSTON, MA, (BOS) THU JUL 21~FRI AUG 12 3 1698 aol.com 20.15625 2 2 1 1
1207 9530952911301729568 83769402 4366274465799 1 expedia.com 0 HOTEL - RIU PALACE CABO SAN LUCAS ALL INCLUSIVE SUN 8/28/2011~ SAT 9/3/2011 6 1556.7599983215 16.15625 9 4 1 1
451 9663188555341498165 76953668 3787166584900 3 hotwire.com 0 HOTEL - THE REGENCY HOTEL - LONDON THU, JUN 16, 2011~MON, JUN 20, 2011 4 1554.719997406 7.6298828125 13 5 0 1
2194 4024709573451844450 89081812 4368581595171 1 starwoodhotels.com 1 HOTEL-SHERATON SEATTLE HOTEL 07/08~07/10 8 1520 10.158203125 14 1 0 1
3194 1910370585147107479 93143690 4567230320752 1 bestwestern.com 1 BEST WESTERN PLAZA HOTEL SAUGATUCK AUGUST 18~AUGUST 28 10 1511.8999977112 15.259765625 10 2 0 1
1811 9530952911301729568 86701264 4925815132156 4 expedia.com 0 HOTEL - MAJESTIC COLONIAL PUNTA CANA ALL INCLUSIVE SAT 9/3/2011~ SUN 9/11/2011 8 1504 16.078125 31 5 1 1
1497 7772350535129410931 85432156 4548907700377 24 hyatt.com 1 N/A ~ 1 1430.9099998474 google.com 43.109375 31 1 0 1
2297 3010609366849421442 89654646 72551859884301 1 jetblue.com 0 BOSTON, MA (BOS)~FORT LAUDERDALE, FL (FLL) FRI FEB 03~SAT FEB 11 4 1412 31.69921875 5 4 1 1
2233 17475197073474272331 89428989 5346425769954 2 travelocity.com 0 HOTEL - WALT DISNEY WORLD DOLPHIN FRI FEB 18~MON FEB 21 3 1374.5399971008 22.87890625 13 6 1 1
2660 17374070360368138569 91221026 4802082246774 4 hotels.com 0 HOTEL - "SOLE ON THE OCEAN, SUNNY ISLES BEACH" JULY 19 2011~JULY 23 2011 1 1328.0499992371 1 1 3 1 1
1212 4024709573451844450 83882510 5180394770622 2 starwoodhotels.com 1 HOTEL-THE ST. REGIS ROME 11/03~11/06 3 1280.8599967956 55.796875 85 2 1 1
212 17374070360368138569 73980812 3720281133114 1 hotels.com 0 HOTEL - "THE BREAKERS RESORT, MYRTLE BEACH" JULY 5 2011~JULY 11 2011 6 1251.1799964905 7.75 6 2 0 1
2636 7101213156062330967 91064400 4589161025620 1 orbitz.com 0 HOTEL - HAMPTON INN VIRGINIA BEACH-OCEANFRONT SOUTH MON, JUL 25, 2011~SAT, JUL 30, 2011 5 1194.2999992371 tripadvisor.com 28.7265625 11 3 0 1
2933 9530952911301729568 92099828 5816212459711 1 expedia.com 0 HOTEL - MONTE CARLO RESORT AND CASINO ~MON JAN/9/2012TOSAT JAN/14/2012 5 1163.0499992371 23.65625 17 5 0 1
3047 9530952911301729568 92551113 5519708983469 1 expedia.com