IT 110 Project Description
Choose one (1) of the two tasks below for your final project.
The Jack Bauer family is moving to Pittsburgh. The family is recruiting a butler to help them make decisions. The tasks are:
1. A house. The Jack Bauer family wants to buy a house. The requirements are:
a) The price is less than 500,000 USD.
b) It has investment potential.
c) It is close to medical centers/hospitals, universities, and supermarkets/malls (Target, Walmart, Whole Foods, Costco, etc.).
d) Traffic conditions in the surrounding area are excellent.
2. Technology setup. Jack Bauer can’t live without the Internet and asks for your decision support in choosing a Wi-Fi router. He has narrowed down the selection to three comparable models: NETGEAR Nighthawk AC1900, ASUS AC1900, and Linksys AC1750. He asked his friends for their opinions and collected some reviews of these routers.
You are required to give a presentation to the Jack Bauer family to help them make the decision for the task you chose.
· For Task 1, we suggest using (but you are not limited to) a decision tree, and searching the web for more data to prepare a rich and exciting presentation.
· For Task 2, we suggest applying sentiment analysis and searching the web for more data to help them make the decision.
· Data: Pittsburgh property price data and product review data.
* You are encouraged to collect additional data of your own to support a more solid analysis.
· A visualization system to help:
https://vietexob.shinyapps.io/real_estate_app
https://vietexob.shinyapps.io/traffic_real_estate
This is an open project, so please feel free to use whatever resources and skills you have!
The Jack Bauer family is looking forward to your presentation!
Task 1 Guide:
1. Whether a house should be recommended is a multi-factor decision that depends on its price, investment potential, traffic, proximity to public services, crime rate, neighborhood, etc. Manually rate several housing options (say, 20 or more) by discussing them among group members. Determine an overall rating for each housing option based on the factors your group chose to judge by.
1.1 This rating can be somewhat subjective, but the more options you rate, the more objective your analysis becomes in the later training step.
1.2 The suggested scale is 1 to 10, but the actual rating scale is up to you.
1.3 Theoretically, if you manually rate all 3318 options in the csv data, your job is done, because you can recommend the houses with the highest ratings to the Bauer family. However, do you have time for that?
2. The subset of data you manually rated (i.e., labeled) is your training data + test data. Again, the more data you label, the more time you need, but the more useful your trained model will be. You need to find the balance yourselves.
Select 90% of the labeled data as training data and the remaining 10% as test data, and use them to optimize your decision tree parameters. You want to train a decision tree model that performs well on your test data.
Things to think about:
2.1 How do you sample the training data and test data?
2.2 Do you need to use all the attributes provided in csv? Any preprocessing of the raw data?
3. Once you have trained a satisfactory decision tree model, apply it to the unlabeled data to label them automatically (i.e., predict their ratings).
Finally, make a recommendation to the family and give good reasons for it.
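Steps 2 and 3 above can be sketched as follows. This is only a minimal illustration in plain Python, not a required method: in practice a library such as scikit-learn would normally be used. The column names (price, hospital_km, rating) and all data values are made up for the example and do not come from the course csv. The split here is a simple random shuffle, which is one possible answer to question 2.1.

```python
import random
from collections import Counter

def split_90_10(rows, seed=0):
    """Shuffle the labeled rows, then hold out 10% (at least one row) as test data."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(0.1 * len(rows)))
    return rows[n_test:], rows[:n_test]

def gini(labels):
    """Gini impurity of a list of ratings treated as class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, features):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [r["rating"] for r in rows if r[f] <= t]
            right = [r["rating"] for r in rows if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, features, max_depth=3):
    """Grow a tree; internal nodes are (feature, threshold, left, right) tuples."""
    labels = [r["rating"] for r in rows]
    split = None
    if max_depth > 0 and len(set(labels)) > 1:
        split = best_split(rows, features)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority rating
    _, f, t = split
    left = [r for r in rows if r[f] <= t]
    right = [r for r in rows if r[f] > t]
    return (f, t,
            build_tree(left, features, max_depth - 1),
            build_tree(right, features, max_depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return the rating stored there."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Made-up labeled data: 20 houses, rated 8 if cheaper than 300,000 USD, else 3.
labeled = [
    {"price": p, "hospital_km": d, "rating": 8 if p < 300_000 else 3}
    for p, d in zip(range(150_000, 550_000, 20_000), [1, 4, 2, 6, 3] * 4)
]
features = ["price", "hospital_km"]

train_rows, test_rows = split_90_10(labeled)
tree = build_tree(train_rows, features, max_depth=3)  # max_depth is the parameter to tune
accuracy = sum(predict(tree, r) == r["rating"] for r in test_rows) / len(test_rows)
print("test accuracy:", accuracy)

# Step 3: apply the trained tree to unlabeled (made-up) houses.
unlabeled = [{"price": 160_000, "hospital_km": 2}, {"price": 500_000, "hospital_km": 1}]
predictions = [predict(tree, r) for r in unlabeled]
print(predictions)
```

With only a handful of test rows the accuracy figure is very coarse, which is one reason labeling more houses makes the evaluation more trustworthy.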
Task 2 Guide:
1. Determine whether to perform a sentence-level or review-level sentiment analysis (that is, what is your document 'unit' in the training and testing data?). You can choose either, but you need to keep it consistent between training and testing data.
2. Prepare training data from Amazon (or equivalent websites). You can manually copy and paste to create separate training files. You can also automate this by writing code if you have good programmers in your group.
2.1 How many sentences/reviews need to be in the training data? Of course, the more the better.
2.2 The proportion of reviews/sentences for each router model, and the proportions of positive and negative reviews used in the training set, also need to be carefully considered.
2.3 Other miscellaneous issues such as preprocessing, capitalization, word stemming, etc.
3. Prepare your testing data based on the txt files given on Blackboard. These files contain sentences/reviews without labels. You need to use your sentiment analysis model to automatically label them.
3.1 You can manually convert the four files into separate testing files (either sentence level or review level). If you have a programmer in your group, automating this process is encouraged.
4. Train your model and apply it to your test data. Finally, determine which router is best based on your analysis.
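The steps above can be sketched with a small Naive Bayes classifier at the review level. This is a hedged illustration only: the technique (multinomial Naive Bayes with add-one smoothing) is one common choice, not a method the guide prescribes, and the training sentences, the router names router_a and router_b, and their reviews are all invented for the example.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep word-like tokens (a very simple preprocessing step)."""
    return re.findall(r"[a-z']+", text.lower())

def train_nb(labeled_docs):
    """Count documents and words per class from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability under Naive Bayes."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothing avoids zero probabilities
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up training data; real training data would be collected as described in step 2.
training = [
    ("Fast and reliable connection, great range", "pos"),
    ("Easy setup and excellent signal", "pos"),
    ("Keeps dropping the connection, terrible", "neg"),
    ("Slow speeds and poor range, very disappointed", "neg"),
]
model = train_nb(training)

# Made-up unlabeled reviews for two hypothetical routers.
reviews = {
    "router_a": ["Great signal and easy setup", "Excellent range"],
    "router_b": ["Terrible, slow and keeps dropping", "Poor range, very slow"],
}

# Share of reviews predicted positive for each router.
positive_share = {
    name: sum(classify(model, r) == "pos" for r in texts) / len(texts)
    for name, texts in reviews.items()
}
best = max(positive_share, key=positive_share.get)
print(positive_share, best)
```

Ranking routers by their share of positive reviews is just one way to turn predicted labels into a recommendation; the balance of positive and negative training examples (point 2.2) directly affects the class prior used in classify.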
IT 110 Final Project Grading Rubric
Rubric Item                          Points
Submission of presentation slides    5
Introduction                         2
Method and Model                     3
Results                              2
Interpretation/Limitation            3
Presentation Quality                 5
Total                                20
Suggestions for Presentation
You are to develop a professional PowerPoint presentation of 10 to 20 slides.
This assignment stresses organizing thoughts in the appropriate hierarchy and sequence, selecting words for their power and expressiveness, using technical terms in appropriate contexts, and representing complex data and symbols precisely in prose. This assignment requires the use of appropriate business language--do not use slang, unprofessional language, colloquialisms, etc.
Quality of Final Project:
1. Presentation is logically organized.
2. Presentation is readable and flows with continuity.
3. Rules of grammar, punctuation, spelling, sentence construction, upper case letters, paragraph construction, etc. are followed.
4. Do the slides support, amplify, and clarify your work?
5. Do the slides represent correct information?
6. Is the presentation visually attractive?
7. Do the slides use a common font, theme, etc.?
8. The data analysis must support your work.
9. Data analysis must be thorough, accurate, and complete.
10. Data visualization must be accurate, useful, and visually appealing.
Formatting:
1. Utilize graphics to help emphasize a particular concept or answer to a question
2. Include an agenda or objectives slide near the beginning, as well as a summary slide.
3. Slides should have references at the end (these do not need to be included in your slide count).
Overall Presentation Impact!
IT 110 Final Project
(Please attach this page to your final report)
I affirm that I have neither given, received, nor witnessed unauthorized aid on this deliverable and have completed this work honestly and according to the professor’s guidelines. Please PRINT and sign below.
Group Member _______________________________________
Group Member _______________________________________
Group Member _______________________________________
Group Member _______________________________________
Group Member _______________________________________