In this individual assignment, you will perform an exploratory analysis with the What-If Tool to better understand the structure of datasets, investigate initial questions, and develop preliminary insights and hypotheses. Your final submission will take the form of a report consisting of key insights gained during your analysis.
Step 1: Dataset Selection and Initial Questions
Pick two datasets. These can be ones that are available for demo at https://pair-code.github.io/what-if-tool/explore/, but we'll give you additional points if you choose to use datasets that are not available there.
After selecting datasets – but prior to analysis – write down an initial set of three questions you'd like to investigate about the datasets and prediction results from ML models.
Step 2: Exploratory Visual Analysis
Next, you will perform an exploratory analysis of your datasets and the results from ML models using the What-If Tool. If you use the provided datasets, you can use the web demo. Otherwise, you can take the provided notebooks and revise them to work with your own datasets and models.
You should consider two different phases of exploration.
In the first phase, you should seek to gain an overview of the structure of your datasets and the results from the models. What is the structure of the datasets? Which features are used? Are there any notable issues with the distributions of the datasets? What is the model performance? What features contributed the most? Are there any surprising relationships among subsets of data and model results? Are there any fairness issues?
In the second phase, you should investigate your initial questions, as well as any new questions that arise during your exploration. For each question, experiment with the visualizations in the What-If Tool that might provide a useful answer. Interact with the tool's functionalities (e.g., datapoint editors, dropdown menus, fairness analysis) to develop better perspectives, explore unexpected observations, or sanity check your assumptions. You should repeat this process for each of your questions, and also feel free to revise your questions or branch off to explore new questions.
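If you bring your own dataset and model, it can help to sanity check a few of the first-phase questions outside the tool before loading anything into it. The following is a minimal sketch, not part of the What-If Tool: it assumes each record is a plain dictionary that already holds the true label and the model's prediction, and the field names (`sex`, `label`, `pred`) and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def label_distribution(rows, label_key):
    """Return the share of each label value, to spot class imbalance."""
    counts = Counter(row[label_key] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def metrics_by_group(rows, group_key, label_key, pred_key, positive=1):
    """Return size, accuracy, and positive-prediction rate for each group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_key]].append(row)

    report = {}
    for group, members in buckets.items():
        n = len(members)
        correct = sum(1 for r in members if r[label_key] == r[pred_key])
        predicted_pos = sum(1 for r in members if r[pred_key] == positive)
        report[group] = {
            "n": n,
            "accuracy": correct / n,
            "positive_rate": predicted_pos / n,
        }
    return report


# Hypothetical toy records; replace with your own data and predictions.
rows = [
    {"sex": "F", "label": 1, "pred": 0},
    {"sex": "F", "label": 0, "pred": 0},
    {"sex": "M", "label": 1, "pred": 1},
    {"sex": "M", "label": 0, "pred": 1},
    {"sex": "M", "label": 1, "pred": 1},
    {"sex": "M", "label": 0, "pred": 0},
]

print(label_distribution(rows, "label"))
print(metrics_by_group(rows, "sex", "label", "pred"))
```

Large gaps between groups in accuracy or positive-prediction rate are the kind of pattern worth following up with the tool's fairness analysis; a small group size (`n`) is a reason to treat any gap with caution.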
What to submit?
You'll submit a single PDF in the form of a report. For each dataset, you will provide the 10 most interesting or surprising findings (or "insights") with details and screenshots. Your "insights" can include important surprises or issues (such as skewed data distributions or critical fairness issues) as well as responses to your analysis questions. Each finding will consist of a title, a 2-4 sentence description, and screenshots. Provide sufficient detail so that anyone could read through your report and understand what you've learned. You are free, but not required, to annotate your images to draw attention to specific features of the data.
Do not submit a report cluttered with every little thing you tried. Submit a clean, succinct report that highlights the most interesting, insightful observations. You don't need to tell us how the tool works -- we already know that. Think of this like a report to your manager who wants to know what the datasets look like and how the model worked.
The structure of the report will be:
- Dataset 1
  - Which dataset?
  - Three initial questions
  - 10 most interesting findings
- Dataset 2
  - Which dataset?
  - Three initial questions
  - 10 most interesting findings
Grading
- Clear questions applicable to the chosen datasets
- Clearly written, understandable descriptions that communicate primary insights
- Sufficient breadth of analysis, exploring multiple questions
- Sufficient depth of analysis, with appropriate follow-up questions
- Interesting insights that are worth reporting