project/Data Visualisation Project .docx.pdf
FIT5147 Narrative Visualisation Project
In this project, you are asked to create an interactive narrative visualisation that communicates some of your findings from the Data Exploration Project.
It is an individual assignment and worth 40% of your total mark for FIT5147.
Relevant learning outcome
● Choose appropriate data visualisations.
● Implement interactive data visualisations using R (Shiny) or JavaScript (D3).
Overview of the tasks
1. Identify which findings from the Data Exploration Project you wish to communicate. You do not need to use everything you have found; be selective. You should try to answer your research questions.
2. Clearly define the intended audience. The audience might be your classmates, the general public, politicians or whoever you like.
3. Design an interactive narrative visualisation using the five design sheet methodology.
4. Prepare a short presentation based on your five design sheets (one sheet per slide). More information about the presentation will be provided later on Moodle.
5. Implement your visualisation as a web-based presentation using R (Shiny) or JavaScript (D3). The use of other tools, visualisation libraries or visualisation software is subject to approval by your tutor. Specifically for R, you are not allowed to use R Markdown.
6. Write a report and export it to PDF.
7. Submit the report and source code.
Report structure
Write a 15-page report (excluding bibliography, table of contents, cover page and appendix) consisting of the following sections:
1. Project title. Title of the narrative visualisation. This can be included in the cover page.
2. Your identity. Your full name, student ID, lab number and tutor name. This can be included in the cover page.
3. Introduction. A precise description of what messages you wanted your narrative visualisation to convey and who the intended audience is.
4. Design. This section contains a description of the visualisation design process. It summarises the five design sheets (i.e. it details the alternative designs you considered and the justification of your final design).
5. Implementation. This section contains a high-level description of the implementation, including libraries used and reasons for the implementation decisions. You are not expected to explain the code in detail.
6. User guide. This section contains instructions for viewing and exploring your narrative visualisation.
7. Conclusion. Summarise your findings and what you have achieved. Reflect on what you have learnt in this project, including what in hindsight you might have done differently to improve the result.
8. Bibliography. Appropriate references. Refer to this page to see appropriate referencing styles: https://guides.lib.monash.edu/citing-referencing/apa
9. Appendix. Place your five design sheets in the appendix. Make sure you provide clear images.
Your report should contain high-quality images of the visualisation. You should also briefly explain any reasons why your project was challenging (e.g. extensive data set, advanced use of D3, etc.) in your report.
Marking Criteria
1. Design [15%]
   a. Appropriate use of five design sheet methodology and evaluation of alternatives [5%].
   b. Quality of final design: clear signposting of messages and intended narrative, provision of appropriate context for the reader, good use of colour, references to data sources and appropriateness for the intended audience [7%].
   c. Justification of final design in terms of the human perceptual system and human communication assumptions [3%].
2. Implementation [7%]
   a. Correctness and robustness, speed, accessibility [5%].
   b. Comments and code quality [2%].
3. Difficulty [10%]
   Degree of difficulty, e.g. using different sources of non-tabular data very well, dealing with a large dataset, advanced D3 or R (Shiny) programming, sophisticated user interaction (e.g. animation, linked interaction).
4. Presentation [3%]
   a. Quality of oral presentation (confidence, speed, voice) and quality of slides (legibility, design, images) [1%].
   b. Logical structure [1%].
   c. Choice of content (completeness, appropriate level, discussion of design and implementation alternatives) [1%].
5. Report [5%]
   a. Quality of writing, referencing, images, logical structure [1.5%].
   b. Completeness [3.5%].
Submission due dates
● Submit presentation slides to Moodle by Friday, 23 October 2020, 5:00 PM (presentations will be held in Weeks 11 and 12, during your lab).
● Submit a PDF report and a zip file to Moodle by Monday, 16 November 2020, 5:00 PM.
NOTE: Times are expressed in Aust/Melbourne local time.
https://guides.lib.monash.edu/citing-referencing/apa
How to submit
Once you have completed your work, the following files are to be submitted:
● Presentation slides containing your five design sheets. Name the file StudentName_StudentID_Presentation.pdf and submit it via Moodle (i.e., Assessments/Presentation).
● A PDF report (max 15 pages) and a zipped file containing your visualisation source code and any data files that are needed to run your code. Please ensure you name the files correctly using the following format:
o StudentName_StudentID_Report.pdf
o StudentName_StudentID_Code.zip
These two files (i.e., .pdf and .zip) must be submitted via Moodle (i.e., Assessments/Visualisation Project Code). Do not zip these files into one zip archive; submit the PDF file and the zip file as two separate files.
Please note we cannot mark any work submitted via email or shared via GDrive. Please ensure that you submit correctly via Moodle, since it is only in this process that you complete the required student declaration without which work cannot be assessed.
It is your responsibility to ENSURE that the files you submit are the correct files - we strongly recommend after uploading a submission, and prior to actually submitting in Moodle, that you download the submission and double-check its contents.
Your assignment MUST show a status of "Submitted for grading" before it will be marked.
If your submission shows a status of "Draft (not submitted)" it will not be assessed and will incur late penalties if submitted after the due date/time.
Note that you DO NOT need to publish your app on the web.
Late submissions and special consideration
● We encourage everyone to submit the presentation slides on time. Late presentation slide submissions receive zero marks.
● For the visualisation (i.e., report and code), assessments received after the submission deadline, or after the extended submission date for those with special consideration, will be penalised at 5% of the total mark [37%] per day for a maximum penalty period of ten (10) consecutive days.
● If an assessment is received after the penalty period, then zero marks will be awarded.
● For further information on eligibility for Special Consideration, please refer to the relevant section on the Assessment page on Moodle.
Resubmissions
If you are retaking this unit from a previous semester, please ensure you choose a completely new topic and dataset.
project/DataExplorationProject.pdf
FIT5147 Monash University 30369916
FIT5147 Data Exploration Project
30369916
Jianwei Jing
1. Introduction
After busy weekdays, people look for entertainment to fill their
weekends. Playing sport has become important for many people, helping
them stay both in a good mood and in good health. Meanwhile, for those
who don’t have enough time to play sport, watching competitive sport
has become an everyday form of entertainment. Competitive sports have
existed in our society for many centuries. The Olympic Games, which
include many kinds of competitive sport, have reached their 31st
Olympiad. Beyond the Olympic Games, sporting events of many types are
held all over the world, motivating more and more people to take part
in competitive sport.
However, while we enjoy watching competitive sports, injuries
trouble professional athletes. To achieve better results, professional
athletes have to attempt difficult moves during games. While these moves
attract viewers, they also pose a serious threat to the athletes’
physical health.
The NBA (National Basketball Association) is an American men's
professional basketball league with many talented professional athletes
from all over the world. However, injuries plague many of them. Some
players’ competitive form suffers after an injury, and some have to end
their professional careers because of one.
This project examines what helps a professional basketball player
have a longer career, based on facts about those athletes. We analyse
the injury statistics of professional basketball players in the NBA to
discuss which basketball position is most prone to injury and how much
an injury affects players afterwards. We also analyse the technical
statistics of some professional players to discuss at what age a
basketball player begins to decline. Answering these questions can help
us understand how a professional basketball player can have a good career.
2. Data Wrangling
For the data wrangling, let’s first look at the data we have. We use
three tables for the data exploration:
Injuries Statistics from 2010 to 2020
Players Technical Statistics from 1950 to 2017
Three Outstanding Players Technical Statistics during their career
We use R for the data wrangling. To discuss which basketball position
is most prone to injury, we need to combine the injuries statistics table
with the players’ technical statistics table, which means the players’
technical statistics from 1950 to 2010 are no longer needed. Here, we
create a new table named “injurystat” in R. First, looking at the
injuries table, we can see that it has columns called “Acquired” and
“Relinquished”. The values in these columns are all players’ names, so
the first step is to combine these two columns into one column called
“Player”. We then combine the injury table and the statistics table into
one table called “injurystat”. After doing that, we get the new table
below.
Injurystat table
The injurystat table shows the injured players and their statistics.
This completes our data wrangling work.
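The wrangling steps above were done in R. As an illustration only, the same two steps can be sketched in plain Python. The sample rows, the player names and the choice to join on player name plus year are assumptions made for this sketch, not details taken from the original R code.

```python
def merge_player_columns(rows):
    """Collapse 'Acquired' and 'Relinquished' into a single 'Player' column."""
    merged = []
    for row in rows:
        # Exactly one of the two columns is assumed to hold the player's name.
        player = row["Acquired"] or row["Relinquished"]
        rest = {k: v for k, v in row.items() if k not in ("Acquired", "Relinquished")}
        merged.append({"Player": player, **rest})
    return merged


def left_join(injury_rows, stat_rows):
    """Attach the season position to each injury record (assumed key: Player + Year)."""
    by_key = {(s["Player"], s["Year"]): s for s in stat_rows}
    joined = []
    for inj in injury_rows:
        year = int(inj["Date"][:4])
        stat = by_key.get((inj["Player"], year), {})
        # Injuries without a matching season keep a missing position (None).
        joined.append({**inj, "Year": year, "Pos": stat.get("Pos")})
    return joined


# Hypothetical sample data, not taken from the real tables.
injuries = [
    {"Date": "2012-04-28", "Acquired": "", "Relinquished": "Player A", "Notes": "knee injury"},
    {"Date": "2013-01-10", "Acquired": "Player B", "Relinquished": "", "Notes": "returned to lineup"},
]
stats = [{"Player": "Player A", "Year": 2012, "Pos": "PG"}]

injurystat = left_join(merge_player_columns(injuries), stats)
```

A left join is used here so that injury records without a matching season are kept with a missing position rather than dropped.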
3. Data Checking
We use R’s is.na() to check whether our data has empty values.
Data Checking Result1
Here we can see that table “Players” has no null values, which is what
we want. The new table “injurystat” has many empty values. For the
question of which basketball position is most prone to injury, we focus
on the attributes “Notes” and “Pos”. “Notes” has no empty values because
it records the injury description for each player. However, “Pos” has
some empty values. We believe this is because, due to injury, some
players have no technical statistics for some years, which causes the
empty values in the table.
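The check itself was run with is.na() in R. A rough Python equivalent of counting missing values per column might look like the sketch below. The sample rows are hypothetical, and treating an empty string as missing is an assumption of this sketch.

```python
def count_missing(rows):
    """Count missing (None or empty-string) values in each column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value is None or value == "":
                counts[column] += 1
    return counts


# Hypothetical rows shaped like the injurystat table.
sample = [
    {"Player": "Player A", "Notes": "sprained ankle", "Pos": "PG"},
    {"Player": "Player B", "Notes": "sore knee", "Pos": None},
    {"Player": "Player C", "Notes": "back spasms", "Pos": ""},
]

missing = count_missing(sample)
print(missing)  # {'Player': 0, 'Notes': 0, 'Pos': 2}
```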
Also, to explore the data further, we keep the tables “injury” and
“Stats”, so we also ran the data checking on those two tables.
Data Checking Result2
Table “injury” has no empty values, but “Stats” has some. After
checking the table “Stats”, we noticed that some attributes were not
recorded in the technical statistics before 1978. That is why this table
has many empty values, but it does not matter because we only need the
players’ technical statistics after 2010. Our data wrangling and
checking work is now complete.
4. Data Exploration
Question 1: Which basketball position is most prone to injury?
We use R to answer this question.
Injurystat table
To discuss which position is most prone to injury, we focus on these
two rows. Notice that a player’s position is not fixed: A.J. Price was a
PG in some years and an SG in others. That is because a basketball
player’s position can change after a season ends. Given this, we use
group_by() and summarise() to get the result we want.
Group_by() and Summarise() code1
Injurystat2 table
Here we can see that, regardless of the count, we get the players’
names and a summary of their positions. Next we calculate the
distribution of positions.
Group_by() and Summarise() code2
Injurystat4 table
We can see that 227 of the injured players are PF, 223 are SG, 200 are
SF, 162 are PG and 163 are C. It seems that PF and SG are the two most
injury-prone positions in professional basketball.
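The two grouping steps were carried out with group_by() and summarise() in R. The idea can be sketched in Python as follows, under the assumption that a player who appears under two positions is counted once for each position. The sample rows are hypothetical.

```python
from collections import Counter


def positions_per_player(rows):
    """Step 1: reduce repeated injury rows to distinct (player, position) pairs."""
    pairs = {(r["Player"], r["Pos"]) for r in rows if r["Pos"]}
    return sorted(pairs)


def count_by_position(pairs):
    """Step 2: count how many injured players appear under each position."""
    return Counter(pos for _, pos in pairs)


# Hypothetical rows; Player A was injured several times and changed position.
sample = [
    {"Player": "Player A", "Pos": "PG"},
    {"Player": "Player A", "Pos": "PG"},
    {"Player": "Player A", "Pos": "SG"},
    {"Player": "Player B", "Pos": "PF"},
    {"Player": "Player C", "Pos": None},  # missing position is skipped
    {"Player": "Player D", "Pos": "SG"},
]

pairs = positions_per_player(sample)
distribution = count_by_position(pairs)
```

Deduplicating first means a player with many recorded injuries does not inflate the count for their position.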
Question 2: How much does an injury affect a player’s career?
Measuring the effect on all injured players is difficult. To do so, we
would need to generate about 1000 tables or visualization results.
Therefore, we study the technical statistics of some specific players to
discuss this question.
In this case, we study two outstanding NBA players, Paul George and
Derrick Rose. Both were highly talented, but they were seriously injured
and had to suspend their basketball careers for a few years. Their
careers are worth discussing.
We are going to use R to generate the technical statistics for Paul George
and Derrick Rose.
Paul George’s technical statistics
Derrick Rose’s technical statistics
Here we focus on one factor in the tables: TS% (true shooting
percentage), a measure of a player’s shooting efficiency. Note that Paul
George was seriously injured in 2014 (a broken leg) and Derrick Rose was
seriously injured in 2012 (a torn ACL). We can therefore focus on how
their TS% changed after 2014 and 2012 respectively.
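For readers unfamiliar with the metric, true shooting percentage is commonly defined as points divided by twice the sum of field goal attempts and 0.44 times free throw attempts. The statistics tables used here already contain this value, so the Python sketch below is only an illustration of the formula, with hypothetical input numbers.

```python
def true_shooting_pct(points, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)); returns None when there are no attempts."""
    attempts = fga + 0.44 * fta
    if attempts == 0:
        return None
    return points / (2 * attempts)


# Hypothetical game line: 20 points on 15 field goal attempts and 5 free throw attempts.
ts = true_shooting_pct(points=20, fga=15, fta=5)
print(round(ts, 3))  # 0.581
```

The 0.44 factor discounts free throw attempts because not every free throw uses a full possession.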
We can see the line chart below:
Paul George’s Technical Statistics
Derrick Rose’s Technical Statistics
We can see that after their injuries, their true shooting percentages
dropped dramatically. Derrick Rose is a good example of the effect of
injury: his true shooting percentage recovered a little but never
returned to its earlier peak. In fact, the effects of injuries are very
long lasting, not just physically but also mentally. However, Paul
George’s true shooting percentage shows another fact: some players can
improve after an injury. He even reached a new peak in true shooting
percentage.
Question 3: At what age does a basketball player begin to decline?
Drawing an accurate conclusion for this question requires analysing a
large amount of data, but here we discuss only three outstanding
players.
We need more attributes to discuss this question, so we use Tableau.
We set age as the x axis and 2P% (2-point percentage), 3P% (3-point
percentage) and FT% (free throw percentage) as the y axis. Plotting the
averages of these values together gives the result below.
Shooting Percentage Result(Average)
According to the result, after age 30, 2P% and 3P% gradually dropped
and rarely recovered to earlier levels. However, FT% stayed steady
throughout their careers, because a free throw is taken without
interference from defenders. This suggests that 30 is a turning point
for an athlete: after 30, they no longer perform in games as well as
before.
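The averaging was done in Tableau. The aggregation behind the chart can be sketched in Python as a per-age mean across players. The season rows and percentages below are hypothetical, and the sketch assumes no missing values in the three metrics.

```python
from collections import defaultdict


def average_by_age(rows, metrics=("2P%", "3P%", "FT%")):
    """Average each shooting metric over all player seasons at the same age."""
    sums = defaultdict(lambda: {m: 0.0 for m in metrics})
    counts = defaultdict(int)
    for row in rows:
        counts[row["Age"]] += 1
        for m in metrics:
            sums[row["Age"]][m] += row[m]
    return {
        age: {m: sums[age][m] / counts[age] for m in metrics}
        for age in sorted(sums)
    }


# Hypothetical seasons for two players (values are made up for illustration).
seasons = [
    {"Player": "Player A", "Age": 29, "2P%": 0.52, "3P%": 0.38, "FT%": 0.85},
    {"Player": "Player B", "Age": 29, "2P%": 0.48, "3P%": 0.36, "FT%": 0.87},
    {"Player": "Player A", "Age": 31, "2P%": 0.47, "3P%": 0.33, "FT%": 0.86},
]

averages = average_by_age(seasons)
```

With only three players, each per-age average rests on at most three seasons, which is one reason the turning point at 30 should be read as tentative.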
5. Conclusion
After researching the above questions, we can draw our conclusions for
this project. First, in basketball, SG (shooting guard) and PF (power
forward) are the positions most likely to get injured. We cannot say
that other positions are unlikely to get injured, but SG and PF appear
to be injured most often. The research offers no further explanation for
this. In my personal opinion, SGs and PFs jump more often than players
in other positions, which may make them more prone to injury.
Second, in most cases, players go through a hard period after an
injury before returning to their peak. Some may never get back to their
peak, while others may reach a new one. Either way, there is no doubt
that injuries can seriously affect a professional athlete's career.
Third, 30 seems to be a turning point for a professional player,
although this conclusion needs more research to confirm. One thing we
can be sure of is that professional sports players’ careers are not as
long as those in other jobs. Players have to consider seriously what to
do after they retire.
6. Reflection
From this data exploration project, I have learnt how to process data
with R, Tableau and Excel. I found data visualization and exploration
useful and fun. When I tried to answer the questions in different ways,
I could always find some fun facts and something new to me. Data is a
good tool for measuring professional players’ form. As far as I know,
ESPN does a lot of athlete data research after a season ends, and
sometimes releases fun facts about players based on that analysis. This
shows the importance of data for professional players. I wish every
athlete a perfect career.
Data exploration and visualization are very useful for measuring what
is happening in the world. When we feel confused about something, we can
use these tools to get a better angle on it.
7. Bibliography
NBA Advanced Stats. [online] Available at: https://stats.nba.com/leaders/?Season=2019-20&SeasonType=Regular%20Season [Accessed 18 Sept 2020]