Identify a chart type that could be used to display different editorial perspectives of your dataset and explain why you felt it to be appropriate.
Identify two other chart types that could show something about your subject matter, though maybe not confined to the data you are looking at. In other words, chart types that could incorporate data not already included in your selected dataset.
Review the classifying chart families in Chapter 6 of your textbook. Select at least one chart type from each of the classifying chart families (CHRTS) that could portray different editorial perspectives about your subject. This may include additional data, not already included in your selected dataset.
Data Visualisation
Data Visualisation A Handbook for Data Driven Design
Andy Kirk
SAGE Publications Ltd
1 Oliver’s Yard
55 City Road
London EC1Y 1SP
SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320
SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044
SAGE Publications Asia-Pacific Pte Ltd
3 Church Street
#10-04 Samsung Hub
Singapore 049483
© Andy Kirk 2016
First published 2016
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
Library of Congress Control Number: 2015957322
British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library
ISBN 978-1-4739-1213-7
ISBN 978-1-4739-1214-4 (pbk)
Editor: Mila Steele
Editorial assistant: Alysha Owen
Production editor: Ian Antcliff
Marketing manager: Sally Ransom
Cover design: Shaun Mercier
Typeset by: C&M Digitals (P) Ltd, Chennai, India
Printed and bound in Great Britain by Bell and Bain Ltd, Glasgow
Contents
List of Figures with Source Notes
Acknowledgements
About the Author
INTRODUCTION
PART A FOUNDATIONS
1 Defining Data Visualisation
2 Visualisation Workflow
PART B THE HIDDEN THINKING
3 Formulating Your Brief
4 Working With Data
5 Establishing Your Editorial Thinking
PART C DEVELOPING YOUR DESIGN SOLUTION
6 Data Representation
7 Interactivity
8 Annotation
9 Colour
10 Composition
PART D DEVELOPING YOUR CAPABILITIES
11 Visualisation Literacy
References
Index
List of Figures with Source Notes 1.1 A Definition for Data Visualisation 19 1.2 Per Capita Cheese Consumption in the U.S., by Sarah Slobin (Fortune magazine) 20 1.3 The Three Stages of Understanding 22 1.4–6 Demonstrating the Process of Understanding 24–27 1.7 The Three Principles of Good Visualisation Design 30 1.8 Housing and Home Ownership in the UK, by ONS Digital Content Team 33 1.9 Falling Number of Young Homeowners, by the Daily Mail 33 1.10 Gun Deaths in Florida (Reuters Graphics) 34 1.11 Iraq’s Bloody Toll, by Simon Scarr (South China Morning Post) 34 1.12 Gun Deaths in Florida Redesign, by Peter A. Fedewa (@pfedewa) 35 1.13 If Vienna would be an Apartment, by NZZ (Neue Zürcher Zeitung) [Translated] 45 1.14 Asia Loses Its Sweet Tooth for Chocolate, by Graphics Department (Wall Street Journal) 45 2.1 The Four Stages of the Visualisation Workflow 54 3.1 The ‘Purpose Map’ 76 3.2 Mizzou’s Racial Gap Is Typical On College Campuses, by FiveThirtyEight 77 3.3 Image taken from ‘Wealth Inequality in America’, by YouTube user ‘Politizane’ (www.youtube.com/watch?v=QPKKQnijnsM) 78 3.4 Dimensional Changes in Wood, by Luis Carli (luiscarli.com) 79 3.5 How Y’all, Youse and You Guys Talk, by Josh Katz (The New York Times) 80 3.6 Spotlight on Profitability, by Krisztina Szücs 81 3.7 Countries with the Most Land Neighbours 83 3.8 Buying Power: The Families Funding the 2016 Presidential Election, by Wilson Andrews, Amanda Cox, Alicia DeSantis, Evan Grothjan, Yuliya Parshina-Kottas, Graham Roberts, Derek Watkins and Karen Yourish (The New York Times) 84 3.9 Image taken from ‘Texas Department of Criminal Justice’ Website (www.tdcj.state.tx.us/death_row/dr_executed_offenders.html) 86
3.10 OECD Better Life Index, by Moritz Stefaner, Dominikus Baur, Raureif GmbH 89 3.11 Losing Ground, by Bob Marshall, The Lens, Brian Jacobs and Al Shaw (ProPublica) 89 3.12 Grape Expectations, by S. Scarr, C. Chan, and F. Foo (Reuters Graphics) 91 3.13 Keywords and Colour Swatch Ideas from Project about Psychotherapy Treatment in the Arctic 92 3.14 An Example of a Concept Sketch, by Giorgia Lupi of Accurat 92 4.1 Example of a Normalised Dataset 99 4.2 Example of a Cross-tabulated Dataset 100 4.3 Graphic Language: The Curse of the CEO, by David Ingold and Keith Collins (Bloomberg Visual Data), Jeff Green (Bloomberg News) 101 4.4 US Presidents by Ethnicity (1789 to 2015) 114 4.5 OECD Better Life Index, by Moritz Stefaner, Dominikus Baur, Raureif GmbH 116 4.6 Spotlight on Profitability, by Krisztina Szücs 117 4.7 Example of ‘Transforming to Convert’ Data 119 4.8 Making Sense of the Known Knowns 123 4.9 What Good Marathons and Bad Investments Have in Common, by Justin Wolfers (The New York Times) 124 5.1 The Fall and Rise of U.S. Inequality, in Two Graphs Source: World Top Incomes Database; Design credit: Quoctrung Bui (NPR) 136 5.2–4 Why Peyton Manning’s Record Will Be Hard to Beat, by Gregor Aisch and Kevin Quealy (The New York Times) 138–140 C.1 Mockup Designs for ‘Poppy Field’, by Valentina D’Efilippo (design); Nicolas Pigelet (code); Data source: The Polynational War Memorial, 2014 (poppyfield.org) 146 6.1 Mapping Records and Variables on to Marks and Attributes 152 6.2 List of Mark Encodings 153 6.3 List of Attribute Encodings 153 6.4 Bloomberg Billionaires, by Bloomberg Visual Data (Design and development), Lina Chen and Anita Rundles (Illustration) 155 6.5 Lionel Messi: Games and Goals for FC Barcelona 156 6.6 Image from the Home page of visualisingdata.com 156 6.7 How the Insane Amount of Rain in Texas Could Turn Rhode Island Into a Lake, by Christopher Ingraham (The Washington Post) 156
6.8 The 10 Actors with the Most Oscar Nominations but No Wins 161 6.9 The 10 Actors who have Received the Most Oscar Nominations 162 6.10 How Nations Fare in PhDs by Sex Interactive, by Periscopic; Research by Amanda Hobbs; Published in Scientific American 163 6.11 Gender Pay Gap US, by David McCandless, Miriam Quick (Research) and Philippa Thomas (Design) 164 6.12 Who Wins the Stanley Cup of Playoff Beards? by Graphics Department (Wall Street Journal) 165 6.13 For These 55 Marijuana Companies, Every Day is 4/20, by Alex Tribou and Adam Pearce (Bloomberg Visual Data) 166 6.14 UK Public Sector Capital Expenditure, 2014/15 167 6.15 Global Competitiveness Report 2014–2015, by Bocoup and the World Economic Forum 168 6.16 Excerpt from a Rugby Union Player Dashboard 169 6.17 Range of Temperatures (°F) Recorded in the Top 10 Most Populated Cities During 2015 170 6.18 This Chart Shows How Much More Ivy League Grads Make Than You, by Christopher Ingraham (The Washington Post) 171 6.19 Comparing Critics Scores (Rotten Tomatoes) for Major Movie Franchises 172 6.20 A Career in Numbers: Movies Starring Michael Caine 173 6.21 Comparing the Frequency of Words Used in Chapter 1 of this Book 174 6.22 Summary of Eligible Votes in the UK General Election 2015 175 6.23 The Changing Fortunes of Internet Explorer and Google Chrome 176 6.24 Literacy Proficiency: Adult Levels by Country 177 6.25 ‘Political Polarization in the American Public’, Pew Research Center, Washington, DC (February, 2015) (http://www.people-press.org/2014/06/12/political-polarization-in-the-american-public/) 178 6.26 Finviz (www.finviz.com) 179 6.27 This Venn Diagram Shows Where You Can Both Smoke Weed and Get a Same-Sex Marriage, by Phillip Bump (The Washington Post) 180 6.28 The 200+ Beer Brands of SAB InBev, by Maarten Lambrechts for Mediafin: www.tijd.be/sabinbev (Dutch),
www.lecho.be/service/sabinbev (French) 181 6.29 Which Fossil Fuel Companies are Most Responsible for Climate Change? by Duncan Clark and Robin Houston (Kiln), published in the Guardian, drawing on work by Mike Bostock and Jason Davies 182 6.30 How Long Will We Live – And How Well? by Bonnie Berkowitz, Emily Chow and Todd Lindeman (The Washington Post) 183 6.31 Crime Rates by State, by Nathan Yau 184 6.32 Nutrient Contents – Parallel Coordinates, by Kai Chang (@syntagmatic) 185 6.33 How the ‘Avengers’ Line-up Has Changed Over the Years, by Jon Keegan (Wall Street Journal) 186 6.34 Interactive Fixture Molecules, by @experimental361 and @bootifulgame 187 6.35 The Rise of Partisanship and Super-cooperators in the U.S. House of Representatives. Visualisation by Mauro Martino, authored by Clio Andris, David Lee, Marcus J. Hamilton, Mauro Martino, Christian E. Gunning, and John Armistead Selde 188 6.36 The Global Flow of People, by Nikola Sander, Guy J. Abel and Ramon Bauer 189 6.37 UK Election Results by Political Party, 2010 vs 2015 190 6.38 The Fall and Rise of U.S. Inequality, in Two Graphs. Source: World Top Incomes Database; Design credit: Quoctrung Bui (NPR) 191 6.39 Census Bump: Rank of the Most Populous Cities at Each Census, 1790–1890, by Jim Vallandingham 192 6.40 Coal, Gas, Nuclear, Hydro? How Your State Generates Power. Source: U.S. Energy Information Administration, Credit: Christopher Groskopf, Alyson Hurt and Avie Schneider (NPR) 193 6.41 Holdouts Find Cheapest Super Bowl Tickets Late in the Game, by Alex Tribou, David Ingold and Jeremy Diamond (Bloomberg Visual Data) 194 6.42 Crude Oil Prices (West Texas Intermediate), 1985–2015 195 6.43 Percentage Change in Price for Select Food Items, Since 1990, by Nathan Yau 196 6.44 The Ebb and Flow of Movies: Box Office Receipts 1986–2008, by Mathew Bloch, Lee Byron, Shan Carter and Amanda Cox (The New York Times) 197 6.45 Tracing the History of N.C.A.A. Conferences, by Mike Bostock,
Shan Carter and Kevin Quealy (The New York Times) 198 6.46 A Presidential Gantt Chart, by Ben Jones 199 6.47 How the ‘Avengers’ Line-up Has Changed Over the Years, by Jon Keegan (Wall Street Journal) 200 6.48 Native and New Berliners – How the S-Bahn Ring Divides the City, by Julius Tröger, André Pätzold, David Wendler (Berliner Morgenpost) and Moritz Klack (webkid.io) 201 6.49 How Y’all, Youse and You Guys Talk, by Josh Katz (The New York Times) 202 6.50 Here’s Exactly Where the Candidates Cash Came From, by Zach Mider, Christopher Cannon, and Adam Pearce (Bloomberg Visual Data) 203 6.51 Trillions of Trees, by Jan Willem Tulp 204 6.52 The Racial Dot Map. Image Copyright, 2013, Weldon Cooper Center for Public Service, Rector and Visitors of the University of Virginia (Dustin A. Cable, creator) 205 6.53 Arteries of the City, by Simon Scarr (South China Morning Post) 206 6.54 The Carbon Map, by Duncan Clark and Robin Houston (Kiln) 207 6.55 Election Dashboard, by Jay Boice, Aaron Bycoffe and Andrei Scheinkman (Huffington Post). Statistical model created by Simon Jackman 208 6.56 London is Rubbish at Recycling and Many Boroughs are Getting Worse, by URBS London using London Squared Map © 2015 www.aftertheflood.co 209 6.57 Automating the Design of Graphical Presentations of Relational Information. Adapted from McKinlay, J. D. (1986). ACM Transactions on Graphics, 5(2), 110–141. 213 6.58 Comparison of Judging Line Size vs Area Size 213 6.59 Comparison of Judging Related Items Using Variation in Colour (Hue) vs Variation in Shape 214 6.60 Illustrating the Correct and Incorrect Circle Size Encoding 216 6.61 Illustrating the Distortions Created by 3D Decoration 217 6.62 Example of a Bullet Chart using Banding Overlays 218 6.63 Excerpt from What’s Really Warming the World? by Eric Roston and Blacki Migliozzi (Bloomberg Visual Data) 218 6.64 Example of Using Markers Overlays 219 6.65 Why Is Her Paycheck Smaller? by Hannah Fairfield and Graham Roberts (The New York Times) 219
6.66 Inside the Powerful Lobby Fighting for Your Right to Eat Pizza, by Andrew Martin and Bloomberg Visual Data 220 6.67 Excerpt from ‘Razor Sales Move Online, Away From Gillette’, by Graphics Department (Wall Street Journal) 220 7.1 US Gun Deaths, by Periscopic 225 7.2 Finviz (www.finviz.com) 226 7.3 The Racial Dot Map: Image Copyright, 2013, Weldon Cooper Center for Public Service, Rector and Visitors of the University of Virginia (Dustin A. Cable, creator) 227 7.4 Obesity Around the World, by Jeff Clark 228 7.5 Excerpt from ‘Social Progress Index 2015’, by Social Progress Imperative, 2015 228 7.6 NFL Players: Height & Weight Over Time, by Noah Veltman (noahveltman.com) 229 7.7 Excerpt from ‘How Americans Die’, by Matthew C. Klein and Bloomberg Visual Data 230 7.8 Model Projections of Maximum Air Temperatures Near the Ocean and Land Surface on the June Solstice in 2014 and 2099: NASA Earth Observatory maps, by Joshua Stevens 231 7.9 Excerpt from ‘A Swing of Beauty’, by Sohail Al-Jamea, Wilson Andrews, Bonnie Berkowitz and Todd Lindeman (The Washington Post) 231 7.10 How Well Do You Know Your Area? by ONS Digital Content team 232 7.11 Excerpt from ‘Who Old Are You?’, by David McCandless and Tom Evans 233 7.12 512 Paths to the White House, by Mike Bostock and Shan Carter (The New York Times) 233 7.13 OECD Better Life Index, by Moritz Stefaner, Dominikus Baur, Raureif GmbH 233 7.14 Nobel Laureates, by Matthew Weber (Reuters Graphics) 234 7.15 Geography of a Recession, by Graphics Department (The New York Times) 234 7.16 How Big Will the UK Population be in 25 Years Time? by ONS Digital Content team 234 7.17 Excerpt from ‘Workers’ Compensation Reforms by State’, by Yue Qiu and Michael Grabell (ProPublica) 235 7.18 Excerpt from ‘ECB Bank Test Results’, by Monica Ulmanu, Laura Noonan and Vincent Flasseur (Reuters Graphics) 236 7.19 History Through the President’s Words, by Kennedy Elliott, Ted
Mellnik and Richard Johnson (The Washington Post) 237 7.20 Excerpt from ‘How Americans Die’, by Matthew C. Klein and Bloomberg Visual Data 237 7.21 Twitter NYC: A Multilingual Social City, by James Cheshire, Ed Manley, John Barratt, and Oliver O’Brien 238 7.22 Killing the Colorado: Explore the Robot River, by Abrahm Lustgarten, Al Shaw, Jeff Larson, Amanda Zamora and Lauren Kirchner (ProPublica) and John Grimwade 238 7.23 Losing Ground, by Bob Marshall, The Lens, Brian Jacobs and Al Shaw (ProPublica) 239 7.24 Excerpt from ‘History Through the President’s Words’, by Kennedy Elliott, Ted Mellnik and Richard Johnson (The Washington Post) 240 7.25 Plow, by Derek Watkins 242 7.26 The Horse in Motion, by Eadweard Muybridge. Source: United States Library of Congress’s Prints and Photographs division, digital ID cph.3a45870. 243 8.1 Titles Taken from Projects Published and Credited Elsewhere in This Book 248 8.2 Excerpt from ‘The Color of Debt: The Black Neighborhoods Where Collection Suits Hit Hardest’, by Al Shaw, Annie Waldman and Paul Kiel (ProPublica) 249 8.3 Excerpt from ‘Kindred Britain’ version 1.0 © 2013 Nicholas Jenkins – designed by Scott Murray, powered by SUL-CIDR 249 8.4 Excerpt from ‘The Color of Debt: The Black Neighborhoods Where Collection Suits Hit Hardest’, by Al Shaw, Annie Waldman and Paul Kiel (ProPublica) 250 8.5 Excerpt from ‘Bloomberg Billionaires’, by Bloomberg Visual Data (Design and development), Lina Chen and Anita Rundles (Illustration) 251 8.6 Excerpt from ‘Gender Pay Gap US?’, by David McCandless, Miriam Quick (Research) and Philippa Thomas (Design) 251 8.7 Excerpt from ‘Holdouts Find Cheapest Super Bowl Tickets Late in the Game’, by Alex Tribou, David Ingold and Jeremy Diamond (Bloomberg Visual Data) 252 8.8 Excerpt from ‘The Life Cycle of Ideas’, by Accurat 252 8.9 Mizzou’s Racial Gap Is Typical On College Campuses, by FiveThirtyEight 253 8.10 Excerpt from ‘The Infographic History of the World’, Harper Collins (2013); by Valentina 
D’Efilippo (co-author and designer);
James Ball (co-author and writer); Data source: The Polynational War Memorial, 2012 254 8.11 Twitter NYC: A Multilingual Social City, by James Cheshire, Ed Manley, John Barratt, and Oliver O’Brien 255 8.12 Excerpt from ‘US Gun Deaths’, by Periscopic 255 8.13 Image taken from Wealth Inequality in America, by YouTube user ‘Politizane’ (www.youtube.com/watch?v=QPKKQnijnsM) 256 9.1 HSL Colour Cylinder: Image from Wikimedia Commons published under the Creative Commons Attribution-Share Alike 3.0 Unported license 265 9.2 Colour Hue Spectrum 265 9.3 Colour Saturation Spectrum 266 9.4 Colour Lightness Spectrum 266 9.5 Excerpt from ‘Executive Pay by the Numbers’, by Karl Russell (The New York Times) 267 9.6 How Nations Fare in PhDs by Sex Interactive, by Periscopic; Research by Amanda Hobbs; Published in Scientific American 268 9.7 How Long Will We Live – And How Well? by Bonnie Berkowitz, Emily Chow and Todd Lindeman (The Washington Post) 268 9.8 Charting the Beatles: Song Structure, by Michael Deal 269 9.9 Photograph of MyCuppa mug, by Suck UK (www.suck.uk.com/products/mycuppamugs/) 269 9.10 Example of a Stacked Bar Chart Based on Ordinal Data 270 9.11 Rim Fire – The Extent of Fire in the Sierra Nevada Range and Yosemite National Park, 2013: NASA Earth Observatory images, by Robert Simmon 270 9.12 What are the Current Electricity Prices in Switzerland [Translated], by Interactive things for NZZ (the Neue Zürcher Zeitung) 271 9.13 Excerpt from ‘Obama’s Health Law: Who Was Helped Most’, by Kevin Quealy and Margot Sanger-Katz (The New York Times) 272 9.14 Daily Indego Bike Share Station Usage, by Randy Olson (@randal_olson) (http://www.randalolson.com/2015/09/05/visualizing-indego-bike-share-usage-patterns-in-philadelphia-part-2/) 272 9.15 Battling Infectious Diseases in the 20th Century: The Impact of Vaccines, by Graphics Department (Wall Street Journal) 273 9.16 Highest Max Temperatures in Australia (1st to 14th January 2013), Produced by the Australian Government Bureau of Meteorology 274
9.17 State of the Polar Bear, by Periscopic 275 9.18 Excerpt from Geography of a Recession by Graphics Department (The New York Times) 275 9.19 Fewer Women Run Big Companies Than Men Named John, by Justin Wolfers (The New York Times) 276 9.20 NYPD, Council Spar Over More Officers by Graphics Department (Wall Street Journal) 277 9.21 Excerpt from a Football Player Dashboard 277 9.22 Elections Performance Index, The Pew Charitable Trusts © 2014 278 9.23 Art in the Age of Mechanical Reproduction: Walter Benjamin by Stefanie Posavec 279 9.24 Casualties, by Stamen, published by CNN 279 9.25 First Fatal Accident in Spain on a High-speed Line [Translated], by Rodrigo Silva, Antonio Alonso, Mariano Zafra, Yolanda Clemente and Thomas Ondarra (El Pais) 280 9.26 Lunge Feeding, by Jonathan Corum (The New York Times); whale illustration by Nicholas D. Pyenson 281 9.27 Examples of Common Background Colour Tones 281 9.28 Excerpt from NYC Street Trees by Species, by Jill Hubley 284 9.29 Demonstrating the Impact of Red-green Colour Blindness (deuteranopia) 286 9.30 Colour-blind Friendly Alternatives to Green and Red 287 9.31 Excerpt from ‘Psychotherapy in The Arctic’, by Andy Kirk 289 9.32 Wind Map, by Fernanda Viégas and Martin Wattenberg 289 10.1 City of Anarchy, by Simon Scarr (South China Morning Post) 294 10.2 Wireframe Sketch, by Giorgia Lupi for ‘Nobels no degree’ by Accurat 295 10.3 Example of the Small Multiples Technique 296 10.4 The Glass Ceiling Persists Redesign, by Francis Gagnon (ChezVoila.com) based on original by S. Culp (Reuters Graphics) 297 10.5 Fast-food Purchasers Report More Demands on Their Time, by Economic Research Service (USDA) 297 10.6 Stalemate, by Graphics Department (Wall Street Journal) 297 10.7 Nobels No Degrees, by Accurat 298 10.8 Kasich Could Be The GOP’s Moderate Backstop, by FiveThirtyEight 298
10.9 On Broadway, by Daniel Goddemeyer, Moritz Stefaner, Dominikus Baur, and Lev Manovich 299 10.10 ER Wait Watcher: Which Emergency Room Will See You the Fastest? by Lena Groeger, Mike Tigas and Sisi Wei (ProPublica) 300 10.11 Rain Patterns, by Jane Pong (South China Morning Post) 300 10.12 Excerpt from ‘Psychotherapy in The Arctic’, by Andy Kirk 301 10.13 Gender Pay Gap US, by David McCandless, Miriam Quick (Research) and Philippa Thomas (Design) 301 10.14 The Worst Board Games Ever Invented, by FiveThirtyEight 303 10.15 From Millions, Billions, Trillions: Letters from Zimbabwe, 2005−2009, a book written and published by Catherine Buckle (2014), table design by Graham van de Ruit (pg. 193) 303 10.16 List of Chart Structures 304 10.17 Illustrating the Effect of Truncated Bar Axis Scales 305 10.18 Excerpt from ‘Doping under the Microscope’, by S. Scarr and W. Foo (Reuters Graphics) 306 10.19 Record-high 60% of Americans Support Same-sex Marriage, by Gallup 306 10.20 Images from Wikimedia Commons, published under the Creative Commons Attribution-Share Alike 3.0 Unported license 308 11.1–7 ‘The Pursuit of Faster’, by Andy Kirk and Andrew Witherley 318–324
Acknowledgements
This book has been made possible thanks to the unwavering support of my incredible wife, Ellie, and the endless encouragement from my Mum and Dad, the rest of my brilliant family and my super group of friends.
From a professional standpoint I also need to acknowledge the fundamental role played by the hundreds of visualisation practitioners (no matter under what title you ply your trade) who have created such a wealth of brilliant work from which I have developed so many of my convictions and formed the basis of so much of the content in this book. The people and organisations who have provided me with permission to use their work are heroes and I hope this book does their rich talent justice.
About the Author
Andy Kirk is a freelance data visualisation specialist based in Yorkshire, UK. He is a visualisation design consultant, training provider, teacher, researcher, author, speaker and editor of the award-winning website visualisingdata.com.
After graduating from Lancaster University in 1999 with a BSc (hons) in Operational Research, Andy held a variety of business analysis and information management positions at organisations including West Yorkshire Police and the University of Leeds. He discovered data visualisation in early 2007 just at the time when he was shaping up his proposal for a Master’s (MA) Research Programme designed for members of staff at the University of Leeds. On completing this programme with distinction, Andy’s passion for the subject was unleashed. Following his graduation in December 2009, to continue the process of discovering and learning the subject he launched visualisingdata.com, a blogging platform that would chart the ongoing development of the data visualisation field. Over time, as the field has continued to grow, the site too has reflected this, becoming one of the most popular in the field. It features a wide range of fresh content profiling the latest projects and contemporary techniques, discourse about practical and theoretical matters, commentary about key issues, and collections of valuable references and resources.
In 2011 Andy became a freelance professional focusing on data visualisation consultancy and training workshops. Some of his clients include CERN, Arsenal FC, PepsiCo, Intel, Hershey, the WHO and McKinsey. At the time of writing he has delivered over 160 public and private training events across the UK, Europe, North America, Asia, South Africa and Australia, reaching well over 3000 delegates.
In addition to training workshops Andy also has two academic teaching positions. He joined the highly respected Maryland Institute College of Art (MICA) as a visiting lecturer in 2013 and has been teaching a module on the Information Visualisation Master’s Programme since its inception. In January 2016, he began teaching a data visualisation module as part of the MSc in Business Analytics at the Imperial College Business School in London.
Between 2014 and 2015 Andy was an external consultant on a research project called ‘Seeing Data’, funded by the Arts & Humanities Research Council and hosted by the University of Sheffield. This study explored the issues of data visualisation literacy among the general public and, among many things, helped to shape an understanding of the human factors that affect visualisation literacy and the effectiveness of design.
Introduction
I.1 The Quest Begins
In his book The Seven Basic Plots, author Christopher Booker investigated the history of telling stories. He examined the structures used in biblical teachings and historical myths through to contemporary storytelling devices used in movies and TV. From this study he found seven common themes that, he argues, can be identified in any form of story.
One of these themes was ‘The Quest’. Booker describes this as revolving around a main protagonist who embarks on a journey to acquire a treasured object or reach an important destination, but faces many obstacles and temptations along the way. It is a theme that I feel shares many characteristics with the structure of this book and the nature of data visualisation.
You are the central protagonist in this story in the role of the data visualiser. The journey you are embarking on involves a route along a design workflow where you will be faced with a wide range of different conceptual, practical and technical challenges. The start of this journey will be triggered by curiosity, which you will need to define in order to accomplish your goals. From this origin you will move forward to initiating and planning your work, defining the dimensions of your challenge. Next, you will begin the heavy lifting of working with data, determining what qualities it contains and how you might share these with others. Only then will you be ready to take on the design stage. Here you will be faced with the prospect of handling a spectrum of different design options that will require creative and rational thinking to resolve most effectively.
The multidisciplinary nature of this field offers a unique opportunity and challenge. Data visualisation is not an especially difficult capability to acquire; it is largely a game of decisions. Making better decisions will be your goal but sometimes clear decisions will feel elusive. There will be occasions when the best choice is not at all visible and others when there will be many seemingly equally viable choices. Which one to go with? This book aims to be your guide, helping you navigate efficiently through these difficult stages of your journey.
You will need to learn to be flexible and adaptable, capable of shifting your approach to suit the circumstances. This is important because there are plenty of potential villains lying in wait looking to derail progress. These are the forces that manifest through the imposition of restrictive creative constraints and the pressure created by the relentless ticking clock of timescales. Stakeholders and audiences will present complex human factors through the diversity of their needs and personal traits. These will need to be astutely accommodated. Data, the critical raw material of this process, will dominate your attention. It will frustrate and even disappoint at times, as promises of its treasures fail to materialise irrespective of the hard work, love and attention lavished upon it.
Your own characteristics will also contribute to a certain amount of the villainy. At times, you will find yourself wrestling with internal creative and analytical voices pulling against each other in opposite directions. Your excitably formed initial ideas will be embraced but will need taming. Your inherent tastes, experiences and comforts will divert you away from the ideal path, so you will need to maintain clarity and focus.
The central conflict you will have to deal with is the notion that there is no perfect in data visualisation. It is a field with very few ‘always’ and ‘nevers’. Singular solutions rarely exist. The comfort offered by the rules that instruct what is right and wrong, good and evil, has its limits. You can find small but legitimate breaking points with many of them. While you can rightly aspire to reach as close to perfect as possible, the attitude of aiming for good enough will often indeed be good enough and fundamentally necessary.
In accomplishing the quest you will be rewarded with competency in data visualisation, developing confidence in being able to judge the most effective analytical and design solutions in the most efficient way. It will take time and it will need more than just reading this book. It will also require your ongoing effort to learn, apply, reflect and develop. Each new data visualisation opportunity poses a new, unique challenge. However, if you keep persevering with this journey the possibility of a happy ending will increase all the time.
I.2 Who is this Book Aimed at?
The primary challenge one faces when writing a book about data visualisation is to determine what to leave in and what to leave out. Data visualisation is big. It is too big a subject even to attempt to cover it all, in detail, in one book. There is no single book to rule them all because there is no one book that can cover it all. Each and every one of the topics covered by the chapters in this book could (and, in several cases, do) exist as whole books in their own right.
The secondary challenge when writing a book about data visualisation is to decide how to weave all the content together. Data visualisation is not rocket science; it is not an especially complicated discipline. Lots of it, as you will see, is rooted in common sense. It is, however, certainly a complex subject, a semantic distinction that will be revisited later. There are lots of things to think about and decide on, as well as many things to do and make. Creative and analytical sensibilities blend with artistic and scientific judgments. In one moment you might be checking the statistical rigour of your calculations, in the next deciding which tone of orange most elegantly contrasts with an 80% black. The complexity of data visualisation manifests itself through how these different ingredients, and many more, interact, influence and intersect to form the whole.
The decisions I have made in formulating this book’s content have been shaped by my own process of learning about, writing about and practising data visualisation for, at the time of writing, nearly a decade. Significantly – from the perspective of my own development – I have been fortunate to have had extensive experience designing and delivering training workshops and postgraduate teaching. I believe you only truly learn about your own knowledge of a subject when you have to explain it and teach it to others.
I have arrived at what I believe to be an effective and proven pedagogy that successfully translates the complexities of this subject into an accessible, practical and valuable form. I feel well qualified to bridge the gap between the large population of everyday practitioners, who might identify themselves as beginners, and the superstar technical, creative and academic minds that are constantly pushing forward our understanding of the potential of data visualisation. I am not going to claim to belong to that latter cohort, but I have certainly been the former – a beginner – and most of my working hours are spent helping other beginners start their journey. I know the things that I would have valued when I was starting out and I know how I would have wished them to be articulated and presented for me to develop my skills most efficiently.
There is a large and growing library of fantastic books offering many different theoretical and practical viewpoints on the subject of data visualisation. My aim is to bring value to this existing collection of work by taking on a particular perspective that is perhaps under-represented in other texts – exploring the notion and practice of a visualisation design process. As I have alluded to in the opening, the central premise of this book is that the path to mastering data visualisation is achieved by making better decisions: effective choices, efficiently made. The book’s central goal is to help develop your capability and confidence in facing these decisions.
Just as a single book cannot cover the whole of this subject, it stands that a single book cannot aim to address directly the needs of all people doing data visualisation. In this section I am going to run through some of the characteristics that shape the readers to whom this book is primarily targeted. I will also put into context the content the book will and will not cover, and why. This will help manage your expectations as the reader and establish its value proposition compared with other titles.
Domain and Duties
The core audiences for whom this book has been primarily written are undergraduate and postgraduate-level students and early career researchers from social science subjects. This reflects a growing number of people in higher education who are interested in and need to learn about data visualisation.
Although aimed at social sciences, the content will also be relevant across the spectrum of academic disciplines, from the arts and humanities right through to the formal and natural sciences: any academic duty where there is an emphasis on the use of quantitative and qualitative methods in studies will require an appreciation of good data visualisation practices. Where statistical capabilities are relevant so too is data visualisation.
Beyond academia, data visualisation is a discipline that has reached mainstream consciousness with an increasing number of professionals and organisations, across all industry types and sizes, recognising the
importance of doing it well for both internal and external benefit. You might be a market researcher, a librarian or a data analyst looking to enhance your data capabilities. Perhaps you are a skilled graphic designer or web developer looking to take your portfolio of work into a more data-driven direction. Maybe you are in a managerial position and not directly involved in the creation of visualisation work, but you need to coordinate or commission others who will be. You require awareness of the most efficient approaches, the range of options and the different key decision points. You might be seeking generally to improve the sophistication of the language you use around commissioning visualisation work and to have a better way of expressing and evaluating work created for you.
Basically, anyone who is involved in whatever capacity with the analysis and visual communication of data as part of their professional duties will need to grasp the demands of data visualisation and this book will go some way to supporting these needs.
Subject Neutrality
One of the important aspects of the book will be to emphasise that data visualisation is a portable practice. You will see a broad array of examples of work from different industries, covering very different topics. What will become apparent is that visualisation techniques are largely subject-matter neutral: a line chart that displays the ebb and flow of favourable opinion towards a politician involves the same techniques as using a line chart to show how a stock has changed in value over time or how peak temperatures have changed across a season in a given location. A line chart is a line chart, regardless of the subject matter. The context of the viewers (such as their needs and their knowledge) and the specific meaning that can be drawn will inevitably be unique to each setting, but the role of visualisation itself is adaptable and portable across all subject areas.
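As a rough illustration of this subject neutrality, the sketch below scales any series of numbers into the point list for a line, using the same routine whether the values are approval ratings, share prices or temperatures. It is only a minimal sketch: the helper name `to_polyline`, the canvas dimensions and the sample values are all assumptions made up for this example, and the output format simply mirrors the "x,y" pairs used by an SVG polyline.

```python
from typing import Sequence


def to_polyline(values: Sequence[float], width: float = 300.0, height: float = 100.0) -> str:
    """Scale a series of numbers into 'x,y' points for a line.

    Hypothetical helper for illustration: it knows nothing about
    what the numbers represent.
    """
    if len(values) < 2:
        raise ValueError("a line needs at least two values")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a flat series
    step = width / (len(values) - 1)  # even horizontal spacing
    points = []
    for i, v in enumerate(values):
        x = i * step
        y = height - (v - lo) / span * height  # screen y grows downwards
        points.append(f"{x:.1f},{y:.1f}")
    return " ".join(points)


# Made-up sample values, not real data.
approval_pct = [48, 51, 47, 44, 46]
share_price = [12.1, 12.8, 11.9, 13.4, 14.0]
peak_temp_c = [18, 21, 25, 24, 19]

# The same routine handles all three subjects unchanged.
for series in (approval_pct, share_price, peak_temp_c):
    print(to_polyline(series))
```

Nothing in the routine refers to politics, finance or weather; what changes from one setting to the next is the labelling, the audience and the meaning drawn from the line.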
Data visualisation is an entirely global concern, not focused on any defined geographic region. Although the English language dominates the written discourse (books, websites) about this subject, the interest in it and visible output from across the globe are increasing at a pace. There are cultural matters that influence certain decisions throughout the design process, especially around the choices made for colour usage, but otherwise it is a discipline common to all.
Level and Prerequisites
The coverage of this book is intended to serve the needs of beginners and those with intermediate capability. For most people, this is likely to be as far as they might ever need to go. It will offer an accessible route for novices to start their learning journey and, for those already familiar with the basics, there will be content that will hopefully contribute to fine-tuning their approaches.
For context, I believe the only distinction between beginner and intermediate is one of breadth and depth of critical thinking rather than any degree of difficulty. The more advanced techniques in visualisation tend to be associated with the use of specific technologies for handling larger, complex datasets and/or producing more bespoke and feature-rich outputs.
This book is therefore not aimed at experienced or established visualisation practitioners. There may be some new perspectives to enrich their thinking, some content that will confirm and other content that might constructively challenge their convictions. Otherwise, the coverage in this book should really echo the practices they are likely to be already observing.
As I have already touched on, data visualisation is a genuinely multidisciplinary field. The people who are active in this field or profession come from all backgrounds – everyone has a different entry point and nobody arrives with all constituent capabilities. It is therefore quite difficult to define just what are the right type and level of pre-existing knowledge, skills or experiences for those learning about data visualisation. As each year passes, the savvy-ness of the type of audience this book targets will increase, especially as the subject penetrates more into the mainstream. What were seen as bewilderingly new techniques several years ago are now commonplace to more people.
That said, I think the following would be a fair outline of the type and shape of some of the most important prerequisite attributes for getting the most out of this book:
Strong numeracy is necessary as well as a familiarity with basic statistics.
While it is reasonable to assume limited prior knowledge of data visualisation, there should be a strong desire to want to learn it. The demands of learning a craft like data visualisation take time and effort; the capabilities will need nurturing through ongoing learning and practice. They are not going to be achieved overnight or acquired alone from reading this book. Any book that claims to be able magically to inject mastery through just reading it cover to cover is over-promising and likely to under-deliver.
The best data visualisers possess inherent curiosity. You should be the type of person who is naturally disposed to question the world around them or can imagine what questions others have. Your instinct for discovering and sharing answers will be at the heart of this activity.
There are no expectations of your having any prior familiarity with design principles, but a desire to embrace some of the creative aspects presented in this book will heighten the impact of your work. Unlock your artistry!
If you are somebody with a strong creative flair you are very fortunate. This book will guide you through when and crucially when not to tap into this sensibility. You should be willing to increase the rigour of your analytical decision making and be prepared to have your creative thinking informed more fundamentally by data rather than just instinct.
A range of technical skills covering different software applications, tools and programming languages is not expected for this book, as I will explain next, but you will ideally have some knowledge of basic Excel and some experience of working with data.
I.3 Getting the Balance
Handbook vs Tutorial Book
The description of this book as being a ‘handbook’ positions it as being of practical help and presented in accessible form. It offers direction with comprehensive reference – more of a city guidebook for a tourist than an instruction manual to fix a washing machine. It will help you to know what things to think about, when to think about them, what options exist and how best to resolve all the choices involved in any data-driven design.
Technology is the key enabler for working with data and creating
visualisation design outputs. Indeed, apart from a small proportion of artisan visualisation work that is drawn by hand, the reliance on technology to create visualisation work is an inseparable necessity. For many there is an understandable appetite for step-by-step tutorials that help them immediately to implement data visualisation techniques via existing and new tools.
However, writing about data visualisation through the lens of selected tools is a bit of a minefield, given the diversity of technical options out there and the mixed range of skills, access and needs. I greatly admire those people who have authored tutorial-based texts because they require astute judgement about what is the right level, structure and scope.
The technology space around visualisation is characterised by flux. There are the ongoing changes with the enhancement of established tools as well as a relatively high frequency of new entrants offset by the decline of others. Some tools are proprietary, others are open source; some are easier to learn, others require a great deal of understanding before you can even consider embarking on your first chart. There are many recent cases of applications or services that have enjoyed fleeting exposure before reaching a plateau: development and support decline, the community of users disperses and there is a certain expiry of value. Deprecation of syntax and functions in programming languages requires the perennial updating of skills.
All of this perhaps paints a rather more chaotic picture than is necessarily the case but it justifies the reasons why this book does not offer teaching in the use of any tools. While tutorials may be invaluable to some, they may also only be mildly interesting to others and possibly of no value to most. Tools come and go but the craft remains. I believe that creating a practical, rather than necessarily a technical, text that focuses on the underlying craft of data visualisation with a tool-agnostic approach offers an effective way to begin learning about the subject in appropriate depth. The content should be appealing to readers irrespective of the extent of their technical knowledge (novice to advanced technicians) and specific tool experiences (e.g. knowledge of Excel, Tableau, Adobe Illustrator).
There is a role for all book types. Different people want different sources of insight at different stages in their development. If you are seeking a text that provides in-depth tutorials on a range of tools or pages of programmatic instruction, this one will not be the best choice. However, if
you consult only tutorial-related books, the chances are you will likely fall short on the fundamental critical thinking that will be needed in the longer term to get the most out of the tools with which you develop strong skills.
To substantiate the book’s value, the digital companion resources to this book will offer a curated, up-to-date collection of visualisation technology resources that will guide you through the most common and valuable tools, helping you to gain a sense of what their roles are and where these fit into the design workflow. Additionally, there will be recommended exercises and many further related digital materials available for exploring.
Useful vs Beautiful
Another important distinction to make is that this book is not intended to be seen as a beauty pageant. I love flicking through those glossy ‘coffee table’ books as much as the next person; such books offer great inspiration and demonstrate some of the finest work in the field. This book serves a very different purpose. I believe that, as a beginner or relative beginner on this learning journey, the inspiration you need comes more from understanding what is behind the thinking that makes these amazing works succeed and others not.
My desire is to make this the most useful text available, a reference that will spend more time on your desk than on your bookshelf. To be useful is to be used. I want the pages to be dog-eared. I want to see scribbles and annotated notes made across its pages and key passages underlined. I want to see sticky labels peering out above identified pages of note. I want to see creases where pages have been folded back or a double-page spread that has been weighed down to keep it open. In time I even want its cover reinforced with wallpaper or wrapping paper to ensure its contents remain bound together. There is every intention of making this an elegantly presented and packaged book but it should not be something that invites you to ‘look, but don’t touch’.
Pragmatic vs Theoretical
The content of this book has been formed through many years of absorbing knowledge from all manner of books, generations of academic papers, thousands of web articles, hundreds of conference talks, endless online and
personal discussions, and lots of personal practice. What I present here is a pragmatic translation and distillation of what I have learned down the years.
It is not a deeply academic or theoretical book. Where theoretical context and reference is relevant it will be signposted as I do want to ground this book in as much evidence-based content as possible; it is about judging what is going to add most value. Experienced practitioners will likely have an appetite for delving deeper into theoretical discourse and the underlying sciences that intersect in this field but that is beyond the scope of this particular text.
Take the science of visual perception, for example. There is no value in attempting to emulate what has already been covered by other books in greater depth and quality than I could achieve. Once you start peeling back the many different layers of topics like visual and cognitive science the boundaries of your interest and their relevance to data visualisation never seem to arrive. You get swallowed up by the depth of these subjects. You realise that you have found yourself learning about what the very concept of light and sight is and at that point your brain begins to ache (well, mine does at least), especially when all you set out to discover was if a bar chart would be better than a pie chart.
An important reason for giving greater weight to pragmatism is because of people: people are the makers, the stakeholders, the audiences and the critics in data visualisation. Although there are a great many valuable research-driven concepts concerning data visualisation, their practical application can be occasionally at odds with the somewhat sanitised and artificial context of the research methods employed. To translate them into real-world circumstances can sometimes be easier said than done as the influence of human factors can easily distort the significance of otherwise robust ideas.
I want to remove the burden from you as a reader having to translate relevant theoretical discourse into applicable practice. Critical thinking will therefore be the watchword, equipping you with the independence of thought to decide rationally for yourself what the solutions are that best fit your context, your data, your message and your audience. To do this you will need an appreciation of all the options available to you (the different things you could do) and a reliable approach for critically determining what choices you should make (the things you will do and why).
Contemporary vs Historical
This book is not going to look too far back into the past. We all respect the ancestors of this field, the great names who, despite primitive means, pioneered new concepts in the visual display of statistics to shape the foundations of the field being practised today. The field’s lineage is decorated by the influence of William Playfair’s first ever bar chart, Charles Joseph Minard’s famous graphic about Napoleon’s Russian campaign, Florence Nightingale’s Coxcomb plot and John Snow’s cholera map. These are some of the totemic names and classic examples that will always be held up as the ‘firsts’. Of course, to many beginners in the field, this historical context is of huge interest. However, again, this kind of content has already been superbly covered by other texts on more than enough occasions. Time to move on.
I am not going to spend time attempting to enlighten you about how we live in the age of ‘Big Data’ and how occupations related to data are or will be the ‘sexiest jobs’ of our time. The former is no longer news, the latter claim emerged from a single source. I do not want to bloat this book with the unnecessary reprising of topics that have been covered at length elsewhere. There is more valuable and useful content I want you to focus your time on.
The subject matter, the ideas and the practices presented here will hopefully not date a great deal. Of course, many of the graphic examples included in the book will be surpassed by newer work demonstrating similar concepts as the field continues to develop. However, their worth as exhibits of a particular perspective covered in the text should prove timeless. As more research is conducted in the subject, without question there will be new techniques, new concepts, new empirically evidenced principles that emerge. Maybe even new rules. There will be new thought-leaders, new sources of reference, new visualisers to draw insight from. New tools will be created, existing tools will expire. Some things that are done and can only be done by hand as of today may become seamlessly automated in the near future. That is simply the nature of a fast-growing field. This book can only be a line in the sand.
Analysis vs Communication
A further important distinction to make concerns the subtle but significant difference between visualisations which are used for analysis and visualisations used for communication.
Before a visualiser can confidently decide what to communicate to others, he or she needs to have developed an intimate understanding of the qualities and potential of the data. This is largely achieved through exploratory data analysis. Here, the visualiser and the viewer are the same person. Through visual exploration, different interrogations can be pursued ‘on the fly’ to unearth confirmatory or enlightening discoveries about what insights exist.
Visualisation techniques used for analysis will be a key component of the journey towards creating visualisation for communication but the practices involved differ. Unlike visualisation for communication, the techniques used for visual analysis do not have to be visually polished or necessarily appealing. They are only serving the purpose of helping you to truly learn about your data. When a data visualisation is being created to communicate to others, many careful considerations come into play about the requirements and interests of the intended or expected audience. This has a significant influence on many of the design decisions you make, decisions that do not arise with visual analysis alone.
Exploratory data analysis is a huge and specialist subject in and of itself. In its most advanced form, working efficiently and effectively with large complex data, topics like ‘machine learning’, using self-learning algorithms to help automate and assist in the discovery of patterns in data, become increasingly relevant. For the scope of this book the content is weighted more towards methods and concerns about communicating data visually to others. If your role is in pure data science or statistical analysis you will likely require a deeper treatment of the exploratory data analysis topic than this book can reasonably offer. However, Chapter 4 will cover the essential elements in sufficient depth for the practical needs of most people working with data.
Print vs Digital
The opportunity to supplement the print version of this book with an e-book and further digital companion resources helps to cushion the agonising decisions about what to leave out. This text is therefore
enhanced by access to further digital resources, some of which are newly created, while others are curated references from the endless well of visualisation content on the Web. Included online (book.visualisingdata.com) will be:
a completed case-study project that demonstrates the workflow activities covered in this book, including full write-ups and all related digital materials;
an extensive and up-to-date catalogue of over 300 data visualisation tools;
a curated collection of tutorials and resources to help develop your confidence with some of the most common and valuable tools;
practical exercises designed to embed the learning from each chapter;
further reading resources to continue learning about the subjects covered in each chapter.
I.4 Objectives
Before moving on to an outline of the book’s contents, I want to share four key objectives that I hope to accomplish for you by the final chapter. These are themes that will run through the entire text: challenge, enlighten, equip and inspire.
To challenge you I will be encouraging you to recognise that your current thinking about visualisation may need to be reconsidered, both as a creator and as a consumer. We all arrive in visualisation from different subject and domain origins and with that comes certain baggage and prior sensibilities that can distort our perspectives. I will not be looking to eliminate these, rather to help you harness and align them with other traits and viewpoints.
I will ask you to relentlessly consider the diverse decisions involved in this process. I will challenge your convictions about what you perceive to be good or bad, effective or ineffective visualisation choices: arbitrary choices will be eliminated from your thinking. Even if you are not necessarily a beginner, I believe the content you read in this book will make you question some of your own perspectives and assumptions. I will encourage you to reflect on your previous work, asking you to consider how and why you have designed visualisations in the way that you have: where do you need to improve? What can you do better?
It is not just about creating visualisations; I will also challenge your approach to reading visualisations. This is not something you might usually think much about, but there is an important role for more tactical approaches to consuming visualisations with greater efficiency and effectiveness.
To enlighten you will be to increase your awareness of the possibilities in data visualisation. As you begin your discovery of data visualisation you might not be aware of the whole: you do not entirely know what options exist, how they are connected and how to make good choices. Until you know, you don’t know – that is what the objective of enlightening is all about.
As you will discover, there is a lot on your plate, much to work through. It is not just about the visible end-product design decisions. Hidden beneath the surface are many contextual circumstances to weigh up, decisions about how best to prepare your data, choices around the multitude of viable ways of slicing those data up into different angles of analysis. That is all before you even reach the design stage, where you will begin to consider the repertoire of techniques for visually portraying your data – the charts, the interactive features, the colours and much more besides.
This book will broaden your visual vocabulary to give you more ways of expressing your data visually. It will enhance the sophistication of your decision making and of visual language for any of the challenges you may face.
To equip is to ensure you have robust tactics for managing your way through the myriad options that exist in data visualisation. The variety it offers makes for a wonderful prospect but, equally, introduces the burden of choice. This book aims to make the challenge of undertaking data visualisation far less overwhelming, breaking down the overall prospect into smaller, more manageable task chunks.
The structure of this book will offer a reliable and flexible framework for thinking, rather than rules for learning. It will lead to better decisions. With an emphasis on critical thinking you will move away from an over-reliance on gut feeling and taste. To echo what I mentioned earlier, its role as a handbook will help you know what things to think about, when to think about them and how best to resolve all the thinking involved in any data-driven design challenge you meet.
To inspire is to give you more than just a book to read. It is the opening of a door into a subject to inspire you to step further inside. It is about helping you to want to continue to learn about it and expose yourself to as much positive influence as possible. It should elevate your ambition and broaden your capability.
It is a book underpinned by theory but dominated by practical and accessible advice, including input from some of the best visualisers in the field today. The range of print and digital resources will offer lots of supplementary material including tutorials, further reading materials and suggested exercises. Collectively this will hopefully make it one of the most comprehensive, valuable and inspiring titles out there.
I.5 Chapter Contents
The book is organised into four main parts (A, B, C and D) comprising eleven chapters and preceded by the ‘Introduction’ sections you are reading now.
Each chapter opens with an introductory outline that previews the content to be covered and provides a bridge between consecutive chapters. In the closing sections of each chapter the most salient learning points will be summarised and some important, practical tips and tactics shared. As mentioned, online there will be collections of practical exercises and further reading resources recommended to substantiate the learning from the chapter.
Throughout the book you will see sidebar captions that will offer relevant references, aphorisms, good habits and practical tips from some of the most influential people in the field today.
Introduction
This introduction explains how I have attempted to make sense of the complexity of the subject, outlining the nature of the audience I am trying to reach, the key objectives, what topics the book will be covering and not covering, and how the content has been organised.
Part A: Foundations
Part A establishes the foundation knowledge and sets up a key reference of understanding that aids your thinking across the rest of the book. Chapter 1 will be the logical starting point for many of you who are new to the field to help you understand more about the definitions and attributes of data visualisation. Even if you are not a complete beginner, the content of the chapter forms the terms of reference that much of the remaining content is based on. Chapter 2 prepares you for the journey through the rest of the book by introducing the key design workflow that you will be following.
Chapter 1: Defining Data Visualisation
Defining data visualisation: outlining the components of thinking that make up the proposed definition for data visualisation.
The importance of conviction: presenting three guiding principles of good visualisation design: trustworthy, accessible and elegant.
Distinctions and glossary: explaining the distinctions and overlaps with other related disciplines and providing a glossary of terms used in this book to establish consistency of language.
Chapter 2: Visualisation Workflow
The importance of process: describing the data visualisation design workflow, what it involves and why a process approach is required.
The process in practice: providing some useful tips, tactics and habits that transcend any particular stage of the process but will best prepare you for success with this activity.
Part B: The Hidden Thinking
Part B discusses the first three preparatory stages of the data visualisation design workflow. ‘The hidden thinking’ title refers to how these vital activities, that have a huge influence over the eventual design solution, are somewhat out of sight in the final output; they are hidden beneath the surface but completely shape what is visible. These stages represent the often neglected contextual definitions, data wrangling and editorial challenges that are so critical to the success or otherwise of any
visualisation work – they require a great deal of care and attention before you switch your attention to the design stage.
Chapter 3: Formulating Your Brief
What is a brief?: describing the value of compiling a brief to help initiate, define and plan the requirements of your work.
Establishing your project’s context: defining the origin curiosity or motivation, identifying all the key factors and circumstances that surround your work, and defining the core purpose of your visualisation.
Establishing your project’s vision: early considerations about the type of visualisation solution needed to achieve your aims and harnessing initial ideas about what this solution might look like.
Chapter 4: Working With Data
Data literacy: establishing a basic understanding with this critical literacy, providing some foundation understanding about datasets and data types and some observations about statistical literacy.
Data acquisition: outlining the different origins of and methods for accessing your data.
Data examination: approaches for acquainting yourself with the physical characteristics and meaning of your data.
Data transformation: optimising the condition, content and form of your data fully to prepare it for its analytical purpose.
Data exploration: developing deeper intimacy with the potential qualities and insights contained, and potentially hidden, within your data.
Chapter 5: Establishing Your Editorial Thinking
What is editorial thinking?: defining the role of editorial thinking in data visualisation.
The influence of editorial thinking: explaining how the different dimensions of editorial thinking influence design choices.
Part C: Developing Your Design Solution
Part C is the main part of the book and covers progression through the data visualisation design and production stage. This is where your concerns switch from hidden thinking to visible thinking. The individual chapters in this part of the book cover each of the five layers of the data visualisation anatomy. They are treated as separate affairs to aid the clarity and organisation of your thinking, but they are entirely interrelated matters and the chapter sequences support this. Within each chapter there is a consistent structure beginning with an introduction to each design layer, an overview of the many different possible design options, followed by detailed guidance on the factors that influence your choices.
The production cycle: describing the cycle of development activities that take place during this stage, giving a context for how to work through the subsequent chapters in this part.
Chapter 6: Data Representation
Introducing visual encoding: an overview of the essentials of data representation looking at the differences and relationships between visual encoding and chart types.
Chart types: a detailed repertoire of 49 different chart types, profiled in depth and organised by a taxonomy of chart families: categorical, hierarchical, relational, temporal, and spatial.
Influencing factors and considerations: presenting the factors that will influence the suitability of your data representation choices.
Chapter 7: Interactivity
The features of interactivity:
Data adjustments: a profile of the options for interactively interrogating and manipulating data.
View adjustments: a profile of the options for interactively configuring the presentation of data.
Influencing factors and considerations: presenting the factors that will influence the suitability of your interactivity choices.
Chapter 8: Annotation
The features of annotation:
Project annotation: a profile of the options for helping to provide viewers with general explanations about your project.
Chart annotation: a profile of the annotated options for helping to optimise viewers’ understanding of your charts.
Influencing factors and considerations: presenting the factors that will influence the suitability of your annotation choices.
Chapter 9: Colour
The features of colour:
Data legibility: a profile of the options for using colour to represent data.
Editorial salience: a profile of the options for using colour to direct the eye towards the most relevant features of your data.
Functional harmony: a profile of the options for using colour most effectively across the entire visualisation design.
Influencing factors and considerations: presenting the factors that will influence the suitability of your colour choices.
Chapter 10: Composition
The features of composition:
Project composition: a profile of the options for the overall layout and hierarchy of your visualisation design.
Chart composition: a profile of the options for the layout and hierarchy of the components of your charts.
Influencing factors and considerations: presenting the factors that will influence the suitability of your composition choices.
Part D: Developing Your Capabilities
Part D wraps up the book’s content by reflecting on the range of capabilities required to develop confidence and competence with data visualisation. Following completion of the design process, the multidisciplinary nature of this subject will now be clearly established. This final part assesses the two sides of visualisation literacy – your role as a creator and your role as a viewer – and what you need to enhance your skills with both.
Chapter 11: Visualisation Literacy
Viewing: Learning to see: learning about the most effective strategy for understanding visualisations in your role as a viewer rather than a creator.
Creating: The capabilities of the visualiser: profiling the skill sets, mindsets and general attributes needed to master data visualisation design as a creator.
Part A Foundations
1 Defining Data Visualisation
This opening chapter will introduce you to the subject of data visualisation, defining what data visualisation is and is not. It will outline the different ingredients that make it such an interesting recipe and establish a foundation of understanding that will form a key reference for all of the decision making you are faced with.
Three core principles of good visualisation design will be presented, offering guiding ideals to help mould your convictions about what distinguishes the effective from the ineffective in data visualisation.
You will also see how data visualisation sits alongside or overlaps with other related disciplines, and some definitions about the use of language in this book will be established to ensure consistency in meaning across all chapters.
1.1 The Components of Understanding
To set the scene for what is about to follow, I think it is important to start this book with a proposed definition for data visualisation (Figure 1.1). This definition offers a critical term of reference because its components and their meaning will touch on every element of content that follows in this book. Furthermore, as a subject that has many different proposed definitions, I believe it is worth clarifying my own view before going further:
Figure 1.1 A Definition for Data Visualisation
At first glance this might appear to be a surprisingly short definition: isn’t there more to data visualisation than that, you might ask? Can nine words sufficiently articulate what has already been introduced as an eminently complex and diverse discipline?
I have arrived at this after many years of iterations attempting to improve the elegance of my definition. In the past I have tried to force too many words and too many clauses into one statement, making it cumbersome and rather undermining its value. Over time, as I have developed greater clarity in my own convictions, I have in turn managed to establish greater clarity about what I feel is the real essence of this subject. The definition above is, I believe, a succinct and practically useful description of what the pursuit of visualisation is truly about. It is a definition that largely informs the contents of this book. Each chapter will aim to enlighten you about different aspects of the roles of and relationships between each component expressed. Let me introduce and briefly examine each of these one by one, explaining where and how they will be discussed in the book.
Firstly, data, our critical raw material. It might appear a formality to mention data in the definition for, after all, we are talking about data visualisation as opposed to, let’s say, cheese visualisation (though visualisation of data using cheese has happened, see Figure 1.2), but the core role that data has in the design process needs to be made clear. Without data there is no visualisation; indeed there is no need for one. Data plays the fundamental role in this work, so you will need to give it your undivided attention and respect. You will discover in Chapter 4 the importance of developing an intimacy with your data to acquaint yourself with its physical properties, its meaning and its potential qualities.
Figure 1.2 Per Capita Cheese Consumption in the US
Data is names, amounts, groups, statistical values, dates, comments, locations. Data is textual and numeric in format, typically held in datasets in table form, with rows of records and columns of different variables.
This tabular form of data is what we will be considering as the raw form of data. Through tables, we can look at the values contained to precisely read them as individual data points. We can look up values quite efficiently, scanning across many variables for the different records held. However, we cannot easily establish the comparative size and relationship between multiple data points. Our eyes and mind are not equipped to translate easily the textual and numeric values into quantitative and qualitative meaning. We can look at the data but we cannot really see it without the context of relationships that help us compare and contrast them effectively with other values. To derive understanding from data we need to see it represented in a different, visual form. This is the act of data representation.
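The contrast between reading a table and seeing a representation can be sketched in a few lines of Python. This is only an illustrative sketch: the records are invented, and `text_bars` is a hypothetical helper, not something taken from this book. It prints each value alongside a bar whose length is proportional to that value, so the comparative sizes become visible at a glance.

```python
# Hypothetical records in tabular form: (category, value).
# These numbers are invented purely for illustration.
records = [("A", 12), ("B", 47), ("C", 30), ("D", 5)]


def text_bars(rows, width=40):
    """Return one line per record, with a bar proportional to its value.

    The largest value is drawn at `width` characters; every other bar is
    scaled relative to it.
    """
    largest = max(value for _, value in rows)
    lines = []
    for label, value in rows:
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<3}{bar} {value}")
    return lines


# The table lets us look up each value; the bars let us compare them.
for line in text_bars(records):
    print(line)
```

The numbers are still available for precise lookup at the end of each line, but the ranking and relative size of the four categories can now be judged without any mental arithmetic.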
This word representation is deliberately positioned near the front of the definition because it is the quintessential activity of data visualisation design. Representation concerns the choices made about the form in which your data will be visually portrayed: in lay terms, what chart or charts you will use to exploit the brain’s visual perception capabilities most effectively.
When data visualisers create a visualisation they are representing the data they wish to show visually through combinations of marks and attributes. Marks are points, lines and areas. Attributes are the appearance properties of these marks, such as the size, colour and position. The recipe of these marks and their attributes, along with other components of apparatus, such as axes and gridlines, forms the anatomy of a chart.
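One way to make the idea of marks and attributes concrete is to write it down as a small data structure. The following is a minimal sketch under stated assumptions: the `Mark` class, the `PALETTE` mapping, the `encode_points` function and the sample records are all hypothetical names and values introduced for illustration. Each record is encoded as a point mark whose position, size and colour attributes carry the values of different variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """A single mark and its appearance attributes (hypothetical structure)."""
    kind: str      # the mark: "point", "line" or "area"
    x: float       # position attribute (horizontal)
    y: float       # position attribute (vertical)
    size: float    # size attribute
    colour: str    # colour attribute


# Assumed mapping from a categorical variable to a colour.
PALETTE = {"north": "blue", "south": "orange"}


def encode_points(rows):
    """Encode each record as a point mark.

    Two quantitative variables drive position, a third drives size,
    and a categorical variable drives colour.
    """
    return [
        Mark("point", x=r["income"], y=r["spend"], size=r["people"],
             colour=PALETTE[r["region"]])
        for r in rows
    ]


# Invented sample records, for illustration only.
rows = [
    {"income": 10.0, "spend": 4.0, "people": 3.0, "region": "north"},
    {"income": 25.0, "spend": 9.5, "people": 8.0, "region": "south"},
]
marks = encode_points(rows)
```

A drawing library would then render each `Mark`, with axes and gridlines added around them as the remaining apparatus of the chart.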
In Chapter 6 you will gain a deeper and more sophisticated appreciation of the range of different charts that are in common usage today, broadening your visual vocabulary. These charts will vary in complexity and composition, with each capable of accommodating different types of data and portraying different angles of analysis. You will learn about the key ingredients that shape your data representation decisions, explaining the factors that distinguish the effective from the ineffective choices.
Beyond representation choices, the presentation of data concerns all the other visible design decisions that make up the overall visualisation anatomy. This includes choices about the possible applications of interactivity, features of annotation, colour usage and the composition of your work. During the early stages of learning this subject it is sensible to partition your thinking about these matters, treating them as isolated design layers. This will aid your initial critical thinking. Chapters 7–10 will explore each of these layers in depth, profiling the options available and the factors that influence your decisions.
However, as you gain in experience, the interrelated nature of visualisation will become much more apparent and you will see how the overall design anatomy is entirely connected. For instance, the selection of a chart type intrinsically leads to decisions about the space and place it will occupy; an interactive control may be included to reveal an annotated caption; for any design property to be even visible to the eye it must possess a colour that is different from that of its background.
The goal expressed in this definition states that data visualisation is about facilitating understanding. This is very important and some extra time is required to emphasise why it is such an influential component in our thinking. You might think you know what understanding means, but when you peel back the surface you realise there are many subtleties that need to be acknowledged about this term and their impact on your data visualisation choices. Understanding ‘understanding’ (still with me?) in the context of data visualisation is of elementary significance.
When consuming a visualisation, the viewer will go through a process of understanding involving three stages: perceiving, interpreting and comprehending (Figure 1.3). Each stage is dependent on the previous one and in your role as a data visualiser you will have influence but not full control over these. You are largely at the mercy of the viewer – what they know and do not know, what they are interested in knowing and what might be meaningful to them – and this introduces many variables outside of your control: where your control diminishes, the influence of and reliance on the viewer increases. Achieving an outcome of understanding is therefore a collective responsibility between visualiser and viewer.
Perceiving, interpreting and comprehending are not just synonyms for the same word; rather, they carry important distinctions that need appreciating. As you will see throughout this book, the subtleties and semantics of language in data visualisation will be a recurring concern.
Figure 1.3 The Three Stages of Understanding
Let’s look at the characteristics of the different stages that form the process of understanding to help explain their respective differences and mutual dependencies.
Firstly, perceiving. This concerns the act of simply being able to read a chart. What is the chart showing you? How easily can you get a sense of the values of the data being portrayed?
Where are the largest, middle-sized and smallest values? What proportion of the total does that value hold? How do these values compare in ranking terms? To which other values does this have a connected relationship?
The notion of understanding here concerns our attempts as viewers to efficiently decode the representations of the data (the shapes, the sizes and the colours) as displayed through a chart, and then convert them into perceived values: estimates of quantities and their relationships to other values.
Interpreting is the next stage of understanding following on from perceiving. Having read the charts the viewer now seeks to convert these perceived values into some form of meaning:
Is it good to be big or better to be small? What does it mean to go up or go down? Is that relationship meaningful or insignificant? Is the decline of that category especially surprising?
The viewer’s ability to form such interpretations is influenced by their pre-existing knowledge about the portrayed subject and their capacity to utilise that knowledge to frame the implications of what has been read. Where a viewer does not possess that knowledge it may be that the visualiser has to address this deficit. They will need to make suitable design choices that help to make clear what meaning can or should be drawn from the display of data. Captions, headlines, colours and other annotated devices, in particular, can all be used to achieve this.
Comprehending involves reasoning the consequence of the perceiving and interpreting stages to arrive at a personal reflection of what all this means to them, the viewer. How does this information make a difference to what was known about the subject previously?
Why is this relevant? What wants or needs does it serve? Has it confirmed what I knew or possibly suspected beforehand or enlightened me with new knowledge? Has this experience impacted me in an emotional way or left me feeling somewhat indifferent as a consequence? Does the context of what understanding I have acquired lead me to take action – such as make a decision or fundamentally change my behaviour – or do I simply have an extra grain of knowledge the consequence of which may not materialise until much later?
Over the page is a simple demonstration to further illustrate this process of understanding. In this example I play the role of a viewer working with a sample isolated chart (Figure 1.4). As you will learn throughout the design chapters, a chart would not normally just exist floating in isolation like this one does, but it will serve a purpose for this demonstration.
Figure 1.4 shows a clustered bar chart that presents a breakdown of the career statistics for the footballer Lionel Messi during his career with FC Barcelona.
The process commences with perceiving the chart. I begin by establishing what chart type is being used. I am familiar with this clustered bar chart approach and so I quickly feel at ease with the prospect of reading its display: there is no learning for me to have to go through on this occasion, which is not always the case as we will see.