My name is Duncan Bradley. I'm at the University of Manchester, and I'm going to talk today about my work on the psychology of data visualisation, and some empirical work on how we process magnitude in data visualisations. To start off, I should say that this is work that's either in preparation or under review.

Data visualisations work by encoding values using visual features, and there are countless ways to visualise the same data, as this animation shows. But some choices can result in misleading charts, like using eight bananas to represent sixteen bananas, because each pictured banana stands for two of the fruit. And it's not just bad design choices that can interfere with interpretation; sometimes it's cognitive biases as well. For an example of that: if I show you these two histograms and ask which one has the larger mean, perhaps your first impression would be that it's the one on the right, because the bars are taller. In reality, because histograms plot the measured values on the x-axis, these two have the same overall mean. If the one on the right really did have the larger mean, you'd see something like this instead. So there's a cognitive bias going on here that's already been identified.

We can take another example, with bar charts. Consider just one bar to start off with, and ask people which of these two data points is more likely to have come from the population whose average is displayed by the bar. Lots of people will suggest that the point that falls within the bar is more likely to have produced that average, rather than the one above it, even though both points are exactly the same distance from the mean. This is known as the within-bar bias.

This is why I think it's important to study cognitive processing with regard to data visualisations: deliberate deception and improper design aren't the only issues when it comes to processing. I'd like to suggest that misleading charts and ineffective charts are united by the fact that neither takes cognitive processing into account. So this is why I study cognitive processes and try to answer the question: how is data comprehended? From this, hopefully, we can start to work out what might mislead people and what might result in effective communication. That involves using techniques from experimental psychology: careful experimental design, highly controlled stimuli, multiple observations, and appropriate statistical analysis.

So why do we need to do this? Why don't we just ask people what they like? This graphic comes from a paper from the '90s in which researchers asked clinicians to perform some tasks with data. They gave them four different displays of exactly the same data and measured the number of correct decisions they made. There was data in a table, data in a pie chart, data in a bar graph, and data in an icon array: exactly the same data, displayed differently. The icon arrays produced the most accurate decisions, but zero of the 34 participants in the study preferred this method of displaying the data, and eight expressed what the authors called 'considerable contempt'. I imagine you can work out what that might involve. So that's why we use these experimental techniques.

One design choice that is often made when designing a graph is deciding on the axis range.
Here's a simple chart showing the relationship between hours spent studying and final test score, and there's a nice big increase with increasing time spent studying. We can also show this chart with the y-axis starting at zero and ending at 100, which shows a much shallower gradient, but it is obviously exactly the same data. Studies have found that if you manipulate the y-axis like this, people's impressions of the data broadly reflect what is being displayed: if it looks like there's a big difference between values, people will conclude that there is a big difference, and if it looks like there's not, people will conclude that there's only a small difference.

This has led some people to produce graphics like this one, which on first impression suggests that a truncated y-axis, one which doesn't start at zero, is far more likely to mislead people than one which is not truncated. That is, until you notice that the graphic's own axis starts at 98% down there, so it's actually suggesting that people are misled by a truncated y-axis 99% of the time. I think the story is not as simple as that. Look at this plot of average global temperature by year. On the left, we're doing what's been suggested: we're not truncating the y-axis, it starts at zero. But this doesn't take the meaning of the data into account, and it doesn't show the appropriate context for assessing global temperature, which is that a small increase in temperature, considered at a global scale, is a really meaningful and important difference. So the chart on the right is a better display of the change in global temperature, even though it has a truncated y-axis. If we revisit that graphic that likes to pop up on Twitter, it doesn't really seem that a truncated axis misleads people 99% of the time; whether it's misleading depends on what the appropriate axis range is for the data. So perhaps a better graphic is this one.

Just to summarise that research: changing the axis range influences interpretations of the magnitude of the differences between values. That occurs for both line charts and bar charts, and it doesn't seem to be eliminated by warnings; telling people about the issues with axis truncation doesn't make the effect go away.

So we've just seen some work about how you might change the axes to communicate differences between values, trends, and that sort of thing. But we can also think about changing the axes to communicate magnitude. I think this graph in the New York Times is a good example. It shows the number of Black members of the US Senate over time, but rather than the y-axis terminating just above the highest plotted value, it extends all the way to the maximum number of senators, which is 100, and that gives an impression of very small magnitude: the data points appear low down, and the blank space above highlights this magnitude. And it's not just small magnitudes. This chart from the Financial Times uses a y-axis where 100 marks the level observed at a previous peak, and you can see cases like Latvia, Romania and Bulgaria actually extending way beyond that 100% level, because cases or deaths or hospitalisations have gone beyond the earlier peak. It really conveys a large magnitude. And this is what my empirical work has looked at.
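To make that temperature example concrete, here's a minimal ggplot2 sketch of the two axis choices. The temperature values and axis ranges are made up for illustration; they're not taken from the charts shown in the talk.

```r
library(ggplot2)

# Hypothetical global mean temperatures (illustrative values, not real data)
temps <- data.frame(
  year = 2001:2010,
  temp = c(14.30, 14.38, 14.42, 14.41, 14.47,
           14.50, 14.48, 14.55, 14.53, 14.60)
)

p <- ggplot(temps, aes(year, temp)) +
  geom_line() +
  labs(x = "Year", y = "Global mean temperature (°C)")

# Zero-baseline y-axis: the warming trend looks almost flat
p + expand_limits(y = 0)

# Truncated y-axis: the same data, shown over a meaningful range
p + coord_cartesian(ylim = c(14.2, 14.7))
```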
This is an example of the same data displayed in two different ways: either plotted at a low physical position, right down at the bottom of the graph, because of the axis range, or plotted at a high physical position, because the axis range has been shifted. Again, it's the same dataset in both versions of each graph. We asked: how are interpretations of magnitude affected by axis range? We showed people the two versions of each graph over 40 experimental trials, recruiting 150 participants in total via prolific.co, and they were asked to assess various scenarios involving some sort of risk.

Here's an example trial. Participants are asked to consider the risk of experiencing heavy rainfall. The three data points represent three particular days on which they might go camping, and on the first Likert scale they're asked: if you camp on one of these days, what is the chance you experience heavy rainfall? So they're being asked to rate the magnitude of the data points. They were also asked to rate the severity of the consequences (how bad it would be), but for the purposes of this talk I'll focus on their ratings of the magnitude of the data points.

We pre-registered our hypotheses and our analysis plan, and broadly we predicted that data points presented high up would result in judgments of greater magnitude than the same data points presented low down. We analysed this with cumulative link mixed-effects models, a type of model appropriate for Likert-scale data. If you analyse Likert-scale data with a metric model, which assumes equal distances between scale points, it can sometimes produce spurious findings; a cumulative link mixed-effects model doesn't make this assumption. I used a package to identify the most complex random-effects structure that would converge, and then removed terms that weren't contributing to explaining variance in the ratings. (I'll sketch what this kind of model looks like in code shortly.)

Here is a visualisation of the data from this experiment. To remind you of the Likert scale: the red and orange represent impressions of large magnitudes, and the greens, down on the left side, represent small magnitudes. There's a larger proportion of ratings associated with greater magnitudes when the data points are presented high up, and a larger proportion associated with smaller magnitudes when the data are presented low down. And when we model this, we find that data points presented high up in a graph are indeed associated with ratings of higher magnitude than data points presented low down: a statistically significant effect.

What's driving this effect? There are actually two cues to magnitude in these graphs. One is absolute position: people associate things that are high up with 'more', and it's the position in physical space that provides the cue to magnitude. The other is relative position: the position of the data points relative to other values along the axis. These data points aren't as high up, but they're higher than the other possible values that could appear in the chart, and the ones on the right are lower than the other possible values that could appear.
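To give a flavour of that analysis, here's a minimal sketch using the clmm() function from R's `ordinal` package. The simulated data and the simple random-effects structure are illustrative stand-ins, not the pre-registered model from the experiment.

```r
library(ordinal)

# Simulated stand-in data (illustrative only): 7-point Likert ratings
# for charts shown with data points at low vs high positions
set.seed(1)
d <- data.frame(
  participant = factor(rep(1:30, each = 8)),
  item        = factor(rep(1:8, times = 30)),
  position    = factor(rep(c("low", "high"), times = 120)),
  rating      = factor(sample(1:7, 240, replace = TRUE), ordered = TRUE)
)

# Cumulative link mixed model: treats the Likert response as ordinal,
# so it makes no assumption of equal spacing between scale points
m <- clmm(rating ~ position + (1 | participant) + (1 | item), data = d)
summary(m)
```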
Coming back to those two cues: from experiment one alone, we obviously can't differentiate these two explanations. So what we did in experiment two is introduce inverted charts. You can still have data points that are high up and low down, but now the data points that are low down actually have values that are higher than the other plausible values on the axis, so we can differentiate the two possible explanations for what's driving this effect. We manipulated physical position as before, but also manipulated axis orientation, with 120 participants and 24 trials. If absolute position is really what's driving these impressions of magnitude, we'd expect no interaction: regardless of orientation, data points that are high up would be associated with higher values and data points that are low down with lower values. If relative position is what's driving the effect, we'd expect a crossover.

Here's what we found. For the conventional orientation, the results were the same as before: a larger proportion of ratings at the high end of the scale for data points presented high up, and the opposite for data points presented low down. For the inverted axis, we start to see a mirroring of that effect: a greater proportion of high-magnitude ratings for data points presented low down, and the opposite for data points presented high up. So we see the crossover interaction we expected, and it's a significant interaction. But whilst we find a significant difference between the position conditions with a conventional axis, we don't find that significant difference with an inverted axis. This could be because we had fewer participants and fewer experimental trials in this experiment compared to the previous one.

So, in order to work out whether this is a genuine effect, we ran experiment three: a replication of experiment one, but using only inverted axis scales, with the larger number of trials, to see whether, given the same amount of data, experiment three (with inverted axes) shows the same effects as experiment one (with conventional axes). To remind you, at the top is the data from experiment two; in experiment three we see a much stronger version of the same results, and when we model it, there's a significant difference.

So, going back to the question from before: what's driving this effect? Is it just where the data points appear, or is it the axis range that's causing people to make these judgments? It seems that relative position is the explanation: people are, in a way, paying attention to an implicit context provided by the axis range.

There are a couple of issues with this in terms of applying it to the way people really design data visualisations. I think the biggest one is that the axis limits here were arbitrary: there wasn't any real reason to set the axis limits the way we did, other than to make data points appear large or small. The second is that we only used risk scenarios; we only asked people to assess data about risks, which we know can bias the way people think. So, to follow this up: how do we get more realistic axis limits?
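Before moving on, a brief aside on how inverted-axis stimuli like those can be built: in ggplot2, reversing the axis takes one line. The rainfall values below are hypothetical, not our actual stimuli.

```r
library(ggplot2)

# Hypothetical rainfall risk for three camping days (invented values)
days <- data.frame(
  day    = c("Day 1", "Day 2", "Day 3"),
  chance = c(12, 15, 10)   # chance of heavy rainfall, in %
)

p <- ggplot(days, aes(day, chance)) +
  geom_point(size = 3) +
  labs(y = "Chance of heavy rainfall (%)")

# Conventional orientation: these low values sit near the bottom
p + scale_y_continuous(limits = c(0, 100))

# Inverted orientation: the same values now sit near the top
p + scale_y_reverse(limits = c(100, 0))
```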
How do we get axis limits that people are more likely to use, so that we're presenting participants with charts they're likely to see in the real world? Well, I think one source of realistic axis limits is just the default settings, which people are probably not that likely to change. These are the default settings for ggplot2: if you just build your bar chart, the data points are displayed like this. Whereas if you set a custom upper limit, with just a single line of code, exactly the same data points appear very different. (There's a short code sketch of this coming up.) So here, I think, we've got more realistic axis limits, and we can say the chart on the left is truncated whereas the one on the right is extended. It's not the same truncation as before, with zero being cut off; instead we cut it off at the top, so that we're not displaying any values above the data, even if those values could have occurred.

We used a wider range of scenarios, with 150 participants and 32 experimental trials. This is what we asked people to look at. In both the truncated and extended charts, participants were given a denominator: how many in each group could have been observed, versus how many actually were observed, which is what's displayed here. So 400 people could have attended online yoga class A, but maybe only 75 or 76 actually said they felt more flexible. Participants were asked, in this example, to rate how effective the yoga classes were overall. They were asked to make a broad judgment about the data, which essentially reflects its magnitude: in the case of the yoga classes, if not many people say they feel more flexible after a class, that class is less effective than if lots of people say they felt more flexible.

So here are our two versions of the stimuli again, and here's what we observed. Each of these data points represents a rating on the scale. We see a bimodal distribution, which could just be because the data we presented had a bimodal distribution. But what we do see is a heavier tail towards the low-magnitude end: people gave ratings of smaller magnitude for data points presented in the extended-range condition, where they were made to appear small in context. And over here we see a greater proportion of very high-magnitude ratings when the data points were made to appear large. That's a significant difference between the two conditions.

So far we've looked at position encoding with the dot plots, and position or length encoding with the bar charts. But there are lots of other ways of encoding data: direction, area, angle, or colour. With the exception of colour, these are all geometric encodings. The next step is to see whether this also generalises to non-geometric encodings, and to look at colour, which is a different way of displaying data.

Here's an example of how you might do this. This map shows the amount of support for a federal ban on abortion in different states of the US. Whereas a chart like this might normally be designed to show the differences between states, the colour legend at the bottom here shows the entire possible range of values. And you see that support is no more than 30% in any given state, which shows the relative similarity across states, and just goes to show how small the magnitude is: how low the support is for that ban. So that's for small magnitudes.
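Stepping back to the bar charts for a moment, here's roughly how that truncated/extended manipulation looks in ggplot2 code; the yoga-class counts are invented for illustration.

```r
library(ggplot2)

# Hypothetical counts: of 400 attendees per class, how many reported
# feeling more flexible (invented numbers)
yoga <- data.frame(
  class    = c("A", "B", "C"),
  flexible = c(75, 76, 68)
)

p <- ggplot(yoga, aes(class, flexible)) + geom_col()

# Default ("truncated") axis: ggplot2 scales to the data, so the bars
# fill most of the panel
p

# "Extended" axis: one extra line sets the upper limit to the
# denominator of 400, and the same bars now look small
p + scale_y_continuous(limits = c(0, 400))
```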
Returning to colour: we can also see an example that highlights how a colour legend might represent large magnitudes. Here the colour legend was set up and fixed to represent between 0 and 30 positive tests per 100,000 inhabitants. So when the rate rose to 135 per 100,000, it went literally off the scale, and every area on the map appeared in this very dark blue, which gives an impression of very large magnitude. Whereas in another country with higher rates of Covid, this might have seemed like a small number, because the scale would have been designed differently.

So here we have two maps displaying exactly the same data; there's no difference in the data between them. On the left-hand side, the largest value in the dataset appears at the extreme end of the colour legend. On the right-hand side, that same largest value appears only halfway along the colour legend, which extends to greater values that don't appear in the dataset. We can call these the truncated and extended versions. (I'll show a rough code sketch of this manipulation in a moment.)

We ran this experiment with 160 participants and 48 experimental trials each, and we asked them to assess the data as if it were pollution data: if you saw a map like this, how urgently do you think pollution should be dealt with in these regions? Then we could compare responses to the truncated and extended versions. So this is what they saw: a map showing the levels of a certain type of pollution in four geographic regions. They're asked how urgently pollution levels in these regions should be addressed, and they respond on the scale according to what they think the data shows.

So we've got our two versions here, and the distribution of urgency ratings is quite stark. If the data appears at the top of the scale, it appears large to people. If the same data sits in the middle of the scale, people don't see it as representing a high magnitude, and they don't rate it as urgent. That's a statistically significant difference.

If this seems familiar, it may be because it is. In psychology we talk about framing effects: the idea that presenting information in different contexts influences the way that information is interpreted. The surrounding context has an effect on the takeaway message. The visual analogue of this is the Ebbinghaus illusion, where the same visual information, these two orange circles, appears different depending on the context that surrounds it. We have this in language as well. If you ask people how well a candidate in an election did, and the candidate got "almost 500 votes", people think that person did quite well; but if they got "only 500 votes", even though the number is pretty much identical, it seems worse. It's the surrounding context, the framing of the information, that seems to affect people's interpretations. And this, I think, opens up the possibility that there might be other cognitive biases that affect interpretation of information presented in data visualisations.
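Here's that sketch of the colour-legend manipulation, again in ggplot2 with invented pollution values; only the legend limits differ between the two versions.

```r
library(ggplot2)

# Hypothetical pollution readings for four regions (invented values)
pollution <- data.frame(
  region = c("North", "South", "East", "West"),
  level  = c(12, 25, 18, 30)
)

p <- ggplot(pollution, aes(region, 1, fill = level)) +
  geom_tile() +
  theme_void()

# "Truncated" legend: the largest value (30) sits at the top of the scale
p + scale_fill_gradient(low = "white", high = "darkblue",
                        limits = c(0, 30))

# "Extended" legend: the same largest value sits halfway up a 0-60 scale
p + scale_fill_gradient(low = "white", high = "darkblue",
                        limits = c(0, 60))
```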
A remaining question, and I'd be interested to hear people's thoughts on this, is: given that the experiment with the bar charts and the experiment with the colour maps involve essentially the same manipulation, why do we see such a stark difference for the colour maps, but only a subtle difference for the bar charts? This raises the question of how much axis range affects judgments of magnitude once we expand to all the visualisations people could create: how much can one expect a manipulation of the axis range to actually affect people's interpretations of the information? I suspect this potentially isn't to do with encoding types; it's to do with whether people are aware of the denominator, whether they're aware of the context outside the visualisation. As I mentioned before, participants were given a denominator in the case of the bar charts. They were essentially rating the data within the context of "given that four hundred people were asked this question, this is how many responded this way". They had an awareness of the denominator, whereas they weren't given that with the pollution data. So perhaps a future direction is to manipulate denominator awareness, to work out whether this is what affects people's interpretations of magnitude.

In terms of key points: different displays of the same data can provoke different interpretations, and we study cognitive processing in order to provide insight into how people comprehend these displays. My work has illustrated that inferences of magnitude are to some extent informed by the axis limits, and that it's the relative position of data points on the axis, not their physical position, that seems to influence judgements. But the strength of this association seems to vary. And with that, I'd like to thank you all for listening, and I'd be happy to take questions.