Sunday, 10 August 2014

How to read a VDU graph...

I'm a pretty simple guy. So the stuff that I put onto Virology Down Under's (VDU) blog is usually something I think you can understand - my yardstick is that if I can understand it, then I think you can. Sometimes it can get pretty technical though, and with things always done in a rush, I don't stop and explain as much as I could. Which is why I value feedback, and I've had some good stuff from @DeclanButlerNat, @JorgeCastillaE and @Moro_Cedric this week.

Readers with different levels of experience follow this blog and my posts on Twitter, so sometimes I pitch my graphs at particular audiences. But I do understand that we scientists can be easily carried away by our interests and forget that we're used to interpreting our own presentation styles quickly and in a particular way. We've had lots of experience doing it that way. I can change a tyre (as I was reminded a couple of nights ago, at midnight) but I couldn't fix my engine.

At the heart of reading a graph is this fact: you have to look at the axes to understand what the lines, bars or areas mean. Once you know the style, you can understand it at a glance - but the first time, examine it with care. If it's one of mine and what I'm trying to show is not immediately obvious, feel free to ask me. I may very well have failed to make it clear.

So this is a little overview of how to read some of the graphs I use to communicate what would otherwise be yawn-inducing tables of numbers about viral infection and disease.

A picture is worth a thousand words...

This is a good thing because, with my lack of typing skills, if I had to type 1,000 words all the time, that would be at least 200 typos. Graphs plot those tabular numbers in a more colourful and visual way. Once you know how to read a graph, it can become a powerful way to get a quick update on the state of play. On VDU the game seems to be about outbreak data; that's just the way things have evolved for me since I first blogged on 28 March 2013. This includes graphing the number of people with disease (cases), changes in the number of cases, the numbers that are suspected versus the numbers that are actually laboratory confirmed (my currency), dates of illness onset (my favourite piece of data and the hardest to come by publicly), the numbers who die, the proportion (%) of all cases/detections who die, dates when disease was reported, sex and age. All of that can be plotted on graphs by day, week, month or year.
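Of the numbers listed above, the proportion (%) of cases who die is the one that is calculated rather than counted. As a minimal sketch (the function name and the counts are hypothetical, for illustration only), it is simply deaths divided by cases, times 100:

```python
def proportion_died(deaths, cases):
    """Percentage of all cases/detections who died: deaths / cases x 100."""
    if cases <= 0:
        raise ValueError("need at least one case to work out a proportion")
    return 100.0 * deaths / cases


# Hypothetical counts, for illustration only
print(proportion_died(30, 100))  # 30.0
```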

Interpreting a basic graph on VDU...

The graph below (Graph 1) comes from following Middle East respiratory syndrome (MERS) public data. It shows the key parts of the structure of the graphs: the axes (the horizontal and vertical lines that are the key to reading the plotted numbers) and the data plotted against them.

  • A basic graph has a horizontal line along the bottom called the x-axis and a vertical line up the side called the y-axis. These tell you what the numbers plotted on the graph mean; they are the key to the placement of each point on a graph, according to at least 2 different values.
  • Each point on a graph represents a coordinate. It's made up of an x-axis value (abscissa) and a y-axis value (ordinate). For example, to plot 50 cases reported on Thursday, we place Thursday on the x-axis and 50 on the y-axis, giving the point (x,y) = (Thursday, 50).
  • The points that we plot as pairs of x and y data can be joined up and shown as a line (the area underneath the line can also be coloured in, which looks like a mountain with peaks and troughs), or they can be plotted as bars. There are other ways too, but I keep it simple. Joining up these dots is not always accurate: we may have no idea what is really happening to the numbers between any 2 points, and in that case a bar graph may be more realistic as it shows the numbers at a distinct point in time. Sometimes bar graphs don't work from a formatting perspective (e.g. the bars get so skinny you can't see them). Other times, joining the dots reveals the trends (the general direction that events are heading, even if we don't know the in-between values). Trends are useful in infectious disease because they show what has happened and what the latest data mean in the context of what came before, so joining the dots is not too unrealistic. Some of this is about being accurate without being overly obsessive.
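To make the "joining the dots" point concrete: a straight line between two plotted points quietly claims a value for every moment in between, even though nothing was measured there. This minimal sketch (hypothetical function name and numbers) computes the value that such a line implies:

```python
def joined_dot_value(x, p0, p1):
    """Value that a straight line joining two plotted points implies at x.

    p0 and p1 are (x, y) points. Anything between them is a guess:
    the real numbers in between were never measured.
    """
    (x0, y0), (x1, y1) = p0, p1
    if x1 == x0 or not (x0 <= x <= x1):
        raise ValueError("x must lie between two distinct x values")
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical: 10 cases reported on day 1, 50 on day 5
print(joined_dot_value(3, (1, 10), (5, 50)))  # 30.0
```

The line "says" 30 cases on day 3, but the real number could have been anything; a bar graph makes no such claim.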

The particular example graph I've included below (Graph 1) is a little trickier than some because it has 2 y-axes (vertical lines): a primary (left-hand side) and a secondary (right-hand side). Some of the numbers are plotted against the primary y-axis (left vertical line) and some against the secondary y-axis (right vertical line). This lets me "double-dip" on shared x-axis values, in this case, dates. I'm graphing the course of 2 different things: the number of actual cases by day of illness onset, and the number of reported detections by date of report. These are 2 different things that have dates in common.

This graph lets us compare, using the same x-axis, what the MERS case numbers look like when they are plotted by the day the people were reported to have become ill compared with the date the cases were publicly reported. Some differences become clearer when you run the 2 lines on the same graph, and they may be harder to see when the lines are plotted on 2 separate graphs. This graph highlights that when cases become ill and when they are reported are different things. It also shows that there was a bunch of cases (113) reported in 1 day that have never been given dates of illness onset (or of hospitalisation, or of when each was reported to the Ministry of Health). It also makes use of the 2 y-axes to have different scales. The primary or left-hand y-axis goes up to 35 while the secondary or right-hand y-axis maxes out at 120. If both lines shared one scale, most of the illness onset cases would be hard to see.
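A quick bit of arithmetic shows why the second y-axis helps. The axis maximums (35 and 120) and the 113-case day come from Graph 1; the 10-case onset day and the function name are hypothetical. A point's height on the plot is its value divided by the axis maximum:

```python
def height_fraction(value, axis_max):
    """Share of the plot's height a point of `value` reaches on an axis topping out at `axis_max`."""
    return value / axis_max


onset_cases = 10  # hypothetical illness-onset count for one day
print(round(height_fraction(onset_cases, 35), 2))   # own axis (max 35): 0.29
print(round(height_fraction(onset_cases, 120), 2))  # shared axis (max 120): 0.08
print(round(height_fraction(113, 120), 2))          # the 113-case reporting day: 0.94
```

On a shared axis the onset day would reach less than a tenth of the way up the graph; on its own axis it reaches almost a third.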


Graph 1. The basics of a graph.
What about cumulative graphs? What are they and how do I interpret those?

The next graph (Graph 2) is made to show cases piling up over time. This is the graph that sparked this blog. It plots numbers as a line graph but, instead of showing the value at each timepoint (day, week, month, year), it adds the new number to the sum of all the previous numbers. Because it plots a cumulative tally, it will always be a hill with an upward (left-to-right, bottom-to-top) slope, except when there are no new cases to add; then the line runs parallel to the x-axis as a flat line. How steep that line is tells us how rapidly cases are piling up. That impression can also be fudged by presenting the chart with a very short or very long x-axis.
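Building a cumulative tally is just a running sum. This minimal sketch uses Python's `itertools.accumulate` on some made-up report counts; note how the two reports with no new cases produce a flat stretch:

```python
from itertools import accumulate

# Hypothetical new cases in each report
new_cases = [5, 12, 0, 0, 20, 3]

# Each point is the new number added to the sum of all the previous numbers
cumulative = list(accumulate(new_cases))
print(cumulative)  # [5, 17, 17, 17, 37, 40]

# With no negative counts the tally never falls; a 0 gives a flat stretch
assert all(later >= earlier for earlier, later in zip(cumulative, cumulative[1:]))
```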

  • In the case of the Zaire ebolavirus outbreak in West Africa, we have the unusual ability to compare numbers from multiple countries at the same time, and use the same x-axis. Here, the x-axis shows the date when each World Health Organization (WHO) Disease Outbreak News update was released. Sadly for us graph addicts, this doesn't include any illness onset dates, but the WHO do have those data and plot them themselves (1).
  • A steep slope indicates a rapid rise in cases and this results from a lot of new cases being added in a short period of time.
  • A near flat or horizontal slope to the line shows that there are not many new cases being added. 
  • In this graph we also show multiple lines plotted against the primary (left) y-axis, to present how much and at what rate the total of suspected, probable and laboratory confirmed cases is piling up (pink line), how the deaths among those cases are changing (blue line) and how many of the cases have been laboratory confirmed as due to the suspected virus (green line). This last one is important as it gives a glimpse of how the laboratory network is coping, perhaps how specimen access is going, and how much faith to put in the other two totals. Why do we care so much about the laboratory confirmed total? Because many other things can look like Ebola virus disease (EVD) early on, and even later in the disease course. A laboratory test is the only way to be certain that the patient had that virus.
  • Nigeria's numbers look to be rising alarmingly fast. Relative to its own earlier numbers they are, but compared with the dozens of new EVD cases being added between reports in other countries, it is still a small (although still very bad for Nigeria!) increase. This highlights that care is needed when reading charts. It also takes an understanding that the rate at which new cases are added differs between outbreaks and is disease specific. Lots of influenzavirus detections during flu season is what we expect; any ebolavirus cases are neither what we expect nor what we want to see. Context. A hard thing to account for and probably a matter of experience.
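The Nigeria point comes down to the difference between a relative (percentage) rise and an absolute rise. This minimal sketch uses hypothetical totals, not the real WHO figures, to show how a small outbreak can triple while a large one adds far more cases with a much smaller percentage change:

```python
def change_between_reports(previous_total, current_total):
    """Absolute and percentage rise between two cumulative totals."""
    added = current_total - previous_total
    if previous_total == 0:
        return added, float("inf")
    return added, 100.0 * added / previous_total


# Hypothetical cumulative totals, for illustration only
print(change_between_reports(3, 9))      # small outbreak: (6, 200.0)
print(change_between_reports(600, 660))  # large outbreak: (60, 10.0)
```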
Graph 2. The cumulative case graph. Adding new numbers to the sum of all the numbers that came before. 

Graph 3. Changing the scale. Raising the primary y-axis (left) scale to 750, the level of the other country graphs, makes Nigeria's case numbers look tiny. But that view underestimates the impact of the localised spread of Zaire ebolavirus in an area that was not part of the outbreak until a case flew in and spread it. Changing the scale is not a whimsical decision; it can highlight the importance of events that may otherwise go unnoticed.

Take care when interpreting a graph - look at the axes and also use your noodle

Finally, I'm going to look at the way I present the numbers I plot on a graph. I'm using the cumulative case chart for Liberia as my example (Graph 4 collection). It's the same one used in Graph 3; the only difference is that I've dragged the x-axis to the left (shrunk) or to the right (stretched) to see what that does.
  • The line plots look steeper when you shrink the x-axis and flatter when you stretch it. But the numbers have not changed. Our interpretation of them may have, as a result of seeing the slope change. Remember, though: check the axes. If you look at the x-axis, the shrunken version shows that those cases have climbed over a longer period than the steep slope suggests. Always check the denominator when you think about slope: slope is the change in y divided by the change in x, so the denominator is the time along the x-axis. Equally, the flatter curves of the stretched-out x-axis, at the bottom of the Graph 4 collection, have to be looked at in the context of time. The dates have been dragged out to what may be an unreasonable length, which makes the slopes look shallower; but they are still steeper in July than they were in April. Look around the graph for comparison.
  • As I said above, the current multi-country outbreak lets us compare, and so we can see that some areas are adding new cases very rapidly between each report (Liberia and Sierra Leone) while others (Guinea) are adding fewer, less quickly. Nigeria looks to have jumped quickly, but that is also because of the altered scale (discussed above).
  • On VDU I get around this by also adding charts that plot the numbers for each day, week, month or year. This shows a more discrete series of data that grows or shrinks as the outbreak peaks or resolves. The 2 peaks of the influenza A(H7N9) virus outbreaks illustrate this nicely, especially when combined with a cumulative case chart (Graph 5)!
  • There is no real right or wrong here (although there are pixel-width constraints), but don't let your perceptions fool you when looking at someone's graphs for the first time. Take some time to really look at them.
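The slope point can be checked with numbers. In this minimal sketch (hypothetical dates, totals and function names), the rate in cases per day is the change in cases divided by the change in days, and it stays the same however the chart is drawn. What changes is how steep the line looks, which depends on how many pixels the chart gives to each case and each day:

```python
from datetime import date


def cases_per_day(start, end):
    """Slope of a cumulative line: change in cases (y) divided by change in days (x).

    start and end are (date, cumulative_cases) points.
    """
    (d0, c0), (d1, c1) = start, end
    days = (d1 - d0).days
    if days <= 0:
        raise ValueError("end must come after start")
    return (c1 - c0) / days


def drawn_steepness(rate, px_per_case, px_per_day):
    """How steep the line looks on screen: pixels climbed per pixel across."""
    return rate * px_per_case / px_per_day


rate = cases_per_day((date(2014, 7, 1), 100), (date(2014, 7, 11), 300))
print(rate)                          # 20.0 cases per day
print(drawn_steepness(rate, 1, 5))   # shrunken x-axis: 4.0
print(drawn_steepness(rate, 1, 20))  # stretched x-axis: 1.0
```

Same 20 cases per day, but the line looks four times steeper on the shrunken axis.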
Graph 4 collection. Stretching the x-axis can seem like stretching the truth. But carefully read the axes. Some experience is needed here and ultimately you are at the mercy of the person presenting the data.


Graph 5. Influenza A(H7N9) virus outbreak in China during 2013 and 2014. Plotting the numbers discretely (by week) clearly shows the two outbreak peaks (darker blue lines joining the data point dots) and gives valuable context to the cumulative graph in the background (pale blue mountain). This is probably my favourite style of disease numbers graph.
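Going from a cumulative chart back to the discrete counts is the reverse of the running sum: subtract each total from the one after it. This minimal sketch uses made-up weekly totals (not the real H7N9 data) and assumes the tally started at zero; the per-week counts reveal two peaks that the cumulative line only hints at as two steep stretches:

```python
def new_per_period(cumulative):
    """Turn a cumulative series back into new counts per period (e.g. per week).

    Assumes the tally started at zero before the first period.
    """
    if not cumulative:
        return []
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


# Hypothetical weekly cumulative totals with two waves
weekly_totals = [2, 10, 30, 40, 42, 42, 50, 70, 78]
print(new_per_period(weekly_totals))  # [2, 8, 20, 10, 2, 0, 8, 20, 8]
```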
I hope that has helped make sense of my graphs, and perhaps those of others too. I'm always on Twitter so hit me up with questions about this or requests for more posts like this, or to tell me whether it was helpful.

References

  1. http://www.who.int/csr/disease/ebola/EVD_WestAfrica_WHO_RiskAssessment_20140624.pdf?ua=1