In a previous post, I argued that colors are best used when they map to quantitative variables in natural ways, a technique I use in my dissertation work. Jon Peltier disagreed, and we discussed it in the comments section here. Jon made several good points, and I have tried to address some of his concerns. Below I discuss the progress I've made and compare the old and new figures.
I'm looking at the dynamics of campaigns, particularly how candidates interact with each other and with voters. In one component, I'm using data from the Wisconsin Advertising Project, which tracks each ad aired over the course of the campaign. The unit of analysis is the individual ad! Working with it requires aggregation and extensive data manipulation, which creates plenty of opportunities for error. The data are also extremely complicated. Plots are ideal tools in this situation, and I've spent a lot of time lately looking at plots of the data in different media markets, at different levels of aggregation, etc.
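To make the aggregation step concrete, here is a minimal sketch of collapsing ad-level records into one row per market, candidate, and day. The field names (`market`, `candidate`, `day`, `negative`), the 0/1 negativity coding, and the toy records are all assumptions made up for illustration. They are not the Wisconsin Advertising Project's actual variables or values.

```python
from collections import defaultdict
from datetime import date

# Toy ad-level records, one per airing. Field names and the 0/1 negativity
# coding are hypothetical, not the real data set's layout.
ads = [
    {"market": "Market A", "candidate": "Kerry", "day": date(2004, 10, 1), "negative": 1},
    {"market": "Market A", "candidate": "Kerry", "day": date(2004, 10, 1), "negative": 0},
    {"market": "Market A", "candidate": "Kerry", "day": date(2004, 10, 1), "negative": 1},
    {"market": "Market A", "candidate": "Bush", "day": date(2004, 10, 1), "negative": 0},
    {"market": "Market B", "candidate": "Kerry", "day": date(2004, 10, 2), "negative": 1},
    {"market": "Market B", "candidate": "Kerry", "day": date(2004, 10, 2), "negative": 1},
]


def aggregate_daily(records):
    """Collapse ad-level records into (market, candidate, day) cells.

    Each cell holds the number of ads aired and their mean negativity.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [ad count, sum of negativity]
    for r in records:
        key = (r["market"], r["candidate"], r["day"])
        totals[key][0] += 1
        totals[key][1] += r["negative"]
    return {
        key: {"n_ads": n, "mean_negativity": neg_sum / n}
        for key, (n, neg_sum) in totals.items()
    }


daily = aggregate_daily(ads)
```

Each cell then carries the two quantities the plots need: mean negativity and the ad count.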
I'm trying to compare the negativity of each campaign's television advertising while also paying attention to the rate of advertising. I usually arrange a matrix of time series plots with Kerry on the left and Bush on the right, and let each row represent a media market. This makes it really easy to compare advertising strategies across candidates (left to right) and across media markets (top to bottom). I map time to the x-axis as usual and, since I am most interested in negativity, map campaign negativity to the y-axis. But I think ad frequency is an important component to consider as well. What graphical element should it be mapped to?
Some choices include:
- A third spatial dimension, which is barely worth mentioning.
- A second time series with a separate scale in the same plot. This seems a little confusing and doesn't emphasize my main variable. As Jon pointed out, the points where the lines intersect tend to be given too much attention by the reader.
- Two plots per candidate/market combination. This makes it more difficult to compare across candidates or markets.
- Coloring the line.
I like coloring the line the best. It doesn't distract from the main purpose of the plot, which is to look at negativity across time, but it does allow me to roughly check the relationship between negativity and frequency. But what colors should I use? I just started with the ggplot2 default gradient, which goes blue to red. This seems like a fairly intuitive coloring, since I naturally associate blue with inactivity and red with heavy activity. These plots are given below.
I posted a similar graph of faked data in a recent blog post in which I discussed Jon Peltier's mapping of non-intuitive colors to qualitative variables. Jon was kind enough to provide thoughtful feedback on my graph, and we had an interesting discussion in the comments section. He pointed out that the colors used above are misleading:
"Color gradients are difficult to interpret. On your colored line, I can see that it’s blue at one end and red at the other, but it’s not so easy to determine which portion of the line is which intermediate shade of purple or magenta. In your legend, I am hard-pressed to differentiate between 40 and 60 in your legend, or also between 80 and 100, but the difference between 60 and 80 stands out very strongly. Is it my eyes, my monitor, or an underlying feature of human perception? I don’t know, but in any case it’s a problem."
He makes a great point, and perhaps it is possible to improve the colors. Since these graphs are important to me, I certainly want to try. I took this paper by Zeileis et al. as a starting point, particularly their ideas about sequential palettes (pp. 8-10):
"Sequential palettes are used for coding numerical information that ranges in a certain interval where low values are considered to be uninteresting and high values are considered to be interesting....
"The simplest solution is to employ light/dark contrasts, i.e. rely on the most basic perceptual axis. The interestingness can be coded by an increasing amount of gray corresponding to a decreasing luminance in HCL space."
This seems close to what I am trying to do. I want to look at the average ad negativity across time. But obviously average negativity is less important when very few ads are running. So I made a new color gradient that ranges from white, meaning no ads ran that day, to totally black, meaning 200 ads ran that day.
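As a rough sketch of that mapping (not the code behind my figures), the function below turns a daily ad count into a hex gray, with 0 ads giving white and 200 or more giving black. The function name `ads_to_gray`, the clamping of out-of-range counts, and the straight-line interpolation of the RGB value are my assumptions here. A plotting library may well interpolate in a different color space.

```python
def ads_to_gray(n_ads, max_ads=200):
    """Map a daily ad count to a hex gray: 0 -> white, max_ads -> black.

    Assumption: the gray level is interpolated linearly in RGB value, and
    counts outside [0, max_ads] are clamped to the ends of the scale.
    """
    frac = min(max(n_ads / max_ads, 0.0), 1.0)  # share of the way to black
    level = round(255 * (1.0 - frac))           # 255 = white, 0 = black
    return "#{0:02x}{0:02x}{0:02x}".format(level)


# A few points along the scale.
for n in (0, 50, 200):
    print(n, ads_to_gray(n))
```

The clamp means a day with more than 200 ads is drawn in the same black as a day with exactly 200, so the top of the scale should sit at or above the busiest day in the data.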
The improvement isn't as large as I had hoped, but I think the grayscale gradient works better than the blue-to-red one, mainly because it has more contrast. The default color gradient basically gives a purple line, with some blue and red at the extremes. The grayscale gradient, on the other hand, shows a little more variation. It also nicely emphasizes the important areas (i.e. where the advertising is heaviest). The changes in the scale still don't seem quite linear (e.g. the difference between 200 and 150 doesn't seem as large as the difference between 100 and 50 or between 50 and 0), but it is an improvement.
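One way to probe the uneven steps is to space the grays evenly in CIE L* lightness (the luminance axis of HCL) instead of evenly in RGB value. The sketch below assumes the standard L*-to-luminance and sRGB encoding formulas. The function names are hypothetical, and I can't say how far my actual gradient departs from this spacing, so treat it as something to compare against, not a fix.

```python
def lstar_to_srgb_level(lstar):
    """Convert CIE L* lightness (0-100) to an 8-bit sRGB gray level."""
    # L* -> relative luminance Y (inverse of the CIE lightness function).
    if lstar > 8.0:
        y = ((lstar + 16.0) / 116.0) ** 3
    else:
        y = lstar / 903.3
    # Relative luminance -> gamma-encoded sRGB value in [0, 1].
    if y <= 0.0031308:
        v = 12.92 * y
    else:
        v = 1.055 * y ** (1.0 / 2.4) - 0.055
    return round(255 * min(max(v, 0.0), 1.0))


def ads_to_gray_lstar(n_ads, max_ads=200):
    """Map an ad count to a hex gray with steps that are even in L*.

    Assumption: 0 ads -> L* = 100 (white), max_ads -> L* = 0 (black),
    with counts outside that range clamped.
    """
    frac = min(max(n_ads / max_ads, 0.0), 1.0)
    level = lstar_to_srgb_level(100.0 * (1.0 - frac))
    return "#{0:02x}{0:02x}{0:02x}".format(level)


# Grays for equally spaced ad counts.
for n in (0, 50, 100, 150, 200):
    print(n, ads_to_gray_lstar(n))
```

If the printed grays look more evenly stepped than the legend in the figure, the interpolation space is a likely culprit. If not, the problem may lie with the thin line or the display.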
Overall, I have more work to do on this. The new grayscale gradient gives a rough sense of the rate of advertising at any given point in the campaign and, importantly, emphasizes the areas of heavy advertising, but I think it can still be improved. Right now I am changing only luminance, ranging the color from white to black. It is also possible to vary two dimensions of the HCL space simultaneously, which would give the gradient some color. For now, though, I think I'll stick with the grayscale version.

