Grammatical graphics

The Grammar of Graphics

The title comes from Leland Wilkinson’s book, The Grammar of Graphics.

The first chapter is quite mathematical – give it a look by following the link above.

One purpose of the grammar was to provide a systematic way of describing graphics in the face of a combinatorial explosion of possible designs.

  • In particular, how to describe a graphic to a computer so that it can be drawn.

But the grammar also provides a way to teach graphics.


The frame sets the meaning of the drawing space.

  • For statistics, an x-y coordinate frame suffices … almost!
    • axes can be quantitative or categorical
    • for categorical axes,
      • there’s space between the positions marking the levels
      • we can use this interstitial space for various purposes
        • random scatter (jitter)
        • dodging (side by side) or stacking (on top of one another), which incorporates a third variable
  • Faceting to incorporate a third, fourth, or fifth variable
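
To make the use of interstitial space concrete, here is a minimal sketch of jittering and dodging on a categorical axis. It assumes the levels are placed at the integer slots 1, 2, 3, ..., which is a common convention but not something these notes specify. The function names, default widths, and level labels are hypothetical.

```python
import random

def category_positions(levels):
    """Assign each level of a categorical variable an integer slot: 1, 2, 3, ..."""
    return {level: i + 1 for i, level in enumerate(levels)}

def jitter(center, width=0.4, rng=random):
    """Scatter a point at random within +/- width/2 of its category's slot."""
    return center + rng.uniform(-width / 2, width / 2)

def dodge(center, group, groups, width=0.6):
    """Place groups side by side inside one slot, so a third variable
    is shown by a small horizontal offset."""
    step = width / len(groups)
    offset = (groups.index(group) + 0.5) * step - width / 2
    return center + offset

# Illustrative labels only (assumed, not taken from a figure in these notes).
slots = category_positions(["clerical", "manufacturing", "professional"])
groups = ["F", "M"]

left = dodge(slots["clerical"], "F", groups)   # a little left of slot 1
right = dodge(slots["clerical"], "M", groups)  # a little right of slot 1
scattered = jitter(slots["manufacturing"], rng=random.Random(0))
```

Both adjustments stay well inside the gap between neighboring slots, so the category a point belongs to remains readable.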

StatPREP guidelines:

  • All aspects of the frame should be mapped to variables!
  • The vertical axis should always be the response variable.

Assessment items:

  1. How many scales are there?
  2. Which variable is mapped to each scale? Is that variable continuous or discrete?
  3. What variables are displayed by the position of the indicated point?
  4. What does each dot correspond to in the real world?
  5. What’s the sex of the person at the indicated point?
  6. For categorical scales, what property sets the order? Is it meaningful?
  7. Should zero be included on the vertical axis? Why or why not?


Glyphs are the individual data marks in the frame. Each layer consists of glyphs of the same type. They have graphical properties (“aesthetics” in the grammar):

  • shape
  • color
  • size
  • transparency (alpha)
  • position
  • and so on
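
One way to think about aesthetics is as a lookup from graphical property to variable. The sketch below is a hypothetical mapping, invented for illustration; it does not describe any particular figure in these notes. It counts aesthetics and variables, and finds any variable shown in more than one way.

```python
from collections import defaultdict

def aesthetics_per_variable(mapping):
    """Invert an aesthetic -> variable mapping into variable -> [aesthetics]."""
    result = defaultdict(list)
    for aesthetic, variable in mapping.items():
        result[variable].append(aesthetic)
    return dict(result)

# Hypothetical layer of point glyphs; the variable names are assumptions.
point_layer = {
    "x": "age",
    "y": "wage",
    "color": "sex",
    "shape": "sex",   # same variable as color: a redundant encoding
    "size": "union",
}

by_variable = aesthetics_per_variable(point_layer)
n_aesthetics = len(point_layer)   # x and y position are included in the count
n_variables = len(by_variable)
redundant = {v: a for v, a in by_variable.items() if len(a) > 1}
```

Here there are more aesthetics than variables because one variable is mapped to both color and shape.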

Assessment items:

  1. How many variables are represented by position?
  2. How many aesthetics are there? (Include x- and y-position in your count.)
  3. How many variables are represented?
  4. Which variable is represented in more than one way? Name the variable and say what the different ways are.
  5. What are the values for the variables at each of the indicated points?
  6. Keeping the same frame, how can you change the graph to keep all the variables but make it clearer how union membership relates to wage?


A graph consists of a frame and one or more layers.

StatPREP guidelines:

  1. ALWAYS include a data layer.
  2. Statistical objects (if needed) should be represented as additional layers.
    • It’s helpful to consider including two distinct types of layers for statistics:
      1. Interval layers
      2. Density layers
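
The idea of a graph as a frame plus layers can be sketched as a small data structure. Everything named here (`Layer`, `Graph`, `guideline_problems`, and the `kind` labels) is a hypothetical illustration, not an API from StatPREP or any plotting library.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    kind: str    # assumed labels: "data", "interval", or "density"
    glyph: str   # e.g. "point", "errorbar", "violin"

@dataclass
class Graph:
    frame: dict                          # frame aesthetic -> variable name
    layers: list = field(default_factory=list)

def guideline_problems(graph):
    """Check two of the guidelines: the frame's axes are mapped to
    variables, and the graph includes a data layer."""
    problems = []
    for axis in ("x", "y"):
        if not graph.frame.get(axis):
            problems.append(f"{axis} axis is not mapped to a variable")
    if not any(layer.kind == "data" for layer in graph.layers):
        problems.append("no data layer")
    return problems

# A graph with only a statistical layer fails the check ...
g = Graph(frame={"x": "sector", "y": "wage"},
          layers=[Layer("interval", "errorbar")])
before = guideline_problems(g)

# ... and passes once the data themselves are drawn as well.
g.layers.insert(0, Layer("data", "point"))
after = guideline_problems(g)
```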

Interval layers

Some assessment items:

  1. One of the graphs shows a prediction interval, the other a confidence interval. Which is which? Explain what you saw in the graphs that led to your answer.
  2. Which, if any, of the employment sectors show a statistically significant wage difference between married and unmarried workers?
  3. How would you re-arrange the graph if you wanted to examine sex-related differences in wages?
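
For reference, the two kinds of interval answer different questions: a confidence interval describes uncertainty about a group mean, while a prediction interval describes where individual values are expected to fall. The sketch below uses a normal approximation (z rather than t quantiles), which is an assumption made to keep the example dependency-free. The wage values are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def intervals(values, level=0.95):
    """Approximate confidence and prediction intervals for one group,
    using normal quantiles (a simplification; small samples call for t)."""
    n = len(values)
    m = mean(values)
    s = stdev(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci_half = z * s / sqrt(n)          # shrinks as n grows
    pi_half = z * s * sqrt(1 + 1 / n)  # never shrinks below z * s
    return {
        "confidence": (m - ci_half, m + ci_half),
        "prediction": (m - pi_half, m + pi_half),
    }

# Made-up hourly wages for one group (illustration only).
wages = [6.5, 7.0, 8.25, 9.0, 10.5, 11.0, 12.75, 15.0]
result = intervals(wages)
ci_width = result["confidence"][1] - result["confidence"][0]
pi_width = result["prediction"][1] - result["prediction"][0]
```

The prediction interval is always the wider of the two, and it should cover most of the data points in the group, while the confidence interval typically covers only a few.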

Density layers

Assessment items:

  1. Are wages approximately “normally” distributed?
  2. The “glass ceiling” describes a phenomenon where women tend not to rise as high in employment as men. What about the graph is consistent with the glass ceiling?
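
A density layer draws a smooth estimate of how the values are distributed. One common way to compute such an estimate is a Gaussian kernel density estimate, sketched below; whether the graphs in these notes use this exact method is an assumption. The bandwidth and the wage values are made up.

```python
from math import exp, pi, sqrt

def gaussian_kde(values, bandwidth):
    """Return a function giving the kernel density estimate at x:
    the average of Gaussian bumps centered on each data value."""
    n = len(values)

    def density(x):
        total = sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        return total / (n * bandwidth * sqrt(2 * pi))

    return density

# Made-up, right-skewed wages (illustration only).
wages = [5.0, 6.0, 7.0, 9.0, 12.0, 25.0]
density = gaussian_kde(wages, bandwidth=2.0)

# Riemann-sum check that the estimate integrates to about 1.
step = 0.1
area = sum(density(-20 + i * step) * step for i in range(800))
```

With skewed data like these, the estimated density has a long right tail, which is one visual cue that the values are not approximately normal.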

Point out alternatives

  1. What determines the order of the categories on the horizontal axis?
  2. Is a jet ski more likely to be stolen than a sailboat? How can you tell this from the graph?
  3. What does the data frame look like on which this graph is based?
  4. How many variables are there in the “raw” data? What is the unit of observation? (Ans: A stolen boat.)
  5. Suppose you are working for a boat insurance company. You want to figure out the risk factors for a boat being stolen.
    • What variables would you select from the company’s database? (e.g., Is it trailable?)
    • What would be the unit of observation? (Ans: A boat)
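
The difference between the two units of observation can be sketched with a small made-up table. When each row is a boat, whether stolen or not, a theft rate can be computed. When only stolen boats are recorded, only counts are available. The rows, variable names, and numbers below are invented for illustration.

```python
from collections import Counter

# Made-up registry: each row is one boat (the unit of observation).
boats = (
    [{"boat_type": "jet ski", "stolen": True}] * 2
    + [{"boat_type": "jet ski", "stolen": False}] * 3
    + [{"boat_type": "sailboat", "stolen": True}] * 1
    + [{"boat_type": "sailboat", "stolen": False}] * 1
)

# What a bar chart of thefts shows: counts among stolen boats only.
stolen_counts = Counter(b["boat_type"] for b in boats if b["stolen"])

# What a risk analysis needs: thefts relative to all boats of that type.
totals = Counter(b["boat_type"] for b in boats)
rates = {kind: stolen_counts[kind] / totals[kind] for kind in totals}

# Ordering categories by count, as bar charts often do.
ordered = [kind for kind, _ in stolen_counts.most_common()]
```

In this invented example jet skis account for more thefts, yet the theft rate is higher for sailboats, which is why counts of stolen boats alone cannot settle which type is more likely to be stolen.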