Stat405            Polishing, continued


                             Hadley Wickham
Monday, 9 November 2009
1. Coming up next.
2. Project.
3. Colour scales.
4. Themes.




                     Monday                      Wednesday
           Nov 9     Polishing                   Spam
           Nov 16    Posters / Project           Spam
           Nov 23    Debugging                   Thanksgiving
           Nov 30    Professional development    Poster presentation


Project 3

                  Your turn to find an interesting data
                  source and analyse it. Any data is
                  acceptable but it needs to be large
                  (>1,000 points).
                  If you don’t have any ideas, try looking at
                  http://delicious.com/hadley/data



Deliverables
                  Poster.
                  A handout on a single double-sided page,
                  summarising the major findings of the
                  poster.
                  The R code used to clean the data and
                  produce the plots. (No manual steps!)



Timeline
                  Nov 17-19. Meet to discuss choice of
                  data and initial analysis ideas. Data
                  preparation.
                  Nov 23-30. Review of draft poster
                  (optional, but highly recommended).
                  Dec 2. Final poster presentation. Turn in
                  electronic copy of handout & code.


Polishing



Tools

                  Scales: override the default perceptual
                  mappings, and tune the parameters of
                  axes and legends.
                  Themes: control the presentation of
                  non-data elements.



Your turn


                  Recall the four key parameters that all
                  scales have in common. What are their
                  names and what do they do?




# geom = "hex" needs the hexbin package
p <- qplot(carat, price, data = diamonds, geom = "hex")

# First argument (name) controls legend title
p + scale_fill_continuous("Count")

# Breaks and labels control legend keys
p + scale_fill_continuous(breaks = c(1000, 3500, 7000))
p + scale_fill_continuous(breaks = c(0, 4000, 8000))

# Why don't 0 and 8000 have colours?
p + scale_fill_continuous(breaks = c(0, 4000, 8000),
  limits = c(0, 8000))

# Use labels to make the legend more human readable
breaks <- c(0, 2000, 4000, 6000, 8000)
labels <- format(breaks, big.mark = ",")
p + scale_fill_continuous(breaks = breaks, labels = labels,
  limits = c(0, 8000))
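The question in the comment above has a short answer. By default, the limits of a continuous scale are the range of the data, and breaks that fall outside the limits are not drawn. The base R sketch below mimics that rule with a hypothetical helper, `visible_breaks()`, applied to made-up counts. It is an illustration under that assumption, not ggplot2's own code. It also shows that `format()` pads its labels to a common width, whereas `formatC()` leaves each label unpadded.

```r
# Hypothetical helper: keep only the breaks that fall inside the scale
# limits, mimicking why some requested legend keys are not shown.
visible_breaks <- function(breaks, limits) {
  breaks[breaks >= limits[1] & breaks <= limits[2]]
}

# Made-up bin counts standing in for the hexagon counts.
counts <- c(1, 250, 1800, 4100, 6300)

# Default limits: the range of the data, so 0 and 8000 are dropped.
visible_breaks(c(0, 4000, 8000), range(counts))
# Explicit limits, as in limits = c(0, 8000), keep all three.
visible_breaks(c(0, 4000, 8000), c(0, 8000))

# Human-readable labels with a thousands separator.
breaks <- c(0, 2000, 4000, 6000, 8000)
format(breaks, big.mark = ",")                 # padded to a common width
formatC(breaks, format = "d", big.mark = ",")  # no padding
```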

Default colour scales

                  Discrete: evenly spaced hues of equal
                  chroma and luminance. No colour
                  appears more important than any other,
                  and the colours do not imply an order.
                  Continuous: a smooth gradient, evenly
                  interpolated between two colours.
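Both defaults can be approximated in base R. The sketch assumes the hue-scale defaults of current ggplot2's `scale_colour_hue()`: hues starting at 15 degrees, chroma 100, luminance 65. These may differ from the 2009 release. `even_hues()` is a hypothetical helper, and `colorRampPalette()` stands in for a two-colour continuous gradient.

```r
# Hypothetical helper: n hues evenly spaced around the HCL colour wheel,
# all with the same chroma (c) and luminance (l).
even_hues <- function(n, c = 100, l = 65, start = 15) {
  # 0 and 360 degrees are the same hue, so split the circle into n equal
  # steps and drop the final point, which would duplicate the first.
  hues <- seq(start, start + 360, length.out = n + 1)[seq_len(n)]
  hcl(h = hues %% 360, c = c, l = l)
}

even_hues(4)

# A continuous scale instead interpolates between two endpoint colours.
two_colour <- colorRampPalette(c("white", "black"))
two_colour(5)
```

Because every discrete colour shares the same chroma and luminance, none stands out. The gradient, by contrast, runs from light to dark and so does suggest an order.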



Alternatives

                  Discrete: hue, brewer, manual
                  Continuous: gradient, gradient2,
                  gradientn




Color brewer
                  Cynthia Brewer applied the basic
                  principles and then rigorously tested the
                  results to produce a selection of good
                  palettes, particularly tailored for maps:
                  http://colorbrewer2.org/
                  Use cut_interval() or cut_number() to
                  convert a continuous variable to a
                  discrete one, so that a brewer palette
                  can be used.
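The difference between the two functions is easiest to see on skewed data. `cut_interval()` makes bins of equal width, while `cut_number()` makes bins that hold roughly equal numbers of observations. The sketch below reproduces both ideas on a small made-up vector. It uses base R's `cut()` and `quantile()` as stand-ins, rather than ggplot2's own functions.

```r
# Made-up, highly skewed data: nine small values and one large one.
x <- c(1:9, 100)
n_bins <- 3

# Equal-width bins, in the spirit of cut_interval(x, n_bins).
width_breaks <- seq(min(x), max(x), length.out = n_bins + 1)
equal_width <- cut(x, breaks = width_breaks, include.lowest = TRUE)

# Equal-count bins, in the spirit of cut_number(x, n_bins).
count_breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1))
equal_count <- cut(x, breaks = count_breaks, include.lowest = TRUE)

table(equal_width)  # one crowded bin, one empty bin
table(equal_count)  # roughly three or four values per bin
```

With equal-width bins, a single outlier can leave most colours of the palette unused. Equal-count bins spread the observations across all the colours.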


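# Fancy-looking trigonometric function
vals <- seq(-4 * pi, 4 * pi, length.out = 50)
df <- expand.grid(x = vals, y = vals)
df$r <- with(df, sqrt(x ^ 2 + y ^ 2))
df$z <- with(df, cos(r ^ 2) * exp(- r / 6))
# Discretise z into 9 equal-width bins (sequential brewer
# palettes such as Blues have at most 9 colours)
df$z_cut <- cut_interval(df$z, 9)

# Wrapping an assignment in parentheses also prints the plot
(p1 <- qplot(x, y, data = df, fill = z, geom = "tile"))
(p2 <- qplot(x, y, data = df, fill = z_cut, geom = "tile"))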



p1 + scale_fill_gradient(low = "white", high = "black")

# Highlight deviations
p1 + scale_fill_gradient2()
p1 + scale_fill_gradient2(breaks = seq(-1, 1, by = 0.25),
  limits = c(-1, 1))
# Same colour at both extremes: emphasises the size of a
# deviation, not its sign
p1 + scale_fill_gradient2(mid = "white", low = "black",
  high = "black")

p2 + scale_fill_brewer(palette = "Blues")
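The idea behind `scale_fill_gradient2()` is a diverging scale. Values are rescaled so that the midpoint (zero, for the deviations above) lands in the centre of a low-mid-high colour ramp. The rough base R sketch below assumes that the larger of the two deviations from the midpoint sets the stretch. `diverging_colours()` and its default colours are hypothetical and are not ggplot2's defaults.

```r
# Hypothetical helper: map values onto a low -> mid -> high colour ramp,
# with `midpoint` always placed at the centre of the ramp.
diverging_colours <- function(x, low = "blue", mid = "white", high = "red",
                              midpoint = 0) {
  ramp <- colorRamp(c(low, mid, high))
  # Largest distance from the midpoint; guard against all-equal input.
  stretch <- max(abs(x - midpoint))
  if (stretch == 0) stretch <- 1
  # midpoint -> 0.5; the most extreme value -> 0 or 1.
  pos <- 0.5 + (x - midpoint) / (2 * stretch)
  # colorRamp() returns a matrix of 0-255 RGB values, one row per value.
  rgb(ramp(pos), maxColorValue = 255)
}

diverging_colours(c(-1, -0.5, 0, 0.5, 1))
# As in the last gradient2 example, equal ends show magnitude only:
diverging_colours(c(-1, 0, 1), low = "black", high = "black")
```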



Your turn

                  Read through the examples for
                  scale_colour_brewer,
                  scale_colour_gradient2 and
                  scale_colour_gradientn.
                  Experiment!



Colour blindness

                  7-10% of men are red-green colour
                  “blind”, and there are many other,
                  rarer, types of colour blindness.
                  Solutions: avoid red-green contrasts; use
                  redundant mappings; test. I like Color
                  Oracle: http://colororacle.cartography.ch
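A crude complement to testing with a simulator is to check lightness. Colours that differ mainly in hue but hardly in lightness are the riskiest pairings for red-green colour-blind viewers, so it can help to compare their CIE L* lightness. The helper `lightness()` below is hypothetical and is only a heuristic. It does not simulate colour blindness.

```r
# Hypothetical helper: CIE L* lightness (0 = black, 100 = white) of each
# colour. A heuristic only, not a colour-blindness simulation.
lightness <- function(colours) {
  # col2rgb() returns a 3 x n matrix of 0-255 values; convertColor() wants
  # one colour per row with components scaled to [0, 1].
  srgb <- t(col2rgb(colours)) / 255
  lab <- convertColor(srgb, from = "sRGB", to = "Lab")
  unname(lab[, 1])
}

# Pure red and pure green differ in lightness; a darker green narrows the
# gap, making that pair harder to tell apart on lightness alone.
round(lightness(c("#FF0000", "#00FF00", "#00A000")))
```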



Themes



Visual appearance
                  So far we have only discussed how to get
                  the data displayed the way you want,
                  focussing on the essence of the plot.
                  Themes give you a huge amount of
                  control over the appearance of the plot:
                  the choice of background colours, fonts
                  and so on.


# Two built-in themes. The default:
qplot(carat, price, data = diamonds)

# And a theme with a white background:
qplot(carat, price, data = diamonds) + theme_bw()

# Use theme_set if you want it to apply to every
# future plot.
theme_set(theme_bw())

# Printing a theme is the best way of seeing all the
# default options
theme_bw()
theme_grey()

Elements
                  You can also make your own theme, or
                  modify an existing one.
                  Themes are made up of elements, each of
                  which is one of: theme_line, theme_segment,
                  theme_text, theme_rect, theme_blank.
                  This gives you a lot of control over plot
                  appearance.


Elements
                  Axis: axis.line, axis.text.x, axis.text.y,
                  axis.ticks, axis.title.x, axis.title.y
                  Legend: legend.background, legend.key,
                  legend.text, legend.title
                  Panel: panel.background, panel.border,
                  panel.grid.major, panel.grid.minor
                  Strip: strip.background, strip.text.x,
                  strip.text.y


p <- qplot(displ, hwy, data = mpg) +
  opts(title = "Bigger engines are less efficient")

# Modify the appearance of the title
p
p + opts(plot.title = theme_text(size = 12, face = "bold"))
p + opts(plot.title = theme_text(colour = "red"))
p + opts(plot.title = theme_text(angle = 45))
p + opts(plot.title = theme_text(hjust = 1))




Your turn
                  Fix the overlapping x-axis labels on this
                  plot:
                  qplot(reorder(model, hwy), hwy, data = mpg)
                  Rotate the labels on these strips so they
                  are easier to read:
                  qplot(hwy, reorder(model, hwy), data = mpg) +
                    facet_grid(manufacturer ~ ., scales = "free",
                    space = "free")

