Part 3: Plotting the Vowels

Author

Bethany Gardner

Published

February 9, 2024

Now to actually make the vowel plots! This document goes into detail about how I decided to make them the way I did and how to implement them in ggplot, but if you just want to see the final results, jump down to here, here, and here.

3.1 Setup

library(tidyverse)
library(magrittr)
library(ggtext)
library(ggforce)
library(ggrepel)
library(rcartocolor)
library(png)
library(patchwork)

options(dplyr.summarise.inform = FALSE)

1: Data wrangling (tidyr, dplyr, purrr, stringr), ggplot2 for plotting.
2: Pipe operator.
3: Markdown/HTML formatting for text in plots.
4: Ellipse plots.
5: Offset text labels from points.
6: Color themes.
7: Open PNG images.
8: Add images on top of plots.
9: Don’t print a message every time summarise() is called on a grouped dataframe.
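If any of these packages is missing, library() stops at the first failure. A minimal sketch for listing everything that still needs installing before rendering (the needed vector just repeats the packages loaded above; missing_pkgs is a name made up for this example):

```r
# Packages loaded in the setup chunk above
needed <- c(
  "tidyverse", "magrittr", "ggtext", "ggforce",
  "ggrepel", "rcartocolor", "png", "patchwork"
)

# requireNamespace() returns FALSE instead of erroring when a package is absent
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
}
```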

(Note: this could be done in Python, but I strongly prefer the ggplot package for plotting.)

Data

Load the vowel formant data from Part 2:

formants <- read.csv("data/formants.csv", stringsAsFactors = TRUE) %>%
  select(-Vowel_Time, -Count) %>%
  mutate(
    Speaker = ifelse(Speaker == "G", "Gretchen", "Lauren"),
    List = ifelse(
      List == "episode", "Lingthusiasm Episodes", "Wells Lexical Set"
    )
  ) %>%
  mutate(across(where(is.character), as.factor))

str(formants)

1: Read formants data from 2_annotate_audio.qmd, and keep columns for List, Vowel, Word, Speaker, F1, and F2.
2: Recode the values for Speaker and List from abbreviations to full strings for plot labels, then make them both factors.

'data.frame':   397 obs. of  6 variables:
 $ List   : Factor w/ 2 levels "Lingthusiasm Episodes",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Vowel  : Factor w/ 11 levels "ɑ","æ","ɔ","ə",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ Word   : Factor w/ 47 levels "among","another",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Speaker: Factor w/ 2 levels "Gretchen","Lauren": 1 1 1 1 1 2 2 2 2 2 ...
 $ F1     : num  766 602 733 766 626 ...
 $ F2     : num  1260 1254 1175 1373 1260 ...

IPA Symbols

However, the IPA symbols aren’t encoded correctly. They’ll render in RStudio, but not when Quarto renders the document to HTML, and not reliably when ggplot renders the plots. This isn’t what we want:

ɑ, æ, ɔ, ə, ɛ, i, ɪ, o, u, ʊ, ʌ

So, the next step is to enter the unicode values manually (copied from this Wikipedia page):

vowels <- c(
  "i_lower"   = "\u0069",  # i (close front unrounded)
  "i_upper"   = "\u026A",  # ɪ (near-close front unrounded)
  "epsilon"   = "\u025B",  # ɛ (open-mid front unrounded)
  "ash"       = "\u00E6",  # æ (near-open front unrounded)
  "schwa"     = "\u0259",  # ə (mid central)
  "horseshoe" = "\u028A",  # ʊ (near-close near-back rounded)
  "u"         = "\u0075",  # u (close back rounded)
  "o"         = "\u006F",  # o (close-mid back rounded)
  "hat"       = "\u028C",  # ʌ (open-mid back unrounded)
  "open_o"    = "\u0254",  # ɔ (open-mid back rounded)
  "alpha"     = "\u0251"   # ɑ (open back unrounded)
)

These are ordered from front to back, then close to open (Figure 4).
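A quick way to double-check hand-entered escapes is to convert each symbol back to its code point. This sketch uses only base R and repeats three entries from vowels so that it runs on its own (vowels_check and code_points are names made up for the example):

```r
# Three entries copied from the vowels vector above
vowels_check <- c(
  "schwa" = "\u0259",
  "ash"   = "\u00E6",
  "alpha" = "\u0251"
)

# utf8ToInt() returns the code point of a single character;
# sprintf() then formats it the way Unicode charts write it (U+XXXX)
code_points <- vapply(vowels_check, utf8ToInt, integer(1))
sprintf("U+%04X", code_points)
```

If a formatted value doesn't match the escape typed next to it, the escape has a typo.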

Then match the unicode for the IPA symbol to the words:

formants %<>% mutate(
  Vowel = case_when(
    Word %in% c("ball", "father", "honorific", "lot", "palm", "start") ~ vowels["alpha"],
    Word %in% c("bang", "bath", "hand", "laugh", "trap") ~ vowels["ash"],
    Word %in% c("bought", "cloth", "core", "north", "thought", "wrong") ~ vowels["open_o"],
    Word %in% c("among", "famous", "support") ~ vowels["schwa"],
    Word %in% c("bet", "dress", "guest", "says", "square") ~ vowels["epsilon"],
    Word %in% c("beat", "believe", "fleece", "people") ~ vowels["i_lower"],
    Word %in% c("bit", "finish", "kit", "near", "pin") ~ vowels["i_upper"],
    Word %in% c("force", "goat") ~ vowels["o"],
    Word %in% c("blue", "goose", "through", "who") ~ vowels["u"],
    Word %in% c("could", "cure", "put", "foot") ~ vowels["horseshoe"],
    Word %in% c("another", "but", "fun", "strut") ~ vowels["hat"],
  ) %>% factor(levels = vowels, ordered = TRUE)
)

str(formants)

1: If the value in the Word column is ball, father, honorific, lot, palm, or start, then assign the alpha value from the vowels vector.
2: Convert character to factor, then specify the order of the factor levels (same as in the vowels vector above) to make sure it stays consistent.

'data.frame':   397 obs. of  6 variables:
 $ List   : Factor w/ 2 levels "Lingthusiasm Episodes",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Vowel  : Ord.factor w/ 11 levels "i"<"ɪ"<"ɛ"<"æ"<..: 5 5 5 5 5 5 5 5 5 5 ...
  ..- attr(*, "names")= chr [1:397] "schwa" "schwa" "schwa" "schwa" ...
 $ Word   : Factor w/ 47 levels "among","another",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Speaker: Factor w/ 2 levels "Gretchen","Lauren": 1 1 1 1 1 2 2 2 2 2 ...
 $ F1     : num  766 602 733 766 626 ...
 $ F2     : num  1260 1254 1175 1373 1260 ...

Now the IPA vowels consistently render correctly:

i, ɪ, ɛ, æ, ə, ʊ, u, o, ʌ, ɔ, ɑ
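The names attribute in the second str() output comes from indexing the named vowels vector: each value keeps the name it was looked up by. It's harmless here, but unname() drops it if it ever gets in the way. A small sketch with a three-vowel subset (toy values, not the real data; all object names are made up for the example):

```r
# Three-vowel subset of the lookup vector, in plotting order
vowels_demo <- c("i_lower" = "\u0069", "schwa" = "\u0259", "alpha" = "\u0251")

# Looking up by name keeps the names on the result
looked_up <- vowels_demo[c("schwa", "alpha", "schwa", "i_lower")]
names(looked_up)

# Drop the names, then build the ordered factor with the same level order
vowel_factor <- factor(unname(looked_up), levels = vowels_demo, ordered = TRUE)

# Level order follows vowels_demo, not the order the values appear in
as.integer(vowel_factor)
```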

Lingthusiasm Theme

The Lingthusiasm font is Josefin Sans, which is available from Google Fonts.

I downloaded and installed it to my computer. There are a number of different ways to add new fonts without having to install them separately outside of RStudio, such as font_add_google() from the showtext package. However, that method was causing errors rendering the IPA symbols.

systemfonts::system_fonts() lists the fonts installed on my computer that R recognizes, and it finds Josefin Sans:

systemfonts::system_fonts() %>%
  filter(str_detect(family, "Josefin Sans")) %>%
  select(path, name, family) %>%
  pivot_longer(cols = everything())

1: Get dataframe of fonts available.
2: Filter to include Josefin Sans.
3: Select columns to print; flip to list vertically.

# A tibble: 3 × 2
  name   value                                                                  
  <chr>  <chr>                                                                  
1 path   "C:\\Users\\betha\\AppData\\Local\\Microsoft\\Windows\\Fonts\\JosefinS…
2 name   "JosefinSans-Thin"                                                     
3 family "Josefin Sans"

However, the fonts loaded by default just include Times New Roman, Arial, and Courier New:

windowsFonts()

$serif
[1] "TT Times New Roman"

$sans
[1] "TT Arial"

$mono
[1] "TT Courier New"

The next call registers Josefin Sans under the name sans_alt, so text renders in Josefin Sans when family = "sans_alt" and in the default sans font otherwise (which keeps the IPA symbols from breaking).

windowsFonts(sans_alt = "Josefin Sans")
windowsFonts()

$serif
[1] "TT Times New Roman"

$sans
[1] "TT Arial"

$mono
[1] "TT Courier New"

$sans_alt
[1] "Josefin Sans"
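Note that windowsFonts() only exists on Windows, so the chunk above will fail on other systems. One way to guard it, as a sketch: the helper name register_sans_alt() is made up for this example, and on macOS or Linux you would need a different route to make the font available (not covered here).

```r
# Hypothetical helper: register a font under the alias "sans_alt",
# but only where windowsFonts() is available
register_sans_alt <- function(family = "Josefin Sans") {
  if (.Platform$OS.type != "windows") {
    message("Not on Windows: skipping windowsFonts() registration.")
    return(invisible(FALSE))
  }
  grDevices::windowsFonts(sans_alt = grDevices::windowsFont(family))
  invisible(TRUE)
}

# TRUE if the alias was registered, FALSE if the step was skipped
registered <- register_sans_alt()
```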

The hex codes for the green and navy are:

lingthusiasm_green = "#26b14c"
lingthusiasm_navy = "#051458"

And the logo:

lingthusiasm_logo <- readPNG("resources/lingthusiasm_logo_circle.png", native = TRUE)
lingthusiasm_tagline <- readPNG("resources/lingthusiasm_logo_tagline.png", native = TRUE)

1: Read logo images. native = TRUE specifies reading it as a raster object instead of an array, which is the format patchwork::inset_element() needs.

Putting it together:

tibble(
  "Color" = c("green", "navy"),
  "Hex" = c(lingthusiasm_green, lingthusiasm_navy),
  "Extra_Col" = c(1, 1)
) %>%
  ggplot(aes(x = Color, y = Extra_Col, fill = Hex, label = Hex)) +
  geom_tile() +
  geom_text(size = 10, color = "white") +
  scale_fill_identity() +
  theme_classic() +
  labs(title = "Lingthusiasm Theme") +
  theme(
    plot.title = element_text(
      family = "sans_alt", size = 28,
      margin = margin(t = 1, b = 1, unit = "lines"), hjust = 0.55,
      color = lingthusiasm_navy
    ),  
    axis.text = element_blank(), axis.title = element_blank(),
    axis.line = element_blank(), axis.ticks = element_blank()
  ) +
  inset_element(
    p = lingthusiasm_logo,
    left = unit(0.05, "snpc"), right = unit(0.25, "snpc"),
    top = unit(1.2, "snpc"), bottom = unit(1, "snpc")
  )

1: Color is names of the two lingthusiasm theme colors.
2: Hex is the two hex codes.
3: Extra_Col is a dummy value because ggplot needs a Y axis.
4: The X axis is Color, and the Y axis is Extra_Col, which just creates two boxes next to each other. Fill and label are specified by Hex.
5: Draw a square for each color.
6: Label the squares with the hex code strings (keeping the default font).
7: Fill the squares with the hex code color values.
8: Set the plot title text to be navy, Josefin Sans, size 28, centered with some space above and below.
9: Remove the axis lines, labels, titles, and ticks.
10: Use the patchwork package to add the logo image on top of the plot. This step needs to be last.
11: Specify the positions for each corner of the logo, using snpc units so it stays square even if the overall plot is rectangular.

Figure 1: Lingthusiasm colors, font, and logo.
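To see why snpc units keep things square, here's a sketch using the grid package (which ggplot2 is built on) to convert 0.1 npc and 0.1 snpc to inches on a deliberately non-square page. The 8 x 5 inch page size is an arbitrary choice for this example.

```r
library(grid)

# Open a null PDF device (no file is written) with a non-square page
pdf(NULL, width = 8, height = 5)
grid.newpage()

# "npc" is relative to each dimension separately...
npc_w <- convertWidth(unit(0.1, "npc"), "in", valueOnly = TRUE)
npc_h <- convertHeight(unit(0.1, "npc"), "in", valueOnly = TRUE)

# ...while "snpc" is relative to the smaller dimension,
# so the converted width and height come out equal
snpc_w <- convertWidth(unit(0.1, "snpc"), "in", valueOnly = TRUE)
snpc_h <- convertHeight(unit(0.1, "snpc"), "in", valueOnly = TRUE)

invisible(dev.off())

c(npc_w = npc_w, npc_h = npc_h, snpc_w = snpc_w, snpc_h = snpc_h)
```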

3.2 Plot Vowel Means

Now let’s take a look at the data! F1 gets plotted on the Y axis, and F2 gets plotted on the X axis.

means_1 <- formants %>%
  group_by(Speaker, List, Vowel) %>%
  summarise(F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, label = Vowel)) +
  geom_textbox(
    fill = lingthusiasm_green, box.colour = NA,
    color = "white", size = 4.5, halign = 0.5, valign = 0.5,
    width = unit(0.10, "snpc"), height = unit(0.10, "snpc"),
    box.padding = unit(c(0, 0, 0, 0), "snpc"), box.r = unit(0.01, "snpc")
  ) +  
  facet_grid(Speaker ~ List) +
  theme_classic() +
  theme(
    axis.line = element_line(color = lingthusiasm_navy),
    axis.ticks = element_line(color = lingthusiasm_navy),
    panel.border = element_rect(color = lingthusiasm_navy, fill = NA),
    strip.background = element_rect(color = lingthusiasm_navy),
    text = element_text(size = 12, family = "sans_alt", color = lingthusiasm_navy),
    axis.text = element_text(color = lingthusiasm_navy),
    strip.text = element_text(color = lingthusiasm_navy, size = 12)
  ) +
  labs(title = "Vowel Means")

means_1

1: Take the full data set, group it by Speaker then List then Vowel, and then calculate the means of F1 and F2 for each Speaker x List x Vowel.
2: All layers of the plot have F2 on the X axis, F1 on the Y axis, and are labelled by Vowel.
3: Write the vowel symbols (because label = Vowel) at the location of their means.
4: Make the text box background lingthusiasm green with no outline.
5: Make the text white, size 4.5 (note that this is on a different scale than the text sizes specified later in theme()), vertically and horizontally centered.
6: Set the size of the text boxes, using snpc (square normalized parent coordinates) to be relative to the size of the plot but always square.
7: No margins inside the text boxes and a slight curve on the corners.
8: Split the plot to have Gretchen’s data in the top panels and Lauren’s data in the bottom panels, and the data from the Lingthusiasm episodes in the left panels and the data from the Wells lexical set recordings in the right panels.
9: Change the default theme to have a white background with no grid lines.
10: Change all the lines (axis lines, axis ticks, outline around panels, outline around panel labels) to be the lingthusiasm navy.
11: Make all the text navy Josefin Sans. Set the base size as 12, and set the panel labels (strip.text) to 12 as well, which is bigger than their default of 80% of the base size.
12: Set the title, and leave the other axis/legend labels as their default values of “F1”, “F2”, and “Vowel.”
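The group_by() %>% summarise() step can be cross-checked with base R's aggregate(). This sketch uses a few made-up formant values (not the real data) to show the shape of the result: one row of mean F1 and F2 per Speaker and Vowel combination.

```r
# Made-up formant measurements, for illustration only
toy <- data.frame(
  Speaker = c("Gretchen", "Gretchen", "Lauren", "Lauren", "Lauren"),
  Vowel   = c("i", "i", "i", "u", "u"),
  F1      = c(300, 320, 280, 350, 370),
  F2      = c(2400, 2500, 2600, 900, 1100)
)

# Mean of F1 and F2 within each Speaker x Vowel group
toy_means <- aggregate(cbind(F1, F2) ~ Speaker + Vowel, data = toy, FUN = mean)
toy_means
```

Only combinations that actually occur in the data get a row, which is also how the summarised data frame passed to ggplot behaves.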

(Sidenote: saving the theme specifications so we don’t have to keep retyping theme.)

lingthusiasm_theme <- theme(
  axis.line = element_line(color = lingthusiasm_navy),
  axis.ticks = element_line(color = lingthusiasm_navy),
  panel.border = element_rect(color = lingthusiasm_navy, fill = NA),
  strip.background = element_rect(color = lingthusiasm_navy),
  text = element_text(size = 12, family = "sans_alt", color = lingthusiasm_navy),
  axis.text = element_text(color = lingthusiasm_navy),
  strip.text = element_text(color = lingthusiasm_navy, size = 12)
)

However, vowel plots typically have their axes reversed, so that the highest value of (F1, F2) is at the bottom left corner instead of the top right corner. This isn’t standard data visualization procedure, but it has a cool and useful result.

means_2 <- formants %>%
  group_by(List, Speaker, Vowel) %>%
  summarise(F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, label = Vowel)) +
  geom_textbox(
    fill = lingthusiasm_green, box.colour = NA,
    color = "white", size = 4.5, halign = 0.5, valign = 0.5,
    width = unit(0.10, "snpc"), height = unit(0.10, "snpc"),
    box.padding = unit(c(0, 0, 0, 0), "snpc"), box.r = unit(0.01, "snpc")
  ) +  
  facet_grid(Speaker ~ List) +
  scale_x_reverse(breaks = c(1000, 1500, 2000, 2500)) +
  scale_y_reverse(limits = c(1050, 225), n.breaks = 4) +
  theme_classic() +
  lingthusiasm_theme +
  labs(title = "Vowel Means")

means_2

1: Just annotating the lines that changed from the previous chunk.
2: Add this to flip the X axis. Specify breaks because the default values aren’t even.
3: Add this to flip the Y axis. The limits (see how they’re reversed) are specified because the defaults were a bit too narrow, and like this the axis ticks/labels are spaced more evenly.
4: Add this to specify the colors and font sizes etc.

Now the layout resembles the IPA vowel chart! Front vowels are on the left, and back vowels are on the right; close vowels are on the top, and open vowels are on the bottom.

3.3 Plot Individual Data Points

Just plotting the means for each vowel loses a lot of information, so let’s take a look at the underlying data.

Now, we’ll distinguish between vowels by color. First, make a legend that will be easier to read than the default by creating a string that prints each vowel in its corresponding color (using the ggtext package to render the HTML formatting).

vowel_key <- tibble("Vowel" = vowels, "Color" = carto_pal(12, "Bold")[1:11]) %>%
  mutate(Styled = str_c("<b style='color:", Color, "'>", Vowel, "</b>")) %>%
  pull(Styled) %>%
  str_flatten(collapse = ", ")

vowel_key %>% str_wrap(32) %>% str_view()

1: Vowel column is the list of vowels (unicode codes). Color column is the hex codes from the Bold palette in rcartocolor, the color set we’ve been using so far. (Using the first 11 values from the full palette, so the last color isn’t gray.)
2: Encase with HTML code, so that hex code becomes a color argument for the vowel character.
3: Merge into 1 string, with each value separated by a comma + space.
4: Print, wrapping lines on each item.

[1] │ <b style='color:#7F3C8D'>i</b>,
    │ <b style='color:#11A579'>ɪ</b>,
    │ <b style='color:#3969AC'>ɛ</b>,
    │ <b style='color:#F2B701'>æ</b>,
    │ <b style='color:#E73F74'>ə</b>,
    │ <b style='color:#80BA5A'>ʊ</b>,
    │ <b style='color:#E68310'>u</b>,
    │ <b style='color:#008695'>o</b>,
    │ <b style='color:#CF1C90'>ʌ</b>,
    │ <b style='color:#F97B72'>ɔ</b>,
    │ <b style='color:#4B4B8F'>ɑ</b>

Which will render like this:

i, ɪ, ɛ, æ, ə, ʊ, u, o, ʌ, ɔ, ɑ
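The same kind of string can be assembled with base R's paste0(), which makes the structure of each HTML fragment easier to see. A sketch with three vowels, reusing the first three hex codes from the printed key above (the demo_ names are made up for the example):

```r
# Three vowels and the first three colors from the key above
demo_vowels <- c("i", "\u026A", "\u025B")
demo_colors <- c("#7F3C8D", "#11A579", "#3969AC")

# Wrap each vowel in a <b> tag whose inline style sets the text color
styled <- paste0("<b style='color:", demo_colors, "'>", demo_vowels, "</b>")

# Join the fragments into a single comma-separated string
demo_key <- paste(styled, collapse = ", ")
demo_key
```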

(Note that ggplot will throw a warning like Warning in text_info(label, fontkey, fontfamily, font, fontsize, cache): unable to translate '<U+0251>png215' to native encoding, but it renders correctly, so the warnings are turned off in those code chunks.)

points <- formants %>%
  ggplot(aes(x = F2, y = F1, color = Vowel, label = Vowel)) +
  geom_point(size = 1.5) +
  facet_grid(Speaker ~ List) +
  scale_color_manual(values = carto_pal(12, "Bold")) +
  scale_x_reverse(breaks = c(750, 1250, 1750, 2250, 2750)) +
  scale_y_reverse(breaks = c(250, 500, 750, 1000)) +
  theme_classic() +
  lingthusiasm_theme +
  theme(plot.subtitle = element_markdown(family = "sans")) +
  labs(title = "Individual Data Points", subtitle = vowel_key) +
  guides(color = guide_none())

points

1: Passing the full data set, not the means by Speaker + Vowel + List, to ggplot.
2: Instead of geom_textbox(), use geom_point(), which draws a scatterplot layer (size = 1.5 is the ggplot2 default, written out explicitly).
3: Use the Bold color palette from the rcartocolor package to color-code the vowels. (There are 11 vowels, but I specify 12 colors here so the grey gets skipped.)
4: The data range is wider here than in the vowel means plots, so the breaks are adjusted so that the Y axis labels don’t overlap with each other between the two panels.
5: element_markdown() from ggtext will render the HTML string. Use default sans serif font because Josefin Sans doesn’t have all of the IPA symbols.
6: Add the color-coded list of vowels as a subtitle.
7: Turn the default legend off.

3.4 Plot Word Means

The data for each vowel consists of 3 different words. How different are they? First, let’s look at the Wells Lexical Set.

These next plots use the ggrepel package to make the word labels not overlap with each other or with the scatterplot points.

words_ls <- formants %>%
  filter(List == "Wells Lexical Set") %>%
  group_by(Speaker, Vowel, Word) %>%
  summarise(F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, color = Vowel, label = Word)) +
  geom_point(size = 1.25) +
  geom_label_repel(
    min.segment.length = 0,
    size = 4,
    force = 75,
    family = "sans_alt",
    seed = 2024
  ) +
  facet_wrap(~Speaker) +
  scale_color_manual(values = carto_pal(12, "Bold")) +
  scale_x_reverse(breaks = c(750, 1250, 1750, 2250, 2750)) +
  scale_y_reverse(breaks = c(250, 500, 750, 1000)) +
  theme_classic() +
  lingthusiasm_theme +
  theme(plot.subtitle = element_markdown(size = 15, family = "sans")) +
  labs(title = "Means By Word: Wells Lexical Set", subtitle = vowel_key) +
  guides(color = guide_none())

words_ls

1: Only include Wells Lexical Set word list.
2: The data for this plot is the means by Speaker, Vowel, AND Word.
3: All layers of this plot have F2 on the X axis, F1 on the Y axis, are color-coded by Vowel, and are labelled by Word.
4: Draw a point at each Speaker*Vowel*Word mean (slightly smaller than the ggplot2 default of 1.5).
5: Draw text box labels offset from each point. geom_label_repel() makes sure none of the boxes overlap with the points or with each other.
6: Always draw a line from the text box to the scatterplot point, no matter how short (min.segment.length = 0).
7: Text size. Note this is on a different scale than the text for the title/axis labels.
8: Increase the amount of space required between the text boxes.
9: Make the font Josefin Sans.
10: Set a seed so the results are consistent.
11: Put Gretchen on the left and Lauren on the right.
12: Specify the locations of the axis breaks, since the defaults aren’t even.
13: Include color-coded vowel string as a subtitle, instead of the default legend for color.

Figure 6: Mean for each word in the Wells Lexical Set word list.

Now let’s look at the Lingthusiasm episode words:

words_ep_1 <- formants %>%
  filter(List == "Lingthusiasm Episodes") %>%
  group_by(Speaker, Vowel, Word) %>%
  summarise(F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, color = Vowel, label = Word)) +
  geom_point(size = 1.25) +
  geom_label_repel(
    min.segment.length = 0, force = 75, seed = 2024,
    size = 4, family = "sans_alt"
  ) +
  facet_wrap(~Speaker, ncol = 1) +
  scale_color_manual(values = carto_pal(12, "Bold")) +
  scale_x_reverse(breaks = c(750, 1250, 1750, 2250, 2750)) +
  scale_y_reverse(breaks = c(250, 500, 750, 1000)) +
  theme_classic() +
  lingthusiasm_theme +
  theme(plot.subtitle = element_markdown(size = 15, family = "sans")) +
  labs(title = "Means By Word: Lingthusiasm Episodes", subtitle = vowel_key) +
  guides(color = guide_none())

words_ep_1

1: Only include Lingthusiasm Episode word list.
2: The panels are stacked vertically, because the word labels take up more space. fig-asp: 1.25 in this code chunk’s header makes it render tall enough.

Figure 7: Mean for each word in the Lingthusiasm Episode word list.

One thing that makes this plot a bit hard to interpret is that it’s not immediately clear which vowel in the word is the one being plotted. So, let’s make the vowel bold relative to the rest of the word.

So far we’ve been using ggtext to format text, but that doesn’t work with ggrepel. The workaround, like with the IPA vowels, is to just enter the unicode characters directly.

These are the codes for the Mathematical Sans Serif capital letters, in regular and bold faces. They’re copy-pasted in here manually even though the pattern is predictable, because procedurally generating strings with the \u prefix is a pain.

alphabet_reg <- c(
  "\U1D5A0", "\U1D5A1", "\U1D5A2", "\U1D5A3", "\U1D5A4", "\U1D5A5",
  "\U1D5A6", "\U1D5A7", "\U1D5A8", "\U1D5A9", "\U1D5AA", "\U1D5AB",
  "\U1D5AC", "\U1D5AD", "\U1D5AE", "\U1D5AF", "\U1D5B0", "\U1D5B1",
  "\U1D5B2", "\U1D5B3", "\U1D5B4", "\U1D5B5", "\U1D5B6", "\U1D5B7",
  "\U1D5B8", "\U1D5B9"
)

alphabet_bold <- c(
  "\U1D5D4", "\U1D5D5", "\U1D5D6", "\U1D5D7", "\U1D5D8", "\U1D5D9",
  "\U1D5DA", "\U1D5DB", "\U1D5DC", "\U1D5DD", "\U1D5DE", "\U1D5DF",
  "\U1D5E0", "\U1D5E1", "\U1D5E2", "\U1D5E3", "\U1D5E4", "\U1D5E5",
  "\U1D5E6", "\U1D5E7", "\U1D5E8", "\U1D5E9", "\U1D5EA", "\U1D5EB",
  "\U1D5EC", "\U1D5ED"
)

names(alphabet_reg) <- letters[1:26]
names(alphabet_bold) <- letters[1:26]

1: Name the vectors with the plain lowercase letters, so they can be accessed like a Python dictionary.
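For what it's worth, the alphabets can also be generated from their starting code points with base R's intToUtf8(), which sidesteps building \U escape strings. A sketch, assuming the 26 capitals are contiguous from U+1D5A0 (regular) and U+1D5D4 (bold), as they are in the vectors above; the _gen names are made up so the hand-typed originals aren't overwritten:

```r
# 26 consecutive code points starting at sans-serif capital A
alphabet_reg_gen  <- intToUtf8(0x1D5A0 + 0:25, multiple = TRUE)
alphabet_bold_gen <- intToUtf8(0x1D5D4 + 0:25, multiple = TRUE)

# Name by the plain lowercase letters, as above
names(alphabet_reg_gen)  <- letters
names(alphabet_bold_gen) <- letters

# Spot-check the first and last letters against hand-typed escapes
identical(alphabet_reg_gen[["a"]], "\U1D5A0")
identical(alphabet_bold_gen[["z"]], "\U1D5ED")
```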

First, convert the whole word to the regular letters:

to_unicode_caps <- function(word, alphabet_reg) {
  # Split the word into a character vector of single letters
  chars <- str_split(as.character(word), pattern = "")[[1]]
  # Look up each letter by name, then join the results into one string
  str_flatten(alphabet_reg[chars])
}

1: Function takes word as a string and alphabet_reg as a named character vector.
2: Split the word into individual letters. as.character() guards against factor input, and [[1]] pulls the letters out of the list that str_split() returns.
3: Because alphabet_reg is named with the plain lowercase letters, indexing it by the letters of the word returns the matching Unicode characters. str_flatten() then combines them back into one string.

words_unicode <- map(formants$Word, to_unicode_caps, alphabet_reg)

formants %<>% mutate(.after = Word, Word_Label = words_unicode) %>%
  unnest(Word_Label)

1: For each item in the Word column of the formants dataframe, call the function to_unicode_caps() (defined in previous code chunk) on it. Pass alphabet_reg as the second argument to to_unicode_caps().
2: Insert the words_unicode into the formants dataframe as a column called Word_Label, after the Word column.
3: Convert the items in Word_Label from lists containing 1 string to just strings.

Which renders as:

𝖠𝖬𝖮𝖭𝖦, 𝖠𝖭𝖮𝖳𝖧𝖤𝖱, 𝖡𝖠𝖫𝖫, 𝖡𝖠𝖭𝖦, 𝖡𝖤𝖠𝖳, 𝖡𝖤𝖫𝖨𝖤𝖵𝖤, 𝖡𝖤𝖳, 𝖡𝖨𝖳, 𝖡𝖫𝖴𝖤, 𝖡𝖮𝖴𝖦𝖧𝖳, 𝖡𝖴𝖳, 𝖢𝖮𝖱𝖤, 𝖢𝖮𝖴𝖫𝖣, 𝖥𝖠𝖬𝖮𝖴𝖲, 𝖥𝖠𝖳𝖧𝖤𝖱, 𝖥𝖨𝖭𝖨𝖲𝖧, 𝖥𝖮𝖮𝖳, 𝖥𝖴𝖭, 𝖦𝖴𝖤𝖲𝖳, 𝖧𝖠𝖭𝖣, 𝖧𝖮𝖭𝖮𝖱𝖨𝖥𝖨𝖢, 𝖫𝖠𝖴𝖦𝖧, 𝖯𝖤𝖮𝖯𝖫𝖤, 𝖯𝖨𝖭, 𝖯𝖴𝖳, 𝖲𝖠𝖸𝖲, 𝖲𝖴𝖯𝖯𝖮𝖱𝖳, 𝖳𝖧𝖱𝖮𝖴𝖦𝖧, 𝖶𝖧𝖮, 𝖶𝖱𝖮𝖭𝖦, 𝖡𝖠𝖳𝖧, 𝖢𝖫𝖮𝖳𝖧, 𝖢𝖴𝖱𝖤, 𝖣𝖱𝖤𝖲𝖲, 𝖥𝖫𝖤𝖤𝖢𝖤, 𝖥𝖮𝖱𝖢𝖤, 𝖦𝖮𝖠𝖳, 𝖦𝖮𝖮𝖲𝖤, 𝖪𝖨𝖳, 𝖫𝖮𝖳, 𝖭𝖤𝖠𝖱, 𝖯𝖠𝖫𝖬, 𝖲𝖰𝖴𝖠𝖱𝖤, 𝖲𝖳𝖠𝖱𝖳, 𝖲𝖳𝖱𝖴𝖳, 𝖳𝖧𝖮𝖴𝖦𝖧𝖳, 𝖳𝖱𝖠𝖯

Then convert the corresponding vowels to bold face.

formants %<>% mutate(
  Word_Label = ifelse(
    Word %in% c(
      "ball", "bang", "bath", "beat", "father", "famous", "goat", "hand",
      "laugh", "near", "palm", "says", "square", "start", "trap"
    ),
    str_replace(Word_Label, alphabet_reg["a"], alphabet_bold["a"]),
    Word_Label
  ),
  Word_Label = ifelse(
    Word %in% c(
      "beat", "bet", "blue", "dress", "fleece", "guest", "near", "people"
    ),
    str_replace(Word_Label, alphabet_reg["e"], alphabet_bold["e"]),
    Word_Label
  ),
  Word_Label = ifelse(
    Word %in% c("bit", "finish", "kit", "pin"),
    str_replace(Word_Label, alphabet_reg["i"], alphabet_bold["i"]),
    Word_Label
  ),
  Word_Label = ifelse(
    Word %in% c(
      "among", "another", "bought", "cloth", "core", "could",
      "force", "goat", "honorific", "lot", "people", "thought",
      "through", "who", "wrong"
    ),
    str_replace(Word_Label, alphabet_reg["o"], alphabet_bold["o"]),
    Word_Label
  ),
  Word_Label = ifelse(
    Word %in% c(
      "bought", "blue", "but", "could", "fun", "guest", "laugh",
      "put", "strut", "square", "support", "thought", "through"
    ),
    str_replace(Word_Label, alphabet_reg["u"], alphabet_bold["u"]),
    Word_Label
  ),
  Word_Label = ifelse(
    Word == "believe",
    str_replace(
      Word_Label,
      str_c(alphabet_reg["i"], alphabet_reg["e"]),
      str_c(alphabet_bold["i"], alphabet_bold["e"])
    ),
    Word_Label
  ),
  Word_Label = ifelse(
    Word %in% c("goose", "foot"),
    str_replace_all(
      Word_Label,
      str_c(alphabet_reg["o"], alphabet_reg["o"]),
      str_c(alphabet_bold["o"], alphabet_bold["o"])
    ),
    Word_Label
  ),
  Word_Label = ifelse(
    Word == "fleece",
    str_replace(Word_Label, alphabet_reg["e"], alphabet_bold["e"]),
    Word_Label
  )
)
formants$Word_Label %<>% as.factor()

1: Mutating the Word_Label column multiple times, because several words have multiple vowels to swap. Swapping one vowel at a time is shorter than swapping one category of word at a time.
2: First modification to Word_Label is all the words where “A” gets bolded.
3: If the value in the Word column is one of these items
4: Then pass the value of the Word_Label column to str_replace(). Replace the “a” from the regular-face set with the “a” from the bold-face set.
5: If the value in the Word column is not any of those words, keep the value of Word_Label the same.
6: Same logic for all the words where “E” gets bolded.
7: Same logic for all the words where “I” gets bolded.
8: Same logic for all the words where “O” gets bolded.
9: Same logic for all the words where “U” gets bolded.
10: There are a couple of exceptions: “believe”, because that’s the word where the second instance of the vowel gets bolded, not the first one. Replace the consecutive “I” and “E” from the regular-face set with the “I” and “E” from the bold-face set.
11: “Goose” and “foot” are the words where both O’s need to be bolded.
12: “Fleece” needs the first two, but not the third E bolded.
13: Convert Word_Label from character to factor.
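The special cases (believe, goose/foot, fleece) all come from str_replace() targeting the first match. An alternative, sketched here, is to say which letter positions to bold and build the label in one pass. label_word() and its arguments are hypothetical names for this example, and it regenerates the two alphabets from their code points so that it runs on its own.

```r
# Regular and bold sans-serif capitals, named by plain lowercase letters
reg  <- setNames(intToUtf8(0x1D5A0 + 0:25, multiple = TRUE), letters)
bold <- setNames(intToUtf8(0x1D5D4 + 0:25, multiple = TRUE), letters)

# Hypothetical helper: convert `word` to sans-serif capitals,
# using the bold face at the letter positions given in `bold_at`
label_word <- function(word, bold_at) {
  chars <- strsplit(word, "")[[1]]
  out <- reg[chars]
  out[bold_at] <- bold[chars[bold_at]]
  paste(out, collapse = "")
}

label_word("believe", bold_at = 4:5)  # bolds the "ie"
label_word("fleece", bold_at = 3:4)   # bolds the first two E's
label_word("goose", bold_at = 2:3)    # bolds both O's
```

The trade-off is that positions have to be listed per word, whereas the vowel-by-vowel approach above only needs word lists.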

Which renders as:

𝖠𝖬𝗢𝖭𝖦, 𝖠𝖭𝗢𝖳𝖧𝖤𝖱, 𝖡𝖤𝖫𝗜𝗘𝖵𝖤, 𝖡𝖫𝗨𝗘, 𝖡𝗔𝖫𝖫, 𝖡𝗔𝖭𝖦, 𝖡𝗔𝖳𝖧, 𝖡𝗘𝖳, 𝖡𝗘𝗔𝖳, 𝖡𝗜𝖳, 𝖡𝗢𝗨𝖦𝖧𝖳, 𝖡𝗨𝖳, 𝖢𝖫𝗢𝖳𝖧, 𝖢𝖴𝖱𝖤, 𝖢𝗢𝖱𝖤, 𝖢𝗢𝗨𝖫𝖣, 𝖣𝖱𝗘𝖲𝖲, 𝖥𝖫𝗘𝗘𝖢𝖤, 𝖥𝗔𝖬𝖮𝖴𝖲, 𝖥𝗔𝖳𝖧𝖤𝖱, 𝖥𝗜𝖭𝖨𝖲𝖧, 𝖥𝗢𝖱𝖢𝖤, 𝖥𝗢𝗢𝖳, 𝖥𝗨𝖭, 𝖦𝗢𝗔𝖳, 𝖦𝗢𝗢𝖲𝖤, 𝖦𝗨𝗘𝖲𝖳, 𝖧𝗔𝖭𝖣, 𝖧𝗢𝖭𝖮𝖱𝖨𝖥𝖨𝖢, 𝖪𝗜𝖳, 𝖫𝗔𝗨𝖦𝖧, 𝖫𝗢𝖳, 𝖭𝗘𝗔𝖱, 𝖯𝗔𝖫𝖬, 𝖯𝗘𝗢𝖯𝖫𝖤, 𝖯𝗜𝖭, 𝖯𝗨𝖳, 𝖲𝖰𝗨𝗔𝖱𝖤, 𝖲𝖳𝖱𝗨𝖳, 𝖲𝖳𝗔𝖱𝖳, 𝖲𝗔𝖸𝖲, 𝖲𝗨𝖯𝖯𝖮𝖱𝖳, 𝖳𝖧𝖱𝗢𝗨𝖦𝖧, 𝖳𝖧𝗢𝗨𝖦𝖧𝖳, 𝖳𝖱𝗔𝖯, 𝖶𝖧𝗢, 𝖶𝖱𝗢𝖭𝖦

Now we can see which vowels are being plotted more clearly (but with a reminder of how messy English orthography is):

words_ep_2 <- formants %>%
  filter(List == "Lingthusiasm Episodes") %>%
  group_by(Speaker, Vowel, Word_Label) %>%
  summarise(F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, color = Vowel, label = Word_Label)) +
  geom_point(size = 1.25) +
  geom_label_repel(min.segment.length = 0, force = 75, seed = 2024, size = 4) +
  facet_wrap(~Speaker, ncol = 1) +
  scale_color_manual(values = carto_pal(12, "Bold")) +
  scale_x_reverse(breaks = c(750, 1250, 1750, 2250, 2750)) +
  scale_y_reverse(breaks = c(250, 500, 750, 1000)) +
  theme_classic() +
  lingthusiasm_theme +
  theme(plot.subtitle = element_markdown(size = 15, family = "sans")) +
  labs(title = "Means By Word: Lingthusiasm Episodes", subtitle = vowel_key) +
  guides(color = guide_none())

words_ep_2

1: Replace Word with Word_Label.
2: Replace Word with Word_Label here too.

Figure 8: Mean for each word in the Lingthusiasm Episode word list.

This was one of the points where I (from the northeast US) realized that I don’t have much experience with Australian accents: I wasn’t sure how much of the messiness in Lauren’s back vowel data was because her back vowels are in different locations, and how much was because I picked words where she uses a different vowel than Gretchen and I do.

3.5 Plot Vowel Boundaries

One of the most common types of vowel plots draws ellipses around the vowel boundaries.

ellipses <- formants %>%
  ggplot(aes(x = F2, y = F1, group = Vowel)) +
  geom_mark_ellipse(
    aes(color = Vowel),
    expand = 0
  ) +
  geom_textbox(
    data = . %>%
      summarise(.by = c(Speaker, List, Vowel), F1 = mean(F1), F2 = mean(F2)),
    aes(label = Vowel, fill = Vowel),
    color = lingthusiasm_navy, size = 4.5, halign = 0.5, valign = 0.5,
    alpha = 0.8, box.color = NA,
    width = unit(0.10, "snpc"), height = unit(0.10, "snpc"),
    box.padding = unit(c(0, 0, 0, 0), "snpc"), box.r = unit(0.01, "snpc")
  ) +
  facet_grid(Speaker ~ List) +
  scale_color_manual(values = carto_pal(12, "Bold")) +
  scale_fill_manual(values = carto_pal(12, "Bold")) +
  scale_x_reverse(breaks = c(750, 1250, 1750, 2250, 2750)) +
  scale_y_reverse(breaks = c(250, 500, 750, 1000)) +
  theme_classic() +
  lingthusiasm_theme +
  labs(title = "Vowel Boundaries") +
  guides(color = guide_none(), fill = guide_none())

ellipses

1: Specify group = Vowel because all layers are going to be grouped by vowel, but wait to specify fill and color because those will vary by geom.
2: ggforce::geom_mark_ellipse() draws an ellipse around the points in each Vowel group. stat_ellipse() is also an option, but not all the vowels in the Wells Lexical Set have enough observations for it.
3: Set the outline around the ellipse to be color-coded by Vowel, but keep the fill of the ellipses white.
4: expand = 0 draws the ellipse exactly around the edges of the data, vs. the default value of expanding the ellipse out by 5mm.
5: If you give geom_mark_ellipse() a value for label, it will draw labels for each ellipse. But I want the labels centered on the mean for each vowel, not drawn to the side, so to do that I’m using geom_textbox() again (like for the first vowel means plots).
6: geom_textbox() needs the means for each Speaker*List*Vowel, otherwise it will draw a label over every data point. We can summarize the data already passed into the plot (formants) by specifying data = . %>% summarise(). So this geom (but not the ellipse geom) uses the mean of F1 and F2 grouped by Speaker, List, and Vowel. If you only specified Vowel here, it would plot the same means on each panel when we facet_grid() by Speaker and List a few lines later.
7: Set the label and fill (but not color) to vary by Vowel for just geom_textbox().
8: Set the text size, alignment, and color.
9: Remove the outlines around the boxes and make the fill color slightly transparent.
10: Set the box dimensions (see vowel means plots above).
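To see which groups would be too small for stat_ellipse(), count the observations per panel and vowel. This sketch uses made-up rows rather than the real data, and the threshold of 3 points is my understanding of what stat_ellipse() needs, so treat it as an assumption.

```r
# Made-up observations, for illustration only
toy <- data.frame(
  Speaker = c("Gretchen", "Gretchen", "Gretchen", "Gretchen", "Lauren", "Lauren"),
  List    = "Wells Lexical Set",
  Vowel   = c("i", "i", "i", "u", "i", "i")
)

# Number of observations in each Speaker x List x Vowel group
counts <- aggregate(n ~ Speaker + List + Vowel, data = cbind(toy, n = 1), FUN = sum)

# Groups with fewer points than the assumed minimum
min_points <- 3  # assumption, see above
too_small <- counts[counts$n < min_points, ]
too_small
```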

Figure 9: Vowel means (labels) and boundaries (ellipses).

Ellipse plots are hard to make work for this data set, since there’s just too much going on in the Lingthusiasm Episode data. It would work better with cleaner/more controlled data (like the Wells Lexical Set recordings), or if you weren’t trying to compare all of the vowels. For now, I think this just shows how complex speech comprehension is! There isn’t a straightforward way to visualize boundaries in the naturalistic data, even though we’re able to perceive boundaries without much trouble.

For more info on ellipse vowel plots and how to tweak their appearance, I like this tutorial.

Save the plots created so far:

ggsave(
  means_1, path = "plots", filename = "1_means_original.png",
  width = 8, height = 5, units = "in", device = png
)
ggsave(
  means_2, path = "plots", filename = "2_means_flipped.png",
  width = 8, height = 5, units = "in", device = png
)
ggsave(
  points, path = "plots", filename = "3_individual_points.png",
  width = 8, height = 5, units = "in", device = png
)
ggsave(
  words_ls, path = "plots", filename = "4_words_lexical_set.png",
  width = 8, height = 5, units = "in", device = png
)
ggsave(
  words_ep_2, path = "plots", filename = "4_words_episodes.png",
  width = 8, height = 8, units = "in", device = png
)
ggsave(
  ellipses, path = "plots", filename = "5_ellipses.png",
  width = 8, height = 5, units = "in", device = png
)

1: Need to specify device = png (not leave default or device = "png") to get Josefin Sans font to render correctly.

3.6 Stylized Versions

The plots so far would work for a scientific presentation/paper, but the original goal was to make plots that are a little more artistic. Now that we understand what vowel data is going into the plots and how the vowel plots correspond to the IPA vowel chart, we can play around with removing some of the axis and label information for a more minimalist look. Starting at this step, without making the complete plot first, is possible, but it would be easy to end up with errors.

Basically, if you want to remove axes for scientific presentations: don’t. If you want to remove axes for artistic license: make sure you understand what you’re removing first.

Word Means for Wells Lexical Set

The first pair of stylized plots just includes the Wells Lexical Set data.

ggrepel::geom_label_repel(), which draws the label boxes offset from the points, doesn’t currently support having the label text a different color than the box outline, so the trick with this one is to call it twice—once to draw everything in navy, and a second time to draw just the text in green on top of the navy text.

Here’s Gretchen’s version:

gretchen_words_ls <- formants %>%
  filter(Speaker == "Gretchen" & List == "Wells Lexical Set") %>%
  summarise(.by = Word, F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, label = Word)) +
  geom_point(color = lingthusiasm_navy, size = 2) +
  geom_label_repel(
    color = lingthusiasm_navy,
    family = "sans_alt", size = 5,
    min.segment.length = 0, segment.size = 0.75, label.size = 1,
    label.padding = unit(0.4, "lines"), label.r = unit(0.4, "lines"),
    force = 65, seed = 84
  ) +
  geom_label_repel(
    color = lingthusiasm_green,
    family = "sans_alt", size = 5,
    min.segment.length = 0, segment.size = NA, label.size = NA,
    label.padding = unit(0.4, "lines"), label.r = unit(0.4, "lines"),
    force = 65, seed = 84
  ) +
  scale_x_reverse(limits = c(3100, 500), expand = c(0, 0)) +
  scale_y_reverse(limits = c(1200, 325), expand = c(0, 0)) +
  theme_classic() +
  theme(
    axis.line = element_blank(), axis.text = element_blank(),
    axis.ticks = element_blank(), axis.title = element_blank()
  ) +
  annotate(
    geom = "text", label = "Gretchen",
    family = "sans_alt", size = 11, color = lingthusiasm_green,
    x = 2525, y = 1070
  ) +
  inset_element(
    p = lingthusiasm_logo,
    left = unit(0.05, "snpc"), right = unit(0.15, "snpc"),
    top = unit(0.2, "snpc"), bottom = unit(0.1, "snpc")
  )

ggsave(
  gretchen_words_ls, path = "plots", filename = "gretchen_words_ls.png",
  width = 8, height = 5, unit = "in", device = png
)

1: Just include Gretchen’s data from the Wells Lexical Set list.
2: Calculate the mean of F1 and F2 for each word.
3: Like the rest of the plots, F1 goes on the Y axis, F2 goes on the X axis, and Word is the label.
4: Draw a point at each word mean, and make it the Lingthusiasm navy and slightly larger than the default.
5: First layer of geom_label_repel() to get the navy lines.
6: Make the text color and the outline around the box navy.
7: Josefin Sans font and larger text size.
8: Always draw a line between the label and the point, and make the line a bit thicker than default.
9: Increase the padding around the text and the amount of curve on the corners.
10: Increase the amount the labels are repelled from the points (play around with different values until it looks good), and set a seed so the output is consistent.
11: Second layer of geom_label_repel() to get the green text.
12: Make the text color (and also the outline for now) the Lingthusiasm green.
13: Josefin Sans font and larger text size. Needs to be the same as the previous layer so there’s no navy text visible below the green text.
14: Set segment.size and label.size to NA so that this layer doesn’t draw a green box outline or line connecting to the point.
15: Same box padding and shape, repel value, and seed so that the labels are in the same place.
16: Reverse the axes, and set the limits to look right before removing the labels in the next step.
17: Remove the axis lines, ticks, numbers, and title.
18: Add a layer writing “Gretchen.”
19: Set the font to Josefin Sans, the color to Lingthusiasm green, and the size to be about double the size of the word labels.
20: The location is on the scale of the axes.
21: Use patchwork to add a layer for the Lingthusiasm logo. This layer needs to be last, and the image needs to be loaded as a raster (see Lingthusiasm Theme section at the beginning).
22: Locations for the logo. (0, 0) is the bottom left corner of the plot, and (1, 1) is the top right corner. The top/bottom and left/right distances are equal, and the units are snpc (square normalized parent coordinates), so the logo is square and 10% of the height of the plot.
23: Save the plot. The text/line sizes and locations of the annotation layers are set to look right at this size and aspect ratio. Instead of rendering the plot in this chunk like before, the saved image is inserted directly below, to keep the sizing exact.

Figure 10: Gretchen: Means for each word in the Wells Lexical Set data.
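To see why snpc units keep the logo square, the conversion can be checked directly with the grid package (which ships with R). This is a small sketch under the assumption that the viewport is the whole 8 x 5 inch page; as far as I know, patchwork positions an inset relative to the plot panel by default, so the real logo would be sized from the panel rather than the full page.

```r
library(grid)

# Open a null PDF device the same size as the saved plot (no file is written)
pdf(NULL, width = 8, height = 5)
grid.newpage()

# 0.1 snpc is 10% of the shorter side of the viewport, in either direction
logo_width  <- convertWidth(unit(0.1, "snpc"), "inches", valueOnly = TRUE)
logo_height <- convertHeight(unit(0.1, "snpc"), "inches", valueOnly = TRUE)

# The same length in plain npc differs between the two directions
npc_width  <- convertWidth(unit(0.1, "npc"), "inches", valueOnly = TRUE)
npc_height <- convertHeight(unit(0.1, "npc"), "inches", valueOnly = TRUE)

invisible(dev.off())
```

On this page size, logo_width and logo_height should both be 0.5 inches, while the npc version would be 0.8 inches wide and 0.5 inches tall, which would stretch the logo.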

And here’s Lauren’s data:

lauren_words_ls <- formants %>%
  filter(Speaker == "Lauren" & List == "Wells Lexical Set") %>%
  summarise(.by = Word, F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, label = Word)) +
  geom_point(color = lingthusiasm_navy, size = 2) +
  geom_label_repel(
    color = lingthusiasm_navy,
    family = "sans_alt", size = 5,
    min.segment.length = 0, segment.size = 0.75, label.size = 1,
    label.padding = unit(0.4, "lines"), label.r = unit(0.4, "lines"),
    force = 65, seed = 84
  ) +
  geom_label_repel(
    color = lingthusiasm_green,
    family = "sans_alt", size = 5,
    min.segment.length = 0, segment.size = NA, label.size = NA,
    label.padding = unit(0.4, "lines"), label.r = unit(0.4, "lines"),
    force = 65, seed = 84
  ) +
  scale_x_reverse(limits = c(3000, 500), expand = c(0, 0)) +
  scale_y_reverse(limits = c(1000, 325), expand = c(0, 0)) +
  theme_classic() +
  theme(
    axis.line = element_blank(), axis.text = element_blank(),
    axis.ticks = element_blank(), axis.title = element_blank()
  ) +
  annotate(
    geom = "text", label = "Lauren",
    family = "sans_alt", size = 11, color = lingthusiasm_green,
    x = 2500, y = 865
  ) +
  inset_element(
    p = lingthusiasm_logo,
    left = unit(0.05, "snpc"), right = unit(0.15, "snpc"),
    top = unit(0.25, "snpc"), bottom = unit(0.15, "snpc")
  )

ggsave(
  lauren_words_ls, path = "plots", filename = "lauren_words_ls.png",
  width = 8, height = 5, unit = "in", device = png
)

1: Change to include Lauren’s data instead of Gretchen’s.
2: To get a plot where the margins around the data are even, the limits for Lauren’s data are slightly different from the limits for Gretchen’s.
3: The location of the name layer also needs to be slightly different for Lauren.
4: And finally so does the location of the logo (but the size/aspect ratio stay the same).

Figure 11: Lauren: Means for each word in the Wells Lexical Set data.
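The axis limits for these two plots were picked by eye. As a rough starting point, limits with an even margin could instead be computed from the data. The helper name padded_limits() and the 10% padding are assumptions for illustration, not something used in this project, and the numbers passed in are made up rather than taken from the formant data.

```r
# Hypothetical helper: limits for a reversed axis with even padding on both sides
padded_limits <- function(values, pad = 0.1) {
  value_range <- range(values, na.rm = TRUE)
  margin <- pad * diff(value_range)
  # Reversed scales take their limits as c(high, low)
  c(value_range[2] + margin, value_range[1] - margin)
}

# Made-up F2 means, just to show the shape of the output
example_f2 <- c(900, 1400, 2100, 2700)
f2_limits <- padded_limits(example_f2)
```

The result could be passed to scale_x_reverse(limits = f2_limits, expand = c(0, 0)). It would likely still need adjusting by hand so that the repelled labels and the name annotation fit.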

Vowel Means for Lingthusiasm Episode Words

The second pair of stylized plots uses the data from the Lingthusiasm episodes.

I want to draw a trapezoid around the vowel space, with the same axes for Gretchen’s and Lauren’s data so you can compare them. geom_path() connects a series of points, so we need the four corners, plus the starting corner again to close the shape. Here’s a visualization of the trapezoid coordinates, relative to the vowel means:

vowel_polygon <- tibble(
  x = c(800, 800, 3000, 2000, 800),
  y = c(1000, 275, 275, 1000, 1000)
)

formants %>%
  summarise(.by = c(Vowel, Speaker), F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1)) +
  geom_path(data = vowel_polygon, aes(x = x, y = y)) +
  geom_point() +
  scale_x_reverse() +
  scale_y_reverse() +
  theme_classic() +
  annotate(
    geom = "label", label = "1/5",
    x = as.numeric(vowel_polygon[1, "x"]), y = as.numeric(vowel_polygon[1, "y"])
  ) +
  annotate(
    geom = "label", label = "2",
    x = as.numeric(vowel_polygon[2, "x"]), y = as.numeric(vowel_polygon[2, "y"])
  ) +
  annotate(
    geom = "label", label = "3",
    x = as.numeric(vowel_polygon[3, "x"]), y = as.numeric(vowel_polygon[3, "y"])
  ) +
  annotate(
    geom = "label", label = "4",
    x = as.numeric(vowel_polygon[4, "x"]), y = as.numeric(vowel_polygon[4, "y"])
  )
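Because geom_path() only connects the points it is given, the first corner has to be repeated at the end to close the shape. Here is a small sketch of building that closed path from the four corners, so the repeated row can’t get out of sync. The names corners and closed_path are just for illustration.

```r
# The four corners of the trapezoid, in drawing order
corners <- data.frame(
  x = c(800, 800, 3000, 2000),
  y = c(1000, 275, 275, 1000)
)

# Repeat the first corner at the end so the last side gets drawn
closed_path <- rbind(corners, corners[1, ])
rownames(closed_path) <- NULL
```

An alternative would be geom_polygon() with fill = NA, which closes the shape automatically.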

Now that we have the layout of the trapezoid, plot Gretchen’s data. geom_textbox() and geom_label_repel() can only draw squares/rectangles, so to get circles like the Lingthusiasm logo, I use geom_mark_circle() to draw a circle, then geom_text() to write the vowel on top of it.

gretchen_vowels_ep <- formants %>%
  filter(Speaker == "Gretchen" & List == "Lingthusiasm Episodes") %>%
  summarise(.by = Vowel, F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot() +
  geom_path(
    data = vowel_polygon,
    aes(x = x, y = y),
    color = lingthusiasm_navy, linewidth = 2, lineend = "round"
  ) +
  geom_mark_circle(
    aes(x = F2, y = F1, group = Vowel),
    radius = unit(0.048, "snpc"), n = 1000,
    fill = lingthusiasm_green, alpha = 1, color = NA
  ) +
  geom_text(
    aes(x = F2, y = F1, label = Vowel),
    color = "white", size = 11, nudge_y = 4
  ) +
  scale_x_reverse(
    limits = c(max(vowel_polygon$x), min(vowel_polygon$x)),
    expand = c(0.01, 0.01)
  ) +
  scale_y_reverse(
    limits = c(max(vowel_polygon$y), min(vowel_polygon$y)),
    expand = c(0.01, 0.01),
    position = "right"
  ) +
  theme_classic() +
  theme(
    axis.line = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_text(
      size = 30, hjust = 1,
      color = lingthusiasm_navy, family = "sans_alt"
    )
  ) +
  labs(y = "Gretchen")

ggsave(
  gretchen_vowels_ep, path = "plots", filename = "gretchen_vowels_ep.png",
  width = 7.25, height = 5, unit = "in", device = png
)

1: Just include Gretchen’s data from the Lingthusiasm episode word list, and calculate the mean F1 and F2 for each Vowel.
2: Instantiate the plot, but set the values for aes() separately in each geom below.
3: Draw the trapezoid, using the data from vowel_polygon not the vowel means passed into the plot. This draws a line connecting each of the coordinates listed in the x and y columns.
4: Make the line navy, thicker than default, and rounded at the corners.
5: Draw a circle around each vowel mean. Specify group = Vowel not label = Vowel, so that there is a circle for each vowel, but no labels.
6: Make the radius of the circle 4.8% of the plot’s height (again using square normalized parent coordinates, just small enough to not overlap), and increase the number of points used to draw the circle from the default of 100 so it looks smoother.
7: Fill the circle with Lingthusiasm green, make it not transparent (the default alpha is 0.3 not 1), and remove the black outline.
8: Draw the vowel label at each mean.
9: Make the text white, large enough to just fit inside the circle, and nudge it up just a bit (because most of the IPA symbols are sized as lowercase).
10: Set the axis limits to be 1% larger than the borders of the trapezoid.
11: Move the title of the Y axis from the left to the right.
12: Remove the lines, text, and ticks from the X and Y axes and the title from the X axis.
13: Make the Y axis title size 30, aligned to the bottom not the middle, navy, and Josefin Sans.
14: Set the Y axis title to be Gretchen instead of F1.

Figure 12: Gretchen: Means for each vowel in the Lingthusiasm episode data.
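For reference, a two-element expand vector in ggplot2 is read as c(multiplicative, additive), so c(0.01, 0.01) pads each end of the axis by 1% of the range plus 0.01 axis units. Here is a sketch of that arithmetic in base R. The function name expanded_limits() is made up for illustration; ggplot2 does this internally.

```r
# Hypothetical helper mirroring how a c(mult, add) expansion widens axis limits
expanded_limits <- function(limits, mult = 0.01, add = 0.01) {
  lower <- min(limits)
  upper <- max(limits)
  pad <- mult * (upper - lower) + add
  c(lower - pad, upper + pad)
}

# Trapezoid borders from vowel_polygon
f2_expanded <- expanded_limits(c(800, 3000))
f1_expanded <- expanded_limits(c(275, 1000))
```

The additive 0.01 Hz is negligible at this scale, so the margin is effectively 1%. Writing expand = expansion(mult = 0.01) would state that intent more directly.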

Do the same thing for Lauren’s data:

lauren_vowels_ep <- formants %>%
  filter(Speaker == "Lauren" & List == "Lingthusiasm Episodes") %>%
  summarise(.by = Vowel, F1 = mean(F1), F2 = mean(F2)) %>%
  ggplot(aes(x = F2, y = F1, group = Vowel)) +
  geom_path(
    data = vowel_polygon,
    aes(x = x, y = y, group = NA),
    color = lingthusiasm_navy, linewidth = 2, lineend = "round"
  ) +
  geom_mark_circle(
    radius = unit(0.048, "snpc"), n = 1000,
    fill = lingthusiasm_green, alpha = 1, color = NA
  ) +
  geom_text(aes(label = Vowel), color = "white", size = 11, nudge_y = 4) +
  scale_x_reverse(
    limits = c(max(vowel_polygon$x), min(vowel_polygon$x)),
    expand = c(0.01, 0.01)
  ) +
  scale_y_reverse(
    limits = c(max(vowel_polygon$y), min(vowel_polygon$y)),
    expand = c(0.01, 0.01),
    position = "right"
  ) +
  theme_classic() +
  theme(
    axis.line = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_text(
      size = 30, hjust = 1,
      color = lingthusiasm_navy, family = "sans_alt"
    )
  ) +
  labs(y = "Lauren")

ggsave(
  lauren_vowels_ep, path = "plots", filename = "lauren_vowels_ep.png",
  width = 7.25, height = 5, unit = "in", device = png
)

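1: The only things that need to change from Gretchen’s version are the filter on the data (the aes() mappings are also set once in ggplot() here rather than in each geom, which should draw the same plot).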
2: And the axis label.

Figure 13: Lauren: Means for each vowel in the Lingthusiasm episode data.

To combine the two, I’m going to flip Gretchen’s horizontally, so it looks like two speakers facing each other.

paired_vowels_ep <- gretchen_vowels_ep +
  scale_x_continuous(
    limits = c(min(vowel_polygon$x), max(vowel_polygon$x)),
    expand = c(0.01, 0.01)
  ) +
  scale_y_reverse(
    limits = c(max(vowel_polygon$y), min(vowel_polygon$y)),
    expand = c(0.01, 0.01),
    position = "left"
  ) +
  theme(
    axis.title.y = element_text(hjust = 0),
    plot.margin = margin(t = 10, l = 10, r = 30, b = 10)
  ) +
  lauren_vowels_ep +
  theme(plot.margin = margin(t = 10, l = 30, r = 10, b = 10)) +
  lingthusiasm_tagline +
  plot_layout(design = c(
    area(t = 1, l = 1, b = 5, r = 4),
    area(t = 1, l = 5, b = 5, r = 8),
    area(t = 4, l = 4, b = 5, r = 5)
  ))

ggsave(
  paired_vowels_ep, path = "plots", filename = "paired_vowels_ep.png",
  width = 15, height = 5, unit = "in", device = png
)

1: Use the patchwork package to combine plots. Start with Gretchen’s plot on the left.
2: Replace the x axis scale with the non-reversed version scale_x_continuous, flip the limits accordingly, and keep the expansion the same.
3: Keep the y axis scale reversed, but put the axis title back on the left.
4: Shift the Y axis title (“Gretchen”) to be aligned to the bottom.
5: Add some space to the right side of the plot.
6: Add Lauren’s plot.
7: Add space to the left side of the plot. Based on how patchwork works, this only applies to Lauren’s plot added last, not Gretchen’s.
8: Add the Lingthusiasm tagline image.
9: Specify the locations of the three pieces on a grid to have Gretchen’s plot on the left, Lauren’s on the right, and the Lingthusiasm tagline at the bottom center.
10: Gretchen’s plot: full height, left side.
11: Lauren’s plot: full height, right side.
12: Lingthusiasm tagline image: grid rows 4 to 5 and columns 4 to 5, so it sits at the bottom center and overlaps each of the other plots by one column (25% of their width).

Figure 14: Gretchen (left) & Lauren (right): Means for each vowel in the Lingthusiasm episode data.
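The overlap in the layout can be checked with a little arithmetic on the grid coordinates. This sketch stores the same t/l/b/r values in a plain data frame (not a patchwork object), and shared_columns() is a made-up helper name.

```r
# Same grid coordinates as the area() calls, as a plain data frame
layout_areas <- data.frame(
  piece = c("gretchen", "lauren", "tagline"),
  t = c(1, 1, 4), l = c(1, 5, 4), b = c(5, 5, 5), r = c(4, 8, 5)
)

# Hypothetical helper: share of a plot's columns that the tagline also covers
shared_columns <- function(plot_name, overlay_name = "tagline",
                           areas = layout_areas) {
  plot_row <- areas[areas$piece == plot_name, ]
  overlay_row <- areas[areas$piece == overlay_name, ]
  plot_cols <- seq(plot_row$l, plot_row$r)
  overlay_cols <- seq(overlay_row$l, overlay_row$r)
  length(intersect(plot_cols, overlay_cols)) / length(plot_cols)
}

gretchen_overlap <- shared_columns("gretchen")
lauren_overlap <- shared_columns("lauren")
```

Each plot spans four columns and shares one of them with the tagline, which is where the 25% overlap comes from.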

Citation

BibTeX citation:

@online{gardner2024,
  author = {Gardner, Bethany},
  title = {Lingthusiasm {Vowel} {Plots}},
  date = {2024-02-09},
  url = {https://bethanyhgardner.github.io/lingthusiasm-vowel-plots},
  doi = {10.5281/zenodo.10642632},
  langid = {en}
}

For attribution, please cite this work as:

Gardner, Bethany. 2024. “Lingthusiasm Vowel Plots.” February 9, 2024. https://doi.org/10.5281/zenodo.10642632.