# Packages used throughout this post
library(knitr)
library(magrittr)
library(tidyverse)
library(glue)
Task 1: Generating Oxford Comma Triples
The central problem
Over dinner with my statistics cohort, we got to discussing the famous Oxford Comma (or Serial Comma, depending on your persuasion). I’ve never really adopted it, but my friends made a compelling argument for its general lack of ambiguity when applied appropriately.
We will apply the Oxford Comma to the famously ambiguous phrase (shown here without the Oxford Comma before "leaves"):
Eats, shoots and leaves
After adding in the Oxford Comma this would become:
Eats, shoots, and leaves
Goal: Generate all permutations of this phrase, with and without the Oxford Comma, using R and specifically the tidyverse packages.
Generating all word-triple permutations the tidy way
The required packages (knitr, magrittr, tidyverse and glue) are loaded with the library() calls at the top of this post. Let’s also define the unique global word values used to construct the phrases:
WORD_VALS <- c("eats", "shoots", "leaves")
Next, we generate all unique 3-word permutations (i.e., without replacement) of the three words. To do this, we create a helper function that checks whether a vector of words contains no repeated words.
# TRUE if the three words are all distinct, FALSE otherwise
is_unq_perm <- function(word1, word2, word3){
  words_vec <- c(word1, word2, word3)
  return(length(words_vec) - length(unique(words_vec)) == 0)
}
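For comparison, here is a minimal sketch of the same uniqueness check written with base R's anyDuplicated(), which returns 0 when a vector has no repeated elements (and otherwise the index of the first duplicate). The name is_unq_perm2 is hypothetical and only used for this illustration.

```r
# Hypothetical alternative to is_unq_perm(): anyDuplicated() returns 0
# when no element repeats, so "== 0" means all three words are distinct
is_unq_perm2 <- function(word1, word2, word3) {
  anyDuplicated(c(word1, word2, word3)) == 0
}

is_unq_perm2("eats", "shoots", "leaves")  # TRUE
is_unq_perm2("eats", "eats", "leaves")    # FALSE
```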
We can now generate every possible triple with replacement using the tidyr::crossing function. We then filter these \(3^3 = 27\) triples down to the unique permutations by applying our is_unq_perm helper function row by row with purrr::pmap_lgl. The _lgl suffix means that pmap_lgl returns a logical (TRUE/FALSE) vector, matching the output of the applied function.
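To make the counting concrete, the sketch below (assuming tidyr and purrr are installed) builds the 27 triples with tidyr::crossing and uses purrr::pmap_lgl with an anonymous function standing in for is_unq_perm. Because pmap_lgl matches the data frame's column names (word1, word2, word3) to the function's argument names, only the \(3! = 6\) permutations survive.

```r
WORD_VALS <- c("eats", "shoots", "leaves")

# Every triple sampled with replacement: 3^3 = 27 rows
all_triples <- tidyr::crossing(word1 = WORD_VALS,
                               word2 = WORD_VALS,
                               word3 = WORD_VALS)
nrow(all_triples)  # 27

# pmap_lgl() calls the function once per row, passing each column
# to the argument of the same name, and returns a logical vector
keep <- purrr::pmap_lgl(all_triples,
                        function(word1, word2, word3) {
                          length(unique(c(word1, word2, word3))) == 3
                        })

# Only the 3! = 6 permutations without replacement remain
sum(keep)  # 6
```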
# Generate the unique word-triples
all_perms <- tidyr::crossing(word1 = WORD_VALS,
                             word2 = WORD_VALS,
                             word3 = WORD_VALS) %>%
  # Flag rows whose three words are all distinct
  mutate(.data = .,
         is_unq_perm = purrr::pmap_lgl(.l = .,
                                       .f = is_unq_perm)) %>%
  # Keep only the unique permutations, then drop the helper column
  filter(.data = ., is_unq_perm) %>%
  select(-is_unq_perm)
# Display output in a nice centered table
all_perms %>%
  kable(x = ., align = 'c',
        col.names = c("Word 1",
                      "Word 2",
                      "Word 3"))
Word 1 | Word 2 | Word 3 |
---|---|---|
eats | leaves | shoots |
eats | shoots | leaves |
leaves | eats | shoots |
leaves | shoots | eats |
shoots | eats | leaves |
shoots | leaves | eats |
Great, that part is done! Now we just need to generate, for each triple of words, an Oxford Comma and a non-Oxford Comma version. This is easy with the amazing glue package:
exprs <- all_perms %>%
  mutate(non_oxford_comma =
           glue_data(.x = .,
                     "{word1}, {word2} and {word3}"),
         oxford_comma =
           glue_data(.x = .,
                     "{word1}, {word2}, and {word3}")) %>%
  select(non_oxford_comma, oxford_comma)
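To see what glue_data does on its own, here is a small sketch on a hypothetical two-row data frame (not the full all_perms table). glue_data looks up {word1}, {word2} and {word3} as columns of the supplied data and is vectorized, producing one string per row.

```r
library(glue)

# Hypothetical two-row data frame of word-triples, for illustration only
triples <- data.frame(word1 = c("eats", "shoots"),
                      word2 = c("shoots", "eats"),
                      word3 = c("leaves", "leaves"))

# Placeholders are resolved against the columns of `triples`, row by row
non_oxford <- glue_data(triples, "{word1}, {word2} and {word3}")
oxford     <- glue_data(triples, "{word1}, {word2}, and {word3}")

as.character(oxford)
# "eats, shoots, and leaves" "shoots, eats, and leaves"
```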
We can display the Non-Oxford Comma and Oxford Comma versions of the \(6\) generated triples side by side as follows:
# Display output in a nice centered table
exprs %>%
  kable(x = .,
        align = 'c',
        col.names = c("Non-Oxford Comma",
                      "Oxford Comma"))
Non-Oxford Comma | Oxford Comma |
---|---|
eats, leaves and shoots | eats, leaves, and shoots |
eats, shoots and leaves | eats, shoots, and leaves |
leaves, eats and shoots | leaves, eats, and shoots |
leaves, shoots and eats | leaves, shoots, and eats |
shoots, eats and leaves | shoots, eats, and leaves |
shoots, leaves and eats | shoots, leaves, and eats |
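As an alternative to writing out the template by hand, glue also provides glue_collapse(), which joins a vector with `sep` but uses `last` before the final element. The sketch below (a swapped-in technique, not the approach used above) builds both comma styles for a single triple.

```r
library(glue)

words <- c("eats", "shoots", "leaves")

# `sep` goes between elements; `last` replaces it before the final element
non_oxford <- glue_collapse(words, sep = ", ", last = " and ")
oxford     <- glue_collapse(words, sep = ", ", last = ", and ")

non_oxford  # eats, shoots and leaves
oxford      # eats, shoots, and leaves
```

This generalizes to lists of three or more words; note that with only two words the `last = ", and "` version would produce "a, and b", which is not the usual Oxford Comma form.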
So there you have it. Have fun generating your own versions of Oxford Comma triples to engage in civil discussions with your fellow grammar-focused friends 😄.
Task 2: Generating Sequentially Numbered BibTeX Entries
The central problem
In this case, I needed to generate several BibTeX entries of the form:
@misc{doe2019_lec1,
author = {Doe, John},
title = {Lecture Note 1 - STAT10A},
month = {March},
year = {2018},
url = {https://statschool/~doe/stats10A/Lectures/Lecture01.pdf},
}
As can be seen, the lectures are numbered sequentially, and the lecture number changes in three places: the main BibTeX id, the title, and the url field.
Specifically, I needed to construct 30 such sequential entries, for lectures 1-30. Rather than do this manually, I realized that this would be a fun scripting exercise using the tidyverse packages glue, purrr, and stringr.
Goal: Create 30 such BibTeX entries and print them to the console so they can be copied and pasted directly into my BibTeX file.
The tidy approach
The first step is to write a function that takes a lecture number (integer) as input and returns a single BibTeX entry for that lecture.
# Generate BibTeX entry for a single lecture number
get_lec_bibtex <- function(lec_num){
  # Get the 2-character zero-padded lecture number, e.g. 1 -> "01"
  lec_num_pad <- str_pad(string = lec_num, width = 2,
                         side = "left", pad = "0")
  # Construct the BibTeX entry
  out_bbtex_str <- glue(
    "@misc{doe2019_lec<lec_num>,
    author = {Doe, John},
    title = {Lecture Note <lec_num> - STAT10A},
    month = {March},
    year = {2018},
    url = {https://www.hpg/~doe/st10A/lecs/lec<lec_num_pad>.pdf}}",
    .open = "<",
    .close = ">")
  return(out_bbtex_str)
}
Note that by default glue substitutes whatever expression appears between { and } delimiters. However, BibTeX entries already contain literal {} braces that we need to keep in the function output. Rather than escaping each of them (glue treats doubled braces {{ and }} as literal braces), we can use glue's convenient option to change the default opening and closing delimiters 💯! We simply set them to the angle brackets < > using the .open and .close arguments above.
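For comparison, here is a minimal sketch of both options on a shortened, hypothetical BibTeX-like string (not the full entry). Option 1 uses custom delimiters as get_lec_bibtex does; option 2 keeps the default { } delimiters and escapes each literal brace by doubling it. Both should yield the same text, but the doubled version is noticeably harder to read.

```r
library(glue)

lec_num <- 1
lec_num_pad <- "01"

# Option 1: custom < > delimiters, so literal { } need no escaping
with_angles <- glue("title = {Lecture Note <lec_num>}, file = lec<lec_num_pad>.pdf",
                    .open = "<", .close = ">")

# Option 2: default { } delimiters, with literal braces written as {{ and }}
with_doubled <- glue("title = {{Lecture Note {lec_num}}}, file = lec{lec_num_pad}.pdf")

as.character(with_angles)
# "title = {Lecture Note 1}, file = lec01.pdf"
identical(as.character(with_angles), as.character(with_doubled))  # TRUE
```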
Let’s just test this out quickly:
lec_no <- 1
get_lec_bibtex(lec_num = lec_no)
@misc{doe2019_lec1,
author = {Doe, John},
title = {Lecture Note 1 - STAT10A},
month = {March},
year = {2018},
url = {https://www.hpg/~doe/st10A/lecs/lec01.pdf}}
Great, it works as required, with the lecture number correctly zero-padded in the PDF filename!
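To check the padding behaviour across several lecture numbers at once, here is a small sketch comparing stringr::str_pad with base R's sprintf("%02d", ...), which should give the same 2-character strings for lectures 1 to 99. Note that str_pad only pads and never truncates, so a hypothetical lecture 100 would come out as "100".

```r
lec_nums <- c(1, 9, 10, 30)

# Left-pad with "0" up to a width of 2 characters
padded_stringr <- stringr::str_pad(lec_nums, width = 2, side = "left", pad = "0")

# Base R alternative: zero-padded integer format
padded_sprintf <- sprintf("%02d", lec_nums)

padded_stringr  # "01" "09" "10" "30"
identical(padded_stringr, padded_sprintf)  # TRUE
```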
Apply to all lectures using purrr
Let’s finish this by creating all the entries using purrr:
lec_nums <- c(1, 30)
lec_nums %>%
  map_chr(.x = ., .f = ~get_lec_bibtex(lec_num = .x)) %>%
  cat(., sep = "\n\n")
@misc{doe2019_lec1,
author = {Doe, John},
title = {Lecture Note 1 - STAT10A},
month = {March},
year = {2018},
url = {https://www.hpg/~doe/st10A/lecs/lec01.pdf}}
@misc{doe2019_lec30,
author = {Doe, John},
title = {Lecture Note 30 - STAT10A},
month = {March},
year = {2018},
url = {https://www.hpg/~doe/st10A/lecs/lec30.pdf}}
Yay, this works as expected! We can now paste the entries into our BibTeX file.
Note that we only created entries for lectures 1 and 30 to keep the output short. To generate all lectures, simply replace c(1, 30) with 1:30 in the code above.
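As a sketch of the full run, the block below restates a shortened version of get_lec_bibtex as a hypothetical make_entry() (keeping only the id, title and url fields) so that it runs on its own, then maps it over 1:30. Keeping the entries in a character vector before printing makes it easy to sanity-check that all 30 were produced.

```r
library(glue)
library(purrr)
library(stringr)

# Hypothetical, shortened restatement of get_lec_bibtex() (id, title, url only)
make_entry <- function(lec_num) {
  lec_num_pad <- str_pad(lec_num, width = 2, side = "left", pad = "0")
  glue("@misc{doe2019_lec<lec_num>,
        title = {Lecture Note <lec_num> - STAT10A},
        url = {https://www.hpg/~doe/st10A/lecs/lec<lec_num_pad>.pdf}}",
       .open = "<", .close = ">")
}

# One entry per lecture, collected as a character vector
all_entries <- map_chr(1:30, make_entry)
length(all_entries)  # 30

# Print the entries separated by blank lines, ready to copy into the .bib file
cat(all_entries, sep = "\n\n")
```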
Conclusion
This post documents, and serves as a guide to, automating a couple of fun text-based tasks that I came across in my work (and social life!). The tidy framework can be a fun way to solve these tasks, but it is certainly not the only way in R. Have fun playing around with the above, and please post any questions or feedback in the comments 👍.
Stay tuned for more blog posts solving similar tasks.
Acknowledgments
I’d like to thank Salil Shrotriya for creating the preview image for this post. The hex sticker PNG files were sourced from here.
Citation
@online{shrotriya2019,
author = {Shamindra Shrotriya},
title = {Tidyverse {Fun} - {Part} 1},
date = {2019-07-15},
url = {https://www.shamindras.com/posts/2019-07-15-shrotriya2019tidyfunpt1},
langid = {en}
}