stat405

Homework 6

Due, in stat405 mailbox, Thursday Oct 11

Team skills

Perform a team debriefing. What worked well? What didn’t work well? Are you happy with the final project? What would you change next time? Write one page per team. (10 points)

Regular expressions practice

Model answers from Barret

Use regular expressions and the stringr package to solve the following challenges related to variables from the mpg2 data set from project 1. Unless otherwise noted, each challenge should be answered by creating a vector (or vectors) containing the answer.

Cleaning up transmissions: (5 points)

  • Is a car an automatic?

  • Convert Auto to Automatic

  • Eliminate all transmission codes with fewer than 50 cars

  • Convert (e.g.) “Automatic (S4)” to “Automatic 4-spd”. (Hint: either do this in two steps, or read the help for str_match)
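As a rough illustration of the stringr functions these challenges call for, here is a minimal sketch on a hypothetical toy vector. The values in trans, the threshold min_n, and all variable names are assumptions made for the example; the real mpg2 transmission codes may look different, so the patterns would need adjusting.

```r
library(stringr)

# Hypothetical toy values; the real mpg2 transmission codes may differ.
trans <- c("Auto(S4)", "Automatic (S4)", "Manual(M5)", "Auto(L4)")

# Is a car an automatic? Anchor at the start so "Manual" never matches.
is_auto <- str_detect(trans, "^Auto")

# Expand "Auto" only when it is not already followed by "matic".
trans_full <- str_replace(trans, "^Auto(?!matic)", "Automatic")

# str_match() returns a character matrix: column 1 is the whole match,
# the remaining columns are the capture groups.
parts <- str_match(trans_full, "^(\\w+) ?\\(\\w(\\d)\\)$")
trans_spd <- str_c(parts[, 2], " ", parts[, 3], "-spd")

# Blank out rare codes. The toy threshold is 2; the assignment asks for 50.
min_n <- 2
counts <- table(trans_spd)
common <- names(counts)[counts >= min_n]
trans_common <- ifelse(trans_spd %in% common, trans_spd, NA_character_)
```

The single str_match() call is the one-step route from the hint; the two-step route would use one str_replace() for the word and another for the code.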

Model names: (5 points)

  • For all Porsche models, extract the name of the model from its options (e.g. the model name of “Panamera S Hybrid” is “Panamera”)

  • Most Mercedes-Benz models have names like XYZ123. Extract the two pieces (alphabetic and numeric) and plot them.

  • How does the Mercedes naming scheme compare to the Infiniti naming scheme?
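The extraction steps might look something like the sketch below. The model strings are invented for illustration and are not taken from mpg2. Two assumptions are built in: that the first word is the model name, and that a Mercedes-Benz name starts with letters followed directly by digits. Both should be checked against the real data.

```r
library(stringr)

# Hypothetical model strings, not taken from mpg2.
porsche <- c("Panamera S Hybrid", "Cayenne Turbo", "Boxster")

# Treat the first run of non-space characters as the model name.
porsche_name <- str_extract(porsche, "^\\S+")

mercedes <- c("C300 4matic", "SLK350", "ML63 AMG", "Sprinter")

# One capture group for the leading letters, one for the digits after them.
# Names that do not fit the pattern produce a row of NA.
pieces <- str_match(mercedes, "^([A-Za-z]+)(\\d+)")
merc <- data.frame(
  series = pieces[, 2],
  number = as.integer(pieces[, 3])
)
```

A data frame shaped like merc can be handed straight to a plotting function, and the same pattern could be run on another make's names to compare naming schemes.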

Understanding engine descriptions: (10 points)

  • Use regular expressions to clean up the eng_desc field by breaking it down into multiple variables. What are the most common components? What components are so rare that it’s not worthwhile to make a special variable for them?
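One possible way to start is sketched below on made-up descriptions. The component codes (FFS, GUZZLER, TRBO, TURBO) and the variable names are assumptions for the example, so check them against the actual eng_desc values. The sketch first builds one logical variable per component, then counts tokens to show which components are frequent and which are rare.

```r
library(stringr)

# Hypothetical descriptions; real eng_desc values may use other codes.
eng_desc <- c("(FFS)", "(GUZZLER) (FFS)", "(FFS,TRBO)", "TURBO", NA)

# One logical variable per component; missing descriptions stay NA.
ffs     <- str_detect(eng_desc, "FFS")
guzzler <- str_detect(eng_desc, "GUZZLER")
turbo   <- str_detect(eng_desc, "TRBO|TURBO")

# Pull out every run of capital letters and tabulate the results to see
# which components are common and which are too rare to bother with.
tokens <- unlist(str_extract_all(eng_desc[!is.na(eng_desc)], "[A-Z]+"))
token_counts <- sort(table(tokens), decreasing = TRUE)
```

Tabulating tokens first also exposes spelling variants of the same component (here TRBO and TURBO) that one alternation pattern can merge.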

You will be graded on the quality of your code, with a focus on correctness and concision.