Presentation

https://lookerstudio.google.com/reporting/39872504-ac25-4d3f-8436-4e86957c63e2

https://www.mavenanalytics.io/project/3148


Clean - Load Process

This is my first Data Analytics Challenge with Maven. Below I explain the process I followed to prepare the data for the final dashboard.

  1. After downloading the corresponding datasets, I imported them into RStudio.

  2. A quick preview shows that the data needs to be cleaned before starting any analysis.


  3. First, identify the invalid characters:

View(as.data.frame(subset(resorts$Resort, grepl("[<.>]", resorts$Resort))))
# grepl('[characters]', table$column) flags the values that contain any of the listed characters
# subset() keeps only the flagged values
# as.data.frame() and View() show the result as a new table in RStudio
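
To see how this filter behaves, here is a minimal sketch on a made-up vector (the values in `resort_names` are hypothetical, not taken from the dataset). Note that inside square brackets the `.` is a literal dot, so `[<.>]` also flags names that contain a period.

```r
# Hypothetical sample values; the real data lives in resorts$Resort
resort_names <- c("Hemsedal", "St. Anton", "Les<e8>ve", "Geilo")

# grepl() returns one TRUE/FALSE per element
flagged <- grepl("[<.>]", resort_names)

# Logical indexing keeps only the flagged values
resort_names[flagged]
```

If only the angle-bracket markers are of interest, the class `[<>]` would leave names like "St. Anton" out of the result.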


  4. After finding the special patterns, I removed them.
View(as.data.frame(subset(resorts$Resort, grepl("[?]", resorts$Resort))))

resort_fixed <- gsub('[?]', '', resorts$Resort)
# gsub(pattern, replacement, x): 1) the pattern to match, 2) what to replace it with, 3) the vector to search
# special patterns to delete: ?, <96>, <e8>, <e9>, <f6>, <fc>, <92>, <c5>, <df>
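
The patterns listed above could also be removed in a single call by joining them with `|`. This is only a sketch on made-up values, and it assumes the markers appear in the strings as literal text such as `<e8>`; the names `resort_names` and `markers` are hypothetical.

```r
# Hypothetical sample values containing the markers as literal text
resort_names <- c("Val d'Is<e8>re", "Kitzb<fc>hel?", "Geilo")

# Alternation of every pattern to delete; [?] matches a literal question mark
markers <- "[?]|<96>|<e8>|<e9>|<f6>|<fc>|<92>|<c5>|<df>"

# Replace every match with an empty string
resort_fixed <- gsub(markers, "", resort_names)
resort_fixed
```

This deletes the markers just as the step-by-step version does, so accented letters are dropped rather than restored.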
  5. Once the resort names are clean, add them to the data frame; the original Resort column can then be deleted.
resorts02 <- cbind(resorts, resort_fixed)
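
The add-then-delete step can be sketched on a small made-up data frame. `resorts_demo` and its values are hypothetical; only the `cbind` call mirrors the line above.

```r
# Hypothetical two-row data frame standing in for resorts
resorts_demo <- data.frame(
  Resort = c("Kitzb<fc>hel", "Geilo"),
  Country = c("Austria", "Norway"),
  stringsAsFactors = FALSE
)

# Clean the names, then append them as a new column
resort_fixed <- gsub("<fc>", "", resorts_demo$Resort)
resorts_demo02 <- cbind(resorts_demo, resort_fixed)

# Drop the original column now that the cleaned copy exists
resorts_demo02$Resort <- NULL
names(resorts_demo02)
```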
  6. Check whether the other character columns contain unusual values.
library(dplyr)  # provides %>% and distinct()
resorts02 %>% distinct(Country)
# repeat this for the other character columns:
# Country, Continent, Season, Child friendly, Snowparks, Nightskiing, Summer skiing
# this is quick because these columns have only a few unique values
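
Rather than calling `distinct()` once per column, the same check can be run over several columns in one pass. A sketch in base R on a made-up data frame; the column names in `cols` are assumed to match the real ones.

```r
# Hypothetical data frame with a few of the character columns
resorts_demo <- data.frame(
  Country = c("Norway", "Austria", "Norway"),
  Continent = c("Europe", "Europe", "Europe"),
  Season = c("November - May", "December - April", "November - May"),
  stringsAsFactors = FALSE
)

# Columns to inspect
cols <- c("Country", "Continent", "Season")

# One vector of unique values per column, returned as a named list
unique_values <- lapply(resorts_demo[cols], unique)
unique_values
```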


  7. With no remaining values considered inappropriate for analysis, export the data to CSV files.
write.csv(resorts02, file = 'resorts.csv')
write.csv(snow, file = 'snow.csv')
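
By default `write.csv` also writes the row names as an unnamed first column, which may show up as an extra column when the file is loaded elsewhere. If that is not wanted, `row.names = FALSE` omits it. A sketch using a temporary file and a made-up data frame (`resorts_demo` and `out_file` are hypothetical):

```r
# Hypothetical data frame standing in for resorts02
resorts_demo <- data.frame(
  Resort = c("Hemsedal", "Geilo"),
  Country = c("Norway", "Norway"),
  stringsAsFactors = FALSE
)

# Write to a temporary file without the row-name column
out_file <- tempfile(fileext = ".csv")
write.csv(resorts_demo, file = out_file, row.names = FALSE)

# Read the file back to confirm which columns were written
check <- read.csv(out_file, stringsAsFactors = FALSE)
names(check)
```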