https://lookerstudio.google.com/reporting/39872504-ac25-4d3f-8436-4e86957c63e2
https://www.mavenanalytics.io/project/3148
This is my first Data Analytics Challenge with Maven, so I’ll walk through the process I followed to build the final dashboard.
After downloading the dataset, I imported the files into RStudio.
A quick preview shows that the data needs cleaning before any analysis can start.
First, identify the wrong characters:
View(as.data.frame(subset(resorts$Resort, grepl("[<.>]", resorts$Resort))))
# grepl('[characters]', table$column) returns TRUE for each value containing any of the listed characters
# subset keeps only the values where grepl returned TRUE
# as.data.frame and View open the result as a new table in RStudio
View(as.data.frame(subset(resorts$Resort, grepl("[?]", resorts$Resort))))
resort_fixed <- gsub('[?]', '', resorts$Resort)
# gsub takes 1) the pattern to match, 2) the text to replace it with, and 3) the vector to apply it to
# special patterns to delete: ?, <96>, <e8>, <e9>, <f6>, <fc>, <92>, <c5>, <df>
resorts02 <- cbind(resorts, resort_fixed)
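The same cleanup can be sketched for all the listed patterns in one pass instead of one gsub call per pattern. This is a minimal sketch, not the exact code I ran: the sample names are made up, `bad_chars` and `resort_clean` are hypothetical names, and it assumes the codes appear in the column as literal text such as `<e8>` rather than as raw bytes.

```r
# Hypothetical sample values, made up for illustration (not from the dataset).
resort_names <- c("Mont<e8> Blanco", "Sn<f6>w Peak", "Alpine? Ridge", "Clean Name")

# One pattern covering a literal "?" and any literal <xx> two-digit hex code.
# Assumption: the codes are plain text such as "<e8>", not raw bytes.
bad_chars <- "[?]|<[0-9a-f]{2}>"

# Show which values still contain an unwanted character.
resort_names[grepl(bad_chars, resort_names)]

# Delete every match in a single gsub call.
resort_clean <- gsub(bad_chars, "", resort_names)
resort_clean
```

If the codes turn out to be raw bytes from a wrong file encoding, this pattern will not match them, and re-reading the file with the right encoding is probably the better fix.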
library(dplyr)  # provides %>% and distinct()
resorts02 %>% distinct(Country)
# repeat this check for the other categorical columns:
# Country, Continent, Season, Child friendly, Snowparks, Nightskiing, Summer skiing
# this is quick because those columns have only a few unique values
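Checking the distinct values of several columns can also be done in one step. The sketch below uses base R on a small made-up data frame; `toy`, `cols` and `distinct_values` are hypothetical names, and the column names are assumptions that may differ from the ones in the imported table.

```r
# Hypothetical stand-in for resorts02; values are made up for illustration.
toy <- data.frame(
  Country   = c("Norway", "Austria", "Norway"),
  Continent = c("Europe", "Europe", "Europe"),
  Snowparks = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Columns to inspect (assumed names; adjust to the real data frame).
cols <- c("Country", "Continent", "Snowparks")

# unique() on each column returns its distinct values in order of appearance.
distinct_values <- lapply(toy[cols], unique)
distinct_values
```

The result is a named list with one entry per column, so a typo or stray spelling shows up as an unexpected extra value.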
write.csv(resorts02, file = 'resorts.csv')
write.csv(snow, file = 'snow.csv')
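One detail worth knowing about the export: by default write.csv also writes the row names as an extra unnamed first column. If that column is not wanted in the dashboard tool, a sketch like the following avoids it. The data frame `toy` and the temporary file are hypothetical and used only for illustration.

```r
# Hypothetical stand-in for resorts02; values are made up for illustration.
toy <- data.frame(
  Resort  = c("Clean Name", "Snow Peak"),
  Country = c("Norway", "Austria")
)

# Write to a temporary file so the sketch does not overwrite anything.
out_file <- tempfile(fileext = ".csv")
write.csv(toy, file = out_file, row.names = FALSE)

# Read the file back to confirm only the original columns were written.
check <- read.csv(out_file)
names(check)
```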