Project Description: We wanted to find the three counties in each state with the highest Covid-19 cases per capita. We often hear about where case counts are high, but we wondered how counties compare on cases per capita.
We found that the counties with the highest cases per capita are not always the counties with the highest populations.
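The per-capita ranking described above can be sketched outside of a workflow tool as well. The snippet below is a minimal illustration in plain Python, not the project's actual workflow: the function name `top_counties_per_capita`, the row layout `(state, county, cases, population)`, and the sample numbers are all assumptions made up for the example.

```python
from collections import defaultdict


def top_counties_per_capita(rows, n=3):
    """Return {state: [(county, cases_per_capita), ...]} with the n highest rates.

    `rows` is an iterable of (state, county, cases, population) tuples.
    This layout is hypothetical; adapt it to the real data source.
    """
    by_state = defaultdict(list)
    for state, county, cases, population in rows:
        if population <= 0:
            continue  # skip rows without a usable population figure
        by_state[state].append((county, cases / population))
    return {
        state: sorted(counties, key=lambda c: c[1], reverse=True)[:n]
        for state, counties in by_state.items()
    }


# Made-up illustrative numbers, not real Covid-19 data.
sample = [
    ("AA", "Big County", 5000, 1_000_000),
    ("AA", "Small County", 300, 10_000),
    ("AA", "Mid County", 1200, 100_000),
    ("AA", "Tiny County", 10, 5_000),
]
result = top_counties_per_capita(sample)
```

In this made-up sample the most populous county has the most cases but not the highest rate, which mirrors the finding stated above.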
Project Description - Understanding usage patterns and trends for Divvy, the bike sharing service in Chicago. We explored 2019 data, which provides one observation per ride or rental along with information on the rider such as birth year, gender, and user type. For our project, we set out to answer the following:
Most popular starting station
Most popular ending station
Differences in trip length by Age, Gender, Customer type
Visualize a heat map of popular bike locations to understand usage patterns
Create a Predictive Model for forecasting rides from each station by month
Note: This project was originally completed with the full 2019 data and then reduced to the second half of the year (Q3/Q4) because of size limits on uploading and sharing to the community. If you are interested in analyzing the full 2019 data set, you can download the Q1/Q2 2019 files from the link above and connect them to the Join tool in the workflow.
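The first three questions in the list above come down to counting and grouping. The following is a rough sketch in plain Python rather than the workflow itself; the field names (`from_station`, `to_station`, `usertype`, `tripduration`) and the sample rides are assumptions for illustration and may not match the actual Divvy file columns.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical ride records; durations are in seconds.
rides = [
    {"from_station": "A", "to_station": "B", "usertype": "Subscriber", "tripduration": 600},
    {"from_station": "A", "to_station": "C", "usertype": "Customer", "tripduration": 1800},
    {"from_station": "B", "to_station": "C", "usertype": "Subscriber", "tripduration": 300},
    {"from_station": "A", "to_station": "C", "usertype": "Customer", "tripduration": 1200},
]


def most_popular(rides, field):
    """Return (value, count) for the most frequent value of `field`."""
    return Counter(r[field] for r in rides).most_common(1)[0]


def mean_duration_by(rides, field):
    """Return {group: mean trip duration} grouped by `field`."""
    groups = defaultdict(list)
    for r in rides:
        groups[r[field]].append(r["tripduration"])
    return {group: mean(durations) for group, durations in groups.items()}


top_start = most_popular(rides, "from_station")
top_end = most_popular(rides, "to_station")
duration_by_type = mean_duration_by(rides, "usertype")
```

The same `mean_duration_by` helper could be pointed at an age band or gender field to cover the other comparisons.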
Our team chose a data set of red and white wines and their corresponding attributes from the UC Irvine Machine Learning Repository. The datasets contain 12 variables, including one for quality. Based on this quality attribute (scale of 1-10), we decided to create a binary classifier of Good (quality >= 7) or Bad (quality < 7). This workflow uses data ingestion, feature engineering, the R toolkit, and the Python SDK to assess different classification algorithms and their prediction performance (using AUC).
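To make the labeling rule and the AUC metric concrete, here is a small sketch in plain Python. It is not the team's workflow: `label_good` and `auc` are hypothetical helper names, the scores are made up, and AUC is computed directly from its pairwise definition (the share of Good/Bad pairs in which the Good wine gets the higher score, with ties counted as half) rather than through any particular library.

```python
def label_good(quality, threshold=7):
    """Binary target: 1 (Good) if quality >= threshold, else 0 (Bad)."""
    return 1 if quality >= threshold else 0


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This is O(P * N), fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up quality ratings and classifier scores, for illustration only.
qualities = [5, 7, 6, 8, 4]
labels = [label_good(q) for q in qualities]
scores = [0.2, 0.9, 0.6, 0.5, 0.1]
model_auc = auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which is what makes it a convenient single number for comparing classifiers on the same labels.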
Initially our workflow got away from us: we had four nasty formulas in a row just to get the times into a different format. Then we discovered the Date-Time tool, and it was pretty much smooth sailing from there. We ended up with two solutions, and both are posted below.
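The same lesson applies in code: a dedicated date-time parser usually replaces a chain of string formulas. The sketch below shows the idea with Python's standard `datetime` module. The input and output formats are assumptions chosen for illustration, since the original time format is not stated here, and `reformat_time` is a hypothetical helper name.

```python
from datetime import datetime


def reformat_time(text, in_format="%m/%d/%Y %I:%M %p", out_format="%Y-%m-%d %H:%M:%S"):
    """Parse `text` using `in_format` and render it using `out_format`.

    Both default formats are illustrative assumptions, not the formats
    used in the original workflow. %p parsing assumes an English locale.
    """
    return datetime.strptime(text, in_format).strftime(out_format)


converted = reformat_time("07/04/2019 3:05 PM")
```

Declaring the input format once, instead of slicing and concatenating substrings, also means malformed values raise a `ValueError` rather than passing through silently.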
We wanted to determine which movie streaming service has the best movies based on ratings from Rotten Tomatoes and IMDb. We then wanted to explore whether this changed with age group (over 18 and under 18).