Since I already have a linear regression model that helps me predict my next fare, the next logical step is to predict driver profit. To do this, I want to use logistic regression, with driver profit as a yes-or-no outcome. But I want help doing some of the heavy "lyfting".
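For readers new to logistic regression, here is a minimal Python sketch of how such a model turns ride attributes into a yes-or-no profit prediction. The column names (fare, miles) and the weights are made up for illustration; they are assumptions, not values from my dataset or from any Alteryx tool.

```python
from math import exp

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def predict_profit(fare, miles, weights, threshold=0.5):
    """Return (probability of profit, yes/no decision) for one ride."""
    z = weights["intercept"] + weights["fare"] * fare + weights["miles"] * miles
    probability = sigmoid(z)
    return probability, probability >= threshold

# Hypothetical weights: a fitted model would learn these from data.
weights = {"intercept": -1.0, "fare": 0.2, "miles": -0.3}
probability, is_profitable = predict_profit(fare=20.0, miles=5.0, weights=weights)
print(f"P(profit) = {probability:.2f}, profitable: {is_profitable}")
```

The threshold of 0.5 is a common default, not a requirement; a driver who cares more about avoiding unprofitable rides could raise it.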
The analytics problems I am solving are: As a predictive analytics newbie, I want guidance so I don’t have to guess if I am on the right trail. Some of the things I want help with include getting my dataset in tip top shape for building a predictive model (this includes data prep), and deciding which columns are good features. I also want to build more than one model and compare them.
As a Lyft driver I am concerned about profit and not knowing if there are Lyft ride factors that add to or subtract from profit. I am particularly interested in getting guided assistance in building a predictive model. I want a guided interface that helps me as a newbie -- I just want to get started.
I am using the Alteryx Predictive Analytics Starter Kit as a resource for some of my data since it is difficult to find customer demographic data. Sources for Lyft ride data and driver profit calculations include personal data and data collected from internet reports and studies including publicly available "vehicle cost of ownership" data. I am using Excel spreadsheets and Alteryx databases.
My workflow demonstrates how I use the beta version of the Assisted Modeling tool to take on some of the heavy "lyfting": setting data types, replacing missing values, and selecting features. I want to use the built-in features that come with Assisted Modeling to take the guesswork out of prep work. I also want help selecting which columns of my dataset would make the best features or predictors.
I start with an already blended dataset that contains data I've collected as a Lyft driver. The data is a test dataset and makes good use of the test data from the Alteryx Predictive Analytics Starter Kit Volume I. I first prep my dataset and create samples - saving 20 percent of the dataset to validate my model.
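To picture what saving 20 percent for validation means outside the workflow, here is a minimal Python sketch of a random holdout split. The ride records and the function name split_holdout are hypothetical; my actual sampling happens inside the Alteryx workflow, not in this code.

```python
import random

def split_holdout(rows, holdout_fraction=0.2, seed=42):
    """Shuffle rows and set aside a fraction of them for validation."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    validation = shuffled[:n_holdout]
    training = shuffled[n_holdout:]
    return training, validation

# Hypothetical ride records standing in for the blended dataset.
rides = [{"ride_id": i, "profit": i % 2} for i in range(100)]
train_rows, validation_rows = split_holdout(rides)
print(len(train_rows), len(validation_rows))
```

Shuffling before splitting matters when the rows are ordered, for example by date, because otherwise the validation sample would contain only the earliest rides.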
I sit back and watch as the Assisted Modeling beta does some of the heavy "lyfting" for me. There is a data typing step that changes my data types, setting household income to numeric, for example, and a cleanup step that replaces a few missing values it found in the age column. This is cool. Then there is a select columns step that points out that columns such as "household income" are pretty good predictors.
Step one: All I have to do here is determine what I want to predict and set it as the “target”. That’s easy, so I’m guided to move on to Step two.
Step two - Data Typing: In this step, the guided interface has pre-determined the data types of my columns (features). I accept the recommendations and I’m guided to move on to Step three.
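As a rough illustration of what a data typing step does, here is a minimal Python sketch that guesses whether a column is numeric or categorical and casts it accordingly. The rule used here (a column is numeric if every non-empty value parses as a number) is my own simplifying assumption, and the sample values are made up; the post does not show how Assisted Modeling makes its recommendations.

```python
def infer_column_type(values):
    """Return 'numeric' if every non-empty value parses as a float, else 'categorical'."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "unknown"
    for v in present:
        try:
            float(v)
        except (TypeError, ValueError):
            return "categorical"
    return "numeric"

def cast_column(values, column_type):
    """Convert numeric columns to floats, keeping empty cells as None."""
    if column_type != "numeric":
        return list(values)
    return [float(v) if v not in ("", None) else None for v in values]

# Hypothetical columns as they might arrive from a spreadsheet (all strings).
columns = {
    "household_income": ["52000", "61000", "", "48000"],
    "ride_type": ["standard", "shared", "standard", "xl"],
}
inferred = {name: infer_column_type(vals) for name, vals in columns.items()}
typed = {name: cast_column(vals, inferred[name]) for name, vals in columns.items()}
print(inferred)
```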
Step three – Clean up dataset: This interface seems to have everything I’m looking for so far. In Step three, the system has found columns where my dataset has missing information. This is something I would not have looked for prior to starting assisted modeling. I am glad the interface is helping me with this. It even recommends a missing value replacement action. After this, I am guided to move on to the next step.
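To show what a missing value replacement can look like, here is a minimal Python sketch that fills gaps in a numeric column with the median of the values that are present. Using the median is an assumption on my part, as are the sample ages; the post does not say which replacement action the tool recommended.

```python
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the present values.

    Returns the filled list and the number of values that were replaced.
    """
    present = [v for v in values if v is not None]
    if not present:
        return list(values), 0  # nothing to compute a median from
    fill = median(present)
    n_missing = sum(1 for v in values if v is None)
    return [fill if v is None else v for v in values], n_missing

# Hypothetical age column with two missing entries.
ages = [34, None, 45, 29, None, 52]
filled_ages, replaced = impute_median(ages)
print(filled_ages, replaced)
```

The median is less sensitive to a few extreme values than the mean, which is why it is a common choice for columns such as age or income.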
Step four – Select Columns: Step four is one of the actions I really wanted help with – finding out which columns in my dataset would make the best features or predictors. I think I can figure out from the graph and icons which features are at the top of the list. Since I have not seen the model yet, I don’t make any changes here, and I accept the recommendations.
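One simple way to think about ranking columns as predictors is to score each one by how strongly it correlates with the target. The Python sketch below does that with the absolute Pearson correlation. This is a stand-in technique chosen for illustration, not the method Assisted Modeling uses, and the column names and numbers are invented.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def rank_features(features, target):
    """Sort feature names by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical columns (income in thousands) and a yes/no profit target.
features = {
    "household_income": [40, 45, 60, 65, 80, 85],
    "ride_minutes": [12, 30, 14, 28, 13, 29],
    "constant_fee": [2, 2, 2, 2, 2, 2],
}
profitable = [0, 0, 0, 1, 1, 1]
ranking = rank_features(features, profitable)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A column that never changes, like constant_fee here, scores zero because it cannot help separate profitable rides from unprofitable ones.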
Step five – Select models: The guided steps take me to Step five where I can click a button to start building models. I can clearly see which model is the most accurate, given the selections I’ve made so far and my dataset. There is even a Leaderboard that lets me compare the models I’ve built.
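The idea behind a leaderboard can be sketched in a few lines of Python: score every candidate model on the same held-out rows and sort by accuracy. The three "models" below are hand-written rules standing in for trained models, and the rows and labels are invented; none of this reflects the models or scores from my actual workflow.

```python
def accuracy(predict, rows, labels):
    """Fraction of rows where the prediction matches the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(labels)

def build_leaderboard(models, rows, labels):
    """Score every model on the same rows and sort from best to worst."""
    scored = [(name, accuracy(predict, rows, labels)) for name, predict in models.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical validation rides and whether each one was profitable (1) or not (0).
validation_rows = [
    {"fare": 8.0, "miles": 6.0},
    {"fare": 22.0, "miles": 5.0},
    {"fare": 15.0, "miles": 12.0},
    {"fare": 30.0, "miles": 9.0},
    {"fare": 9.0, "miles": 3.0},
]
validation_labels = [0, 1, 0, 1, 1]

# Stand-in models: simple rules rather than trained classifiers.
models = {
    "always_profitable": lambda row: 1,
    "fare_over_12": lambda row: 1 if row["fare"] > 12 else 0,
    "fare_per_mile_over_2": lambda row: 1 if row["fare"] / row["miles"] > 2 else 0,
}
leaderboard = build_leaderboard(models, validation_rows, validation_labels)
for name, score in leaderboard:
    print(f"{name}: {score:.0%}")
```

Scoring on rows the models did not see during training is what makes the comparison fair, which is why the 20 percent sample is set aside at the start.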
Assisted and automated tools are making it easier for business analysts and citizen data scientists to get started quickly in predictive analytics.
Use of the Assisted Modeling tool requires participation in the Alteryx Analytics Beta program. Visit the Alteryx beta program, also known as the Alteryx Customer Feedback Program, to find out more.