First of all - amazing work in designing the Predictive Server! It's definitely a refreshing enhancement to our product roadmap, one that catapults us toward being a serious leader in the Data Science space. Please find my feedback below, based on my experience using Predictive Server so far:
Data Input & Field Selection Needs to be More Intuitive
- Changing the dataset to a more up-to-date one is not possible at the moment. I assume this will be incorporated in the actual GA?
- When I drop a field accidentally, I cannot bring it back if I want to repeat the analysis. Assisted Modeling allows me to do this; perhaps consider a checkbox to select/de-select a field as and when a user wants?
- ID fields are not automatically detected, as they were in Assisted Modeling.
Data Health Should Go Beyond Just “Good to Know”
- Very good addition to the Predictive Server; it gives a snapshot that allows users to sense-check their data. However, so what if I know my data health score is a B? C? D+? What's the actionable item here?
- Outlier analysis can be developed further: should I keep the outliers? Should I transform them (e.g. with a log transformation)? Do outliers only exist under certain conditions (e.g. for certain customer personas)?
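To make the outlier question concrete, here is a minimal pandas sketch of one possible flow: flag outliers with the common IQR rule, then tame the skew with a log transform instead of dropping rows. Everything here is illustrative; the `revenue` column and the synthetic data are assumptions for the sketch, not anything Predictive Server does today:

```python
import numpy as np
import pandas as pd

# Illustrative data: a skewed "revenue" column with a few extreme values.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "revenue": np.concatenate([rng.lognormal(3, 0.5, 95),
                               [500, 800, 1200, 2000, 5000]])  # injected outliers
})

# 1. Flag outliers with the standard 1.5 * IQR rule.
q1, q3 = df["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = (df["revenue"] < q1 - 1.5 * iqr) | (df["revenue"] > q3 + 1.5 * iqr)

# 2. One remediation option: a log transform compresses the right tail
#    instead of discarding the rows outright.
df["log_revenue"] = np.log1p(df["revenue"])

print(df["is_outlier"].sum(), "outliers flagged out of", len(df))
```

A "keep / transform / investigate by segment" recommendation built on something like this would turn the health score into an actionable item.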
Correlation Analysis / Chord Diagram / Predictive Power should be Streamlined Further
- Chord diagrams are pretty, but are they necessary? With too many variables, the chord diagram can look messy and unfriendly.
- Correlation/association analysis typically assists users in two scenarios:
1. Remove 1 of 2 highly-correlated explanatory variables
2. Understand which explanatory variables are highly-correlated to the target variable
- The above insights should be teased out automatically so that users can take immediate action on which variables to keep and which to remove
- Showing a correlation of 1.0 for a variable against itself (e.g. Column A x Column A = 1.0) is redundant; can this be removed?
- In Assisted Modeling, users are informed of the variables with the highest/lowest predictive power (using GKT, Gini, etc.) and given recommendations on which variables to keep and which to drop; this is missing in Predictive Server.
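As a sketch of how those two scenarios could be teased out automatically (the column names, the synthetic data, and the 0.9 threshold are all assumptions for illustration, not Predictive Server behavior):

```python
import numpy as np
import pandas as pd

# Illustrative dataset: x2 is nearly a copy of x1, and the target is
# driven mainly by x1. All names are made up for this sketch.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.05, size=n),  # ~0.999 correlation with x1
    "x3": rng.normal(size=n),
    "target": 2 * x1 + 0.5 * rng.normal(size=n),
})

# Scenario 1: flag highly correlated pairs of explanatory variables so
# that one of each pair can be dropped.
features = df.drop(columns="target")
corr = features.corr().abs()
# Keep only the upper triangle, so each pair appears once and the
# trivial 1.0 self-correlations disappear entirely.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
redundant = pairs[pairs > 0.9]

# Scenario 2: rank explanatory variables by correlation with the target.
target_corr = df.corr()["target"].drop("target").abs().sort_values(ascending=False)

print("redundant pairs:", list(redundant.index))
print("ranking vs target:", list(target_corr.index))
```

The redundant pairs and the target ranking are exactly the "which to keep, which to remove" output suggested above, and masking the upper triangle also removes the redundant 1.0 self-correlations.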
Feature Engineering Needs to be “Emphasized” More
- I can see that FeatureLabs' feature engineering capabilities have been subtly incorporated into the Predictive Server. Users can even select (or de-select) which types of feature engineering are applied; however, to do so they have to click the Advanced configuration button. Maybe this could be highlighted more prominently?
- When features are finally generated and I click on the available Features, only the original features are shown; the engineered features are missing. Even in the Relative Importance chart in the next step, the engineered features do not appear. Is this a bug?
- Feature Engineering can be a huge differentiating factor between us and competitors like DataRobot, Dataiku, H2O.ai, etc. I believe it is not highlighted/showcased prominently enough in the Predictive Server user experience; please highlight this! 🙂 It's amazing!
Model Explainability Needs to be More Transparent (something our current R-based Predictive Tools perform really well at)
- Same feedback as for Assisted Modeling: while we can see the relative ranking of predictive variables from most important to least important, business users cannot assess the extent of the impact (e.g. if we increase Variable A by 10%, by how much will my target profit increase?). Maybe the upcoming Simulator & Partial Dependency Plot will address this issue?
- For regression methods, maybe incorporate coefficient analysis? And Odds Ratio analysis for logistic regression?
- Decision Tree... but no tree to show? This was an important feature: some of my customers love simply using the R-based Decision Tree to evaluate the different possible decision "routes" and take immediate action.
- From the Relative Importance chart, what actionable insights can users draw? E.g. I know that product pricing is super important in predicting customer churn, but should I increase or decrease my product's prices in order to reduce churn? The Relative Importance chart does not show this magnitude analysis.
- This is a GOLD MINE, because non-developers (aka consumers) can leverage the Predictive Server to look inside that "black box", understand why predictions come out the way they do, and simulate the different scenarios that are possible (see below).
More Robust & Governed User Story / Persona is Required
- We have a use case at a Global Bank in Singapore where Predictive Server is a good fit, because the Bank's RMs are now tasked with performing quick simulations to predict client profitability before onboarding someone as a client. The user story goes like this:
1. Avinash the analyst uses Intelligence Suite to prototype some models on the desktop and get a sense check of what can be done
2. Once the business case is approved, he then models and productionizes the insights in Predictive Server (Developer Persona)
3. Predictive Server will also be exposed to the Bank RMs, who log in as a Consumer Persona. They can view the insights generated and play with the Simulator & Partial Dependency Plot to simulate different financial scenarios before deciding to onboard a profitable client. However, they should not have the ability to edit/re-train/re-deploy and can only consume insights (re-training permissions should belong only to the Developer Persona, aka Avinash)
Right now, Predictive Server only takes the Developer Persona into account. I assume there will be a separate user journey/story for the Consumer Persona (e.g. like the Artisan vs. Viewer concept in Server/AAH) in the GA release?
Open-Source Integration in the Future?
Out of curiosity, will there be some form of Python/R integration inside the Predictive Server so that, at any point, Developers can incorporate their own custom code as part of the pipeline? Will this feature be part of the Predictive Server roadmap?
Michael, your feedback is FANTASTIC. Many (actually, I'd venture to say most) of the items are well on their way and were planned prior to GA. We will be adding more transparency items, and yes, the Simulator, Partial Dependency Plots, and even Shapley values are all being added. The ability to adjust data on the fly and iterate is also something I'm sure will develop further as the product iterates. Sharing projects will likewise develop more in the future, and I love your viewer persona; I'll be curious how common that is.
As for open source and the ability to drive Predictive Server with code: we will have an SDK at launch to drive the tech through Python code. As to your opening question about connecting directly to data, you can basically do this today by creating a Designer workflow and connecting to PS through a Designer tool. You can then run the workflow on a schedule, creating new models and automatically pushing the output to Designer, or view the new models in PS by logging in.