Predictive Server Beta

Thanks for participating in the Predictive Server Beta for Alteryx associates! Find resources, ask questions, share ideas, and collaborate with fellow participants.

Feedback after a couple of models

NickJ
Alteryx Alumni (Retired)

Hi everyone,

 

I created a couple of models using the beta over the weekend and here were my notes:

 

The first model used a non-English dataset of Iraqi tourism sites in an Arabic code page: https://app.staging.tempo.ayxcloud.com/projects/8de47fcf-7228-4596-b213-ed376e077aae - it didn't parse the Arabic text correctly (whereas Designer did, once the correct code page was specified). I won't call this a bug, but just be aware of internationalization around datasets.
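
For anyone checking a file like this outside the beta, here is a minimal Python sketch of reading CSV bytes with an explicit code page. The `cp1256` encoding (Windows Arabic), the helper name `read_csv_bytes`, and the sample row are all assumptions for illustration; the post does not say which code page the file actually used.

```python
import csv
import io

def read_csv_bytes(raw, encoding="cp1256"):
    """Decode raw CSV bytes with an explicit code page, then parse the rows."""
    text = raw.decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))

# Hypothetical two-column sample, encoded the way a legacy Arabic export might be.
sample = "site,city\nبابل,الحلة\n".encode("cp1256")

rows = read_csv_bytes(sample, encoding="cp1256")

# Decoding the same bytes with the wrong code page raises no error here;
# it silently produces mojibake, which is why the problem is easy to miss.
garbled = read_csv_bytes(sample, encoding="latin-1")
```

Comparing `rows` with `garbled` shows the same bytes producing readable Arabic in one case and unrelated accented Latin characters in the other.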

 

Second model was around making predictions of absenteeism from work using a dataset from UCI's ML dataset repository. I enriched the dataset in Alteryx Designer by creating a banding category from the raw numeric data for the number of hours an employee was absent from a job. https://app.staging.tempo.ayxcloud.com/projects/08606ae2-212a-4a27-89c4-3826d7a0ecad
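
The banding step was done in Designer, but the idea can be sketched in a few lines of Python. The band edges below are hypothetical (the post does not give the actual cut-offs), as is the helper name `band_hours`.

```python
from bisect import bisect_left

# Hypothetical upper edges (in hours) for bands 0, 1 and 2;
# anything above the last edge falls into band 3.
BAND_EDGES = (0, 8, 40)

def band_hours(hours, edges=BAND_EDGES):
    """Return the index of the first band whose upper edge is >= hours."""
    return bisect_left(edges, hours)

absent_hours = [0, 4, 8, 16, 40, 120]
bands = [band_hours(h) for h in absent_hours]
```

If the band is then used as the prediction target, the raw hours column it was derived from would need to be left out of the features, since it determines the band exactly.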

 

With this second model, I also compared the PS experience to the Assisted Modeling experience directly within Designer, because it was Saturday night during a lockdown and I had very little else to do 😉

 

OK - so, first step: loading the data. When initially loaded, there are 21 columns and 740 rows in the CSV. I found that I had to scroll below the fold to find the UI controls for scrolling right across the columns - this was clunky and felt like it was designed for a different resolution (greater than 1080p?).

 

With data types, I asked myself what the difference was between an Integer and a WholeNumber data type. It wasn't clear.

 

Once loaded, I instinctively went to the Correlation Matrix view, but very quickly asked myself "how do I get rid of the ID column in here?" - I eventually realized that I could delete/remove columns on the main profile page.

 

The color palette for the chord diagram is a bit samey - have we done any UX palette design to really help POP this? 

 

The beta PredServer didn't suggest removing columns that were too highly correlated with the target, which was annoying because without guidance here I could end up wasting a lot of time building models with 'bad' features. Contrast this with the experience in Assisted Modeling, which is more helpful - worth doing something similar?
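
As an illustration of the kind of check I mean, here is a rough Python sketch that flags numeric columns whose correlation with the target is suspiciously high. This is not how Assisted Modeling or PredServer does it; the 0.9 threshold, the function names, and the toy data are all assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def suspicious_features(columns, target, threshold=0.9):
    """Names of columns whose absolute correlation with the target meets the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

# Hypothetical toy data: 'hours' is the raw value the target band was derived from.
columns = {
    "hours": [0, 4, 8, 16, 24, 40],
    "age": [33, 50, 38, 39, 33, 28],
}
target = [0, 1, 1, 2, 2, 3]

flagged = suspicious_features(columns, target)
```

A flagged column is only a prompt to look closer; a strong predictor and a leaked copy of the target look the same to this check.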

 

PredServer didn't auto-deselect the "ID" field (whereas A/M did) - again, a simple bit of preprocessing that saves me time. 

 

When I built this second model, the score on the screen for all the various classifiers was zero - across the board. This may have been correct (i.e. the models are all equally terrible) or it may be a bug. Either way, because this was the case, PredServer didn't select a model for me in the dropdown to the left after the model build completed.

 

The ROC curve didn't appear, despite applying holdout data.

 

Using the Extra Trees classifier, Band 3 (the big-time absenters from work) got no predictions at all in the confusion matrix, whereas XGBoost in Assisted Modeling came up with more of a balance.
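
To make the "no predictions for Band 3" symptom concrete, here is a small Python sketch that builds confusion-matrix counts and lists any class the model never outputs. The labels are made up for illustration and are not the actual holdout results from either tool.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs; pairs that never occur count as zero."""
    return Counter(zip(actual, predicted))

def never_predicted(actual, predicted):
    """Labels that occur in the actuals but that the model never outputs."""
    return sorted(set(actual) - set(predicted))

# Hypothetical holdout labels: band 3 exists in the data but is never predicted.
actual    = [0, 1, 1, 2, 2, 3, 3, 1]
predicted = [0, 1, 1, 2, 1, 2, 2, 1]

matrix = confusion_counts(actual, predicted)
missing = never_predicted(actual, predicted)
class_sizes = Counter(actual)
```

Checking `class_sizes` alongside `missing` helps tell a rare class the model has given up on from a class that is simply absent from the holdout set.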

 

I'll keep playing with additional datasets. I've added the absentee data that I used to this post. (For some reason, it's grumbling about the Iraqi/Arabic dataset.)

 

Cheers,

Nick

Nick Jewell | datacurious.ai
3 REPLIES
NickJ
Alteryx Alumni (Retired)

A couple more things - I started working on a LendingClub predictive model where there's a field called 'bad', enriched from the raw data, and the dataset is sampled 50:50 to balance good vs bad loans. Around 2,300 rows with 50+ columns. https://app.staging.tempo.ayxcloud.com/projects/f12b19f2-0d10-44c1-a096-e48bfe74796d

 

Pred Data screen: From a newbie user perspective - I didn't realise that the stretch window icon to the right of the data actually opens the cool data viz/profiling screen. This needs to be a clearer feature, as it's where I'd want to spend more of my time at this point, checking for anomalies, etc.

 

Pred Data screen: With this dataset, there are lots of columns (it DID detect my ID field this time), but PredServer throws lots of warnings and it's quite laborious to visit each column in turn and then include/exclude. I also got this warning about a column... couldn't tell which:

 

[Screenshot: NickJ_0-1610389624767.png]

 

When my data has lots of columns/features - it might be nice to be able to pivot to a column of column names rather than the full wide table and then include/exclude from there? Trying to find all these warnings was user-friction (to me). 
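
Roughly what I have in mind, sketched in Python: one summary record per column instead of the full wide table. The field names, the ID-like heuristic, and the sample rows are all assumptions for illustration, not anything PredServer exposes.

```python
def summarize_columns(rows):
    """One summary record per column: null share, distinct count, and an ID-like flag."""
    names = list(rows[0].keys())
    summary = []
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v not in (None, "")]
        distinct = len(set(present))
        summary.append({
            "column": name,
            "null_share": 1 - len(present) / len(values),
            "distinct": distinct,
            # Heuristic (an assumption): fully populated and unique per row looks like an ID.
            "id_like": len(present) == len(values) and distinct == len(values),
        })
    return summary

# Hypothetical rows standing in for a wide loans table.
rows = [
    {"id": "a1", "grade": "A", "notes": ""},
    {"id": "a2", "grade": "B", "notes": ""},
    {"id": "a3", "grade": "A", "notes": "late"},
]

summary = summarize_columns(rows)
for record in summary:
    print(record)
```

With 50+ columns, sorting a list like this by `null_share` would surface the problem columns without visiting each one in turn. Note that the ID-like heuristic would also flag any continuous numeric column with no repeated values, so it is only a hint.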

 

Echoing Tracey's point about not detecting dates correctly at this stage. 

 

Unfortunately - when I ran the model with fewer columns after pruning down I received this error: 

 

[Screenshot: NickJ_1-1610389758461.png]

 

and couldn't proceed. 

 

I also ran this dataset through Assisted Modeling and got better guidance over column inclusion/imputation - I'd treat this as the better experience just now, and something to achieve UX parity with for our customer beta?

 

[Screenshot: NickJ_2-1610389868066.png]

 

 

Cheers!
Nick

 

thomasmcgrath
Alteryx Alumni (Retired)
Great feedback, Nick. I totally agree with you on the expanding part. I forget to even show it during demos sometimes! The concept of a pivot table has now been added to our backlog. I'd like to dive deep with you on the data checks though. I'll be following up.
AJacobson
Alteryx

It appears the dataset has a few columns that are all NULL or nearly all NULL. When I delete those columns it does run well, although there appears to be target leakage: I'm getting an AUC of 1.0 with two models. I'm looking now to identify where that is coming from, as there was no warning of this.
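
One way to hunt for the leaking column is to score each numeric feature on its own against the target and look for a single-feature AUC at or near 0.0 or 1.0. The Python sketch below is only an illustration of that idea, not what the product does; the column names, values, and the 0.01 margin are hypothetical.

```python
def single_feature_auc(values, labels):
    """AUC of one numeric feature used as a score for binary labels (1 = positive).

    Computed as the share of (positive, negative) pairs ranked correctly,
    with ties counted as half.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def leakage_suspects(columns, labels, margin=0.01):
    """Columns whose single-feature AUC is within `margin` of 0.0 or 1.0."""
    suspects = []
    for name, values in columns.items():
        auc = single_feature_auc(values, labels)
        if auc >= 1 - margin or auc <= margin:
            suspects.append(name)
    return suspects

# Hypothetical columns: 'recoveries' is only non-zero once a loan has gone bad.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "recoveries": [0.0, 0.0, 0.0, 120.5, 30.0, 410.0],
    "income": [52.0, 48.0, 75.0, 51.0, 80.0, 47.0],
}

suspects = leakage_suspects(columns, labels)
```

The pairwise AUC here is quadratic in the number of rows, which is fine for a couple of thousand rows but would want a rank-based version on anything much larger.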