Alteryx Designer Desktop Discussions

Create a pipeline in Alteryx Designer

Nadia_kraiem
5 - Atom

Hi all,

 

I'm solving a linear regression problem that consists of predicting house prices.

I want to know how to create a pipeline in Alteryx Designer so that I can prepare my test set in the same way as my train set.

In Intelligence suite, we can do a pipeline, so that if you have new instances they will be prepared in the same way as the train set, but in Alteryx Designer we don't have this possibility.

I thought about creating a macro to prepare my train set (imputation, dealing with outliers, selecting variables...) and applying the same macro to the test set.

Is there another way to create a pipeline?

Any ideas?

 

Thank you.

9 REPLIES
AndrewSu
Alteryx

@Nadia_kraiem ,  I can confirm that Intelligence Suite is an add-on to Designer.

 

Could you clarify what you mean when you say "In Intelligence suite, we can do a pipeline, so that if you have new instances they will be prepared in the same way as the train set, but in Alteryx Designer we don't have this possibility"?

 

You are able to do this in Designer because Intelligence Suite is a part of Designer. 

 

Let me know what I'm missing.  Please provide an example workflow as well if possible.  That will help us work together towards a solution. 

 

Thanks. 

Nadia_kraiem
5 - Atom

Hi @AndrewSu,

Thank you for your answer. 

Actually, I don't have an Intelligence Suite licence.

I want to create a pipeline that prepares the data.

I separated the train and test sets before doing the preparation.

I want to know whether that is a good way to do the preparation.

I will share my workflows.

- House price.yxmd contains the model.

- prep2.yxmc is a macro that contains the data preparation.

- result.yxmd contains the scoring of the new data.

Thank you

 

 

OllieClarke
15 - Aurora

@Nadia_kraiem

All the parts of the pipeline you mention can be built with tools from the Preparation tool palette: imputation, variable selection, and outlier detection (you might need a Summarize tool first for that last one). Once you're happy with your method, you can copy and paste these tools between your train and test data sets. You could wrap this up in a macro if you want, but you don't actually need to.

Nadia_kraiem
5 - Atom

@OllieClarke

 

To do the imputation, I calculated the median on the train set and applied it to the missing values of the test set.

If I apply these tools to the test set as you recommended, they will calculate the median of the test set, whereas I want to calculate the median of the train set and use that value to replace the missing values in the test set.

OllieClarke
15 - Aurora

@Nadia_kraiem 

If you append the median of the train set onto the test set, then you can write a simple formula

 

IF ISNULL([value]) THEN [median_value] ELSE [value] ENDIF

 

to impute the missing values. 
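
For readers who think in code, the append-then-formula approach corresponds roughly to the Python sketch below. It is only an illustration, not an Alteryx workflow: the sample values and the helper names `fit_median` and `impute` are made up for the example, and `None` stands in for a null.

```python
from statistics import median


def fit_median(train_values):
    """Learn the median from the training column, ignoring missing values."""
    return median(v for v in train_values if v is not None)


def impute(values, median_value):
    """Analogue of: IF ISNULL([value]) THEN [median_value] ELSE [value] ENDIF"""
    return [median_value if v is None else v for v in values]


# Hypothetical data: a single numeric column with gaps.
train = [100.0, None, 300.0, 200.0]
test = [None, 150.0, None]

train_median = fit_median(train)            # computed on the train set only
train_filled = impute(train, train_median)
test_filled = impute(test, train_median)    # test set reuses the train median
```

The point of the sketch is that the median is computed once, from the train set, and then treated as a fixed value when filling either data set.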

 

Does that make sense?

 

Ollie

AndrewSu
Alteryx

@Nadia_kraiem, I agree with @OllieClarke's suggestion. To apply the median values of the train set to the missing values of the test set, I imagine a simple Find Replace tool could also work: search for null or empty values in the test data set and replace them with the median value of the training data set.

 

See the documentation below for some more information on the find/replace tool. 

 

https://help.alteryx.com/20221/designer/find-replace-tool

 

If this resolves your issue, please mark this post as the solution so that others in the community can benefit from our collaboration.

 

Thank you. 

OllieClarke
15 - Aurora

@AndrewSu I thought the Find Replace tool was limited to string data types?

AndrewSu
Alteryx

@OllieClarke, yes, you are correct! @Nadia_kraiem, either the formula mentioned by @OllieClarke or the Find Replace tool can work (but with Find Replace you'd have to convert the field to a string type and then back to a numeric type).

 

Some people like fewer Formula tools in their workflow, while others don't mind them!

 

Many roads lead to Rome in Alteryx :)

Nadia_kraiem
5 - Atom

@OllieClarke

 

I wrote the formula in the macro as you suggested:

 

[Screenshot: Nadia_kraiem_1-1664823587547.png]

 

I wanted a data science point of view, because Python has functions that do this pipeline work, and I wanted to make sure I had reproduced the same logic in Alteryx.
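
For comparison, the fit/transform pattern that Python libraries such as scikit-learn use for pipelines can be sketched in plain Python as below. This is a rough illustration of the idea rather than any library's actual code: the classes `MedianImputer`, `Clipper`, and `Pipeline`, the column names, and the data are all hypothetical, and clipping to the train minimum and maximum is only a crude stand-in for outlier handling.

```python
from statistics import median


class MedianImputer:
    """Fill missing values (None) with the per-column median learned in fit()."""

    def fit(self, rows):
        self.medians = {
            c: median(r[c] for r in rows if r[c] is not None) for c in rows[0]
        }
        return self

    def transform(self, rows):
        return [
            {c: (self.medians[c] if v is None else v) for c, v in r.items()}
            for r in rows
        ]


class Clipper:
    """Cap each column to the [min, max] range seen during fit()."""

    def fit(self, rows):
        self.bounds = {
            c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in rows[0]
        }
        return self

    def transform(self, rows):
        return [
            {c: min(max(v, self.bounds[c][0]), self.bounds[c][1]) for c, v in r.items()}
            for r in rows
        ]


class Pipeline:
    """Chain steps: parameters are learned once in fit(), then reused in transform()."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, rows):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows


# Hypothetical house data with gaps (None) and an out-of-range test value.
train = [
    {"area": 50.0, "rooms": 2.0},
    {"area": None, "rooms": 3.0},
    {"area": 90.0, "rooms": None},
    {"area": 70.0, "rooms": 4.0},
]
test = [{"area": None, "rooms": 10.0}, {"area": 40.0, "rooms": None}]

prep = Pipeline([MedianImputer(), Clipper()]).fit(train)  # learn from train only
test_prepared = prep.transform(test)                      # apply to new data
```

A macro plays a similar role to `transform` here, provided the values it uses (medians, outlier thresholds) come from the train set and are passed in rather than recalculated on whatever data flows through it.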

 

Thank you

 
