Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
Garabujo7
Alteryx

 

Garabujo7_0-1633531823132.png

 

 

Feature engineering

Feature engineering is the creation of new variables from existing ones. It is part of data preparation and is also widely used in machine learning.

 

New variables are created from business knowledge, intuition, and technical experience, which usually makes this a complex process. In most cases it is also manual, so it is time-consuming and requires knowledge of SQL and programming.
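To give a sense of what doing this by hand looks like, here is a minimal Python sketch that derives two new variables from existing ones. The records and column names (`total`, `units`, `order_date`) are hypothetical and only serve as an illustration; they are not taken from any Alteryx dataset.

```python
from datetime import date

# Hypothetical raw records; the column names are illustrative only.
orders = [
    {"order_id": 1, "order_date": date(2021, 3, 14), "total": 120.0, "units": 4},
    {"order_id": 2, "order_date": date(2021, 11, 2), "total": 45.0, "units": 3},
]


def add_features(row):
    """Return a copy of the row with two derived variables added."""
    enriched = dict(row)
    # Ratio feature: average price paid per unit.
    enriched["price_per_unit"] = row["total"] / row["units"]
    # Calendar feature: month extracted from the order date.
    enriched["order_month"] = row["order_date"].month
    return enriched


features = [add_features(r) for r in orders]
```

Each derived column encodes a piece of business knowledge (price per unit, seasonality) that the raw columns only contain implicitly.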

 

Usually, the data used in machine learning procedures is scattered across multiple systems and tables, so it must first be integrated into a single table.
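The integration step can be sketched in plain Python as follows. This is only an illustration under assumed inputs: the two tables (`customers`, `transactions`) and the helper `build_training_table` are hypothetical names, and in Designer the same result would normally be built with join and summarize tools rather than code.

```python
from collections import defaultdict

# Hypothetical tables coming from two different systems.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "wholesale"},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 500.0},
]


def build_training_table(customers, transactions):
    """Aggregate transactions per customer and attach them to the customer rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for t in transactions:
        totals[t["customer_id"]] += t["amount"]
        counts[t["customer_id"]] += 1

    table = []
    for c in customers:
        cid = c["customer_id"]
        # One row per customer: original attributes plus aggregated features.
        table.append({**c, "n_transactions": counts[cid], "total_amount": totals[cid]})
    return table


training_table = build_training_table(customers, transactions)
```

The result has one row per customer, which is the shape most machine learning procedures expect.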

 

That is why Alteryx Designer version 2021.1 added four new analytical tools to the Machine Learning tab, to save time and simplify the process. We'll focus on two:

 

  • Feature Types
  • Data Health

Garabujo7_1-1633531863535.png

 

 

We recommend using these new tools to simplify the process, add value to the business, and build high-quality machine learning models.

 

Garabujo7_2-1633531885797.png

 

Feature Types

To make an analytical model more effective, you can enrich its input data with additional information. For that we will use the Feature Types tool, which is also very easy to configure.

 

For each feature, you can select its name, its data type, the enriched data type to add, and the type of output the tool will create.

 

Garabujo7_3-1633531906709.png

 

We recommend using automatic detection to make this easier: the enriched data types are then assigned automatically.

 

The following is the list of all available enriched data types. With these types, the model has more information during training and can give better results.

 

For example, if a column's data type is date, the enriched type lets us state that the date is a person's date of birth (or a delta from another date variable), and with this information the data is better explained.
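To illustrate why this extra meaning matters, here is a small Python sketch of the kind of features that become possible once a date is known to be a birth date or one end of a delta. The function names `age_in_years` and `days_between` are hypothetical helpers written for this example; they do not describe what the Feature Types tool does internally.

```python
from datetime import date


def age_in_years(birth_date, reference_date):
    """Whole years elapsed between birth_date and reference_date."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def days_between(start, end):
    """Delta in days between two date variables."""
    return (end - start).days


# Example values, chosen only for illustration.
customer_age = age_in_years(date(1990, 6, 15), date(2021, 6, 14))
days_to_ship = days_between(date(2021, 1, 1), date(2021, 1, 31))
```

A raw date carries little signal for a model on its own, whereas an age or a duration in days is a number the model can use directly.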

 

Garabujo7_4-1633531966069.png

 

In the next article I will talk about the Data Health tool. Stay tuned!

 

Banner image by pisauikan-4552082