Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Make Categorical Data Numeric

cpituley
8 - Asteroid

Hi Alteryx Community, 

 

I am trying to take a categorical data set and transform it into a purely numeric. Edit: I should add that this doesn't need to be ordinal since there is no relation between variables in a given column. 

 

I am thinking this involves the following steps: 

 

1) Remove duplicates from each column

2) Assign a unique ID (starting base 0) to each unique value in each column

3) Map the original data set back to the unique values

 

Example as follows: 

 

Before

Field 1 | Field 2 | Field 3

abc      | def       | ghi

abc      | def       |stu

jkl        | def       | ghi

jkl        | pqr       | stu

mno    | pqr       | ghi

mno    | pqr       | zzz

 

After:

Field 1 | Field 2 | Field 3

0          | 0          |0

0          | 0          |1

1          | 0          |0

1          | 1          |1

2          | 1          |0

2          | 1          | 2

 

Looking forward to your help. 

 

Thank you,

Cameron

3 REPLIES 3
CharlieS
17 - Castor
17 - Castor

I've attached a couple of solutions for scenarios like this. 

 

"UniqueCategorical.yxmd" will assign a numeric value (1+) to unique string characters in each field. This solution relies on the Tile tool configured to unique value. To start the assignment with 0, use a Formula tool after the tile tool to update the Tile_SequenceNum field to [Tile_SequenceNum]-1.

 

If you'll be using these values in statistic testing, I have another solution that converts string field values to binary flags for each field/value combination. This is the "DynamicDummies.yxmd" attachment. 

cpituley
8 - Asteroid

@CharlieS

 

This is brilliant. Your unique categorical is exactly what I was looking for. I was missing the tile tool set to unique. I was trying to use the unique tool and then find and replace. 

 

Cheers,

 

Cameron

irmb
7 - Meteor

Hello guys, I am very  new to Alteryx. I also have to convert some categorical values into numerical in order to do a linear regression. I am having difficulty applying the workflow provided. The issue is this:

I have two excel spreadsheets with thousands of data. One spreadsheet has names of people, state, address, country. The other table has names of people, products purchased, product category, sizes. I want to convert the product category, country, product category and size variables into numerical variables.  When I join both tables,  I don't know where the attach the workflow provided . Do I apply it to the join? Or do I apply the workflow to each separate table? The problem is that one of the tables and the combined table has one-to-many relationships. For example, one customer has bought products in different categories, etc. So the "product category" has fields where the categories repeat. How do I approach doing the conversion?

 

Thanks

Labels