Make Categorical Data Numeric
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Alteryx Community,
I am trying to take a categorical data set and transform it into a purely numeric. Edit: I should add that this doesn't need to be ordinal since there is no relation between variables in a given column.
I am thinking this involves the following steps:
1) Remove duplicates from each column
2) Assign a unique ID (starting base 0) to each unique value in each column
3) Map the original data set back to the unique values
Example as follows:
Before
Field 1 | Field 2 | Field 3
abc | def | ghi
abc | def |stu
jkl | def | ghi
jkl | pqr | stu
mno | pqr | ghi
mno | pqr | zzz
After:
Field 1 | Field 2 | Field 3
0 | 0 |0
0 | 0 |1
1 | 0 |0
1 | 1 |1
2 | 1 |0
2 | 1 | 2
Looking forward to your help.
Thank you,
Cameron
Solved! Go to Solution.
- Labels:
- Datasets
- Preparation
- Transformation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I've attached a couple of solutions for scenarios like this.
"UniqueCategorical.yxmd" will assign a numeric value (1+) to unique string characters in each field. This solution relies on the Tile tool configured to unique value. To start the assignment with 0, use a Formula tool after the tile tool to update the Tile_SequenceNum field to [Tile_SequenceNum]-1.
If you'll be using these values in statistic testing, I have another solution that converts string field values to binary flags for each field/value combination. This is the "DynamicDummies.yxmd" attachment.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
This is brilliant. Your unique categorical is exactly what I was looking for. I was missing the tile tool set to unique. I was trying to use the unique tool and then find and replace.
Cheers,
Cameron
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello guys, I am very new to Alteryx. I also have to convert some categorical values into numerical in order to do a linear regression. I am having difficulty applying the workflow provided. The issue is this:
I have two excel spreadsheets with thousands of data. One spreadsheet has names of people, state, address, country. The other table has names of people, products purchased, product category, sizes. I want to convert the product category, country, product category and size variables into numerical variables. When I join both tables, I don't know where the attach the workflow provided . Do I apply it to the join? Or do I apply the workflow to each separate table? The problem is that one of the tables and the combined table has one-to-many relationships. For example, one customer has bought products in different categories, etc. So the "product category" has fields where the categories repeat. How do I approach doing the conversion?
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
It will be good if you provide some sampe data in an excel sheet for us to help us.
