Alteryx Designer Desktop Discussions
SOLVED

Converting zip code into number

ayadav8
8 - Asteroid

Hello,

I have a list of zip codes that I am trying to use in my predictive model. Converting the zip codes to numbers with the Select tool turns 06001 into 6001, and I need to keep the leading zero. Is there a way to store them as numbers without losing the leading zeros?

Thanks

6 REPLIES
BenMoss
ACE Emeritus

You should not be using a zip code in a model that requires its variables to be continuous measures!

I suggest you look at other models that allow you to use string fields as predictor variables.

Ben

ayadav8
8 - Asteroid

@BenMoss It is a transportation problem where the start and end destinations play an important role. I wanted to try a decision tree model, which doesn't take string predictors.

BenMoss
ACE Emeritus

The forest model does allow you to take in string fields, though?

Irrespective of whether you want to include it in the model, you must make the right model selection for the data you have, not change the data to suit the model!

Ben

BenMoss
ACE Emeritus

Here's a post that may be helpful:

'One of my favorite uses of zip code data is to look up demographic variables based on zip code that may not be available at the individual level otherwise...

For instance, with http://www.city-data.com/ you can look up income distribution, age ranges, etc., which might tell you something about your data. These continuous variables are often far more useful than just going based on binarized zip codes, at least for relatively finite amounts of data.

Also, zip codes are hierarchical... if you take the first two or three digits, and binarize based on those, you have some amount of regional information, which gets you more data than individual zips.

As Zach said, using latitude and longitude can also be useful, especially in a tree-based model. For a regularized linear model, you can use quadtrees: split the United States into four geographic groups and binarize those, then split each of those areas into four groups, and include those as additional binary variables... so for n total leaf regions you end up with [(4n - 1)/3 - 1] total variables (n for the smallest regions, n/4 for the next level up, etc). Of course this is multicollinear, which is why regularization is needed to do this.'
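
The two ideas in the quoted post, zip code prefixes and the quadtree variable count, can be checked with a small Python sketch. This is only an illustration, not part of the quoted post or an Alteryx tool: the function names `zip_prefix` and `quadtree_variable_count` are made up here, and the count assumes a full quadtree whose number of leaf regions is a power of 4.

```python
def zip_prefix(zip_code: str, digits: int = 3) -> str:
    """Return the leading digits of a zip code that is kept as text.

    zfill restores a leading zero that was lost in a numeric conversion.
    """
    return zip_code.zfill(5)[:digits]


def quadtree_variable_count(n_leaves: int) -> int:
    """Count binary indicators for a full quadtree with n_leaves leaf regions.

    Sums n + n/4 + ... + 4, leaving out the single root region, which
    matches the closed form (4n - 1)/3 - 1 from the quoted post.
    """
    if n_leaves < 4:
        raise ValueError("need at least one split (4 leaf regions)")
    total, level = 0, n_leaves
    while level > 1:
        if level % 4:
            raise ValueError("n_leaves must be a power of 4")
        total += level   # one indicator per region at this level
        level //= 4      # move one level up the tree
    return total


print(zip_prefix("06001"))          # 060
print(quadtree_variable_count(16))  # 20
```

With 16 leaf regions the sum is 16 + 4 = 20, the same value the formula gives: (4 * 16 - 1)/3 - 1 = 20.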

asilva
7 - Meteor

Hi ayadav8,

 

When using zip codes in modeling, you want to treat them as a categorical variable. Even though they look numeric, the number doesn't mean anything beyond identifying the area where an individual lives. A zip code isn't like temperature, for example: when you increase or decrease temperature, the number of degrees (F or C) actually changes. What happens when you increase or decrease a zip code? It does not mean you are changing the amount of something. I like what someone mentioned earlier about using median income or population to describe the zip code as a numeric value.

 

-Tony
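
For anyone who wants to see the categorical approach concretely, here is a minimal Python sketch. It assumes the zip codes are stored as zero-padded strings and that the model needs numeric inputs, in which case each distinct zip code becomes its own 0/1 column (one-hot encoding). The helper names `normalize_zip` and `one_hot` are hypothetical, and this is not how any particular Alteryx tool encodes strings internally.

```python
def normalize_zip(value) -> str:
    """Keep a zip code as 5-character text so the leading zero survives."""
    return str(value).strip().zfill(5)


def one_hot(zips):
    """Turn a list of zip code strings into 0/1 indicator rows.

    Returns the sorted category labels and one row per input value,
    with a 1 in the column of the matching category.
    """
    categories = sorted(set(zips))
    rows = [[1 if z == c else 0 for c in categories] for z in zips]
    return categories, rows


zips = [normalize_zip(v) for v in (6001, "10001", "06001")]
categories, rows = one_hot(zips)
print(categories)  # ['06001', '10001']
print(rows)        # [[1, 0], [0, 1], [1, 0]]
```

With many distinct zip codes this produces a very wide table, which is one reason the earlier replies suggest zip prefixes or demographic variables such as median income instead.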

ayadav8
8 - Asteroid

@asilva Yeah, I see what you're saying. Thanks!
