Hello,
I've been unable to accomplish the following task with joins, unions, and the find a replace function.
Task
Table 1 contains 70, 000 records of customer orders and each record contains a zipcode with other fields (multiple records can have the same zipcode while all the other fields will be different)
example of Table 1 records (72,000 total records):
Customer # | Order # | Zip code |
0001 | 10001 | 30071 |
0002 | 10002 | 30087 |
0003 | 10003 | 30071 |
0004 | 10004 | 30051 |
Table 2 contains 50,000 records and each records contains just a zipcode, FIPS (Federal Information Processing Standards) code, and a Rural/Urban Code (i.e. "1"= rural area with pop less than 20K, "2" = metro area with pop greater than 100K) The zip codes in table 2 are unique, but multiple zipcodes can have the same Rural/Urban code (i.e. zipcodes 30071, 30087, and 30067 have the Rural/Urban code of "1"
Example of Table 2 records (50,000 total records)
Zipcode | FIPS | Rural/Urbanl Code |
30071 | 0087 | 1 |
30087 | 0082 | 1 |
30077 | 0081 | 2 |
What I want is
Table 1
Customer # | Order # | Zip code | Rural/ Urban code |
0001 | 10001 | 30071 | 1 |
0002 | 10002 | 30087 | 1 |
0003 | 10003 | 30071 | 1 |
0004 | 10004 | 30077 | 2 |
What I get is this (using "find and replace" with the "entire field" value checked and "Append Fields to Record")
Customer # | Order # | Zip code | Rural/ Urban code |
0001 | 10001 | 30071 | 1 |
0002 | 10002 | 30087 | 1 |
0003 | 10003 | 30071 | 1 |
0004 | 10004 | 30077 | 1 |
The mapping of Zip codes to Rural/Urban code in table 1 is incorrect. The Rural/Urban codes range in values from 1-9, but when I use the find and replace function not all the values are used. The total number of records (72,000) is correct but the Zip codes to Rural/Urban mapping is wrong.
When I use the join and union function the mapping is correct but the number of records is wrong. After I run the "join" and "union" function I get a total of 115,000 records instead of the original 72,000. I've attached a copy of the Rural/Urban code lookup table. The correct tab is FIPS to Zips. Please let me know what I should do.
Solved! Go to Solution.
It seems to me if you join these two tables on Zip code you should get what you want. The only thing that would mess this up is if there are multiple rows in table 2 for the same zip code.
You should only be "Unioning" Table 1 side and the joined.
Double check to see if there are multiple rows per zip in table 2. I did it with your sample data and it was correct(attached).
Best,
MSalvage
MSalvage,
You are right. Table 2 does contain duplicate zip codes. I downloaded table 2 from the U.S. Department of Housing and Urban Development website and after reading your comments and doing more research I found out that a 1 zip code can be associated with multiple Rural/Urban codes because a zipcode can belong to two counties. So basically, 1 zip code can be associated to multiple Rural/ Urban codes and multiple zip codes can be associated to 1 Rural/Urban code.
Do you have a hierarchy you are supposed to follow? If so I would just go by that.
If it is important to have all the rural/urban codes you can do some manipulation to make them separate fields(attached example). This would allow you to keep all the data but still end with your 72,000 rows.
Not Totally sure what the goal is but I hope this has helped.
Best,
MSalvage
MSalvage,
Thank you this has been a tremendous help. The end goal is to try to predict whether an automobile part will be delivered late or not using logistic regression. I believe the urban/rural value of a customer impacts lateness which is why I wanted to included it as an extra field to run the regression on.