Hi Pro User,
I have a list of cities with names that need to be corrected - extraneous characters after names, wrong spelling, etc. - based on a list the Lookup_Table. Below is a file with two tabs - the Data tab that shows the different stages of cleansing and the Lookup_Table tab that provides the correct info.
In the Data tab, Step 0 is a list of data (a-c city names). Step 1 peels out those items that have the 5 digit zip code. Step 2 removes everything starting with a character after a name. Step 3 looks up the Lookup_Table to find the correct spelling of the city names. Step 4 makes an additional correction for those with incorrect city names.
Here are the steps:
| STEP 0 | STEP 1 | | STEP 2 | | STEP 3 | | STEP 4 | |
| ORIG DATASET | REMOVE ZIPCODE | POSTAL | REMOVE CHARS | POSTAL | TABLE LOOKUP | POSTAL | CORRECTION | POSTAL |
| ***WILL CALL FOR JOBSITE | ***WILL CALL FOR JOBSITE | | | | | | TBD | |
| 0000 | 0000 | | 0000 | | #N/A | | TBD | |
| 29 PALMS, CA | 29 PALMS, CA | | 29 PALMS | | #N/A | | TBD | |
| 93664 | 93664 | 93664 | 93664 | 93664 | #N/A | 93664 | TBD | 93664 |
| ADELANTO | ADELANTO | | ADELANTO | | ADELANTO | | ADELANTO | |
| ADELANTO, CA | ADELANTO, CA | | ADELANTO | | ADELANTO | | ADELANTO | |
| AGOURA | AGOURA | | AGOURA | | #N/A | | AGOURA HILLS | |
| AGOURA HILLS | AGOURA HILLS | | AGOURA HILLS | | AGOURA HILLS | | AGOURA HILLS | |
| AGOURA HILLS, CALIFORNIA | AGOURA HILLS, CALIFORNIA | | AGOURA HILLS | | AGOURA HILLS | | AGOURA HILLS | |
| AGUANA | AGUANA | | AGUANA | | #N/A | | TBD | |
| AHAMBRA | AHAMBRA | | AHAMBRA | | #N/A | | ALHAMBRA | |
| ALHAMBRA | ALHAMBRA | | ALHAMBRA | | ALHAMBRA | | ALHAMBRA | |
| ALHAMBRA CA | ALHAMBRA CA | | ALHAMBRA | | ALHAMBRA | | ALHAMBRA | |
| ALHAMBRA CA. | ALHAMBRA CA. | | ALHAMBRA | | ALHAMBRA | | ALHAMBRA | |
| ALHAMBRA, CA | ALHAMBRA, CA | | ALHAMBRA | | ALHAMBRA | | ALHAMBRA | |
| ALHAMBRA, CA 91803 | ALHAMBRA, CA 91803 | 91803 | ALHAMBRA | 91803 | ALHAMBRA | 91803 | ALHAMBRA | 91803 |
| ALHAMBRA, CA 91806 | ALHAMBRA, CA 91806 | 91806 | ALHAMBRA | 91806 | ALHAMBRA | 91806 | ALHAMBRA | 91806 |
| ALHAMBRA, CA. | ALHAMBRA, CA. | | ALHAMBRA | | ALHAMBRA | | ALHAMBRA | |
| ALHAMBRA, CALIFORNIA | ALHAMBRA, CALIFORNIA | | ALHAMBRA | | ALHAMBRA | | ALHAMBRA | |
| ALHAMBRA,CA | ALHAMBRA,CA | | ALHAMBRA | | ALHAMBRA | | ALHAMBRA | |
| ALISO VIEJO, CA | ALISO VIEJO, CA | | ALISO VIEJO | | ALISO VIEJO | | ALHAMBRA | |
| ALTADENA | ALTADENA | | ALTADENA | | ALTADENA | | ALTADENA | |
| ANAHEIM | ANAHEIM | | ANAHEIM | | ANAHEIM | | ANAHEIM | |
| ANAHEIM, CA | ANAHEIM, CA | | ANAHEIM | | ANAHEIM | | ANAHEIM | |
| ANYWHERE, CA | ANYWHERE, CA | | ANYWHERE | | #N/A | | TBD | |
| ANZA | ANZA | | ANZA | | ANZA | | ANZA | |
| APPLE VALLEY | APPLE VALLEY | | APPLE VALLEY | | APPLE VALLEY | | APPLE VALLEY | |
| APPLE VALLEY, CA | APPLE VALLEY, CA | | APPLE VALLEY | | APPLE VALLEY | | APPLE VALLEY | |
| ARCADIA | ARCADIA | | ARCADIA | | ARCADIA | | ARCADIA | |
| AS REQUESTED | AS REQUESTED | | AS REQUESTED | | #N/A | | TBD | |
| AUBERRY | AUBERRY | | AUBERRY | | AUBERRY | | AUBERRY | |
| AVALON | AVALON | | AVALON | | AVALON | | AVALON | |
| AVALON, CA | AVALON, CA | | AVALON | | AVALON | | AVALON | |
| AVALON, SANTA CATALINA IS | AVALON, SANTA CATALINA IS | | AVALON | | AVALON | | AVALON | |
| AZUSA | AZUSA | | AZUSA | | AZUSA | | AZUSA | |
| BAKERSFEILD | BAKERSFEILD | | BAKERSFEILD | | #N/A | | BAKERSFIELD | |
| BAKERSFIELD | BAKERSFIELD | | BAKERSFIELD | | BAKERSFIELD | | BAKERSFIELD | |
| BAKERSFIELD, CA | BAKERSFIELD, CA | | BAKERSFIELD | | BAKERSFIELD | | BAKERSFIELD | |
| BAKERSFIELD, CA. | BAKERSFIELD, CA. | | BAKERSFIELD | | BAKERSFIELD | | BAKERSFIELD | |
| BAKERSFIELS, CA | BAKERSFIELS, CA | | BAKERSFIELS | | #N/A | | BAKERSFIELD | |
| BAKERSFILED | BAKERSFILED | | BAKERSFILED | | #N/A | | BAKERSFIELD | |
I think I can do steps 1-3, but then, I'm not clear on how to approach steps 3-4. Could someone please guide me on how to approach this problem? Appreciate it.
Thanks,
kwl