But some pretty basic questions remain, probably not only for me but also for other users who live outside the USA or Canada and cannot benefit from a CASS tool...
1) As in the US, here in Brazil we can identify a state by its full name or by its two-letter code. Which one is better for geocoding?
2) Likewise, as in the US, we can use a "compacted" ZIP code (a 5-digit number) or the "full" version, which has 8 digits. Which one is better?
3) In Portuguese we describe an address by giving the street name first and then the street number ("Avenida Engenheiro Luis Carlos Berrini, 1426", for instance). Should I prep the data to invert this (whenever possible), as I would describe it in English (that case would become "1426, Avenida Engenheiro Luis Carlos Berrini")?
4) Which cleaning steps should be applied to the data before submitting it to the geocoder? I usually trim excess whitespace, and sometimes I apply the "UpperCase" function to standardize the data in capital letters. Should I strip punctuation and accents? For instance, I work in "São Paulo". Should I write it as "Sao Paulo", without the tilde over the first "a" (which is very common in Portuguese words)?
5) Does any of this really interfere with the geocoder's ability to get lat/lon for an address?
6) In a broader sense, there are a lot of abbreviations in addresses here in Brazil, and I guess this happens all over the world. For example, "R." means "Rua" ("Street"). Is there a way of assessing (approximately, not exactly) the quality of an address component (such as the district) in order to decide whether to use it when submitting data to the geocoder tool?
If someone is able to shed some light on these questions (at least the first 5...), I would be really thankful.
I received some feedback from one of our associates here at Alteryx and he offered the following response:
It seems that the geocoding works much better when using a single field address rather than the multiple field address option. With the attached spreadsheet of addresses for example, I was unable to geocode rows 1-6 by specifying multiple fields in the geocoder interface. I then created the last 5 rows by combining the previous cell data. Using the single field option, 3 records geocoded! Pay special attention to that format – the tool seems to like the ‘street, number, city’ format. It may be worth exploring a little more – this could be one of those ‘less is more’ situations.
Attached is the file he used for testing.
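For anyone who wants to build that single-field string outside the tool, here is a minimal Python sketch. The function name `single_field_address` and its parameters are made up for illustration, and the 'street, number, city' order is simply the one that seemed to work in the test described above, not a documented requirement of the geocoder.

```python
def single_field_address(street, number, city, state=None, postal_code=None):
    """Join address parts as 'street, number, city[, state][, postal code]'.

    Blank or missing parts are skipped, and runs of whitespace inside
    each part are collapsed to a single space.
    """
    parts = [street, number, city, state, postal_code]
    cleaned = [
        " ".join(str(part).split())
        for part in parts
        if part is not None and str(part).strip()
    ]
    return ", ".join(cleaned)


if __name__ == "__main__":
    print(single_field_address(
        "Avenida Engenheiro Luis Carlos Berrini", 1426, "São Paulo"
    ))
```

The optional state and postal code are appended only when present, so the same helper can be used to test whether adding them changes the match rate.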
Here are my best guesses/estimates regarding your specific questions:
1) I don't believe there will be any difference between full and abbreviated state names.
2) I would imagine that 8-digit postal codes would get you more accurate results.
3) In very limited testing, it didn't appear to matter whether the street number was at the beginning or the end.
4) I tested some addresses with and without accents and didn't get any different results.
5) It wouldn't appear to.
What I would strongly stress is that you run some tests with your own data, testing things like the scenarios you have mentioned. Also, as you mentioned, cleaning up the data as much as possible can go a long way.
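One way to set up such a test is to generate a few cleaned variants of each address and send each variant through the geocoder to compare match rates. The sketch below uses only the Python standard library; the function names and the choice of variants are assumptions for illustration, covering the cleaning steps mentioned in the question (trimming whitespace, upper-casing, removing accents).

```python
import unicodedata


def collapse_whitespace(text):
    """Trim the ends and reduce internal runs of whitespace to one space."""
    return " ".join(text.split())


def strip_accents(text):
    """Remove combining marks, e.g. 'São Paulo' becomes 'Sao Paulo'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def address_variants(address):
    """Return the cleaned forms of one address, keyed by a variant name."""
    base = collapse_whitespace(address)
    return {
        "trimmed": base,
        "upper": base.upper(),
        "no_accents": strip_accents(base),
        "upper_no_accents": strip_accents(base).upper(),
    }


if __name__ == "__main__":
    for name, value in address_variants("  Rua  Augusta, 100,  São Paulo ").items():
        print(name, "->", value)
```

Geocoding each variant of the same sample and counting how many records match per variant gives a direct answer for your own data.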
I hope some of this has been helpful.
Thank you, Bruno.
Dan Chapman, Program Manager, Customer Support
So, this is my favourite tool I have found thus far. I just have one small problem: it doesn't add the fields to my existing flow. It is a 'dead end' as far as the data goes.
Is there a way to modify this to add columns to the data so I can get the details of the sites I've encoded? I just need to pass-through the other data that go into the tool! Any tips would be great; I'm sure it's not that tough to do.
I am re-joining the data on the address string I created, but that seems an awkward step ripe for record explosion. Nevertheless, since I found a solution to my own problem, I wanted to post it so that others can do the same.
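The record explosion comes from duplicate address strings on both sides of the join. Here is a rough Python sketch of the logic (not an Alteryx workflow): keep one geocoded result per address string, then look each original row up against it, so the output always has exactly as many rows as the input. The field names `address`, `lat` and `lon` and the coordinate values are placeholders, not output from the tool.

```python
def attach_geocodes(rows, geocoded, key="address"):
    """Add lat/lon to each input row without multiplying rows.

    rows:     list of dicts, the original records (may repeat an address)
    geocoded: list of dicts with the same key plus 'lat' and 'lon'
    """
    # Keep only the first result per address so the lookup is many-to-one.
    lookup = {}
    for result in geocoded:
        lookup.setdefault(result[key], result)

    merged = []
    for row in rows:
        match = lookup.get(row[key], {})
        # Unmatched addresses get None instead of dropping the row.
        merged.append({**row, "lat": match.get("lat"), "lon": match.get("lon")})
    return merged


if __name__ == "__main__":
    sites = [
        {"site": "A", "address": "Rua Augusta, 100, São Paulo"},
        {"site": "B", "address": "Rua Augusta, 100, São Paulo"},
        {"site": "C", "address": "Unknown address"},
    ]
    # Placeholder coordinates, duplicated on purpose to mimic repeated results.
    results = [
        {"address": "Rua Augusta, 100, São Paulo", "lat": -23.0, "lon": -46.0},
        {"address": "Rua Augusta, 100, São Paulo", "lat": -23.0, "lon": -46.0},
    ]
    for record in attach_geocodes(sites, results):
        print(record)
```

A plain join on these two inputs would return four rows for sites A and B; deduplicating the geocoded side first keeps it at one row per site.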