An attempt at building an offline OSM geocoder

I'm still a relatively novice user.

As many of you know, OpenStreetMap is a free mapping service that offers both map layers and geocoding (through its Nominatim web service).

Although it's free, the drawback is that you cannot use it for bulk geocoding, as that would overload the servers and get you banned.


As commercial alternatives can get expensive pretty fast, another option would be to install a Nominatim instance on a web server, but that requires non-trivial IT support.


Luckily, OpenStreetMap data is available for download at

OSM files are basically a kind of XML and this starts our journey at data exploration...




As you can see, opening the file as XML does result in coherent data being read (in my case I'm using the Italian osm dataset).

From what I've found so far, the tag_*_k fields describe what is located at each latitude and longitude. Filtering them with contains("city") or contains("addr") gives you some addresses, although with rather messy formatting.


As the source file is pretty big, it's strongly advised to use partial reads for initial data exploration.
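For anyone exploring the file outside the workflow tool, here is a minimal Python sketch of a partial read. It assumes the standard OSM XML layout (node elements carrying lat/lon attributes and tag children with k/v attributes); the function names and the tiny SAMPLE document are made up for illustration and are not part of my workflow. It streams the file instead of loading it whole and stops after the first few tagged nodes.

```python
import io
import xml.etree.ElementTree as ET
from itertools import islice

# Tiny made-up sample in OSM XML layout, standing in for the real extract.
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6">
  <node id="1" lat="45.4642" lon="9.19">
    <tag k="addr:city" v="Milano"/>
    <tag k="addr:street" v="Via Roma"/>
    <tag k="addr:housenumber" v="1"/>
  </node>
  <node id="2" lat="41.9028" lon="12.4964"/>
  <node id="3" lat="43.7696" lon="11.2558">
    <tag k="amenity" v="cafe"/>
  </node>
</osm>"""


def iter_tagged_nodes(source):
    """Stream (lat, lon, tags) for every node that has at least one tag."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            if tags:
                yield float(elem.get("lat")), float(elem.get("lon")), tags
            elem.clear()  # drop the node's children once it has been handled


def has_address(tags):
    """Rough equivalent of the contains("addr") filter on the tag keys."""
    return any("addr" in key for key in tags)


# Partial read: take only the first 2 tagged nodes, then stop parsing.
preview = list(islice(iter_tagged_nodes(io.BytesIO(SAMPLE)), 2))
address_rows = [row for row in preview if has_address(row[2])]
```

With a real extract you would pass the file path instead of the BytesIO object and raise the islice limit as needed.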


Has anyone here attempted anything similar?


This topic will be updated with any further progress I will be able to make.


Hi @marco_zara


This is an awesome idea, I am looking forward to hearing more! 


I haven't had much experience with OSM, aside from a couple of data downloads.  In my free time, I've been experimenting with the address data available at In the US, there are around 150 million parcel centroids, and an extensive amount of international address data is also available.  


Keep us posted on your progress. 



I haven't been able to put in as much time as I'd like, but I have made some progress.


I converted the OSM file to a yxdb without many changes; basically, I just filtered out rows without tags.

After that, I'm trying to isolate the tag combinations that give us the city name and the street plus civic number (attachment: osm part 2.png).



It's definitely not the most refined workflow, but I still haven't found an equivalent of Excel's horizontal lookup (HLOOKUP), so any suggestions on a smarter way to do this would definitely be welcome.
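To make the lookup problem concrete, here is a rough Python sketch of the logic, not of the workflow itself. It assumes each row carries paired columns named tag_1_k / tag_1_v, tag_2_k / tag_2_v and so on; only the tag_*_k pattern comes from my data, the _v naming and the function names are assumptions. The idea is to collapse the pairs into a key-to-value dictionary, after which picking out the address parts no longer depends on which column position a key landed in.

```python
import re

# Address keys to extract; these are common OSM tag keys.
WANTED = ("addr:city", "addr:street", "addr:housenumber")


def tags_from_wide_row(row):
    """Collapse tag_<n>_k / tag_<n>_v column pairs into one {key: value} dict."""
    tags = {}
    for column, key in row.items():
        match = re.fullmatch(r"tag_(\d+)_k", column)
        if match and key:
            # The value sits in the column with the same index (assumed naming).
            tags[key] = row.get(f"tag_{match.group(1)}_v")
    return tags


def address_fields(row, wanted=WANTED):
    """Return the wanted address parts, with None for any that are missing."""
    tags = tags_from_wide_row(row)
    return {key: tags.get(key) for key in wanted}


# Made-up example row: the same key may appear in any tag position.
example_row = {
    "lat": "45.4642",
    "lon": "9.19",
    "tag_1_k": "addr:street", "tag_1_v": "Via Roma",
    "tag_2_k": "addr:city", "tag_2_v": "Milano",
    "tag_3_k": None, "tag_3_v": None,
}
example_address = address_fields(example_row)
```

In workflow terms this is roughly a transpose of the key/value pairs followed by a cross tab on the key, but I have not tested that route.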

I also did some quick calculations, and I'm pretty sure that either the OSM dataset is incomplete or I'm missing something: the maximum possible number of addresses is just short of 2 million, versus the roughly 30 million there should be.


It could perhaps be related to the XML format but this will be for later investigation...

Ok, I found a somewhat less precise but definitely easier to handle solution.


Geofabrik (and a few other Planet OSM mirrors) has a list of OpenStreetMap files that provide road shapefiles along with other points of interest:

Unfortunately, in my case it did not include the city name along with the road, but that's easily fixed: most governments publish shapefiles with city boundaries, so you can associate each street with its city and end up with a handy coordinates database.
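The city/street association boils down to a point-in-polygon test. Below is a minimal Python sketch of that test using ray casting, assuming each street is reduced to one representative point (its midpoint, say) and each city boundary is a simple ring of (lon, lat) pairs. The function names and the square "cities" are invented for illustration; in practice a spatial match tool or a GIS library would do this on the real shapefiles.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many polygon edges a ray going east crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def city_for_point(lon, lat, city_polygons):
    """Return the first city whose boundary contains the point, else None."""
    for name, polygon in city_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Made-up square boundaries standing in for real city shapefiles.
CITIES = {
    "City A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "City B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}

street_city = city_for_point(1.0, 1.0, CITIES)
```

Points that fall exactly on a shared boundary are an edge case this sketch does not handle carefully, and multi-part boundaries or boundaries with holes would need extra work.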