Buying a house is the biggest decision most people have to take. I have decided this year to get on the property ladder, be a more responsible me and buy my first property. Renting has always made more sense in the past for the flexibility it provides; but as I’m quite settled in a city I quite love and in a company I enjoy being at, it made sense that it was time to make such a big decision.
With such a big investment come nervousness and stress, I want to make sure that the property that I’m buying ticks all the boxes that I have in mind. There is a lot of choice though, tough to really decide between properties that seem the same, or differentiating between all the available. Being a data geek, I want my investment to be a data-driven decision.
There are many websites where properties that are on sale are listed, some have more complete information than others; my first step was to find data, build a database of properties available in areas that are of interest for me and find my dream place within all the listings available.
I need to find a needle within a haystack, and what better platform to help me achieve that than Alteryx.
I could have gone through it manually. As an example, from such a listings website, the following URL provides me with the results of any search I’m interested it. If I’m looking for 2-bed properties within the region of SE1, I need to build my URL as follow:
https://www.[website].co.uk/for-sale/flats/SE1/?beds_max=2&results_sort=newest_listings&pn=1
in this instance the pn element represents the search page, while the other highlighted elements are self-explanatory. The URL may be slightly different based on the website you’re using for the search, but the structure should be the same
Once the result of that search is loaded, I can see the list of properties that satisfy my search. By clicking on each property, I’m able to find all property information in the relative page. The URL for the property page (for a different listing website) is structure as follow (The identifier being the unique string that identifies ant property listed):
https://www.[website].co.uk/properties/[identifier]/
Within the page, I have all elements that describe the property itself and that would allow me to decide to keep it or not in my shortlist of places to consider.
Programmatically speaking, I could be looking through the source code behind the search page (Ctrl+U), it’s then a matter of finding the right information needed, the list of property IDs for listings that satisfy the search condition I have. Going through a source code may be overwhelming though as (you can see below) it's a loooot of text.
Imagine doing that for multiple search criteria, looking through hundreds/thousands of pages to locate the piece of information needed. This is where Alteryx is making my life a lot easier, and why I resolved to use Alteryx to start. I can send multiple search strings, for locations and conditions I have in mind.
But wait, all this sounds super technical and super complex!!!
Well actually NOT; connecting to APIs and data on the web is very easy with Alteryx. Here is a quick tutorial that would introduce you to the main notions and how to set up such connectivity: https://community.alteryx.com/t5/Engine-Works/REST-API-In-5-Minutes-No-Coding/ba-p/8137
Following a similar process; using the Download tool, I can load the source code behind every single search and identify property IDs for listings that I’m interested in. I can then call multiple URLs and extracting all information that is required for every property.
I’m not going to go through the detail of how I generated my database as not to breach Terms of Use that many property listing companies have in place, but I would highly recommend getting in touch directly with those companies/website providers, enquiring if an API can be made available, and utilize the Throttle Tool within Alteryx as a way to limit the number of calls you’re making to such websites.
I would though highlight some of the Alteryx Tools that have helped me get there, and the method I followed.
In my case, I identified the list of Outer Postcodes I was interested in looking for properties at and used Alteryx Preparation Tools to build multiple search strings for each area and get the result of those calls using the Download Tool. I’m using as well the Throttle Tool to limit the number of calls I’m making on the website and ensure I’m not putting a lot of stress on that platform. I've limited it to 1 call per second.
With the results of those search strings available, I’m then using Preparation Tools to extract just the IDs of the properties and therefore build a second URL string that directs to the page of each of the listed properties. The Regex Tools afterwards allow me to extract the information I want from the HTML of each property.
Notice the Throttle Tool I used again to control the number of calls made so as not to put stress on the website.
This is what that full workflow looks like (for reference). It starts by Inputing the list of outer codes, followed by the first call with the Downloads Tool to extract property ID, the building of the URL String using a Formula and a second Download Tool to get the details for each property. My data can then be Output in any format I want. I'm triggering as well an Email, just to let me know that it has run successfully.
I've got myself a workflow that searches through hundreds of pages, extracts the information that I need and appends newly found information to my database. It would be perfect to have this process just run by itself and inform me of how it has executed. This is where I used Alteryx Server.
I published the workflow on my Alteryx Server and have it scheduled to run daily. It updates properties for which variables have changed and adds new properties at the frequency I specified. The fact of pushing the workflow to Alteryx Server allows me to focus on analysing data, as Alteryx then automates my data capturing process.
It ensures as well anyone that has access to my database and wants to use my data can be sure they're working with the latest data even when I'm away
I can follow the previous executions of the workflow and ensure I am aware of any errors that could have happened
This is awesome. I can now relax while the Alteryx platform manages my process and adds new data to my database as it comes. I can go on holiday without worrying anyone that needs the data and is authorized to use it won't have access to important workflows.
I have now data that I can analyze in any way that I want.
The data generated is stored in a table on my SQL database, and contains multiple variables (55 in total) I can use for my property hunt, it is a list of ~40,000 listed properties.
Now is the time for me to limit my search to properties that are of interest to me. I can easily using Alteryx Filter to focus on listings that meet my criteria such as the price of the properties (I don’t think I can afford that 16M flat in Chelsea), and the number of bedrooms and other variables within my data set.
Now I’ve only got 6,000 properties to look at… Nice!!
But still too much!!
I can actually explore my data further. I’m interested in just seeing where these properties are on the map, if they’re all in the same area, I should maybe just fit 6000 viewings. The Mapping Tools in Alteryx are here to help me plot my data and visualize it on a map.
I think I found my dream place; among the list of properties in my database, there is one located in the Indian Ocean; shining sun, sandy beaches and counts as London as that was my initial search string. It is the perfect place.
Or maybe my data is just not so perfect after all.
This just reminds us all of the importance of checking for data quality.
In this case the person in charge of inputting that listing on the website mixed Latitude and Longitude and therefore ended up referring everyone to an inexistent location. So I will just delete those Outliers, rerun my workflow and look at my data on a map again.
It does seem like my focus is all over the place.
I couldn’t actually stick to only one area around London; I don’t want to miss on a great opportunity by just focusing on one area. I need additional variables, additional criteria that I haven’t considered before but that would matter in finding my dream place. Using the string functions in the Formula Tool, I can ensure that I’m only keeping adverts that mention a balcony, terrace or garden because I don’t want to miss out again on a sunny summer in London.
I’m not a morning person
The more efficient my morning commute can be, the better.
We are all working currently from home because of the worldwide pandemic and the tough situation we are in. But I’m excited to the time when the world would come back to some sort of normalcy and we get to travel to work again. I just need to make that trip as efficient as possible and ensure that my future morning commute is as pleasant as possible.
Unfortunately the dataset I had only contains the latitude and longitude of the listings. In order to ensure I'm not committing myself to a lifetime of awful commute, I need to open my favorite Maps application, type the coordinate of every property, check directions to our Alteryx offices, and record both walking and commuting details. Life would be a lot more fun if there was a programmatic way to get that sorted.
I would need to find an API that can allow me to see how far each of those listings is from the Alteryx London Office.
Luckily for me, there is a good API to help me.
I can use the Bing Maps API.
The documentation is really good and it does work very smoothly with the Alteryx download tool.
If you’re using the API for your own personal project, it does provide you with 125,000 cumulative billable transactions per calendar year at no charge, in case you’re going to use the data generated from the API for a commercial purpose I would advise you to get in touch with the Microsoft Team.
Once you’ve set a Dev account with Bing Maps, you’ll be able to generate your own Bing Maps key that you can utilize in your workflows within Alteryx.
I just need to build my Bing Maps API call and ensure I follow instructions listed in the Bing Maps REST API Documentation.
To get commute between multiple points, I just the coordinates of the properties (available in my data set, so perfect), and the coordinates of our Alteryx London Office (if you're interested in visiting us, when things are back to normal, they are 51.527845, -0.090548).
Based on the Bing Maps documentation, I need to build a Walking Route query: https://docs.microsoft.com/en-us/bingmaps/rest-services/examples/walking-route-example
Such a query returns a JSON by default, but I can specify as well an XML as an output. It really depends on which format I prefer as both are supported by Alteryx.
Using the Formula Tool I just had to generate the URL that would be sent to the Bing Maps API via the Download Tool following exactly the same format specified in the documentation. This what my Formula Tool configuration looks like, for walking journey between any given property and our offices.
I chose as a result of this API request to get an XML Output, and would therefore need use the XML Parse Tool to interpret the resulting data. This part of the workflow look like the following
I’m running a different API Call for commute details, so as to know for each of the properties how long it would take me to walk versus to take public transport. For each of the properties I’m getting the following data detailing my future journey to work. It provides me with the time it would be taking as well as the detailed instructions to get there or the transports modes I will be using.
The result of this process adds that extra dimension that was missing. I'm getting those extra geospatial elements that would be critical in my decision. I am able to get for every single property, how far it is distance wise but duration wise as well from our offices when walking or taking public transport.
For an example on one of the properties, I'm getting the following result:
Property ID: XXXXXXXXX
Walking Distance: 6.732 Km
Walking Duration: 4840s (1h20min)
Walking Instructions:
Head south on Warwick Gardens toward Roseberry Gardens
Turn right onto Stanhope Gardens
Turn left onto A105 / Green Lanes
Turn right to stay on A105 / Newington Green
Bear left onto Mildmay Park
Road name changes to B102 / Southgate Road
At roundabout, take 2nd exit
Bear left onto Bridport Place
Turn right onto Mintern Street
Turn left onto A1200 / New North Road
Keep right to stay on A1200 / Vestry Street
Bear right onto Ebenezer Street
Turn left onto Westland Place
Arrive at Westland Place
Commute Distance: 7.435 Km
Commute Duration: 1800s (30min)
Commute Instructions:
Walk: From Source Location to Harringay
Inter-city Rail: Take Great Northern - Moorgate
Walk: From Old Street to Destination Location
Commute modes: Train Walk
Minimum Commute Time: 1800s (30min)
I can see exactly what my commute would be like. That’s perfect, now I can just look at properties that would ensure that my morning commute wouldn’t be exceeding 35min, and that the commute itself wouldn’t require more than 2 means of transportation, and filter out the rest. Easily achieved using Alteryx Preparation Tools and Parsing Tools.
I want to avoid taking the tube as long as possible, as I don’t want this to be my daily routine.
I have actually narrowed my search to merely 50 properties, which made organizing viewings a lot easier. Among those properties I found the one place that would certainly make a good home. Certainly, narrowing down my search scope with Alteryx has made me more confident of my decision and has turned a hectic process into a pleasant project. I used Alteryx as well to compare what average prices are in general in different areas ensuring that the asking price of the one I like is within the average range.
Answering one analytical question usually takes us to needing to answer a lot more.
Luckily, that's easily done.
Certainly my future commute to work would look a lot similar to this instead.
Now there is bureaucracy to get sorted, it seems like finding the right place is just the first step in so many. There is all the stress of getting documentation ready, working with solicitors, sorting searches and surveys.
I’m sure if they were all working with Alteryx, this would be a 1-month process rather than the 4 months it will be taking.
I can unfortunately not share the listings database I have created nor can I provide the details of the workflow I used to generate it. But I want to stress it is possible with a combination of Download, Throttle, and different Preparation and Parsing Tools.
I have converted my Bing Maps workflow into a Macro and have it attached here.
Feel free to make changes to it and adapt it to your own process ideas.
The Macro itself accepts one single input, that's your file containing all locations you're interested in
There are 3 parameters to change within the Macro:
1. Your Bing Maps API Key (as specified above)]
2. The destination you want to compare all your list of locations to
3. The minimum commute Time
The output of the macro is the commute/walking information specified above.
I would be very interested in hearing how you do utilize and what geospatial questions this helps you solve. I hope that my house search experience inspires you to solve some issues and problems you're currently encountering. Do share your stories too 🙂
Thank you & happy solving,
Mo
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.