Problem Statement
As part of address matching and analysis, we require Ordnance Survey data. Initially, this involved downloading the entire UK dataset, which took considerable time and effort to process. To save time, we switched to downloading just Scotland instead of the whole UK. However, this change meant downloading the address data for Scotland as individual 5km x 5km tiles. These arrived as one ‘main’ zip file containing 3998 individual zipped files. The ‘main’ zip file took ~2 hours to unzip before the inner files could even be reached, and the initial view was that each of the 3998 files would then need to be unzipped by hand to extract the data.
Alteryx Solution
We realised early on that whilst Alteryx can handle a Zip file, it couldn't natively handle nested Zip files, so we began looking for options. Drawing on prior knowledge and the Alteryx Community, we identified that Alteryx could call the Command Line to perform the unzipping. With the data then in the right place, other Alteryx features could assess the files for relevancy, bring them together based on requirements (such as separating out Lothian from the rest of Scotland), and perform the prep and transformation activities, all in one place.
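To illustrate the nested-unzip step outside of Alteryx, here is a minimal Python sketch. It is not the workflow described above, which calls the Command Line from Alteryx; it simply shows the same idea using Python's standard zipfile module. The function name extract_nested_zips and the one-folder-per-inner-zip layout are assumptions made for the example.

```python
import io
import zipfile
from pathlib import Path


def extract_nested_zips(main_zip: Path, out_dir: Path) -> list[Path]:
    """Extract the contents of every inner zip inside main_zip.

    Hypothetical helper for illustration: each inner zip is unpacked into
    its own sub-folder of out_dir, named after the inner zip file.
    Returns the paths of the extracted files.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted: list[Path] = []
    with zipfile.ZipFile(main_zip) as outer:
        for name in outer.namelist():
            # Skip anything in the main archive that is not itself a zip.
            if not name.lower().endswith(".zip"):
                continue
            # Read the inner zip into memory so the main archive never
            # has to be unpacked to disk first.
            inner_bytes = io.BytesIO(outer.read(name))
            with zipfile.ZipFile(inner_bytes) as inner:
                target = out_dir / Path(name).stem
                inner.extractall(target)
                extracted.extend(
                    target / member
                    for member in inner.namelist()
                    if not member.endswith("/")  # ignore directory entries
                )
    return extracted
```

Calling extract_nested_zips(Path("main.zip"), Path("tiles")), with both paths being placeholders, would leave one folder per tile under tiles, ready for a later step to filter and combine. Reading each inner zip into memory assumes the individual tile archives are small, which seems reasonable for 5km x 5km tiles but has not been checked against the real data.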
Solution Impact & Time Savings
Using Alteryx, we were able to unzip the 3998 individual zipped files and then reshape the data into the format needed for analysis against properties. The result is a database of addresses that other workflows can reference, a critical component of the required work.
Without Alteryx, unzipping each of the files manually would not have been feasible. For comparison, the figures below set the initial task of processing UK-wide data, which had a different Zip structure, against the new workflow:
Original time: ~1 day of work every 2 months
New time: 12 minutes to run the workflow