on 05-03-2019 02:55 PM - edited on 06-19-2019 09:44 AM by ichand
Dan Farmer (The Information Lab)
Overview of Use Case
I needed a quick way of identifying UK companies that had been through a particular insolvency proceeding. The only way to get this information is to search the documents filed at Companies House (the UK registry of companies) for key terms.
Describe the business challenge or problem you needed to solve
This is a data preparation problem, solved using the Download tool to query APIs and download documents. It was a two-stage process. First, Companies House API calls must be made against the Company Registration Number, but the team only had a list of company names, so the first step was to query each company name to return its registration number. Second, an API call was made against each company's filings to return the required document. Each document was then saved to Google Drive, whose OCR technology made the documents searchable.
This was required to answer a specific client question. We needed to understand how many companies had been through a specified insolvency proceeding. The workflow has subsequently been modified to download a variety of document types.
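The two-stage lookup described above can be sketched in Python. This is only an illustration: the original workflow used the Alteryx Download tool, not Python. The base URL, the endpoint paths, the `q` and `category` query parameters, and the `items`, `title` and `company_number` response fields are assumptions about the Companies House public API and are not stated in this post. The sample response is made up, and the sketch makes no network calls.

```python
from urllib.parse import quote, urlencode

# Assumed base URL for the Companies House public data API (not given in the post).
API_BASE = "https://api.company-information.service.gov.uk"


def search_url(company_name: str) -> str:
    """Stage 1: build the company search URL for a company name."""
    return f"{API_BASE}/search/companies?{urlencode({'q': company_name})}"


def pick_company_number(search_response: dict, company_name: str):
    """Return the registration number of the first exact
    (case-insensitive) title match, or None if there is no match."""
    wanted = company_name.strip().lower()
    for item in search_response.get("items", []):
        if item.get("title", "").strip().lower() == wanted:
            return item.get("company_number")
    return None


def filing_history_url(company_number: str, category: str = "insolvency") -> str:
    """Stage 2: build the filing-history URL for a registration number."""
    query = urlencode({"category": category})
    return f"{API_BASE}/company/{quote(company_number)}/filing-history?{query}"


# Hypothetical search response, shaped the way this sketch assumes.
sample_response = {
    "items": [
        {"title": "EXAMPLE TRADING LTD", "company_number": "01234567"},
        {"title": "EXAMPLE TRADING (HOLDINGS) LTD", "company_number": "07654321"},
    ]
}

number = pick_company_number(sample_response, "Example Trading Ltd")
if number is not None:
    print(filing_history_url(number))
```

Matching on the exact title is a deliberately strict choice: a name search can return several similar companies, and a wrong registration number would silently pull the wrong filings. Names with no exact match return None and can be reviewed by hand.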
Describe your working solution
Data: The data comes from an API call against Companies House, which returns PDF documents.
Alteryx offerings: Designer
Platforms or technologies: The PDF documents are stored on Google Drive, which allows scanned PDFs to become searchable.
Deployment: The workflow is hosted on Alteryx Gallery, which enables users to download the documents.
Describe the benefits you have achieved
Without Alteryx, this solution would have been put in the 'too hard' / 'not possible' bucket. With Alteryx, the workflow could be built in under one hour, and the batch macro can retrieve the required documents at a rate of 1 document per second (limited only by the call rate that Companies House imposes on the API). Because the documents were saved directly to Google Drive and could be searched there, in under 2 hours (including development time) we were able to download over 1,000 documents and identify those that met the search criteria. Previously, this exercise would have required an analyst to locate and download each document, which would easily have been over 2 days' work (assuming approximately 1 minute per company to retrieve manually). The tool has subsequently been updated to retrieve more document types and has been rolled out to other teams, saving many more hours!
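The pacing and the time comparison above can be sketched in Python. This is an illustration only, not the Alteryx batch macro: the `throttled` helper and both estimate functions are hypothetical names, and the one-second interval and one-minute manual figure are simply the numbers quoted in this post.

```python
import time


def throttled(items, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than one per min_interval seconds.

    clock and sleep are injectable so the pacing can be checked
    without actually waiting.
    """
    next_allowed = clock()
    for item in items:
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)
        # The next item may not start until min_interval after this one.
        next_allowed = clock() + min_interval
        yield item


def automated_minutes(documents, seconds_each=1.0):
    """Download time in minutes at a fixed rate per document."""
    return documents * seconds_each / 60


def manual_hours(documents, minutes_each=1.0):
    """Analyst time in hours at a fixed number of minutes per document."""
    return documents * minutes_each / 60


# Figures from the post: 1,000 documents, 1 second each automated,
# roughly 1 minute each by hand.
print(f"automated: about {automated_minutes(1000):.1f} minutes")
print(f"manual:    about {manual_hours(1000):.1f} hours")
```

With these figures the automated download takes about 17 minutes, against roughly 17 hours of manual retrieval, which is in line with the estimate of over 2 working days. In use, each API request would be made inside a loop such as `for number in throttled(company_numbers): ...`, where `company_numbers` is a hypothetical list of registration numbers.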
How does Alteryx make you feel? For me Alteryx is a data analyst's Swiss Army knife allowing you to do anything from ETL processes to exercises like this. In my day to day use of Alteryx I'm blown away by all the things you can do with it and so far haven't found a data problem that I've not been able to use Alteryx for.
What are you most excited about when it comes to the future of analytics? I am excited to see analytics becoming more self-serve and commoditized. We are in a world of ever-increasing data, so having tools that make investigating and analyzing this data more accessible is important.