
Maveryx Success Stories

Learn how Alteryx customers transform their organizations using data and analytics.

Creating The Next Generation of Data Superstars with Alteryx

SiWalks
5 - Atom
Name: Simon Walker
Title: Managing Partner
Company: Kubrick Group
Collaborators: Darren Tran
 
Kubrick + Alteryx Introduction: 

 

We founded Kubrick to offer organisations another way to solve their data challenges. We do this by taking on the best junior professionals and preparing them in our data labs to create the next generation of data specialists. 

 

Only 1% of applications pass our rigorous assessment process. Our unique training programme equips our consultants with the latest skills in data, turning them into highly skilled professionals trained in the latest technology and data methodologies.

Following the training course, the consultants stay with Kubrick for 2 years and are deployed client-side, offering clients a highly skilled, low-risk, cost-effective consultancy solution.

 

On completion, the consultants become certified as a Data Engineer, Data Analyst, Data Scientist or Data Governance Specialist.

 

The core training programme for Data Engineers before they help our clients solve their data challenges includes:

 

- Data Pipelining, provenance and security

- Modelling on RDBMS and Data Lake technologies

- AWS and Azure cloud skills

- Hadoop Eco-System technologies

- Spark in a clustered environment

- Advanced Python

- Agile Continuous Integration (CI)

- Advanced SQL

- Cutting edge Data Narrative techniques

- Advanced Alteryx

- Data Visualisation: Plotly, Dash, Tableau

- Document and Graph Databases – NoSQL

 

Our clients use us not only as a highly skilled data consultancy but, thanks to our unique model, also for staff augmentation. They have access to highly trained, skilled and polished consultants, with the option of adding them to their workforce.

 

Our staff love us, as we take the risk up front and they get to learn the best data technologies and tools, such as Alteryx. Their careers are then accelerated into some of the world's most innovative and exciting organisations.

 

Kubrick was established in 2016 and currently employs 156 people.

 

What is the initial business problem that you solved with Alteryx? 

Kubrick has a unique view of the role played by Alteryx: for us, it fundamentally addresses the large talent gap that exists in data.

 

From a preparation perspective, we develop our staff in the Kubrick Data Processing Lifecycle. This is an intense training period, and towards the end of it we needed a tool that allowed rapid results through the whole process. Alteryx covers the complete lifecycle for us:

 

Data Lifecycle.png

Kubrick required a tool that was easy to collaborate on and fast to pick up (it wasn’t all SQL Loader!), and that allowed our junior consultants to hit the ground running on client sites and win results fast.

 

We first used Alteryx with an international fund services company. Kubrick was tasked with making their client fund reports more intelligent and self-serving, and with surfacing valuable insights that they had not been able to produce before. Since then it has been our data tool of choice within Kubrick data labs.

 

What Alteryx products, data sources/formats are you using?

 

  1. Alteryx Designer
  2. Alteryx Scheduler

 

What value or impact is your organization experiencing as a result of your solution? 

 

Alteryx has enabled our highly skilled consultants to become more productive, more quickly. We are solving the “2-year experience conundrum” that exists for many junior professionals looking to enter the data industry: organisations want new talent with some data experience, yet that same talent cannot break into the industry because the minimum requirement is often 2 years' experience, and so the cycle continues. Kubrick breaks this cycle by investing in and training our consultants up front, in specific data skill sets that add value to organisations from Day 1. Partnering with Alteryx was therefore an obvious choice for Kubrick: a tool that can be used with a small amount of experience, yet yields incredible results.

 

We are experiencing huge demand for our consultants, and more than 95% of this demand is generated by returning clients. The first time Kubrick used Alteryx at a client, the global fund services company, validated why Kubrick needed Alteryx:

 

- “We started by utilising various command line tools, then Kubrick utilised Alteryx. We quickly went from producing 1 proof of concept to now running 10 concurrently at any time.”

 

- “Another benefit is the flexibility Alteryx affords us. We work in an agile environment and can now handle data and value changes, without being disruptively hard to do” Darren Tran, Kubrick Data Engineer.

 
Use Case | Machine Learning Predicting Insurance Claims on Critical Illness:

 

Initial business problem:

 

The team was tasked with producing a machine learning model to enhance the ability of a large global underwriter to accurately predict the number of claims made on critical illness cover; 65 critical illnesses were covered by the insurance. The approach taken involved collecting an array of publicly available data and combining it in a data warehouse from which analysis and modelling could be done. These data sets included local demographic data, health data, national census data, shapefiles, text mining and more.

 

Almost all of these datasets required transforming to fit the schema of the warehouse. Furthermore, because we were combining disparate datasets, the geographic columns did not match.

 

For our task, we required data for each Clinical Commissioning Group (CCG) in England – a geographic region defined by the NHS. However, much of our raw data was at a more granular level of detail, in geographic regions that do not map exactly to the CCGs. Additionally, CCGs change over time, and much of the data that was already at CCG level needed converting to the most up-to-date form.

 

To load our datasets into the data warehouse we required a tool that enabled a quick and easy transformation of the data and could overcome our geographic misalignment problem.

 

The Working Solution | Geographic Aggregation:

 

Geographic data at county and local authority level required conversion to CCG level. To aggregate these figures, we used a table that contained population data at Lower Layer Super Output Area (LSOA) level and had columns for the county, local authority and CCG in which each LSOA resided. This served as a bridging dataset between the other geographies. We calculated the population of each CCG and the number of people that each county or local authority contributes to a CCG, so that the values associated with those areas could be weighted by population and aggregated across the CCGs. The results were then unioned together and aggregated again by the same process to produce one figure for each CCG.

 

Dia1.png

 

Figure 1: Geographic aggregation
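
The original work was done in an Alteryx workflow, not in code, but the population weighting described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the bridging rows, area codes and rates are made up, the values being converted are assumed to be rates (not counts), and every area is assumed to have a value.

```python
from collections import defaultdict

# Hypothetical bridging rows: (lsoa, local_authority, ccg, population).
BRIDGE = [
    ("L1", "LA_A", "CCG_1", 1000),
    ("L2", "LA_A", "CCG_1", 3000),
    ("L3", "LA_B", "CCG_1", 4000),
    ("L4", "LA_B", "CCG_2", 2000),
]


def aggregate_to_ccg(bridge, area_values):
    """Return a population-weighted average of area-level rates per CCG."""
    ccg_pop = defaultdict(int)   # total population of each CCG
    contrib = defaultdict(int)   # people each (area, CCG) pair has in common
    for _lsoa, area, ccg, pop in bridge:
        ccg_pop[ccg] += pop
        contrib[(area, ccg)] += pop

    result = defaultdict(float)
    for (area, ccg), pop in contrib.items():
        # Weight the area's value by its share of the CCG's population.
        result[ccg] += area_values[area] * pop / ccg_pop[ccg]
    return dict(result)


# Hypothetical rates per local authority.
rates = {"LA_A": 10.0, "LA_B": 20.0}
ccg_rates = aggregate_to_ccg(BRIDGE, rates)
```

Here CCG_1 draws half of its population from each local authority, so its figure lands midway between the two rates, while CCG_2 sits entirely inside LA_B and simply inherits that rate. The same function could be run once for counties and once for local authorities before combining the results.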

 

 

Pharmaceutical data

 

A data set important to the project was NHS pharmaceutical data dating back to 2010, which shows the number of drug prescriptions in every GP practice. Geographic aggregation allowed this to be summed to CCG level, but the dataset required string manipulation to make it more meaningful.

 

To make the data usable, string manipulation was conducted within Alteryx to group prominent non-generic and generic variants of the drugs. The drug names required sections of the string to be selected up to a specified index, and generic drug names required the sections after either of two delimiters, an underscore and a space, to be removed.

 

Dia2.png

Figure 2: String Manipulation of pharmaceutical data
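
For readers who think in code, the two string operations described above (keeping a name up to a given index, and cutting a name at the first underscore or space) might look roughly like this in Python. The function names and the sample strings are hypothetical and only illustrate the idea; the real NHS naming format and the actual Alteryx expressions are not shown in this story.

```python
import re
from collections import Counter


def brand_prefix(name, end):
    """Keep the characters of a drug name up to index `end`."""
    return name[:end].strip()


def generic_stem(name):
    """Drop everything from the first underscore or space onwards."""
    return re.split(r"[_ ]", name.strip(), maxsplit=1)[0]


# Hypothetical raw names, formatted only for illustration.
raw_names = [
    "Paracetamol_Tab 500mg",
    "Paracetamol Cap 500mg",
    "Ibuprofen_Tab 200mg",
]

# Group the variants by their cleaned-up stem.
groups = Counter(generic_stem(name) for name in raw_names)
```

With these sample strings, both paracetamol variants collapse into a single group, which is the kind of grouping that makes prescription counts comparable across formulations.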

  

General data transformations:

 

Alteryx’s various import and transformation tools were used to handle data downloaded in various forms, engineer it and upload it into a SQL Server warehouse.

 

The simple workflow below is a good example of how Alteryx was leveraged to make a daunting task relatively straightforward. This data import included 91 CSV files, which together consisted of approximately 1 billion rows. The Directory and Dynamic Input tools were used to import these simultaneously. This large table was then modified slightly and filtered before being uploaded to the data warehouse.

 

A similar process was undertaken on about a dozen other data sets in order to transform them and load them to the data warehouse in a single step.

 

Dia3.png

Figure 3: Loading data into SQL Server
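
As a rough code analogue of the import-filter-load pattern described above, the sketch below reads every CSV in a folder, filters the rows and writes them to a database table. It is an illustration only: an in-memory SQLite database stands in for SQL Server, the column names (practice, drug, items) and the filter are assumptions, the two tiny sample files are made up, and the files are read one after another instead of simultaneously.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_csv_folder(folder, conn, table="prescriptions", min_items=1):
    """Read every CSV in `folder`, filter rows, and insert them into `table`."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(practice TEXT, drug TEXT, items INTEGER)"
    )
    loaded = 0
    # Listing the folder plays the role of the Directory tool.
    for path in sorted(Path(folder).glob("*.csv")):
        # Reading each listed file plays the role of the Dynamic Input tool.
        with path.open(newline="") as handle:
            rows = [
                (r["practice"], r["drug"], int(r["items"]))
                for r in csv.DictReader(handle)
                if int(r["items"]) >= min_items  # simple filter step
            ]
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
        loaded += len(rows)
    conn.commit()
    return loaded


# Two small, made-up files stand in for the 91 real ones.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "2010_01.csv").write_text(
        "practice,drug,items\nP1,Aspirin,5\nP2,Aspirin,0\n"
    )
    Path(tmp, "2010_02.csv").write_text("practice,drug,items\nP1,Aspirin,7\n")
    conn = sqlite3.connect(":memory:")
    loaded = load_csv_folder(tmp, conn)
    total_items = conn.execute(
        "SELECT SUM(items) FROM prescriptions"
    ).fetchone()[0]
```

Processing one file at a time keeps memory use bounded by the largest single file, which matters when the combined input runs to around a billion rows.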

 

 

Describe the benefits you have achieved:

 

Alteryx enabled transforming and loading of data to be undertaken in one fell swoop and complex transformations to be undertaken straightforwardly and systematically. String manipulation tools such as Data Cleansing for removing whitespace made basic engineering steps quicker. The dynamic import feature also saved a great deal of time, allowing 91 files to be imported almost as easily as a single file.

Comments
Atabarezz
13 - Pulsar

loved the examples. great work!