Modeling the impact of COVID-19 on healthcare systems with Alteryx
We started with hospital data, which we sourced in JSON format. Using Alteryx, we extracted this data into a list of 6,700 institutions, each with its staffed capacity, bed capacity, utilization rate and geo-coordinates.
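The post does not show the JSON layout, so the following is only a minimal sketch of the flattening step. It assumes a GeoJSON-style payload with hypothetical field names (`staffed_beds`, `licensed_beds`, `utilization`) and turns each nested feature into one flat row per hospital. This is the kind of reshaping the Alteryx JSON tools handle without code.

```python
import json

# Hypothetical payload: the GeoJSON-style layout and field names below are
# assumptions for illustration, not the schema of the source used in the post.
RAW = """
{"features": [
  {"properties": {"name": "General Hospital", "staffed_beds": 250,
                  "licensed_beds": 300, "utilization": 0.68},
   "geometry": {"type": "Point", "coordinates": [-81.38, 28.54]}},
  {"properties": {"name": "County Medical Center", "staffed_beds": 120,
                  "licensed_beds": 150, "utilization": 0.72},
   "geometry": {"type": "Point", "coordinates": [-82.46, 27.95]}}
]}
"""


def flatten_hospitals(raw):
    """Turn nested features into one flat record per hospital."""
    records = []
    for feature in json.loads(raw)["features"]:
        props = feature["properties"]
        # GeoJSON stores point coordinates as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"]
        records.append({
            "name": props["name"],
            "staffed_beds": props["staffed_beds"],
            "bed_capacity": props["licensed_beds"],
            "utilization": props["utilization"],
            "lat": lat,
            "lon": lon,
        })
    return records


hospitals = flatten_hospitals(RAW)
for h in hospitals:
    print(h["name"], h["bed_capacity"], h["utilization"], h["lat"], h["lon"])
```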
To get COVID case data, we use Alteryx to connect directly to the Johns Hopkins University data and convert its time series files into an analytics-ready format. In this case, Vantage uses Snowflake on Azure and built a custom bulk-loading macro to load data into Snowflake via Azure Blob Storage.
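The Johns Hopkins time series files are "wide": one row per location and one column per date, with cumulative counts in each cell. Making them analytics-ready mostly means unpivoting to one row per county per day and deriving daily new cases. The sketch below uses a trimmed, made-up sample that keeps only a few identifier columns; the column subset and the `unpivot` helper are illustrative assumptions, not the macro Vantage built.

```python
import csv
import io
from datetime import datetime

# Trimmed, made-up sample in the wide layout: one column per date,
# cumulative confirmed cases in each cell.
SAMPLE = """FIPS,Admin2,Province_State,3/20/20,3/21/20,3/22/20
12095,Orange,Florida,10,16,25
12057,Hillsborough,Florida,12,20,30
"""

ID_COLUMNS = {"FIPS", "Admin2", "Province_State"}


def parse_date(col):
    return datetime.strptime(col, "%m/%d/%y").date()


def unpivot(text):
    """Convert wide rows into long rows: one row per county per day."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(text)):
        previous = 0
        date_columns = sorted((c for c in row if c not in ID_COLUMNS), key=parse_date)
        for col in date_columns:
            cumulative = int(row[col])
            long_rows.append({
                "fips": row["FIPS"],
                "county": row["Admin2"],
                "date": parse_date(col),
                "cumulative": cumulative,
                # Daily increment; on the first date this is the running total so far.
                "new_cases": cumulative - previous,
            })
            previous = cumulative
    return long_rows


rows = unpivot(SAMPLE)
for r in rows:
    print(r["county"], r["date"], r["cumulative"], r["new_cases"])
```

The long layout is what downstream joins (census demographics, hospital capacity) and time-window calculations expect.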
Next, Vantage extracted case details from Italy and from the state of Florida, where patient-level details including demographics are available. From this data, Vantage built a model using the Alteryx logistic regression algorithm to estimate the probability of hospitalization and ICU admission by gender and age. Using data from Italy, Vantage was also able to model the required length of hospital stay.
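To show what a logistic model of hospitalization by age and gender looks like, here is a from-scratch sketch fitted by plain gradient descent. It is a stand-in for the Alteryx tool, not a description of how that tool fits its model. The patient records are synthetic and the age scaling is an arbitrary choice to help convergence.

```python
import math

# Synthetic, made-up patient records: (age, is_male, hospitalized).
# These are NOT data from Italy or Florida; they only exercise the code.
PATIENTS = [
    (25, 0, 0), (30, 1, 0), (35, 0, 0), (40, 1, 0), (45, 0, 0), (55, 0, 0),
    (85, 0, 0), (35, 1, 1), (50, 1, 1), (60, 1, 1), (65, 0, 1), (70, 1, 1),
    (75, 0, 1), (80, 1, 1),
]


def features(age, is_male):
    # Centre and scale age so plain gradient descent converges quickly.
    return [(age - 55) / 15.0, float(is_male)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and intercept by batch gradient descent on mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def p_hospitalized(w, b, age, is_male):
    """Predicted probability of hospitalization for one age/gender profile."""
    x = features(age, is_male)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


X = [features(age, male) for age, male, _ in PATIENTS]
y = [label for _, _, label in PATIENTS]
w, b = fit_logistic(X, y)
for age in (30, 50, 70):
    print(age, round(p_hospitalized(w, b, age, 0), 3), round(p_hospitalized(w, b, age, 1), 3))
```

An ICU-admission model would have the same shape with a different label column.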
Using US census data on the demographics of each county, we are able to determine the expected hospitalization rate and ICU rate per confirmed case for each county.
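One way to turn per-group rates into a per-county rate is a population-weighted average over age bands. The minimal sketch below assumes that the age mix of confirmed cases mirrors the county's census population. The rates and census counts are hypothetical placeholders, and gender could be added by keying the dictionaries on (sex, age band) pairs.

```python
# Placeholder per-case rates by age band. In the post these would come from
# the fitted model; the numbers here are hypothetical.
HOSP_RATE = {"0-19": 0.01, "20-44": 0.05, "45-64": 0.15, "65+": 0.35}
ICU_RATE = {"0-19": 0.00, "20-44": 0.01, "45-64": 0.04, "65+": 0.10}


def expected_rate(population_by_band, rate_by_band):
    """Population-weighted average of the per-band rates for one county."""
    total = sum(population_by_band.values())
    return sum(pop / total * rate_by_band[band]
               for band, pop in population_by_band.items())


# Hypothetical census counts for one county.
county = {"0-19": 30000, "20-44": 40000, "45-64": 20000, "65+": 10000}
hosp_per_case = expected_rate(county, HOSP_RATE)
icu_per_case = expected_rate(county, ICU_RATE)
print(round(hosp_per_case, 4), round(icu_per_case, 4))
```

For this county the hospitalization rate works out to 0.3 × 0.01 + 0.4 × 0.05 + 0.2 × 0.15 + 0.1 × 0.35 = 0.088 per confirmed case, and the ICU rate to 0.022.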
Next, turning to our hospital data, we brought in ICU beds and hospital beds for each county in the US. This presented an initial challenge, as some US counties have no hospitals and many have no ICU facilities. To account for this, we used the spatial distance tool to assign each county's case count and expected hospitalization rate per case to the nearest facility (either an ICU or a general hospital).
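The nearest-facility step can be sketched with great-circle (haversine) distances. Each county's expected demand is added to the closest facility that actually has the relevant capacity, so ICU demand skips hospitals without ICU beds. The coordinates, field names and the `assign_to_nearest` helper are hypothetical, and a brute-force `min` over all facilities is only reasonable at this toy scale.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical facilities: A has no ICU, B has one.
FACILITIES = [
    {"id": "A", "lat": 28.5, "lon": -81.4, "beds": 200, "icu_beds": 0},
    {"id": "B", "lat": 27.9, "lon": -82.5, "beds": 300, "icu_beds": 30},
]
# Hypothetical county centroids with case counts and per-case rates.
COUNTIES = [
    {"fips": "c1", "lat": 28.6, "lon": -81.2, "cases": 100, "hosp_rate": 0.1, "icu_rate": 0.02},
    {"fips": "c2", "lat": 27.8, "lon": -82.6, "cases": 50, "hosp_rate": 0.2, "icu_rate": 0.05},
    {"fips": "c3", "lat": 28.3, "lon": -81.5, "cases": 40, "hosp_rate": 0.05, "icu_rate": 0.01},
]


def assign_to_nearest(counties, facilities, rate_key, capacity_key):
    """Sum expected patients (cases x rate) at the nearest facility with capacity."""
    eligible = [f for f in facilities if f[capacity_key] > 0]
    load = {f["id"]: 0.0 for f in eligible}
    for c in counties:
        nearest = min(eligible, key=lambda f: haversine_km(c["lat"], c["lon"], f["lat"], f["lon"]))
        load[nearest["id"]] += c["cases"] * c[rate_key]
    return load


hosp_load = assign_to_nearest(COUNTIES, FACILITIES, "hosp_rate", "beds")
icu_load = assign_to_nearest(COUNTIES, FACILITIES, "icu_rate", "icu_beds")
print(hosp_load, icu_load)
```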
Armed with this data, we can now combine pre-COVID utilization estimates for hospital capacity with live case data and our modeled hospitalization rates for each county (and for surrounding counties where no ICU or hospital capacity exists) to create a live model of critical care shortages and/or surpluses. We used Tableau as our output tool and created dynamic parameters that let a user change the average length of a hospital stay, or adjust the assumed pre-COVID bed availability to account for the suspension of elective or non-urgent procedures.
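As a rough illustration of how the two Tableau parameters interact, the sketch below treats every patient admitted in the last `avg_stay_days` days as still occupying a bed. It also assumes admission happens on the day a case is confirmed. Releasing a share of baseline occupancy approximates suspending elective procedures. These simplifications and the function names are assumptions for illustration, not the published model.

```python
def covid_bed_demand(daily_new_cases, hosp_rate, avg_stay_days):
    """Beds occupied on the latest day by COVID patients.

    Assumes admission on the confirmation day and a fixed length of stay,
    so only cases from the last avg_stay_days days still occupy a bed.
    """
    start = max(0, len(daily_new_cases) - avg_stay_days)
    return sum(daily_new_cases[start:]) * hosp_rate


def bed_balance(total_beds, baseline_utilization, demand, elective_share_freed=0.0):
    """Positive result = surplus beds, negative result = shortage.

    elective_share_freed is the fraction of pre-COVID occupancy released
    by postponing elective or non-urgent procedures.
    """
    occupied_baseline = total_beds * baseline_utilization * (1 - elective_share_freed)
    return total_beds - occupied_baseline - demand


daily_new_cases = [10, 20, 30, 40, 50, 60, 70]  # hypothetical, oldest first
demand = covid_bed_demand(daily_new_cases, hosp_rate=0.1, avg_stay_days=5)
print(demand)
print(bed_balance(200, 0.7, demand))
print(bed_balance(200, 0.7, demand, elective_share_freed=0.25))
```

With a 5-day stay, the last five days contribute 250 cases, or 25 occupied beds. A 200-bed facility at 70% baseline utilization then has a surplus of about 35 beds, rising to about 70 if a quarter of baseline occupancy is freed.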
- As a result of publishing this study, we have had several inquiries from hospital groups, national healthcare schemes and governments (local and national) interested in extending this analysis to their own countries or geographies. We hope our efforts and ideas can help other citizen data scientists use their creativity to find other valuable insights that translate directly into benefits for those on the front line fighting this pandemic.
- Alteryx has made complex analysis a reality for many of us who are not expert coders, which I find empowering. This use case was compiled end to end in less than one week and involved pulling data from many locations to augment a standard data set. Doing this analysis without Alteryx would have required significant coding effort across numerous platforms, with extensive use of various libraries for data engineering (JSON extracts), modeling, geospatial work and data manipulation.
- The emerging data exchange market is one of the most exciting developments in the analytics space. It is being led by the Snowflake Data Exchange and AWS Marketplace, where datasets are made available under free, paid or limited-use licensing models. These datasets are extremely rich in content, and their availability is growing rapidly. However, the datasets are far from standardized, and because data comes in all forms, analysts will need to be creative and flexible to get the most out of them. In this use case alone we received GeoJSON, JSON, Parquet, CSV and Excel files, sourced from FTP sites, APIs, S3 buckets, databases and emails.
- Analysts who are prepared to use tools like Alteryx, and who become experts at bringing together these valuable but varied and disparate data sources and combining them with their own proprietary data, will be able to create massive value for their organizations.
This is very impressive! I was wondering what data you used for COVID-19 cases in Italy, and where you extracted it from. Thanks