Hi, so I'm reviewing some data for hospitals, and not to worry, this is all publicly available data, as it is a legal requirement for it to be. My problem is two hospitals, UPenn and UMich have put their data in a format they aren't supposed to (but there's really no punishment for this unfortunately).
I don't have a good way to explain the problem, as I have trouble putting it into words, so the files are attached. But basically UPenn and UMich decided to expand their data across 600+ columns instead of making it into two columns you can filter on. Payers (insurance) have negotiated rates they offer services at, and you SHOULD be able to filter on a Payer and Plan Type column to filter to Payers like Cigna, and then their commercial or medicare or whatever plan to see what the rates are for services.
What UMich and UPenn did was spread this across a buttload of columns and make the data absolutely hideous to read and filter on. I've tried numerous solutions over the last few days to try and fix this, and then I remembered this community exists. I'm not particularly good at cross tab, so this could be a "skill issue" on my end. I'm hoping someone can help me figure out how to crosstab or pivot or whatever this data so that UMich and UPenn look like the provided example of "RR" which is Ronald Reagan hospital.
Few questions:
I don't believe the delimiters in the files themselves are | , but the headers are separated by pipes, so I'm not 100% sure.
As far as I can tell, UMich and UPenn are consistent. Out of all the hospitals I'm working on, these are the only 2 I have in this particular format. The rest follow the other "goal" example I provided.
One more question on the column headers itself, it seems that the first value before “|” is the “main” header and everything after is extra value that appears in the data if it has a “|” also (can’t prove it as I’m on my phone).
Is that the case? Getting more info here so I can attempt it later if I have time. Maybe it’ll help others here as well.
Hopefully this is helpful, it's a bit hard to explain to people who don't work in healthcare (probably). The headers are my point of frustration for sure and are what's stopping me from figuring this out myself.
First, I did double check, the 2 files are comma separated.
Second, I'm going to try and talk through the logic I think applies here based on other hospital data
Example of what is currently a header:
standard_charge|Blue Cross Blue Shield of Michigan|Medicare Advantage|negotiated_dollar
As best I can tell, the prefix and the suffix here should be one header in of themselves in this example, Standard_charge - Negotiated Dollar is how I would split that out.
And then Blue Cross Blue Shield of Michigan is the Payer Name
Medicare Advantage is the Plan Name
So to me it's like this: (1/2 of one header) | Payer Name | Plan Name | (Other 1/2 of the one header)
So once this is "fixed" you should have 3 headers, Standard Charge - Negotiated Dollar / Payer Name / Plan Name
A problem is there are some headers with only 3 Pipes like this: estimated_amount|Blue Cross Blue Shield of Michigan|Medicare Advantage
So I think this example should be: Estimated Amount / Payer Name / Plan Name
It's getting quite late for me, but I came up to here:
For UPenn, I've identified 615 columns, and checked your data and the 614 ~ 615 columns are consistent with the headers. But I just need you to clarify the headers again...
Scenario 1
Your second header is code|1 (Column 2), what does it mean? Code - 1? It corresponds to value: 0000100449J0740 (Column 2)
Scenario 2
standard_charge|Aetna Commercial|Aetna Commercial|negotiated_dollar -> This sounds like your example, but can you show me what it means and how to read it?
In the raw data (non-header), Column 22 is empty. So I suspect the data is in the column itself?
@WishIKnewHowToCode I've attached my workflow here in case you want to continue. But I am looking to split it based on what starts/stops on the delimiter and if it is present IN the data portion (non-header). From there, you should be able to get a complete set of your data. Then it's just filtering from there.
I think I should explain again what I'm trying to accomplish as a whole here. I'm trying to analyze what certain payers are charging for certain procedures at hospitals. The problem is that UPenn and Umich don't have their payers consolidated into 2 columns like everyone else does. This is where I'd refer you to the "RR" Example file to see what these files are supposed to look like.
Scenario 1
Your second header is code|1 (Column 2), what does it mean? Code - 1? It corresponds to value: 0000100449J0740 (Column 2)
- This is the code for the procedure. So for example, Code 016 is a bone marrow transplant. This field is needed for my analysis, but I don't think will affect what we're trying to do here in alteryx/
Scenario 2
standard_charge|Aetna Commercial|Aetna Commercial|negotiated_dollar -> This sounds like your example, but can you show me what it means and how to read it?
- This file is about what is called "price transparency" for hospitals. The government requires hospitals to publish what they charge for procedures and how they charge by payer.
- This header is the problem with the file. Almost all of the headers are the problem. I'll try to explain this again - The data under this header is fine. The data in the column is the "Standard charge - Negotiated Dollar" for the Payer, Aetna Commercial, and the Plan is also Aetna Commercial.
What I need is to Create a handful of columns and transpose or crosstab all of the data in most of the column of this file into the handful of columns.
Here's what I'm trying to say, and I know I'm not doing the best job of explaining it, and these 2 files are terrible data. This is what the Columns currently look like, and below is what they need to be condensed down to. Keep in mind, your example has a payer and plan that happen to share a name. Most of these look like "Blue Cross Blue Shield | Medicare Advantage Plan"
Current Data is in this format, and it repeats for every payer the hospital has. I have included two payers worth of column headers.
standard_charge|Aetna Commercial|Aetna Commercial|negotiated_dollar | standard_charge|Aetna Commercial|Aetna Commercial|negotiated_percentage | standard_charge|Aetna Commercial|Aetna Commercial|negotiated_algorithm | standard_charge|Aetna Commercial|Aetna Commercial|methodology | estimated_amount|Aetna Commercial|Aetna Commercial | additional_payer_notes|Aetna Commercial|Aetna Commercial | standard_charge|Aetna Health Inc|Aetna HMO|negotiated_dollar | standard_charge|Aetna Health Inc|Aetna HMO|negotiated_percentage | standard_charge|Aetna Health Inc|Aetna HMO|negotiated_algorithm | standard_charge|Aetna Health Inc|Aetna HMO|methodology | estimated_amount|Aetna Health Inc|Aetna HMO | additional_payer_notes|Aetna Health Inc|Aetna HMO |
Data needs to be transformed like this (again the RR file shows what the data is supposed to look like if the hospital had done it correctly in the first place)
Aetna Commercial | Aetna Commercial | "Standard charge - Negotiated Dollar" | "Standard Charge - Negotiated Percentage" | "Standard Charge - Negotiated Algorithm" | "Standard Charge - Methodology" | Estimated Amount | Additional Payer Notes |
Aetna Health Inc | Aetna HMO | "Standard charge - Negotiated Dollar" | "Standard Charge - Negotiated Percentage" | "Standard Charge - Negotiated Algorithm" | "Standard Charge - Methodology" | Estimated Amount | Additional Payer Notes |
User | Count |
---|---|
107 | |
85 | |
76 | |
54 | |
40 |