Inspire EMEA 2022 On-Demand is live! Watch now, and be sure to save the date for Inspire 2023 in Las Vegas next May.

Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer and Intelligence Suite.
SOLVED

XLS report to clean table of data

YeahMan
8 - Asteroid

Hi Alteryx Community I have a doozie.

 

Please refer to the attached spreadsheet. I have 135 reports with the layout found on the 'Input Tab' and I would like to merge & create a workflow that will output the data as on the 'Preferred Output' tab for further analysis.

 

I'm not (yet!) concerned about looping through the directory to input the 135 spreadsheets as I have a workflow that does something similar, but I'm not even sure where to begin with this one. So far I have a an Input Data, a Sample Tool & a Text to Columns tool :)

 

Notes

1)   all 135 reports are .xls files

2)   the aggregates and each report are based on the Plan column in the 'Preferred Output' tab

 

Recommendations/assistance/ways to tackle this would be greatly appreciated.

Thanks,

Sue

2 REPLIES 2
kelly_gilbert
13 - Pulsar

Whew, @YeahMan, that is a doozie, but a perfect use case for Alteryx! 

 

Here are some thoughts on how to start breaking this down:

  • First, you need to get all of the files into the workflow.
    Do all of your files have exactly the same columns, or are some different?
    If they are all exactly the same column names, you can use a Directory tool to get a list of all of the files, and then feed the paths into a Dynamic Input tool to read in all of the files at once.

  • While you're building out the workflow, you can use that Sample tool (or Select Records) to temporarily limit the number of files feeding into your Dynamic Input, so it won't take so long to run while you're testing/building. You can also right-click on the Dynamic Input tool and choose Cache and Run workflow to temporarily cache the output of the tool (that way, you won't have to wait for it to run as you're testing the rest of your workflow).
  • You can use a Multi-Row Formula tool to "copy down" the metric names (such as Graduation Rate) into each row

  • I didn't see the Plan field in your input. Is that determined based on the row number (e.g, first row = P0101, fifth row = P0105)? If so, you can use another Multi-Row Formula tool to calculate that.

  • You'll need to fill in the missing field names. If your columns are always the same (e.g. first two cols are always Province, second two cols are always College, etc.) then you could just use a Select tool to rename them (e.g. Provice rate, Province count, etc.)

    If they're not always the same, you might need to use something like a Field Info and another Multi Row Formula to generate the missing field names, and then a Dynamic Rename to rename them.

  • You can use a Transpose tool to reshape the data and put the columns into separate rows.

  • At this point, you'll probably need a Formula tool to create the new column headers (e.g. first letter of the old field name + abbreviation of metric name + rate or count)

  • Finally, use Crosstab to reshape the data again, putting your new column name values into separate columns
YeahMan
8 - Asteroid

Wow @kelly_gilbert thank you very much for taking the time to answer my question with so much detail!

To answer some of your questions:

 

I was going to start small (one report) then worry about the directory load after. I have a workflow with a macro that does exactly this for, oddly enough, 135 XML files ;). I will look at the Directory/Dynamic input tools though, especially the Cache and Run workflow option!

 

Yes the output column headers will be the same for all 135 XLS reports, and the input column headers are all the same except for the merged cell H & J, currently labeled 'P0101'. The 'P0101' represents the Plan under consideration. There is one report for each plan and all the data in that report pertains to the plan. One of things I need to do is copy the 'P0101' into the Plan column for all rows in the current report. So, I can either get it from the merged cell H&J or parse it from the file name - which would probably be easier as the plan name is included as well. E.g. KPI SUMMARY - GRAD ONLY~P0101~DENTAL ASSISTING (LEVELS I AND II).xls

 

Again thank you,

--sue

 

 

 

 

Labels