Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

CSV import with differing column layouts

jay_chang
8 - Asteroid

I have a data file that is in CSV format.  It is basically equivalent to multiple tables stacked one after the other, with typically a blank row or two separating each table.  The tricky part is that the tables can have varying numbers of columns like:

 

 H1H2H3
Row1abc
Row2def

 

 H4H5H6H7H8
Row5 12345

 

When I read the CSV into Alteryx, it assumes that the format will follow the layout of the first table and doesn't pull in the additional columns of the 2nd table.  I have roughly 15 tables that are stacked in this manner.  Ideally, I'd like to read all the columns of all of the tables into a single Alteryx table and then manipulate the data via Alteryx.

 

Is there any way to do this easily?  I can say that each table will be either 12 or 13 rows of observations plus a header row for each table.  I'd prefer not to use a SKIP N tool (CReW macro) if at all possible.

 

TIA.

13 REPLIES 13
derekbelyea
12 - Quasar

 

Here is one way to tackle this problem.  I have used a single Excel file with varying matrix sizes but it is essentially the same as reading multiple CSV files.

 

There may be a more elegant solution ... usually is, but I prefer to keep things simple.

 

 

2018-02-26_00006.png  

 

The result this produces looks like this:

 

2018-02-26_00007.png

BenMoss
ACE Emeritus
ACE Emeritus

I've had a number of clients that have faced this issue and I always point them in the direction of this post. Within it there is a batch macro that should allow you to do this task with relative ease. 

 

https://community.alteryx.com/t5/Alteryx-Knowledge-Base/CS-Macro-Dev-Reading-in-multiple-files-with-...

 

All you need to do is have the full paths of each of the files you want to input, perhaps using a text input tool or even the directory tool and then plug this into the macro.

 

Ben

jay_chang
8 - Asteroid

The issue is not multiple files with different layouts - it's within the SAME file that I have different layouts.  I found that macro but it doesn't seem to allow me to read in structures that change within a file.

jay_chang
8 - Asteroid

how are you getting the flow to skip to the next table?  I was thinking something similar but didn't want to hard-code the row # to start import at.

BenMoss
ACE Emeritus
ACE Emeritus

Arr, okay sorry I misunderstood.

 

Could you post an example file so that I can look and try to build/suggest the right solution for you.

 

Ben

BenMoss
ACE Emeritus
ACE Emeritus

I've taken a go at what I believe you are after using your sample data, solution is attached.

It may not be right but It may point you in a better direction. 

 

Ben

jay_chang
8 - Asteroid

@BenMoss - can you explain how the workflow works?  I'm new to alteryx and I'm having trouble understanding why you set it up the way you did.  I am not sure that I could replicate it for my dataset.

BenMoss
ACE Emeritus
ACE Emeritus

In theory it should work with any set of tables with any set of headers within a given file.

If you could create a realistic mock up file of your data then perhaps we could test it on this to make sure it does.

First of all I input the data, specifically with the setting 'First Row Contains Field Names' unchecked. I do this because as you are aware we have multiple tables in this dataset, so the headers in the first row aren't necessarily the correct headers throughout.

I then use a record ID which I can use at various points throughout the flow to acknowledge the differnet rows in our data.

The first multirow formula tool is in place in order to tag 'header' rows, 'table' rows and 'remove' rows. 

A header row is indicated when the field above it is null and the current field is not null. A table row is indicated when the row above it is not null. If a line value is null then it is removed, i.e. it is a space between our tables, which we then filter out with the next step.

My next step is to use the multirow formula tool to indicate the table with which a 'header' and it's data belongs. Everytime a new row is flagged as a header the ID is incremented by one.

We then use a transpose, filter and join to bring the appropriate headers alongside the appropriate field values before cross-tabbing back our resulting table.

I appreciate this is probably difficult to track, my advice would be to run the workflow and track how the data changes at each point.


Ben

jay_chang
8 - Asteroid

My apologies - I've not had time to try out your solution but hope to do so later this week.  I just wanted to let you all know that I'm not ignoring you, just too busy to test this out.  Thank you for your assistance.

Labels