Let’s talk Alteryx Copilot. Join the live AMA event to connect with the Alteryx team, ask questions, and hear how others are exploring what Copilot can do. Have Copilot questions? Ask here!
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Importing Multiple CSVs without standardized headers and data different rows

bchen1
6 - Meteoroid

Hello,

 

I have a directory of 100 CSV files and would to combine them into one table for analysis. There are a few issues with the underlying data.

  • They are originally Excel sheets, converted through Python to CSV files to circumvent the 1904 date system import issue
    • This leads to the "Unnamed" columns and the numbering the leftmost column in the table below
  • The CSVs do not have a standard format and the right column headers sometimes start on row 8 or 9 or 10
  • The table below shows the general format for the CSVs where we have "Unnamed" headers in the first row, but the true "header" I want starts in Row 8
    • As mentioned, some CSV files start on row 9 or 10, so I'd like to dynamically be able to read/import based on the row where "Product ID" is found & remove all rows above it, but keep all rows below
 Unnamed: 0Unnamed: 1Unnamed:2Unnamed: 3
0    
1    
2    
3    
...    
8Product IDProduct NumberNameAddress
91232 XX 

 

I've tried utilizing a batch macro to bring in all the CSVs but the columns will never line up and it ends up being completely disjointed. The goal is an output table like the below:

 

Product IDProduct NumberNameAddressCSV File Name
1232 XX File1
1233 XYZ File1
1234 ABC File2
12345 ABCD File3

 

Appreciate any help / thoughts on how to best approach this.

 

Thank you,

 

4 REPLIES 4
ChrisTX
16 - Nebula
16 - Nebula

Can you post an actual CSV files with a few data rows, mocked up if necessary?

 

You could use a Formula tool to identify your true header row, append the header row number to the original input data, use a filter like [Row Number] >= [Header row number], then use Dynamic Rename, take field names from first row of data.

 

Then for your batch macro, use the macro from option 3 in this article: But what happens if your field schema is different from one file to another?

 

The Ultimate Input Data Flowchart
Community > Designer > Designer Knowledge > The Ultimate Input Data Flowchart
https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/The-Ultimate-Input-Data-Flowchart/t...

 

Chris

Felipe_Ribeir0
16 - Nebula

Hi @bchen1 

 

I would try with the batch macro too, good try. For this case, maybe if you go inside of the batch macro and

 

1)Do not pick the first row as header

2)Filter the dataset in a way that keep the header and the data

3)Use the dynamic rename to pick the first row as header

4)select the proper columns

 

Felipe_Ribeir0_0-1665493664133.png

 

Maybe the output will work as expected.

 

bchen1
6 - Meteoroid

Attaching a few mocked up CSVs to help illustrate thank you! @ChrisTX 

ChrisTX
16 - Nebula
16 - Nebula

You can take coding from the attached workflow, and add it to the beginning of the batch macro I mentioned above.

 

ChrisTX_0-1665511518164.png

 

 

Chris

Labels
Top Solution Authors