Importing Multiple CSVs without standardized headers and with data starting on different rows

Question

Hello,

I have a directory of 100 CSV files and would like to combine them into one table for analysis. There are a few issues with the underlying data.

* They are originally Excel sheets, converted to CSV files through Python to circumvent the 1904 date system import issue
    * This leads to the "Unnamed" columns and the numbering in the leftmost column in the table below

* The CSVs do not have a standard format, and the correct column headers sometimes start on row 8, 9, or 10
* The table below shows the general format for the CSVs, where we have "Unnamed" headers in the first row but the true header I want starts in row 8
    * As mentioned, some CSV files start on row 9 or 10, so I'd like to dynamically read/import based on the row where "Product ID" is found, removing all rows above it and keeping all rows below

  | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3
0
1
2
3
...
8 | Product ID | Product Number | Name | Address
9 | 1232 XX

I've tried using a batch macro to bring in all the CSVs, but the columns never line up and the result ends up completely disjointed. The goal is an output table like the one below:

Product ID | Product Number | Name | Address | CSV File Name
1232 XX File1
1233 XYZ File1
1234 ABC File2
12345 ABCD File3

I'd appreciate any help or thoughts on how best to approach this.

Thank you,

ChrisTX · Answer

You can take the logic from the attached workflow and add it to the beginning of the batch macro I mentioned above.

Chris

Importing Multiple CSVs without standardized headers and data different rows.yxmd

bchen1 · Answer

Attaching a few mocked-up CSVs to help illustrate. Thank you, @ChrisTX!

CSV1.csv

CSV2.csv

Felipe_Ribeir0 · Answer

Hi @bchen1

I would also try the batch macro approach; that was a good start. For this case, try going inside the batch macro and doing the following:

1) Do not use the first row as the header

2) Filter the dataset so that it keeps only the header row and the data rows below it

3) Use the Dynamic Rename tool to take the field names from the first remaining row

4) Select the proper columns

With those changes, the output may work as expected.
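For anyone who would rather do the cleanup before the files reach Alteryx, the four steps above can also be sketched in Python, since the files are already being converted with Python. This is only a rough illustration using the standard `csv` module, not the contents of the attached workflow. The names `WANTED`, `clean_rows`, and `combine_folder` are made up for this example, and it assumes the real header row contains a cell that is exactly "Product ID".

```python
import csv
from pathlib import Path

# Column names taken from the question; adjust if the real files differ.
WANTED = ["Product ID", "Product Number", "Name", "Address"]


def clean_rows(rows, file_name, marker="Product ID"):
    """Return one dict per data row found below the row that contains `marker`."""
    # Steps 1-2: treat every row as data, then locate the real header row.
    header_idx = next(
        (i for i, row in enumerate(rows)
         if marker in (cell.strip() for cell in row)),
        None,
    )
    if header_idx is None:
        return []  # no header found: skip the file rather than guess

    # Step 3: promote that row to the header.
    header = [cell.strip() for cell in rows[header_idx]]

    # Step 4: keep only the wanted columns, looked up by position.
    positions = {name: header.index(name) for name in WANTED if name in header}

    records = []
    for row in rows[header_idx + 1:]:
        record = {}
        for name in WANTED:
            pos = positions.get(name)
            in_range = pos is not None and pos < len(row)
            record[name] = row[pos].strip() if in_range else ""
        if not any(record.values()):
            continue  # nothing in the wanted columns: treat as a blank row
        record["CSV File Name"] = file_name
        records.append(record)
    return records


def combine_folder(folder):
    """Clean every .csv file in `folder` and stack the results in name order."""
    combined = []
    for path in sorted(Path(folder).glob("*.csv")):
        # utf-8-sig also reads plain UTF-8 and drops a leading BOM if present.
        with open(path, newline="", encoding="utf-8-sig") as handle:
            combined.extend(clean_rows(list(csv.reader(handle)), path.stem))
    return combined
```

Calling `combine_folder("path/to/csv_folder")` (a placeholder path) returns a list of dictionaries with the four product columns plus "CSV File Name", which could then be written out with `csv.DictWriter` and read into Alteryx as a single, consistently shaped file. Looking columns up by header name rather than by fixed position is what lets the header sit on row 8, 9, or 10 without changing the code.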