Hello,
I have a directory of 100 CSV files and would to combine them into one table for analysis. There are a few issues with the underlying data.
- They are originally Excel sheets, converted through Python to CSV files to circumvent the 1904 date system import issue
- This leads to the "Unnamed" columns and the numbering the leftmost column in the table below
- The CSVs do not have a standard format and the right column headers sometimes start on row 8 or 9 or 10
- The table below shows the general format for the CSVs where we have "Unnamed" headers in the first row, but the true "header" I want starts in Row 8
- As mentioned, some CSV files start on row 9 or 10, so I'd like to dynamically be able to read/import based on the row where "Product ID" is found & remove all rows above it, but keep all rows below
| | Unnamed: 0 | Unnamed: 1 | Unnamed:2 | Unnamed: 3 |
| 0 | | | | |
| 1 | | | | |
| 2 | | | | |
| 3 | | | | |
| ... | | | | |
| 8 | Product ID | Product Number | Name | Address |
| 9 | 1232 | | XX | |
I've tried utilizing a batch macro to bring in all the CSVs but the columns will never line up and it ends up being completely disjointed. The goal is an output table like the below:
| Product ID | Product Number | Name | Address | CSV File Name |
| 1232 | | XX | | File1 |
| 1233 | | XYZ | | File1 |
| 1234 | | ABC | | File2 |
| 12345 | | ABCD | | File3 |
Appreciate any help / thoughts on how to best approach this.
Thank you,