I've built a set of batch macros to deal with importing data from groups of Excel sheets in a folder.
The schema's, sheet names, and row starting positions are not consistent and so I have a process to analysis the schema's and only pull the consistent fields.
At the core of the process I have a macro that pulls a range of cells from a specified sheet - ignoring that headers exist and in string format.
For some excel (.xlsx) sheets, when I use the range function, the data in the sheet is incorrectly pulled - all null cells. The first example is the sheet with no range, and the second is with a Range applied - notice all the null fields.
Is this a bug? Or does someone have a solution for this?
I have seen some threads that suggested using the Microsoft Excel Legacy format for excel in the Input Tool, and that works to a point. However if there are dates fields formatted in the excel, this Legacy approach extracts the dates the way they are formatted in the Cells (eg if date 2021-03-01 was formatted as "1/3/21" (d/m/yy) it will come through as "1/3/21") and as I'm getting files globally I will need to build extra logic to work out the correct format.
Cheers
Craig
@SharkeyNZ try unchecking the first row has data box and see if that helps in either case
Unfortunately that doesn't work (thanks for the suggestion @aatalai)
Interesting result though - it picks up the headers but still has the Null data - the real data that should be in column 1... appears in column 30 (just after the headers finish).
Coming back to this issue again, it appears that the Excel .xlsx files may actually be .xlsb (so wrong extension). When I change the File format to Microsoft Excel Binary, the items are read correctly.
The challenge is now "how do I identify that and *.xlsx is actually a *.xlsb formatted file with the wrong extension, so that my ETL process can deal with this in an eloquent way?
User | Count |
---|---|
52 | |
27 | |
25 | |
24 | |
21 |