I have around 2000 excel files in a folder. I need one single output file of with the last row of each file. Please note, columns are same in all excel, but number of lines/rows will vary in each excel file.
Hello @Bhavani_A ,
Is there a date field that you can sort on or something to identify that sort to least and most.
No. There is no date field.
@Bhavani_A I think Dynamic Input and Sample Tool should do the work.
Hey @Bhavani_A
I see 4 steps in this:
a) load up all the data: using the input tool, with the filename set to *.xlsx, and make sure you turn ON the option called "Output filename as field"
- Now you have all rows in; and each row will tell you what file it comes from
b) Add a unique row ID: preferrably per file. two ways to do this
- Use a multi-row tool
- Use the tile tool - @MattD has a great article here that tells you how to do unique rows: https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-Tile/ta-p/30235
c) Reverse the order of the data: sort the data by filename and rowID Descending - that way the last row appears first.
d) Take only the highest RowID for each file: use a sample tool with a group-by on the filename to only return the first row.
that should get you what you need in 4 tools or so.
@SeanAdams ,
Let's give a few words of warnings about Excel and how to ensure good data. I wouldn't trust Excel with my weekend. If I'm reading multiple excel files, I could hope that the metadata is compatible between the files. History has proven that you can't rely upon trust that each file will be consistent. Even if columns are not changed, it is possible that the data contents might be altered in a way that makes the data type change.
I agree with your calculation of "last", though I would handle it differently. First, I would use a CReW Data Cleanse (you can use the Alteryx version if you're so inclined) and eliminate empty ROWs. If there are blank rows at the bottom because someone opened, touched and saved the workbook, you don't want empty rows interfering with real data. I also skip the SORT of the data by using SAMPLE tool functionality: Last N Records (N=1) and Group by column (Filename).
So, in the end you have:
DIRECTORY -> DYNAMIC INPUT -> DATA CLEANSE -> SAMPLE
You can use:
INPUT -> DATA CLEANSE -> SAMPLE
But for sleeping purpose, you might want to include a SUMMARIZE to count records from each filename and compare the record count to a DIRECTORY tool pointing to the same data and making sure (TEST tool) that the summarize counts have the same record count as the directory listing.
Cheers,
Mark