So I have this case and can't figure out a way to overcome the situation.
I'm reading different excel files from one directory and each excel contains X number of tabs (might differ from file to file) (Using this solution), turns out that I need to read parts of each tab and union them into one single output. The challenge is that each tab has a different line to start (Row number) the reading of the rows cause the first N rows could contain information that is not important for my purposes, the only common thing that each Tab has is that the portion of the information that I need to start has the same Title on top.
As a brief, I have , X number of Files, Y number of tabs. Each Tab has a Text that Reads "Examples Below" and after that text is the information I need to copy into one single master file, caveat here: Title "Examples Below" could be in Row 10 in one Tab, and Row 400 in the next one and so on..
As you can imagine my first approach was to try to use the Start data Import on Line feature in an input data tool, but I see this as a hard number not able to change dynamically.
Thanks in advance, I know this one might be difficult to achieve.
Solved! Go to Solution.
I have overcome this before by getting creative with the formula tool, a running total, and a filter. I use a formula tool that checks a certain column for text that would indicate it is the first row of data and if it does, puts the number 1 into a new column. Then, I use a running total tool on that column, and finally a filter that says running total is greater than or equal to 1. You can also do a "group by" in the running total tool, so if you output the file name as a field, you could group by the file name and create the running total as described above, so only the records following the starting point row would be output. Please let me know if you would like further clarification!
HI,
Thank you for the input, I am kinda new to Alteryx but I will try to follow the logic in the comment and come back with results or questions.
Thank you again!
Step by step walkthrough:
1.) Data starts on row 5 in this example
Step 2.) Create formula that puts a flag where the field 1 contains text that would indicate the field name at the start of your data
Step 3.) Create a running total that works off of the flag column. This is useful because every row after the start of your data will have a value at least equal to 1 which sets us up to use a filter tool after.
Step 4.) Filter rows where the run total flag is greater than or equal to 1
Step 5.) Use a select tool to uncheck the flag and running total columns because they are no longer needed.
Finally, a dynamic rename tool can be used at the end with the setting "Take Field Names from First Row of Data" if you want to make the first row of data your new column headers.
Wow,
Thank you very much, this totally helps me with the case I presented.
Thanks a lot
Hey Brandon,
Quick question here. Does some solution comes to your mind when we don't know the name of the Column to look for the Start of Data value?
Was implementing the solution and I realized that I am looping through different Excel Files and Tabs and each Tab will have a different Header name for the column where the text I need is located. So using your example, my first Tab's column might be "Field1", the second could be "Donald Duck", the next one "Field1" again and so on every time will be unknown.
Any ideas around this one?
Thanks again!
Worked as expected.. thank you
Great answer! I used your logic and added a formula to update the record I needed the import to stop at to -1. The running total there would then go back to 0. I love this community!