This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I'm new here, somewhat new to Alteryx. Not sure if I chose the right labels for this (please let me know if I should change any of them). I'd really appreciate any help solving the below.
I have an input report that changes every few weeks that I need to run Alteryx on to do a variety of things. The number of rows (generally in the thousands) and the contents of the records change but certain things stay the same. There are anywhere from 30 to 40 fields in each report (this is variable) but never more than 40 columns.
Field A has a lot of different words and numbers depending on the record (it is a PDF i save as an excel spreadsheet to use in alteryx) but what is important is that in anywhere from 5 to about 40 of these records what it says in column A is either "report file" or "finalreport file". This is the key thing.
What I need is to select/filter out all of and only those records that follow "more recently" after the nearest row above it that says "report file" than it does after the nearest row above it that says "finalreport file". Basically all the rows that are in the sections titled "report file" I need to select or filter but I need to weed out the ones in the sections called "finalreport file". I presume the best thing would be to be able to identify whether each record is in a "finalreport file" section or a "report file" section.
Many things change in each report each week:
1. Which rows are labelled "report file" or "finalreport file" change from report to report each week.
2. Also how many records are in each section changes each week.
3. Furthermore, how many sections called "report file" and how many sections are called "finalreport file" also change each week.
4. Also when one section is called "report file" (or "finalreport file") the next section may say "finalreport file" or "report file" again; it's highly variable. Sometimes I get 2 sections called "finalreport file" in a row; sometimes 10 in row. Ditto for "report file.
5. How many rows until the first section (whether it is "finalreport file" or "report file" changes each week/report.
I have done some research and the simplest method I have been able to come up with (though I am having trouble executing on this method for reasons I can elucidate) is as follows:
1. Create two multirow formula that say the following (or similar iterations, I've had trouble getting it to work - I keep getting parse error or other error "malformed formula"):
If[F1]=="report file" THEN [Row-1: A]+1 ELSE [Row-1: A] ENDIF
where A is a new field.
If[F1]=="finalreport file" THEN [Row-1: B]+1 ELSE [Row-1: B] ENDIF
where B is a new field.
So basically making each of those columns count how many times it says "report file" OR "finalreport file" in field F1 above it for equation for field A and then how many times it says "finalreport file" in field F1 above it for field B in rows above it. So if in record 763 there are 3 sections called "finalreport file" above it and 7 sections above it called "report file" then that record 763's Column A (which would actually probably be field 41) will say 10 and its Column B (field 42) will say 3.
2. A new field C is simply = value in field A - value in field B. So this # changes from row to row IFF there is a new section called "report file" but not a new section called "finalreport file".
3. If equation in a new field D - having trouble with how to write this formula so here is the basic idea in words:
If [(value in field A of record before current record minus value in field A of current record does NOT = 0) AND (value in column C of record before current row minus value in column C of current row = 0)] THEN "finalreport file"
This is basically saying that IF there's BOTH (a change in value of field A and NOT a change in value of field C) IFF there has been a new section called "finalreport file"
4. If equation in a new field E: IF [(change in field A from previous row to current row does not equal 0) AND (change in field C from previous row to current row does not equal 0)] THEN "report file"
This is saying that if there is a change detected in field A and there is a change in field C then the new section must have been titled "report file" which i believe makes sense as corrolary of reasoning behind step 3.
5. New field F that is equal to what is in field E for that row or if nothing/null then what is in field D.
6. New field G that is = to what is in field F or if null/nothing then what is in previous record's field G.
Then with step 6 this means every record will say if it is in a section that is titled "report file" or "finalreport file". Every row will say either "report file" or "finalreport file" in field G (column 47) - at least in theory, right?
I hope this makes sense. please let me know if anything here is wrong/illogical in my process above.
My two main questions are:
1. Is there an easier way to do this by any chance? Like is there another method to identify whether each record is in a section with such similarly worded titles like this? Specifically, one of the section types contains the words of the other section type exactly to the letter, with a little extra letters in addition though.
2. Could someone please advise how do I write these equations out? because I keep getting parse errors even when just trying to write the first step formula:
If[F1]=="report file" THEN [Row-1: A]+1 ELSE [Row-1: A] ENDIF
Attached is a set of data that is 100% different in content from what I'm using but pretty similar in terms of its formatting. The attached is much smaller than the actual input file I am using in terms of both # of fields and # of records but it's still gives a good idea of how the data are structured/ordered and formatted.
We can use the attached to do a test run.
Can you please what specific multi-row formula I should type to achieve what I outlined above? Thank you so much! Would really appreciate the help.
I'm unable to attach it but in the below picture you can see the file data. Again, this is 100% different in content from the actual file I am working with but the type of formatting and order of different types of data (words, numbers, etc) is generally similar.
Could you please advise what the multi-row formula should be to achieve what it is I outlined above? Would really appreciate it, thanks a lot!
Thanks for that workflow idea. I added an output function and ran it twice - once connecting from the True and once from false. When connecting from the True I got the first of the two screenshots below and when connecting from the False I got the second.
When running from True I should be getting the rows starting with 4, 5, 9, 33, 652, 908, 75, and 65 since those are the rows in the "finalreport file" sections but (as you can see in the first screenshot below) I'm getting 4, 5, 9, 33, 652, 908, 543, and 809. Any idea why I'm getting 543 and 809 instead of 75 and 65?
Also I should be getting the rows starting with 7, 55, 21, 97, 543, and 809 when running from False (since those are the rows in the two "report file" sections) is that correct? But instead i get ONLY two rows in the output: 75 and 809. This makes even less sense since they are not even in the same sections (one is a "finalreport file" section and one is in a "report file" section).
Any idea why this is happening and how to fix please? I can't figure it out. Thank you! Appreciate it
Okay so long story short I did some more testing/fiddling and was able to figure it out. Your solution works on the sample I created so a modified version should work on the actual one I am working with. Thank you so much! Really appreciate your help & quick responses