This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
on 10-18-2017 04:45 PM - edited on 08-03-2021 09:47 AM by csalgado5
Thisarticle is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Multi-Row Formula Tool on our way to mastering the Alteryx Designer:
The Multi-Row Formula Tool functions much like the normal Formula Tool but adds the ability to reference multiple rows of data within one expression. Say, for example, someone was on the ground floor of a house and had a Formula Tool. They would only be able to talk to the people also on the ground floor. If they had a Multi-Row Formula Tool, though, they would also be able to talk to the people upstairs, in the attic, and in the basement as well.
The Multi-Row Formula Tool allows the user to either update an existing field or create a new one. Both options can be very helpful, and there are many use cases for each. The use cases we detail below are attached in the v11.0 workflowMulti-Row.yxmd.
Update Existing Field
Often times, data can be arranged in a way that would look good in Excel but is not realistic to work with in Alteryx. In these instances, you might want to “fill down” a column that has missing values.
In this case, only the first person in each family has their family name populated. Therefore, we would like to modify the Family field where it is currently Null(). This can easily be accomplished like so:
To easily get Variables and Functions into the Expression box, simply double click them from their corresponding tab in the Configuration.
Create New Field:
Now that the data is cleaned up, we want to add a unique ID to each person according to the family they belong to. In order to accomplish this, we will configure the Multi-Row Formula Tool as follows:
When creating a new field, the user has the ability to select what field type to give it. Here, Int 16 was chosen. Note that we are grouping by Family because we want the ID to be unique within each family, but it can start over with different families. This new ID field could be used later as an identifier with tools like Filter, Summarize, and more.
What is a good Multi-Row Tool without the ability to reference more than just the rows directly before or directly after it? By changing the value in Num Rows in the Configuration, you are able to increase the number of rows you can use in your expressions.
In this example, we increased the Num Rows to 2. Notice how there are now more variables to select from in the middle section on the Configuration. Because we want the running total by family, we again group by this field.
This method can be extended to generate running totals in many different scenarios including monetary sales, item counts, and more.
Now we want to calculate the average amount of total screen time by family. In this Multi-Row Formula Tool Expression box, we write:
iif([ID]=3,average([Row-2:Total Screen Time],[Row-1:Total Screen Time],[Total Screen Time]),iif([ID]=2,average([Row-1:Total Screen Time],[Total Screen Time],[Row+1:Total Screen Time]),average([Row+2:Total Screen Time],[Row+1:Total Screen Time],[Total Screen Time])))
This checks each row’s ID value and customizes the formula to generate an average by using the correct rows.
The last part of the configuration for this tool that has not yet been changed is the Values for Rows that don’t Exist drop down. This tells Alteryx what to use when a formula needs to use the value of a row that doesn’t exist (hence the name of the drop down). This would happen, for example, when calculating a value for the first row of data when the expression contains a Row-1 variable. The user can choose what that non-existent row value is.
In this example, the row that doesn’t exist is set to the closest valid row. That way, when the formula tries to use [Row -1: Total Screen Time] on row 1 and sees that it doesn’t exist, it uses row 1’s value instead. This results in the difference of 0 that you see in the output.
A Sort was placed before this Multi-Row Formula Tool to sort Total Screen Time in descending order. The above formula created a field that is the difference between the current row’s total screen time and the previous row’s total screen time. This makes it easy to see how much more screen time each person needs in order to catch up to the person ahead of them.
Many times the Multi-Row Formula Tool will be used in combination with the Cross Tab Tool. This is particularly helpful if you are trying to transform a couple columns of data into a table. One member of the Alteryx Community had this exact problem and was able to get it solved in this post.
The Multi-Row Formula Tool can only update one field per tool instance. If you would like to update multiple fields (and you feel comfortable with both the Multi-Row Formula Tool and the Multi-Field Formula Tool), try out this macro that was posted on the public Alteryx Gallery.
By now, you should have expert-level proficiency with the Multi-Row Formula Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know atcommunity@alteryx.comif you’d like your creative tool uses to be featured in the Tool Mastery Series.
Stay tuned with our latest posts every#ToolTuesdayby following@alteryxon Twitter! If you want to master all the Designer tools, considersubscribingfor email notifications.
Additional Information
Click on the corresponding language link below to access this article in another language -
Great examples!
@Syarifhidayat Yes! For example, this expression starts at the number 2:
iif([Row-1:ID]+1=1,[Row-1:ID]+2,[Row-1:ID]+1)
Another way to do this would be to use the regular [Row-1:ID]+1 expression then add a separate Formula tool that just adds 1 to each ID number. Hope this helps!
Thanks for this post and information on how to create an ID field. This was a useful crosswalk for the SQL rank function and easier to create without worrying about the partition syntax. Also, this helps with the lag function that I remember from my SAS history. Either way, this is easier.
This is very helpful. I was struggling with adding ID based on Business Name and this tool really solved my problem.
Thank you for the feedback! I'm glad this post has been helpful!
Hi,
i dont see any multi row tool for indatabase objects?
is that true if yes then do i need to stream data out of db and use normal multi row?
pls clear this.
BR
Brij
This article was very helpful, but what about a multi row formula that's dynamic across multiple rows?
For example, someone is in a company for 10 years and we want to set years 2-10 to the value that is present for year 1, which in this case is the earliest or lowest year? There could be different values in years 2-10?
Additionally, there could be some people who are only there for 2-6 years for example. Every person has a unique ID for each iteration of the year.
Hey @tomunger848, thanks for the question. I would sort your data first then use the Group By feature in the Multi-Row Formula tool. See the below photos for an example I made up. If you need additional help for a specific use case you have, I would suggest adding a new topic in one of the discussion boards.
Is there a way to refer to the last preceding row with a non-null value?
What is your use case @DHB? Could you 'fill down' like in the example above?
I managed to find a solution to this problem which I posted here. I had to filter out the rows with a null value in order to make the Multi-Row formula work before Unioning those rows back in and sorting them back into the right order.
Hi , can any explain the logic behind the
Row -1
Row +0
Row +1
How it be consider in the Expression window.
Thank-you.
Hi @dondapati
The Multi-Row Formula tool goes row by row for the field you selected and performs the statement you type in the Expression window. In this expression, you can refer to different rows of data. This differs from the normal Formula tool in that the normal Formula tool can only look at the current row's values.
In your expression, using "Row -1" will look at the field's value in the row just above the current row. "Row +0" is the active row and will look at the value for the row currently being evaluated. "Row +1" references the value in the row just after the current row.
For example, say Alteryx is running through the rows and performing the expression in a Multi-Row Formula tool. Say it is currently evaluating row 20. If the expression contains a field referencing Row -1, it will take the value from that field in row 19, Row +0 will take the value from row 20, and Row +1 will take the value from row 21.
By using the Num Rows toggle, you can change the amount of rows away from the current that you can reference in your expression.
Hope this helps clear things up for you!
by the way, is it just me or did anyone notice the Game of Thrones theme in the data?
awesome!
Hi,
I have an identical problem to the Filling Down example for the Family column, where I want to replace nulls with the valid value above. I have used the same IIf statement, but the logic only works for the first row after a valid value. (i.e. the second null remains unchanged etc.). Is there a setting I need to amend so that the fill down works for more than one row?
Formula used: IIF(IsNull([BusArea]),[Row-1:BusArea],[BusArea])
Note: The formula gives an error if I try to update the existing field (The field "" is not contained in the record), so I have had to create a new field.
Regards
Jason
Hey @jnew8
Based just on the formula you provided, it seems as though you should be getting the correct result. I would check your selection for the 'Values for Rows that don't Exist' drop down. Also, are you grouping by any fields that might affect your results?
If you check these things and you are still having troubles, you can either post a screenshot of your workflow with the input data and the Multi-Row Formula tool configuration here, send me a private message with that screenshot, or post a new discussion in the Designer discussion board with an attachment to your actual workflow so I can download it and run it on my machine. If you post a new discussion, you can tag my name if you want, and I will see it right away to get back with you.
Hope this helps!
Hi BarnesK,
Thanks for responding to my question. I have partially resolved the issue myself, as I sent the Alteryx file to a colleague and he ran it on his machine without an error and having made no changes to the code. So it appears there might be a problem with my installation? I am about to get a new pc with a new version of Alteryx, so once I have the new pc I will try out the code again and see what happens.
Regards
Jason
I'm trying to fill down a field exactly as shown and am getting this error message,
"Error: Multi-Row Formula (5): The field "" is not contained in the record."
Here's my formula,
iif(isnull([Course]),[Row-1:Course],[Course])
Any idea as to what I'm doing wrong here?
I've got it now. You also have to specify the field at the top as well as in the expression.
Hi @DHB
Thanks for reading this article, and I'm glad you were able to solve your issue! If you run into any more problems, definitely reach out and I would be happy to help out. 😊
Hi BarnesK,
This is a very great article.
I have a quick question. I'm doing a bit test on Multi-Row fomula and the result is not what I expected....
I was mean to fill in the Year, but it seems the tool re-arrange the rows firstly then put all 0s in. Did I configure it wrong?
Thanks.
Hello @Jes_Li , your formula works, you just need to uncheck the Group, by checking Year, the data gets sorted by this field to apply the formula.
Hi Margarita, really appreciate your reply. Problem now is solved. Thanks
Hi BarnesK,
A quick question, is there a way to fill "up" the value like I have below data:
a | 20 | |
b | 30 | |
c | 40 | |
Cash | Total | 90 |
d | 200 | |
e | 0 | |
Non-cash | Total | 200 |
And I want to fill the empty becoming this:
Cash | a | 20 |
Cash | b | 30 |
Cash | c | 40 |
Cash | Total | 90 |
Non-cash | d | 200 |
Non-cash | e | 0 |
Non-cash | Total | 200 |
Found the answer: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Multi-Row-Formula-to-quot-copy-up-quot...
Sorry .
Hey Guys,
I want to apply different formula on different rows and the formulas to be used in each successive row contains different column ,how can I parse through different columns one by one on each row ?
@Kenda
SPC | GDCPOR-a | GDCPOR-b | GDCPOR-c | GDCIND-a | GDCIND-b | GDCIND-c |
1 | 0 | 0 | 99 | 23 | 33 | 0 |
2 | 44 | 0 | 567 | 0 | 4 | 0 |
3 | 7 | 12 | 0 | 0 | 879 | 0 |
4 | 140 | 0 | 12 | 0 | 12 | 0 |
bl&task | Most used spc for GDC POR | Most used spc for GDC IND |
a | 4 | 1 |
b | 3 | 3 |
c | 2 | - |
As u can see im trying to find the maximum used Spc from different columns ,for example- GDCPOR-a contains 140 in 4th row and ts spc is 4 .
I have mocked up a workflow for you to get to your desired output. It actually uses no Multi-Row Formula tools. Instead, the key was to transform the data first. Because I can not include attachments to this post, I will provide a screenshot and some details explaining what each tool is doing. If you still have questions, feel free to send me a private message and I can assist you further. You could also post a new discussion in the Designer Discussions section of the community and one of us will be able to help you further there, as well.
1. Transpose the input data selecting SPC as your Key Column and selecting the rest as Data columns.
2. Use the Text To Columns tool to split your Name field with - as your delimiter. Only generate 2 columns.
3. Cross Tab your data with SPC & Name2 as the fields you group by, Name1 as your column headers, Value as the Values for new columns, and Sum as the aggregating method.
4. Create two streams of data - one for the IND group and one for the POR group. For each, sort Descending, then keep only the first one with the Sample tool (grouping by Name2).
5. Modify the SPC field with this expression:
iif([GDCIND]=0,null(),[SPC])
(use GDCPOR here for the other stream of data)
6. Join both streams by Name2, and only keep Name2 and the 2 SPC fields (renaming with the Join tool as well).
Hope this helps!
Hi Kenda,
Your advice regarding this tool has been very insightful - I'm trying to pull data into respective columns but am unsure how to go about it.
In my scenario, I have a data set that is providing 3 different sets of ID's but am unable to pull them out except for the first row in the normal formula tool. I hope the visual below helps with what I am trying to explain. Have you ever run into this type of problem before?
Input Info | A | B | C | D |
1234 | 1234 | 5678 | 91011 | 121314 |
5678 | ||||
91011 | ||||
121314 |
A,B,C and D would be the output columns.
Hi @Ben_Kohler Just to be clear, do you start out with just the column of Input Info then you want to create columns A, B, C, D that look like the picture you provided?
Hi @Kenda,
It's actually being pulled from a text file. I'm able to get the first row easily with a Left (Field_Name, #) pretty easily but
the additional data shows up directly below the first one.
So in the text file, ill see something like the below:
Acct # 1234
5678
91011
The problem I'm currently experiencing is that 1234 will appear in my subsequent columns as 1234 instead of what is below. The text file will always have the same format, just different data from day to day, and I've used the multi row formula to "fill in" dates and other info until the info changes, but never like this before.
I think I know what you're getting at @Ben_Kohler
If I'm understanding correctly, you actually may not need the Multi-Row Formula tool. This solution would make your workflow dynamic enough to handle any amount of input rows that may come in. If your input/output look like this, then this solution should work for you:
First create a RecordID then CrossTab your data using the RecordID as the headers and the Acct #'s as the values. Next use a Join and join the output of the CrossTab with your text input by record position. Then use a Union to keep the J plus L (here, L is input data) and make sure to sort by a specific order (J input first).
@Kenda 🙂 Thanks!!! I'll give this a try and see how it works out. I hadn't thought of this approach/used the transpose tool very much. Much appreciated.
@Kenda - I've worked this a few times. I don't think this will be my solution unfortunately. It has allowed me to play with the cross tab tool more and learn it's functions. When attempting this, it did exactly what you had explained, however on my end my records were much larger than anticipated.
Upon running my data, the information started to appear as so: (The column headers go all the way into 400+ in this pattern).
Initial Row | A | B | C | D | E | F | G | H | I |
1 | 1 | ||||||||
2 | 2 | ||||||||
3 | 3 | ||||||||
4 | 4 | ||||||||
5 | 5 | ||||||||
6 | 6 | ||||||||
7 | 7 | ||||||||
8 | 8 | ||||||||
9 | 9 |
Unsure if there is a way to grab specific records based on position and then place them into a field similar to the example below? I figure then I could use a formula based on the field.
Initial Row | A | B | C |
1 | 1 | 2 | 3 |
2 | 4 | 5 | 6 |
3 | 7 | 8 | 9 |
4 | |||
5 | |||
6 | |||
7 | |||
8 | |||
9 |
Sorry if this has evolved more than expected. I know I need to work on this some more.
I'm having issues where the multi-row tool seems to stop working after ~51,000 rows. The use case is somewhat intensive as it is looking if sequential row centroid buffers overlap, along with two other conditions to generate an identifier.
Are you aware of any limitations on this tool in terms of maximum rows/conditions/both it can handle?
Thanks for any help.
@apqueen I am not aware of anything like that. Is it consistent where it stops working? Is there any way that you could share your workflow/data so we could troubleshoot?
@Kenda - I eventually found the issue, which was my field type selection.
To conserve space in my new field I originally set the field type to Int16.
This worked fine with smaller data sets of about 10000 rows, however when I opened up the inputs (100,000+ rows) this resulted in more outputs than could be handled by Int16 and multirow formula calculated integer values over ~31000 returned Null.
After changing to Int64 my issues went away.
Hello @Kenda,
Is there a way to dynamically select all preceding values in formula?
Example: for the third value I want to sum the first and second, the fourth I want to sum the first, second, and third, fifth I want to sum the first, second, third, and fourth, and so on and so forth?
Thanks,
Nick
Hey @nickgomez610
For this use case, I think your best bet might be to use the Running Total tool in the Transform category.
Thanks for the response @Kenda.
I have tried to use the running total tool but the only problem is that I need the cumulative sum I calculated in the prior row to calculate the value in the current row. Therefore, I think I would need to rerun that running total for every row's calculation. Unless is there a way to make the running total dynamically recalculate the total for every row?
Yes, the Running Total tool calculates a cumulative sum for each record so all of the previous records will be taken into account. See the second field below.
If you want to do this in the Multi-Row Formula tool, you could. Just create a new field with this expression (where New Field is the new field you're creating and Field2 is the field with your existing numeric values you want to add):
[Row-1:New Field]+[Field2]
You get the same output as above.
If neither of these solutions work for you, please provide some sample data and what you would like your output to look like, and I can provide you with more direct assistance.
Hi, can I fill a number of different rows simultaneousely? I just have to fill 12 rows and I'm thinking of appliying the formula for all of them at the same time, not just create 12 diferent Multi-Row-Formulas.
Hey @andreguichard
If you are looking to apply an expression to multiple fields at once using multiple rows, try out this macro created by my friend @patrick_digan
https://gallery.alteryx.com/#!app/Multi-Row-Multi-Field/59639862effc2a0c04c7e653
Perfect Kenda. Can you give me some insights on how to use the macro. Is not working for me as I input the data.
@andreguichard The syntax would be similar to that in other formula tools using standard functions and variables. This tool has the added row functionality (similar to multi-row), column functionality (new), and currentfield functionality (similar to multi-field). For those pieces, here is an example of how to build the syntax:
[Row+1:FieldName:Column+1]
You can leave the Row+1 part or the Column+1 parts blank if you don't need them.
Hi Kenda,
I have the below dataset, I want the difference between two periods for a particular part no. based on its type, company and division (e.g. 201902 vs 201901 so on and so forth),is there a way I can do this with the help of Multi Rows formula tool?
Thanking you in Advance,
Priyanka
Hi there @psinghania
Depending on how you would like your output to look, there are two different ways to approach this.
If you want to use the Multi-Row Formula tool, you could sort your data first by Part No and Period then use the Multi-Row Formula tool do find the difference. You'll want to group by type, company, and division. This would look as follows:
On the other hand, you could cross tab your data first so that your amounts for each type, company, part no, div combo are in one row and each period is in a different field. From there, you could use a normal Formula tool to create a field to calculate the difference.
I hope this is helpful!
Hello all, I am attempting to run compares on massive CSV files with 300+ columns. An example I am trying to do below.
I'm attempting the tool you mentioned above, As you can see there will always be Two "IDS" the same, so I'm trying to compare the same two ID's on specific fields such as Originator, the problem is when I'm doing record zero and +1, it's not really accurate when the data is spitting out because after the initial first row, it will be comparing two ID's that aren't the same, I need logic so it first determines if the ID's are the same then "analyze" those two rows and then write the "reason" to the new column for both of those ID's that it analyzed, any help would be appreciated!!! I'm stuck.