Hi all,
I am currently looking for a more complex "countifs" formula.
I would like to identify in each row if the combination IDa_IDb can be found in the past five years. As an example (please see my table below):
Unfortunately, I have a dataset of more than 10 million rows and I do not know how to create such "count" column. Does anyone have an idea how to solve it? I would really appreciate it!
Thanks a lot!
Best,
Alexander
ID | IDa | Year | IDb | Count |
1 | 20 | 2016 | 3 | 0 |
1 | 20 | 2016 | 4 | 0 |
2 | 20 | 2017 | 3 | 1 |
3 | 20 | 2018 | 3 | 1 |
3 | 20 | 2018 | 4 | 1 |
3 | 20 | 2018 | 5 | 0 |
4 | 20 | 2019 | 5 | 1 |
5 | 22 | 2016 | 3 | 0 |
5 | 22 | 2016 | 4 | 0 |
Solved! Go to Solution.
multi-row formula. This is pretty straight-forward to implement.
Off the top of my head i'd use something like:
if [row-1:id]=null() then -1 elseif [iDa]=20 and [IDb]=3 then [row-1:newfield]+1 else [row-1:newfield] endif
a few quick things - my multi-row works if you make sure rows that don't exist are listed as null().
I couldn't exactly follow what number should be 0 - but it seemed like pythonesuqe first match should be labeled as 0. I labeled rows before that match as -1 - but null() would work just as well.
if you want the first match to be 1 - change the -1 to 0. If you want it to be null() change the -1 to null() or empty().
oh and the base new field name in a multi-row is new field (with a space) - change that to newfield (no space) to play nice with my formula.
Thanks a lot for your fast reply.
I have already tried to implement a multi-row formula but that did not work for me. We can have the following case: We got the combination "20_2019_5" (IDa_Year_IDb). So, we need to search if we can find: 20_2018_5, 20_2017_5, 20_2016_5, 20_2015_5 or 20_2014_5. If we do so, count should be 1.
Important to notice is also that IDa, IDb and Year can be different than IDa = 20 and IDb = 3. I only pointed them out to show an example.
The column "count" does not exist yet, I want to create it and cannot find a way to do so.
For a total count (as opposed to a granular count at each entry) you can do a summarize - group by concat field and then count concat field. obviously if you want the first entry to be called 0 - you just subtract 1 from that figure.
Unfortunately, I cannot simply count the pairs as we also need to keep in mind that only the past 5 years are relevant. Imagine we have a combo "20_2025_5". In that case we need to search for the past 5 years again (20_2024_5, 20_2023_5, 20_2022_5, 20_2021_5, 20_2020_5). The count should be 0 as we do not have that values in the table. If we only counted the pair "20_5", we would also count the years 2019 and before.
that's no problem - but it's easier if you explain these qualifications PRIOR to asking for solutions.
Is there a specific time frame this match is supposed to occur in? Adding a year comparison statement to the multi-formula is fine - but I start looking for 2016-2021 and you say 2013-2018 it's a bit pointless for me to look at it without knowing exactly what you want.
In addition if you want 5 years from today - you can do that if you want to the year column as a specific date. If you want a rolling 5 years from teh start of your workflow ie a 2014 entry is vaild in 2017 but not in 2020 - that's doable but a) more complicated and b) needs a ton more data and explanation then you've provided.
I am sorry for the misunderstanding. My examples I listed above were not enough to make it clear.
To answer your questions:
I hope that makes it a bit more clear.
ID | IDa | Year | IDb | Count |
1 | 20 | 2016 | 3 | 0 |
1 | 20 | 2016 | 4 | 0 |
2 | 20 | 2017 | 3 | 1 |
3 | 20 | 2018 | 3 | 1 |
3 | 20 | 2018 | 4 | 1 |
3 | 20 | 2018 | 5 | 0 |
4 | 20 | 2019 | 5 | 1 |
5 | 22 | 2016 | 3 | 0 |
5 | 22 | 2016 | 4 | 0 |
6 | 20 | 2009 | 3 | 0 |
7 | 20 | 2010 | 3 | 1 |
8 | 20 | 2012 | 5 | 0 |
Hi @drh
The trick with these kinds of question is to get them sorted in the correct order before using a multirow tool.
After adding a RecordID, sort by IDa, IDb and Year. This puts the records in the correct order so that
if isnull([Row-1:Calculated Count]) then
0
elseif [Year] -[Row-1:Year] <=5 then
1
else
0
endif
in a Multi-Row tool, grouped by IDa and IDb gives you the correct answer. The final sort puts back the original order
Dan