We are celebrating the 10-year anniversary of the Alteryx Community! Learn more and join in on the fun here.
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

COUNTIFS - Dynamic conditions

drh
6 - Meteoroid

Hi all,

 

I am currently looking for a more complex "countifs" formula.

I would like to identify in each row if the combination IDa_IDb can be found in the past five years. As an example (please see my table below):

  • In row 1, we find the combination: IDa = 20, IDb = 3, in 2016. As we do not know anything about previous years, the count should be 0.
  • In row 3, we find the combination: IDa = 20, IDb = 3, again. As we also find it in 2016 (row 1), the count should be 1.

Unfortunately, I have a dataset of more than 10 million rows and I do not know how to create such "count" column. Does anyone have an idea how to solve it? I would really appreciate it!

 

Thanks a lot!

 

Best,

Alexander

 

IDIDaYearIDbCount
120201630
120201640
220201731
320201831
320201841
320201850
420201951
522201630
522201640
10 REPLIES 10
apathetichell
20 - Arcturus

multi-row formula. This is pretty straight-forward to implement.

 

Off the top of my head i'd use something like:

 

if [row-1:id]=null() then -1 elseif [iDa]=20 and [IDb]=3 then [row-1:newfield]+1 else [row-1:newfield] endif

apathetichell
20 - Arcturus

a few  quick things - my multi-row works if you make sure rows that don't exist are listed as null().

 

I couldn't exactly follow what number should be 0 - but it seemed like pythonesuqe first match should be labeled as 0. I labeled rows before that match as -1 - but null() would work just as well.

 

if you want the first match to be 1 - change the -1 to 0. If you want it to be null() change the -1 to null() or empty().

 

oh and the base new field name in a multi-row is new field (with a space) - change that to newfield (no space) to play nice with my formula.

 

drh
6 - Meteoroid

Thanks a lot for your fast reply.

I have already tried to implement a multi-row formula but that did not work for me. We can have the following case: We got the combination "20_2019_5" (IDa_Year_IDb). So, we need to search if we can find: 20_2018_5, 20_2017_5, 20_2016_5, 20_2015_5 or 20_2014_5. If we do so, count should be 1.

 

Important to notice is also that IDa, IDb and Year can be different than IDa = 20 and IDb = 3. I only pointed them out to show an example.

The column "count" does not exist yet, I want to create it and cannot find a way to do so.

apathetichell
20 - Arcturus

so you are just counting pairs?

Is there a reason you can't make a concat field of the two values and just count on that?

 

apathetichell
20 - Arcturus

For a total count (as opposed to a granular count at each entry) you can do a summarize - group by concat field and then count concat field. obviously if you want the first entry to be called 0 - you just subtract 1 from that figure.

drh
6 - Meteoroid

Unfortunately, I cannot simply count the pairs as we also need to keep in mind that only the past 5 years are relevant. Imagine we have a combo "20_2025_5". In that case we need to search for the past 5 years again (20_2024_5, 20_2023_5, 20_2022_5, 20_2021_5, 20_2020_5). The count should be 0 as we do not have that values in the table. If we only counted the pair "20_5", we would also count the years 2019 and before.

apathetichell
20 - Arcturus

that's no problem - but it's easier if you explain these qualifications PRIOR to asking for solutions.

 

 

Is there a specific time frame this match is supposed to occur in? Adding a year comparison statement to the multi-formula is fine - but I start looking for 2016-2021 and you say 2013-2018 it's a bit pointless for me to look at it without knowing exactly what you want.

 

In addition if you want 5 years from today - you can do that if you want to the year column as  a specific date. If you want a rolling 5 years from teh start of your workflow ie a 2014 entry is vaild in 2017 but not in 2020 - that's doable but a) more complicated and b) needs a ton more data and explanation then you've provided.

drh
6 - Meteoroid

I am sorry for the misunderstanding. My examples I listed above were not enough to make it clear.

To answer your questions:

  • The list includes data from 2000 to 2020 (column: Year).
  • The five year horizont is row-specific and driven by the column Year. If Year is 2016, past 5 years are from 2011-2015. If Year is 2020 the relevant years are from 2015-2019. So, the time horizont is not the same in all rows. I guess that is what you mean with rolling 5 years.
  • I added more rows to my examples to clarify what I mean with 5 past years. "20_2016_3" should still get a Count of 0, the last entry of "20_3" is in 2010, more than 5 years ago.
  • Still, "20_2010_3" should get a Count of 1 because we have another "20_3" entry in 2009.

I hope that makes it a bit more clear.

 

 

IDIDaYearIDbCount
120201630
120201640
220201731
320201831
320201841
320201850
420201951
522201630
522201640
620200930
720201031
820201250
danilang
19 - Altair
19 - Altair

Hi @drh 

 

The trick with these kinds of question is to get them sorted in the correct order before using a multirow tool.  

 

danilang_0-1619698124261.png

 

After adding a RecordID, sort by IDa, IDb and Year.  This puts the records in the correct order so that

if isnull([Row-1:Calculated Count]) then
	0
elseif [Year] -[Row-1:Year] <=5 then
	1
else
	0
endif

in a Multi-Row tool, grouped by IDa and IDb gives you the correct answer.  The final sort puts back the original order

 

danilang_1-1619698369888.png

Dan

 

Labels
Top Solution Authors