Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

COUNTIFS - Dynamic conditions

drh
6 - Meteoroid

Hi all,

 

I am currently looking for a more complex "countifs" formula.

I would like to identify in each row if the combination IDa_IDb can be found in the past five years. As an example (please see my table below):

  • In row 1, we find the combination: IDa = 20, IDb = 3, in 2016. As we do not know anything about previous years, the count should be 0.
  • In row 3, we find the combination: IDa = 20, IDb = 3, again. As we also find it in 2016 (row 1), the count should be 1.

Unfortunately, I have a dataset of more than 10 million rows and I do not know how to create such "count" column. Does anyone have an idea how to solve it? I would really appreciate it!

 

Thanks a lot!

 

Best,

Alexander

 

IDIDaYearIDbCount
120201630
120201640
220201731
320201831
320201841
320201850
420201951
522201630
522201640
10 REPLIES 10
apathetichell
20 - Arcturus

multi-row formula. This is pretty straight-forward to implement.

 

Off the top of my head i'd use something like:

 

if [row-1:id]=null() then -1 elseif [iDa]=20 and [IDb]=3 then [row-1:newfield]+1 else [row-1:newfield] endif

apathetichell
20 - Arcturus

a few  quick things - my multi-row works if you make sure rows that don't exist are listed as null().

 

I couldn't exactly follow what number should be 0 - but it seemed like pythonesuqe first match should be labeled as 0. I labeled rows before that match as -1 - but null() would work just as well.

 

if you want the first match to be 1 - change the -1 to 0. If you want it to be null() change the -1 to null() or empty().

 

oh and the base new field name in a multi-row is new field (with a space) - change that to newfield (no space) to play nice with my formula.

 

drh
6 - Meteoroid

Thanks a lot for your fast reply.

I have already tried to implement a multi-row formula but that did not work for me. We can have the following case: We got the combination "20_2019_5" (IDa_Year_IDb). So, we need to search if we can find: 20_2018_5, 20_2017_5, 20_2016_5, 20_2015_5 or 20_2014_5. If we do so, count should be 1.

 

Important to notice is also that IDa, IDb and Year can be different than IDa = 20 and IDb = 3. I only pointed them out to show an example.

The column "count" does not exist yet, I want to create it and cannot find a way to do so.

apathetichell
20 - Arcturus

so you are just counting pairs?

Is there a reason you can't make a concat field of the two values and just count on that?

 

apathetichell
20 - Arcturus

For a total count (as opposed to a granular count at each entry) you can do a summarize - group by concat field and then count concat field. obviously if you want the first entry to be called 0 - you just subtract 1 from that figure.

drh
6 - Meteoroid

Unfortunately, I cannot simply count the pairs as we also need to keep in mind that only the past 5 years are relevant. Imagine we have a combo "20_2025_5". In that case we need to search for the past 5 years again (20_2024_5, 20_2023_5, 20_2022_5, 20_2021_5, 20_2020_5). The count should be 0 as we do not have that values in the table. If we only counted the pair "20_5", we would also count the years 2019 and before.

apathetichell
20 - Arcturus

that's no problem - but it's easier if you explain these qualifications PRIOR to asking for solutions.

 

 

Is there a specific time frame this match is supposed to occur in? Adding a year comparison statement to the multi-formula is fine - but I start looking for 2016-2021 and you say 2013-2018 it's a bit pointless for me to look at it without knowing exactly what you want.

 

In addition if you want 5 years from today - you can do that if you want to the year column as  a specific date. If you want a rolling 5 years from teh start of your workflow ie a 2014 entry is vaild in 2017 but not in 2020 - that's doable but a) more complicated and b) needs a ton more data and explanation then you've provided.

drh
6 - Meteoroid

I am sorry for the misunderstanding. My examples I listed above were not enough to make it clear.

To answer your questions:

  • The list includes data from 2000 to 2020 (column: Year).
  • The five year horizont is row-specific and driven by the column Year. If Year is 2016, past 5 years are from 2011-2015. If Year is 2020 the relevant years are from 2015-2019. So, the time horizont is not the same in all rows. I guess that is what you mean with rolling 5 years.
  • I added more rows to my examples to clarify what I mean with 5 past years. "20_2016_3" should still get a Count of 0, the last entry of "20_3" is in 2010, more than 5 years ago.
  • Still, "20_2010_3" should get a Count of 1 because we have another "20_3" entry in 2009.

I hope that makes it a bit more clear.

 

 

IDIDaYearIDbCount
120201630
120201640
220201731
320201831
320201841
320201850
420201951
522201630
522201640
620200930
720201031
820201250
danilang
19 - Altair
19 - Altair

Hi @drh 

 

The trick with these kinds of question is to get them sorted in the correct order before using a multirow tool.  

 

danilang_0-1619698124261.png

 

After adding a RecordID, sort by IDa, IDb and Year.  This puts the records in the correct order so that

if isnull([Row-1:Calculated Count]) then
	0
elseif [Year] -[Row-1:Year] <=5 then
	1
else
	0
endif

in a Multi-Row tool, grouped by IDa and IDb gives you the correct answer.  The final sort puts back the original order

 

danilang_1-1619698369888.png

Dan

 

Labels
Top Solution Authors