Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Community is experiencing an influx of spam. As we work toward a solution, please use the 'Notify Moderator' option on the ellipsis menu to flag inappropriate posts.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Select groups of records based on conditional test of fields in one record in each group

TimBoeve
5 - Atom

I need to regularly process a large flat file (csv) containing a few thousand rows.  The data is messy, but generally organized by [unit], [group], and [activity] where each file has up to 10 units, each unit has upwards of 300 groups and each group has 8 or so activities.  The number of records (rows) for each activity is highly variable and not known in advance.  I am interested in all the rows of any particular [unit] AND [group] AND [activity] WHEN [activity]=”act_nn_end” AND [result1]=”val1” AND [result2]=”val2” (for example the act_2_* rows in the table below)

This could probably done in a database by a SELECT for the condition and retaining the [unit], [group], and [activity] (wildcard the act_nn*) in a result table that I could then JOIN with the input to get the rows I need.  But I am wondering if this could also be done in Alteryx.  However I cannot see how to do this and thought I would ask the community for some advice or suggestions here.

I have an example of the file below.  Since I will get 4-5 flat files a week it would certainly be helpful to automate selecting the portions of interest out of each file without dropping them into a db every time.  And there will be multiple combinations of [result1] and [result2] that will be interesting – certain combinations will indicate a particular action to take for example.

Thanks for looking at this.  Any hints or suggestions would be appreciated,

Tim

timestamp

unit

group_id

Activity

result1

result2

other fields …. (60)

hh:mm:00

u1

gid1

act_1_start

[Null]

[Null]

[Null]

hh:mm:10

u1

gid1

act_1

[Null]

[Null]

[Null] or data

hh:mm:20

u1

gid1

act_1

[Null]

[Null]

[Null] or data

hh:mm:30

u1

gid1

act_1

[Null]

[Null]

[Null] or data

hh:mm:40

u1

gid1

act_1

[Null]

[Null]

[Null] or data

hh:mm:50

u1

gid1

act_1

[Null]

[Null]

[Null] or data

hh:mm:52

u1

gid1

act_1_end

val1

val2

[Null] or data

hh:mm:60

u1

gid1

[Null]

[Null]

[Null]

hh:mm:ss

u1

gid1

..

hh:m2:00

u1

gid1

act_2_start

[Null]

[Null]

[Null]

hh:m2:10

u1

gid1

act_2

[Null]

[Null]

[Null] or data

hh:m2:20

u1

gid1

act_2

[Null]

[Null]

[Null] or data

hh:m2:38

u1

gid1

act_2_end

val1

val4

[Null] or data

hh:m2:40

u1

gid1

[Null]

[Null]

hh:mm:ss

u1

hh:m3:00

u1

gid2

act_1_start

[Null]

[Null]

[Null]

 

7 REPLIES 7
michael_treadwell
ACE Emeritus
ACE Emeritus

Unless I misunderstand your issue, it sunds like all you need is a Filter tool to view the rows of interest

 

[activity] == 'act_nn_end'
AND
[result1] == 'val1'
AND
[result2] == 'val2'

 

Capture.PNG

 

In this example, the rows of interest would come from the T output of the Filter tool.

 

You can connect any file to this tool using the Input tool, then filter to see the relevant records.

jdunkerley79
ACE Emeritus
ACE Emeritus

you'll need to use a REGEX_Match for act_nn_end

 

Something like:

REGEX_Match([activity], 'act_\d+_end')

Otherwise exactly what I was about to post

michael_treadwell
ACE Emeritus
ACE Emeritus

Absolutely, I wasn't looking carefully at the data format.

 

Here is a combined version of my post and @jdunkerley79

TimBoeve
5 - Atom

Oops.  Jumped too soon.  This only selects the last record of the group.  What I want to do is get the other records for that group - including the _start and _end and the variable number of records between them.  Thanks.

michael_treadwell
ACE Emeritus
ACE Emeritus

Try this:

 

It parses the digits out of [activity] and then when it finds act_nn_end that meets the criteria it joins to pull all the data from that series

jdunkerley79
ACE Emeritus
ACE Emeritus

Try using a regex tool to match the id of the act section

Then filter then summarise

 

Attached a demo

TimBoeve
5 - Atom

This is very close.  I need to scope the JOIN to only the current [group_id] and then I think it will work.  I will try prepending the [group_id] in the parse tool.  thanks!   

Labels