Alteryx Designer Desktop Discussions

TimBoeve · ‎02-18-2016

I need to regularly process a large flat file (csv) containing a few thousand rows. The data is messy, but generally organized by [unit], [group], and [activity] where each file has up to 10 units, each unit has upwards of 300 groups and each group has 8 or so activities. The number of records (rows) for each activity is highly variable and not known in advance. I am interested in all the rows of any particular [unit] AND [group] AND [activity] WHEN [activity]=”act_nn_end” AND [result1]=”val1” AND [result2]=”val2” (for example the act_2_* rows in the table below)

This could probably done in a database by a SELECT for the condition and retaining the [unit], [group], and [activity] (wildcard the act_nn*) in a result table that I could then JOIN with the input to get the rows I need. But I am wondering if this could also be done in Alteryx. However I cannot see how to do this and thought I would ask the community for some advice or suggestions here.

I have an example of the file below. Since I will get 4-5 flat files a week it would certainly be helpful to automate selecting the portions of interest out of each file without dropping them into a db every time. And there will be multiple combinations of [result1] and [result2] that will be interesting – certain combinations will indicate a particular action to take for example.

Thanks for looking at this. Any hints or suggestions would be appreciated,

Tim

timestamp	unit	group_id	Activity	result1	result2	other fields …. (60)
hh:mm:00	u1	gid1	act_1_start	[Null]	[Null]	[Null]
hh:mm:10	u1	gid1	act_1	[Null]	[Null]	[Null] or data
hh:mm:20	u1	gid1	act_1	[Null]	[Null]	[Null] or data
hh:mm:30	u1	gid1	act_1	[Null]	[Null]	[Null] or data
hh:mm:40	u1	gid1	act_1	[Null]	[Null]	[Null] or data
hh:mm:50	u1	gid1	act_1	[Null]	[Null]	[Null] or data
hh:mm:52	u1	gid1	act_1_end	val1	val2	[Null] or data
hh:mm:60	u1	gid1	[Null]	[Null]	[Null]	…
hh:mm:ss	u1	gid1	..	…	…	…
hh:m2:00	u1	gid1	act_2_start	[Null]	[Null]	[Null]
hh:m2:10	u1	gid1	act_2	[Null]	[Null]	[Null] or data
hh:m2:20	u1	gid1	act_2	[Null]	[Null]	[Null] or data
hh:m2:38	u1	gid1	act_2_end	val1	val4	[Null] or data
hh:m2:40	u1	gid1	…	[Null]	[Null]	…
hh:mm:ss	u1	…	…	…	…	…
hh:m3:00	u1	gid2	act_1_start	[Null]	[Null]	[Null]

michael_treadwell · ‎02-18-2016

Unless I misunderstand your issue, it sunds like all you need is a Filter tool to view the rows of interest

[activity] == 'act_nn_end'
AND
[result1] == 'val1'
AND
[result2] == 'val2'

In this example, the rows of interest would come from the T output of the Filter tool.

You can connect any file to this tool using the Input tool, then filter to see the relevant records.

jdunkerley79 · ‎02-18-2016

you'll need to use a REGEX_Match for act_nn_end

Something like:

REGEX_Match([activity], 'act_\d+_end')

Otherwise exactly what I was about to post

michael_treadwell · ‎02-18-2016

Absolutely, I wasn't looking carefully at the data format.

Here is a combined version of my post and @jdunkerley79

TimBoeve · ‎02-18-2016

Oops. Jumped too soon. This only selects the last record of the group. What I want to do is get the other records for that group - including the _start and _end and the variable number of records between them. Thanks.

michael_treadwell · ‎02-18-2016

Try this:

It parses the digits out of [activity] and then when it finds act_nn_end that meets the criteria it joins to pull all the data from that series

jdunkerley79 · ‎02-18-2016

Try using a regex tool to match the id of the act section

Then filter then summarise

Attached a demo

TimBoeve · ‎02-18-2016

This is very close. I need to scope the JOIN to only the current [group_id] and then I think it will work. I will try prepending the [group_id] in the parse tool. thanks!

Alteryx Designer Desktop Discussions

Select groups of records based on conditional test of fields in one record in each group