Alteryx Designer Desktop Discussions

hamzam · ‎08-17-2020

Hi!

I am looking to build a workflow with:

1. If the 'Speed' column has 4 consecutive 0's and the difference between the first 0 and last 0 is at least 3 minutes

a. then create a group containing records from the start index to the end index (before the first 0).

b. delete those rows that have speed = 0 above, and point the new start index to the next row after this deletion

I have the following python code, but I am having trouble vectorizing it -- since the data is 1Million+ rows, it is taking too long to run. I am wondering if there is a better way to do this; I tried this using the multi-row formula in alteryx but was having issues with that. I appreciate your help.

import tqdm
start_idx = 0
shift = 0
df['Group'] = np.nan

for idx in tqdm.tqdm(df.index):
    if df['Group'][idx] == 0 and df['Group'][idx+1] == 0 and df['Group'][idx + 2] == 0:
        if df['timediff'][idx] + df['timediff'][idx+1] + df['timediff'][idx+2] >= 3:
            df['Group'][start_idx:idx-1] = shift
            shift+=1
            start_idx = idx + 3
            df.drop([idx, idx +1, idx+2])

Sample dataset attached

Greg_Murray · ‎08-17-2020

@hamzam,

I am not sure I completely understand A and B, but I think the logic in the attached workflow should handle the first set of qualifications. What should the output look like after A and B are complete?

hamzam · ‎08-17-2020

Nice!

Sorry I was not clear enough -- I'd like the original rows to remain intact. I think you removed the other rows where speed != 0. Each of the current Groups contains 4 records, but this is not necessarily the case.

There should be a new group created once there are 4 consecutive 0's in the speed column. ( we can assign these rows containing the 4 consecutive 0's the previous group's value, and forget about what I said in B.)

Please let me know if this makes sense.

Thanks again.

ImadZidan · ‎08-17-2020

Hello @hamzam ,

based on your description and the code

3 consequent records,

if they have a speed of 0 then

check timediff

if >= 3 , then

this is a match , mark and execlude from the next check

if the record is not a match leave as is.

have a look at the workflow, it will give you an idea. there is no deletion but exclusion. At the end, you will have a filter with the matched records and unmatched. you can then do what you desire with them.

I hope you find it helpful or at least it will give you an idea.

ImadZidan · ‎08-17-2020

Hello @hamzam ,

Apologies, I read your reply after I posted. However, we are still aligned.

Is it possible to provide some pseudo code of the exact logic. like the one in the first reply may be.

it is a nice and cool challenge. 🙂

hamzam · ‎08-18-2020

Hi @ImadZidan ,

I have this updated code snippet:

import tqdm #just to see the progress of the for loop
start_idx = 0
group = 0 #groups start from 0
df['Group'] = np.nan

for idx in tqdm.tqdm(df.index):
    if df['Group'][idx] == 0 and df['Group'][idx+1] == 0 and df['Group'][idx + 2] == 0 and df['Group'][idx+3] == 0:
        if df['timediff'][idx] + df['timediff'][idx+1] + df['timediff'][idx+2] + df['Group'][idx+3] >= 4:
            df['Group'][start_idx:idx+3] = shift
            group+=1
            start_idx = idx + 4 # start index is the one after the 4th zero value
            #df.drop([idx, idx +1, idx+2])# no longer needed

ImadZidan · ‎08-18-2020

Hello @hamzam ,

I see the logic now.

Have a look at the attached. Hopefully it looks better.

hamzam · ‎08-18-2020

Hi @ImadZidan

Thanks for your help. But, as I had mentioned to @Greg_Murray , I'd like the original rows to remain intact. I think you removed the other rows, the ones that are not "hit".

Each row in the original dataset should be assigned a group -- the group number can start from 0; it will increase to 1, etc. after there are 4 consecutive 0's in speed (and the timediff condition is met)

hamzam · ‎08-18-2020

@ImadZidan

I think your first workflow was more aligned with the goal.

It seems like we need to figure a way out to assign like this:

Null ---- group 1

Hit ---- group 1

Null -- group 2

...

Null -- group 2

Hit -- group 2

ImadZidan · ‎08-18-2020

@hamzam ,

Got you. We will do it.

It helps that you fully understand your data which is perfect. makes things very clear

Alteryx Designer Desktop Discussions

compare rows and create labels