Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

compare rows and create labels

hamzam
8 - Asteroid

Hi!

 

I am looking to build a workflow with:

 

1. If the 'Speed' column has 4 consecutive 0's and the difference between the first 0 and last 0 is at least 3 minutes

      a. then create a group containing records from the start index to the end index (before the first 0).

      b. delete those rows that have speed = 0 above, and point the new start index to the next row after this deletion

 

I have the following python code, but I am having trouble vectorizing it -- since the data is 1Million+ rows, it is taking too long to run. I am wondering if there is a better way to do this; I tried this using the multi-row formula in alteryx but was having issues with that. I appreciate your help.

 

 

import tqdm
start_idx = 0
shift = 0
df['Group'] = np.nan

for idx in tqdm.tqdm(df.index):
    if df['Group'][idx] == 0 and df['Group'][idx+1] == 0 and df['Group'][idx + 2] == 0:
        if df['timediff'][idx] + df['timediff'][idx+1] + df['timediff'][idx+2] >= 3:
            df['Group'][start_idx:idx-1] = shift
            shift+=1
            start_idx = idx + 3
            df.drop([idx, idx +1, idx+2])

 

 

 

Sample dataset attached

12 REPLIES 12
Greg_Murray
12 - Quasar

@hamzam,

 

I am not sure I completely understand A and B, but I think the logic in the attached workflow should handle the first set of qualifications. What should the output look like after A and B are complete?

 

 

Greg_Murray_0-1597693748864.png

 

hamzam
8 - Asteroid

Nice!

 

Sorry I was not clear enough -- I'd like the original rows to remain intact. I think you removed the other rows where speed != 0. Each of the current Groups contains 4 records, but this is not necessarily the case.

 

There should be a new group created once there are 4 consecutive 0's in the speed column. ( we can assign these rows containing the 4 consecutive 0's the previous group's value, and forget about what I said in B.)

Please let me know if this makes sense.

Thanks again.

ImadZidan
12 - Quasar

Hello @hamzam ,

 

based on your description and the code 

 

3 consequent records, 

if they have a speed of 0 then

   check timediff

      if >= 3 , then 

         this is a match , mark and execlude from the next check

if the record is not a match leave as is.

 

have a look at the workflow, it will give you an idea. there is no deletion but exclusion. At the end, you will have a filter with the matched records and unmatched. you can then do what you desire with them.

 

I hope you find it helpful or at least it will give you an idea.

 

 

 

ImadZidan
12 - Quasar

Hello @hamzam ,

 

Apologies, I read your reply after I posted. However, we are still aligned. 

 

Is it possible to provide some pseudo code of the exact logic. like the one in the first reply may be.

 

it is a nice and cool challenge. 🙂

hamzam
8 - Asteroid

Hi @ImadZidan ,

 

I have this updated code snippet:

 

import tqdm #just to see the progress of the for loop
start_idx = 0
group = 0 #groups start from 0
df['Group'] = np.nan

for idx in tqdm.tqdm(df.index):
    if df['Group'][idx] == 0 and df['Group'][idx+1] == 0 and df['Group'][idx + 2] == 0 and df['Group'][idx+3] == 0:
        if df['timediff'][idx] + df['timediff'][idx+1] + df['timediff'][idx+2] + df['Group'][idx+3] >= 4:
            df['Group'][start_idx:idx+3] = shift
            group+=1
            start_idx = idx + 4 # start index is the one after the 4th zero value
            #df.drop([idx, idx +1, idx+2])# no longer needed
ImadZidan
12 - Quasar

Hello @hamzam ,

 

I see the logic now.

Have a look at the attached. Hopefully it looks better.

 

 

hamzam
8 - Asteroid

Hi @ImadZidan 

 

Thanks for your help. But, as I had mentioned to @Greg_Murray , I'd like the original rows to remain intact. I think you removed the other rows, the ones that are not "hit".

 

Each row in the original dataset should be assigned a group  -- the group number can start from 0; it will increase to 1, etc. after there are 4 consecutive 0's in speed (and the timediff condition is met)

hamzam
8 - Asteroid

@ImadZidan 

 

I think your first workflow was more aligned with the goal. 

 

It seems like we need to figure a way out to assign like this:

 

Null ---- group 1

Null ---- group 1

Hit ---- group 1

Null -- group 2

...

Null -- group 2

Hit -- group 2

 

 

ImadZidan
12 - Quasar

@hamzam ,

 

Got you. We will do it. 

 

It helps that you fully understand your data which is perfect. makes things very clear

Labels