Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

compare rows and create labels

hamzam
8 - Asteroid

Hi!

 

I am looking to build a workflow with:

 

1. If the 'Speed' column has 4 consecutive 0's and the difference between the first 0 and last 0 is at least 3 minutes

      a. then create a group containing records from the start index to the end index (before the first 0).

      b. delete those rows that have speed = 0 above, and point the new start index to the next row after this deletion

 

I have the following python code, but I am having trouble vectorizing it -- since the data is 1Million+ rows, it is taking too long to run. I am wondering if there is a better way to do this; I tried this using the multi-row formula in alteryx but was having issues with that. I appreciate your help.

 

 

import tqdm
start_idx = 0
shift = 0
df['Group'] = np.nan

for idx in tqdm.tqdm(df.index):
    if df['Group'][idx] == 0 and df['Group'][idx+1] == 0 and df['Group'][idx + 2] == 0:
        if df['timediff'][idx] + df['timediff'][idx+1] + df['timediff'][idx+2] >= 3:
            df['Group'][start_idx:idx-1] = shift
            shift+=1
            start_idx = idx + 3
            df.drop([idx, idx +1, idx+2])

 

 

 

Sample dataset attached

12 REPLIES 12
Greg_Murray
12 - Quasar

@hamzam,

 

I am not sure I completely understand A and B, but I think the logic in the attached workflow should handle the first set of qualifications. What should the output look like after A and B are complete?

 

 

Greg_Murray_0-1597693748864.png

 

hamzam
8 - Asteroid

Nice!

 

Sorry I was not clear enough -- I'd like the original rows to remain intact. I think you removed the other rows where speed != 0. Each of the current Groups contains 4 records, but this is not necessarily the case.

 

There should be a new group created once there are 4 consecutive 0's in the speed column. ( we can assign these rows containing the 4 consecutive 0's the previous group's value, and forget about what I said in B.)

Please let me know if this makes sense.

Thanks again.

ImadZidan
12 - Quasar

Hello @hamzam ,

 

based on your description and the code 

 

3 consequent records, 

if they have a speed of 0 then

   check timediff

      if >= 3 , then 

         this is a match , mark and execlude from the next check

if the record is not a match leave as is.

 

have a look at the workflow, it will give you an idea. there is no deletion but exclusion. At the end, you will have a filter with the matched records and unmatched. you can then do what you desire with them.

 

I hope you find it helpful or at least it will give you an idea.

 

 

 

ImadZidan
12 - Quasar

Hello @hamzam ,

 

Apologies, I read your reply after I posted. However, we are still aligned. 

 

Is it possible to provide some pseudo code of the exact logic. like the one in the first reply may be.

 

it is a nice and cool challenge. 🙂

hamzam
8 - Asteroid

Hi @ImadZidan ,

 

I have this updated code snippet:

 

import tqdm #just to see the progress of the for loop
start_idx = 0
group = 0 #groups start from 0
df['Group'] = np.nan

for idx in tqdm.tqdm(df.index):
    if df['Group'][idx] == 0 and df['Group'][idx+1] == 0 and df['Group'][idx + 2] == 0 and df['Group'][idx+3] == 0:
        if df['timediff'][idx] + df['timediff'][idx+1] + df['timediff'][idx+2] + df['Group'][idx+3] >= 4:
            df['Group'][start_idx:idx+3] = shift
            group+=1
            start_idx = idx + 4 # start index is the one after the 4th zero value
            #df.drop([idx, idx +1, idx+2])# no longer needed
ImadZidan
12 - Quasar

Hello @hamzam ,

 

I see the logic now.

Have a look at the attached. Hopefully it looks better.

 

 

hamzam
8 - Asteroid

Hi @ImadZidan 

 

Thanks for your help. But, as I had mentioned to @Greg_Murray , I'd like the original rows to remain intact. I think you removed the other rows, the ones that are not "hit".

 

Each row in the original dataset should be assigned a group  -- the group number can start from 0; it will increase to 1, etc. after there are 4 consecutive 0's in speed (and the timediff condition is met)

hamzam
8 - Asteroid

@ImadZidan 

 

I think your first workflow was more aligned with the goal. 

 

It seems like we need to figure a way out to assign like this:

 

Null ---- group 1

Null ---- group 1

Hit ---- group 1

Null -- group 2

...

Null -- group 2

Hit -- group 2

 

 

ImadZidan
12 - Quasar

@hamzam ,

 

Got you. We will do it. 

 

It helps that you fully understand your data which is perfect. makes things very clear

Labels