Hi!
I am looking to build a workflow with:
1. If the 'Speed' column has 4 consecutive 0's and the difference between the first 0 and last 0 is at least 3 minutes
a. then create a group containing records from the start index to the end index (before the first 0).
b. delete those rows that have speed = 0 above, and point the new start index to the next row after this deletion
I have the following python code, but I am having trouble vectorizing it -- since the data is 1Million+ rows, it is taking too long to run. I am wondering if there is a better way to do this; I tried this using the multi-row formula in alteryx but was having issues with that. I appreciate your help.
import tqdm
start_idx = 0
shift = 0
df['Group'] = np.nan
for idx in tqdm.tqdm(df.index):
if df['Group'][idx] == 0 and df['Group'][idx+1] == 0 and df['Group'][idx + 2] == 0:
if df['timediff'][idx] + df['timediff'][idx+1] + df['timediff'][idx+2] >= 3:
df['Group'][start_idx:idx-1] = shift
shift+=1
start_idx = idx + 3
df.drop([idx, idx +1, idx+2])
Sample dataset attached
Solved! Go to Solution.
@hamzam ,
did you manage to have a look?
Hi @ImadZidan ,
for some reason, alteryx did not email me a notification.
This is awesome! Great, clean logic. Looks like it does what we wanted it to.
Thank you!!