Hi all,
I am trying to replicate the following pandas functionality in Alteryx:
import pandas as pd

# hr and data are loaded earlier in the script
column_list = pd.DataFrame(['Col1', 'Col2', 'Col3', 'Col4', 'Col5',
                            'Col6', 'Col7', 'Col8', 'Col9', 'Col10',
                            'Col11', 'Col12'])
final_hr = pd.DataFrame()
for column in range(len(column_list)):
    hr_new = hr.copy()
    # Drop rows containing NaN in the current column for the new merge
    hr_new.dropna(subset=[column_list.iloc[column, 0]], inplace=True)
    # Creating a new column for the merge
    hr_new['ID_final'] = hr_new[column_list.iloc[column, 0]]
    # Case folding
    hr_new['ID_final'] = hr_new['ID_final'].str.strip().str.upper()
    # Merge data
    merged_data = pd.merge(hr_new, data, how='left', left_on='ID_final', right_on='OtherID')
    # Concatenating all data together
    final_hr = final_hr.append(merged_data)
HR_attestation = final_hr.drop_duplicates()
My thinking on the logic: for each iteration of the loop, it takes one of the user ID fields (Col1 to Col12) and writes it to the ID_final field, then tries to use that field to join to "data", and finally appends everything to the final_hr dataframe.
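To make that concrete, here is roughly what a single pass of the loop does on a tiny made-up example (the small hr/data frames below are purely illustrative, not my real data):

import pandas as pd

# Tiny illustrative frames, not the real data
hr = pd.DataFrame({'Employee': ['A', 'B'],
                   'Col1': [' x1 ', None],
                   'Col2': ['y1', 'y2']})
data = pd.DataFrame({'OtherID': ['X1', 'Y2'],
                     'System': ['sys1', 'sys2']})

# One iteration, with the current column being 'Col1'
hr_new = hr.dropna(subset=['Col1']).copy()
hr_new['ID_final'] = hr_new['Col1'].str.strip().str.upper()
merged = hr_new.merge(data, how='left', left_on='ID_final', right_on='OtherID')
# Employee A matches OtherID 'X1'; employee B is dropped because Col1 is missing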
I am having trouble replicating this in Alteryx. What I have tried is:
1) Joins between "hr_new" and "data", joining OtherID to each of Col1 to Col12.
2) Taking the inner join of each and unioning them together.
3) Taking "hr_new" and joining it to that union on Record ID.
4) A second union taking the left and inner joins of this.
Any help would be appreciated.
These joins/unions are dropping some records, so I am wondering if my logic is correct.
Hi @wonka1234
I'm not entirely clear on what you need to accomplish, so a further explanation would help, but I think the process is simpler than it looks (thinking in Alteryx rather than in code). From what I understand:
column_list = pd.DataFrame(['Col1','Col2','Col3','Col4', 'Col5','Col6','Col7','Col8','Col9','Col10','Col11','Col12'])
- This is a Select Tool, where you select which columns of the Dataset you want to keep, which to discard, and you can rename them and change types.
final_hr = pd.DataFrame()
for column in range(len(column_list)): -> You don't need this; Alteryx processes your data stream record by record.
hr_new=hr.copy()
#Drop rows containing NAN for column 'col4' for new merge
hr_new.dropna(subset=[column_list.iloc[column,0]], inplace = True)
- This part sounds like a Filter Tool with the expression !IsNull([Col4]) (or [Col4]!='NAN' if the missing values come through as the literal string 'NAN')
#Creating a new column for merge
hr_new['ID_final']=hr_new[column_list.iloc[column,0]]
- This is a Formula Tool
#case folding
hr_new['ID_final']=hr_new['ID_final'].str.strip().str.upper()
- Another Formula Tool (or a Cleanse Tool)
#Merge data
merged_data = pd.merge(hr_new, data, how='left', left_on='ID_final', right_on ='OtherID')
- This is a Join Tool on [ID_final] and [OtherID]; since the pandas merge is a left join, union the J and L outputs so unmatched hr rows are kept
#Concatenating all data together
final_hr = final_hr.append(merged_data)
HR_attestation = final_hr.drop_duplicates()
- This can be resolved with a Unique Tool or a Summarize Tool
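If it helps to see the whole thing without the loop, here is a rough pandas sketch of the same idea in one pass (the melt step is roughly what a Transpose Tool does in Alteryx; hr, data and OtherID are taken from your snippet, and the output won't be byte-for-byte identical to the loop because the twelve ID columns get collapsed into one variable/value pair):

id_cols = ['Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6',
           'Col7', 'Col8', 'Col9', 'Col10', 'Col11', 'Col12']

# Reshape to one row per (hr row, ID column) pair -- roughly a Transpose Tool
long_hr = hr.melt(id_vars=[c for c in hr.columns if c not in id_cols],
                  value_vars=id_cols, value_name='ID_final')

# Drop missing IDs and case-fold -- Filter + Formula/Cleanse Tools
long_hr = long_hr.dropna(subset=['ID_final'])
long_hr['ID_final'] = long_hr['ID_final'].str.strip().str.upper()

# Left join to the lookup data, then de-duplicate -- Join (J+L unioned) + Unique
HR_attestation = (long_hr
                  .merge(data, how='left', left_on='ID_final', right_on='OtherID')
                  .drop_duplicates())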
First, consider whether you really need to iterate over rows in a DataFrame at all. Iterating through pandas DataFrame objects is generally slow and defeats the whole purpose of using a DataFrame; it is an anti-pattern you should reach for only when you have exhausted every other option. It is better to look for a list comprehension, a vectorized solution, or the DataFrame.apply() method.
Pandas DataFrame loop using list comprehension
result = [(x, y, z) for x, y, z in zip(df['Name'], df['Promoted'], df['Grade'])]
Pandas DataFrame loop using DataFrame.apply()
result = df.apply(lambda row: row["Name"] + " , " + str(row["TotalMarks"]) + " , " + row["Grade"], axis=1)
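For context, here is a minimal self-contained version of those two patterns (the Name/TotalMarks/Promoted/Grade columns are just illustrative placeholders):

import pandas as pd

df = pd.DataFrame({'Name': ['Ann', 'Bob'],
                   'TotalMarks': [82, 67],
                   'Promoted': [True, False],
                   'Grade': ['A', 'B']})

# List comprehension over zipped columns
pairs = [(name, promoted, grade)
         for name, promoted, grade in zip(df['Name'], df['Promoted'], df['Grade'])]

# Row-wise apply building one summary string per row
summary = df.apply(lambda row: f"{row['Name']} , {row['TotalMarks']} , {row['Grade']}", axis=1)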