Let’s talk Alteryx Copilot. Join the live AMA event to connect with the Alteryx team, ask questions, and hear how others are exploring what Copilot can do. Have Copilot questions? Ask here!
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Python For Loop, Trying to replicate functionality with Alteryx

wonka1234
10 - Fireball

Hi all,

 

I am trying to replicate the following functionality with alteryx:

 

column_list = pd.DataFrame(['Col1','Col2','Col3','Col4', 'Col5',
'Col6','Col7','Col8','Col9','Col10',
'Col11','Col12'])


final_hr = pd.DataFrame()

 

for column in range(len(column_list)):
            hr_new=hr.copy()

          #Drop rows containing NAN for column 'col4' for new merge
          hr_new.dropna(subset=[column_list.iloc[column,0]], inplace = True)

          #Creating a new column for merge
           hr_new['ID_final']=hr_new[column_list.iloc[column,0]]

           #case folding
           hr_new['ID_final']=hr_new['ID_final'].str.strip().str.upper()

           #Merge data
          merged_data = pd.merge(hr_new, data, how='left', left_on='ID_final', right_on ='OtherID')
           #Concatinating all data together
            final_hr = final_hr.append(merged_data)

 

HR_attestation = final_hr.drop_duplicates()

 

my thinking on the logic: For each iteration of the loop, it's taking one of the user ID fields(Col1 to Col 12) and writing that to the ID_final field. And then it tries to use that to join to "data". Then appends everything to final_hr dataframe.

 

I am having trouble replicating this in alteryx, what I have tried is:

 

1) joins between "hr_new" and "data" , joining OtherID to each of the Col 1 to Col 12.

2) Taking the inner join of each and unioning them together

3) taking "Hr_new" and joining to the union  joining on record id.

4) a second union taking the left and inner join of this.

 
 

sample.jpg

 

any help would be appreciated. 

These joins/unions is skipping out on some records so I am wondering if my logic is correct.

2 REPLIES 2
Aguisande
15 - Aurora
15 - Aurora

Hi @wonka1234 

I'm not very clear about what you need to accomplish, Maybe a further explanation would help, but I think the process is simpler (thinking in Alteryx not in coding), for what I understand:

 

column_list = pd.DataFrame(['Col1','Col2','Col3','Col4', 'Col5','Col6','Col7','Col8','Col9','Col10','Col11','Col12'])

- This is a Select Tool, where you select which columns of the Dataset you want to keep, which to discard, and you can rename them and change types.


final_hr = pd.DataFrame()

 

for column in range(len(column_list)): -> You don't need this, alteryx proceses your datastream line by line.
            hr_new=hr.copy()

          #Drop rows containing NAN for column 'col4' for new merge
          hr_new.dropna(subset=[column_list.iloc[column,0]], inplace = True)

- This part sounds like a filter tool with the expression:  [Col4]!='NAN'

 

          #Creating a new column for merge
           hr_new['ID_final']=hr_new[column_list.iloc[column,0]]

- This is a Formula Tool

 

           #case folding
           hr_new['ID_final']=hr_new['ID_final'].str.strip().str.upper()

- Anther formula tool (or a Cleanse Tool)

           #Merge data
          merged_data = pd.merge(hr_new, data, how='left', left_on='ID_final', right_on ='OtherID')

- This is a Join on [ID_final] and [OtherID]


           #Concatinating all data together
            final_hr = final_hr.append(merged_data)

 

HR_attestation = final_hr.drop_duplicates()

- This can be resolved with a Unique Tool or a Summarize

 

 

warrensacko
5 - Atom

First consider if you really need to iterate over rows in a DataFrame. Iterating through pandas dataFrame objects is generally slow. Pandas Iteration beats the whole purpose of using DataFrame. It is an anti-pattern and is something you should only do when you have exhausted every other option. It is better look for a List Comprehensions , vectorized solution or DataFrame.apply() method for iterate through DataFrame.

 

Pandas DataFrame loop using list comprehension

 

result = [(x, y,z) for x, y,z in zip(df['Name'], df['Promoted'],df['Grade'])]

 

Pandas DataFrame loop using DataFrame.apply()

 

result = df.apply(lambda row: row["Name"] + " , " + str(row["TotalMarks"]) + " , " + row["Grade"], axis = 1)

 

Labels
Top Solution Authors