Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Python For Loop, Trying to replicate functionality with Alteryx

wonka1234
10 - Fireball

Hi all,

 

I am trying to replicate the following functionality with alteryx:

 

column_list = pd.DataFrame(['Col1','Col2','Col3','Col4', 'Col5',
'Col6','Col7','Col8','Col9','Col10',
'Col11','Col12'])


final_hr = pd.DataFrame()

 

for column in range(len(column_list)):
            hr_new=hr.copy()

          #Drop rows containing NAN for column 'col4' for new merge
          hr_new.dropna(subset=[column_list.iloc[column,0]], inplace = True)

          #Creating a new column for merge
           hr_new['ID_final']=hr_new[column_list.iloc[column,0]]

           #case folding
           hr_new['ID_final']=hr_new['ID_final'].str.strip().str.upper()

           #Merge data
          merged_data = pd.merge(hr_new, data, how='left', left_on='ID_final', right_on ='OtherID')
           #Concatinating all data together
            final_hr = final_hr.append(merged_data)

 

HR_attestation = final_hr.drop_duplicates()

 

my thinking on the logic: For each iteration of the loop, it's taking one of the user ID fields(Col1 to Col 12) and writing that to the ID_final field. And then it tries to use that to join to "data". Then appends everything to final_hr dataframe.

 

I am having trouble replicating this in alteryx, what I have tried is:

 

1) joins between "hr_new" and "data" , joining OtherID to each of the Col 1 to Col 12.

2) Taking the inner join of each and unioning them together

3) taking "Hr_new" and joining to the union  joining on record id.

4) a second union taking the left and inner join of this.

 
 

sample.jpg

 

any help would be appreciated. 

These joins/unions is skipping out on some records so I am wondering if my logic is correct.

2 REPLIES 2
Aguisande
15 - Aurora
15 - Aurora

Hi @wonka1234 

I'm not very clear about what you need to accomplish, Maybe a further explanation would help, but I think the process is simpler (thinking in Alteryx not in coding), for what I understand:

 

column_list = pd.DataFrame(['Col1','Col2','Col3','Col4', 'Col5','Col6','Col7','Col8','Col9','Col10','Col11','Col12'])

- This is a Select Tool, where you select which columns of the Dataset you want to keep, which to discard, and you can rename them and change types.


final_hr = pd.DataFrame()

 

for column in range(len(column_list)): -> You don't need this, alteryx proceses your datastream line by line.
            hr_new=hr.copy()

          #Drop rows containing NAN for column 'col4' for new merge
          hr_new.dropna(subset=[column_list.iloc[column,0]], inplace = True)

- This part sounds like a filter tool with the expression:  [Col4]!='NAN'

 

          #Creating a new column for merge
           hr_new['ID_final']=hr_new[column_list.iloc[column,0]]

- This is a Formula Tool

 

           #case folding
           hr_new['ID_final']=hr_new['ID_final'].str.strip().str.upper()

- Anther formula tool (or a Cleanse Tool)

           #Merge data
          merged_data = pd.merge(hr_new, data, how='left', left_on='ID_final', right_on ='OtherID')

- This is a Join on [ID_final] and [OtherID]


           #Concatinating all data together
            final_hr = final_hr.append(merged_data)

 

HR_attestation = final_hr.drop_duplicates()

- This can be resolved with a Unique Tool or a Summarize

 

 

warrensacko
5 - Atom

First consider if you really need to iterate over rows in a DataFrame. Iterating through pandas dataFrame objects is generally slow. Pandas Iteration beats the whole purpose of using DataFrame. It is an anti-pattern and is something you should only do when you have exhausted every other option. It is better look for a List Comprehensions , vectorized solution or DataFrame.apply() method for iterate through DataFrame.

 

Pandas DataFrame loop using list comprehension

 

result = [(x, y,z) for x, y,z in zip(df['Name'], df['Promoted'],df['Grade'])]

 

Pandas DataFrame loop using DataFrame.apply()

 

result = df.apply(lambda row: row["Name"] + " , " + str(row["TotalMarks"]) + " , " + row["Grade"], axis = 1)

 

Labels