General Discussions

Discuss any topics that are not product-specific here.

How to standardize excel files with different schemas

mahendra0393
6 - Meteoroid

Hi All,

 

I need help with below data sets. I want to  standardize various clients file. I need only 3 columns(Unique Number, Amount and Date) to upload client data in application.

Step1: I'm downloading all the files based on input emaildomain and subject from email. This is working fine for me.

 

Step2: I want to extract 3 columns (Unique Number, Amount and Date) and save the file in same folder. Since each client has different schema. How can I create macro which will process all the downloaded files and create standard upload format file for each client. Sample data set given below. Sometimes I get extra rows which contain totals and some calculation in bottom. I want to remove such extra rows. 

Note: Unique Number, Amount and Date mandatory column names changes for each clients.

 

Input to python block for email download:

ClientNameEmailDomainSubjectPasswordAwardDateContributionDateMonthDownloadPath
Client A@microsoft.comContributionsabc2326-Jun-202326-Jun-2023MayC:\\Users\\abc\\OneDrive - Link Group\\Automation\\
Client B@alteryx.comClient B Purchase!abc2226-Jun-202326-Jun-2023MayC:\\Users\\abc\\OneDrive - Link Group\\Automation\\

 

Data Set: Client A

 

Unique NumberSurnameForenameEmployee NumberContribution AmountDeduction Date
AA1233333KyleM11030/11/2022
BA3466666PatelS21530/11/2022
JG6789654SinghMike54030/11/2022

 

Data Set : Client B

EmployeeEmployee Summary: Unique NumberFirst nameSurnamePayroll Number AmountContribution Date
A B WillAV321211AWill1152023-01-31 00:00:00
A J BrownAY322556BrownA2302023-01-31 00:00:00
A K CarrBB166534ACarr3502023-01-31 00:00:00

 

 

 

5 REPLIES 5
caltang
17 - Castor
17 - Castor

Hi @mahendra0393

 

The naming convention of your fields between files is rather drastic. Just want to check, in every file you receive, how many fields (columns) are there in general? 

Are the names between files consistent in any manner? For example, if Amount is what you want, the Amount fields in each file will have the word “Amount” in them. 

For now, you can use a Batch Macro to load them, and configure for name. This will net you all the files, but will provide you with extra columns. 

Once you have the new columns, utilise the Transpose and Cross-Tab tools accordingly to get all your field names in rows, then, use a Filter tool or Regex tool to get the field names that has the words of your interest, e.g Amount, Unique Number, and Date.

 

Then from there, you should be able to pivot your data back to normal to get your fields + data of interest dynamically. 

Sorry for this high level solution, I’m currently going to bed soon - hope this sparks some ideas!

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
caltang
17 - Castor
17 - Castor

This is what I mean:

Screenshot 2023-06-28 112741.png

 

I've made the macro  for you and got your output you desire.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
mahendra0393
6 - Meteoroid

Hi @caltang 

 

Thanks for the quick solution but I don't want to combine the client data. So we process each client data separately. In step 1, all the client raw files go to respective client folder.

In step 2, I want to clean the individual client files and save the clean file with 3 mandatory headers in same folder where I have downloaded the file in step 1. So that later same file can be uploaded to our internal tool to process the client data.

 

Below are the answer for your questions:

 

The naming convention of your fields between files is rather drastic. Just want to check, in every file you receive, how many fields (columns) are there in general? 
No of fields in excel not fixed. It's keep changing for each client. In general, I would say 7-12 columns

Are the names between files consistent in any manner? For example, if Amount is what you want, the Amount fields in each file will have the word “Amount” in them. 

Yes most of the time I have seen files are consistent with 3 column name which I need Unique Number, amount and date. These 3 key words are present in the columns 

caltang
17 - Castor
17 - Castor

In that case, you can build upon my solution with an Output tool that outputs the file of your choice, and to your location of choice. 

 

In the Output Tool, kindly follow the example here: https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Record-ID-for-multiple-Output-...

 

You should be able to do it yourself.

 

Kindly mark my solution as accepted to help others who may or may not have the same issue you have.

 

Thanks,

Calvin

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
caltang
17 - Castor
17 - Castor

Hi @mahendra0393 - have you tried the solution provided above with the link given? If your query has been resolved, kindly close this post by marking the solution as accepted so that it may help others who have the same problem as you.

 

-Cal

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
Labels