Hi All,
I need help with below data sets. I want to standardize various clients file. I need only 3 columns(Unique Number, Amount and Date) to upload client data in application.
Step1: I'm downloading all the files based on input emaildomain and subject from email. This is working fine for me.
Step2: I want to extract 3 columns (Unique Number, Amount and Date) and save the file in same folder. Since each client has different schema. How can I create macro which will process all the downloaded files and create standard upload format file for each client. Sample data set given below. Sometimes I get extra rows which contain totals and some calculation in bottom. I want to remove such extra rows.
Note: Unique Number, Amount and Date mandatory column names changes for each clients.
Input to python block for email download:
ClientName | EmailDomain | Subject | Password | AwardDate | ContributionDate | Month | DownloadPath |
Client A | @microsoft.com | Contributions | abc23 | 26-Jun-2023 | 26-Jun-2023 | May | C:\\Users\\abc\\OneDrive - Link Group\\Automation\\ |
Client B | @alteryx.com | Client B Purchase | !abc22 | 26-Jun-2023 | 26-Jun-2023 | May | C:\\Users\\abc\\OneDrive - Link Group\\Automation\\ |
Data Set: Client A
Unique Number | Surname | Forename | Employee Number | Contribution Amount | Deduction Date |
AA1233333 | Kyle | M | 1 | 10 | 30/11/2022 |
BA3466666 | Patel | S | 2 | 15 | 30/11/2022 |
JG6789654 | Singh | Mike | 5 | 40 | 30/11/2022 |
Data Set : Client B
Employee | Employee Summary: Unique Number | First name | Surname | Payroll Number | Amount | Contribution Date |
A B Will | AV321211 | A | Will | 1 | 15 | 2023-01-31 00:00:00 |
A J Brown | AY322556 | Brown | A | 2 | 30 | 2023-01-31 00:00:00 |
A K Carr | BB166534 | A | Carr | 3 | 50 | 2023-01-31 00:00:00 |
Hi @mahendra0393,
The naming convention of your fields between files is rather drastic. Just want to check, in every file you receive, how many fields (columns) are there in general?
Are the names between files consistent in any manner? For example, if Amount is what you want, the Amount fields in each file will have the word “Amount” in them.
For now, you can use a Batch Macro to load them, and configure for name. This will net you all the files, but will provide you with extra columns.
Once you have the new columns, utilise the Transpose and Cross-Tab tools accordingly to get all your field names in rows, then, use a Filter tool or Regex tool to get the field names that has the words of your interest, e.g Amount, Unique Number, and Date.
Then from there, you should be able to pivot your data back to normal to get your fields + data of interest dynamically.
Sorry for this high level solution, I’m currently going to bed soon - hope this sparks some ideas!
Hi @caltang
Thanks for the quick solution but I don't want to combine the client data. So we process each client data separately. In step 1, all the client raw files go to respective client folder.
In step 2, I want to clean the individual client files and save the clean file with 3 mandatory headers in same folder where I have downloaded the file in step 1. So that later same file can be uploaded to our internal tool to process the client data.
Below are the answer for your questions:
The naming convention of your fields between files is rather drastic. Just want to check, in every file you receive, how many fields (columns) are there in general?
No of fields in excel not fixed. It's keep changing for each client. In general, I would say 7-12 columns
Are the names between files consistent in any manner? For example, if Amount is what you want, the Amount fields in each file will have the word “Amount” in them.
Yes most of the time I have seen files are consistent with 3 column name which I need Unique Number, amount and date. These 3 key words are present in the columns
In that case, you can build upon my solution with an Output tool that outputs the file of your choice, and to your location of choice.
In the Output Tool, kindly follow the example here: https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Record-ID-for-multiple-Output-...
You should be able to do it yourself.
Kindly mark my solution as accepted to help others who may or may not have the same issue you have.
Thanks,
Calvin
Hi @mahendra0393 - have you tried the solution provided above with the link given? If your query has been resolved, kindly close this post by marking the solution as accepted so that it may help others who have the same problem as you.
-Cal
User | Count |
---|---|
2 | |
1 | |
1 | |
1 | |
1 |