I have two types of files in the same directory that I will use for sending out letters. These two types of files have ELS and ILS in their filenames.
Before sending out letters, I must make sure that there're no duplicate records in these files. I must find out unique values from these files without joining them together because the two types of files have different layouts yet they must go through the same workflow to output letters.
Below are examples of the ELS and ILS files:
ELS_group_1.xlsx
CODE | First_Name | Last_Name | Profession |
10238 | John | White | Teacher |
12049 | Kelly | Kren | Dean |
17253 | Hsiu-wen | Chen | Student |
ILS_group_1.xlsx
CODE | Careers | LastName | FirstName |
39880 | Student | Landsberg | Alan |
10238 | Teacher | White | John |
15042 | Artist | Daniels | Lindsey |
Suppose that before I have these files go through the flows that output letters, I want to ensure that there're no duplicate records in these files, distinguished by CODEs (note 10238 in both files). There could be 20 files in the directory with different numbers of ELS and ILS files. What can I do?
As a "you cannot use Join" challenge - here's a way. Summarize the codes in one list to create a list of codes. Append the list to the other the other list. filter with [ELSE_CODE] in [concat_ILS_code].
You've now found which ELS entries aren't in ILS - but I mean, I'd just use Join.
@apathetichell I guess you can, but how to join so many files together?
@45179902 - You can use "*" or "?" masks in your Input Data tool. The easiest way to achieve that replace filename with ILS_*.xlsx and ELS_*.xlsx and then simply use a Union tool.
Hi @45179902 ,
Please find this logic to get unique records and output.
step 1
step 2
With this output join to the xlsx file to get output.
Simple batch to read xlsx file .
Hope this help you.
oh. interesting. Forgot the 20 file part. union is kind of the way to go. You can union and then summarize/count to see what's repeated. so what i'd do is create a field to signify what the potential workflow is and then do a groupby/count by code and see which are 2 or more.
User | Count |
---|---|
52 | |
27 | |
25 | |
24 | |
21 |