Hi everyone,
Is there a way that can check the positions of columns of two files? e.g.
Data Source A | Data Source B |
First Name | First Name |
Last Name | Last Name |
Address | Phone |
Phone | Address |
I want it to display it side by side or tell me that the column names position doesnt match for the two files.
Another question, is there a way to check the format of all fields in a column ? e.g.
Date |
25/OCT/16 |
17/SEP/16 |
09/APR/15 |
03/JAN/16 |
20/MARCH/16 |
29/APR/2016 |
I want to make sure all date have the same format (DD/MMM/YY).
Note:
I am doing testing/ comparing 2 files (both files are csv) to see if they are matching and what isnt matching so I dont need to correct or manipulate the data. I just want to compare them.
Thank you so much :)
Solved! Go to Solution.
For your first question, I'd use the Field Info tool (which provides the metadata for fields within a file in order of occurrance). If you put a Field Info tool for both files and then Join "by position", and then use a Formula Tool to check equality of the two Name columns.
For your second question, it's difficult to provide a solution without knowing the data and purpose, but strictly for your specific example, you could put the expression...
REGEX_Match([Date], '\d\d/[[:alpha:]]{3}/\d\d')
...in a Formula tool and it will test for the DD/MMM/YY format.
I was going to respond to this post last night, but took a nap instead. I completely agree with your approach to the first question and was going to respond just the same way.
For the second problem, I was going to take a different approach (not a regex solution) and use find & replace (case insensitive). I was going to create a 2 column table of FROM & TO values where All letters A through Z would be listed in the From column and they would convert to a value of Z in the TO column. Then add the numbers 0 through 9 all converting to 9.
With an input of 25/OCT/2016 you get: 99/ZZZ/9999
Now you can take the data through a summarize with a group by the field and see what variations exist.
If you replace each letter (upper and lowercase) with '\l' for lowercase and '\u' for uppercase and each number with '\d' you would get:
\d\d/\u\u\u/\d\d\d\d
That could be used as the basis for a regular expression. The summarize would tell you each pattern to look for.
Cheers,
Mark
Thanks @RodL and @MarqueeCrew for the reply. To give more info about the data. I will be comparing two csv files that contain 13 millions records each. Hence, I will need to make my workflow as efficient and short as possible as I have many test cases in one workflow.
My questions are two of my test cases. And for the date format, I will need to ensure the date DD/MMM/YY format as all data are string in both csv.
Thanks