I'm trying to accomplish three things. I'm not very familiar with regular expression, and I continue to run into errors. I have a data table looks something like this:
CODE | ID | First Name | Last Name | |
E202201 | 20220101 | Jason | Sullivan | jsullivan@gmail.com |
E202201 | 20220102 | Joseph | Raven | jrs@HOTMAIL.com |
E202201 | 20220103 | Callan | McDosky | |
E202201 | 20220104 | George | Lee | z-li@school.edu |
(all data presented are fictional)
1) I'm trying to validate if the email addresses are valid. I want to add a column that includes flags indicating whether the address is empty "Not Applicable", valid "Valid" or invalid ("Not Valid").
2) After that I want to output this data to an Excel workbook that should be created at output. I want the workbook's name to be "report-for-CODE.xlsx". In this example, it should be "report-for-E202201.xlsx".
3) I will connect to different data tables stored in Excel workbooks, and I want all the output files to be stored in the same directory.
How do I use a Formula tool to achieve the first two goals?
Solved! Go to Solution.
Hi @45179902
Here is how you can do it.
1. Validate email: You can use the formula with regex_match like below to check email. If an email address is format abc@xyz.com then its valid or else its not.
IF IsEmpty(Trim([Email]," ")) THEN "Not Applicable"
ELSEIF REGEX_Match([Email], "[\w.]+@\u+\.\u+") THEN "Valid"
ELSE "Not Valid" ENDIF
2. Output each code into a separate file: Create a formula like below creating the filename
"report-for-"+[CODE]+".xlsx|Sheet1"
Configure your output tool like below to output each code to a separate file.
Hope this helps : )
Hi @atcodedog05
Thank you for your quick response with thumb up, solutions, and screenshot! Since I'm not familiar with regular expression, I would appreciate it very much if you can break down your regular expression in the solution to explain what each part means!
Also, I noticed that in your Alteryx Designer screenshot, "z-li@school.edu" a valid email address, is rated invalid in your solution. I believe this is because of the regular expression that you're using, but unfortunately, I lack the knowledge to adjust it.
How to edit the solution (regex) so that all valid addresses are detected correctly? For example, email addresses with subdomains (abc@careers.xyz.com) or country codes (abc@commerce.gov.cn), etc.
Regards,
George
Hi @45179902
I have adjusted the solution. Let me know if any more changes are required.
Workflow:
This article might help you get an better understanding of the Regex https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Tool-Mastery-RegEx/ta-p/37689
Hope this helps : )
@atcodedog05 Thank you for the update and resource! I will check that out. I have another question regarding filename. I just realised that some of the data tables don't have the CODE in a few rows. One of the tables doesn't have a CODE in the first row. The CODE should be the same across the entire table, so CODE value from other rows that have one should work as well.
Does your solution feed only the first row's filename column value into the Output Data tool? If so, how do I adjust your solution if the first row doesn't have CODE, and I need to use CODE from another row? The output file should include only one sheet that includes the email validation result of the entire table.
Hi @45179902
All rows should have the code it groups based on the code and creates a file. If you are saying first always has code and next rows may not. You can fill down code using multi-row formula tool like below.
Input:
CODE | ID | First Name | Last Name | |
E202201 | 20220101 | Jason | Sullivan | jsullivan@gmail.com |
E202201 | 20220102 | Joseph | Raven | jrs@HOTMAIL.com |
20220103 | Callan | McDosky | ||
20220104 | George | Lee | z-li@school.edu | |
E202201 | 20220105 | George | Lee | abc@xyz |
E202202 | 20220101 | Jason | Sullivan | abc@careers.xyz.com |
E202202 | 20220102 | Joseph | Raven | abc@commerce.gov.cn |
Output:
Hope this helps : )
@atcodedog05 Hi, Thank you for your continued support! I understand that there is a multi-row formula that can copy the CODE all the way up or down to the last row of the data table.
The thing is that some of the rows are leaved intentionally blank and should not have a CODE or any value. If I fill down the CODE across the rows, does that mean my output result will be changed?
I want to leave those blank rows as they are in the output file, but I want the filename to be report-for-CODE.xlsx regardless of whether or not there's a CODE in the first row.
My priority is not to help the people who hand over me the data files to fix those, but I want to leave everything untouched and have email validation reports available.
Happy to help : ) @45179902
Happy new year to you and your close ones : ) May it be filled with happiness and prosperity =D
Happy 2022 ~