Cleaning up Excel input with Nickname in Quotes
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi. I have to ingest an Excel file where the send insists on adding quotations around a manager's name. This causes join issues when summarizing later.
I have this:
IF ([MANAGER ] = 'John “Jack” Jones' THEN 'John "Jack" Jones' ELSE [MANAGER ] ENDIF
Note that the nickname of Jack coming in from the Excel file has the inverted quotation marks. Whereas Alteryx is a different format. I tried different suggestions searching around here, but nothing comes close to solving. Any suggestions?
Solved! Go to Solution.
- Labels:
- Workflow
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Here are two examples of regex expressions which could be helpful. The first will identify where a nickname exists and the second accomplishes the removal of the nickname.
Side note: your if statement above identifies a string, then returns it without any change and returns all other non-matching strings without changes. So, I'm unsure what your expected output is. If my answer is off, then we can move the capturing groups (the parenthesis) in the regex replace expression to get the right answer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@AGilbert This is pretty cool and helpful. I also had an epiphany after posting my question - I have a database table I pull each month and the manager Jack is listed with clean quotes around his middle name. The Excel file we get each month updating all the newly promoted managers is manually managed and we use that to check for changes. Every month, the names with nicknames or alias in quotes causes us fits. So, I was trying to just hard code individual names and that is why I asked the community. However, I realized, it would be easier to clean the database name like this:
IF [MANAGER] = 'John “Jack” Jones' THEN 'John "Jack" Jones' ELSE [MANAGER] ENDIF
And for the Excel input, I cleaned up the name like this:
IF Contains([MANAGER ], 'Jack') THEN 'John "Jack" Jones' ELSE [MANAGER ] ENDIF
Using contains made it easier to work with the Excel input and now both Manager name outputs are the same and joins perfectly (I wish they would use employee IDs). This only is the solution if we only have one Jack in our manager pool. So for now, we have to work with this until there is another Jack joining the company.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
What happens when you have a Jackson or Jackie in the excel input? 😬
Preprocessing (cleaning) data is a good strategy, but you have to anticipate some edge cases which may not exist yet. Also, I would advise against hardcoding values in an expression. That is going to become very tedious to keep updated. If you share a workflow we can get something more robust in place.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@AGilbert - 100% agree and I thought of that and right now we have no Jackie or Jackson in the management team. I have asked them to let me create a new column for Aliases or nicknames, but no one is budging. Meanwhile when I run the monthly report, I have to reconcile the issues of mismatches because the quotes around the nickname are not the same format as the quotes around the Excel file nicknames.
Here is a mock up of the challenge. Somehow, the font the person sending me the Excel file forces those upside down quotes so it comes over weird. I took your advice about the chances of a Jackie or Jackson and decided to just replace each side of the quotes. The attached workflow seems to do the trick, but I would feel better addressing any chance of any version of quotes can be cleaned up and use the Alteryx version.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I think that replacing the smart/curly quotes with straight ones would work just fine. That seems to be the only difference in the two formats. Just wrap this up as a formatting macro and have it available each time you have to use data from that source.
