Parsing out Special Characters
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
We have a large report that has several different special characters that is causing our data files to fail when importing into our system.
Is there a way within Alteryx to find and remove these special characters that we are not aware of. There are MANY different types that we are finding. We have tried using the data cleansing tool; however, this is not catching everything. Example of one character we are seeing is :Â.
- Labels:
- Data Investigation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello @baileyblough ,
How about doing this with a RegEx_replace formula in a formula tool.
You can try this formula for what you are looking for:
Regex_replace([Field1],"([^a-z])","")
-> It looks for everything that is not a normal character and removes it.
If you want to keep numbers, but only remove special characters you can change the expression to
Regex_replace(Field1,"([^a-z|^\d])","")
I have tried it with a lot of different samples this far and do not find anything that does not work.
//Best Regards
Elias
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @baileyblough ,
Another option there is to use DecomposeUnicodeForMatch(String) function, but in this case, it will transform all to lower cases, which you can treat with Title case funcion for example.
Best,
Fernando Vizcaino
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Nice @fmvizcaino , DecomposeUnicodeForMatch([Field1]) works well for this scenario.
Thanks for sharing @fmvizcaino. Got to learn something new 🙂.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Is there a way to use this expression but also retain a dash character?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
As @fmvizcaino suggested DecomposeUnicodeForMatch([Field1]) keeps the "-". I feel there is less data loss with this approach. But again it depends on your requirement. This will normalize and not remove it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@atcodedog05 , this is the most incredible thing from our community! 😁
I actually have learned more here in the community than in my actual job
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@fmvizcaino wrote:
@atcodedog05 , this is the most incredible thing from our community! 😁
I actually have learned more here in the community than in my actual job
I totally agree and can vouch for that @fmvizcaino 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@fmvizcaino & @atcodedog05 - I love this exchange! 🙂
I am constantly stealing with pride all these wonderful little snippets of workflows from this community. I've learned more through the weekly challenges and attempting to provide solutions to others' posts than I would've in my day to day job.
