Alteryx Designer Desktop Discussions

SOLVED

Parsing CJK Characters to Columns via RegEx or Text to Column

BobSnyder85
8 - Asteroid

Hi All!

I am working on a data set that contains Chinese, Japanese, and Korean (CJK) characters.

Basically I have "안녕하세요" as a value in a column.

I need to split this out into one character per column, taking each character exactly once (but if a letter repeats, like the L's in "HeLLo", each L would get its own column).
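
A minimal Python sketch of the requested output, written outside Alteryx and assuming each name is a plain Unicode string: every character, repeats included, becomes its own column, and shorter names are padded so all rows share one width. The function name `split_to_columns` and the empty-string padding are illustrative choices, not Alteryx features. The NFC normalization step is an added assumption so that a Hangul syllable counts as one character even if the source stores it as separate jamo.

```python
import unicodedata


def split_to_columns(values: list[str], pad: str = "") -> list[list[str]]:
    """Split each string into one character per column.

    Repeated characters (the two L's in "HeLLo") each get their own column.
    Rows are padded with `pad` so every row has the same number of columns.
    """
    # Normalize to NFC so a Hangul syllable is one code point, not 2-3 jamo.
    chars = [list(unicodedata.normalize("NFC", v)) for v in values]
    width = max((len(c) for c in chars), default=0)
    return [c + [pad] * (width - len(c)) for c in chars]


if __name__ == "__main__":
    for row in split_to_columns(["안녕하세요", "HeLLo", "王"]):
        print(row)
```

The number of columns follows the longest name, which matches the "variable amount of columns" described in the post.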

There are no spaces between the characters, so I can't use a space as the delimiter.

So far I have tried the RegEx (.) as a test, and it grabbed "any character". I wasn't sure whether it was moving past the character it grabbed, so next I tried "(.)(.)(.)(.)(.)(.)" (12-year-old giggle :P). That failed; it didn't even return the first match.
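
The second test most likely failed because each `(.)` must consume exactly one character, so six groups need at least six characters, and "안녕하세요" has only five. The check below uses Python's `re` module rather than Alteryx's RegEx engine, on the assumption that `.` matches one whole character in both for this input. It also shows that applying a single `(.)` repeatedly walks through the whole string regardless of its length.

```python
import re

NAME = "안녕하세요"  # five precomposed Hangul syllables

# Six single-character groups need at least six characters: no match at all.
six_groups = re.search(r"(.)(.)(.)(.)(.)(.)", NAME)
print(six_groups)  # None

# Five groups fit exactly and capture each character in its own group.
five_groups = re.fullmatch(r"(.)(.)(.)(.)(.)", NAME)
print(five_groups.groups())  # ('안', '녕', '하', '세', '요')

# One (.) applied repeatedly returns every character, whatever the length.
print(re.findall(r"(.)", NAME))  # ['안', '녕', '하', '세', '요']
```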

Any help or ideas?

As a note, this is to try to match different names written in CJK characters against each other by running exact-match tests on single characters across a variable number of columns.
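
One possible reading of the matching step, sketched in Python under the assumption that two names are compared position by position (column 1 against column 1, and so on). The function `positional_matches` is hypothetical; the post only says "exact match tests on single characters", so a different alignment rule may fit the real data better.

```python
from itertools import zip_longest


def positional_matches(a: str, b: str) -> list[bool]:
    """Compare two names character by character, column by column.

    Where one name is shorter, the missing position compares as False.
    """
    return [x == y for x, y in zip_longest(a, b, fillvalue=None)]


if __name__ == "__main__":
    result = positional_matches("김민수", "김민준")
    print(result)  # [True, True, False]
    print(sum(result), "of", len(result), "positions match")  # 2 of 3 positions match
```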

Thanks in advance!

MarqueeCrew
20 - Arcturus
This may help a little. I'm not in front of a computer, but try using this expression as part of your RegEx:

(?:(.)(?!.*?\1))
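
For reference, here is what that pattern does, checked with Python's `re` module (assuming Alteryx's Perl-style RegEx engine treats the lookahead and backreference the same way). `(.)` captures one character, and the negative lookahead `(?!.*?\1)` rejects it if the same character appears again later in the string. Applied repeatedly, it keeps only the last occurrence of each character, so it is suited to spotting duplicates rather than splitting every character into its own column. The helper name `last_occurrences` is illustrative.

```python
import re

# Capture a character only if it does NOT occur again later in the string.
UNIQUE_LAST = re.compile(r"(?:(.)(?!.*?\1))")


def last_occurrences(text: str) -> list[str]:
    # findall returns the single capture group for every non-overlapping match.
    return UNIQUE_LAST.findall(text)


if __name__ == "__main__":
    print(last_occurrences("HeLLo"))       # ['H', 'e', 'L', 'o']
    print(last_occurrences("안녕하세요"))  # ['안', '녕', '하', '세', '요']
    print(last_occurrences("하하호"))      # ['하', '호']
```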
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please subscribe to my YouTube channel.
BobSnyder85
8 - Asteroid

Hi Marquee,

Thanks for the response.

I ran your RegEx, and it retrieved the first character of my name column into a new column, but it didn't pull the subsequent characters into their own columns as well.

I think I am going to look into String functions to get this one solved.

MarqueeCrew
20 - Arcturus

@BobSnyder85,

Last night I was focused on finding duplicate CJK characters. Here is a workflow that will parse each character to its own row.

[Screenshot of the workflow: Capture.png]
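
The workflow itself is only shown in the screenshot, so its exact tools are not reproduced here. As a tool-agnostic illustration of the same idea (one row per character), the Python sketch below emits (record ID, position, character) rows. The record IDs and 1-based positions are assumptions added so the rows can later be joined back to their names or compared across names.

```python
from collections.abc import Iterator


def chars_to_rows(names: list[str]) -> Iterator[tuple[int, int, str]]:
    """Yield one (record_id, position, character) row per character.

    record_id and position are 1-based; repeated characters get separate rows.
    """
    for record_id, name in enumerate(names, start=1):
        for position, char in enumerate(name, start=1):
            yield record_id, position, char


if __name__ == "__main__":
    for row in chars_to_rows(["안녕", "HeLLo"]):
        print(row)
```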

Cheers,

Mark

BobSnyder85
8 - Asteroid

Thank you very much, Mark!!

MarqueeCrew
20 - Arcturus

You're very welcome, @BobSnyder85.

I'm glad that my test data didn't get caught in @LeahK's naughty word list.

Cheers,
Mark
