Alteryx Designer Desktop Discussions

JR1 · ‎06-12-2017

I was wondering how to split a column in half in order to speed up a fuzzy match join. Figure A contains all the items. Figure B contains items 1-5 while Figure C contains items 6-10. After the fuzzy matching result has been completed on the separate columns, I want to rejoin them so that they resemble Figure A again with the addition of the Matched result score. Is this possible?

ID	Item a	Item b
1	A	a
2	B	b
3	C	c
4	D	d
5	E	e
6	F	f
7	G	g
8	H	h
9	I	i
10	J	j

Figure A

ID	Item a	Item b
1	A	a
2	B	b
3	C	c
4	D	d
5	E	e

Figure B

ID	Item a	Item b
6	F	f
7	G	g
8	H	h
9	I	i
10	J	j

Figure C

NicoleJohnson · ‎06-12-2017

Do you know how many characters you want in each half ahead of time, or does the split need to be dynamic?

If you know ahead of time (i.e. there are always 20 characters, and you want 10 in each), then you could use the RegEx tool:

Regular Expression: (\w{10})
Output Method: Tokenize
Split to Columns: 2

This will split your characters into two columns of 10. You could also be a little more flexible by using (.{3}) if there are non-alphanumeric characters involved...

If you have a varying number of characters, however, might need to get more creative... let us know! :)

Cheers,
NJ

NicoleJohnson · ‎06-12-2017

Do you know how many characters you want in each half ahead of time, or does the split need to be dynamic?

If you know ahead of time (i.e. there are always 20 characters, and you want 10 in each), then you could use the RegEx tool:

Regular Expression: (\w{10})
Output Method: Tokenize
Split to Columns: 2

This will split your characters into two columns of 10. You could also be a little more flexible by using (.{3}) if there are non-alphanumeric characters involved...

If you have a varying number of characters, however, might need to get more creative... let us know! :)

Cheers,
NJ

JR1 · ‎06-12-2017

The split needs to be dynamic. Some values in certain fields will remain the same e.g. State and ZIP will remain 2 and 5 respectively.

NicoleJohnson · ‎06-12-2017

How about a few formulas?

1. Find the Length of the field:

Length: Length([Field1])

1. If the Length is greater than a certain amount, say 10 characters, split the field in half. For the second portion, if the first Split field matches the original, then second half should be Null, otherwise capture the remaining characters not contained in the first split.

Split1: if [Length]>10 then left([Field1],ceil([Length]/2)) else [Field1] endif
Split2: if [Field1]=[Split1] then null() else Replace([Field1],[Split1],"") endif

Alternatively, if you had another method for identifying fields that needed to be split, you could filter for only those needing to be split, then use the two split formulas shown above, then union them back to your fields that did not need to be split...

NJ

JR1 · ‎06-12-2017

I'm not sure that this is splitting the column in half. It seems that it is just creating a new column entirely with all of the entries repeated in the new column.

NicoleJohnson · ‎06-12-2017

Sorry about that!! I didn't see the image you posted in your original query when the post came through in my email, and it looked like it was set up like this:

ABCDEFGHIJ

abcdefghij

So you are totally correct in that my response would not be helpful in the way you had it set up!!

However, I think to do what you're trying to do, you could find the median record number (use a Summarize tool to find the Median of RecordID), then append the median back onto your original data, then filter to split entries that are greater than the median. Would that do it?

Again, sorry about that. The way tables come across in the email format is WAY different than how they look here on the Community!!

JR1 · ‎06-12-2017

No need to apologize for the wrong formatting. I understand completely. Happens to the best of us.

I think your solution will work, however, I am running into a problem when appending the median back onto the original. I am using a union to rejoin the two, and It gives the column with null as all the values. Should I use a different join?

NicoleJohnson · ‎06-12-2017

Use the Append tool rather than the Union tool - the median will be your Source (bottom input) and your original info will be the Target. This will add the median field on to every record in your original list, whereas Union will try to join your columns together by field/position. Example attached. Does that help?

JR1 · ‎06-12-2017

Yes that helped. Thank you very much.

Alteryx Designer Desktop Discussions

Split Database column in half