Alteryx Designer Desktop Discussions

pcatterson · ‎06-16-2016

I want to identify and replace the third occurrence of the "T" character in each data row

AGCTTAGGCGAGTGCGAGTGCGATA

AGCTAGGCCGTAAAGCGAGGAGCCC

CTAGCATGCATGGGACCTAGGACCA

TAGAGATCGACGATTTACGAGGTTC

to

AGCTTAGGCGAGUGCGAGTGCGATA replaces it at character 12 (base 0)

AGCTAGGCCGTAAAGCGAGGAGCCC only 2 occurrences, replaces nothing

CTAGCATGCAUGGGACCTAGGACCA replaces it at character 10

TAGAGATCGACGAUTTACGAGGTTC replaces it at character 13

This is a simple case. I'm looking for something like this that will work for any occurrence and any character or string more generally. I expect it will require REGEX, but I'm not yet proficient with it.

bgraves · ‎06-16-2016

Use the RegEx Tool

Configuration

Regular Expression ([T])
Replacement Text $3U

Everything else is default

This replaced the 3rd occurrence of T in any string with U.

pcatterson · ‎06-16-2016

This is what I did, but it replaced all the Ts, not just the third instance.

Am I missing something?
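A likely explanation, sketched in Python rather than in the RegEx tool itself: the pattern `([T])` matches a single T, so a replace runs once for every T in the row. The `$3` in the replacement text refers to a third capture group, not to a third occurrence. The sketch assumes a reference to a group that does not exist expands to empty text in Alteryx, which would leave just `U` as the replacement. Python's `re` module is only a stand-in here and may differ from Alteryx's engine in details.

```python
import re

row = "AGCTTAGGCGAGTGCGAGTGCGATA"

# ([T]) matches one T at a time, so the substitution is applied to every T.
# Assumption: "$3U" behaves like plain "U" because there is no third group.
all_replaced = re.sub(r"([T])", "U", row)

print(all_replaced)  # every T has become U, not only the third one
```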

bgraves · ‎06-16-2016

Whoops, sorry about that! Thought I had it there... let me keep trying.

MarqueeCrew · ‎06-16-2016

While @bgraves is going down a regex path, here is a brute force way to accomplish the task:

1. Tokenize each value

2. Find 3rd T

3. Replace with U

4. Restructure Data
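The four steps above can be sketched outside Alteryx as follows. This is an illustration in Python, not the attached workflow: the function name `replace_nth_by_tokens` and its parameters are hypothetical, and the sketch only handles single-character targets because it tokenizes one character at a time.

```python
def replace_nth_by_tokens(text, target, replacement, n):
    """Replace the n-th occurrence of a single character, token by token."""
    tokens = list(text)                  # 1. tokenize: one token per character
    seen = 0
    for i, tok in enumerate(tokens):
        if tok == target:
            seen += 1
            if seen == n:                # 2. find the n-th occurrence
                tokens[i] = replacement  # 3. replace it
                break
    return "".join(tokens)               # 4. restructure into one string


rows = [
    "AGCTTAGGCGAGTGCGAGTGCGATA",
    "AGCTAGGCCGTAAAGCGAGGAGCCC",
    "CTAGCATGCATGGGACCTAGGACCA",
    "TAGAGATCGACGATTTACGAGGTTC",
]
for row in rows:
    print(replace_nth_by_tokens(row, "T", "U", 3))
```

Rows with fewer than three Ts come back unchanged, matching the second sample row in the question.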


bgraves · ‎06-16-2016

Nice one, @MarqueeCrew

Here's an updated Regex ... it's still not quite right!

(?:.*?(T)+){3}.*?((T)+)

pcatterson · ‎06-16-2016

A couple of issues:

1. I am using v10.1, so I can't open your workflow.

2. Typical DNA strands are much longer (potentially billions of base pairs), not 25 pairs long. I could see a brute-force approach causing the size of the data to explode.

3. I will typically be looking for a string representing an entire gene, not one base pair represented by one character.
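To illustrate what the last two points ask for, here is a minimal Python sketch that finds the n-th occurrence of a whole substring by scanning, without splitting the row into one record per character. The function name `replace_nth` and the sample sequence are hypothetical, and occurrences are counted without overlap, which is an assumption about the desired behavior.

```python
def replace_nth(text, target, replacement, n):
    """Replace the n-th non-overlapping occurrence of target, if it exists."""
    if not target or n < 1:
        raise ValueError("target must be non-empty and n must be >= 1")
    start = 0
    pos = -1
    for _ in range(n):
        pos = text.find(target, start)
        if pos == -1:
            return text              # fewer than n occurrences: leave unchanged
        start = pos + len(target)    # continue the scan after this occurrence
    return text[:pos] + replacement + text[pos + len(target):]


# Single character, as in the original question
print(replace_nth("TAGAGATCGACGATTTACGAGGTTC", "T", "U", 3))

# Multi-character target, standing in for a longer gene string
print(replace_nth("ATGCCATGGGATGA", "ATG", "xxx", 2))
```

The scan stops at the n-th hit and builds one new string, so memory use stays proportional to the row length rather than multiplying it.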

JohnJPS · ‎06-16-2016

I've racked my brains on RegEx too... surprisingly difficult! Anyway, here's another brute force approach:

RodL · ‎06-16-2016

@pcatterson, as an FYI: if you are sent a workflow that is "beyond" your version, you can usually just open the YXMD file in a text editor (like Notepad) and change the version to your earlier one. It's on the second line of the file.

Of course the caveat is if the workflow uses tools that are in the newer version only, it won't work.
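The same edit can be scripted. The sketch below is in Python and assumes the version is stored as a `yxmdVer="..."` attribute on the root `AlteryxDocument` element. Check that against the second line of your own file before relying on it. The function name `set_yxmd_version` and the sample XML are hypothetical.

```python
import re


def set_yxmd_version(xml_text, target_version):
    """Rewrite the first yxmdVer attribute on the AlteryxDocument element."""
    # Assumption: the workflow version looks like <AlteryxDocument yxmdVer="10.5">
    pattern = r'(<AlteryxDocument\b[^>]*\byxmdVer=")[^"]*(")'
    return re.sub(
        pattern,
        lambda m: m.group(1) + target_version + m.group(2),
        xml_text,
        count=1,
    )


sample = (
    '<?xml version="1.0"?>\n'
    '<AlteryxDocument yxmdVer="10.5">\n'
    '  <Nodes />\n'
    '</AlteryxDocument>\n'
)
print(set_yxmd_version(sample, "10.1"))
```

Work on a copy of the file. The workflow will still fail to run if it uses tools that exist only in the newer version.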

jdunkerley79 · ‎06-16-2016

I think the following formula should work:

REGEX_Replace([Input],"(.*?T.*?T.*?)T(.*)","$1U$2")

or a slightly tweaked version that makes it easier to change the instance number (just change the 2, which is one less than the occurrence being replaced):

REGEX_Replace([Input],"((.*?T){2}.*?)T(.*)","$1U$3")
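For readers working outside Alteryx, the second formula can be mirrored in Python as a rough sketch. Python's `re` module stands in for Alteryx's regex engine, and the function name `regex_replace_nth` and its parameters are illustrative, not part of any Alteryx API. The repeat count is `n - 1` because the final `T` in the pattern is itself the n-th occurrence. The target is escaped so that a longer literal string can be used in place of a single character.

```python
import re


def regex_replace_nth(text, target, replacement, n):
    """Replace the n-th occurrence of a literal target using a lazy regex."""
    if n < 1:
        raise ValueError("n must be >= 1")
    lit = re.escape(target)  # treat the target as literal text, not a pattern
    # Group 1 lazily spans the first n-1 occurrences plus the text up to the n-th.
    pattern = re.compile("((?:.*?" + lit + "){" + str(n - 1) + "}.*?)" + lit,
                         re.DOTALL)
    m = pattern.match(text)
    if m is None:
        return text  # fewer than n occurrences: nothing to replace
    return m.group(1) + replacement + text[m.end():]


for row in ["AGCTTAGGCGAGTGCGAGTGCGATA", "AGCTAGGCCGTAAAGCGAGGAGCCC"]:
    print(regex_replace_nth(row, "T", "U", 3))
```

Matching once from the start of the string, rather than searching from every position, keeps the no-match case from rescanning the row repeatedly.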

Replacing particular occurrence of a character
