
Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Replacing particular occurrence of a character

pcatterson
11 - Bolide

I want to identify and replace the third occurrence of the "T" character in each data row

 

AGCTTAGGCGAGTGCGAGTGCGATA

AGCTAGGCCGTAAAGCGAGGAGCCC

CTAGCATGCATGGGACCTAGGACCA

TAGAGATCGACGATTTACGAGGTTC

 

to

 

AGCTTAGGCGAGUGCGAGTGCGATA    replaces it at character 12 (base 0)

AGCTAGGCCGTAAAGCGAGGAGCCC    only 2 occurrences, replaces nothing

CTAGCATGCAUGGGACCTAGGACCA    replaces it at character 10

TAGAGATCGACGAUTTACGAGGTTC    replaces it at character 13

 

This is a simple case.  I'm looking for something like this that will work for any occurrence and any character or string more generally.  I expect it will require REGEX, but I'm not yet proficient with it.

bgraves
6 - Meteoroid

Use the RegEx Tool

 

Configuration

  1. Regular Expression ([T])
  2. Replacement Text $3U

Everything else is default

 

This replaced the 3rd occurrence of T in any string with U

pcatterson
11 - Bolide

This is what I did:

 

A.png

but, it replaced all the Ts, not the third instance:

 

B.png

 

Am I missing something?

bgraves
6 - Meteoroid

Whoops, sorry about that! Thought I had it there... let me keep trying.

MarqueeCrew
20 - Arcturus

While @bgraves is going down a regex path, here is a brute force way to accomplish the task:

 

1. Tokenize each value

2. Find 3rd T

3. Replace with U

4. Restructure Data
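
For readers without the screenshots, the four steps above can be sketched outside Alteryx. This is a minimal Python illustration of the same idea, not the workflow itself; the function name `replace_nth_by_tokens` and its parameters are hypothetical.

```python
def replace_nth_by_tokens(value: str, target: str = "T",
                          replacement: str = "U", n: int = 3) -> str:
    """Brute-force sketch: split into characters, swap the n-th target, rejoin."""
    # 1. Tokenize each value (one token per character)
    tokens = list(value)

    # 2. Find the n-th occurrence of the target character
    seen = 0
    for i, ch in enumerate(tokens):
        if ch == target:
            seen += 1
            if seen == n:
                # 3. Replace it
                tokens[i] = replacement
                break

    # 4. Restructure the data back into a single string
    return "".join(tokens)


print(replace_nth_by_tokens("AGCTTAGGCGAGTGCGAGTGCGATA"))
print(replace_nth_by_tokens("AGCTAGGCCGTAAAGCGAGGAGCCC"))  # fewer than 3 Ts: unchanged
```

Note that this only handles a single-character target, because each token is one character.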

 

Screen Shot 2016-06-16 at 2.52.16 PM.png

Screen Shot 2016-06-16 at 2.52.39 PM.png

bgraves
6 - Meteoroid

Nice one, @MarqueeCrew

 

Here's an updated Regex ... it's still not quite right! 

 

(?:.*?(T)+){3}.*?((T)+)

 

pcatterson
11 - Bolide

Couple issues:

 

  1. I am using v10.1, so I can't open your workflow. 
  2. Typical DNA strands are much longer than 25 pairs (potentially billions of base pairs), so I could see a brute-force approach causing the size of the data to explode.
  3. I will typically be looking for a string representing an entire gene not one basepair represented by one character.
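
To make points 2 and 3 concrete: a linear scan that walks from one occurrence to the next avoids splitting the string into rows and works for multi-character targets. The following is a rough Python sketch of that logic, not an Alteryx workflow; `replace_nth` is a hypothetical helper, and it assumes overlapping occurrences should be counted (advance by `len(target)` instead of 1 if they should not).

```python
def replace_nth(text: str, target: str, replacement: str, n: int) -> str:
    """Replace only the n-th occurrence of `target`; return `text` unchanged otherwise."""
    if not target or n < 1:
        return text

    pos = -1
    for _ in range(n):
        # Assumption: overlapping matches count, so resume one character later.
        pos = text.find(target, pos + 1)
        if pos == -1:
            # Fewer than n occurrences: nothing to replace.
            return text

    return text[:pos] + replacement + text[pos + len(target):]


print(replace_nth("TAGAGATCGACGATTTACGAGGTTC", "T", "U", 3))
print(replace_nth("GATTACAGATTACAGATTACA", "GATTACA", "X", 2))
```

The scan keeps no intermediate copy of the data beyond the final result, so memory use stays proportional to the input length.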
JohnJPS
15 - Aurora

I've racked my brains on RegEx too... surprisingly difficult!  Anyway, here's another brute force approach:

Capture.PNG

RodL
Alteryx Alumni (Retired)

@pcatterson, as an FYI...if you get a workflow sent that is "beyond" your version, you can usually just open the YXMD in a text tool (like Notepad) and change the version to your earlier one. It's in the second line of code.

Of course the caveat is if the workflow uses tools that are in the newer version only, it won't work.

jdunkerley79
ACE Emeritus

I think the following formula should work:

REGEX_Replace([Input],"(.*?T.*?T.*?)T(.*)","$1U$2")

 

or a slightly tweaked version which allows for changing the instance number more easily (just change the 2, which is the instance number minus one!):

REGEX_Replace([Input],"((.*?T){2}.*?)T(.*)","$1U$3")
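
As a quick sanity check outside Alteryx, the same pattern can be run against the four sample rows from the question. This is a sketch in Python's `re` module, which writes backreferences as `\1` rather than `$1`; it assumes the two regex engines treat this particular pattern the same way, and the names `PATTERN` and `replace_third_t` are illustrative only.

```python
import re

# Same pattern as the second formula: two lazy "anything up to a T" groups,
# then the third T on its own, then the rest of the string.
PATTERN = re.compile(r"((.*?T){2}.*?)T(.*)")


def replace_third_t(row: str) -> str:
    # Group 1 is everything before the third T, group 3 is everything after it.
    return PATTERN.sub(r"\1U\3", row, count=1)


rows = [
    "AGCTTAGGCGAGTGCGAGTGCGATA",
    "AGCTAGGCCGTAAAGCGAGGAGCCC",
    "CTAGCATGCATGGGACCTAGGACCA",
    "TAGAGATCGACGATTTACGAGGTTC",
]
for row in rows:
    print(row, "->", replace_third_t(row))
```

Rows with fewer than three Ts do not match the pattern at all, so they pass through unchanged.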