
Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Replacing particular occurrence of a character

ACE Emeritus

I want to identify and replace the third occurrence of the "T" character in each data row

 

AGCTTAGGCGAGTGCGAGTGCGATA

AGCTAGGCCGTAAAGCGAGGAGCCC

CTAGCATGCATGGGACCTAGGACCA

TAGAGATCGACGATTTACGAGGTTC

 

to

 

AGCTTAGGCGAGUGCGAGTGCGATA    replaces it at character 12 (base 0)

AGCTAGGCCGTAAAGCGAGGAGCCC    only 2 occurrences, replaces nothing

CTAGCATGCAUGGGACCTAGGACCA    replaces it at character 10

TAGAGATCGACGAUTTACGAGGTTC    replaces it at character 13

 

This is a simple case.  I'm looking for something like this that will work for any occurrence and any character or string more generally.  I expect it will require REGEX, but I'm not yet proficient with it.

Meteoroid

Use the RegEx Tool

 

Configuration

  1. Regular Expression: ([T])
  2. Replacement Text: $3U

Everything else is default

 

This replaced the 3rd occurrence of T in any string with U

ACE Emeritus

This is what I did:

 

A.png

but it replaced all the Ts, not just the third instance:

 

B.png

 

Am I missing something?

Meteoroid

Whoops, sorry about that! Thought I had it there... let me keep trying.

Alteryx Certified Partner

While @bgraves is going down a regex path, here is a brute force way to accomplish the task:

 

1. Tokenize each value

2. Find 3rd T

3. Replace with U

4. Restructure Data
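
For readers who cannot open the screenshots, the four steps above can be sketched outside Alteryx. This is only a rough Python illustration of the idea, not the actual workflow; the function name `replace_nth_char` and its parameters are made up for the example.

```python
def replace_nth_char(row: str, target: str, replacement: str, n: int) -> str:
    """Replace the n-th occurrence (1-based) of a single character in row.

    Hypothetical helper; mirrors the tokenize / find / replace / rebuild steps.
    """
    tokens = list(row)                    # 1. tokenize into single characters
    seen = 0
    for i, ch in enumerate(tokens):
        if ch == target:
            seen += 1
            if seen == n:                 # 2. this is the n-th occurrence
                tokens[i] = replacement   # 3. replace it
                break
    return "".join(tokens)                # 4. rebuild the original string


rows = [
    "AGCTTAGGCGAGTGCGAGTGCGATA",
    "AGCTAGGCCGTAAAGCGAGGAGCCC",
    "CTAGCATGCATGGGACCTAGGACCA",
    "TAGAGATCGACGATTTACGAGGTTC",
]
for row in rows:
    print(replace_nth_char(row, "T", "U", 3))
```

Rows with fewer than n occurrences come back unchanged, which matches the second sample row in the question.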

 

Screen Shot 2016-06-16 at 2.52.16 PM.png

Screen Shot 2016-06-16 at 2.52.39 PM.png

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Meteoroid

Nice one, @MarqueeCrew

 

Here's an updated Regex ... it's still not quite right! 

 

(?:.*?(T)+){3}.*?((T)+)

 

ACE Emeritus

Couple issues:

 

  1. I am using v10.1, so I can't open your workflow. 
  2. Typical DNA strands are much longer (potentially billions of base pairs), not 25 pairs long. I could see a brute-force approach like this causing the size of the data to explode.
  3. I will typically be looking for a string representing an entire gene, not one base pair represented by one character.
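
To make points 2 and 3 concrete: one way to avoid splitting the data into one row per character is to scan for the search string directly. The sketch below is plain Python rather than anything Alteryx-specific, and `replace_nth` is a hypothetical name; it assumes occurrences are counted without overlap.

```python
def replace_nth(text: str, pattern: str, replacement: str, n: int) -> str:
    """Replace the n-th non-overlapping occurrence (1-based) of pattern.

    Hypothetical helper. Returns text unchanged if there are fewer
    than n occurrences.
    """
    if n < 1 or not pattern:
        return text
    start = 0
    pos = -1
    for _ in range(n):
        pos = text.find(pattern, start)   # next occurrence at or after start
        if pos == -1:
            return text                   # fewer than n occurrences
        start = pos + len(pattern)        # continue after this match
    return text[:pos] + replacement + text[pos + len(pattern):]


# Works for multi-character search strings as well as single bases.
print(replace_nth("AGCTAGCTAGCT", "GCT", "XXX", 2))
print(replace_nth("AGCTTAGGCGAGTGCGAGTGCGATA", "T", "U", 3))
```

Because it only walks forward through the string once and builds a single result, it does not multiply the number of records the way tokenizing does.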
ACE Emeritus

I've racked my brains on RegEx too... surprisingly difficult!  Anyway, here's another brute force approach:

Capture.PNG

Alteryx Alumni (Retired)

@pcatterson, as an FYI...if you get a workflow sent that is "beyond" your version, you can usually just open the YXMD in a text tool (like Notepad) and change the version to your earlier one. It's in the second line of code.

Of course the caveat is if the workflow uses tools that are in the newer version only, it won't work.

I think the following formula should work:

REGEX_Replace([Input],"(.*?T.*?T.*?)T(.*)","$1U$2")

 

or a slightly tweaked version which allows for changing instance number more easily (just change the 2!):

REGEX_Replace([Input],"((.*?T){2}.*?)T(.*)","$1U$3")
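
If you want to sanity-check the pattern outside Alteryx, here is a minimal sketch of the same idea using Python's `re` module. The function name `regex_replace_nth` is made up for illustration. Note that the inner group is written as non-capturing here, so the tail of the string is group 2 rather than `$3` as in the formula above.

```python
import re


def regex_replace_nth(text: str, target: str, replacement: str, n: int) -> str:
    """Replace the n-th occurrence (1-based) of target using one regex.

    Hypothetical helper mirroring "((.*?T){n-1}.*?)T(.*)".
    """
    if n < 1:
        return text
    t = re.escape(target)  # treat the search string literally, not as regex
    # Group 1: everything up to the n-th target; group 2: everything after it.
    pattern = re.compile(rf"((?:.*?{t}){{{n - 1}}}.*?){t}(.*)")
    # A function is used for the replacement so that backslashes or "$"
    # in `replacement` are inserted as-is.
    return pattern.sub(lambda m: m.group(1) + replacement + m.group(2), text, count=1)


print(regex_replace_nth("TAGAGATCGACGATTTACGAGGTTC", "T", "U", 3))
print(regex_replace_nth("AGCTAGGCCGTAAAGCGAGGAGCCC", "T", "U", 3))  # only 2 Ts, unchanged
```

When there are fewer than n occurrences the pattern simply fails to match, so the input comes back untouched. On very long inputs the lazy quantifiers may backtrack a lot before giving up, so it is worth testing on realistic data sizes.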