Replacing particular occurrence of a character
I want to identify and replace the third occurrence of the "T" character in each data row, going from
AGCTTAGGCGAGTGCGAGTGCGATA
AGCTAGGCCGTAAAGCGAGGAGCCC
CTAGCATGCATGGGACCTAGGACCA
TAGAGATCGACGATTTACGAGGTTC
to
AGCTTAGGCGAGUGCGAGTGCGATA replaces it at character 12 (base 0)
AGCTAGGCCGTAAAGCGAGGAGCCC only 2 occurrences, replaces nothing
CTAGCATGCAUGGGACCTAGGACCA replaces it at character 10
TAGAGATCGACGAUTTACGAGGTTC replaces it at character 13
This is a simple case. I'm looking for something like this that will work for any occurrence and any character or string more generally. I expect it will require REGEX, but I'm not yet proficient with it.
Use the RegEx Tool
Configuration
- Regular Expression ([T])
- Replacement Text $3U
Everything else is default
This replaced the 3rd occurrence of T in any string with U.
This is what I did:
But it replaced all the Ts, not just the third instance:
Am I missing something?
Whoops, sorry about that! I thought I had it there... let me keep trying.
While @bgraves is going down a regex path, here is a brute force way to accomplish the task:
1. Tokenize each value
2. Find 3rd T
3. Replace with U
4. Restructure Data
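For anyone who can't open the workflow, the four steps above can be sketched outside Alteryx. This is only a rough Python illustration of the idea, not the workflow itself; the function name `replace_nth_by_tokens` and its parameters are made up for the example.

```python
def replace_nth_by_tokens(row: str, target: str = "T",
                          replacement: str = "U", n: int = 3) -> str:
    """Brute-force sketch: split into characters, swap the n-th target, rejoin."""
    tokens = list(row)              # 1. tokenize each value into single characters
    seen = 0
    for i, ch in enumerate(tokens):
        if ch == target:            # 2. count matches until the n-th one is found
            seen += 1
            if seen == n:
                tokens[i] = replacement   # 3. replace it
                break
    return "".join(tokens)          # 4. restructure the data back into one string


rows = [
    "AGCTTAGGCGAGTGCGAGTGCGATA",
    "AGCTAGGCCGTAAAGCGAGGAGCCC",
    "CTAGCATGCATGGGACCTAGGACCA",
    "TAGAGATCGACGATTTACGAGGTTC",
]
for row in rows:
    print(replace_nth_by_tokens(row))
```

Rows with fewer than three Ts come back unchanged, which matches the second sample row.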
Nice one, @MarqueeCrew
Here's an updated Regex ... it's still not quite right!
(?:.*?(T)+){3}.*?((T)+)
A couple of issues:
- I am using v10.1, so I can't open your workflow.
- Typical DNA strands are much longer than 25 basepairs (potentially billions). I could see a brute-force approach like this causing the size of the data to explode.
- I will typically be looking for a string representing an entire gene not one basepair represented by one character.
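To make the last two points concrete, here is a hedged Python sketch of a search that avoids splitting the data into one row per character and accepts a multi-character search string. It is an illustration outside Alteryx, and the name `replace_nth` and its parameters are assumptions for the example. It walks the string with `str.find`, so extra memory is limited to the one rebuilt output string. Note that it advances one character at a time, so overlapping occurrences are counted separately.

```python
def replace_nth(text: str, pattern: str, replacement: str, n: int) -> str:
    """Replace only the n-th occurrence of pattern (1-based); else return text as-is."""
    if n < 1 or not pattern:
        raise ValueError("n must be >= 1 and pattern must be non-empty")
    pos = -1
    for _ in range(n):
        # Resume the search just past the previous hit (overlaps are counted).
        pos = text.find(pattern, pos + 1)
        if pos == -1:
            return text  # fewer than n occurrences: nothing to replace
    return text[:pos] + replacement + text[pos + len(pattern):]


# Single character, third occurrence
print(replace_nth("CTAGCATGCATGGGACCTAGGACCA", "T", "U", 3))
# Multi-character search string, second occurrence
print(replace_nth("AGCTTAGGCGAGTGCGAGTGCGATA", "GAG", "xxx", 2))
```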
I've racked my brains on RegEx too... surprisingly difficult! Anyway, here's another brute-force approach:
@pcatterson, as an FYI: if you are sent a workflow that is "beyond" your version, you can usually just open the YXMD in a text editor (like Notepad) and change the version to your earlier one. It's in the second line of the file.
Of course, the caveat is that if the workflow uses tools that exist only in the newer version, it won't work.
I think the following formula should work:
REGEX_Replace([Input],"(.*?T.*?T.*?)T(.*)","$1U$2")
or a slightly tweaked version that makes it easier to change the occurrence number (just change the 2, which is one less than the occurrence you want to replace):
REGEX_Replace([Input],"((.*?T){2}.*?)T(.*)","$1U$3")
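For comparison, the same lazy-match idea can be tried outside Alteryx. Below is a rough Python sketch using the standard `re` module; the helper name `regex_replace_nth` is made up for the example, and the search string is escaped so it is treated literally. The repeat count is n - 1, just as the 2 in the formula above targets the third occurrence.

```python
import re


def regex_replace_nth(text: str, target: str, replacement: str, n: int) -> str:
    """Replace the n-th occurrence of a literal target using a lazy regex."""
    if n < 1:
        raise ValueError("n must be >= 1")
    t = re.escape(target)  # treat the search string literally, not as a regex
    # Group 1 lazily consumes everything up to (not including) the n-th target.
    pattern = re.compile("((?:.*?" + t + "){" + str(n - 1) + "}.*?)" + t)
    # A function is used as the replacement so backslashes in it stay literal;
    # count=1 stops after the first (leftmost) match.
    return pattern.sub(lambda m: m.group(1) + replacement, text, count=1)


for row in ["AGCTTAGGCGAGTGCGAGTGCGATA", "AGCTAGGCCGTAAAGCGAGGAGCCC"]:
    print(regex_replace_nth(row, "T", "U", 3))
```

As with the formula, a row with fewer than n occurrences does not match and is returned unchanged.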
