Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

String parsing from the end

HA
8 - Asteroid

Hi,

I have a large log file that I need to parse and search for specific tags, based on a certain logic. There are some cases that the logic seems to be too complicated for Alteryx because Alteryx doesn't have a string function to search from the end of a string.(like Findstring but search from the end not beginning of a string).

 

simplified example:

Original string : 1490175939957 CAPITALFIX-FXCFX IN [FXCFX//38.160.70.92:54448] 1903=FIX.4.49 =53235=834=642 1903=FXCFX50 =ams.kas.tgoossens52=20170322-09:45:39.930 1903=CAPITALFIX57 =lon.capital.rmichalak 8=FIX.4.49= 1906=5 1490175939957 8=FIX.4.49=53235=8  

 

For the sample string given above the logic is to first find tag 1906 and if it's equal 5  1906=5 (easy to do  using Alteryx string functions) then search back and extract the value for tag 1903=. This is the part that I don't know how to do in Alteryx because if I use findstring to search for tag 1903 it will return the first occurrence of that tag which is   1903=FIX.4.49 but I need the last occurance which is 1903=CAPITALFIX57.

 

I hope my explanation of the requirements was clear. I would appreciate any suggestion on how to do this type of search in alteyx    

10 REPLIES 10
MarqueeCrew
20 - Arcturus
20 - Arcturus

@HA,

 

While having sample data in a CSV or yxdb would make solving this problem easier.  The answer is yes.  It can be solved using Alteryx.  I can see multiple approaches to solving the problem.  I would like to see more sample data in order to not overly simplify a solution.

 

"Well, the popular TV personality known as Professor Backwards was slain in Atlanta yesterday, by three masked gunmen. According to reports, neighbors ignored the Professor's cries of 'Pleh! Pleh!' "

 

ReverseString([String]) could have saved professor backwards.  If I were looking for the LAST occurrence of 1903=, I could look for "=3091".  That's one way to approach the problem.  Another approach would be to use regular expressions to parse the data into rows (you'd later number the rows) looking for a pattern of "(\d{4}=.*)".  I don't know what the box character represents in the sample that you provided, but that would bound the search expression.

 

Perhaps another way would be to parse to rows with an = sign.  You could use a multi-row formula to grab the last word in the row above it and get the digit value there.

 

Just some thoughts without data...

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
NicoleJohnson
ACE Emeritus
ACE Emeritus

Piggy backing off of @MarqueeCrew suggestion, the following formula appears to work for the sample you provided... however he is correct in that it is hard to determine if this is working without a few more lines of data to test against. If you can provide, let us know and we can double-check this possible solution, plus whatever else Mark might have up his sleeve! My initial approach was the backwards path, see below...

 

Possible formula solution:

IF REGEX_CountMatches(String,".*(1906=5).*")=1
THEN RegEx_Replace(Right([String],FindString(ReverseString([String]),"=3091")+5),"(^\S*).*","$1")
ELSE Null() ENDIF

 

Tests to see if there is an instance of 1906=5 first... then uses Mark's suggestion of ReverseString to find the starting point of the last instance of 1093= (or =3091). Uses that value to cut off the data string starting in that position (I added 5 to capture the "1093=" too, if that is also required), then used RegEx_Replace to capture just the first part of the data until it runs into a space.

 

Hope these ideas help point you in the right direction! That was a fun one to test out some unique string parsing theories :)

 

Cheers!

NJ

Joe_Mako
12 - Quasar

How about Regex like:

 

.*1903=(.*?)\s.*1906=5

or

.*(1903=.*?)\s.*1906=5

 

With the "Copy Unmatched Text to Output" turned off.

 

Depending if you want to not keep the 1903= or keep it in the capture group. The first .* will capture as many characters as possible while the rest of the expression is still met. The .*? will capture few characters as possible while the rest of the expression is met.

 

This will extract the last 1903 before the last 1906=5

 

Attached is an example workflow with some test data.

HA
8 - Asteroid

Hi,

MarqueeCrew suggestion to use RevreseString () is quite interesting and I think it should work unless there are some limitations of the string length ( some records have more than 80K characters ).

I really like regex option suggested by you guys and if it works that will be a superior solution in this case. I don’t have time to test them today but I will test those options in the next couple of days and report the results.

I have attached a very small portion of the extract file that contain the tags I mentioned in my first post for testing. The delimiter in this file is CharFromInt(1)

 

Thank you guys for very good suggestions, I really appreciate that. 

 

 

Joe_Mako
12 - Quasar

How about something like the attached (replaced the \s with \x01):

 

last regex.png

 

GarthM
Alteryx Alumni (Retired)

here's 2 ways of doing the same thing to get, what i believe, you're asking for

HA
8 - Asteroid

both ReverseSstring() and RegEx methods work but RegEx is much simpler and cleaner.

Thank you for your help

HA
8 - Asteroid

Hi Joe I used your suggested RegEx. It is really a beautiful solution

HA
8 - Asteroid

Hi Joe,

I used            \x011903=(.*)\x01.*1906=5   regular expression with output method parse that creates a new field with the matching values returned. ( output field expression is (.*).

I know what (?) does in a regular expression but I was wondering whether there was any specific reason you used it in your expression because it seems we don't need it in this case.

 

Thanks,  

Labels