Alteryx Designer Desktop Discussions

HA · ‎07-24-2017

Hi,

I have a large log file that I need to parse and search for specific tags, based on a certain logic. There are some cases that the logic seems to be too complicated for Alteryx because Alteryx doesn't have a string function to search from the end of a string.(like Findstring but search from the end not beginning of a string).

simplified example:

Original string : 1490175939957 CAPITALFIX-FXCFX IN [FXCFX//38.160.70.92:54448] 1903=FIX.4.49 =53235=834=642 1903=FXCFX50 =ams.kas.tgoossens52=20170322-09:45:39.930 1903=CAPITALFIX57 =lon.capital.rmichalak 8=FIX.4.49= 1906=5 1490175939957 8=FIX.4.49=53235=8

For the sample string given above the logic is to first find tag 1906 and if it's equal 5 1906=5 (easy to do using Alteryx string functions) then search back and extract the value for tag 1903=. This is the part that I don't know how to do in Alteryx because if I use findstring to search for tag 1903 it will return the first occurrence of that tag which is 1903=FIX.4.49 but I need the last occurance which is 1903=CAPITALFIX57.

I hope my explanation of the requirements was clear. I would appreciate any suggestion on how to do this type of search in alteyx

MarqueeCrew · ‎07-24-2017

@HA,

While having sample data in a CSV or yxdb would make solving this problem easier. The answer is yes. It can be solved using Alteryx. I can see multiple approaches to solving the problem. I would like to see more sample data in order to not overly simplify a solution.

"Well, the popular TV personality known as Professor Backwards was slain in Atlanta yesterday, by three masked gunmen. According to reports, neighbors ignored the Professor's cries of 'Pleh! Pleh!' "

ReverseString([String]) could have saved professor backwards. If I were looking for the LAST occurrence of 1903=, I could look for "=3091". That's one way to approach the problem. Another approach would be to use regular expressions to parse the data into rows (you'd later number the rows) looking for a pattern of "(\d{4}=.*)". I don't know what the box character represents in the sample that you provided, but that would bound the search expression.

Perhaps another way would be to parse to rows with an = sign. You could use a multi-row formula to grab the last word in the row above it and get the digit value there.

Just some thoughts without data...

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

NicoleJohnson · ‎07-24-2017

Piggy backing off of @MarqueeCrew suggestion, the following formula appears to work for the sample you provided... however he is correct in that it is hard to determine if this is working without a few more lines of data to test against. If you can provide, let us know and we can double-check this possible solution, plus whatever else Mark might have up his sleeve! My initial approach was the backwards path, see below...

Possible formula solution:

IF REGEX_CountMatches(String,".*(1906=5).*")=1
THEN RegEx_Replace(Right([String],FindString(ReverseString([String]),"=3091")+5),"(^\S*).*","$1")
ELSE Null() ENDIF

Tests to see if there is an instance of 1906=5 first... then uses Mark's suggestion of ReverseString to find the starting point of the last instance of 1093= (or =3091). Uses that value to cut off the data string starting in that position (I added 5 to capture the "1093=" too, if that is also required), then used RegEx_Replace to capture just the first part of the data until it runs into a space.

Hope these ideas help point you in the right direction! That was a fun one to test out some unique string parsing theories :)

Cheers!

NJ

Joe_Mako · ‎07-24-2017

How about Regex like:

.*1903=(.*?)\s.*1906=5

or

.*(1903=.*?)\s.*1906=5

With the "Copy Unmatched Text to Output" turned off.

Depending if you want to not keep the 1903= or keep it in the capture group. The first .* will capture as many characters as possible while the rest of the expression is still met. The .*? will capture few characters as possible while the rest of the expression is met.

This will extract the last 1903 before the last 1906=5

Attached is an example workflow with some test data.

HA · ‎07-25-2017

Hi,

MarqueeCrew suggestion to use RevreseString () is quite interesting and I think it should work unless there are some limitations of the string length ( some records have more than 80K characters ).

I really like regex option suggested by you guys and if it works that will be a superior solution in this case. I don’t have time to test them today but I will test those options in the next couple of days and report the results.

I have attached a very small portion of the extract file that contain the tags I mentioned in my first post for testing. The delimiter in this file is CharFromInt(1)

Thank you guys for very good suggestions, I really appreciate that.

Joe_Mako · ‎07-25-2017

How about something like the attached (replaced the \s with \x01):

GarthM · ‎07-25-2017

here's 2 ways of doing the same thing to get, what i believe, you're asking for

HA · ‎08-01-2017

both ReverseSstring() and RegEx methods work but RegEx is much simpler and cleaner.

Thank you for your help

HA · ‎08-01-2017

Hi Joe I used your suggested RegEx. It is really a beautiful solution

HA · ‎08-01-2017

Hi Joe,

I used \x011903=(.*)\x01.*1906=5 regular expression with output method parse that creates a new field with the matching values returned. ( output field expression is (.*).

I know what (?) does in a regular expression but I was wondering whether there was any specific reason you used it in your expression because it seems we don't need it in this case.

Thanks,

Alteryx Designer Desktop Discussions

String parsing from the end