Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing CEF Files (from Incapsula web application security)

Korpelak
8 - Asteroid

Hi there!

 

I have a CEF (Common event format)  file that looks like this:

 

CEF:0|Incapsula|SIEMintegration|1|1|Illegal Resource Access|9| fileid=3412341160002518171 sourceServiceName=site123.abcd.info siteid=1509732 suid=50005477 requestClientApplication=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0 cs2=true cs2Label=Javascript Support cs3=true cs3Label=CO Support src=12.12.12.12 caIP=[13.13.13.13] ccode=[IL] tag=www.elvis.com cn1=200 in=54 xff=44.44.44.44 cs1=NOT_SUPPORTED cs1Label=Cap Support cs4=c2e72124-0e8a-4dd8-b13b-3da246af3ab2 cs4Label=VID cs5=de3c633ac428e0678f3aac20cf7f239431e54cbb8a17e8302f53653923305e1835a9cd871db32aa4fc7b8a9463366cc4 cs5Label=clappsigdproc=Browser cs6=Firefox cs6Label=clapp ccode=[IL] cicode=Rehovot cs7=31.8969 cs7Label=latitude cs8=34.8186 cs8Label=longitude Customer=CEFcustomer123 start=1453290121336 request=site123.abcd.info/ requestmethod=GET qstr=p\=%2fetc%2fpasswd app=HTTP act=REQ_CHALLENGE_CAPTCHA deviceExternalID=33411452762204224 filetype=30037,1001, filepermission=2,1, cs9=Block Malicious User,High Risk Resources, cs9Label=Rule name

 

This one line record btw.

Any ideas on how to best parse this? The first step was using "|" but everything else got complicated. I've tried using text to column tool with "=" as delimiter but then I encountered records that had the "=" character in the field content as well. I am using the regex parse with '([^=\s]+)=((?:[\\]=|[^=])+)(?:\s|$)' to match single key=value pair but then it gets ugly... because I have to repeat this like 40 times for each field. Is there a prettier way to do this in Alteryx? Like a "for i in a" ?

 

Any ideas would be much appreciated :)

 

Thanks!

2 REPLIES 2
NicoleJohnson
ACE Emeritus
ACE Emeritus

See attached, this might have some pieces that could help you out...

 

Essentially, I used Text to Columns (parsing to rows instead of columns), and then using a formula to determine which rows have an = in them. I then used a multi-row tool to create a record ID based on whether a row had the = sign in it, then a summarize tool to concatenate the results (which effectively combined any rows without an = sign with the prior rows). I then was able to use the Text to Columns tool with the = sign as the delimiter, and specified any extras per row to remain in the last column (which handles results that have an = in data).

 

This feels like a terribly confusing explanation, but the workflow does seem to work (at least at a very cursory glance), and I think it will be clearer if you take a look & dissect the formulas I used :) I hope this helps!!

 

CEFExample.JPG

 

Cheers,

NJ

Korpelak
8 - Asteroid

You rock Nicole!! Your workflow is simple, elegant and it works!

Thank you!!!

 

Karina

Labels
Top Solution Authors