In case you missed the announcement: Alteryx One is here, and so is the Spring Release! Learn more about these new and exciting releases here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing CEF Files (from Incapsula web application security)

Korpelak
8 - Asteroid

Hi there!

 

I have a CEF (Common event format)  file that looks like this:

 

CEF:0|Incapsula|SIEMintegration|1|1|Illegal Resource Access|9| fileid=3412341160002518171 sourceServiceName=site123.abcd.info siteid=1509732 suid=50005477 requestClientApplication=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0 cs2=true cs2Label=Javascript Support cs3=true cs3Label=CO Support src=12.12.12.12 caIP=[13.13.13.13] ccode=[IL] tag=www.elvis.com cn1=200 in=54 xff=44.44.44.44 cs1=NOT_SUPPORTED cs1Label=Cap Support cs4=c2e72124-0e8a-4dd8-b13b-3da246af3ab2 cs4Label=VID cs5=de3c633ac428e0678f3aac20cf7f239431e54cbb8a17e8302f53653923305e1835a9cd871db32aa4fc7b8a9463366cc4 cs5Label=clappsigdproc=Browser cs6=Firefox cs6Label=clapp ccode=[IL] cicode=Rehovot cs7=31.8969 cs7Label=latitude cs8=34.8186 cs8Label=longitude Customer=CEFcustomer123 start=1453290121336 request=site123.abcd.info/ requestmethod=GET qstr=p\=%2fetc%2fpasswd app=HTTP act=REQ_CHALLENGE_CAPTCHA deviceExternalID=33411452762204224 filetype=30037,1001, filepermission=2,1, cs9=Block Malicious User,High Risk Resources, cs9Label=Rule name

 

This one line record btw.

Any ideas on how to best parse this? The first step was using "|" but everything else got complicated. I've tried using text to column tool with "=" as delimiter but then I encountered records that had the "=" character in the field content as well. I am using the regex parse with '([^=\s]+)=((?:[\\]=|[^=])+)(?:\s|$)' to match single key=value pair but then it gets ugly... because I have to repeat this like 40 times for each field. Is there a prettier way to do this in Alteryx? Like a "for i in a" ?

 

Any ideas would be much appreciated :)

 

Thanks!

2 REPLIES 2
NicoleJohnson
ACE Emeritus
ACE Emeritus

See attached, this might have some pieces that could help you out...

 

Essentially, I used Text to Columns (parsing to rows instead of columns), and then using a formula to determine which rows have an = in them. I then used a multi-row tool to create a record ID based on whether a row had the = sign in it, then a summarize tool to concatenate the results (which effectively combined any rows without an = sign with the prior rows). I then was able to use the Text to Columns tool with the = sign as the delimiter, and specified any extras per row to remain in the last column (which handles results that have an = in data).

 

This feels like a terribly confusing explanation, but the workflow does seem to work (at least at a very cursory glance), and I think it will be clearer if you take a look & dissect the formulas I used :) I hope this helps!!

 

CEFExample.JPG

 

Cheers,

NJ

Korpelak
8 - Asteroid

You rock Nicole!! Your workflow is simple, elegant and it works!

Thank you!!!

 

Karina

Labels
Top Solution Authors