Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
The Expert Exam is now live online! Read about the specifics and what it took to bring it to life in the blog by our very own Elizabeth Bonnell!

Regex replace until first new line character occurrence encountered

6 - Meteoroid

I have a Download tool that downloads data (comma separated) multiple times. I have been able to parse the data but the header(which is the first line in the row value) gets union multiple times. So I need to keep the first header of the download data column and then delete header after that (please note, the header is the first line in the row value). Below is a screenshot and the actual data. 


2018-10-14,43100000396124630,Broad,long beach online insurance,0
2018-10-13,43100000396125206,Broad,oxnard automotive insurance,0
2018-10-13,43100000396125880,Broad,san buenaventura free car insurance quote,0


Tried this solution but failed- 

First I add a record id so that I can skip the first row (keep the header values).



As stated earlier, the rows in the download data are separated by a newline character. So I need to replace the sequence until the first newline character occurrence. So, in a formula tool I'm doing the below regex replace operation: 

IF [RecordID]>1 THEN Regex_replace([DownloadData],"\A.*\n",''")
ELSE [DownloadData]


But it doesn't work, rather deletes the whole row value than the match until the first newline character. Can anyone please help to write the regex for this? 

This above regex does work in the regex test, maybe the regex needs to be tweaked alteryx as it uses boost regex. 

Regex tester:   Have given a screenshot below



14 - Magnetar
14 - Magnetar

That does seem like odd RegEx behavior - I was actually able to get that RegEx formula to work on my end using the data you provided, so not sure why it's not working in your instance... could it be that it isn't actually a newline character separating the header & data rows? 


At any rate, there is a relatively easy non-RegEx workaround - you could parse your data as-is on the newline characters to rows, then parse again on the commas to columns... use the Dynamic Rename tool to rename your columns using the first line of data (i.e. the first instance of headers)... and then just filter out those remaining records where the [date] field = "date" (which would effectively remove the duplicate headers). 


Let us know if that would be an acceptable work-around!