community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Regex replace until first new line character occurrence encountered

Highlighted
Meteoroid

I have a Download tool that downloads data (comma separated) multiple times. I have been able to parse the data but the header(which is the first line in the row value) gets union multiple times. So I need to keep the first header of the download data column and then delete header after that (please note, the header is the first line in the row value). Below is a screenshot and the actual data. 

Capture.JPG

date,keywordId,keywordMatchType,keywordText,hits
2018-10-14,43100000396124630,Broad,long beach online insurance,0
2018-10-13,43100000396125206,Broad,oxnard automotive insurance,0
2018-10-13,43100000396125880,Broad,san buenaventura free car insurance quote,0

 

Tried this solution but failed- 

First I add a record id so that I can skip the first row (keep the header values).

Capture2.JPG

 

As stated earlier, the rows in the download data are separated by a newline character. So I need to replace the sequence until the first newline character occurrence. So, in a formula tool I'm doing the below regex replace operation: 

IF [RecordID]>1 THEN Regex_replace([DownloadData],"\A.*\n",''")
ELSE [DownloadData]
ENDIF

 

But it doesn't work, rather deletes the whole row value than the match until the first newline character. Can anyone please help to write the regex for this? 

This above regex does work in the regex test, maybe the regex needs to be tweaked alteryx as it uses boost regex. 

Regex tester:  https://regex101.com/   Have given a screenshot below

Capture3.JPG

 

Magnetar
Magnetar

That does seem like odd RegEx behavior - I was actually able to get that RegEx formula to work on my end using the data you provided, so not sure why it's not working in your instance... could it be that it isn't actually a newline character separating the header & data rows? 

 

At any rate, there is a relatively easy non-RegEx workaround - you could parse your data as-is on the newline characters to rows, then parse again on the commas to columns... use the Dynamic Rename tool to rename your columns using the first line of data (i.e. the first instance of headers)... and then just filter out those remaining records where the [date] field = "date" (which would effectively remove the duplicate headers). 

 

Let us know if that would be an acceptable work-around!

 

Cheers,

NJ

Labels