Alteryx Designer Desktop Discussions

SOLVED

Grouping by StartsWith Patterns & Multi Row Tool

hellyars
13 - Pulsar

 

My target data is text that falls between opening and closing paragraph tags.  For 95% of the records, the data is contained in one row.  The data for the remaining records is split across 3 rows, with the first row being the opening <p, the second containing the target text, and the third row being the closing </p>.    

 

I tried the following expression in a Multi-Row tool, but it fails.  A second Multi-Row tool would then carry the RecordID from the starting <p down to the start of the next <p.  The third step would be to use a Summarize tool to concatenate everything back into a single line, where it can then be processed using an existing macro.

 

 

 

if StartsWith([DownloadData],"^<p.*?>") && 
StartsWith([Row+2:DownloadData], "^<\/p>") then [RecordID] else "" endif 

 

 

 

A few important notes.  The target data is found in a larger HTML file.  There are other rows that start with <p, but only the target rows follow the pattern of row 1 = <p, row 2 = target text, row 3 = </p>.

 

 

 

HTML | RecordID | Desired Group
<p style="text-align: center;"> | 821 | 1
<strong>AIR FORCE</strong><br /> | 822 | 1
</p> | 823 | 1
<p> | 824 | 2
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. | 825 | 2
</p> | 826 | 2
2 REPLIES
lmorrell
11 - Bolide

Hi @hellyars 

 

Workflow is attached.

 


 

Your logic was on the money, but you're using regular expressions in the StartsWith function, which doesn't seem to be supported. StartsWith appears to do a case-insensitive literal character match, so the regex syntax is compared as plain text and never matches. Changing these StartsWith calls to regex_match() and changing the 'else ""' to 'else null()' to preserve the column's data type should return the desired output.

 

if regex_match([HTML], '<p.*>') 
	AND regex_match([Row+2:HTML], '<\/p>') then [RecordID]
elseif not regex_match([HTML], '<p.*>') 
	AND not isnull([Row-1:Grouping field #2]) then [Row-1:Grouping field #2]
else null()
endif 
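
As a rough analogy for why the original expression returned nothing, here is a small Python sketch (not Alteryx code). It assumes that regex_match() has to match the whole string, and it uses Python's str.startswith, which is case-sensitive, so it only approximates the StartsWith behaviour described above.

```python
import re

# One of the sample rows from the question.
row = '<p style="text-align: center;">'

# Literal prefix test with regex syntax: the characters ^ . * ? are compared
# as plain text, so the row does not "start with" this string.
literal_with_regex_syntax = row.startswith("^<p.*?>")

# Literal prefix test with a plain prefix.
literal_plain = row.startswith("<p")

# Whole-string regex match, analogous to the regex_match() call above.
regex_whole = re.fullmatch(r"<p.*>", row) is not None

print(literal_with_regex_syntax, literal_plain, regex_whole)
```

The first test is the one that mirrors the original formula: the pattern is valid as a regex but useless as a literal prefix.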

 

If you were super keen to achieve the same result with the StartsWith function, the formula below provides the same output:

 

if startswith([HTML], '<p') 
	AND startswith([Row+2:HTML], '</p') then [RecordID]
elseif not startswith([HTML], '<p') 
	AND not isnull([Row-1:Grouping field]) then [Row-1:Grouping field]
else null()
endif 
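
For anyone who wants to check the logic outside Designer, the StartsWith formula above can be emulated with a short Python sketch. This is only an illustration under a few assumptions: the function names are made up, rows past the end of the data are treated as empty strings, the long sample text is shortened, and Python's prefix test is case-sensitive. The second function stands in for the Summarize concatenation step from the question.

```python
# Sample rows from the question as (RecordID, HTML); the long text is shortened.
rows = [
    (821, '<p style="text-align: center;">'),
    (822, '<strong>AIR FORCE</strong><br />'),
    (823, '</p>'),
    (824, '<p>'),
    (825, 'Ut enim ad minim veniam'),
    (826, '</p>'),
]


def assign_groups(rows):
    """Return (record_id, html, group) tuples; group is None where no group applies."""
    result = []
    prev_group = None  # plays the role of [Row-1:Grouping field]
    for i, (record_id, html) in enumerate(rows):
        # Plays the role of [Row+2:HTML]; empty past the end (an assumption).
        two_ahead = rows[i + 2][1] if i + 2 < len(rows) else ""
        if html.startswith("<p") and two_ahead.startswith("</p"):
            group = record_id          # opening row of a three-row block
        elif not html.startswith("<p") and prev_group is not None:
            group = prev_group         # carry the group down from the row above
        else:
            group = None
        result.append((record_id, html, group))
        prev_group = group
    return result


def concat_by_group(grouped):
    """Concatenate HTML per group, standing in for the Summarize step."""
    merged = {}
    for _, html, group in grouped:
        if group is not None:
            merged.setdefault(group, []).append(html)
    return {group: "".join(parts) for group, parts in merged.items()}


grouped = assign_groups(rows)
merged = concat_by_group(grouped)
```

Note that the group key here is the RecordID of the opening row (821 and 824) rather than the 1 and 2 shown in the desired output; either works as a grouping field for the concatenation.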

 

Hope this helps!

hellyars
13 - Pulsar

@lmorrell Thank you for the assistance and explanation.  
