
SOLVED

Grouping by StartsWith Patterns & Multi Row Tool

hellyars
13 - Pulsar

 

My target data is text that falls between opening and closing paragraph tags. For 95% of the records, the data is contained in one row. For the remaining records, it is split across three rows: the first row holds the opening <p tag, the second holds the target text, and the third holds the closing </p>.

 

I tried the following expression in a Multi-Row tool, but it fails. The plan was for a second Multi-Row tool to carry the RecordID of the starting <p down to the start of the next <p. The third step would be to use a Summarize tool to concatenate everything back into a single line, where it can then be processed using an existing macro.

 

 

 

if StartsWith([DownloadData],"^<p.*?>") && 
StartsWith([Row+2:DownloadData], "^<\/p>") then [RecordID] else "" endif 

 

 

 

A few important notes. The target data is found in a larger HTML file. There are other rows that start with <p, but only the target rows follow the pattern of row 1 = <p, row 2 = target text, row 3 = </p>.

 

 

 

HTML | RecordID | Desired Group
<p style="text-align: center;"> | 821 | 1
<strong>AIR FORCE</strong><br /> | 822 | 1
</p> | 823 | 1
<p> | 824 | 2
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. | 825 | 2
</p> | 826 | 2
lmorrell
11 - Bolide

Hi @hellyars 

 

Workflow is attached.

 


 

Your logic was on the money, but you're passing regular expressions to the StartsWith function, which doesn't seem to support them. StartsWith appears to do a literal, case-insensitive character match, so a target like "^<p.*?>" is compared character by character and never matches. Changing the StartsWith calls to regex_match() and changing the 'else ""' branch to 'else null()' (to preserve the column's data type) should return the desired output.

 

if regex_match([HTML], '<p.*>') 
	AND regex_match([Row+2:HTML], '<\/p>') then [RecordID]
elseif not regex_match([HTML], '<p.*>') 
	AND not isnull([Row-1:Grouping field #2]) then [Row-1:Grouping field #2]
else null()
endif 
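
To illustrate why the original expression failed, here is a rough Python sketch of the two matching behaviours. The helper names starts_with and regex_full_match are hypothetical stand-ins, not Alteryx functions, and the sketch assumes that StartsWith is a literal, case-insensitive prefix test and that regex_match has to match the whole string, ignoring case by default.

```python
import re

def starts_with(text, target):
    # Assumed StartsWith behaviour: literal, case-insensitive prefix test.
    return text.lower().startswith(target.lower())

def regex_full_match(text, pattern):
    # Assumed regex_match behaviour: the pattern must cover the whole string.
    return re.fullmatch(pattern, text, flags=re.IGNORECASE) is not None

row = '<p style="text-align: center;">'

# The regex syntax is treated as plain characters, so nothing matches.
literal_with_regex_syntax = starts_with(row, "^<p.*?>")

# A plain prefix works with the literal test.
literal_prefix = starts_with(row, "<p")

# The same intent expressed as a real regular expression.
pattern_match = regex_full_match(row, r"<p.*>")

print(literal_with_regex_syntax, literal_prefix, pattern_match)
```

Under those assumptions, the first test is false and the other two are true, which lines up with swapping StartsWith for regex_match, or dropping the regex syntax from the StartsWith targets.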

 

If you were super keen to achieve the same result with the StartsWith function, the formula below provides the same output:

 

if startswith([HTML], '<p') 
	AND startswith([Row+2:HTML], '</p') then [RecordID]
elseif not startswith([HTML], '<p') 
	AND not isnull([Row-1:Grouping field]) then [Row-1:Grouping field]
else null()
endif 
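
For anyone who wants to trace the logic outside Designer, below is a minimal Python sketch of the StartsWith formula applied to the sample rows from the question, followed by the concatenation the Summarize step would do. The function names assign_groups and concat_by_group are hypothetical, and the sketch assumes the Multi-Row tool returns null for rows that don't exist. Note that the group value is the RecordID of the opening row (821, 824) rather than the 1, 2 in the desired output; it serves the same grouping purpose.

```python
rows = [
    (821, '<p style="text-align: center;">'),
    (822, '<strong>AIR FORCE</strong><br />'),
    (823, '</p>'),
    (824, '<p>'),
    (825, 'Ut enim ad minim veniam, quis nostrud exercitation ullamco '
          'laboris nisi ut aliquip ex ea commodo consequat.'),
    (826, '</p>'),
]

def assign_groups(rows):
    """Emulate the Multi-Row formula: one group value per row, in order."""
    groups = []
    for i, (record_id, html) in enumerate(rows):
        # [Row+2:HTML]; assumed to be null when the row does not exist.
        html_plus_2 = rows[i + 2][1] if i + 2 < len(rows) else None
        # [Row-1:Grouping field]; null on the first row.
        prev_group = groups[i - 1] if i > 0 else None

        opens = html.lower().startswith('<p')
        closes_two_down = (html_plus_2 is not None
                           and html_plus_2.lower().startswith('</p'))

        if opens and closes_two_down:
            groups.append(record_id)   # start of a three-row block
        elif not opens and prev_group is not None:
            groups.append(prev_group)  # carry the group down
        else:
            groups.append(None)
    return groups

def concat_by_group(rows, groups):
    """Emulate the Summarize step: concatenate HTML within each group."""
    merged = {}
    for (_, html), group in zip(rows, groups):
        if group is not None:
            merged.setdefault(group, []).append(html)
    return {group: ''.join(parts) for group, parts in merged.items()}

groups = assign_groups(rows)
merged = concat_by_group(rows, groups)
print(groups)
print(merged[821])
```

Rows that open with <p but have no </p two rows below get a null group here, so they fall outside the concatenation and can be handled separately.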

 

Hope this helps!

hellyars
13 - Pulsar

@lmorrell Thank you for the assistance and explanation.  
