I have a SharePoint Input Tool which brings in a list with multiple fields. One field in specific (called 'Additional Items') is a free text field that the end user fills out. Typical structure of the field in SharePoint is as follows:
Description / Description : | PRODUCT 1 NAME |
Item / Art : | 12345678 |
UPC / CUP : | 12345678901 |
Retail / P. det : | $11.96 |
Size / Format : | EA |
Description / Description : | PRODUCT 2 NAME |
Item / Art : | 23456789 |
UPC / CUP : | 23456789012 |
Retail / P. det : | $13.98 |
Size / Format : | EA |
Description / Description : | PRODUCT 3 NAME |
Item / Art : | 34567890 |
UPC / CUP : | 34567890123 |
Retail / P. det : | $11.96 |
Size / Format : | EA |
Description / Description : | PRODUCT 4 NAME |
Item / Art : | 45678901 |
UPC / CUP : | 45678901234 |
Retail / P. det : | $11.96 |
Size / Format : | EA |
Description / Description : | PRODUCT 5 NAME |
Item / Art : | 56789012 |
UPC / CUP : | 56789012345 |
Retail / P. det : | $11.96 |
Size / Format : | EA |
Description / Description : | PRODUCT 6 NAME |
Item / Art : | 67890123 |
UPC / CUP : | 67890123456 |
Retail / P. det : | $95.68 |
Size / Format : | EA |
Description / Description : | PRODUCT 7 NAME |
Item / Art : | 78901234 |
UPC / CUP : | 78901234567 |
Retail / P. det : | $19.97 |
Size / Format : | EA |
Description / Description : | PRODUCT 8 NAME |
Item / Art : | 89012345 |
UPC / CUP : | 89012345678 |
Retail / P. det : | $19.97 |
Size / Format : | EA |
When the Additional Items field pulls into Alteryx, it looks like this:
<div class="ExternalClassDD65885A5A8245CF91E842F2A32D7147"><pre>
 </pre><table><tbody><tr><td>Description / Description :</td><td>PRODUCT 1 NAME</td></tr><tr><td>Item / Art :</td><td>123456789</td></tr><tr><td>UPC / CUP :</td><td>12345678901</td></tr><tr><td>Retail / P. det :</td><td>$11.96</td></tr><tr><td>Size / Format :</td><td>EA</td></tr><tr><td> </td><td> </td></tr><tr><td>Description / Description :</td><td>PRODUCT 2 NAME</td></tr><tr><td>Item / Art :</td><td>234567890</td></tr><tr><td>UPC / CUP :</td><td>23456789012</td></tr><tr><td>Retail / P. det :</td><td>$13.98</td></tr><tr><td>Size / Format :</td><td>EA</td></tr><tr><td> </td><td> </td></tr><tr><td>Description / Description :</td><td>PRODUCT 3 NAME</td></tr><tr><td>Item / Art :</td><td>345678901</td></tr><tr><td>UPC / CUP :</td><td>34567890123</td></tr><tr><td>Retail / P. det :</td><td>$11.96</td></tr><tr><td>Size / Format :</td><td>EA</td></tr><tr><td> </td><td> </td></tr><tr><td>Description / Description :</td><td>PRODUCT 4 NAME</td></tr><tr><td>Item / Art :</td><td>456789012</td></tr><tr><td>UPC / CUP :</td><td>45678901234</td></tr><tr><td>Retail / P. det :</td><td>$11.96</td></tr><tr><td>Size / Format :</td><td>EA</td></tr><tr><td> </td><td> </td></tr><tr><td>Description / Description :</td><td>PRODUCT 5 NAME</td></tr><tr><td>Item / Art :</td><td>567890123</td></tr><tr><td>UPC / CUP :</td><td>56789012345</td></tr><tr><td>Retail / P. det :</td><td>$11.96</td></tr><tr><td>Size / Format :</td><td>EA</td></tr><tr><td> </td><td> </td></tr><tr><td>Description / Description :</td><td>PRODUCT 6 NAME</td></tr><tr><td>Item / Art :</td><td>678901234</td></tr><tr><td>UPC / CUP :</td><td>67890123456</td></tr><tr><td>Retail / P. det :</td><td>$95.68</td></tr><tr><td>Size / Format :</td><td>EA</td></tr><tr><td> </td><td> </td></tr><tr><td>Description / Description :</td><td>PRODUCT 7 NAME</td></tr><tr><td>Item / Art :</td><td>789012345</td></tr><tr><td>UPC / CUP :</td><td>78901234567</td></tr><tr><td>Retail / P. det :</td><td>$19.97</td></tr><tr><td>Size / Format :</td><td>EA</td></tr><tr><td> </td><td> </td></tr><tr><td>Description / Description :</td><td>PRODUCT 8 NAME</td></tr><tr><td>Item / Art :</td><td>890123456</td></tr><tr><td>UPC / CUP :</td><td>89012345678</td></tr><tr><td>Retail / P. det :</td><td>$19.97</td></tr><tr><td>Size / Format :</td><td>EA</td></tr></tbody></table></div>
From the above, I would like to parse out the number that follow the 'Item/Art' flag (highlighted in red text above).
Any insight on how to cleanly parse out these item numbers, either into one field (without everything else), or onto separate rows?
Thanks!
Solved! Go to Solution.
Here's a solution.
- Use Regex Tokenize Tool to split your data to rows
([^><]+)(?:<[^>]+>|$)
This expression basically selects everything that is not between (<>), these html specifiers.
- Use Multi-Row formula field to flag the pair of Item Art and respective number.
IF Contains([Field], "Item / Art") THEN 0
ELSEIF Contains([Row-1:Field],"Item / Art") THEN 1
ELSE NULL() ENDIF
- User Filter to Flag only the numbers
- Use Select Tool to have only a single field.
WF attached. Version 2019.3
Cheers,
For extra credit, here's a dynamic solution based on @Thableaus excellent start that parses all the fields in the dataset and transposes them by product
Start by removing the blank lines and restoring the colon character ":". Then group the rows by product and assign a line number within each product. The odd numbered rows become the column headers and the even ones are the values so the Filter tool splits them into 2 sets. The 2 sets are joined by record position and the crosstab tool flips the data into columns. The Unique tool and the Dynamic Rename work around the fact the default behaviour of the Crosstab tool is to reorder the columns alphabetically and replace all non- alpha characters with underscores.
The results are as follows
Dan