<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Web scraping/regex in Alteryx Designer Desktop Discussions</title>
    <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545436#M117630</link>
    <description>&lt;P&gt;Also look at the solutions to weekly challenge 13 and 40 for more ideas on parsing html&lt;/P&gt;</description>
    <pubDate>Tue, 24 Mar 2020 08:43:05 GMT</pubDate>
    <dc:creator>DavidP</dc:creator>
    <dc:date>2020-03-24T08:43:05Z</dc:date>
    <item>
      <title>Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545306#M117608</link>
      <description>&lt;P&gt;Hi all,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I watched a webinar on this and thought it would be easy... I'm trying to download some data that is held in a table on a website. I have got as far as downloading the data, but am stuck on how to extract what I want out of that data. The source code looks like:&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;div&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;table &lt;SPAN class="html-attribute-name"&gt;cellspacing&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;0&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;border&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;0&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;id&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;ctl00_MainContent_ucFinancialComplianceTable_gdvFinancialAndComplianceHistory&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;border-width:0px;border-style:None;border-collapse:collapse;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;tr&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;th &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridHeading&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;scope&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;col&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;Financial year end (FYE)&lt;SPAN class="html-tag"&gt;&amp;lt;/th&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;th &lt;SPAN 
class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridHeading&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;scope&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;col&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;Income&lt;SPAN class="html-tag"&gt;&amp;lt;/th&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;th &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridHeading&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;scope&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;col&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;Spending&lt;SPAN class="html-tag"&gt;&amp;lt;/th&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;th &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridHeading&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;scope&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;col&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;Accounts received&lt;SPAN class="html-tag"&gt;&amp;lt;/th&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;th &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridHeading&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;scope&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;col&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;Annual Return received&lt;SPAN class="html-tag"&gt;&amp;lt;/th&amp;gt;&lt;/SPAN&gt;&lt;SPAN 
class="html-tag"&gt;&amp;lt;th &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridHeading&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;scope&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;col&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;View&lt;SPAN class="html-tag"&gt;&amp;lt;/th&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/tr&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;tr &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;background-color:#FFFF99;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;STRONG&gt;31 Aug 2018&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnNumeric&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&amp;amp;#163;&lt;STRONG&gt;1,313,932&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnNumeric&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN 
class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&amp;amp;#&lt;STRONG&gt;163;1,312,739&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;30 Jun 2019&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;30 Jun 2019&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;a &lt;SPAN class="html-attribute-name"&gt;title&lt;/SPAN&gt;='&lt;SPAN class="html-attribute-value"&gt;31 Aug 2018 Accounts PDF&lt;/SPAN&gt;' &lt;SPAN class="html-attribute-name"&gt;target&lt;/SPAN&gt;='&lt;SPAN class="html-attribute-value"&gt;_blank&lt;/SPAN&gt;' &lt;SPAN class="html-attribute-name"&gt;href&lt;/SPAN&gt;='&lt;A href="https://apps.charitycommission.gov.uk/Accounts/Ends74/0000267174_AC_20180831_E_C.pdf" target="_blank" rel="noreferrer noopener"&gt;/Accounts/Ends74\0000267174_AC_20180831_E_C.pdf&lt;/A&gt;'&amp;gt;&lt;/SPAN&gt;Accounts&lt;SPAN class="html-tag"&gt;&amp;lt;/a&amp;gt;&lt;/SPAN&gt;&lt;SPAN 
class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/tr&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;tr &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;background-color:#FFFFCC;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;STRONG&gt;31 Aug 2017&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnNumeric&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&amp;amp;#1&lt;STRONG&gt;63;1,166,743&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnNumeric&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&amp;amp;#163;&lt;STRONG&gt;1,127,987&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN 
class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;05 Jun 2018&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;05 Jun 2018&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;a &lt;SPAN class="html-attribute-name"&gt;title&lt;/SPAN&gt;='&lt;SPAN class="html-attribute-value"&gt;31 Aug 2017 Accounts PDF&lt;/SPAN&gt;' &lt;SPAN class="html-attribute-name"&gt;target&lt;/SPAN&gt;='&lt;SPAN class="html-attribute-value"&gt;_blank&lt;/SPAN&gt;' &lt;SPAN class="html-attribute-name"&gt;href&lt;/SPAN&gt;='&lt;A href="https://apps.charitycommission.gov.uk/Accounts/Ends74/0000267174_AC_20170831_E_C.pdf" target="_blank" rel="noreferrer noopener"&gt;/Accounts/Ends74\0000267174_AC_20170831_E_C.pdf&lt;/A&gt;'&amp;gt;&lt;/SPAN&gt;Accounts&lt;SPAN class="html-tag"&gt;&amp;lt;/a&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/tr&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;tr &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;background-color:#FFFF99;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN 
class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;STRONG&gt;31 Aug 2016&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnNumeric&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&amp;amp;#163;&lt;STRONG&gt;1,169,515&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnNumeric&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&amp;amp;#163;&lt;STRONG&gt;1,159,754&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;13 Jun 2017*&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;13 Jun 
2017&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;a &lt;SPAN class="html-attribute-name"&gt;title&lt;/SPAN&gt;='&lt;SPAN class="html-attribute-value"&gt;31 Aug 2016 Accounts PDF&lt;/SPAN&gt;' &lt;SPAN class="html-attribute-name"&gt;target&lt;/SPAN&gt;='&lt;SPAN class="html-attribute-value"&gt;_blank&lt;/SPAN&gt;' &lt;SPAN class="html-attribute-name"&gt;href&lt;/SPAN&gt;='&lt;A href="https://apps.charitycommission.gov.uk/Accounts/Ends74/0000267174_AC_20160831_E_C.pdf" target="_blank" rel="noreferrer noopener"&gt;/Accounts/Ends74\0000267174_AC_20160831_E_C.pdf&lt;/A&gt;'&amp;gt;&lt;/SPAN&gt;Accounts&lt;SPAN class="html-tag"&gt;&amp;lt;/a&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/tr&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;tr &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;background-color:#FFFFCC;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;STRONG&gt;31 Aug 2015&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN 
class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnNumeric&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&amp;amp;#163;&lt;STRONG&gt;1,193,076&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnNumeric&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&amp;amp;#163;&lt;STRONG&gt;1,112,737&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;12 May 2016&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;26 May 2016&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;a &lt;SPAN 
class="html-attribute-name"&gt;title&lt;/SPAN&gt;='&lt;SPAN class="html-attribute-value"&gt;31 Aug 2015 Accounts PDF&lt;/SPAN&gt;' &lt;SPAN class="html-attribute-name"&gt;target&lt;/SPAN&gt;='&lt;SPAN class="html-attribute-value"&gt;_blank&lt;/SPAN&gt;' &lt;SPAN class="html-attribute-name"&gt;href&lt;/SPAN&gt;='&lt;A href="https://apps.charitycommission.gov.uk/Accounts/Ends74/0000267174_AC_20150831_E_C.pdf" target="_blank" rel="noreferrer noopener"&gt;/Accounts/Ends74\0000267174_AC_20150831_E_C.pdf&lt;/A&gt;'&amp;gt;&lt;/SPAN&gt;Accounts&lt;SPAN class="html-tag"&gt;&amp;lt;/a&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/tr&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;tr &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;background-color:#FFFF99;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;STRONG&gt;31 Aug 2014&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnNumeric&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&amp;amp;#163;&lt;STRONG&gt;1,129,065&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN 
class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnNumeric&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&amp;amp;#163;&lt;STRONG&gt;1,021,361&lt;/STRONG&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;06 Mar 2015&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:146px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;08 Jun 2015&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;td &lt;SPAN class="html-attribute-name"&gt;class&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;GridColumnText&lt;/SPAN&gt;" &lt;SPAN class="html-attribute-name"&gt;style&lt;/SPAN&gt;="&lt;SPAN class="html-attribute-value"&gt;width:73px;&lt;/SPAN&gt;"&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;a &lt;SPAN class="html-attribute-name"&gt;title&lt;/SPAN&gt;='&lt;SPAN class="html-attribute-value"&gt;31 Aug 2014 Accounts PDF&lt;/SPAN&gt;' &lt;SPAN class="html-attribute-name"&gt;target&lt;/SPAN&gt;='&lt;SPAN class="html-attribute-value"&gt;_blank&lt;/SPAN&gt;' &lt;SPAN class="html-attribute-name"&gt;href&lt;/SPAN&gt;='&lt;A href="https://apps.charitycommission.gov.uk/Accounts/Ends74/0000267174_AC_20140831_E_C.pdf" target="_blank" rel="noreferrer 
noopener"&gt;/Accounts/Ends74\0000267174_AC_20140831_E_C.pdf&lt;/A&gt;'&amp;gt;&lt;/SPAN&gt;Accounts&lt;SPAN class="html-tag"&gt;&amp;lt;/a&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/td&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/tr&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/table&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&lt;SPAN class="html-tag"&gt;&amp;lt;/div&amp;gt;&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to extract the data in bold: DATE, INCOME and EXPENDITURE over 5 years. I've been trying to build a regular expression, but to no avail. Similarly, I've downloaded the data to a temp file, but if I try to filter it, the filter runs on the temp file name, not the content... I feel I'm a little out of my depth. Any pointers?&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2024 21:32:27 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545306#M117608</guid>
      <dc:creator>MTMjames</dc:creator>
      <dc:date>2024-06-13T21:32:27Z</dc:date>
    </item>
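<!--
Editor's sketch (not part of the original thread): the question above asks how to pull DATE, INCOME and EXPENDITURE out of the quoted table with a regular expression. The thread itself solves this in an Alteryx workflow; the Python below is only an assumed equivalent using the standard `re` module. `SAMPLE_HTML` is trimmed by hand from the table quoted in the post, and the names `ROW_RE`, `CELL_RE` and `extract_rows` are hypothetical.

```python
import re

# Two data rows trimmed from the table quoted in the post (assumed layout).
SAMPLE_HTML = """
<table id="ctl00_MainContent_ucFinancialComplianceTable_gdvFinancialAndComplianceHistory">
<tr><th class="GridHeading">Financial year end (FYE)</th><th class="GridHeading">Income</th><th class="GridHeading">Spending</th></tr>
<tr style="background-color:#FFFF99;">
<td class="GridColumnText">31 Aug 2018</td><td class="GridColumnNumeric">&#163;1,313,932</td><td class="GridColumnNumeric">&#163;1,312,739</td><td class="GridColumnText">30 Jun 2019</td>
</tr>
<tr style="background-color:#FFFFCC;">
<td class="GridColumnText">31 Aug 2017</td><td class="GridColumnNumeric">&#163;1,166,743</td><td class="GridColumnNumeric">&#163;1,127,987</td><td class="GridColumnText">05 Jun 2018</td>
</tr>
</table>
"""

# One pattern per level: first isolate each row, then the cells inside it.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.DOTALL | re.IGNORECASE)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.DOTALL | re.IGNORECASE)


def extract_rows(page_html):
    """Return (year_end, income, spending) tuples, one per data row."""
    rows = []
    for row_html in ROW_RE.findall(page_html):
        cells = [cell.strip() for cell in CELL_RE.findall(row_html)]
        # The header row has only <th> cells, so it yields no <td> matches.
        if len(cells) >= 3:
            rows.append(tuple(cells[:3]))
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

Regular expressions work here only because the table is flat and regular; for nested or irregular markup a real HTML parser would be the safer choice.
-->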
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545322#M117611</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.alteryx.com/t5/user/viewprofilepage/user-id/24923"&gt;@MTMjames&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any chance you could link the website you're trying to scrape?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Thanks,&lt;/P&gt;&lt;P&gt;Jordan&lt;/P&gt;</description>
      <pubDate>Mon, 23 Mar 2020 23:22:18 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545322#M117611</guid>
      <dc:creator>JordyMicheal</dc:creator>
      <dc:date>2020-03-23T23:22:18Z</dc:date>
    </item>
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545349#M117617</link>
      <description>&lt;P&gt;I modified the workflow found in this post slightly based on the data you provided and it came up with the results below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Parsing-HTML-tables/td-p/25886" target="_blank"&gt;https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Parsing-HTML-tables/td-p/25886&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DavidP_0-1585008480310.png" style="width: 400px;"&gt;&lt;img src="https://community.alteryx.com/t5/image/serverpage/image-id/104049i35C9BA34E6BE12AC/image-size/medium?v=v2&amp;amp;px=400" role="button" title="DavidP_0-1585008480310.png" alt="DavidP_0-1585008480310.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 00:08:40 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545349#M117617</guid>
      <dc:creator>DavidP</dc:creator>
      <dc:date>2020-03-24T00:08:40Z</dc:date>
    </item>
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545433#M117627</link>
      <description>&lt;P&gt;The website is&amp;nbsp;&lt;A href="https://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/FinancialHistory.aspx?RegisteredCharityNumber=267174&amp;amp;SubsidiaryNumber=0" target="_blank"&gt;https://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/FinancialHistory.aspx?RegisteredCharityNumber=267174&amp;amp;SubsidiaryNumber=0&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have around 400 of these, where only the charity number differs and the rest of the URL is the same. That bit I have sorted and it all works; I just can't work out how to grab the data out. I have attached it here...&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 08:38:15 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545433#M117627</guid>
      <dc:creator>MTMjames</dc:creator>
      <dc:date>2020-03-24T08:38:15Z</dc:date>
    </item>
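<!--
Editor's sketch (not part of the original thread): the post above describes roughly 400 pages whose URLs differ only in the charity number. Assuming that pattern holds, the list of URLs can be generated from the numbers alone. The base URL and query parameter names are copied from the link in the post; the function name `financial_history_url` and the short example list are hypothetical.

```python
from urllib.parse import urlencode

# Base address taken from the link in the post; only the query string varies.
BASE_URL = (
    "https://apps.charitycommission.gov.uk/Showcharity/"
    "RegisterOfCharities/FinancialHistory.aspx"
)


def financial_history_url(charity_no, subsidiary_no=0):
    """Build the financial history URL for one registered charity number."""
    query = urlencode(
        {"RegisteredCharityNumber": charity_no, "SubsidiaryNumber": subsidiary_no}
    )
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # Both numbers appear in this thread; a real run would load all ~400.
    for number in (267174, 307322):
        print(financial_history_url(number))
```

Keeping the charity number alongside each generated URL makes it easy to label the extracted rows later.
-->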
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545436#M117630</link>
      <description>&lt;P&gt;Also look at the solutions to weekly challenge 13 and 40 for more ideas on parsing html&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 08:43:05 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545436#M117630</guid>
      <dc:creator>DavidP</dc:creator>
      <dc:date>2020-03-24T08:43:05Z</dc:date>
    </item>
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545437#M117631</link>
      <description>&lt;P&gt;That is brilliant, thanks. I'm new to the community and should have explained better! How do I plug that in to the rest of the flow?&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 08:43:11 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545437#M117631</guid>
      <dc:creator>MTMjames</dc:creator>
      <dc:date>2020-03-24T08:43:11Z</dc:date>
    </item>
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545467#M117636</link>
      <description>&lt;P&gt;Ok, no worries. Leave it with me and I'll work on it this morning. Now that I can see what the data looks like, perhaps we can do something better.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 09:58:02 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545467#M117636</guid>
      <dc:creator>DavidP</dc:creator>
      <dc:date>2020-03-24T09:58:02Z</dc:date>
    </item>
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545484#M117639</link>
      <description>&lt;P&gt;Here's a quick and dirty version for you to play with. I'll see if I can get the data out for you in a more elegant representation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DavidP_0-1585046999447.png" style="width: 400px;"&gt;&lt;img src="https://community.alteryx.com/t5/image/serverpage/image-id/104107i893A609EDA0ADA67/image-size/medium?v=v2&amp;amp;px=400" role="button" title="DavidP_0-1585046999447.png" alt="DavidP_0-1585046999447.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 10:50:19 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545484#M117639</guid>
      <dc:creator>DavidP</dc:creator>
      <dc:date>2020-03-24T10:50:19Z</dc:date>
    </item>
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545509#M117643</link>
      <description>&lt;P&gt;This is slightly cleaner, I think. Since we're only looking to extract the table data, I looked for those rows and filtered the rest out early on. I also replaced&amp;nbsp;&amp;amp;#163; (the HTML character reference for £) with the £ sign in the amounts, and added Charity No to show which charity each table relates to.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let me know if you have any other questions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DavidP_0-1585049661122.png" style="width: 400px;"&gt;&lt;img src="https://community.alteryx.com/t5/image/serverpage/image-id/104114i524FB1CDF4955884/image-size/medium?v=v2&amp;amp;px=400" role="button" title="DavidP_0-1585049661122.png" alt="DavidP_0-1585049661122.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 11:41:57 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545509#M117643</guid>
      <dc:creator>DavidP</dc:creator>
      <dc:date>2020-03-24T11:41:57Z</dc:date>
    </item>
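<!--
Editor's sketch (not part of the original thread): the reply above mentions two clean-up steps, turning the &#163; character reference into a £ sign and tagging every row with its charity number. The actual workflow is an Alteryx one shown only as a screenshot, so the Python below is an assumed equivalent, not DavidP's method. `clean_amount` and `tag_rows` are hypothetical names, and the extra integer conversion is an addition for convenience.

```python
import html


def clean_amount(raw):
    """Decode character references and also return the amount as an integer."""
    text = html.unescape(raw)  # "&#163;1,313,932" becomes "£1,313,932"
    digits = text.replace("£", "").replace(",", "").strip()
    return text, int(digits)


def tag_rows(charity_no, rows):
    """Attach the charity number to each (year_end, income, spending) row."""
    tagged = []
    for year_end, income, spending in rows:
        tagged.append(
            {
                "charity_no": charity_no,
                "year_end": year_end,
                "income": clean_amount(income)[1],
                "spending": clean_amount(spending)[1],
            }
        )
    return tagged


if __name__ == "__main__":
    sample = [("31 Aug 2018", "&#163;1,313,932", "&#163;1,312,739")]
    print(tag_rows(267174, sample))
```

Storing the amounts as integers rather than formatted text keeps them usable for sums and comparisons across years.
-->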
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545517#M117645</link>
      <description>&lt;P&gt;That is totally amazing, thank you!!&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 11:54:30 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/545517#M117645</guid>
      <dc:creator>MTMjames</dc:creator>
      <dc:date>2020-03-24T11:54:30Z</dc:date>
    </item>
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/546606#M117951</link>
      <description>&lt;P&gt;I must admit I'm totally impressed with that but can't work out how you did it! I'm trying to get some more data from the same site... how would I adapt that flow to pull the postcode and the name of the Chair?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In this page:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/ContactAndTrustees.aspx?RegisteredCharityNumber=307322&amp;amp;SubsidiaryNumber=0" target="_blank"&gt;https://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/ContactAndTrustees.aspx?RegisteredCharityNumber=307322&amp;amp;SubsidiaryNumber=0&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm looking to pull&amp;nbsp;&lt;SPAN&gt;SP10 3AL and&amp;nbsp;CAROLE MACHIN. The email of sbench@rookwoodschool.org would be helpful too, but I guess that breaks some rules.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks again&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Mar 2020 20:30:01 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/546606#M117951</guid>
      <dc:creator>MTMjames</dc:creator>
      <dc:date>2020-03-25T20:30:01Z</dc:date>
    </item>
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/546673#M117979</link>
      <description>&lt;P&gt;To make the workflow a bit easier to understand, I've added some documentation to the original workflow. I've also duplicated the workflow to show how to retrieve the address and trustee info from the new URL.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope you find this helpful.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DavidP_0-1585177814701.png" style="width: 400px;"&gt;&lt;img src="https://community.alteryx.com/t5/image/serverpage/image-id/104550iA335D096274A200D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="DavidP_0-1585177814701.png" alt="DavidP_0-1585177814701.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Mar 2020 23:10:58 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/546673#M117979</guid>
      <dc:creator>DavidP</dc:creator>
      <dc:date>2020-03-25T23:10:58Z</dc:date>
    </item>
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/546767#M118011</link>
      <description>&lt;P&gt;Ah, I think I understand it more now. Could you put each of the outputs per URN into its own column, so the postcode and key person have their own columns? That will help me reference them against other data. Hope that's OK! I've been working with:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="0" cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;267174&lt;/TD&gt;&lt;TD&gt;&lt;A href="https://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/FinancialHistory.aspx?RegisteredCharityNumber=267174&amp;amp;SubsidiaryNumber=0" target="_blank" rel="noopener"&gt;https://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/FinancialHistory.aspx?RegisteredCharityNumber=267174&amp;amp;SubsidiaryNumber=0&lt;/A&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;268482&lt;/TD&gt;&lt;TD&gt;&lt;A href="https://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/FinancialHistory.aspx?RegisteredCharityNumber=268482&amp;amp;SubsidiaryNumber=0" target="_blank" rel="noopener"&gt;https://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/FinancialHistory.aspx?RegisteredCharityNumber=268482&amp;amp;SubsidiaryNumber=0&lt;/A&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 26 Mar 2020 08:11:04 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/546767#M118011</guid>
      <dc:creator>MTMjames</dc:creator>
      <dc:date>2020-03-26T08:11:04Z</dc:date>
    </item>
    <item>
      <title>Re: Web scraping/regex</title>
      <link>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/546784#M118015</link>
      <description>&lt;P&gt;Firstly, let's make the input a bit more dynamic so that you just enter the URN and Charity No and the URL is built dynamically.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then you have to isolate the postcode row and the last row for each set and join them to the first data set, like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DavidP_0-1585213699368.png" style="width: 400px;"&gt;&lt;img src="https://community.alteryx.com/t5/image/serverpage/image-id/104593iCB0AAA4BA43CC2BD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="DavidP_0-1585213699368.png" alt="DavidP_0-1585213699368.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Mar 2020 09:08:37 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Web-scraping-regex/m-p/546784#M118015</guid>
      <dc:creator>DavidP</dc:creator>
      <dc:date>2020-03-26T09:08:37Z</dc:date>
    </item>
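For readers who prefer code to a workflow canvas, the "build the URL dynamically from the Charity No" step in the last reply can be sketched in Python. This is an illustration only, not the posted Alteryx workflow: the names BASE and build_url and the parameter defaults are hypothetical, and the URL pattern is simply copied from the links quoted in the thread.

```python
from urllib.parse import urlencode

# Base path copied from the links quoted in the thread.
BASE = "https://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/"


def build_url(charity_no, page="FinancialHistory.aspx", subsidiary_no=0):
    """Build the page URL for one charity number (hypothetical helper)."""
    # urlencode joins the key=value pairs in insertion order.
    query = urlencode({
        "RegisteredCharityNumber": charity_no,
        "SubsidiaryNumber": subsidiary_no,
    })
    return BASE + page + "?" + query


# One URL per charity number, keyed by that number so later results
# can be joined back to the input rows.
urls = {no: build_url(no) for no in (267174, 268482)}

# The contact page mentioned earlier in the thread uses a different page name.
contact_url = build_url(307322, page="ContactAndTrustees.aspx")
```

Keying the results by charity number mirrors the join described in the reply: each downloaded page can be matched back to the row that produced it.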
  </channel>
</rss>

