Alteryx Designer Discussions

Harvesting site with iterative macro troubleshoot

8 - Asteroid

I'm attempting to harvest the website below and also get better at iterative macros. Macros attached.

 

Two main issues with the DM 1 macro:

1. No matter which page (1-300) I use for the first input link, the macro output is always the first link followed by multiple rows of the correct second link. It never continues on to link 3.

        ex: 1. www.page.com/1
            2. www.page.com/2
            3. www.page.com/2
            4. www.page.com/2
            5. www.page.com/2
            6. www.page.com/2

2. When I open the link output by the macro, which should match the URL, I am redirected to the first page. So for two different URLs going into the data harvest macro, only results from the original page are returned.

iterate.PNG


DM 2 has a few parsing errors to fix as well but works fine for this purpose.

https://laegemiddelstyrelsen.dk/da/godkendelse/sundhedspersoners-tilknytning-til-virksomheder/lister...

 

Any suggestions?

12 - Quasar

Hi @Adara,

 

I wasn't able to follow all your parses to see what data you were trying to get to, but you shouldn't need a macro to extract the data. Attached is a workflow that reads the page counter at the bottom of the first page to determine the number of pages, then creates a web address for each page. Investigating the data shows that this website uses a pagination method other than page=1, page=2, etc., and the workflow accounts for that. All pages are then fed into the Download tool at once, and the data for every page ends up in the DownloadData field. You should be able to break the data out by newlines and then parse further into what you are looking for. Let the community know if you need more help parsing the data, but it looks like you are well on your way.
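For readers who think more easily in code, the non-macro approach described above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `build_page_urls` and `split_rows`, the `example.com` address, and the query parameter `p` are all hypothetical, and the real site's pagination scheme (which the reply says is not a plain page=1, page=2) is not reproduced here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_urls(base_url: str, page_count: int, param: str = "page") -> list[str]:
    """Return one URL per page, numbered 1..page_count.

    `param` is a hypothetical query parameter name; substitute whatever
    the target site really uses for pagination.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    parts = urlsplit(base_url)
    # Keep any existing query parameters except a stale page parameter.
    existing = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    urls = []
    for page in range(1, page_count + 1):
        query = urlencode(existing + [(param, str(page))])
        urls.append(
            urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
        )
    return urls


def split_rows(download_data: str) -> list[str]:
    """Break one downloaded page body into non-empty, stripped lines."""
    return [line.strip() for line in download_data.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical address and page count, for illustration only.
    for url in build_page_urls("https://example.com/list?sort=name", 3, param="p"):
        print(url)
```

The point of the design is that every page address is generated up front from the page count, so all requests can be issued in one pass and no iteration state has to be carried between pages.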

 

Webscrape 2.png

 

12 - Quasar

Without seeing how your workflow is set up, there are a few things we need to look at.

 

1) Is your iterative input set up correctly? If you don't have one set up, there is nowhere for the iterative output to go back through the macro.

2) It looks like you have a formula that references Engine.IterationNumber. Once you get the lack of iterating fixed, Engine.IterationNumber will start at 0, so you may want to think about a way to stop the first iteration from creating a URL that sends the second iteration back through the call for page 1. Essentially, something like: if Engine.IterationNumber = 0 then 2 else Engine.IterationNumber + 1

3) @T_Willins is correct: you don't have to use an iterative macro. Their solution works quite well.
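To make the zero-based counting in point 2 concrete, here is a rough Python simulation of the bookkeeping an iterative macro has to do. It is a sketch, not the macro itself: `next_page` and `simulate` are hypothetical names, and it assumes that iteration 0 processes the first page supplied, so the page handed to the following pass is the starting page plus the iteration number plus one. The exact offset in a real workflow depends on how the macro is wired.

```python
def next_page(iteration_number: int, first_page: int = 1) -> int:
    """Page to request on the following pass.

    Assumes the zero-based iteration `iteration_number` is currently
    processing page `first_page + iteration_number`.
    """
    current_page = first_page + iteration_number
    return current_page + 1


def simulate(first_page: int, last_page: int, max_iterations: int = 1000) -> list[int]:
    """Return the pages processed, in order, by a loop that feeds
    each pass's next page back into the following pass."""
    processed = []
    iteration = 0  # mirrors a counter that starts at 0
    page = first_page
    while page <= last_page and iteration < max_iterations:
        processed.append(page)
        page = next_page(iteration, first_page)
        iteration += 1
    return processed


if __name__ == "__main__":
    print(simulate(1, 4))
```

If the counter were treated as one-based, or the offset were applied only on the first pass, the same page would be requested twice, which resembles the repeated second link described in the question. The `max_iterations` guard plays the role of the macro's maximum iteration setting.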

 

Here is an article I wrote about iterative macros and how to set them up. There might be something in it that you haven't thought of in your process.

https://community.alteryx.com/t5/Engine-Works-Blog/Hello-Iterative-Macro-My-Old-Friend/ba-p/420308

 

Let me know how these go.
