Note: I have not built an interface with Alteryx before, but I am very familiar with the ETL capabilities of Alteryx. So this whole interface side is new to me, but I would like to be able to leverage it to create value.
Here is my situation:
My current workflow takes in a data source with a list of unique identifiers (UIs) (let's call it "Main"). I then join this with another data source that has more specific information on each UI (let's call it "Detail"). However, Detail does not contain every single UI. I also output a list of all of the UIs from Main that are not contained in Detail.
I then manually go retrieve the information for each UI from a website. The website requires me to search by UI, therefore I must know the UI before I can get the data (so getting all of the data beforehand is not an option). I then add these new UIs to Detail.
I then run the workflow again to link the UIs from Main with the UIs from Detail (which now contains all of the UIs from Main).
Here is what I would like to do. I have two ideas:
1) Use a macro to web scrape the website for the information within the current workflow. However, I cannot find a macro that can take data (the UIs) from within a workflow, search for them on a website, and then return the results. If this exists, please point me to it. If not, I have never made a macro before, so unless it is easy, I would like to avoid it.
2) Create an interface that runs the workflow once and gives the list of the UIs that need to be searched online to the user. It then prompts the user to enter the information for these UIs. (The user can search these online outside of the interface, but it would be ideal if it were possible to put the search directly inside the interface. I am unsure if that is possible.) The user then enters this information into the interface and clicks a "Continue" button. The interface then runs the workflow again with the new information included.
I hope that this makes sense and is possible to do! Any help is appreciated. Please let me know if I need to be clearer in the explanation. I understand that I am asking for a bit of a complicated process. And if there is a better way to do this, please let me know!
Hi @jedwards543,
If I'm reading it correctly, I think full automation with #1 makes sense, using an iterative macro. For an iterative macro, your input would be the list of UIs; grab the first from the list, and send all but the first to the "Loop Control" output. Then send that single UI through whatever process you have for scraping, and send the output from that to a separate Macro Output.
In short, the "iterative macro" will continue running as long as the "loop control" output is non-empty... since you remove an item with every iteration, that should work. The other output (once the iterations are done) will just get unioned (essentially) together.
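The loop-control pattern John describes can be sketched in plain Python (outside of Alteryx) like this; `process_one` is a hypothetical stand-in for whatever per-UI work the macro body does:

```python
def run_iteratively(uis, process_one):
    """Mimic an Alteryx iterative macro: take the first item off the
    list, process it, and feed the remainder back as the loop control
    until nothing is left."""
    results = []
    remaining = list(uis)
    while remaining:  # loop control: stop when the list is empty
        current, remaining = remaining[0], remaining[1:]
        results.append(process_one(current))  # macro output, unioned together
    return results

# Example with a stand-in processing function:
print(run_iteratively(["A1", "B2", "C3"], lambda ui: ui.lower()))
# prints ['a1', 'b2', 'c3']
```

In Alteryx itself, the `remaining` list is what you wire to the Loop Control output and the `results` stream is the regular Macro Output.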
Hope that helps!
John
Thank you John!
I'm glad that you support #1 because it is definitely the more automated approach. I have found web scraping macros that can scrape a table from a website. However, I have not found any web scraping macros that can first search in a search bar, and then scrape the table that is returned. Could you point me in the direction of resources that will do that or teach me to do it?
Thank you!
Adding to @JohnJPS's reply:
If, say, you were performing a lookup on Yahoo Finance and needed to look up AYX (a missing value), you could construct the search:
https://finance.yahoo.com/quote/AYX?p=AYX
Then send each URL through the download tool.
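Since the search term appears directly in the URL, building those lookup URLs is just string formatting. A minimal Python sketch of the same idea (the URL pattern is taken from Mark's Yahoo Finance example above; in Alteryx this would be a Formula tool feeding the Download tool):

```python
def quote_url(symbol: str) -> str:
    """Construct a Yahoo Finance quote URL for a ticker symbol,
    following the pattern https://finance.yahoo.com/quote/AYX?p=AYX."""
    return f"https://finance.yahoo.com/quote/{symbol}?p={symbol}"

missing = ["AYX", "MSFT"]   # the UIs that were not found in Detail
urls = [quote_url(s) for s in missing]
print(urls[0])  # prints https://finance.yahoo.com/quote/AYX?p=AYX
```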
Just a thought,
Mark
You can use the Inspect feature in Google Chrome prior to clicking on the button in the webpage to see the URL/Headers/Payload that you need to craft. Then with that info, you can make a workflow like:
The attached will:
- Join two data streams to find IDs not in the known file
- Formula to prep the ID, trimming spaces and replacing any remaining spaces with +
- Tile tool to identify batches of 200 records
- Summarize to concatenate, making use of the separator option
- Formula to set URL and concatenate Fields
- Download to a Temp file using Post and passing the Fields as the payload
- Dynamic Input to read in the downloaded CSV file
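The ID-prep, Tile, and Summarize steps above can be sketched outside Alteryx as follows; the IDs, batch size, and separator mirror the description (200-record batches, "+" substitution), and nothing here actually sends the request:

```python
def build_batches(ids, batch_size=200, sep="+"):
    """Mirror the Formula + Tile + Summarize steps: clean each ID
    (trim, then replace internal spaces with '+'), group into batches
    of `batch_size`, and concatenate each batch with `sep`."""
    cleaned = [i.strip().replace(" ", "+") for i in ids]
    return [sep.join(cleaned[n:n + batch_size])
            for n in range(0, len(cleaned), batch_size)]

# Small batch size just to show the grouping behavior:
payloads = build_batches(["EQ 001", " EQ 002", "EQ 003"], batch_size=2)
print(payloads)  # prints ['EQ+001+EQ+002', 'EQ+003']
```

Each concatenated batch then becomes one POST payload for the Download tool, which is why the Summarize separator option matters.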
If all of the equipment IDs match and there are no Equipment IDs that need to be scraped from the website (which can sometimes happen), how do we prevent the error thrown at the Dynamic Input? Is there a way to say, "If there are no Equip IDs, do not run this part"?
Thank you!
Yeah, the Dynamic Input tool has a lot of limitations, and it commonly errors before the workflow even runs (which may make you think there is an error when there is not, and when you then try to fix it, the attempt to fix it can actually break it). The Dynamic Input tool is not resilient.
In the attached I replaced it with a quick macro that will not error, and will not report errors prior to running.
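The underlying guard is simple: only attempt the download/read-back step when there is at least one ID to look up. A hypothetical sketch of that pre-check (`scrape` stands in for the whole download-and-parse step):

```python
def scrape_if_needed(missing_ids, scrape):
    """Skip the download step entirely when there is nothing to look up,
    so the downstream read step never sees a missing or empty file."""
    if not missing_ids:  # no missing IDs: return an empty result set
        return []
    return [scrape(ui) for ui in missing_ids]

print(scrape_if_needed([], lambda ui: ui))       # prints []
print(scrape_if_needed(["EQ1"], lambda ui: ui))  # prints ['EQ1']
```

Wrapping the fragile step in a macro achieves the same thing in Alteryx: an empty input stream simply produces an empty output instead of a hard error.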