
Alteryx Designer Desktop Discussions


Scrape data via download tool

KD82
6 - Meteoroid

I need help scraping data from https://www.ffiec.gov/npw/Institution/TopHoldings. Each top holding company has a set of subsidiaries that I'd like to download. Please help, as I'm new to this: I get a 403 message when I try. Thank you.

13 REPLIES
alexnajm
18 - Pollux

If it's 403 Forbidden, then nothing else can be done with the Download tool! You'll have to find another way - maybe an RPA tool can replicate the clicks to download the CSV from that site

ntakeda
12 - Quasar

I checked briefly, and it looks like the site is rejecting requests from Alteryx, likely because it blocks automated tools or requires a browser-based user agent.

I think it's difficult to achieve this using Alteryx.
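If the user-agent guess above is right, one thing to try is sending a browser-style `User-Agent` header with the request. Here is a minimal sketch in plain Python (standard library only). The header value and the helper names `build_request` and `fetch` are assumptions for illustration, and there is no guarantee the site will accept the request even with these headers.

```python
import urllib.request

URL = "https://www.ffiec.gov/npw/Institution/TopHoldings"

# Assumption: a generic browser-style User-Agent string. Any value copied
# from a real browser would serve the same purpose.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Return a GET request that carries browser-like headers."""
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml",
    }
    return urllib.request.Request(url, headers=headers)


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download the page body as text; raises urllib.error.HTTPError on a 403."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (performs a real network call, so it is left commented out):
# html = fetch(URL)
```

The same header could also be set on the Headers tab of the Download tool, which avoids Python altogether.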

apathetichell
19 - Altair

https://www.ffiec.gov/npw/Institution/TopHolderList

 

Having said that, they have a data download. Go through any APIs/data downloads BEFORE trying any web scraping.

 

Sample CSV: https://www.ffiec.gov/npw/FinancialReport/ReturnFinancialReportCSV?rpt=BHCPR&id=1039502&dt=20241231

JoshuaM
9 - Comet

Hey @KD82 

 

I've put together a solution using Alteryx's Python tool (found in the Developer Tool Palette) to scrape the table from the url https://www.ffiec.gov/npw/Institution/TopHoldings.

 

The attached workflow utilizes the Python libraries selenium & pandas to extract the data to pre-process in Alteryx. If you need to scrape a different url, the script may require minor adjustments to accommodate the new page structure.

 

While this solution is slightly more complex than other download methods, it automates the data extraction and preprocessing within Alteryx, eliminating the need for manual intervention 😊!

 

 

To use this solution:

  1. Install Required Libraries: If not already installed, you'll need to add selenium & websockets to your miniconda environment's site-packages, as this is where Alteryx executes Python commands. (I ran pip install from the command line, then copied the packages into that folder.)

  2. Import and Run the Workflow: Export the attached workflow in Alteryx, import the provided Python script into the Jupyter notebook within the Python tool, and run the workflow.
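For readers who can't open the attachment, here is a rough sketch of the general approach. This is not the attached script. It assumes Chrome plus a matching driver are available, that the data sits in an ordinary HTML `<table>`, and it uses the standard library's `HTMLParser` in place of pandas to keep dependencies down. The names `TableRows`, `parse_table` and `fetch_rendered_html` are made up for illustration.

```python
from html.parser import HTMLParser
from typing import List


class TableRows(HTMLParser):
    """Collect each <tr> as a list of cleaned cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html: str) -> List[List[str]]:
    """Return every table row found in the HTML, header rows included."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_rendered_html(url: str) -> str:
    """Load the page in headless Chrome and return the rendered HTML."""
    from selenium import webdriver  # imported lazily; needs selenium installed

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


# Usage (needs a browser, so it is left commented out):
# rows = parse_table(fetch_rendered_html("https://www.ffiec.gov/npw/Institution/TopHoldings"))
```

If the table is filled in by JavaScript after the page loads, an explicit wait would probably be needed before reading `page_source`.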

 

 

Setup in AYX.gif 


 

Alteryx workflow.png

KD82
6 - Meteoroid

I'd really like to know where to search for the APIs, as this could allow use of the Download tool. Much appreciated.

KD82
6 - Meteoroid

This is really helpful, but I have restrictions on Python library usage. I did learn a lot from what you provided 🙏🏾👏🏾

apathetichell
19 - Altair

Hey, in Chrome use Ctrl+Shift+J to open the developer console. Explore the Network tab to see what's running and identify the backend APIs. I tend to stick to that method vs Selenium unless I'm doing something which needs browser automation.

 

apathetichell
19 - Altair

One more thing: in https://www.ffiec.gov/npw/FinancialReport/ReturnFinancialReportCSV?rpt=BHCPR&id=1039502&dt=20241231 the CSV link is a combination of the report date (dt=20241231) and the specific id for that entity ("RssdId": 1039502, for example, for JP Morgan). You can try to link these calls if you want to access the sub records.
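To make the URL pattern concrete, here is a small sketch that rebuilds the sample link from an id and a date, then repeats it for a list of records. The function names and the shape of `records` (dicts with an `"RssdId"` key) are assumptions for illustration. Only the `rpt`, `id` and `dt` parameters come from the sample URL, and whether a report exists for a given id and date is not something this code checks.

```python
from datetime import date
from typing import Dict, Iterable, List
from urllib.parse import urlencode

BASE = "https://www.ffiec.gov/npw/FinancialReport/ReturnFinancialReportCSV"


def report_csv_url(rssd_id: int, report_date: date, report: str = "BHCPR") -> str:
    """Build the CSV link from an entity id and a report date (sent as YYYYMMDD)."""
    query = urlencode(
        {"rpt": report, "id": rssd_id, "dt": report_date.strftime("%Y%m%d")}
    )
    return f"{BASE}?{query}"


def urls_for(records: Iterable[Dict], report_date: date) -> List[str]:
    """One CSV link per record; each record is assumed to carry an 'RssdId' key."""
    return [report_csv_url(rec["RssdId"], report_date) for rec in records]


# Changing the date only means passing a different date object:
print(report_csv_url(1039502, date(2024, 12, 31)))
```

Inside Alteryx the same string could be assembled with a Formula tool and fed to the Download tool, so Python is not strictly required.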

KD82
6 - Meteoroid

Thank you. I'm having a little trouble with the part where I want the hierarchy for, say, JP Morgan, and with how I can modify the date. Please see attached.

Image 5-9-25 at 1.30 PM.jpeg
