
Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Scrape data via Download tool

KD82
6 - Meteoroid

I need help scraping data from https://www.ffiec.gov/npw/Institution/TopHoldings. Each top holding company has a set of subsidiaries that I'd like to download. I'm new to this, and I get a 403 error when I try. Please help, thank you!

alexnajm
18 - Pollux

If it's 403 Forbidden, then nothing more can be done with the Download tool! You'll have to find another way; maybe an RPA tool can replicate the clicks needed to download the CSV from that site.

ntakeda
12 - Quasar

I checked briefly, and it looks like the site is rejecting requests from Alteryx, likely because it blocks automated tools or requires a browser-based user agent.

I think it's difficult to achieve this using Alteryx.
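To test the user-agent explanation outside Alteryx, here is a minimal standard-library sketch that sends the request with a browser-like `User-Agent` header and reports the HTTP status. The header value is an illustrative assumption, not a value the site is known to require, and the site may still answer 403 if it blocks automated clients by other means.

```python
import urllib.error
import urllib.request

# Assumption: an illustrative desktop-browser User-Agent string.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that presents itself as a desktop browser."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def status_with_browser_ua(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status code the server sends for the browser-like request."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Still 403 here would suggest the block does not depend on the User-Agent.
        return err.code
```

For example, `status_with_browser_ua("https://www.ffiec.gov/npw/Institution/TopHoldings")` returning 200 would support the user-agent theory; if it does, the same header can be set on the Download tool's Headers tab.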

apathetichell
20 - Arcturus

https://www.ffiec.gov/npw/Institution/TopHolderList


Having said that, they have a data download. Go through any APIs or data downloads BEFORE trying any web scraping.


Sample CSV: https://www.ffiec.gov/npw/FinancialReport/ReturnFinancialReportCSV?rpt=BHCPR&id=1039502&dt=20241231
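A minimal sketch of pulling that data download directly, assuming the CSV endpoint accepts a plain scripted GET (not verified here) and that the file is UTF-8. It uses only Python's standard library and makes no assumption about the report's columns; `download_csv` and `parse_csv_text` are hypothetical helper names.

```python
import csv
import io
import urllib.request
from typing import List

# Sample report link quoted in this thread (rpt, id and dt are its query parameters).
SAMPLE_CSV_URL = (
    "https://www.ffiec.gov/npw/FinancialReport/ReturnFinancialReportCSV"
    "?rpt=BHCPR&id=1039502&dt=20241231"
)


def parse_csv_text(text: str) -> List[List[str]]:
    """Split CSV text into rows of strings; no column layout is assumed."""
    return list(csv.reader(io.StringIO(text)))


def download_csv(url: str = SAMPLE_CSV_URL, timeout: float = 30.0) -> List[List[str]]:
    """Download a CSV report and return its rows (raises HTTPError on 403 and similar)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: UTF-8 text; utf-8-sig also drops a leading byte-order mark.
        text = resp.read().decode("utf-8-sig", errors="replace")
    return parse_csv_text(text)
```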

JoshuaM
9 - Comet

Hey @KD82 


I've put together a solution using Alteryx's Python tool (found in the Developer tool palette) to scrape the table from the URL https://www.ffiec.gov/npw/Institution/TopHoldings.


The attached workflow uses the Python libraries Selenium and pandas to extract the data for pre-processing in Alteryx. If you need to scrape a different URL, the script may require minor adjustments to accommodate the new page structure.


While this solution is slightly more complex than other download methods, it automates the data extraction and preprocessing within Alteryx, eliminating the need for manual intervention 😊!
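The attached workflow isn't reproduced here. As an illustration of the table-extraction step only, this standard-library sketch collects the cell text of every `<tr>` in an HTML string, roughly what `pandas.read_html` does for simple, flat tables. It does not replace Selenium: if the table is filled in by JavaScript, the HTML has to come from a rendered browser page (for example Selenium's `driver.page_source`). `TableExtractor` and `extract_rows` are hypothetical names, and the parser assumes cells are closed with `</td>`/`</th>` and tables are not nested.

```python
from html.parser import HTMLParser
from typing import List


class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>, across all tables."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._in_cell = False
        elif tag == "tr":
            if self._row:
                self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def extract_rows(html: str) -> List[List[str]]:
    """Return every table row in `html` as a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```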


To use this solution:

  1. Install Required Libraries: If they aren't already installed, you'll need to add selenium and websockets to your Miniconda environment's site-packages, as this is where Alteryx executes Python commands. (I ran pip install from the command line, then copied and pasted the packages into that folder.)

  2. Import and Run the Workflow: Export the attached workflow in Alteryx, import the provided Python script into the Jupyter notebook within the Python tool, and run the workflow.
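Before copying packages around, it can help to confirm which interpreter the Python tool actually runs and whether the packages are visible to it. A small standard-library check to run inside the Python tool's notebook; the package list is simply the one named in these steps plus pandas, and `check_packages` is a hypothetical helper.

```python
import importlib.util
import sys
import sysconfig
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Return {package: importable?} for top-level package names (no dots)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Which interpreter is running, and which site-packages folder it installs into.
print("Interpreter:  ", sys.executable)
print("site-packages:", sysconfig.get_paths()["purelib"])
print(check_packages(["selenium", "websockets", "pandas"]))
```

If `sys.executable` points at the Alteryx Miniconda environment, running `"<that path>" -m pip install selenium websockets` from a command prompt is an alternative to copying files by hand, assuming you have write access to that folder.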


Attachments: Setup in AYX.gif, Alteryx workflow.png

KD82
6 - Meteoroid

I'd really like to know where to search for the APIs, as that could let me use the Download tool. Much appreciated!

KD82
6 - Meteoroid

This is really helpful, but I have restrictions on which Python libraries I can use. I did learn a lot from what you provided. 🙏🏾👏🏾

apathetichell
20 - Arcturus

Hey, in Chrome use Ctrl+Shift+J to open the developer console, then explore the Network tab to see which requests the page makes and identify the backend APIs. I tend to stick to that method over Selenium unless I'm doing something that needs browser automation.
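Once the Network tab shows a request that returns JSON (the Fetch/XHR filter narrows the list), its URL can be called directly instead of scraping the rendered page. A standard-library sketch: the endpoint is whatever you copy from DevTools (none is assumed here), and `find_values` is a hypothetical helper that collects every value under a given key, such as an `RssdId` field, from nested JSON. Any other field names are illustrative.

```python
import json
import urllib.request
from typing import Any, List


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """GET an endpoint copied from the DevTools Network tab and decode its JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_values(obj: Any, key: str) -> List[Any]:
    """Collect every value stored under `key` anywhere in nested dicts and lists."""
    found: List[Any] = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            found.extend(find_values(v, key))  # keep searching inside the value too
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_values(item, key))
    return found
```

Usage would look like `ids = find_values(fetch_json(copied_url), "RssdId")`, where `copied_url` is the request URL taken from DevTools.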


apathetichell
20 - Arcturus

One more thing: in https://www.ffiec.gov/npw/FinancialReport/ReturnFinancialReportCSV?rpt=BHCPR&id=1039502&dt=20241231, the CSV link combines the date (dt=20241231) with the specific ID for that entity (the "RssdId", e.g. 1039502 for JP Morgan). You can try chaining these calls if you want to access the sub-records.
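Following that pattern, a sketch that builds the CSV link from an RSSD ID and a report date, so the date can be changed or looped over. The base URL and the `rpt`, `id` and `dt` parameters come from the sample link; which other dates and IDs the endpoint actually returns data for has not been verified, and `report_url`/`report_urls` are hypothetical names.

```python
from datetime import date
from typing import Iterable, List, Union
from urllib.parse import urlencode

# Endpoint and parameter names (rpt, id, dt) come from the sample CSV link in this thread.
BASE_URL = "https://www.ffiec.gov/npw/FinancialReport/ReturnFinancialReportCSV"


def report_url(rssd_id: int, report_date: Union[str, date], rpt: str = "BHCPR") -> str:
    """Build the CSV link for one entity (RSSD ID) and one report date as YYYYMMDD."""
    if isinstance(report_date, date):
        report_date = report_date.strftime("%Y%m%d")
    return BASE_URL + "?" + urlencode({"rpt": rpt, "id": rssd_id, "dt": report_date})


def report_urls(rssd_ids: Iterable[int], report_dates: Iterable[Union[str, date]]) -> List[str]:
    """One link per (ID, date) combination, e.g. several subsidiaries across several dates."""
    dates = list(report_dates)
    return [report_url(i, d) for i in rssd_ids for d in dates]


# The sample link from this thread, then the same entity with an earlier date.
print(report_url(1039502, "20241231"))
print(report_url(1039502, date(2024, 9, 30)))
```

The same link can be built without Python: a Formula tool that concatenates the base URL with ID and date fields, feeding the Download tool, follows the same pattern.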

KD82
6 - Meteoroid

Thank you! I'm having a little trouble with this part: I want the hierarchy for, say, JP Morgan, and I'd like to know how to modify the date. Please see attached.

Attachment: Image 5-9-25 at 1.30 PM.jpeg
