

7 - Meteor

Who is doing OSINT research and analysis using Alteryx? How is Alteryx optimized to do Internet scraping to support OSINT research?

11 - Bolide

I'll be the first to admit that I had not heard of OSINT until I saw your post. Now that I have googled it and read a few articles, I am fascinated!! I was able to find where it was mentioned in another post in the community. It looks like one of the commenters with experience in that area is an Alteryx ACE who might be able to answer some questions.

Alteryx Community Team

Thanks for providing some insights, @SGolnik. Let's see if we can get the man himself to weigh in. @patrick_mcauliffe, what is your experience with OSINT research and analysis with Alteryx?

14 - Magnetar

Thanks @LaurenU!


@mikeanders I did OSINT, using Alteryx, as part of my full-time position for roughly 5 years. About six months ago I decided to go into consulting, so now I only do it for fun (haven't had a client with that need yet).


How you use it and what it works best at totally depends on what type of research you're doing, just like any other tool.

If you have something specific you're working on or thinking about, let me know and we can focus on that.


Generally speaking, though, Alteryx has a few options for scraping data from the web.

The primary tool I used was the Download tool, which is technically just an interface for cURL.

For me this cut down on development time for simple web scraping that I otherwise would have written a script for. Instead of scripting something to call cURL and then parse out the data, I could just drop the URL into a field and run it through a Download tool.

After that it was all drag-and-drop data rearrangement.
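For comparison, here is a rough idea of the kind of hand-written script the Download tool replaces. It is only a minimal sketch using Python's standard library; the names `fetch`, `extract_links`, and `LinkCollector` are made up for illustration, and pulling out link targets is just one example of "parsing out data".

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch(url, timeout=10.0):
    """Download a page and decode it to text (the part the Download tool does)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Placeholder URL; swap in the page you are actually researching.
    for link in extract_links(fetch("https://example.com/")):
        print(link)
```

Even this small version needs decisions about encoding, timeouts, and parsing, which is the development time the drag-and-drop approach saves.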

Occasionally you'll run into sites where you need to spoof the user-agent or include some other required header, but that was as simple as adding it as a preset in the Download tool's configuration options. The bonus in that situation is that you can save it into a macro so you don't need to redo it every time you write a workflow.
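In script form, the same idea (a reusable set of header presets) could look something like the sketch below. The header values and the `build_request` helper are assumptions for illustration, not anything taken from the Download tool itself.

```python
from urllib.request import Request, urlopen

# Hypothetical presets; use whatever headers the target site actually expects.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleResearchBot/0.1",
    "Accept-Language": "en-US,en;q=0.8",
}


def build_request(url, extra_headers=None):
    """Build a request carrying the preset headers plus any per-call extras."""
    headers = dict(DEFAULT_HEADERS)      # copy so the presets are never mutated
    headers.update(extra_headers or {})  # per-call values override the presets
    return Request(url, headers=headers)


if __name__ == "__main__":
    req = build_request("https://example.com/", {"Referer": "https://example.com/start"})
    with urlopen(req, timeout=10) as resp:
        print(resp.status)
```

Keeping the presets in one place plays the same role as saving the configured Download tool in a macro: every request picks them up without retyping.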

Some sites also require you to throttle your requests. Alteryx makes this very easy in that you can use the Throttle tool and the Wait a Second macro (from CReW) to create very specific request intervals.

When you need a random interval, just create a field for a random digit and have that pass the timeout parameter to the Wait a Second macro.
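The random-interval pattern is easy to picture as code. The sketch below is a plain Python analogue, not the macro itself: `fetch_with_random_waits` is a made-up name, and the 1 to 9 second range is just an assumed reading of "random digit".

```python
import random
import time
from typing import Callable, Iterable, List


def fetch_with_random_waits(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_wait: int = 1,
    max_wait: int = 9,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call fetch(url) for each URL, pausing a random whole number of
    seconds (min_wait..max_wait, inclusive) between consecutive calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Draw a fresh wait each time so requests don't land on a fixed beat.
            sleep(random.randint(min_wait, max_wait))
        results.append(fetch(url))
    return results
```

The `sleep` argument is injectable only so the pacing can be checked without actually waiting; in normal use you would leave it at its default.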


The one downside to using the Download tool for web scraping is that it doesn't (or didn't back when I was using it) support SOCKS, so scraping over the Tor network wasn't something you could do.

However, there are still other options if you need to scrape from Tor.  They just require a little more work.

You'd have to script it in Python or R, but luckily each of those is just a simple tool in Alteryx where you can drop in your code.
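As a rough illustration of the Python route, the sketch below sends a request through a local Tor SOCKS proxy. It assumes a Tor client is already running and listening on 127.0.0.1:9050 (the Tor daemon's usual default; Tor Browser typically uses 9150), and that the third-party `requests` library is installed with SOCKS support (`pip install "requests[socks]"`). The function names are made up for this example.

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Proxy settings that route both HTTP and HTTPS through a SOCKS5 proxy.

    The socks5h scheme makes the proxy resolve hostnames too, which keeps
    DNS lookups inside Tor and is what allows .onion addresses to resolve.
    """
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def fetch_via_tor(url, timeout=30.0):
    """Fetch a page through the local Tor proxy and return its text."""
    # Imported here so the helper above works even without requests installed.
    import requests

    resp = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    # Placeholder URL; requires a running Tor client to succeed.
    print(fetch_via_tor("https://example.com/")[:200])
```

Inside a workflow, the text returned here would then be handed back to the rest of the tools for the usual parsing and rearrangement.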


Overall I wouldn't compare Alteryx to any other OSINT tool because unlike OSINT tools, Alteryx isn't meant to do just one thing.

Think about Maltego, Octoparse, Creepy, etc. They're great at what they do, if you want to do just that one thing with the data you acquire using them.

But what if you get data via one of those tools and then realize you need to reuse it in another type of analysis? You'd have to write a whole new intake process in a different tool or try to output it in a compatible format. That's all very time-consuming.

I'd say Alteryx is a way to future-proof OSINT work because it is the Swiss Army knife of OSINT. You can quickly get the data and iterate through any number of different analyses (and automate it all!).


There are many more experiences and opinions I could share, but I'll stop here and pause for questions.