I am trying to set up Alteryx to help automate background checks.
I can use Google and Bing to get search engine results pages programmatically, and I can retrieve the links for each result.
The next step is to get an AI to visit the links, "read" them, summarise them, and assess their relevance to the initial search query. ChatGPT can currently do web searches interactively, but this functionality is not yet exposed on the API. I have tried scraping text from the pages at the search result links, but this produces very patchy results: the pages have headers, footers, adverts, and all sorts of garbage around the actual content, and sometimes the content is a PDF.
Does anyone have any good pointers for how to go about this?
By background checks, I suppose you mean a full-on compliance background check? If so, even if you automate the online searching, how would you account for false positives?
Companies like Thomson Reuters do that; their database is comprehensive. Does your company have access to software like that? Or are you trying to make your own using Alteryx + GenAI?
@caltang More of a scan of relatively recent news. The customers should have had full background checks to be where they are, but they can always transgress and incur risk in real time. The idea is to watch for news or recent events, court actions, etc., and surface anything that might affect the risk that this customer represents. Currently a person does this as a routine periodic process, and the idea is to automate it using Alteryx as the data driver and a GPT-type AI as the assessor for relevance.
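For the patchy scraping, one possible approach is to drop the page chrome before the text ever reaches the model. The following is only a rough sketch using the Python standard library, not a tested pipeline: the list of tags to skip, the function names `extract_main_text` and `build_relevance_prompt`, the prompt wording, and the `max_chars` limit are all assumptions made for illustration. It does not call any AI API and does not handle PDFs, which would need a separate text extractor.

```python
from html.parser import HTMLParser

# Assumption: content inside these tags is usually navigation, adverts,
# or scripts rather than the article itself. Tune this list per site.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "form"}


class MainTextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while we are inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    """Return the text outside the skipped elements, one chunk per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_relevance_prompt(query, page_text, max_chars=6000):
    """Build a hypothetical prompt asking a model to summarise and score a page."""
    excerpt = page_text[:max_chars]  # crude truncation to keep the prompt small
    return (
        f"Search query: {query}\n\n"
        f"Page text:\n{excerpt}\n\n"
        "Summarise the page in two sentences, then rate its relevance "
        "to the search query from 0 (unrelated) to 5 (directly relevant)."
    )


if __name__ == "__main__":
    sample = (
        "<html><body><header>Site menu</header>"
        "<p>Acme Ltd was fined by the regulator.</p>"
        "<footer>Cookie policy</footer></body></html>"
    )
    print(build_relevance_prompt("Acme Ltd court action", extract_main_text(sample)))
```

The resulting prompt string could then be passed to whichever model API you have access to. Asking for a numeric relevance score alongside the summary gives you something to threshold on in Alteryx, although a person would probably still need to review anything that gets surfaced, given the false-positive concern raised above.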