Alteryx Designer Ideas

Share your Designer product ideas - we're listening!

Featured Ideas

When using the text mining tools, I have found that a template is only applied to the page of a document whose page number matches the page the template was built on.

 

In my use case I have a PDF file with 100+ claim statements that are all laid out the same way (one page per statement). When setting up the template, I used one page to set the annotations and then fed it into the T anchor of the Image to Text tool. My PDF document with 100+ pages goes into the D anchor of this tool. However, when examining the output, I only get results for page 1.

 

On examining the JSON for the template I can see that there is reference to the template page number:

cgoodman3_0-1604393391514.png

 

Using a Generate Rows tool and a Formula tool to replace the page number in the JSON with pages 1 to 100 doesn't work. I then discovered that if I change the page number on the image input side, I get the desired results.

 

cgoodman3_1-1604393499357.png

However, as I suspect this is a common use case for the Image to Text tool, an improvement would be to add an option in the tool's configuration to apply the same template to all pages.

 

cgoodman3_4-1604393738275.png

Please remove from the stopword list all words that help to identify the sentiment of a text. For example, words like 'no' and 'not' are currently removed when you enable the 'remove stopwords' option. Here is an example:

 

grossal_0-1606336161535.png

grossal_1-1606336172201.png

 

 

People will probably use the option to remove stopwords without thinking about issues like this, strip relevant information from their texts, run a Sentiment Analysis afterwards, and wonder why the results are bad.

 

Dear Alteryx, please find a better stopword list or remove some words from the list.
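
To illustrate the problem outside of Designer, here is a minimal Python sketch of stopword removal with and without negation words. The stopword list, the `NEGATIONS` set and the `remove_stopwords` function are made up for this example; they are not the list or logic the Text Pre-processing tool actually uses.

```python
# Illustrative stopword list only; NOT the list used by the Text Pre-processing tool.
STOPWORDS = {"a", "an", "the", "is", "was", "this", "it", "no", "not", "nor", "never"}

# Words that flip the sentiment of a sentence and should survive stopword removal.
NEGATIONS = {"no", "not", "nor", "never"}


def remove_stopwords(text, keep_negations=True):
    """Drop stopwords from text, optionally preserving negation words."""
    blocked = STOPWORDS - NEGATIONS if keep_negations else STOPWORDS
    return " ".join(w for w in text.split() if w.lower() not in blocked)


# With negations removed, the sentence reads as positive.
print(remove_stopwords("This is not a good product", keep_negations=False))
# With negations kept, the sentiment-bearing "not" survives.
print(remove_stopwords("This is not a good product", keep_negations=True))
```

The first call yields "good product" and the second "not good product", which is exactly the difference a sentiment model needs to see.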

Hello!

I recently had a use case that required me to start using the text mining tools in a reporting workflow, and I had pretty good success. However, my workflow does not always receive data, and when 0 records are supplied to a Text Pre-processing tool, it produces an error and a log file. I imagine this is a small fix, just a weird one to run into.

Steps to replicate:
1- Add a text input tool with some dummy text values as part of a field

TheOC_0-1625041927041.png

 

2- Add a text pre-processing tool to the workflow, and configure it to use this field and English.

TheOC_1-1625041976621.png

 

3- Add a filter tool before the text pre-processing tool and configure it to filter out all records, so that the text pre-processing tool receives 0 rows

TheOC_2-1625042026104.png

 

4- Run the workflow

TheOC_3-1625042045969.png

TheOC_4-1625042053768.png


As the Data Cleansing tool does not behave similarly, I believe this is an unintended outcome.

With the new Intelligence Suite there is much heavier use of blob files, and we would like to be able to read them with a regular input instead of having to use non-standard tools like Image, Report Text, or a combination of Directory/Blob Input or Input/Download to pull in images, etc. I would like to see the standard Input tool capable of bringing in blob files as well.

Blob Input, Image Input, Text Input

In the new Intelligence Suite tools, it would be extremely useful to have the option to add n-grams (combinations of adjacent words/tokens) in the Topic Modeling Text Mining tool.

 

This is important in many NLP topic modeling scenarios.

It would provide more flexibility to build better NLP models.

 

For details on n-gram

https://en.wikipedia.org/wiki/N-gram
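
For anyone unfamiliar with the term, here is a minimal Python sketch of what building n-grams from a list of tokens looks like. The `ngrams` helper and the underscore joiner are assumptions made for illustration; this is not how the Topic Modeling tool works internally.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, each joined with an underscore."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "machine learning topic model".split()

# Bigrams keep word pairs such as "machine_learning" together as a single token.
bigrams = ngrams(tokens, 2)
print(bigrams)
```

Feeding the bigrams to a topic model alongside the single words lets phrases like "machine learning" count as one term instead of two unrelated ones.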

 

 

I have a PDF of 27 pages and each page is identical. The headers, footers and data are in the same position on each page. It would be great if I could define the text to parse out on the first page and then have that definition used to parse all of the pages in the PDF. It would make the tool far more useful.

Instead of, or in addition to, being able to enter additional stop words manually, it would be great to have an optional input connection where you could point to a file with additional stop words in it. Typing the additional stop words in is very manual.
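
As a rough sketch of the behaviour I have in mind, written in Python rather than as a Designer tool: read one stop word per line from a file and merge it with the built-in list. The function names and the one-word-per-line file layout are assumptions for this example.

```python
from pathlib import Path


def load_stop_words(path):
    """Read one stop word per line from a text file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def merge_stop_words(default_words, extra_path):
    """Combine a default stop word collection with the words found in a file."""
    return set(default_words) | load_stop_words(extra_path)
```

Calling `merge_stop_words(["the", "a"], "extra_stop_words.txt")` (hypothetical file name) would return the default words plus everything listed in the file.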


Hello all,

 

As Intelligence Suite is a great expansion of Alteryx Designer, it would be great to expand the data types supported in the "Text Mining" and "Computer Vision" ribbons. The "Image Template" accepts only "string" data types and specific languages. It would be great if more data types, languages and ISO codes were added.


Hello all,

Here is a very, very simple idea:

just having the ability to set a minimum word length in Text Pre-processing.
Why? A lot of words (especially in French) with 1 or 2 characters do not change the meaning or the sentiment: la, le, à, un, une, etc. In English: in, the, etc.

Best regards,

Simon
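
The minimum-length idea above could be sketched in Python like this. The `drop_short_words` function and its `min_length` parameter are hypothetical names for illustration, not an existing Text Pre-processing option.

```python
def drop_short_words(text, min_length=3):
    """Keep only the words that have at least min_length characters."""
    return " ".join(w for w in text.split() if len(w) >= min_length)


# French example: the 2-character articles "le" and "la" are dropped.
print(drop_short_words("le chat est sur la table", min_length=3))
```

With `min_length=3` the example sentence becomes "chat est sur table".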


Hello,

As of today, only English is available. It's hard to convince French customers with French-language data to buy the AIS if it cannot work with their data.

Best regards,

Simon


Hello,

As of today, we can configure one language for all rows, but that doesn't work when I have several languages in my data:

simonaubert_bd_0-1622199279906.png

 

I would like to be able to select a field that specifies the language. The ideal would be a two-level configuration: one language set globally, which I can override by choosing a field that contains the language.

Best regards,

Simon
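
The two-level configuration described above could look something like this in Python. The `resolve_language` function, the `lang` field name and the language labels are assumptions for the sketch, not existing tool settings.

```python
def resolve_language(row, default_language="english", language_field=None):
    """Return the language from the row's field if set, else the global default."""
    if language_field:
        value = (row.get(language_field) or "").strip().lower()
        if value:
            return value
    return default_language


rows = [
    {"text": "Great product", "lang": "english"},
    {"text": "Très bon produit", "lang": "french"},
    {"text": "Sehr gut", "lang": ""},  # empty field: falls back to the global setting
]

languages = [resolve_language(r, "english", "lang") for r in rows]
print(languages)
```

Rows with a value in the language field use it, and rows with an empty or missing value fall back to the globally configured language.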


Hello,

 

I think that for financial companies it would be very helpful to have an algorithm to analyse sentiment on various topics in articles, tweets, LinkedIn, FB, etc. It could help, for example, to understand what the market thinks about regulatory developments, projects and hot topics. Most importantly, financial companies normally deal with very special, industry-specific types of text, so the VADER algorithm, which is broadly used for tweets, does not perform well on financial data. I would suggest adding the FinBERT model (and the BERT model as such), which are among the leading models in AI (BERT is used in the Google search engine). The pre-trained models are freely accessible. It would be very helpful if the range of models were extended to FinBERT for banks, FS teams and asset managers, BERT for general use, and MedBERT for pharma.


I would like to share an idea that would definitely be useful for quickly automating the process of reading and correctly recognizing text from PDF input. I am writing about it here in the hope that somebody has already thought about it.

 

The idea is to improve the "PDF Input" and "Image to Text" tools from the "Text Mining" category so that the text of a PDF document is read properly, no matter where it is positioned on each page.

 

The performance of the combined "PDF Input" and "Image to Text" tools could also be improved, as they work more slowly than the custom PDF Input tool does.

 

The idea could also be expanded into an entirely new tool that performs all the actions needed to read a PDF document correctly without manual intervention.


In the next product version, can the parameter options for topic modelling be changed to allow the output of both the word relevance summary and the interactive chart? It's a bit strange to have to run the tool twice to get this output.
