Hi All,
I am trying to filter columns containing certain software. Is there a way to not only filter the column based on a text i.e., SAP but to also delete all other text in that cell except for the Text (text in the filter criteria).
Below is an example of text in a cell. Please let me know, thanks
FileMaker Pro, ITTIA ODBC, Microsoft Access, Microsoft SQL Server, Microsoft SQL Server 2008, Microsoft Team Foundation Server (TFS), MongoDB, Oracle Database (RDBMS), SAP HANA, Sybase, UDB
Solved! Go to Solution.
You cannot filter and modify the field's value at the same time. You would have to use a Filter tool then add a Formula tool to get rid of the extra text you don't want.
Hope this helps!
That makes sense. Thanks. I am having an issue though, im getting no true outputs. Below is the formula im using an below that is an example of a field. Please let me know, thanks
(Column name is Business Intelligence/Big Data)
[Business Intelligence/Big Data] IN ("SAP Business Intelligence (BI)","SAP BusinessObjects","SAP Crystal Reports","SAP NetWeaver BPM","SAP BusinessObjects Web Intelligence")
Adobe Analytics, Adobe SiteCatalyst (Omniture), Apache Hadoop, Clearfit, ClickTale, Domino, Gerber AccuMark, Google Analytics, IBM Cognos Enterprise, IBM SPSS Advanced Statistics, iDashboards, Keen, Keynote, L2, MEGA, MicroStrategy Business Intelligence, Oracle BI Publisher, Oracle BlueKai, Oracle Business Intelligence Discoverer, Oracle Business Intelligence Enterprise Edition, Oracle OBIEE, PARIS, SAP BusinessObjects, SAP BusinessObjects Web Intelligence, SAP Crystal Reports, SAS Enterprise BI Server, SAS Visual Analytics, Splunk, Tableau, Tableau Desktop, Twitter Universal Website Tag |
@AustinRiggs94 Because you're using the IN() function, Alteryx is looking for the keywords you're passing to it to be the entire value of the field. Instead, you'll want to use the Contains() function to check if your values are in any part of the field.
@Kenda Thanks for your help, i got the formulas working, however im having trouble removing the extra text within the field. I understand the RegEx is probably my best option, but i do not know how to properly write a RegEx formula to look for text such as "SAP BusinessObjects". Do you have an example or a link that explains? Please let me know, thanks.
Hey @AustinRiggs94
You're right, RegEx is probably your best bet. Using your above example, I was able to get rid of the extra text with this expression in a Formula tool:
REGEX_Replace([Business Intelligence/Big Data], "(.*)(SAP BusinessObjects)(.*)", "$2")
This is basically splitting up your field into three sections (each set of parenthesis) then tells Alteryx to only keep what's within the second set.
Hope this helps!
Thanks! @Kenda im still understanding how Regex works and how to write certain expressions. Is there a way for RegEx to pull multiple phrases such as SAP BusinessObjects, SAP Crystal Reports, SAP HANA? Or would i have to write an expression for each phrase?
@AustinRiggs94 If you're wanting to keep multiple phrases, I have an alternative way to do this that might be better.
Instead of editing the existing field to parse it down to only the phrases you want, what if you create a new field and build it using only the phrases you want? For example, create a new field in a Formula tool with this expression:
IIF(Contains([Business Intelligence/Big Data], "SAP BusinessObjects"), "SAP BusinessObjects", "")
Then, create another expression in the same Formula tool to edit the field you just created like this:
IIF(Contains([Business Intelligence/Big Data], "SAP Crystal Reports"), [programs]+", SAP Crystal Reports", [programs])
Continue doing this with all of the phrases you want to search for. In the end, you will get a new field that only has a list of your desired programs that are contained in the original field. In the picture below, I only have two because I only checked for two.
Hi @AustinRiggs94 ,
Great work so far with @Kenda . Regex is an amazing tool, and I highly recommend looking into the Weekly Challenges to learn more about RegEx and see how people are leveraging its power.
I took a very different approach. Took the long string and parsed into rows using Text to Columns, and cleaned up extra spaces with Data Cleansing (unnecessary, I just prefer that). Then used a filter for all rows containing SAP. Finally, if you want them all in the same cell, concatenate back together with a summarize tool, where the separator is ", " a comma followed by a space.
If you have multiple rows of this complicated cell, add a Record ID tool before the Text to Columns, and then in the summarize tool, use the Record ID field as a "Group By" field.
No RegEx, still flexible. The beauty of Alteryx is being able to solve the same challenge in many ways.
Cheers!
Esther
Hi All,
Thanks @Kenda and @estherb47 i tried both of your ways, and both of them worked. I ended up having to do a set of formulas for each software name. The excel i had, had multiple columns of these fields, for multiple entities. (see attached). For future reference is there a way to Regex_replace or parse an entire spreadsheet? Im sure in the future i will come across this and i would like to be prepared. I know there has to be someway to do this, i just do not know how yet. Please let me know, thanks again for all of your help.
Please note, there are multiple columns in the original excel that look like the one given.