I am very new to Alteryx and seem to have a pretty complex question in my first project I want to realize.
To give you a clue what I am asking about, I will provide you example data as a screenshot and as an excel file attached to this topic.
In order to use this Data, which is derived from a survey, to analyze correlations between for example the size of the company (which was another question in the survey) and the activities they perform (as seen in this screenshot), I would need to first encode this data into numbers. At lest that's what I learned. But before I can encode this data, I (in my opinion) need to logically split up this data into several columns to see which features on their own are correlating and not a combination of features. So my question basically is, if there is a way to do so?
To maybe give you more options in order to help me, I will explain two methods that came into my mind in order to reach my goal, which I was not able to realize in Alteryx so far. The first one would be to create new columns that are named as all options available in all cells above that I want to split up and then for each row to put a 1 into the column if it is part of the cell I want to split and a 0 if not. The second one would be kind of a database approach to normalize this table. I would create new tables with all possible options of the cells I want to split up, give them an ID and then join them via foreign keys. But I think this second option would not help me to encode in a way that I can calculate correlations.
I hope I could make my very complex problem clear enough for you to understand. If not, please comment your question and I will try to answer asap. Any help is very much appreciated!
since this is your first post, welcome to the Alteryx Community!
On a note before I go into your problem: You are right now in the German Community and it also looks like you are capable of German because the Sample Data shows German words, therefore it would be great if we continue in case of questions in German - anyway, I'll answer it in English because the request was English. (I will write the German Tool Names in () just in case your Designer is set to German).
On thing you will learn very quickly - most things are actually super easy in Alteryx. Your task can be solved with just a handful tools - let me show you how!
The basic workflow looks like this:
And here is what happens:
1. We add a RecordID (Datensatz-ID) tool to identify the separate rows later on when we bring them back into the original shape (Default Tool settings)
2. We use a Text-To-Columns (Text-zu-Spalten) Tool with the Comma Separator and split it INTO ROWS, this will come in handy in just a minute.
Now the data looks like this:
3. The data already looks a lot better, but we can immediately notice (by the red indicator) that we have leading whitespaces because the comma-separated list included them. We can fix this with the Data Cleansing (Datenbereinigung) Tool with the default options.
The option to remove leading whitespaces it automatically ticked. I'd recommend to remove the 'RecordID' column in the top, because we don't need to change something here - but it wouldn't something if you leave it as it is. This is more relevant if you have bigger datasets.
4. As you already mentioned in your suggestion, we need some sort of indicator, therefore I simply add a Flag-Column with an 1 in it - This can be done with a Formula (Formel) Tool. Make sure to set the data type to int, as we might want to summarize it later on.
5. Time for the magic! The Cross Tab (Kreuztabelle) Tool pivots the data back into the original shape. We use the our RecordID to group the data and our shifted values as the new column names. The trick here is, that in case it can not find a value, for a certain column, it will just leave it empty.
The result looks like this:
If you want the sum, you could also remove the RecordID in the Group-By Values and this will do the job, but I'd recommend to do this separately (hint hint, I did this in the advanced container in case you might take a look).
You might want to do one, two or all three steps here. (I'll skip the exact tool configurations here, you can explore them in the attached workflow. The are pretty simple again, in case of questions feel free to ask) 1) I like to rename the columns back, because the Cross-Tab replaces the spaces in the column names with _, this is done using the Dynamic Rename (Dynamisches Umbenennen)
2) I than use the Summarize (Zusammenfassen) Tool to sum it up.
3) The data is easier to read if it's shifted (another Transpose-Tool), but maybe that's just me. ( The naming can be adjusted in the previous step)
You can find the workflow attached. You will need to change the Input file location of the sample to the location of the sample on your computer. Maybe you also have to change the workflow version if you do not use 2020.4, this can be done with a simple text editor. Just right-click the workflow in the windows explorer and open it in notepad and change the second line to your version.
Let me know if you have any questions!
2021-01-06-split up and encode different values from one cell to several columns normalize.yxmd
vielen Dank für Deine Antwort und die ganze Mühe! Die Alteryx Community ist tatsächlich so gut, wie sie mir beschrieben wurde! 🙂 Ich dachte, dass mein Post weltweit zu sehen wäre, deshalb habe ich auf Englisch geschrieben. Also auch Danke für diesen Hinweis!
Ich werde Deinen Workflow heute Abend einarbeiten und Rückmeldung geben, ob alles funktioniert oder eventuell Rückfragen stellen.
gerne! Beim Ausführlichen erklären lernt man selbst auch immer nochmal ein wenig dazu 🙂
Ich freue mich das wir deinen Erwartungen und den Versprechungen anderer gerecht geworden sind :'). Es gibt auch ein weltweites Forum, aber ich vermute du bist durch deine Sprachsetting erstmal hierher geleitet worden. In der internationeln Community liegt der Fokus oft nur auf dem reinen "lösen" von Posts und weniger dem Erklären. Hier in der deutschen Community nehmen wir uns auch gerne die Zeit ausführlicher über die Details zu sprechen.
Nachdem du den Post schon als Lösung markiert hast, vermute ich mal, dass er dir geholfen hat 🙂