Forum - Deutsch

MichaelHa · 03-28-2021

Hello everybody,

I've run into a problem and am currently stuck. Maybe the community can help me here. I would like to match values across several rows using the Contains function. For example:

Input:

Fruits	total	text
Apple	5	Apples tastes great with pears
pear	1	Pears tastes great with apples
strawberry	7	Strawberrys tastes great with cherrys
orange	3	Oranges tastes great with limes
cherry	6	Cherrys tastes great with strawberrys
lime	6	Limes tastes great with oranges

Output:

Fruits	total	text	fruits Match	total
Apple	5	Apples tastes great with pears	pear	1
pear	1	Pears tastes great with apples	Apple	5
strawberry	7	Strawberrys tastes great with cherrys	cherry	6
orange	3	Oranges tastes great with limes	lime	6
cherry	6	Cherrys tastes great with strawberrys	strawberry	7
lime	6	Limes tastes great with oranges	orange	3

Thanks,

Michael

reply_mueller · 03-28-2021

Hello @MichaelHa,

This is actually quite easy in Alteryx. You need to compare every target "fruit" with every text line. To do so, you build a so-called cross product of the input with itself.
This is easy to do with the Append Fields tool by connecting the same input to both the Source and the Target anchor.

Afterwards, you just filter for the matching rows, and voilà.

Configure the Append Fields tool to allow all appends, regardless of how many Target records there are.
CAREFUL: For an input of n rows this produces n² output rows, so the result can become huge for large inputs.

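Make sure to also filter out the rows where "Fruits" equals "fruits Match". Otherwise every fruit also matches its own text (e.g. "Apple" is contained in "Apples tastes great with pears"), which gives wrong results.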
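The Append Fields + Filter approach can be sketched in Python as a stand-in for the workflow, not as Alteryx itself: `itertools.product` plays the role of the cross product, and the hypothetical helper `contains` is assumed here to behave like Alteryx's `Contains()` in its case-insensitive mode. The self-match check corresponds to the extra filter on "Fruits" vs. "fruits Match".

```python
from itertools import product

# Sample data from the question: (fruit, total, text)
rows = [
    ("Apple", 5, "Apples tastes great with pears"),
    ("pear", 1, "Pears tastes great with apples"),
    ("strawberry", 7, "Strawberrys tastes great with cherrys"),
    ("orange", 3, "Oranges tastes great with limes"),
    ("cherry", 6, "Cherrys tastes great with strawberrys"),
    ("lime", 6, "Limes tastes great with oranges"),
]

def contains(text: str, target: str) -> bool:
    """Hypothetical helper: case-insensitive substring test, assumed to mirror Contains()."""
    return target.lower() in text.lower()

def match_rows(rows):
    result = []
    # Cross product: every source row paired with every target row (n * n pairs).
    for (fruit, total, text), (match_fruit, match_total, _text) in product(rows, rows):
        # Drop self-matches, e.g. "Apple" inside "Apples tastes great ..."
        if fruit == match_fruit:
            continue
        if contains(text, match_fruit):
            result.append((fruit, total, text, match_fruit, match_total))
    return result

for row in match_rows(rows):
    print(row)
```

On the sample data this reproduces the expected output table from the question, one match per row.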

P.S. Since this forum is the German Alteryx Community, we're also happy to communicate in German.

Best regards

Johannes
(Blue Reply)

grossal · 03-29-2021

Hello @MichaelHa,

Welcome to the Alteryx Community and the German Forum! If you can, feel free to make your next post in German 😉

Thank you @reply_mueller for looking at it.

Best

Alex

MichaelHa · 03-29-2021

Thanks for the quick suggestion! However, it isn't feasible with a dataset of more than 300,000 rows.
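For scale, here is the arithmetic behind that concern, assuming the full cross product from the first answer (every row appended to every row) and taking the stated 300,000 rows as a lower bound:

```latex
n \ge 300\,000 \;\Rightarrow\; n^2 \ge \left(3 \times 10^{5}\right)^2 = 9 \times 10^{10} \text{ row pairs}
```

That is at least 90 billion rows, each of which needs one Contains evaluation.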

reply_mueller · 03-29-2021

Hello @MichaelHa,

You can, of course, reduce the amount of data to be matched by inserting a Sample or Unique tool before the Target anchor. The resulting dataset then shrinks to #(input rows) * #(search terms), and that is exactly the number of comparisons you need in the end anyway. Alteryx is relatively fast when it comes to processing large amounts of data. Give it a go.
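A minimal Python sketch of this optimization, standing in for the workflow: the hypothetical `unique_terms` helper keeps the first record per distinct fruit (roughly what a Unique tool on the Fruits column does), and only those terms are crossed with the input rows. To make the deduplication visible, it assumes a hypothetical input in which the six sample rows from the question appear twice.

```python
from itertools import product

# Sample data from the question: (fruit, total, text)
sample = [
    ("Apple", 5, "Apples tastes great with pears"),
    ("pear", 1, "Pears tastes great with apples"),
    ("strawberry", 7, "Strawberrys tastes great with cherrys"),
    ("orange", 3, "Oranges tastes great with limes"),
    ("cherry", 6, "Cherrys tastes great with strawberrys"),
    ("lime", 6, "Limes tastes great with oranges"),
]

# Hypothetical larger input: the same six rows twice, so fruit names repeat.
rows = sample * 2

def unique_terms(rows):
    """First (fruit, total) per distinct fruit, similar to a Unique tool on Fruits."""
    first = {}
    for fruit, total, _text in rows:
        first.setdefault(fruit, total)
    return list(first.items())

def match_with_unique_terms(rows):
    terms = unique_terms(rows)
    matches, comparisons = [], 0
    # Cross product of all input rows with the deduplicated terms only.
    for (fruit, total, text), (term, term_total) in product(rows, terms):
        comparisons += 1
        # Case-insensitive substring test, skipping the row's own fruit.
        if term != fruit and term.lower() in text.lower():
            matches.append((fruit, total, text, term, term_total))
    return matches, comparisons

matches, comparisons = match_with_unique_terms(rows)
print(comparisons)     # → 72 (12 rows × 6 terms)
print(len(rows) ** 2)  # → 144 (full cross product)
```

Here the saving is a factor of two; with real data it depends on how many distinct search terms remain after deduplication.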

In the optimized case, it would then look like this.

Another option would be to dynamically build a complex RegEx from these search terms and use it to match against the original table.

Whether that brings significant speed advantages, however, would have to be tested.
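The RegEx idea might look roughly like this, sketched with Python's `re` module rather than an Alteryx RegEx tool configuration: the distinct search terms are escaped with `re.escape`, joined into one case-insensitive alternation, and each text is scanned once with `finditer`. The helper names are hypothetical. One caveat of this design: `finditer` returns non-overlapping matches, so overlapping terms can be missed; sorting longer terms first makes the longer one win when two terms could match at the same position.

```python
import re

# Sample data from the question: (fruit, total, text)
rows = [
    ("Apple", 5, "Apples tastes great with pears"),
    ("pear", 1, "Pears tastes great with apples"),
    ("strawberry", 7, "Strawberrys tastes great with cherrys"),
    ("orange", 3, "Oranges tastes great with limes"),
    ("cherry", 6, "Cherrys tastes great with strawberrys"),
    ("lime", 6, "Limes tastes great with oranges"),
]

def build_pattern(terms):
    """Compile all distinct search terms into one case-insensitive alternation."""
    # Longer terms first, so a longer term wins when two could match at the same position.
    ordered = sorted(set(terms), key=len, reverse=True)
    return re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)

def match_with_regex(rows):
    # Map each lowercased term back to its original spelling and total.
    lookup = {}
    for fruit, total, _text in rows:
        lookup.setdefault(fruit.lower(), (fruit, total))
    pattern = build_pattern(fruit for fruit, _total, _text in rows)
    result = []
    for fruit, total, text in rows:
        # One scan per text instead of one Contains() call per (row, term) pair.
        for m in pattern.finditer(text):
            term, term_total = lookup[m.group(0).lower()]
            if term != fruit:  # skip the row's own fruit
                result.append((fruit, total, text, term, term_total))
    return result

for row in match_with_regex(rows):
    print(row)
```

Whether one large alternation actually beats the row × term comparison depends on the engine and the number of terms, so timing both on the real data is the only reliable check.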

Best regards

Johannes
(Blue Reply)

StephV · 04-19-2021

Hello @MichaelHa,

Thank you very much for accepting the answer from as the solution.

I'm glad to see that the German community (thanks @reply_mueller 😎) was able to help you.

Will you be attending the User Group tomorrow?

Enjoy Alteryx! If you have any questions, we're always happy to help here in the forum.

Have a nice day,

Steph Vitale-Havreng


Contains matching over several lines