Hi community,
This might be a simple question:
If a line contains the same pattern more than once, how can I parse all occurrences into multiple rows?
With the pattern (\d.{0,4}.?[Kk][Gg]) it finds and parses a value like 15,5 kg only once. But because I work with unstructured data, the lines look different, e.g. record 34: live_weight 2500kg, loin 2,8 Kg, rear 18kg, vegi_25lbs etc.
I would like to make sure that all entities containing kg are parsed out into separate columns in the same process, so:
RECORD 34: "live_weight 2500kg, loin 2,8 Kg, rear 18kg, vegi_25lbs etc" becomes --> col1: 2500kg; col2: 2,8 Kg; col3: 18kg. In addition, I would also like a consolidated count: parsed = 3 out of 4.
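As a tool-independent illustration, here is a minimal Python sketch of "all kg matches into col1, col2, ... plus a parsed count". It assumes a tighter pattern than the one above (a number with an optional comma/period decimal, optional space, then kg in any case) and assumes that every "number followed by letters" counts as an entity for the "out of" total; the function name `parse_weights` is hypothetical.

```python
import re

# Assumed pattern: digits, optional [.,]digits, optional space, then "kg" (any case).
WEIGHT_KG = re.compile(r"\d+(?:[.,]\d+)?\s?[Kk][Gg]")
# Assumed definition of an "entity": any number followed by a unit made of letters.
ANY_QUANTITY = re.compile(r"\d+(?:[.,]\d+)?\s?[A-Za-z]+")


def parse_weights(record: str) -> dict:
    """Return every kg match as col1, col2, ... plus a 'parsed' summary."""
    kg_values = WEIGHT_KG.findall(record)          # all non-overlapping matches, left to right
    total = len(ANY_QUANTITY.findall(record))      # every quantity, whatever its unit
    row = {f"col{i}": value for i, value in enumerate(kg_values, start=1)}
    row["parsed"] = f"{len(kg_values)} out of {total}"
    return row


record = "live_weight 2500kg, loin 2,8 Kg, rear 18kg, vegi_25lbs etc"
print(parse_weights(record))
# {'col1': '2500kg', 'col2': '2,8 Kg', 'col3': '18kg', 'parsed': '3 out of 4'}
```

The key point is using a "find all matches" operation rather than a single match; the number of columns then simply follows the number of hits in each line.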
2) I have found that several variants of the same product are specified in the same line. E.g. in this example there are in fact 3 different motors:
RECORD: MOTOR 2500RPM, 2,8KW, 18NM, 7,5A,346V 1750RPM, 3,3KW, 18NM, 7,5A,398V 2000RPM, 3,7KW, 18NM, 7,6A,447V
How can I make a regex split that recognizes the repeated string pattern, so that I get:
MOTOR 2500RPM, 2,8KW, 18NM, 7,5A,346V
MOTOR 1750RPM, 3,3KW, 18NM, 7,5A,398V
MOTOR 2000RPM, 3,7KW, 18NM, 7,6A,447V
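One way to split on a repeated pattern is to treat the token that starts each variant as the split point. Here is a minimal Python sketch, assuming every variant begins with "<number>RPM" and that the text before the first such token (here "MOTOR") is a shared prefix to repeat on every row; `split_variants` is a hypothetical name.

```python
import re
from typing import List

# Assumption: each motor variant starts with "<number>RPM"; that token marks a new row.
VARIANT_START = re.compile(r"\d+\s?RPM", re.IGNORECASE)


def split_variants(record: str) -> List[str]:
    """Split one record into one string per variant, repeating the shared prefix."""
    starts = [m.start() for m in VARIANT_START.finditer(record)]
    if not starts:
        return [record.strip()]                    # no variant marker: keep the line as-is
    prefix = record[:starts[0]].strip()            # e.g. "MOTOR"
    bounds = starts + [len(record)]
    rows = []
    for begin, end in zip(bounds, bounds[1:]):
        spec = record[begin:end].strip().rstrip(",")   # drop trailing space/comma separators
        rows.append(f"{prefix} {spec}".strip())
    return rows


line = ("MOTOR 2500RPM, 2,8KW, 18NM, 7,5A,346V 1750RPM, 3,3KW, 18NM, 7,5A,398V "
        "2000RPM, 3,7KW, 18NM, 7,6A,447V")
for row in split_variants(line):
    print(row)
```

If some records start variants with a different field (e.g. power instead of speed), the `VARIANT_START` pattern is the one thing to adjust.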
Because I am looking through a lot of lines, solving this would be a tremendous help.
br anitta
Hi @Anitta,
Not sure, but I'm guessing you're using the Formula tool and parsing with the functions there? Consider using the Parse RegEx tool (https://help.alteryx.com/10.5/RegEx.htm), which accepts a regular expression and has an option to split results to rows.
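For comparison outside Alteryx, here is a minimal Python sketch of the difference between keeping only the left-most match and splitting every match into its own row. It reuses the pattern from the question unchanged; the function names and the (record_id, text) input shape are hypothetical.

```python
import re

# The pattern from the original question, unchanged.
PATTERN = re.compile(r"(\d.{0,4}.?[Kk][Gg])")


def first_match(text):
    """Single-match parsing: only the left-most hit is returned (or None)."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


def split_to_rows(records):
    """Emit one (record_id, match) row for every hit in every record."""
    return [(rid, hit) for rid, text in records for hit in PATTERN.findall(text)]


text = "live_weight 2500kg, loin 2,8 Kg, rear 18kg, vegi_25lbs etc"
print(first_match(text))
for row in split_to_rows([(34, text)]):
    print(row)
```

Keeping the record id on each row is what lets the split rows be grouped or pivoted back per record later.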
Hi John,
You are right, I do use the parsing tool, and I am stumbling into issues with regex. Because each line contains the same kind of info several times, the tool limits me: I can't be sure I get all data for the same pattern, because it only retrieves the left-most match.
I've uploaded a sample of the data here. The lines actually contain data for three different items, and the parse tool only tells me about one of them.
br anitta
Hi Joe
I am amazed. Thank you for your solutions. I am just now trying them on a big sample of data to find out whether I can combine your solutions in the same workflow or whether I have to do it in separate steps. But your solutions do seem to work :-)
Since the data is very unstructured, I am not altogether sure how the data behaves across databases, but this is one of the biggest issues so far.
Thanks anitta
Hi
I have been working on your suggested solution because I need two more steps in order to analyse the data:
1) shuffle all pattern items to columns, and
2) concatenate the values for each header text (the value if available, otherwise null).
Basically, for each unique row I would like the patterns shuffled to columns, with the value for each pattern attached in a second row. In this process I need to check (partly manually) whether "matching patterns" are actually the same or not, but ultimately I would like the columns for matching patterns concatenated and the values (null or actual value) listed underneath in a separate row. From where I am now, I cannot use the Cross Tab tool because I only have the name text and no values, so I was thinking of parsing out the values in a separate step and afterwards concatenating based on record ID.
Do you see another way around this? I have attached the workflow with the additional step so far.
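One alternative is to parse the header text and its value together in a single pass (two capture groups), so each record already has name/value pairs, and then pivot on the record ID. Here is a minimal Python sketch of that idea, not an Alteryx workflow; the `PAIR` pattern, the `RecordID` header, and the second record are made up for illustration.

```python
import re

# Hypothetical "name value" pattern: a word (letters/underscores), a space or
# underscore separator, then a number with optional decimal part and a unit.
PAIR = re.compile(r"([A-Za-z][A-Za-z_]*?)[\s_]+(\d+(?:[.,]\d+)?\s?[A-Za-z]+)")


def pivot(records):
    """Turn (record_id, text) pairs into a header row plus one value row per record.

    Names missing from a record become None (null). If a name repeats inside
    one record, the last value wins.
    """
    parsed = {rid: dict(PAIR.findall(text)) for rid, text in records}
    # Header names in first-seen order across all records.
    headers = list(dict.fromkeys(name for pairs in parsed.values() for name in pairs))
    table = [["RecordID"] + headers]
    for rid, pairs in parsed.items():
        table.append([rid] + [pairs.get(h) for h in headers])
    return table


records = [
    (34, "live_weight 2500kg, loin 2,8 Kg, rear 18kg, vegi_25lbs etc"),
    (35, "loin 3,1 Kg, rear 20kg"),  # made-up second record
]
for row in pivot(records):
    print(row)
```

Names that refer to the same thing but are spelled differently would still need to be unified before the pivot, which corresponds to the partly manual check described above.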
br anitta