I have a file where multi-valued columns are represented as duplicate rows where one column is different:
Username, Name, Groups
johns, John Smith, admin
johns, John Smith, wheel
billt, Bill Thomas, staff
I want these the johns row to be a single entry with an array value for column 'Groups' that has ["admin", "wheel"]
The actual source of this data are standard unix /etc/passwd and /etc/group files. I have everything merged between these using a left join that seems to work fine to figure out the group names that each user is a member of, except I end up with a separate row for each group for each user.
Solved! Go to Solution.
Hi Tristan,
You can use the aggregate list function to achieve what you want.
Simply group by username and name, then use the list function on the Groups column.
See screenshot below.
For reference on list function, see documentation here:
https://docs.trifacta.com/display/PE/List+Function
Thanks,
Wei
Hi,
I hope this might help you.. try a list()
Perfect. Thanks for the help, Wei.
Appreciate the input Sridar, but I needed a bit more information to know where to put the LIST function within the aggregation transform.
what is your trifacta version?? if its 5, then
step1.: have 3 columns and values as below:
Step2:
add new recipe: type pivot
Step 3:
In Row Lable, select username and name
In values: type list(groups)
i hope this helps!.
missing images..step1
missing image step2:
missing image: step3
Hi Sridar,
Apologies for the miscommunication. I had already resolved the issue based on the input from Wei Zheng.
Thanks again.
Tristan