Hi All,
I'm seriously stuck as to how I would approach this.
I have a data base that has multiple lines that have to be consolidated. I feel like a multirow + a multi field tool could accomplish this task but I have never used those tools.
There are multiple examples of logic in the attached excel examples. I don't even know where to start. Any help would be much appreciated, I'm drowning and the deadline is within 2 days.
I looked briefly, pulling in your first two scenarios. You're looking for the infamous "one two punch" of Transpose and Crosstab tools. Take a look at this example and see if it aligns. The workflow is also attached.
Since you're dealing with many "ATTRIBUTE" columns, you need to make sure dynamic fields are enabled on the tools, so that they scale with your needs.
Hopefully this gets you in the right ballpark. -Jay
This workflow might give you a starting point. It's not dynamic but since you have a short deadline it may do the trick. The setup worked for the first example you gave; it did not work for the second example because I wasn't sure what constitutes a unique record. For example, you joined Supplier Site "40-2 FLEETWOOD COURT|1" onto "40-2 FLEETWOOD COURT". Should these be joined onto each other? If so, just remove the pip delimiter and number at the end of the Supplier Site name column.
Also, I'm confused how data that is in Attrribute4 is then output into Attribute7 (second screenshot). I could be missing something but if you can clarify this I might be able to help more.
Hi Hsandness,
The unique key would be Supplier Name, Procurement BU, and Address. Supplier Site isn't considered. If there are duplicate keys mentioned earlier and they also have values in ATTRIBUTE4, then the start of the values will be in ATTRIBUTE7, 8, 9 etc
So, I tried coming up with a fix for those, but then they could have the above rule but also can have all the others as well, I'm not sure how to combine the rules all into one while making the modifications and keeping the original number of rows intact. I've attached part of the flow I am working on if you'd like to see, I think I am getting to somewhere, but it singles the rows out individually and adds duplicate ATTRIBUTE's when I need all of 000000495|3 to be in ATTRIBUTE4, and then all of 000000495|4 in ATTRIBUTE7, and then 000000495|5 to be in AATTRIBUTE8, etc. There are a few scenarios that have more advanced logic than the examples I was given / passed on to you here, like if duplicates in the same ATTRIBUTE column, then move farther out, like a cascade.
I'm fairly busy today and I know you're running on a tight deadline as well, so I apologize, but I won't be any help before this needs to be done. I may take a look at it over the weekend to see if I can still come up with a solution. If I do, I'll get back to you before Monday.
Otherwise, depending on how many records you have and how many times you will have to repeat this process, you could always do it the old-fashioned way (in Excel) to just get it done and not have to deal with the headache of building custom logic.
Completely understandable I appreciate you taking the time out of your day to even look at it in the first place. Funny part is I'm helping out another team that didn't give me the full logic and this tight of a turn around. I spent all day yesterday poking holes in it because I was finding discrepancies between their examples and the data.
@Skyline_TnL Did you get a chance to play around with Transpose and Crosstab? Maybe I missed some part of your need, but I really do think it can be as simple as that to dynamically align the different values coming from the attribute columns... Let me know, I may have more time to support if needed. -Jay
Yes thank you! I never hit reply, I guess. However, there are other issues with it as far as I can see, I tried combining yours with a basic filter on attribute4 and there seems to duplication issues still with some examples. Something like Supplier name "Verizon". I attached the workflow and its entire data set above, I tried a work around to carry those values that match the unique key Taxpayer ID or Supplier Name + Procurement BU + Address Name and move them to different ATTRIBUTE columns alike a cascade. There seems to be duplicates in ATTRIBUTE4 and ATTRIBUTE5 columns still.
In the original data, under Address NAME "Assessments LLC" is another example. You had concatenate as the output but it was combing them and I needed them stretch across the different columns ATTRIBUTE4 to ATTRIBUTE 7 / 8 /9 if there was more than 1 grouping. ATTRIBUTE5 to 10/11/12 etc.
Ya, i'm seeing what you mean with the concatenate. The problem is, now that i've added in all your use cases, is that your data has values stacked under A4 and A5, for example and you're expecting them to know to transfer themselves over to A7 or A8. Without helping the workflow know when and where to cast those values, there isn't a way to force them over to different Attribute columns in the output.
Regarding the duplicates, when trying Join methods, for example is that you have multiple values that match up to themselves (Nulls in particular), so you're going to need to either figure out a way to remove those rows during the merging process or you're going to have to compare afterwards and remove the extra rows by seeing where A4 = A5 (as one example) and filter those out.
To me though, the biggest issue with the data is that you have an expectation that multiple values in A5 are supposed to know that they'll become an 8 in the output (A4 to A7 in another example)...
~~~~~~~~~~~~~~~~~~~~~~
Okay, so I played around with this a little more... If you can ignore what Attribute it gets re-assigned to, but need to prioritize putting the values on the same row; then you can rename the "Name" field after the Transpose and then when you bring it into the Crosstab it'll concatenate everything that matches the key. Then you can text-to-columns them back out into unique fields. Obviously I'm getting more and more creative here, but see if this spurs some ideas to help get you where you need to be. It should still scale nicely for you (attached).
Other than that, I think i'm tapped out. Hope this is useful, -Jay