Hi,
I'm working with some data and from looking at a sample it looks like there are a few values where someone has manually inputted "N/A" and similar values. I would usually open data in excel and look at what values are in the column using the filter option, to do the relevant find and replace, but the main data that I'm using is too big for my computer to open in excel.
Is there any way to create a list of the unique values in each column? e.g. -
Strings | Numbers |
a | 1 |
b | 1 |
b | 2 |
a | 3 |
c | 3 |
Would return -
Strings | Numbers |
a | 1 |
b | 2 |
c | 3 |
Thanks!
Solved! Go to Solution.
Hi @rachel9989
You can use the Summarize tool once for each column, using Group By as the method, and then (as long as there are the same number of rows), you can join together by row position.
Let me know if that helps!
Cheers,
Esther
Hi Esther,
Thanks for this, do you know if there's a way to do this in one step? I have about 30 fields so this would be helpful.
Thank you!
Hi @rachel9989
your question can be taken in one of two ways. @estherb47 has a correct solution to generate a distinct list of rows per field. I would add that using a Join Multiple by row position will generate the list regardless of whether the number of values per field is equivalent. See attached wf1. Again, credit to @estherb47
one downside of doing it in that fashion is the need for a new Summarize tool per incoming field. Check out wf2 for a dynamic way of getting the same list. This is the second workflow. It can handle any amount of incoming fields. Please note that getting these lists in any fashion can be computationally expensive as the data grow in size.
If I read into your question a different way, it sounds like you want to decompose every field into a list of strings and numbers. That is trickier. I'm working on that and will post shortly.
Hi @rachel9989 ,
Here is an answer to the second interpretation of your question. I am not sure you're looking for this and it might be largely academic. Still, fun workflow. Between this and the wf2, your question probably has been resolved.
Thanks, @seven
Definitely a fun exercise!!
@rachel9989 , you can try this approach to use one summarize tool overall.
Transpose the data to get everything into 2 columns. I include the sort for my own sanity in viewing, and for the downstream Tile tool.
Unique on both columns will get you all unique header/value columns. The Tile tool assigns row numbers to rebuild the table. And then a crosstab tool to rebuild.
Let me know if this is easier. No summarize required!
Cheers,
Esther
Hi @estherb47
good tag team! I like your approach, one thing I've noticed is that a Multirow is drastically more performant than the Tile tool. Might be good to do a side by side performance profile.
@EstherB47 @seven thank you both! Sorry, in my original post it made it seem like I wanted to separate strings and numbers - I just meant these as example columns, I should have done this differently... but both give the result I was looking for! thanks so much 🙂
Great solving with you, @seven . That's my favorite part of community, working through challenges with great minds
Cheers,
Esther