This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I am convinced one of my main problems is….I don’t know the right vocabulary, so when I search for a solution, I am not asking the right question.
So forgive me.
I have a list of businesses with various data attached to each. In this case, Walmarts in Wyoming.
I want each row to represent ONE physical location. However, the data I have breaks down the subsegments of Walmart (Pharmacy, bakery, etc) I want to agg them up in a certain way.
full_walmart_list is an example of the complete file. You will see there are 51 records, but analyzing the zip/lat/long reveals there are only 7 walmarts.
Final_walmart_list is what I want as the result of my workflow.
Here are some ideas/assumptions/issues:
I think I need to distinguish walmarts by looking at zip/lat/long. I can not just rely on lat/long (and yes that is unfortunately the level of detail on lat/long I have). If I did just lat/long, the first 14 records would be considered 1 walmart – but they are in different zips. And that’s true. In Cheyenne, there is a Walmart 5 miles away from a Walmart.
Likewise, I am guessing I can’t rely on JUST zips – as there are some big ip codes in the U.S.
I want to add up all the EMPLOYEES, REVENUE and COUNTS of Walmart departments. This is represented in columns G,H,I in final_walmart_list.
I want to collect the NAICS code (business code) of all the departments I did add up. This is represented in columns J-R in final_walmart_list.
Now the trick here is……what ONE record do I choose to “represent” the business in its entirety? Here’s what I came up with – use the record that has the most employees. Doesn’t always work – but don’t know what else to do.
Remember – I am trying to do this with EVERY business in the US – so it’s not just walmarts in Wyoming.
I wouldn’t even mind anyone pointing me in the right direction (which tools to use)
Now the out-of-the-box macro is designed to take in a full address to provide this matching capability, but there is no reason why you could not modify/create another batch macro that looked at matching on alternative criteria.
Your problem seems to be primarily around knowledge discovery, in terms of Alteryx tooling the above groups of tools should equip you with all your need to slice and dice your data.
Specifically Fuzzy Match, with its extended custom options will allow you to match on different field partially and tools under Transform will allow you to group and transform your values to get one record in the output, which i understand is your primary objective.
It's both, the purple set of icons are from Join/Fuzzy Match, whereas the orange ones are from Transform section of the Alteryx palette of tools.
What i tried to emphasise is that your problem is primarily around matching the knowledge and expertise into a workflow (it's not just a set of tools that will do the trick, as when you open the data-set beyond Walmart, you may notice and need to adopt to a new findings about your data ) - this is where Fuzzy Match match and it's partial matching capabilities come in.
Transform tools like: Summarize, Transpose, CrossTab will give you some magical powers to organise your data and assemble the final output - for example in the attached workflow i've summed the 2x numerical columns at the end and whereas group by at different levels is allowed withing the same operation, Cross Tab is not and thus the need to use an extra tool - luckily the group field from Fuzzy Match is now a common denominator and allows to slice and dice your data however you want.
Hope this makes, if you need help with any particular tool or scenario, just let me know?