I have a question about classification of string values.
Essentially, I have scraped data from a variety of online sources that give me four fields of data:
Store Name | Product Description | Price | Location
I want to compare this data against the Consumer Price Index, using a linear regression, to understand which stores or areas are contributing to price inflation.
The store name is standardized, the price is in USD, the date the data was collected is the last run of the workflow, and the location is a 5-digit ZIP code. The issue I am having is with the Product Description.
The product description could be a number of items such as: "Bread", "Round Bread", "Ciabatta", "French Brioche". All of these in theory should fit under the group "Cereal and Bakery Products".
What is the best method to approach this problem? I was thinking of using a find-and-replace with a Keyword and a Group, but I found that a keyword like "Bread" could also pick up "Breaded Chicken" as a "Cereal and Bakery Product". Therefore, I would need to be very specific, and the classification would be very labor intensive. Does anyone have any other recommendations for classification?
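One way to soften the "Bread" versus "Breaded Chicken" problem is to match keywords as whole words rather than as substrings. The following is only a minimal sketch in Python: the keyword table and the names `KEYWORD_GROUPS`, `build_patterns`, and `classify` are hypothetical, and only the group name and sample descriptions come from the post above.

```python
import re

# Hypothetical keyword table: group name -> keywords that imply the group.
# The keywords listed here are illustrative, not a complete mapping.
KEYWORD_GROUPS = {
    "Cereal and Bakery Products": ["bread", "ciabatta", "brioche"],
}

def build_patterns(keyword_groups):
    """Compile one case-insensitive, whole-word regex per group."""
    patterns = {}
    for group, keywords in keyword_groups.items():
        alternation = "|".join(re.escape(k) for k in keywords)
        # \b anchors the match to word boundaries, so "bread" matches
        # "Round Bread" or "breads" but not "Breaded".
        patterns[group] = re.compile(rf"\b(?:{alternation})s?\b", re.IGNORECASE)
    return patterns

PATTERNS = build_patterns(KEYWORD_GROUPS)

def classify(description):
    """Return the first group whose keywords appear as whole words, else None."""
    for group, pattern in PATTERNS.items():
        if pattern.search(description):
            return group
    return None

for text in ["Bread", "Round Bread", "Ciabatta", "French Brioche", "Breaded Chicken"]:
    print(text, "->", classify(text))
```

Whole-word matching does not remove the manual work entirely. A description such as "Bread Knife" would still land in the bakery group, so unmatched and suspicious rows would still need a review pass.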
Sounds like fun! Help me understand how you'd like to use "Product Description" in conjunction with CPI data. There are tons of tables available for CPI. On the surface it's going to be hard because you don't have anything to compare it to, but you might be able to use a corpus that already exists out there and/or use other data sets to help you. We just need to get clear on how exactly the product description will help you here.
The BLS has an API that allows you to extract information on Consumer Price Index - Average Price Data.
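For reference, a request to that API might look roughly like the sketch below. It assumes the public v2 time series endpoint and its JSON field names (`seriesid`, `startyear`, `endyear`, `registrationkey`, and the `Results` / `series` / `data` response layout) as I recall them, so please check them against the current BLS documentation. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for the BLS public API, version 2.
BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"

def build_request(series_ids, start_year, end_year, api_key=None):
    """Build (but do not send) a POST request for one or more series IDs."""
    payload = {
        "seriesid": list(series_ids),
        "startyear": str(start_year),
        "endyear": str(end_year),
    }
    if api_key:
        # Optional registration key; assumed to raise the request limits.
        payload["registrationkey"] = api_key
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BLS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fetch_series(series_ids, start_year, end_year, api_key=None):
    """Send the request and return the decoded JSON response."""
    request = build_request(series_ids, start_year, end_year, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)

def extract_rows(response):
    """Flatten a response into (series_id, year, period, value) tuples."""
    rows = []
    for series in response.get("Results", {}).get("series", []):
        for obs in series.get("data", []):
            # Values are kept as strings because missing data may not be numeric.
            rows.append((series["seriesID"], obs["year"], obs["period"], obs["value"]))
    return rows
```

A call such as `extract_rows(fetch_series(["APU0100702111"], 2020, 2021))` would then give rows that can be joined to the scraped prices by item and period.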
The data I'm looking at is usually grocery store data, and I want to show how different locations and stores differ vastly from each other. The price data can help me group things, because the BLS has already grouped and tracked these items over time. However, they don't provide much data about specific locations.
I'm trying to match to descriptions such as the one below:
APU0100702111 Bread, white, pan, per lb. (453.6 gm)
Hopefully, I can find a more generic group from the API for Bread, Bakery goods, etc.
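To group series by item regardless of area, it may help to split the series ID into its parts. This sketch assumes the Average Price (AP) layout as I understand it from the BLS series ID documentation: a 2-character survey prefix, a 1-character seasonal code, a 4-character area code, and a 6-character item code. The function names are hypothetical.

```python
def split_ap_series_id(series_id):
    """Split an Average Price series ID into its assumed fixed-width parts."""
    if len(series_id) != 13 or not series_id.startswith("AP"):
        raise ValueError(f"not an AP series ID: {series_id!r}")
    return {
        "survey": series_id[:2],       # "AP"
        "seasonal": series_id[2],      # e.g. "U"
        "area_code": series_id[3:7],   # 4 characters
        "item_code": series_id[7:],    # 6 characters
    }

def parse_series_line(line):
    """Split a 'SERIES_ID description' line into ID parts plus description."""
    series_id, description = line.split(None, 1)
    parts = split_ap_series_id(series_id)
    parts["description"] = description.strip()
    return parts

print(parse_series_line("APU0100702111 Bread, white, pan, per lb. (453.6 gm)"))
```

With the item code separated out, the same item can be compared across area codes, and the description text can be used as the target when classifying the scraped product descriptions.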