Separating the two workflows, this is what each one does. I haven't checked whether what's being done is actually valid.
- Linear Regression:
- This takes a list of diamond prices plus some variables describing each entry. The Linear Regression tool uses that data to train a model of how those variables combine to make up the price. Once the model is trained, it's output via the O connector and into the Score tool. The Score tool ingests some new diamond info and uses the model to predict the price from the variables provided.
- The Reports of the model are output to Excel for some reason. In practice, you would use these reports to work out which variables to select, etc., in order to train the model better.
- As for the summarise and formula... no idea, must be related to the actual task.
- Scatter Plot:
- Just a simple scatterplot to look at the data and see if anything stands out. If the points looked evenly dispersed across the grid, we would know that we're not going to see a trend without, for instance, stratifying the data. Here, we can see the expected correlation: as Carat goes up, so does price. That's just an assumption at the moment, though.