Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

(Advice on) Use of PCA for Predictive Analysis

Meteor

Dear Alteryx Community,

 

I have a few questions regarding the use of PCA for predictive analysis.

 

I have around 16 Xs (process parameters) which are leading to Y (product quality) for an industrial process.

 

Among the 16 Xs, 6 relate to the same equipment, so I am considering PCA to reduce the computational load, since I expect them to behave similarly.

Once I have run the PCA analysis, I get 6 principal components as follows:

 

[Screenshot: screenshot.2019-03-18 (2).png, PCA output with the 6 principal components]

 

My first conclusions are (and please correct me if I am wrong):

- The correlation between the Xs and PCs varies between -1 and 1

- PC1 is more related to the last 4 Xs (around 50% for each)

- PC2 is more related to "E7_Speed" (-0.92)

- By including PC1 to PC4 in my dataset instead of the 6 Xs, I still retain 95% of the variance.
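
The conclusions above can be checked numerically. The following is a minimal sketch, assuming the PCA is fitted with R's `prcomp`; the data frame `X` and its columns `x1` to `x6` are simulated stand-ins for the 6 equipment Xs, not the poster's data. If the table in the screenshot shows loadings (as `pca$rotation` does), the entries are weights rather than percentages: the squared entries of each column sum to 1, and the correlation between an X and a PC is the loading multiplied by that PC's standard deviation when the data are scaled.

```r
# Hypothetical data: 6 columns standing in for the 6 equipment Xs
set.seed(1)
n <- 200
base <- rnorm(n)  # shared driver that makes x1..x4 correlated
X <- data.frame(
  x1 = base + rnorm(n, sd = 0.3),
  x2 = base + rnorm(n, sd = 0.3),
  x3 = base + rnorm(n, sd = 0.3),
  x4 = base + rnorm(n, sd = 0.3),
  x5 = rnorm(n),
  x6 = rnorm(n)
)

# Fit the PCA on centered and scaled columns
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Loadings: one column per PC, each column has unit length
loadings <- pca$rotation

# Share of variance explained by each PC, and the running total
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of PCs whose cumulative share reaches 95%
k <- which(cum_var >= 0.95)[1]

# Correlation between each original X and each PC score (always within [-1, 1])
cor_x_pc <- cor(X, pca$x)
```

`summary(pca)` prints the same cumulative proportions, which is a quick way to confirm how many PCs are needed to reach a chosen variance threshold.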

 

After training a predictive model on the 4 PCs and the 10 remaining Xs, I get a new model. What is the right way to test it on a new dataset?

Is there a way to extract the PCA formula to apply it to the new dataset? (I suspect it would not make sense to run a second PCA on the new data)

 

Thanks a lot for your advice,

Pierre-Louis

Alteryx

You are correct that it does not make sense to run a second PCA for your second dataset.

 

R provides the ability to extract the PCA formula and apply it to a new dataset with the predict() function. Unfortunately, this isn't built directly into the GUI. I would recommend using an R code node to build your PCA and apply the formula to the new dataset.
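
As a rough illustration of what "the PCA formula" means here: a `prcomp` object stores the training centers, scales, and rotation matrix, and `predict()` applies them to new rows. The sketch below uses simulated data (the data frames `train` and `new` and their columns `a`, `b`, `c` are hypothetical) and checks that `predict()` matches the manual calculation.

```r
# Hypothetical training and new data with the same columns
set.seed(42)
train <- data.frame(a = rnorm(50), b = rnorm(50, 10, 3), c = runif(50))
new   <- data.frame(a = rnorm(10), b = rnorm(10, 10, 3), c = runif(10))

# Fit the PCA once, on the training data only
pca <- prcomp(train, center = TRUE, scale. = TRUE)

# Scores for the new data via predict(): reuses the training centers and scales
scores_predict <- predict(pca, newdata = new)

# The same scores by hand: (x - center) / scale, then multiply by the rotation
scores_manual <- scale(as.matrix(new), center = pca$center, scale = pca$scale) %*%
  pca$rotation

# TRUE when both routes give the same scores
same <- isTRUE(all.equal(scores_predict, scores_manual, check.attributes = FALSE))
```

Because the formula is just `pca$center`, `pca$scale`, and `pca$rotation`, those three objects are all that would need to be carried over to score a later dataset.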

Meteor

Thanks @AndrewKramer for your confirmation.

Unfortunately, using an R code node to extract the relevant information is far beyond my competencies... so far... but I will explore this option :-)

Alteryx

This link provides a pretty good example of how to do it. Let me know if you have any questions.

 

https://stats.stackexchange.com/questions/72839/how-to-use-r-prcomp-results-for-prediction

 

 

Meteor

Hi @AndrewKramer !

I have started to learn Python as my 2000s programming skills (PHP mostly!) are obsolete!

Unfortunately, I cannot build on the example you have provided.

Would you be so kind as to guide me on the example I have designed?

Alteryx

I was able to build a quick example similar to yours. I used the R code tool to construct the PCA model, then got the PCA values for the training and the validation datasets. I was then able to build my Gradient Boosting model on the training data and score the validation data.

[Image: pca.PNG]

Here is my code in the R tool. Be sure to use read.Alteryx and write.Alteryx to pass data to and from Alteryx.

#Load Data
data <- read.Alteryx("#1", mode="data.frame")
new_data <- read.Alteryx("#2", mode="data.frame")

#Columns that feed the PCA
pca_cols <- c('age','balance','duration','previous','campaign')

#Run PCA on original data (center and scale each column)
pca <- prcomp(data[pca_cols], retx=TRUE, center=TRUE, scale.=TRUE)

#Score PCA on original data
#Select 4 Principal Components
pca_data <- data.frame(predict(pca, data[pca_cols])[,1:4])
pca_data$y <- data$y

#Score PCA on new data, reusing the PCA fitted on the original data
#Select 4 Principal Components
pca_new_data <- data.frame(predict(pca, new_data[pca_cols])[,1:4])
pca_new_data$y <- new_data$y

#Write to Alteryx
write.Alteryx(pca_data, 1)
write.Alteryx(pca_new_data, 2)

Let me know if you have questions.

Meteor

Hi @AndrewKramer !

Victory! I have been able to adapt your code, calculate the PCA parameters on the first set of data and apply them to a second one:

 

#Load Data
data <- read.Alteryx("#1", mode="data.frame")
new_data <- read.Alteryx("#2", mode="data.frame")

#Run PCA on original data
pca <- prcomp(data[,3:8], retx=TRUE, center=TRUE, scale=TRUE)

#Score PCA on original data
#Select 4 Principal Components
pca_data <- data.frame(predict(pca, data[,3:8])[,1:4])
pca_data$y <- data$y

#Score PCA on new data
#Select 4 Principal Components
pca_new_data <- data.frame(predict(pca, new_data[,3:8])[,1:4])
pca_new_data$y <- new_data$y

#Write to Alteryx
write.Alteryx(pca_data, 1)
write.Alteryx(pca_new_data, 2)
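
The original question also involved keeping the remaining Xs next to the PCs. One possible way to do that is sketched below on simulated data, so it runs outside Alteryx. Everything named here is an assumption for illustration: the columns `e1` to `e3`, `other1`, `other2`, the helper functions `make_data` and `to_model_frame`, and the use of `lm` as a simple stand-in for the Gradient Boosting model mentioned in the thread. Selecting columns by name rather than by position (`3:8`) also guards against the two inputs having a different column order.

```r
# Hypothetical generator for a training set and a new set with the same columns
set.seed(7)
make_data <- function(n) {
  eq <- rnorm(n)  # shared driver for the equipment-related columns
  d <- data.frame(
    e1 = eq + rnorm(n, sd = 0.2),
    e2 = eq + rnorm(n, sd = 0.2),
    e3 = rnorm(n),
    other1 = rnorm(n),
    other2 = rnorm(n)
  )
  d$y <- 2 * eq + d$other1 + rnorm(n, sd = 0.5)
  d
}
data <- make_data(300)
new_data <- make_data(100)

pca_cols  <- c("e1", "e2", "e3")          # columns replaced by PCs
keep_cols <- c("other1", "other2", "y")   # columns passed through unchanged
n_pc <- 2                                 # number of PCs to keep

# Fit the PCA on the training data only
pca <- prcomp(data[pca_cols], center = TRUE, scale. = TRUE)

# PC scores from the fitted PCA, side by side with the untouched columns
to_model_frame <- function(df) {
  pcs <- predict(pca, df[pca_cols])[, seq_len(n_pc), drop = FALSE]
  cbind(as.data.frame(pcs), df[keep_cols])
}
train_mf <- to_model_frame(data)
test_mf  <- to_model_frame(new_data)

# Stand-in model (linear regression) trained on PCs + remaining Xs
fit  <- lm(y ~ ., data = train_mf)
pred <- predict(fit, newdata = test_mf)

# Out-of-sample error on the new data
rmse <- sqrt(mean((test_mf$y - pred)^2))
```

In the R tool, `train_mf` and `test_mf` would take the place of `pca_data` and `pca_new_data` in the `write.Alteryx` calls, so the downstream model sees both the PCs and the remaining Xs.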

Thanks a lot for guiding me through the whole process, I owe you one!

Pierre-Louis

Alteryx

Glad to help
