
Alteryx Designer Desktop Discussions

SOLVED

Use of Large CSV Files with Data Input Tool

mkeiffer
10 - Fireball

Hi everyone,

 

I need some tips/guidance on how to proceed with some CSV files that I am working on in Alteryx.  Each dataset has only 2,205 records, but there are 6,000 fields per record.

 

When I import this data file, I reduce the configured field size to 10 characters, but the workflow still takes a fair amount of time to complete.  Even after it completes, Alteryx tends to freeze or lock up when I navigate to other parts of the workflow.

 

Is this file too big for Alteryx, or is there anything I can do to make this work faster and stop Alteryx from locking up?  The file comes in as a tab-delimited file, and I actually only need the first 10 fields or so.  Is there a way to limit the number of columns the Input Data tool initially brings in, or anything else I can do to stop Alteryx from freezing up?

 

Thanks in advance!

 

Mike

24 REPLIES
Qiu
21 - Polaris

@mkeiffer 
It is an interesting issue.
Would it be possible to share some sample data?

I don't know how to limit the columns in the Input Data tool, but maybe we can set the delimiter to something else and bring the 6,000 fields in as one column?
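
To illustrate the one-column idea outside Alteryx, here is a minimal Python sketch. It treats each record as a single raw string and splits off only the leading fields, leaving the rest unparsed. The function name first_fields and the sample line are made up for illustration, and the sketch assumes no field contains a quoted tab.

```python
from typing import List


def first_fields(line: str, n: int = 10, delimiter: str = "\t") -> List[str]:
    """Return the first n delimited fields of one raw line."""
    # maxsplit=n stops after n splits, so the rest of the line stays as
    # one unsplit string, which the slice then drops.
    return line.rstrip("\r\n").split(delimiter, n)[:n]


# Hypothetical sample: one record with 6,000 tab-separated values.
raw_line = "\t".join(f"v{i}" for i in range(6000)) + "\n"
print(first_fields(raw_line, 3))
```

The same approach should carry over to a workflow that reads each line as one field and then splits it, but that has not been tested here.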

mkeiffer
10 - Fireball

@Qiu I have tried changing the delimiter and the length of each field to see if that would help, but no success.  Everything is slow and Alteryx locks up.

 

I was attempting to use Alteryx to do some linking and data blending with these files, and this is one of about 15 files or so that I am working on with this case study.  I may have to use Python to try to deal with the large number of fields in the file, but I was curious to see if I could do it in Alteryx.

 

Thanks so much for responding!
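
For the Python route mentioned above, one option is to trim each file to its leading columns before Alteryx ever opens it. This is only a sketch using the standard csv module. The function name trim_columns, the file names, and the demo data are hypothetical, and UTF-8 encoding is an assumption about the real files.

```python
import csv
import os
import tempfile


def trim_columns(src_path, dst_path, n_cols=10, delimiter="\t"):
    """Copy src_path to dst_path, keeping the first n_cols fields of each row."""
    rows_written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter)
        for row in reader:
            writer.writerow(row[:n_cols])  # drop everything past n_cols
            rows_written += 1
    return rows_written


# Demo on a small, made-up file (3 rows x 50 columns).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "wide.tsv")
    dst = os.path.join(tmp, "narrow.tsv")
    with open(src, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(
            [[f"r{r}c{c}" for c in range(50)] for r in range(3)]
        )
    print(trim_columns(src, dst))  # number of rows copied
```

The trimmed file could then be read with the regular Input Data tool, and the function called once per file for the roughly 15 files in the case study.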

Yoshiro_Fujimori
15 - Aurora

Hi @mkeiffer ,

 

I am not sure why you cannot read all the data, but to select only the first 10 columns, you may want to try the R tool or the Python tool to read it, instead of the Input Data tool.

 

I used the R tool as a sample.

[Screenshot: sample workflow with the R tool]

 

Code in R tool

[Screenshot: R code]

 

Input Data

[Screenshot: input data]

 

Output Data

[Screenshot: output data]

 

It should be possible to do the same with the Python tool if you prefer.

Good luck.

mkeiffer
10 - Fireball

@Yoshiro_Fujimori Thank you so much for sharing this.  This was extremely helpful.  Hope you have a great day today!

mkeiffer
10 - Fireball

@Yoshiro_Fujimori I did change the path when using the attached workflow.  However, I got an error message when running the workflow, and it said "Error in library(readr): there is no package called 'readr'."  I have installed the R predictive analytics tools with my version of Alteryx, but I do not have R installed on my computer (I do have Python).

 

Do I need to install the latest version of R on my computer in order to get the R tool in Alteryx to work correctly?  

 

Again, thanks so much in advance for your help!

Yoshiro_Fujimori
15 - Aurora

Hi @mkeiffer ,

 

Please try to install the missing library like this:

 

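# Install readr on first use. Naming a CRAN mirror explicitly may help if no default mirror is configured.
if (!require(readr)) {
    install.packages("readr", repos = "https://cloud.r-project.org")
    stopifnot(require(readr))
}
# Replace this path with the location of your own file.
# For a tab-delimited file, use read_tsv() instead of read_csv().
df_sample <- read_csv("C:\\Users\\yoshi\\OneDrive\\ドキュメント\\1.Project\\202304\\Sample.csv")
# Keep only the first 10 columns (base R indexing, so dplyr is not required).
df_sample <- df_sample[, 1:10]
write.Alteryx(df_sample, 1)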

 

If it still doesn't help, please consider installing the missing library using this tool on the Community Gallery.

https://community.alteryx.com/t5/Community-Gallery/Install-R-Packages/ta-p/878756

 

I am not sure if it works as I have not used it. Good luck.

mkeiffer
10 - Fireball

@Yoshiro_Fujimori Thanks again!  I tried both approaches: I adjusted the Alteryx configuration to install the readr package, and I also ran the Alteryx tool to install the R packages several times, but no luck.  Is it possible the readr package is not included with the Alteryx downloads?  Should I try installing R on my machine and then adding the readr package that way?

 

Also, is it possible to see specifically what packages the Alteryx tool/workflow has available to install?

 

Again, thanks so much for your help!

 

Mike

Yoshiro_Fujimori
15 - Aurora

Hi @mkeiffer ,

 

Sorry to hear that my proposed solution didn't work.

The R package for Alteryx is installed here.

"C:\Program Files\Alteryx\R-4.1.3\"

 

You may want to check README.md in this folder and follow the procedure in the section "A Recipe For Upgrading to a New R Version".

 

Or, if you are more familiar with Python, you may try the Python tool to read the CSV file.

Yoshiro_Fujimori
15 - Aurora

Hi @mkeiffer ,

 

If you are stuck with the R library for Alteryx, you may also try the Python tool.

 

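from ayx import Package
# Uncomment to install the packages if they are missing.
#Package.installPackages(['pandas','numpy'])

from ayx import Alteryx

import pandas as pd
# For a tab-delimited file, add sep='\t' to read_csv.
df = pd.read_csv('C:\\your path here\\Sample.csv')

# Keep only the first 10 columns; the slice must be assigned back to df.
df = df.iloc[:, :10]

Alteryx.write(df,1)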

 

Good luck.
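
As a possible refinement, pandas can skip the unwanted columns at read time instead of loading all 6,000 and slicing afterwards. The sketch below uses an in-memory, made-up sample in place of the real file, so it runs outside Alteryx. The column names and values are assumptions for illustration only.

```python
import io

import pandas as pd

# Made-up stand-in for the real file: 5 rows x 200 tab-separated columns.
header = "\t".join(f"col{i}" for i in range(200))
body = "\n".join("\t".join(str(r * i) for i in range(200)) for r in range(5))
sample = io.StringIO(header + "\n" + body + "\n")

df = pd.read_csv(
    sample,
    sep="\t",                  # the original file is tab-delimited
    usecols=list(range(10)),   # keep only the first 10 columns, by position
    dtype=str,                 # skip type inference; convert later as needed
)
print(df.shape)
```

Inside the Python tool, the file path would replace the in-memory sample, and the result would be passed to Alteryx.write(df, 1) as in the code above.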
