Hi everyone,
I need some tips/guidance on how to proceed with some CSV files that I am working on with Alteryx. In the datasets, the number of records is 2,205 records but there is 6,000 fields per record.
When I attempt to import this data file, I adjust the configuration for the size of the fields down to 10 characters in length, but it still takes a fair amount of time for the workflow to complete. Even after the workflow completes, Alteryx tends to freeze or lockup when I navigate to other parts of the workflow.
Is this file too big for Alteryx, or is there anything I can do to make this work faster and stop Alteryx from locking up? The file comes in as a Tab delimited file, and I have actually only need the first 10 fields or so. Is there a way to limit the number of columns the Data Input tool will initially bring in or anything else I can do stop Alteryx from freezing up?
Thanks in advance!
Mike
Solved! Go to Solution.
I don't think the R/Python ways are really getting at the core issue here. The core issue is that your rows are 60,000 characters long - but you only want the first 1000 characters or whatever. I'd recommend bringing this in as non-delimited ((\0 delimiter) - expanding the size of your data to 250000 characters - and then truncating your data post read (you could use a left([field1],1000) to take only the first 1000 characters in a particular row. You can then parse the fields as you want. You could actually do this in your initial input data (and set your field length as 1000 characters) and just ignore the resulting truncated fields warning.
Having said that - you could be having a readr issue because you are running a different version of Alteryx (and different R) than @Yoshiro_Fujimori 's solution...
@apathetichell Thank you so much for your response and I will try this approach as well. In regards to the version of Alteryx, I am using the latest version of Alteryx and the latest version of the R package tools. I would like to see if I can get the R approach to work that was shared by @Yoshiro_Fujimori as I like this approach too. I also would like to make sure I have the full functionality of all of the R tools in Alteryx as well.
Thanks so much for responding!
@Yoshiro_Fujimori Thank you again so much for your great help! I was able to successfully install the readr package for R, but now when I go to run my workflow, it says I have an error in my Select statement.
Here is the configuration from my workflow.
library(readr)
library(dplyr)
df_sample <- read_csv("C:\\Alteryx Workflows\\Data Club Project\\Source Data Original\\PS2.txt")
df_sample <- select(df_sample, c(1:10))
write.Alteryx(df_sample, 1)
Again, thanks so much for your help with this!
@Yoshiro_Fujimori Thank you again for your help with this! Your solution did work when I tried using a different CSV file.
I think the reason my Select statement did not work is because of the structure and nature of the original source data file. The original file i am working with is a Tab delimited text file and not a true CVS file, so I apologize for that mistake on my end.
I am attaching a sample copy of one of the data files. If you don't mind assisting me with this file, could you help me adjust my R select statement using the attached file?
Again, thanks so much for your great assistance with this!
@Yoshiro_Fujimori I do apologize for all of the posts. I was able to get my workflow to work in Alteryx by changing the "read.csv" portion of the configuration to "read.tsv" and that did the trick.
The only thing I need to fix now is the data is coming in with the first record as the column names/titles so now I just have to configure the R statement to not include the 1st row as the column titles. Any suggestions on how to change this would be most welcome.
Again, thanks so much for all of your help!
Mike
@Yoshiro_Fujimori Again so sorry for all of the posts, but I was able to get the R tool to successfully read my data files. Here is how I modified my configuration to read the file with no column titles:
library(readr)
library(dplyr)
df_sample <- read_tsv("C:\\Alteryx Workflows\\Data Club Project\\Source Data Original\\PS1.txt", col_names = FALSE)
df_sample <- select(df_sample, c(1:10))
write.Alteryx(df_sample, 1)
Thank you again so much for your help!
Mike
@Yoshiro_Fujimori Hi there. I tried running your Python tool solution, and everything looks like it is working. However, when I look at the output using the Browse tool, it looks like all of the columns are coming in from the CSV file, not just the first 10 columns.
I have tried tweaking it several times but I still cannot get it to work. When I run it with the Interactive portion of the Python tool on the left hand side of the screen, it looks like it is limiting the column output there, but when I look at the output with the Browse tool, I am still seeing all of the columns.
Any help or guidance on what I change?
Thanks again for your help!
Mike
@apathetichell I will try this approach and see how it works. What I wanted to do, in order to speed up the time it takes to run the workflow, was to not even bring in all of the columns. The 6,000 plus fields was causing the workflow to run slowly and also it was causing Alteryx to freeze up.
Thanks for responding to the post!
You don't bring in all the columns in my approach. Alteryx stops reading at X characters in the input data. You process it as pure text. You don't create columns until the text to column tool. I use tab as the delimiter but it could be anything.