I am using the Download node to download a CSV from a URL, and I am getting only ~1,500 rows when I am expecting ~400,000.
My workflow (attached) has the following nodes connected in the following sequence:
1) Text Input
Contains the request url.
2) Download
Downloads the csv from the request url using default settings.
3) Text to Columns
Breaks the 'DownloadData' column into rows, using newline as the delimiter.
This node results in 2123 rows, ~400,000 expected.
4) Text to Columns
Breaks the 'DownloadData' column into columns, using comma as the delimiter.
This node results in 1672 rows, input was 2123 rows.
Doing the equivalent in Python, I get the expected ~400,000 rows, so the API does not seem to be the problem.
import pandas as pd
url = "http://cdec.water.ca.gov/dynamicapp/req/CSVDataServlet?Stations=MRZ&dur_code=E&SensorNums=1&Start=2000-01-01"
# Download csv
df = pd.read_csv(url)
# Check length
len(df)
Out[4]: 399788
Any idea what is going on here? Why am I not getting the full CSV? What is happening to the rows between nodes 3 and 4?
@tbergama It's actually pulling in the full dataset; you just have to add the Browse tool, which lets you look at the full dataset. You can also use an Output Data tool to write it out to a flat file.
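If you do write the data out to a flat file, one way to confirm the row count outside the workflow is a quick check like the sketch below. The function name `count_csv_rows` and the file name in the usage comment are hypothetical, and the sketch assumes the file has a single header row.

```python
import csv


def count_csv_rows(path, has_header=True):
    """Count the data rows in a CSV file (hypothetical helper).

    Assumes at most one header row; set has_header=False if there is none.
    """
    with open(path, newline="") as f:
        # csv.reader handles quoted fields that contain commas or newlines
        n = sum(1 for _ in csv.reader(f))
    return max(n - 1, 0) if has_header else n


# Usage (the file name is an assumption, use whatever the Output Data tool wrote):
# print(count_csv_rows("download_output.csv"))
```

If the count matches what `pd.read_csv` reports, the download itself is complete and the smaller numbers come only from how the results are being viewed.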
@DiganP That makes a lot of sense. Thanks.