I am relatively new to Alteryx and thus far been successful with most data parsing tasks. I have recently started to experiment with some public data available from https://download.bls.gov/pub/time.series/ap/ and it has posed a challenge for me.
The file in question is located here: https://download.bls.gov/pub/time.series/ap/ap.series
This file is a challenge because the "field" called series_title is filled with individual elements that are separated by commas.
It is difficult for me to determine how best to read this file successfully using the "download" tool and properly parsing that particular LONG field. If it has been "quoted" on both ends, it would be easier.
The data from that file looks like the following:
series_id area_code item_code series_title footnote_codes begin_year begin_period end_year end_period APU0000701111 0000 701111 Flour, white, all purpose, per lb. (453.6 gm) in U.S. city average, average price, not seasonally adjusted 1980 M01 2017 M10 APU0000701311 0000 701311 Rice, white, long grain, precooked (cost per pound/453.6 grams) in U.S. city average, average price, not seasonally adjusted 1980 M01 1981 M12 APU0000701312 0000 701312 Rice, white, long grain, uncooked, per lb. (453.6 gm) in U.S. city average, average price, not seasonally adjusted 1980 M01 2017 M10
So my question is how would a person successfully read this file structure directly from the web?
I've included the experimental workflow I constructed for reading this file. Clearly things go bad once it gets to the seriestitle field for obvious reasons. Hopefully this is an interesting puzzle to someone. I'm curious how to best solve this scenario since I can imagine it will occur from time-to-time.
Thanks in advance for comments and suggestions.
-Stew
Solved! Go to Solution.
Hey @ssutton,
You are actually so very close... You have the first T2C tool to split newlines, remove the Data Cleansing tool and then split in the 2nd T2C on tab (\t).
When I tried to copy your sample above into a Text Input, it all split apart perfectly which indicated that there must be a delimiter. In this case it was a tab separated file.
Kane
@KaneG,
Thank you but I must be missing something since I don't get the same result. I also determined that tab was the delimiter but for some reason I can't get it to behave correctly. Here is my "simplified" / "modified" workflow with the suggestions you made, yet it does not behave as I expect it would. I must be missing something important (perhaps it is staring me in the face!)
Notice how the columns for area_code, item_code, etc. are "blank"? It is not seeming to parse correctly for me. Not sure why? It is as if there are extra "tabs" there but I don't believe so. More of a configuration error on my part I believe. Here is the workflow that produces that result (uploaded as Sample-Workflow-02 for inspection.
Hopefully my error is entertaining.
Best regards,
-Stew
Your 2nd T2C Tool has ' \t' in it rather than just '\t'. And so the T2C will be splitting on spaces and tabs.
Kane
Thank you Kane! I had a suspicion that it was something so easily missed. Good lesson learned here on my end. Much appreciated.
-Stew
User | Count |
---|---|
18 | |
14 | |
13 | |
9 | |
8 |