I got stuck at the delimiter stage and looked at the solution to help solve it. Everything was okay after that!
Can anyone help me understand how we go about identifying which delimiter to use?
A png of my workflow below:
Kind regards,
Andrew
Hi,
Please find my solution below. I had to use the link mentioned earlier in the topic, https://datahub.io/machine-learning/mushroom/r/mushroom.csv, instead of the one in the starting workflow.
This is in reply to @AndrewHoData ...
In this case, the data returned was a CSV file. The data came back as a single string in one cell. A CSV is a text file, and text files will use non-printing characters LineFeed (LF) and CarriageReturn (CR) to break lines.
Depending on the computer that generates the text file, the file might use one or both of these characters. LF and CR have ASCII codes of 10 and 13 respectively. The challenge assumes you already know this, which is maybe a bit unfair, but once you know it, it is true forever. Knowing that LF, CR, or both mark the end of each line, we can use the TextToColumns tool to split on "\n", which is the tool's way of referring to NewLine, i.e. LF, or on "\r", which is the tool's way of referring to Return, i.e. CR.
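For anyone who wants to see the same idea outside Alteryx, here is a minimal Python sketch. The `raw` string is a made-up sample standing in for the downloaded text, not the real mushroom data. It suggests that splitting on "\n" alone works when the file uses CR+LF, but each line keeps a stray "\r" at the end that may need trimming afterwards.

```python
# Hypothetical sample standing in for the single-cell CSV text (not the real data).
raw = "class,cap-shape\r\np,x\r\ne,b\r\n"

# Splitting on "\n" alone breaks the text into lines, but with CR+LF endings
# each line keeps a trailing "\r", and the final "\n" yields an empty last item.
lines_lf = raw.split("\n")

# str.splitlines() treats "\r\n", "\n" and "\r" as line breaks and drops them.
lines = raw.splitlines()

print(lines_lf)
print(lines)
```

In the Alteryx workflow, the equivalent clean-up would be something like trimming or replacing the leftover character after the split. That is a suggestion on my part rather than something the challenge solution prescribes.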
If you want to examine a string to see the ASCII or Unicode values of its characters, one option is what I do: save the data into a file and then open it in TextPad. TextPad is a free text editor that can open files in Binary mode, and it shows me the value of each character used. This is useful when processing HTML data, as this can contain odd HTML-specific characters such as the non-breaking space, which do not crop up in normal text files.
Alternatively, the "challenge_69.yxzp" solution I have attached shows a way to do this in Alteryx. I truncate the string (to speed up processing), then split it to one character per row with one of my own macros. Then I can convert each character to an Int value, and that is its ASCII code. In the attached example, once the data is broken down to one character per record, records 301 and 302 show values 13 and 10 in the data, CR then LF. This tells me I can break the source string into lines with either \r or \n in the TextToColumns tool.
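The same inspection can be sketched in Python, for anyone without the macro to hand. The function names `char_codes` and `find_line_breaks` and the `sample` string are my own illustrative choices, not part of the attached workflow. The sketch truncates the text, lists one character per row with its code, and then reports where codes 10 and 13 occur.

```python
def char_codes(text, limit=400):
    """Return (position, repr of character, code) for the first `limit` characters."""
    # Truncate first to keep the output small, then number characters from 1.
    return [(i, repr(ch), ord(ch)) for i, ch in enumerate(text[:limit], start=1)]


def find_line_breaks(text, limit=400):
    """Return (position, code) for every LF (10) or CR (13) in the truncated text."""
    return [(i, code) for i, _, code in char_codes(text, limit) if code in (10, 13)]


# Hypothetical sample with CR+LF line endings (not the real data).
sample = "a,b\r\nc,d\r\n"
print(find_line_breaks(sample))
```

A 13 immediately followed by a 10, as in records 301 and 302 of the attached example, indicates CR+LF line endings.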
I used https://datahub.io/machine-learning/mushroom/r/mushroom.csv as well, because the link in the start file is no longer functional, and I was able to get it done with that.
I also downloaded the data from here: https://datahub.io/machine-learning/mushroom/r/mushroom.csv