Hi,
BACKGROUND...
I have a very large flat text file (HTML). The text is minified, so 85,000 plus records are trapped in one line. I am trying to bring it into Alteryx.
INPUT TOOL SETUP
I have the Input Tool set to "Read it as a fixed width text file". EOL is set to None. Record Length is set to 6500. Field Type is set to V_String. This results in 4,195 records, each of which I assume is capped at the Record Length of 6500.
NEXT STEP
I can then use the Text To Columns tool to split the lines into individual records on a pipe "|", which I added after each closing table row tag before bringing the data into Alteryx.
PROBLEM
The initial 4,195 records stop at 6500 characters. They do not necessarily terminate nicely at the end of a table row. They may terminate mid-stream. This results in a few thousand broken records. (LOL)
QUESTION:
Is there any way to increase the initial Length to greater than 6500? ALTERNATIVELY, is there a way to merge these initial 4,195 records into larger chunks with a much higher size?
No joy. That does not appear to work. The file is 120MB. I have been able to reduce it to 28MB, but it is still minified and it is extremely difficult to work with in popular text editors, such as Atom, Sublime Text, or UltraEdit.
@hellyars Not sure if I follow, but have you tried importing your file as text, set the delimiter to none (by using \0), and setting the field size limit to something ridiculously large?
\0 also does not work. It tries to make every table row and fails.
I had to find a text editor that could handle a 120MB minified file. Neither Sublime Text nor Atom worked, but UltraEdit did. I added a \n after each </tr>, deleted unnecessary HTML tags, and replaced others with a delimiter so the rows could be split into columns.
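For anyone without an editor that copes with a file this size, the "newline after each </tr>" step could probably be scripted instead. Below is a minimal Python sketch, not the method used above. It assumes the file is UTF-8 and that the closing tags are written in lowercase exactly as </tr>. The function name split_rows and the file names in the usage comment are made up for illustration. It reads the file in chunks, so the single 120MB line never has to be held in memory at once, and it carries any unfinished row over to the next chunk so a tag that straddles two chunks is still caught.

```python
import io


def split_rows(src, dst, tag="</tr>", chunk_size=1 << 20):
    """Copy text from src to dst, writing a newline after every `tag`.

    src and dst are text-mode file objects. Returns the number of
    rows (tag occurrences) written.
    """
    carry = ""  # text after the last complete tag seen so far
    rows = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Prepending the carry means a tag split across two reads
        # is reassembled before we look for it.
        parts = (carry + chunk).split(tag)
        carry = parts.pop()  # trailing piece has no closing tag yet
        for part in parts:
            dst.write(part + tag + "\n")
            rows += 1
    dst.write(carry)  # whatever follows the final tag, unchanged
    return rows


# Hypothetical usage (file names are placeholders):
# with open("minified.html", encoding="utf-8") as src, \
#      open("one_row_per_line.html", "w", encoding="utf-8") as dst:
#     print(split_rows(src, dst), "rows written")
```

The output has one table row per line, so it should be readable with a normal newline-delimited Input Tool setup rather than fixed width. Stripping the remaining HTML tags and inserting column delimiters would still be a separate step.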