community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Convert a single text column that has multiple columns one below the other..

Meteoroid

Hi,

 

I have a pdf file that has tabular data with 5 columns. I have used python tool and converted the pdf file to text. Now I got all the columns and its data in only one single column one followed by other...I have to split this row into individual columns and load in into a table. There are no delimiters. The data in pdf looks similar to the one attached.

 

Capture.PNGCapture1.PNG

Highlighted

I would suggest the following.

 

1. Use a Sample tool to skip the first row

2. Use a Multi Row formula tool set to look back over 5 rows with expression [Row-5:Id]+1 to create a new Id column

3. Use another multi row formula to create a ColumnId grouped by Id with expression [Row-1:ColumnID]+1

4. Then use a cross tab tool to make the table

 

Sample attached

Fireball

@sushmitha0127

 

Here is an example workflow to do what you're asking. I'm pretty much using the same methodology @jdunkerley79 described.

 

Best,

mmenth

Quasar

Hi @sushmitha0127 

 

It looks like your data is in the following order 

Description_1

Description_2

Description_3

...

Description_N

MPN_1

MPN_2

MPN_3

...

MPN_N

etc

 

If that that's the case and all 6 columns are included that would give you 34 original rows = 234 records, incl the column names.  Your data appears to missing a value.  Probably a border condition missed by the python script

 

The attached workflow. Takes the input data, with the embedded headers

Input.png

 

calculates how many rows there should be based on 6 columns and performs a crosstab

 

solution.png

 

to return your original table.

 

Output.png

 

Dan  

Meteoroid

Hi.. 

Thanks this helped me to some extent. I could convert the first 2 columns but for 3,4,5,6 columns I have all the 4 column names one below the other and then the column values followed. After the column names 4th column values started.  As the 3rd column has only 2 values that came up at very last rows.

Labels