Alteryx Designer Discussions

SOLVED

Python and Alteryx - Tabula

Hamder83
8 - Asteroid

Hi 

I'm 100% new to Python, but I'm trying to use the tabula library.

Basically, I want to load a PDF file and turn it into a dataframe.

It loads the file correctly, but I get an odd error, and I have no idea what it means. 😕

"Warning: Python (1): Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray"


4 REPLIES
clmc9601
12 - Quasar

Hi @Hamder83,

 

This is a cool library! The reason you're getting this message is that you're trying to combine dataframes with differing schemas. I used the following code to look at the rows, columns, and datatypes of the output of tabula. Tabula already outputs dataframes, so you probably don't need to re-convert them later.

 

# "a" is assumed to be the list of dataframes that tabula returned
for iteration in a:
    print("rows:", len(iteration), "; columns:", len(iteration.columns), type(iteration))

 

For the pdf I input to tabula, my dataframes were all different sizes. Naturally, pandas didn't know how I wanted to combine them. Note that if you look at the variable "df", it still worked! Pandas just smushed all the dataframes into a single column. The red background text it gave you is the equivalent of an Alteryx warning, not an error.
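To make the "different sizes" point concrete, here is a minimal sketch of how you could check whether the tables share a schema before trying to combine them. The three small dataframes are made-up stand-ins for what tabula might return from a PDF, and the names tables, shapes and same_schema are only for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the list of dataframes tabula would return;
# a real PDF would give you different tables.
tables = [
    pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}),
    pd.DataFrame({"item": ["c"], "qty": [3], "price": [9.5]}),
    pd.DataFrame({"note": ["x", "y", "z"]}),
]

# (rows, columns) for each table
shapes = [t.shape for t in tables]

# True only if every table has exactly the same column names, in the same order
same_schema = all(list(t.columns) == list(tables[0].columns) for t in tables)

print(shapes)
print(same_schema)
```

If same_schema comes back False, as it does for these stand-in tables, the tables need to be aligned (or handled one at a time) before they can be combined in a meaningful way.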

 


Going forward (and depending on your use case), I'd recommend you reformat your columns so they align, then combine all the dataframes output by tabula with pd.merge() (or pd.concat(), if you just want to stack the rows). You can use the help() function on a Python function to ask Jupyter what parameters it wants. For example:

help(tabula.read_pdf)
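As for aligning the columns and combining, a rough sketch could look like the one below. The two dataframes and the column mapping are invented for illustration, and I'm using pd.concat() here because it stacks rows from tables that share column names; pd.merge() is the better fit when you want to join tables side by side on a key column.

```python
import pandas as pd

# Hypothetical tables standing in for two of tabula's outputs
first = pd.DataFrame({"Item": ["a", "b"], "Qty": [1, 2]})
second = pd.DataFrame({"item name": ["c"], "quantity": [3]})

# Align the schemas: lowercase the first table's headers and
# rename the second table's headers to match (made-up mapping)
first = first.rename(columns=str.lower)
second = second.rename(columns={"item name": "item", "quantity": "qty"})

# Stack the rows into one dataframe with a fresh 0..n-1 index
combined = pd.concat([first, second], ignore_index=True)

print(combined)
```

Which columns to rename depends entirely on what your PDF's tables look like, so treat the mapping as a placeholder.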

 

I also find that Google Colab is far more intuitive for learning Python than Jupyter notebooks. I practice most of my syntax in Colab before transferring it into Alteryx Jupyter.

 

If this helps, please consider marking it as a solution so others may find it. Thanks!

dbmurray
8 - Asteroid

Not a direct answer, but more of an aside: there is also a tabula-specific R package. It may be easier to use than the Python implementation.

Hamder83
8 - Asteroid

Thank you for a super fine explanation, I will try and dive further into it 🙂 

This is definitely a good help! 

Hamder83
8 - Asteroid

Sounds interesting, I'll have a look at that, thanks 🙂
