I'm 100% new to Python, but I'm trying to use the tabula library.
Basically, I want to load a PDF file and turn it into a dataframe.
It loads the file correctly, but I get an odd error, and I have no idea what it means. 😕
"Warning: Python (1): Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray"
This is a cool library! The reason you're getting this message is that you're trying to combine dataframes with differing schemas. I used the following code to look at the rows, columns, and type of each item in the output of tabula. Tabula already outputs dataframes, so you probably don't need to re-convert them later.
# "a" is assumed to be the list of dataframes returned by tabula
for iteration in a:
    print("rows:", len(iteration), "; columns:", len(iteration.columns), type(iteration))
For the pdf I input to tabula, my dataframes were all different sizes. Naturally, pandas didn't know how I wanted to combine them. Note that if you look at the variable "df", it still worked! Pandas just smushed all the dataframes into a single column. The red background text it gave you is the equivalent of an Alteryx warning, not an error.
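To see which of the tables actually share a schema, something like the sketch below might help. It is only an illustration: the "tables" list is a made-up stand-in for whatever tabula returns for your PDF, the column names are hypothetical, and pd.concat is my own suggestion for stacking same-schema tables rather than something from the original code.

```python
import pandas as pd

# Hypothetical stand-ins for the list of dataframes tabula returns;
# in practice this would come from something like tabula.read_pdf(path, pages="all").
tables = [
    pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}),
    pd.DataFrame({"item": ["c"], "qty": [3]}),
    pd.DataFrame({"name": ["x", "y", "z"], "price": [1.5, 2.5, 3.5], "tax": [0.1, 0.2, 0.3]}),
]

# Group table positions by their column names, so tables with the
# same schema end up under the same key.
by_schema = {}
for position, table in enumerate(tables):
    by_schema.setdefault(tuple(table.columns), []).append(position)

for columns, positions in by_schema.items():
    print(columns, "-> tables", positions)

# Tables that share a schema can be stacked row-wise with pd.concat.
stacked = pd.concat([tables[i] for i in by_schema[("item", "qty")]], ignore_index=True)
print(stacked)
```

Tables that land under the same key have identical column names and can be stacked directly; the ones left on their own are the tables that need their columns reworked first.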
Going forward (and depending on your use case), I'd recommend reformatting your columns so they align and then using pd.merge() to combine the dataframes output by tabula. You can use the help() function on a Python function to see what parameters it accepts. For example, help(pd.merge) prints the signature and documentation for pd.merge().
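Here is a rough sketch of the align-then-merge idea. The two dataframes, their column names, and the choice of an outer join are all assumptions for illustration; your tabula output will look different, and you may want a different "how" depending on which rows you need to keep.

```python
import pandas as pd

# Hypothetical stand-ins for two tables returned by tabula;
# the column names and values are made up.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"ID": [1, 2, 4], "amount": [10.0, 20.0, 40.0]})

# Align the key column name so both tables share "id".
right = right.rename(columns={"ID": "id"})

# An outer merge keeps rows from both tables; cells with no match become NaN.
merged = pd.merge(left, right, on="id", how="outer")
print(merged)
```

With an outer join, ids that appear in only one table are kept and the missing cells are filled with NaN. Running help(pd.merge) lists the other options for the "how" parameter.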
I also find that Google Colab is far more intuitive for learning Python than Jupyter notebooks. I practice most of my syntax in Colab before transferring it into Alteryx Jupyter.
If this helps, please consider marking it as a solution so others may find it. Thanks!