Input tool seems to be importing blank columns

Question

I am working on a set of data where there are 12 files for each month. The size of data is pretty huge i.e. almost 0.5 MN rows per month. I need to have a summary of the entire data hence I pulled all the files using the * wildcard and 11 of the files could get imported.

There's a file of one particular month which could not be imported with the wildcard due to the reason that this file had a different schema, I imported that file through a separate input tool (since it was only one file) and thought to union it with the other input tool (which has the rest 11 files. On importing this file, however, I observed a unique issue with the file. The input tool seems to be importing all the blank columns in the tool which is causing the data to grow from 70 MB to a few GBs. I have attached the screenshot below of the input configuration tool which shows only the 19 fields whereas the select tool and the summarise tools show that there are fields starting from F20 to F16384. Thus it seems that the input tool is importing some 16365 columns additionally. While I can deselect the data in the select tool, I wanted to understand what is causing this and how to address this at the threshold so that the blank columns are not imported in alteryx. This is because I wish to cache the data so that the work is faster (considering the size of the data of 12 months). I tried manually selecting all the blank columns and pressing delete. But that is not helping.

Any inputs on what may be wrong with the data?? (I will not be able to upload the file due to the confidential nature of the data).

Select tool showing many columns

Select tool 2nd screenshot

TapasTR · Answer

thank you @JoeS  - Copying and pasting the data into a new sheet worked.

I also tried to use the other method suggested i.e. selecting the empty 16000 or so columns and doing a right-click and delete. This, however, did not work since I guess excel would try to delete all the 16000+ columns which did not happen due to some memory issues.

While this workaround solves the problem, I am still curious to know the way Alteryx also behaved here. The first time data is imported using the "input" tool, the input tool configuration window only showed the relevant columns where data was. Also, if I had a "Select" tool besides it, the select tool also had the same limited number of columns in it's configuration. It is only when the workflow is run, that the alteryx suddenly starts recognizing the other 16000 + columns / fields. Also the input tool configuration continues to show only the fields which have data and not the remaining ones where there's no data.

Any inputs why alteryx would not recognise these blank columns (which is correct) at the first instance but later bring them in when the workflow is run (which is incorrect).

Thanks much

JoeS · Answer

Hi,

This could well be due to the used range in Excel being what Alteryx reads. So perhaps there was information across those columns, a formula or text? Then it was deleted, so the cells appear to be empty.

But they are that, empty and not unpopulated. So when Alteryx reads the file it is given a load of empty cells, rather than give the range of cells that appear to be populated.

It's one of the quirks with working with Excel.

Copy and paste what you need to a new sheet/workbook can be a good way around it.

You can also do a full delete (highlight the columns and/or rows and then right click and delete rather than pressing the backspace or delete button on your keyboard).

TapasTR · Answer

I tried copying and pasting the data in a new file and that worked, however, i would still like to understand what was wrong with the original file, since I also selected all the blank columns and pressed delete which did not solve the problem.

Thank you