

File Metadata not working in Data Catalog

Dynamomo
11 - Bolide

I added a Data Source for the files at a specific network path and was successful in creating a Data Catalog for the files there.

However, I noticed two strange things:

  1. A folder that I renamed still shows up in the Data Catalog under its old name. Inside it are files that no longer exist at that path, because after the rename they are also listed under the new folder name. I have synced the Data Catalog a few times now. How does Alteryx automatically get rid of the old folder and its files, which are already listed under the new folder in the Data Catalog? These are now duplicates that don't really exist.
  2. The files in the various directories show up, but when I click on one to see its metadata, an error message pops up: "File wasn't parsed properly". I have tried this on csv and Excel files. The csv file shows 0 for the number of rows, but it has 1500 records in it. See below.

output metadata.PNG

 

MartinM1
Alteryx Alumni (Retired)

Hi @Dynamomo ,

 

  1. You are correct that there is a defect around renaming files on the filesystem and then re-syncing Data Catalog. Thanks for pointing that out; we will look into it for a future release. For now, I can only recommend manually deleting this folder from Hub.
  2. Could you please provide some additional information? Which delimiter and encoding are used in your csv file? Would it be possible to share the structure of the header, so I can test with a csv file that has the same structure as yours? On top of that, can you please share which version of Hub you are using?
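
If it helps to gather those details, a short script along these lines can report them. This is only a sketch and not part of Hub or Designer: the function name `describe_csv` and the list of candidate encodings are assumptions made for illustration, and the delimiter is a best guess from Python's `csv.Sniffer`, so the result is worth checking by eye.

```python
import csv
import io
from pathlib import Path


def describe_csv(path, encodings=("utf-8-sig", "cp1252")):
    """Report the encoding, delimiter, header and data-row count of a CSV file.

    `encodings` is an assumed list of candidates, tried in order.
    """
    raw = Path(path).read_bytes()

    # Use the first candidate encoding that decodes the whole file.
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeError:
            continue
    else:
        raise ValueError("none of the candidate encodings could decode the file")

    # Guess the delimiter from a sample; fall back to a plain comma.
    try:
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    except csv.Error:
        dialect = csv.excel

    # Parse every row and skip completely empty lines.
    reader = csv.reader(io.StringIO(text, newline=""), dialect)
    rows = [row for row in reader if row]

    return {
        "encoding": encoding,
        "delimiter": dialect.delimiter,
        "columns": rows[0] if rows else [],
        "data_rows": max(len(rows) - 1, 0),  # first row is assumed to be the header
    }
```

Calling `describe_csv` on the file that shows 0 rows in Data Catalog gives a row count and header to compare against what the catalog displays.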

Thanks,
Martin

Dynamomo
11 - Bolide

Hi @MartinM1 ,

Thank you for responding and confirming the defect.

Regarding the missing metadata, nothing shows for any of the files in the Data Catalog, regardless of whether the file is csv, xlsx, or txt (although I'm not clear on which file types are supported). These files were output from Alteryx, so the csv is just standard comma-delimited output from Alteryx. Hub is 2020.4.

I have uploaded two of the files from the data catalog here.

Thanks

MartinM1
Alteryx Alumni (Retired)

Hi @Dynamomo,
 
Thanks for the details and attached files. Opening them through Data Catalog (after performing a sync operation), I am able to see the proper list of columns; however, some metadata is not valid (file size or number of rows). While I am not able to fully reproduce the error you encountered using csv and xlsx files, I am able to reproduce it with yxdb files. We will have to look into the parser to see why it's happening (especially since files are parsed correctly when stored in the Hub filesystem).

 

For columns in csv and xlsx files, it might help to delete the whole synced folder from Data Catalog and sync the data source again; that could give you the right list of columns, at least for those file types. I am sorry for the inconvenience.
 
Thanks,
Martin
