Alteryx Designer Desktop Discussions

SOLVED

.xyz file input

Ozzy_Campos
8 - Asteroid

Does anyone have experience with scientific .xyz files? They're used in fields such as computational chemistry to describe molecular structures, but apparently there isn't really a standard format, and comments can appear throughout the file, so I'm not sure how to parse them consistently.

 

I attached one as a .csv, but I've got millions of these, and I'm not sure whether I can just parse them like a general .csv or whether I need metadata from the data source to determine the parsing.

 

Bottom line: if anyone with a science background can point me toward a good resource on working with these files, I'd appreciate it, as I can't find much on the subject. I'm decent with R and Python as well, so any insight on packages there would be useful.
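For reference, the layout most tools seem to agree on for .xyz is: an atom count on the first line, a free-text comment on the second, then one "element x y z" row per atom. Below is a minimal Python sketch of a parser for that layout only. The names `Atom`, `parse_xyz`, and `SAMPLE` are made up for illustration, the sample coordinates are approximate, and files that deviate from this layout (extra columns, inline comments, multiple frames) would need additional handling. The file itself does not record units; angstroms are the usual convention, but that is an assumption to verify against the source software.

```python
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    element: str
    x: float
    y: float
    z: float


def parse_xyz(text: str) -> Tuple[str, List[Atom]]:
    """Parse one frame of the common XYZ layout: count, comment, atom rows."""
    lines = text.strip().splitlines()
    # Line 1: number of atoms (only the first token is read).
    n_atoms = int(lines[0].split()[0])
    # Line 2: free-text comment, often the molecule name; may be blank.
    comment = lines[1].strip() if len(lines) > 1 else ""
    atoms = []
    for row in lines[2:2 + n_atoms]:
        parts = row.split()
        # Columns beyond the first four (if any) are ignored in this sketch.
        atoms.append(Atom(parts[0], float(parts[1]), float(parts[2]), float(parts[3])))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atom rows, found {len(atoms)}")
    return comment, atoms


# Hypothetical sample file contents (approximate water geometry).
SAMPLE = """3
water molecule (illustrative sample)
O  0.000  0.000  0.117
H  0.000  0.757 -0.469
H  0.000 -0.757 -0.469
"""

comment, atoms = parse_xyz(SAMPLE)
for atom in atoms:
    print(atom.element, atom.x, atom.y, atom.z)
```

The returned list of named tuples converts directly to rows, so it could be handed to a data frame or written out as a delimited file for further processing.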

4 REPLIES
CharlieS
17 - Castor

Could you post some example files with varying styles, along with the desired output? The core structure seems simple enough that Alteryx should be more than capable of parsing variations on it (the "general .csv" parsing approach you mentioned).

Ozzy_Campos
8 - Asteroid

@CharlieS Thanks for the response. I copied and pasted some of the .xyz files; the image shows what I'd be looking for. It's more involved than text parsing based on spaces. I'm trying to figure out things such as:

 

1) How do I determine the units? Does that come from metadata? I'm not confident they're all in angstroms (10^-10 m).

2) Is the scientific software generating this by just pulling from a big library of stock metadata, or is it calculating these values independently?

 

Basically, I've built a bunch of prediction tools and apps within Alteryx for myself, so it would be helpful to use these .xyz files within Alteryx (as opposed to the incredibly clunky lab software), since the end result is a simple 3-D axis (just tons of them). However, I can't find many resources on how to associate the files with metadata and trust that they're consistent and parsed correctly. I might be biting off more than I can chew right now, but if someone with a computational science background can point me to a good resource, that would be helpful.

 

 

[Attached image: Capture.PNG]

CharlieS
17 - Castor

So here's something I put together to get things started. I used the file name to determine how each file needed to be parsed (looking for "_fcs" or "_acs"). After that, I parsed the necessary information from each file (compound name, component, bond lengths). I figured the positions/bond lengths could be determined from the values, though this definitely needs a chemist's insight. In the attached wizard/app below, there's a collapsed tool container with this work.
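For anyone who would rather prototype the file-name check outside Alteryx, here is a rough Python sketch of the same idea: look for "_fcs" or "_acs" in the file name and use the match to pick a parsing route. The function name `layout_from_name`, the returned labels, and the example file names are all hypothetical; this is not the logic inside the attached workflow, just an illustration of the approach described above.

```python
from pathlib import Path
from typing import Optional

# Tags looked for in the file name, as described in the post above.
KNOWN_TAGS = ("_fcs", "_acs")


def layout_from_name(path: str) -> Optional[str]:
    """Return 'fcs' or 'acs' if the file name contains that tag, else None."""
    # Compare on the lower-cased name without its extension.
    stem = Path(path).stem.lower()
    for tag in KNOWN_TAGS:
        if tag in stem:
            return tag.lstrip("_")
    return None


# Hypothetical file names, for illustration only.
for name in ["benzene_fcs.xyz", "water_ACS.csv", "plain.xyz"]:
    print(name, "->", layout_from_name(name))
```

Files that return None would need to be set aside and inspected by hand rather than guessed at.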

 

When I was done with that, it was obvious that a lot of redundancies could be consolidated, so I did that and wrapped it all up into a wizard/app. The wizard lets you input the file path, and it writes a .yxdb of the requested information to the same folder.

 

I'm sure there are a million things that need to be added before this is ready, but it should give you some good ideas to get started.

Ozzy_Campos
8 - Asteroid

Thanks a lot for taking the time to put that together! I'm going through it right now, tweaking some things and incorporating it into some other sections that I'm working on. I marked it as a solution. Over the long term, I'm hoping to build an entire science application within Alteryx that works much better than scientific software from the late '90s.
