Does anyone have experience working with scientific .xyz files? They're used in fields such as computational chemistry to describe molecular structures, but apparently there isn't a strict standard format and comments can appear throughout, so I'm not sure how to parse them consistently.
I attached one as a .csv, but I've got millions of these and am not sure whether I can parse them like a general .csv or whether I need metadata about where the data comes from to determine the parsing.
Bottom line: I can't find much on this subject, so if anyone with a science background can point me toward a good resource on working with these files, I'd appreciate it. I'm decent with R and Python as well, so any insight on packages there would be useful.
Could you post some example files with varying styles, along with the desired output? The core structure seems simple enough that Alteryx should be more than capable of parsing variations on that core (the "general .csv" parsing approach you mentioned).
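For anyone who wants to check that core structure in Python first, here is a minimal sketch. It assumes the commonly used XYZ convention (an atom count on the first line, a free-form comment on the second, then one "element x y z" line per atom); the files in this thread may not follow it exactly. The names `Atom` and `parse_xyz` are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    element: str
    x: float
    y: float
    z: float


def parse_xyz(text: str) -> Tuple[str, List[Atom]]:
    """Parse one frame of the conventional XYZ layout.

    Assumed layout (not guaranteed for every file):
      line 1: number of atoms
      line 2: free-form comment (may be blank)
      then one "element x y z" line per atom
    """
    lines = text.splitlines()
    # Skip leading blank lines that some writers emit before the count.
    while lines and not lines[0].strip():
        lines.pop(0)
    if len(lines) < 2:
        raise ValueError("expected an atom count line and a comment line")
    n_atoms = int(lines[0].split()[0])
    comment = lines[1].strip()
    atoms = []
    for raw in lines[2:2 + n_atoms]:
        fields = raw.split()
        if len(fields) < 4:
            raise ValueError(f"malformed atom line: {raw!r}")
        # Any extra columns (charges, forces, ...) are ignored here.
        atoms.append(
            Atom(fields[0], float(fields[1]), float(fields[2]), float(fields[3]))
        )
    if len(atoms) != n_atoms:
        raise ValueError(f"header says {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


if __name__ == "__main__":
    sample = "2\nhydrogen molecule\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74\n"
    comment, atoms = parse_xyz(sample)
    print(comment, len(atoms))
```

Comparing the declared atom count against the rows actually read is a cheap consistency check, which matters when there are millions of files from sources you don't fully trust.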
@CharlieS Thanks for the response. I copied and pasted some of the .xyz files, and the image shows what I'd be looking for. It's more involved than parsing text on spaces; I'm trying to figure out things such as:
1) How do I determine units? Does that come from metadata? I'm not confident they're all in angstroms (10^-10 m).
2) Is the scientific software generating these values by pulling from a big library of stock metadata, or is it calculating them independently?
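On the units question, one way to keep things explicit is to carry a unit label with each file and normalize everything to angstroms. The sketch below assumes the unit is known from the generating software's documentation or other metadata, since a conventional .xyz file does not record it (angstrom is the usual convention, but that is an assumption to verify per source). `TO_ANGSTROM`, `to_angstrom`, and `distance` are hypothetical names.

```python
import math

# Conversion factors to angstrom (1 angstrom = 1e-10 m).
TO_ANGSTROM = {
    "angstrom": 1.0,
    "bohr": 0.529177210903,  # Bohr radius in angstrom (CODATA 2018)
    "nm": 10.0,
    "pm": 0.01,
}


def to_angstrom(coords, unit):
    """Scale a list of (x, y, z) tuples from `unit` to angstrom."""
    factor = TO_ANGSTROM[unit.lower()]  # KeyError for an unknown unit label
    return [tuple(c * factor for c in xyz) for xyz in coords]


def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.dist(a, b)
```

Interatomic distances computed this way can serve as a rough sanity check: covalent bond lengths are typically on the order of 1 to 2 angstroms, so values far outside that range may point to a different unit.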
Basically, I've built a bunch of prediction tools and apps within Alteryx for myself, so it would be helpful to use these .xyz files within Alteryx (as opposed to the incredibly clunky lab software), since the end result is just a simple 3-D axis (tons of them). However, I can't find many resources on how to associate the files with metadata or how to trust that they're consistent and parsed correctly. I might be biting off more than I can chew right now, but if someone with a computational science background can point me to a good resource, that would be helpful.
So here's something I put together to get things started. I used the file name to determine how each file needed to be parsed (looking for "_fcs" or "_acs"). After that, I parsed the necessary information from each file (compound name, component, bond lengths). I figured the positions/bond lengths could be determined by values, so this definitely needs a chemist's insight. In the attached wizard/app below, there's a collapsed tool container with this work.
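The file-name check described above could look roughly like this in Python. This is only a sketch of the idea, not the logic inside the attached workflow; `detect_layout` is a hypothetical helper, and what "_fcs" and "_acs" actually mean is left to whoever knows the source software.

```python
from pathlib import Path
from typing import Optional


def detect_layout(path: str) -> Optional[str]:
    """Return "fcs" or "acs" based on the file name, or None if neither tag appears."""
    stem = Path(path).stem.lower()  # file name without folder or extension
    for tag in ("_fcs", "_acs"):
        if tag in stem:
            return tag.lstrip("_")
    return None


if __name__ == "__main__":
    for name in ("water_fcs.xyz", "benzene_ACS.xyz", "plain.xyz"):
        print(name, detect_layout(name))
```

Returning None for unrecognized names makes it easy to set those files aside for manual review instead of silently parsing them with the wrong layout.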
When I was done with that, it was obvious that a lot of redundancies could be consolidated, so I did that and wrapped it up into a wizard/app. This wizard allows you to input the file path and it will write a .yxdb of the requested information in the same folder.
I'm sure there are a million things that need to be added before this is ready, but it should give you some good ideas to get started.
Thanks a lot for taking the time to put that together! I'm going through it right now, tweaking some things and incorporating it into some other sections that I'm working on. I marked it as a solution. Over the long term, I'm hoping to build an entire science application within Alteryx that works much better than scientific software from the late '90s.