Hi Alteryx Gurus! My first post so please be nice to me and apologies if I'm asking daft questions...
We have a new install of Alteryx Server and I'm trying to answer some cyber security questions before it goes into production. I've dug high and low through the community posts but there seem to be a lot of conflicting answers. I've also been through the MongoDB database collections but I'm still not sure I'm seeing the whole picture.
1) I understand that metadata (workflows, scheduling, apps, etc) are stored in the embedded MongoDB, but does any of the actual data being processed in workflows ever move through the MongoDB instance? Or is the processing all done in memory, external files, or maybe temp tables I'm not seeing? For example, if I am processing a file containing PII, is there a chance this could be stored in the AlteryxService MongoDB for any length of time?
2) If the answer to number 1 is yes, is there an option I haven't found yet to encrypt the data at rest within MongoDB? (I'm guessing not as I believe it's an enterprise option, but asking anyway!)
3) If I run a job from the command line using alteryxenginecmd.exe, does any of the processing touch the MongoDB database, either metadata or processed data? It looks from the collections like the answer is no... Also I've downloaded and run the usage tool, which leads me to believe MongoDB is only storing data relating to the Gallery-run jobs/apps. Is that correct?
4) I haven't yet seen any connectors that include an option to encrypt data in transit, for example when I'm connecting to a database on another machine or another network. From various posts, it seems like this might be a bit of a theme in terms of enhancement requests. Would that be fair to say?
Thank you very much in advance for any light you can throw on the above!
Best regards
Luke
@LukeDolman Thank you for your questions and I will do my best to answer them.
1. This depends on how the workflow is configured, which assets are included when the workflow is packaged, and how the workflow's output is configured. If a workflow published to the Gallery or the scheduler has an input file included as an asset, and the user chooses to package that asset, the file is packaged with the workflow when it is uploaded and is stored in the database. The workflow and all included assets are packaged as a yxzp file and split into ~16MB binary chunks, which are then stored in the database using a proprietary method. The chunks can be found in the AlteryxService database in a collection called AS_App_Chunks. This data isn't technically encrypted, but it is extremely difficult to piece back together without using the server's API functions. Another thing to consider is that result output may also be stored in MongoDB, depending on how the workflow's output is configured. These output files are stored in the same database in a collection called AS_ResultsFiles. They are binary encoded but not encrypted. If these items are a concern, I would recommend not packaging input sources with the workflows and ensuring outputs are written directly to databases or network file shares instead of to local or relative paths. You can also enable persistence options that remove result output after a defined period of time.
2. No. The current MongoDB driver doesn't support encrypted connections to MongoDB (either embedded or user-managed), and the embedded MongoDB does not have the option to encrypt data at rest database-wide (this may be available with user-managed enterprise versions of MongoDB; check with your MongoDB admin for details). We do store some data in the database in an encrypted state. This includes, but isn't necessarily limited to, Data Connections, Workflow Credentials, and any sensitive information such as passwords.
3. Workflows run from Designer or directly via AlteryxEngineCmd do not interact with the server's database in any way. They store the data needed for the workflow in memory and in disk cache as needed during processing. Only Gallery runs and scheduled workflows interact with the database. Workflows run via the Gallery and the server's scheduler also keep the data needed to process the workflow in memory and disk cache while the workflow is running.
4. This depends on the specific connector in use. If the connector uses a REST or SOAP API that requires an SSL/TLS connection, that connection is encrypted, so data in transit to or from that source is encrypted as well. This also applies when connecting to any HTTPS or SFTP service via the Download tool. Any connection to a database using ODBC or another database driver (OCI, OLEDB, etc.) is only encrypted if the driver supports encryption. Even then, the driver needs to be properly configured to establish an encrypted connection via the DSN or connection string, and the database server you are connecting to also needs to be configured to accept encrypted connections. If all of this is in place, the driver encrypts the connection to the database and data in transit is encrypted.
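To check point 1 on your own server, one option is to count what is actually sitting in the two collections named above. This is only a rough sketch: it assumes the `pymongo` package is available, the connection URI is a placeholder (take the real host, port, and credentials from your own Server configuration), and the function names are made up for illustration. The counting helper only needs an object that behaves like a pymongo database handle.

```python
# Collection names are the ones mentioned in point 1 of the answer.
PAYLOAD_COLLECTIONS = ("AS_App_Chunks", "AS_ResultsFiles")


def count_payload_documents(db, collections=PAYLOAD_COLLECTIONS):
    """Return {collection_name: document_count} for collections that may hold data.

    `db` is anything that behaves like a pymongo Database: indexing it by
    collection name must return an object with count_documents(filter).
    """
    return {name: db[name].count_documents({}) for name in collections}


def connect(uri):
    """Open the AlteryxService database (assumes pymongo is installed)."""
    # Imported lazily so count_payload_documents has no hard dependency.
    from pymongo import MongoClient

    return MongoClient(uri)["AlteryxService"]


# Example with placeholders only; substitute your own values:
# db = connect("mongodb://<user>:<password>@<host>:<port>/AlteryxService")
# print(count_payload_documents(db))
```

Non-zero counts only show that packaged workflows or result files exist in the database; they do not tell you whether any of them contain PII.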
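For point 4, a quick way to audit existing database connections is to check whether a connection string actually asks the driver for encryption. The sketch below assumes simple semicolon-separated `key=value` connection strings. Keyword names are driver-specific: `Encrypt` is the keyword used by Microsoft's ODBC Driver for SQL Server and is used here only as an example, so check your own driver's documentation. The function names are hypothetical.

```python
def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' into a dict with lower-cased keys.

    Naive on purpose: values containing ';' inside braces are not handled.
    """
    settings = {}
    for part in conn_str.split(";"):
        if "=" not in part:
            continue  # skip empty or malformed segments
        key, value = part.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def requests_encryption(conn_str, keyword="Encrypt",
                        truthy=("yes", "true", "mandatory", "strict")):
    """Return True if the connection string sets `keyword` to a truthy value.

    The default keyword and accepted values are assumptions based on the
    SQL Server ODBC driver; pass different ones for other drivers.
    """
    value = parse_connection_string(conn_str).get(keyword.lower(), "")
    return value.lower() in truthy


print(requests_encryption("Driver={ODBC Driver 18 for SQL Server};Server=db01;Encrypt=yes;"))
print(requests_encryption("Driver={SQL Server};Server=db01;Database=sales"))
```

This only shows what the client asks for. As noted in point 4, the database server also has to be configured to accept encrypted connections.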
Hi Kevin,
Thank you very much indeed for such a detailed response. That answers everything I need to know - much appreciated!!
Kindest regards
Luke