Alteryx Knowledge Base

Definitive answers from Designer experts.

This short but packed demonstration will show you why tens of thousands of data analysts from more than 1,800 companies rely on Alteryx daily to prep, blend, and analyze data, delivering deeper business insights in hours, not weeks.
View full article
How to troubleshoot when the Salesforce Output tool does not appear in the tool palette.  
View full article
Quick navigation for the Tool Mastery Series!
View full article
Instruction on how to obtain the list of deprecated tools in Alteryx Designer!
View full article
Querying a Salesforce report using the Salesforce Input tool returns a maximum of 2000 records.
View full article
Steps to troubleshoot an "Unhandled Exception" error when attempting to open the Manage Data Connections window.
View full article
The Find Replace Tool is one of those tools that goes relatively unused and uncelebrated until you stumble into a data blending technique that would be extremely difficult without it – at which point, it becomes your favorite tool in the Designer. You can find it in the Join Category and it’ll make easy string substitutions in your data that would otherwise require herculean effort to work around. Today, we celebrate Find Replace as a hero.
View full article
Sampling weights, also known as survey weights, are positive values associated with the observations (rows) in your dataset (sample). They are used to ensure that metrics derived from the sample are representative of the population from which it was drawn.
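As a quick illustration, a weighted mean reweights each observation by its survey weight; here is a minimal numpy sketch (the income and weight values are invented for the example):

```python
import numpy as np

# Hypothetical sample: incomes with survey weights indicating how many
# population members each respondent represents.
incomes = np.array([30_000.0, 45_000.0, 60_000.0])
weights = np.array([3.0, 1.0, 1.0])  # first respondent stands for 3x as many people

unweighted = incomes.mean()                      # treats every row equally -> 45000.0
weighted = np.average(incomes, weights=weights)  # population-representative -> 39000.0
```

The weighted estimate is pulled toward the heavily weighted observation, which is exactly the correction survey weights exist to make.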
View full article
Question: How do you use the Arrange Tool in Alteryx?

Answer: The Arrange tool allows you to manually transpose and rearrange your data fields for presentation purposes. Data is transformed so that each record is turned into multiple records, and columns can be created using field description data.

Input: configure the Arrange tool as follows.
Key Fields: select columns from your data stream.
Output Fields: create and manipulate output fields. To create a new output field, click Column and select Add to open the Add Column window.
Column Header: enter the name of the new column of data.
Fill in Description Column: select Add New Description to create a column containing the description value of the selected fields.

Output: please find the example Arrange.yxmd attached.
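The Arrange tool's "each record becomes multiple records" behavior is close in spirit to a pandas melt; here is a rough analogue (the data frame and column names are invented for illustration, not taken from the article's workflow):

```python
import pandas as pd

# Hypothetical wide input: one record per store, one column per quarter.
df = pd.DataFrame({"Store": ["A", "B"], "Q1": [10, 20], "Q2": [30, 40]})

# Turn each record into multiple records, one per quarter, with a
# description column naming the source field - much like Arrange's
# Fill in Description Column option.
long = df.melt(id_vars="Store", var_name="Description", value_name="Value")
```

Each input row contributes one output row per melted column, so two stores with two quarters yield four records.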
View full article
Question: Where can I find available "Big Data Sets" on the internet?

Big data is data whose size is beyond the ability of commonly used software tools to manage and process within a tolerable elapsed time. A year-long credit card transaction history, the CDRs (call detail records) of a telecom company for the last 9 months, or the behavioral credit data of a large financial institution are some examples...

Answer:
- Amazon (AWS) has a Large Data Sets Repository.
- Data.gov has close to 190k public data sets.
- One of the standard datasets for Hadoop is the Enron email dataset, comprising emails between Enron employees during the scandal. It's a great practice dataset for dealing with semi-structured data (file scraping, regexes, parsing, joining, etc.). It's ~400MB (compressed) and available for download at http://www.cs.cmu.edu/~enron/
- The Million Song Dataset, a collection of audio features and metadata for a million contemporary popular music tracks: http://labrosa.ee.columbia.edu/millionsong/. Companion datasets include SecondHandSongs (cover songs), musiXmatch (lyrics), Last.fm (song-level tags and similarity), Taste Profile (user data), the thisismyjam-to-MSD mapping (more user data), tagtraum (genre annotations), and Top MAGD (more genre labels). You can download either the entire dataset (280 GB) or a subset of 10,000 songs (1.8 GB) for a quick taste.
- GDELT: http://www.gdeltproject.org/data.html
- NYC taxi data, 1.1BN records: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml — see also "Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance", an open-source exploration of the city's neighborhoods, nightlife, airport traffic, and more, through the lens of publicly available taxi and Uber data.
- Airline data set, 1987-2008: https://github.com/h2oai/h2o-2/wiki/Hacking-Airline-DataSet-with-H2O
- Google BigQuery sample tables, hosted by Google: https://cloud.google.com/bigquery/sample-tables — weather, a timeline of actions such as pull requests and comments on GitHub repositories (with a nested or flat schema), US births 1969-2008, Shakespeare (the number of times each word appears), and Wikipedia articles at over 300 million rows.
- Lending Club: https://www.lendingclub.com/info/download-data.action
- A Telecom Italia dataset computed over the call detail records (CDRs) generated by the Telecom Italia cellular network over the city of Milano. You may have to sign in and activate your account, but it's totally free: https://dandelion.eu/datagems/SpazioDati/telecom-sms-call-internet-mi/description/
- Data Science Central: http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free
- KDnuggets, a well-respected analytics blog, has put together a very nice and deep list: http://www.kdnuggets.com/datasets/index.html
- UK Data: https://data.gov.uk/data
- Google's Public Data Directory: http://www.google.com/publicdata/directory
- For the spatial and GIS folks: http://gisgeography.com/best-free-gis-data-sources-raster-vector/
- The mother of big datasets - Reddit: 1.7bn JSON objects, 250GB compressed. https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment
- Loads of really great links here as well: https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
- IMDb: http://www.imdb.com/interfaces — a subset of the IMDb plain text data files is available from their FTP sites: ftp.fu-berlin.de (Germany) and ftp.funet.fi (Finland).
- One of my favorites is Data.gov, with tons of public data from all sectors, in different sizes and formats, including API connections. This URL, http://www.data.gov/open-gov/, lists each of the local governments in the US; they have varying degrees of completion at the local level. See also https://www.linkedin.com/pulse/need-data-bob-wyman?trk=mp-author-card
- The Government of Canada has an Open Data portal, http://open.canada.ca/en/open-data — it takes some digging to find the gems, but there are some. There's also some open mapping data at http://open.canada.ca/en/open-maps
- GitHub on BigQuery: this 3TB+ dataset comprises the largest released source of GitHub activity to date. It contains a full snapshot of the content of more than 2.8 million open-source GitHub repositories, including more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files, all of which are searchable with regular expressions. https://cloud.google.com/bigquery/public-data/github
- Academic Torrents, with 15.49TB of research data available: http://academictorrents.com/
- Australia, New South Wales open data: http://data.nsw.gov.au/
- USAFacts: Our Nation, in numbers. Federal, state, and local data from over 70 government sources.
View full article
Question: What are some "Small Data Sets" available on the internet?

Small data is data that is small enough in size for human comprehension. A few thousand lines of credit data, marketing segmentation example data, or the B2B client contact history of a firm are some examples...

Answer:
kaggle.com - "Kaggle is a platform for predictive modeling and analytics competitions on which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modeling task and it is impossible to know at the outset which technique or analyst will be most effective."

There are multiple small datasets available that you can test your skills on:
https://www.kaggle.com/c/informs2010 - The goal of this contest is to predict short-term movements in stock prices.
https://www.kaggle.com/c/axa-driver-telematics-analysis - Use telematic data to identify a driver signature.
https://www.kaggle.com/c/sf-crime - Predict the category of crimes that occurred in the city by the bay.
You may find 202 more under the following link: https://www.kaggle.com/competitions/search?DeadlineColumnSort=Descending

Kaggle has also started a section called Kaggle Datasets (https://www.kaggle.com/datasets) with public datasets you can use freely, as datasets for the competitions were often restricted for use outside the competition. Kaggle also has scripts for processing the given data sets, usually in R or Python: https://www.kaggle.com/scripts. It can be instructive to look at those and discern which parts can be pulled into standard Alteryx tools, and which parts are best left to a custom R call, for instance. The nice thing is that, once you've finished, you can submit your output to the relevant Kaggle competition (even after the fact) to see how your output stacks up against the competition.

A "Small Data" set to test your skills on Duplicate Detection, Record Linkage, and Identity Uncertainty: http://www.cs.utexas.edu/users/ml/riddle/data.html

Here is an addition from Europe: http://open-data.europa.eu/en/data/ - "The European Union Open Data Portal is the single point of access to a growing range of data from the institutions and other bodies of the European Union (EU). Data are free for you to use and reuse for commercial or non-commercial purposes. By providing easy and free access to data, the portal aims to promote their innovative use and unleash their economic potential. It also aims to help foster the transparency and the accountability of the institutions and other bodies of the EU."
View full article
Question: Does "Dictionary Sort Order" always place lower case letters before capital letters?

Answer: Yes. In the Sort tool's configuration there is an option to "Use Dictionary Order". When it is checked, the tool sorts in alphabetical order with lower case first (e.g., a, A, b, B, c, C, etc.). If "Use Dictionary Order" is not checked, the tool sorts all upper case first and then all lower case (e.g., A, B, C, a, b, c, etc.). Visit the Sort help article or the attached workflow for more details.
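Outside Designer, the two orderings are easy to reproduce; here is a Python sketch contrasting them (the word list is made up for the example):

```python
words = ["Apple", "banana", "apple", "Banana"]

# Default string sort: ASCII order, all capitals first - like leaving
# "Use Dictionary Order" unchecked.
ascii_order = sorted(words)
# → ['Apple', 'Banana', 'apple', 'banana']

# Dictionary order: compare case-insensitively, then break ties with
# swapcase so the lowercase variant sorts before the uppercase one.
dictionary_order = sorted(words, key=lambda s: (s.lower(), s.swapcase()))
# → ['apple', 'Apple', 'banana', 'Banana']
```

The swapcase trick works because on a tie ("apple" vs "Apple"), the swapped strings "APPLE" and "aPPLE" compare in reversed case order, putting lowercase first.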
View full article
Is there a way to keep the Cross Tab Tool from reordering all the input information alphabetically? Simply add a RecordID to your records and add the RecordID field as a grouping field in your Cross Tab Tool to keep the order!
View full article
To keep the Cross Tab Tool from reordering all the input information alphabetically, add a RecordID to every record of the input, then use RecordID as the Grouping Field in the Cross Tab Tool.

Input: add a RecordID. Then, in the Cross Tab tool, group by RecordID.

Output:
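Alteryx's Cross Tab behaves much like a pandas pivot, which also reorders rows by its grouping keys; here is a sketch of the same RecordID trick (the sample data is invented):

```python
import pandas as pd

# Hypothetical input, deliberately not in alphabetical order.
df = pd.DataFrame({
    "Name":  ["Zoe", "Amy"],
    "Field": ["score", "score"],
    "Value": [9, 7],
})

# Pivoting on Name alone would sort the rows alphabetically (Amy, Zoe).
# A sequential RecordID in the index keeps the original order, because
# sorting by the ID reproduces the order the rows arrived in.
df["RecordID"] = range(1, len(df) + 1)
wide = df.pivot(index=["RecordID", "Name"], columns="Field",
                values="Value").reset_index()
```

After the pivot, Zoe still comes first because RecordID 1 sorts before RecordID 2.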
View full article
How do I remove leading zeros from a field? Use the Formula Tool and the TrimLeft() function to remove leading zeros!
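For comparison, Python's str.lstrip does the same job as the TrimLeft() approach; a small sketch (sample values invented):

```python
# Remove leading zeros from string codes, the Python way.
codes = ["000123", "0042", "7", "0000"]

trimmed = [c.lstrip("0") for c in codes]
# → ['123', '42', '7', '']  -- note an all-zero value becomes empty

# If an all-zero field should keep a single zero instead:
safe = [c.lstrip("0") or "0" for c in codes]
# → ['123', '42', '7', '0']
```

The `or "0"` fallback is worth remembering: stripping every character leaves an empty string, which is falsy in Python.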
View full article
This is the place to be if you are trying to learn how to download data from Amazon S3 into Alteryx Designer, or upload data from Alteryx Designer to your Amazon S3 account.
View full article
Looking to install additional R packages?  Here's how!
View full article
With the release of 2018.3 comes the long-awaited and highly anticipated Python Tool! This article is a general introduction to using the tool.
View full article
With the Python Tool, Alteryx can manipulate your data using everyone’s favorite programming language - Python! Included with the tool are a few pre-built libraries that extend beyond even the native Python download, letting you take your data manipulation further than you could ever imagine. The installed libraries are listed here, and below I’ll go into a bit more detail on what these libraries do and why they are so useful. Each library is well documented, and there’s usually an introduction or examples on its site to get you started on how a basic function works.

ayx – Alteryx API – simply enough, we’re using Alteryx, sooo yea, kind of a requirement: it handles the translation between Alteryx and Python.

jupyter – Jupyter metapackage – If you’ve used a Jupyter notebook in the past, you’ll notice the interface for the Python Tool is similar. This interface allows you to run sections of code without actually running the workflow, which makes understanding and testing your data that much easier. http://jupyter.org/index.html

matplotlib – Python plotting package – Any charting, plotting, or graphical needs you have will be covered by this package. It provides a great deal of flexibility for whatever you want to visualize. https://matplotlib.org/

numpy – NumPy, array processing for numbers, strings, records, and objects – Native Python handles data in what some would call a cumbersome way. For instance, to make a matrix (say, a 4x4 table), you would need to create a list within a list, which can slow processing a bit. NumPy has its own “array” type that fits data into this matrix pattern and allows for faster processing. It also has a bunch of methods for handling numbers, strings, and objects that make processing a whole lot easier and a whole lot faster. http://www.numpy.org/

pandas – powerful data structures for data analysis, time series, and statistics – This is your staple for handling data within Alteryx. Those who have used Python, but never pandas, will enter a whole new beautiful world of data handling and structure: data manipulation becomes faster, cleaner, and easier to code. The best part is that the Python Tool reads your Alteryx data in as a pandas data frame! Understanding this library should be one of the first steps in tackling the Python code. https://pandas.pydata.org/

requests – Python HTTP for Humans – for all the connector/Download Tool fans out there. If you are familiar with making HTTP requests (API calls and the like), introduce yourself to this package and explore how Python performs these requests. http://docs.python-requests.org/en/master/

scikit-learn – a set of Python modules for machine learning and data mining – Welcome to the world of machine learning in Python! This library is your go-to for statistical and predictive modeling and evaluation. Any crazy and wild methods you’ve learned for machine learning will most likely be found here, and they can really push the boundaries of data science. http://scikit-learn.org/stable/

scipy – Scientific Library for Python – all your scientific and technical computing can be found here. This library builds on the packages already installed, like numpy, pandas, and matplotlib. Mathematical models and formulae usually live in this library, which can help provide a higher level of analysis of your data. https://www.scipy.org/

six – Python 2 and 3 compatibility utilities – For those who are unfamiliar, Python comes in two major versions, 2.x and 3.x (3.x being the most recent). Even though Python 3 is supposed to be the latest and greatest, many users still prefer Python 2, and integration between the two is a bit tricky due to syntax differences and the like. The six module provides functions that work in both versions so everyone can remain calm and happy! Its documentation usually notes which version each function most closely aligns to, so you can get a better idea of its functionality. https://pypi.org/project/six/

SQLAlchemy – database abstraction library – SQL in Python! It covers all your database needs, from connecting to a database to extracting data, letting it interact with your Python code and thus with Alteryx itself. https://www.sqlalchemy.org/

statsmodels – statistical computations and models for Python – This library builds on scikit-learn but focuses more on statistical tests and data exploration. It also lets you fit models using R-style formulae with pandas data frames! https://www.statsmodels.org/stable/index.html

These are the libraries installed with the Python Tool, and between them they can handle almost any data task imaginable. Of course, if you’re looking to do something these libraries don’t provide, there are myriad other Python libraries that I’m sure will help with your use case. Most of them are also well documented, so search away and let your mind float away in the beautiful cosmos created by Python.
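To give a flavor of the pandas-centric workflow described above, here is a minimal sketch; the Alteryx.read/Alteryx.write calls are shown only as comments because they run solely inside Designer, and the sample data frame is invented:

```python
import pandas as pd

# Inside the Python Tool you would pull data from an input anchor:
#   from ayx import Alteryx
#   df = Alteryx.read("#1")
# Here we stand in for that step with a small hypothetical data frame.
df = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "sales":  [100, 50, 75, 125],
})

# Typical pandas work: aggregate sales per region.
summary = df.groupby("region", as_index=False)["sales"].sum()

# Back in Designer you would push the result to an output anchor:
#   Alteryx.write(summary, 1)
print(summary)
```

Because the tool hands you a data frame, everything between read and write is ordinary pandas, so any pandas tutorial applies directly.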
View full article
When your Python libraries don't work the way they should in the Python tool, restoring the tool to its original state could be the solution. This article walks through how to restore the Python libraries and the virtual environment associated with the Python tool.
View full article