community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Knowledge Base

Definitive answers from Designer experts.

Input data that doesn't start on the first row

Sr. Instructional Designer
Sr. Instructional Designer

 Scenario: 

You've been given data for a new project and it contains lots of extra (and unnecessary) rows before you even get to the information you need to work with.  Look familiar?

 

scenario.jpg

 

 For many Alteryx users, this situation is all too common.  Luckily, there's a pretty easy way to resolve this issue using the Sample and Dynamic Rename tools!

 

Method: 

To demonstrate this approach, we'll use some sample data that has extraneous information and space at the top (Rows 1-4) of the spreadsheet in Figure 1 (below).  While the information itself might be important, it's going to interfere with our data analysis.  What we really want to see is the information in Row 5 as our header name and the information from Row 6 onwards to be our data.

 

Figure 1: The data in rows 1-4, as seen in Excel, should not be included in the data analysis.   

Original_input_lines_skipped.jpg

 

Rather than manually re-format our dataset, we'll bring it into Alteryx and let the data preparation begin! Using an Input Tool, we'll navigate to the location of our data file.  The tool gives us a preview of what to expect when bringing in the data (Figure 2).  This format is nowhere near perfect, but we still have a few tricks up our sleeve!

 

 Figure 2: The Input Tool shows how the data will be brought into Alteryx.  Our heading is not correct, and we still have a few lines of data (in light yellow) to eliminate while keeping the data we want to analyze (in dark yellow).  
Skipped_lines_Alteryx_input.jpg

 

A quick visual assessment indicates that we'll need to skip the first three rows of data (the information in Row 4 will become our field names).  We can remove these data using a Sample Tool.  In the Sample Tool configuration (Figure 3), we'll opt to "Skip the 1st N Records"; in this case, N will be equal to 3.

 

Figure 3: Set the number of records to skip, or remove, from the top of the dataset.

sample.jpg

 

Now that we've removed the first 3 rows of data, we are much closer to the version of the data format we'd like to work with.  The data we'd like to use as the field names (Number, FirstName and State) are now in the first row of data.  We'll use the Dynamic Rename Tool to re-name our fields using the option to "Take Fields from the First Row of Data" (Figure 4).  And, voila!! Our data is now ready to use for the next steps of our analyses.  

 

 Figure 4: After removing unwanted rows of data and re-naming the fields, our data is ready for further analyses. final_workflow.jpg

 

*See the attached sample workflow (v10.5) for an example of this process. 

 

 

Comments
Alteryx Alumni (Retired)

Another tool that I like to use in place of the Sample tool is the Select Records tool. Sometimes the header isn't directly on top of the data (a "spacer" line has been added).

The Select Records tool allows you to designate the specific lines you want if they are disconnected.

Sr. Instructional Designer
Sr. Instructional Designer

@RodL, that's a great suggestion!  I've used that trick as well for the exact situation you mentioned.  Thanks for sharing!

Atom

Even I use Record tool & Dynamic Rename tool in above situation. But Thanks for sharing new method Smiley Happy

Meteoroid

What do you recommend doing if the row number the data starts on isn't always on the exact same row, but the row I want to start on has a certain text that is consistent?

Atom

@mweiser , would you like to try this: https://community.alteryx.com/t5/Data-Preparation-Blending/Dynamically-remove-rows-1-to-X-from-a-CSV... , worked as a charm for me!

 

Asteroid

Hello @ChristineB and @RodL

Been making rounds in the community to see if there are any way to solve similar problem with .XLS files. I see there are no option like Start at line option that's available for XLSX files when inputed. I'm aware of skipping records using Preparation tool, but if I need to use a Directory tool to read multiple XLS file type and skip all dirty records for say rows 1-15 there is no solution. I run out of ideas. 

 

Any thoughts?