Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

Input data that doesn't start on the first row

ChristineB
Alteryx Alumni (Retired)
Created

Scenario:

You've been given data for a new project and it contains lots of extra (and unnecessary) rows before you even get to the information you need to work with. Look familiar?

 

scenario.jpg

 

For many Alteryx users, this situation is all too common. Luckily, there's a pretty easy way to resolve this issue using the Sample and Dynamic Rename tools!

 

Method:

To demonstrate this approach, we'll use some sample data that has extraneous information and space at the top (Rows 1-4) of the spreadsheet in Figure 1 (below). While the information itself might be important, it's going to interfere with our data analysis. What we really want to see is the information in Row 5 as our header name and the information from Row 6 onwards to be our data.

 

Figure 1: The data in rows 1-4, as seen in Excel, should not be included in the data analysis.

Original_input_lines_skipped.jpg

 

Rather than manually re-format our dataset, we'll bring it into Alteryx and let the data preparation begin! Using an Input Tool, we'll navigate to the location of our data file. The tool gives us a preview of what to expect when bringing in the data (Figure 2). This format is nowhere near perfect, but we still have a few tricks up our sleeve!

 

Figure 2: The Input Tool shows how the data will be brought into Alteryx. Our heading is not correct, and we still have a few lines of data (in light yellow) to eliminate while keeping the data we want to analyze (in dark yellow).
Skipped_lines_Alteryx_input.jpg

 

A quick visual assessment indicates that we'll need to skip the first three rows of data (the information in Row 4 will become our field names). We can remove these data using a Sample Tool. In the Sample Tool configuration (Figure 3), we'll opt to "Skip the 1st N Records"; in this case, N will be equal to 3.

 

Figure 3: Set the number of records to skip, or remove, from the top of the dataset.

sample.jpg

 

Now that we've removed the first 3 rows of data, we are much closer to the version of the data format we'd like to work with. The data we'd like to use as the field names (Number, FirstName and State) are now in the first row of data. We'll use the Dynamic Rename Tool to re-name our fields using the option to "Take Fields from the First Row of Data" (Figure 4). And, voila!! Our data is now ready to use for the next steps of our analyses.

 

Figure 4: After removing unwanted rows of data and re-naming the fields, our data is ready for further analyses.final_workflow.jpg

 

*See the attached sample workflow (v10.5) for an example of this process.

 

 

Attachments
Comments
RodL
Alteryx Alumni (Retired)

Another tool that I like to use in place of the Sample tool is the Select Records tool. Sometimes the header isn't directly on top of the data (a "spacer" line has been added).

The Select Records tool allows you to designate the specific lines you want if they are disconnected.

ChristineB
Alteryx Alumni (Retired)

@RodL, that's a great suggestion!  I've used that trick as well for the exact situation you mentioned.  Thanks for sharing!

pratikg
5 - Atom

Even I use Record tool & Dynamic Rename tool in above situation. But Thanks for sharing new method :)

mweiser_dup_512
6 - Meteoroid

What do you recommend doing if the row number the data starts on isn't always on the exact same row, but the row I want to start on has a certain text that is consistent?

GCBASTOS
5 - Atom

@mweiser , would you like to try this: https://community.alteryx.com/t5/Data-Preparation-Blending/Dynamically-remove-rows-1-to-X-from-a-CSV... , worked as a charm for me!

 

Raghu_s
8 - Asteroid

Hello @ChristineB and @RodL

Been making rounds in the community to see if there are any way to solve similar problem with .XLS files. I see there are no option like Start at line option that's available for XLSX files when inputed. I'm aware of skipping records using Preparation tool, but if I need to use a Directory tool to read multiple XLS file type and skip all dirty records for say rows 1-15 there is no solution. I run out of ideas. 

 

Any thoughts? 

JohnZurales
6 - Meteoroid

Thanks for this.  Worked perfectly for my situation.   

Thanks,

John

 

JohnZurales
6 - Meteoroid

Christine,

Thanks, worked perfectly.   

 

Best regards, 

John

 

Inactive User
Not applicable

@christineB Perfectly explained, thank

Mikey816
5 - Atom

I'm in a situation where my data starts on the same 3rd row for every Excel file that comes in on a monthly basis. Although sometimes the file have a blank column A inserted, the tool seems to be including this blank column like it's a row in which case I would have to change the range input.

 

Is there any way to only specify the tool to look at the rows and beyond that is specified?  In my case my range input would be 3+ to start at row 3 and all data beyond that.

 

File with blank inserted column A:

File with blank column AFile with blank column A

 

File with no blank column:

File with no blank columnFile with no blank column

MikaelO
6 - Meteoroid

Hello. can someone help. i don't understand how to rename the first row using the dynamic rename tool. please. Any more guideline to that. 

TAGreen1
5 - Atom

The suggestion to use the "Select Records" tool was also extremely helpful and answered another one of my questions, which is how to remove rows at the end of a dataset (or anywhere in the middle) from consideration.

Susna
5 - Atom

Hope we can use the configuration option of the Input Data tool:

Start Data Import on Line: 

 

if just want to remove the unwanted lines from the beginning of a file.

 

Please see the first and second images of Input Data tool configuration options.

 

Thanks!

 

 

Data import Start line-0 .JPGData import Start line .JPG

 

HelenL
8 - Asteroid

This is a very helpful thread. Thanks everyone. But I still can't do exactly what I need to do. 

 

I've used Dynamic rename to make Row 1 my column headers. "Take field names from first row of data". So far so good.

But I have unwanted info in what is now Row 1

My data starts at Row 2

 

I've looked at Select records and the examples provided. But I can't see how to exclude one row (Row 1)

 

Regards,

Helen

 

pratikg
5 - Atom

@helenL 

Hi , Add row Id as a column using row Id module and then filter out unwanted row number.

 

 

 

 

HelenL
8 - Asteroid

Neat! Thanks for your quick reply. It worked perfectly.