Alteryx Designer Desktop Knowledge Base

ChristineB · ‎01-18-2017

Scenario:

You've been given data for a new project and it contains lots of extra (and unnecessary) rows before you even get to the information you need to work with. Look familiar?

For many Alteryx users, this situation is all too common. Luckily, there's a pretty easy way to resolve this issue using the Sample and Dynamic Rename tools!

Method:

To demonstrate this approach, we'll use some sample data that has extraneous information and space at the top (Rows 1-4) of the spreadsheet in Figure 1 (below). While the information itself might be important, it's going to interfere with our data analysis. What we really want to see is the information in Row 5 as our header name and the information from Row 6 onwards to be our data.

Figure 1: The data in rows 1-4, as seen in Excel, should not be included in the data analysis.

Rather than manually re-format our dataset, we'll bring it into Alteryx and let the data preparation begin! Using an Input Tool, we'll navigate to the location of our data file. The tool gives us a preview of what to expect when bringing in the data (Figure 2). This format is nowhere near perfect, but we still have a few tricks up our sleeve!

Figure 2: The Input Tool shows how the data will be brought into Alteryx. Our heading is not correct, and we still have a few lines of data (in light yellow) to eliminate while keeping the data we want to analyze (in dark yellow).

A quick visual assessment indicates that we'll need to skip the first three rows of data (the information in Row 4 will become our field names). We can remove these data using a Sample Tool. In the Sample Tool configuration (Figure 3), we'll opt to "Skip the 1st N Records"; in this case, N will be equal to 3.

Figure 3: Set the number of records to skip, or remove, from the top of the dataset.

Now that we've removed the first 3 rows of data, we are much closer to the version of the data format we'd like to work with. The data we'd like to use as the field names (Number, FirstName and State) are now in the first row of data. We'll use the Dynamic Rename Tool to re-name our fields using the option to "Take Fields from the First Row of Data" (Figure 4). And, voila!! Our data is now ready to use for the next steps of our analyses.

Figure 4: After removing unwanted rows of data and re-naming the fields, our data is ready for further analyses.

*See the attached sample workflow (v10.5) for an example of this process.

RodL · ‎01-18-2017

Another tool that I like to use in place of the Sample tool is the Select Records tool. Sometimes the header isn't directly on top of the data (a "spacer" line has been added).

The Select Records tool allows you to designate the specific lines you want if they are disconnected.

ChristineB · ‎01-19-2017

@RodL, that's a great suggestion! I've used that trick as well for the exact situation you mentioned. Thanks for sharing!

pratikg · ‎01-23-2017

Even I use Record tool & Dynamic Rename tool in above situation. But Thanks for sharing new method :)

mweiser_dup_512 · ‎06-14-2017

What do you recommend doing if the row number the data starts on isn't always on the exact same row, but the row I want to start on has a certain text that is consistent?

GCBASTOS · ‎09-29-2017

@mweiser , would you like to try this: https://community.alteryx.com/t5/Data-Preparation-Blending/Dynamically-remove-rows-1-to-X-from-a-CSV... , worked as a charm for me!

Raghu_s · ‎04-02-2018

Hello @ChristineB and @RodL!

Been making rounds in the community to see if there are any way to solve similar problem with .XLS files. I see there are no option like Start at line option that's available for XLSX files when inputed. I'm aware of skipping records using Preparation tool, but if I need to use a Directory tool to read multiple XLS file type and skip all dirty records for say rows 1-15 there is no solution. I run out of ideas.

Any thoughts?

JohnZurales · ‎06-23-2019

Thanks for this. Worked perfectly for my situation.

Thanks,

John

JohnZurales · ‎06-23-2019

Christine,

Thanks, worked perfectly.

Best regards,

John

Inactive User · ‎07-23-2019

@christineB Perfectly explained, thank

Mikey816 · ‎12-10-2019

I'm in a situation where my data starts on the same 3rd row for every Excel file that comes in on a monthly basis. Although sometimes the file have a blank column A inserted, the tool seems to be including this blank column like it's a row in which case I would have to change the range input.

Is there any way to only specify the tool to look at the rows and beyond that is specified? In my case my range input would be 3+ to start at row 3 and all data beyond that.

File with blank inserted column A:

File with blank column A

File with no blank column:

File with no blank column

MikaelO · ‎07-15-2020

Hello. can someone help. i don't understand how to rename the first row using the dynamic rename tool. please. Any more guideline to that.

TAGreen1 · ‎07-17-2020

The suggestion to use the "Select Records" tool was also extremely helpful and answered another one of my questions, which is how to remove rows at the end of a dataset (or anywhere in the middle) from consideration.

Susna · ‎08-29-2020

Hope we can use the configuration option of the Input Data tool:

Start Data Import on Line:

if just want to remove the unwanted lines from the beginning of a file.

Please see the first and second images of Input Data tool configuration options.

Thanks!

Data import Start line-0 .JPG Data import Start line .JPG

HelenL · ‎11-27-2020

This is a very helpful thread. Thanks everyone. But I still can't do exactly what I need to do.

I've used Dynamic rename to make Row 1 my column headers. "Take field names from first row of data". So far so good.

But I have unwanted info in what is now Row 1

My data starts at Row 2

I've looked at Select records and the examples provided. But I can't see how to exclude one row (Row 1)

Regards,

Helen

pratikg · ‎11-27-2020

@helenL

Hi , Add row Id as a column using row Id module and then filter out unwanted row number.

HelenL · ‎11-27-2020

Neat! Thanks for your quick reply. It worked perfectly.

Alteryx Designer Desktop Knowledge Base

Input data that doesn't start on the first row