Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How to remove unnecessary row and data from excel to create single csv data file.

shukla_sumit
7 - Meteor

@ChristineB @jdunkerley79

In my case in a single excel sheet I have data into 3 groups and for each  group structure is same means column header are same in all three groups but I need it only once as a column name or header and need to merge all 3 groups data into single csv file.

How can i handle such situation into alteryx, Can you please help me out with this issue ?

 

attaching source file and expected result both.

6 REPLIES 6
jdunkerley79
ACE Emeritus
ACE Emeritus

Am in transit for next couple of hours but I suggest using a multi row formula to build a Group column something like

Iif(Regex_Match(f2,”Group \d+”),f2,[Row-1:Group])


Should create the grouping and then you can hunt for Speciality.

 

Something like:

2018-12-27_22-57-33.png

 

The last bit in the workflow uses dynamic rename to assign the column names

 

Sample attached

Ladarthure
14 - Magnetar
14 - Magnetar

Hi, I did a sample workflow which does what you need (I think), tell me if it helped

shukla_sumit
7 - Meteor

@jdunkerley79 First of all thank you very much for your help.

 

As I am new in alteryx can you please explain me that work flow which you have shared with me ?

 

second thing is I have 100 of excel files and all the files has same structure, every file has multiple sheet and in every file sheet names are same, Now i need to connect all the files using single workflow which you have already shared with me as a solution and wanted to collect all the data from all the files for one particular sheet  into one single .csv file.

Can you please help me on that ?

 

Need to do all the above into attached workflow.

 

Thanks in advance. :)

jdunkerley79
ACE Emeritus
ACE Emeritus

So the workflow I posted does the following

 

First, use a Select tool to pick the columns I actually want. In this case ignoring F1, F18, F19 and F20 (these are what Alteryx calls columns A, R, S & T).

 

Next, use a Multi-Row formula tool to create a new column containing the Group. The expression:

IIF(Regex_match([F2],"Group \d+"),[F2],[Row-1:Group])

takes the Value of F2 (Column B) if matches the regular expression Group \d+ - i.e. starts with Group followed by a number. If not it copies the value from the row above ([Row-1:Group])

 

Again, I use another Multi-Row formula tool, this time to create a new column, called row, holding the row number within each group.

IIF([F2]="Specialty",0,[Row-1:Row]+1)

Basically, this just adds one to the row above until it finds Speciality in F2 (Column B). In order to not count rows before the first Speciality entry, the Multi-Row formula is told to use NULL as the default value rather than the usual 0 or empty.

 

At this point, I use a Filter tool to remove dead rows (where F2 is empty) or rows with no Row number. This produces the dataset and header needed.

 

A second filter splits the header rows (when row = 0) from the data rows. For the data rows, we are done. 

 

The header rows the first is chosen and used to rename the columns from F2-F19 to nice names. This is done by transposing this row to columns and then using a dynamic rename tool.

 

 

I hope that helps explain the workflow.

 

In terms of doing multiple sheets. 

 

If all the sheets are identical then the easiest way to do it is to use a wildcard on the input tool to read all the files in. You can change the input tool to read a list of worksheet names and then feed this into a dynamic input tool to read all the sheets. This will only work if all columns are consistent across all worksheets in all workbooks. 

 

Could you post an example source workbook with multiple sheets? I can then stick a simple sample together for you to look through.

shukla_sumit
7 - Meteor

Thank you so much for all your help @jdunkerley79.

 

I forgot to ask one more question which is how to create new column from particular row data.

for example in the attached sheet I have row number 5 "Data Collection 2017-18" from which i wanted to create date column and the value will be 20180331 means last day of macrh for year 2018 ,if the year is 2016-17 then 20170331 and so on for all the row of that file.

 

Second In row number 6 which is "My Trust Name" wanted to create new column trust name and the value for that column will be the value of that cell for all the rows of that file ,currently it's My Trust Name.

 

How we can achieve this in previous workflow(Excel Reformat.yxmd).

jdunkerley79
ACE Emeritus
ACE Emeritus

I suggest this extenstion:

2019-01-03_15-29-08.jpg

 

Filter to rows with a group and F2 not NULL

Then use a Select Rows tool to pick row 2 for the Date and row 3 for the Trust Name

You can then use a formula tool to make a date from Row 2 - I chose to use a RegEx expression to do this but there are many ways.

Finally use two append fields tools to join back to original set.

 

Updated sample attached

Labels