Concatenate shifted data

Hi all,

I am facing a recurring problem with shifted data when I import a .csv file with a delimited input. Some records are broken partway through and the remainder spills onto the following lines, which throws an error in the workflow. The breaks are introduced when the data is extracted from the source systems, and we have no control over that process.

I have given the sample data below, with the line numbers at which the shift occurs:

Data I have

In this data, a sentence is broken across several adjacent lines, and I need to concatenate these into a single line based on the line number.

The output that I need is shown below:

Data I need

The idea is to find the adjacent line numbers at which the shift occurs and concatenate each one into the previous line, so that the line where the shift starts ends up holding the complete sentence.

Note: I would prefer a solution that automatically groups the adjacent line numbers at which the shift occurs and then concatenates the highest line number into the previous one, and so on. I am trying to automate this because it is a common issue in my work.

Thanks in advance

Accepted answers

kat

Hi @hanz_e

Here is a solution that will work on the data you provided. It won't work if the data is split into new lines in places where there is a |.

concat string.PNG

1. I counted the number of pipes in each row.
2. I created a running total that resets every time we reach 6 pipes (the max).
3. I created a grouping field, based on the running total, where the grouping resets after each running sum of 6.
4. I then concatenated the data based on the grouping identified above, with a space (\s) separating the data.
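For readers working outside Alteryx, the same steps can be sketched in Python. This is only an illustration of the pipe-counting idea, not the attached workflow: the function name `rejoin_by_pipe_count` and the sample rows are made up, and it assumes every complete record has exactly 6 pipes and that a break never falls inside the last field (a continuation line after the sixth pipe would be attached to the next record).

```python
EXPECTED_PIPES = 6  # 7 columns per record means 6 pipe delimiters


def rejoin_by_pipe_count(lines, expected_pipes=EXPECTED_PIPES, sep=" "):
    """Merge physical lines until each record holds `expected_pipes` pipes.

    Hypothetical helper: mirrors the running-total-and-group steps above.
    """
    records = []
    parts = []         # fragments of the record currently being assembled
    running_total = 0  # pipes seen so far in the current record
    for line in lines:
        parts.append(line)
        running_total += line.count("|")
        if running_total >= expected_pipes:
            # The record is complete: emit it and reset the running total.
            records.append(sep.join(parts))
            parts = []
            running_total = 0
    if parts:
        # Leftover fragments that never reached the expected pipe count.
        records.append(sep.join(parts))
    return records


# Made-up sample: the second record is broken across three lines.
sample = [
    "1|a|b|c|d|e|f",
    "2|a|b|Share your",
    "smile with",
    "the world|d|e|f",
    "3|a|b|c|d|e|f",
]
fixed = rejoin_by_pipe_count(sample)
```

Here `fixed` holds three records, with the broken one rebuilt as a single line joined by spaces.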

Hope this works for you! Please do give a shout if you need some more help on it or if there are other scenarios.

Cheers

Katrin

concat string.yxmd

All comments

MarqueeCrew

@hanz_e,

I'd like to see the original data. It is possible that a newline (\n) character is present and that removing it would fix the problem for you. Otherwise you'll need to use a Multi-Row Formula: use the Right() function to check for a period (.) at the end of the record, and if it isn't there, concatenate the [Row-1:Field] + [Field] data together.
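As a rough Python analogue of that Multi-Row check (not Alteryx syntax), the sketch below appends a line to the previous one whenever the previous one does not end with a terminator. The helper name `join_unterminated` and the sample sentences are hypothetical, and the approach only applies if every complete record really does end with a known character such as a period.

```python
def join_unterminated(lines, terminator=".", sep=" "):
    """Append each line to the previous one while the previous one
    does not end with `terminator` (hypothetical helper)."""
    merged = []
    for line in lines:
        if merged and not merged[-1].rstrip().endswith(terminator):
            # Previous record is incomplete: treat this line as its continuation.
            merged[-1] = merged[-1].rstrip() + sep + line.lstrip()
        else:
            merged.append(line)
    return merged


# Made-up sample: the first sentence is split across two lines.
joined = join_unterminated(["Share your smile", "with the world.", "Be kind."])
```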

Cheers,

Mark

danilang

Hi @hanz_e

Your output contains a character that's not in your input.

"Share your smile with the world" (line 34) becomes

"Share your smile with the world." (lines 34+35)

There's an extra period "." at the end. Is this just a typo on your part when you were creating the sample output data, or something the workflow has to determine?

Dan

hanz_e

Hi @MarqueeCrew,

First of all, sorry for the confusion: the actual data does not contain any special character at the end of each line that the workflow could use to identify it. The period was only included because I copied a normal sentence to make the sample data.

The .csv file that I input is similar to the given sample data.

Raw .csv file

Each row was supposed to have 7 columns with 6 pipe delimiters. But the data has shifts at rows '3' & '7', where one field is split onto another line. Thus, when I input this file with a delimited input, it produces shifted data, with some rows having fewer delimiters than the expected number per line. I have created a workflow to find the line numbers at which the shift occurs and which the shifted data occupies. In this example, I need to automatically group the line numbers '3', '4' & '5' into one group and '8' & '9' into another, and then concatenate the data in the highest line number of each group into the previous line until it reaches the lowest line number of each group.
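To make the grouping step concrete, here is a minimal Python sketch of the logic described above, independent of Alteryx. The names `group_adjacent` and `fold_groups` and the placeholder row contents are hypothetical. It assumes the flagged line numbers are already known and that each group should be folded, highest line first, into the lowest line number of that group.

```python
def group_adjacent(line_numbers):
    """Split line numbers into runs of consecutive values,
    e.g. 3, 4, 5, 8, 9 gives one run of 3-5 and one run of 8-9."""
    groups = []
    for n in sorted(set(line_numbers)):
        if groups and n == groups[-1][-1] + 1:
            groups[-1].append(n)   # extends the current run
        else:
            groups.append([n])     # starts a new run
    return groups


def fold_groups(rows, flagged, sep=" "):
    """Fold each run of flagged lines into its lowest line number.

    `rows` maps line number to text; a new dict is returned.
    """
    result = dict(rows)
    for group in group_adjacent(flagged):
        # Walk from the highest line down, appending it to the line before.
        for n in reversed(group[1:]):
            result[n - 1] = result[n - 1] + sep + result.pop(n)
    return result


# Placeholder contents; only the line numbers follow the example above.
rows = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f", 7: "g", 8: "h", 9: "i"}
folded = fold_groups(rows, [3, 4, 5, 8, 9])
```

With this input, lines 4 and 5 end up appended to line 3 and line 9 to line 8, while unflagged lines are left untouched.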

Hope this explains the help I need.

hanz_e

Hi @danilang,

The actual data does not contain any special character at the end of each line. It was only included because I copied a sentence to make this sample data.

I guess the only way to solve this issue is based on the line number. For reference, I have shared the detailed raw data in another post.

Sorry for the confusion.
