I have a work flow that pulls data and outputs a series of rows of data; sometimes small amount of rows, sometimes none. These are error reports generated from system machines.
I have a second data source that is essentially a IT ticketing service that generates reports on a vast array of issues generated and fixed.
So workflow, call it Dataset A output looks like this:
| Error location | error type | unique machine ID |
| EFGH 51.2 | CC | 3456 |
| ABCD 12.3 | BB | 1234 |
The ticketing service unfortunately does not put this information into separate columns. Instead there is a single column called short description with quite a bit of information; let's call it DatasetB:
| Short description | Number |
| String-XXXX-XXX-XX-XXXX-XXX-XXXX-XXX: ABCD 12.3: BB : More String - Data | ABC123456789 |
| String-XXXX-XXX-XX-XXXX-XXX: AAA 12.3: BB : More String - Data | XZY987654321 |
I've had to anonymize the data rather heavily. But I think you can understand where the data is.
So what am I trying to do? Two things:
- Find matches from both data sets that contain the error location. I only need the ABCD section of error location. Then add the the number column from data set B to the first data set A.
- If DataSet A entry does not exist in Dataset B send an email.
So in the example above the output would be:
| Error location | error type | unique machine ID | Number |
| ABCD 12.3 | BB | 1234 | ABC123456789 |
An email then would be sent detailing the first row since it does not have a ticket generated.
Couple things that might clarify:
The first string lengths do vary although the colon is constant. I have used parse text to columns to separate part of the field out with limited success; I'm probably doing it wrong.
Any and all help would be appreciated.