I am trying to read a scanned PDF (I know this is challenging, but it's not impossible) that contains a table like the one below:
I have successfully read the IDs and the case numbers, but unfortunately there is no fixed layout for the Case # column: 3, 4, 5, or even 10 people can share the same case.
In Alteryx the data can only be handled as structured rows, so, for example, case number 11234/2024 might land on line 1, on line 2, or on an extra third line that Alteryx generates, with the remaining cells left null, like this:
ID | Case # |
1111 | 11234/2024 |
null | null |
2222 | null |
Or it could look like this:
ID | Case # |
1111 | null |
null | 11234/2024 |
2222 | null |
Or like this:
ID | Case # |
1111 | null |
null | null |
2222 | 11234/2024 |
There are many possible arrangements, but I think you get the point.
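To experiment with these layouts outside Alteryx, a minimal Python sketch can turn the pipe-delimited output shown above into (ID, case) pairs in page order, with `null` mapped to `None`. It assumes the export looks exactly like the examples (a header row starting with `ID`, one row per line, a trailing `|`, and the literal text `null` for empty cells); `parse_rows` and `_cell` are hypothetical helper names, not part of any Alteryx or Excel API.

```python
from typing import List, Optional, Tuple

# One parsed row: (ID or None, case number or None).
Row = Tuple[Optional[str], Optional[str]]


def _cell(value: str) -> Optional[str]:
    """Treat the literal 'null' (or an empty cell) as missing."""
    return None if value in ("", "null") else value


def parse_rows(text: str) -> List[Row]:
    """Parse 'ID | Case # |' style lines into (id, case) tuples, keeping page order."""
    rows: List[Row] = []
    for line in text.strip().splitlines():
        # Drop the trailing '|' and split the remaining cells on the delimiter.
        cells = [c.strip() for c in line.strip().rstrip("|").split("|")]
        if len(cells) < 2 or cells[0] == "ID":
            continue  # skip the header and any line without two columns
        rows.append((_cell(cells[0]), _cell(cells[1])))
    return rows


if __name__ == "__main__":
    sample = """ID | Case # |
1111 | null |
null | 11234/2024 |
2222 | null |"""
    # prints [('1111', None), (None, '11234/2024'), ('2222', None)]
    print(parse_rows(sample))
```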
What is the best way to map each ID to its case?
This is quite a challenge on my end. I'd rather fix this in Excel first, before moving it into Alteryx.
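One possible heuristic for the mapping (an assumption, not a known-correct rule for this PDF) is to give each ID the case number found on the nearest row. It relies on each case number landing somewhere near the middle of its group of IDs, which would hold if the merged Case # cell in the PDF is vertically centered; if a case sits at the very bottom of a large group, right next to the following group, it can be attributed to the wrong IDs. The sketch assumes the rows are already (id, case) tuples in page order with `None` for null; `map_ids_to_cases` is a hypothetical name, and the rows in the usage example are made-up placeholders.

```python
from typing import Dict, List, Optional, Tuple

# One OCR row: (ID or None, case number or None), in page order.
Row = Tuple[Optional[str], Optional[str]]


def map_ids_to_cases(rows: List[Row]) -> Dict[str, Optional[str]]:
    """Heuristic: assign each ID the case number from the nearest row."""
    # Positions of the rows that actually carry a case number, top to bottom.
    case_rows = [(pos, case) for pos, (_, case) in enumerate(rows) if case is not None]
    mapping: Dict[str, Optional[str]] = {}
    for pos, (id_, _) in enumerate(rows):
        if id_ is None:
            continue  # filler rows without an ID are skipped
        if not case_rows:
            mapping[id_] = None  # no case number anywhere in the input
            continue
        # min() keeps the first of equally close candidates, so ties go to the case above.
        _, case = min(case_rows, key=lambda pc: abs(pc[0] - pos))
        mapping[id_] = case
    return mapping


if __name__ == "__main__":
    # Hypothetical rows: two groups, each with its case number on a filler row.
    rows = [("A", None), (None, "C1"), ("B", None),
            ("D", None), (None, "C2"), ("E", None)]
    print(map_ids_to_cases(rows))  # {'A': 'C1', 'B': 'C1', 'D': 'C2', 'E': 'C2'}
```

All three layouts in the question map both 1111 and 2222 to 11234/2024 under this rule. If the OCR step can also report row borders or each cell's vertical position, using those to delimit the groups would likely be more reliable than plain row distance.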