I am working on a workflow where I want to select specific rows from a 26-page data set. I only want the description, quantity, unit price and total columns, along with the rows of data under them. I would like to skip all the fluff such as addresses, dates and so on. Please advise which tool I can use to retrieve only the relevant data from a very big data set.
Illustration
Page 1
XYZ Statement
01/11/49
Bank of America
Street address
City
State
Zip code
Description. Quantity. Unit price. Total
Xyz. 10. 20. 200
ABC. 5. 10. 50
Page 2
TD bank
Street address
City
State
Description. Quantity. Unit price. Total
Xyz. 50. 20. 1000
ABC. 5. 20. 100
Page 26
Description. Quantity. Unit price. Total
Xyz. 10. 20. 200
ABC. 5. 10. 50
@HassimDiallo
What is the format of your input data?
You mentioned pages.
The original file was a PDF bank statement (26 pages). I ran it through our internal OCR (Optical Character Recognition) to convert it from PDF to Excel format, and I use the Excel (xlsx) file as the input to my workflow. I just have to clean up the funky formatting and select the data that is relevant to me.
For example, my data set contains 6 columns. My first relevant data starts at C35:F35. However, that could change with each client statement, so the selection should be dynamic to account for future differences.
Columns A:A and B:B are not relevant, and I will use the Select tool to leave them out of my workflow.
C1:F34 is not relevant and should be omitted from the selection.
C35:F91 is relevant.
C92:F150 is not relevant and should be omitted from the selection.
C150:F1000 is relevant and should be selected,
and so on.
Thank you,
Hassim
@HassimDiallo
Thank you for the information. I thought you were importing Excel files.
I am sorry, but I don't know much about PDF import and parsing, since it is really tricky for me.
Hope someone else can help.
Can you upload a sample of what your input data looks like? The Multi-Row Formula tool may be helpful here for grouping together the rows that you want to keep.
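To make the grouping idea concrete, here is a rough Python sketch of the logic rather than an Alteryx configuration. It assumes each statement block starts with a header row reading Description, Quantity, Unit price, Total (as in the illustration), and that a line item is a non-empty description followed by three numeric cells. The function name `extract_line_items` and the sample rows are hypothetical, and reading the xlsx file into a list of rows is left out.

```python
# Header cells that mark the start of a block of relevant rows (assumed
# from the illustration in the question; adjust to the real OCR output).
HEADER = ("description", "quantity", "unit price", "total")


def _norm(cell):
    """Lower-case, trimmed text of a cell; empty string for None."""
    return "" if cell is None else str(cell).strip().lower()


def _is_number(cell):
    """True if the cell can be read as a number (thousands commas allowed)."""
    try:
        float(str(cell).replace(",", ""))
        return True
    except ValueError:
        return False


def _find_header(cells):
    """Return the column index where the 4-cell header starts, or None."""
    for i in range(len(cells) - 3):
        if tuple(cells[i:i + 4]) == HEADER:
            return i
    return None


def extract_line_items(rows):
    """Keep only the rows that sit directly under a header row.

    A block ends at the first row that does not look like a line item,
    so addresses, dates and bank names between blocks are skipped.
    """
    items = []
    start = None  # column offset of the current block, None when outside one
    for row in rows:
        header_at = _find_header([_norm(c) for c in row])
        if header_at is not None:
            start = header_at  # a new block begins; the header itself is dropped
            continue
        if start is None:
            continue  # still in the fluff before the first header
        block = row[start:start + 4]
        if len(block) == 4 and _norm(block[0]) and all(_is_number(c) for c in block[1:]):
            desc = str(block[0]).strip()
            qty, price, total = (float(str(c).replace(",", "")) for c in block[1:])
            items.append((desc, qty, price, total))
        else:
            start = None  # first non-item row closes the block
    return items


# Hypothetical rows shaped like the illustration, with two unused leading columns.
sample_rows = [
    ["", "", "XYZ Statement", "", "", ""],
    ["", "", "01/11/49", "", "", ""],
    ["", "", "Bank of America", "", "", ""],
    ["", "", "Description", "Quantity", "Unit price", "Total"],
    ["", "", "Xyz", 10, 20, 200],
    ["", "", "ABC", 5, 10, 50],
    ["", "", "TD bank", "", "", ""],
    ["", "", "Description", "Quantity", "Unit price", "Total"],
    ["", "", "Xyz", 50, 20, 1000],
]

line_items = extract_line_items(sample_rows)
```

Because the block boundaries are found from the header text and not from fixed cell ranges such as C35:F91, the same logic should cope with statements where the relevant rows start at a different position. In a workflow, the "am I inside a block" flag is the kind of running state a Multi-Row Formula tool could carry from one row to the next, followed by a filter on that flag.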