
Alteryx Designer Desktop Discussions


Help selecting certain rows in very large data set

HassimDiallo
7 - Meteor

I am working on a workflow where I want to select specific rows from a 26-page data set. I want to keep only the description, quantity, unit price, and total, along with the relevant data, and skip all the fluff such as addresses, dates, and so on. Please advise which tool I can use to retrieve only the relevant data from a very big data set.

 

Illustration

Page 1

XYZ Statement

01/11/49

Bank of America

Street address 

City

State

Zip code

 

 

Description. Quantity.  Unit price.  Total

Xyz.                     10.            20.            200

ABC.                     5.             10.             50

 

 

Page 2

 

TD bank

Street address 

City

State

 

Description. Quantity.  Unit price.  Total

 

Xyz.                     50            20.            1000

 

ABC.                     5.             20.             100

 

 

Page 26

 

Description. Quantity.  Unit price.  Total

Xyz.                     10.            20.            200

ABC.                     5.             10.             50

Qiu
21 - Polaris

@HassimDiallo 
What is the format of your input data?
You have mentioned Pages.

HassimDiallo
7 - Meteor

@Qiu 

 

The original file was a PDF bank statement (26 pages). I ran it through an internal OCR (Optical Character Recognition) tool to convert it from PDF to Excel format, and used the Excel (.xlsx) file as the input to my workflow. I just have to clean up the funky formatting and select the data that is relevant to me.

 

For example, my data set contains 6 columns. My first relevant data starts at row 35 (C35 to F35). However, that position could change with each client statement, so the selection should be dynamic to account for future differences.

 

Columns A:A and B:B are not relevant, and I will use the Select tool to leave them out of my workflow

C1:F34 is not relevant data and should be omitted from the selection

C35:F91 is relevant

C92:F150 is not relevant data and should be omitted from the selection

C150:F1000 is relevant and should be selected

and so on.

 

Thank you,

Hassim
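
The dynamic selection described above can be sketched in Python as a way to pin down the logic before building it with workflow tools. This is only an illustration under stated assumptions, not a tested solution for the real file: it assumes each row holds just the four cells from columns C to F, that every table starts with a header row whose first cell reads "Description", and that data rows carry numeric values in their last three cells. The names `find_table_ranges`, `HEADER_FIRST_CELL`, and the sample rows are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[str]]

# Assumed marker: the first cell of every table header row.
HEADER_FIRST_CELL = "description"


def _norm(cell: Optional[str]) -> str:
    """Lower-case a cell and drop whitespace plus trailing dots left by OCR."""
    return (cell or "").strip().rstrip(".").strip().lower()


def _is_number(cell: Optional[str]) -> bool:
    """True if the cell parses as a number once a trailing dot is removed."""
    text = (cell or "").strip().rstrip(".").replace(",", "")
    try:
        float(text)
        return True
    except ValueError:
        return False


def find_table_ranges(rows: List[Row]) -> List[Tuple[int, int]]:
    """Return 1-based (first_row, last_row) ranges of data rows under each header."""
    ranges: List[Tuple[int, int]] = []
    in_table = False
    first = last = None
    for number, row in enumerate(rows, start=1):
        is_blank = all(not (c or "").strip() for c in row)
        if row and _norm(row[0]) == HEADER_FIRST_CELL:
            # A new header closes any table that is still open.
            if first is not None:
                ranges.append((first, last))
            in_table, first, last = True, None, None
        elif is_blank:
            # Blank rows neither start nor end a table.
            continue
        elif in_table and len(row) >= 4 and all(_is_number(c) for c in row[1:4]):
            if first is None:
                first = number
            last = number
        else:
            # Any other text (bank name, address, ...) ends the current table.
            if first is not None:
                ranges.append((first, last))
            in_table, first, last = False, None, None
    if first is not None:
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    sample = [
        ["Bank of America", "", "", ""],
        ["Description.", "Quantity.", "Unit price.", "Total"],
        ["Xyz.", "10.", "20.", "200"],
        ["ABC.", "5.", "10.", "50"],
        ["TD bank", "", "", ""],
    ]
    print(find_table_ranges(sample))
```

Because the ranges are found from the header text rather than from fixed cell addresses such as C35:F91, the same logic should keep working when the tables start on different rows in another client's statement, as long as the header wording stays the same.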

Qiu
21 - Polaris

@HassimDiallo 
Thank you for the information. I thought you were importing Excel files.
I am sorry, but I don't know much about PDF import and parsing, since it is really tricky for me.

Hope someone else can help.

CoG
14 - Magnetar

Can you upload a sample of what your input data looks like? The Multi-Row Formula tool may be helpful here for grouping together the rows that you want to keep.
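
The grouping idea can be sketched in Python to show the row-by-row logic that a Multi-Row Formula followed by a Filter would need to reproduce. It is a rough illustration, not an Alteryx expression: each row gets a group id that is carried down from the previous row, starting a new group at every header row and resetting to 0 on unrelated text. The function name `tag_table_groups`, the "Description" header marker, and the rule that a data row has a numeric last cell are all assumptions, since no sample file was posted.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[str]]


def _is_number(cell: Optional[str]) -> bool:
    """True if the cell parses as a number once a trailing dot is removed."""
    text = (cell or "").strip().rstrip(".").replace(",", "")
    try:
        float(text)
        return True
    except ValueError:
        return False


def tag_table_groups(rows: List[Row], header_first_cell: str = "description") -> List[int]:
    """Return one group id per row: 0 means drop, 1..n means the nth table."""
    group_ids: List[int] = []
    current = 0       # group id carried down from the previous row
    tables_seen = 0   # how many header rows have been met so far
    for row in rows:
        first = (row[0] or "").strip().rstrip(".").lower() if row else ""
        is_blank = all(not (c or "").strip() for c in row)
        if first == header_first_cell:
            # A header row opens a new group but is not data itself.
            tables_seen += 1
            current = tables_seen
            group_ids.append(0)
        elif is_blank:
            # Blank rows are dropped, but the open group carries on.
            group_ids.append(0)
        elif current and _is_number(row[-1]):
            group_ids.append(current)
        else:
            # Bank name, address, and similar text close the group.
            current = 0
            group_ids.append(0)
    return group_ids


if __name__ == "__main__":
    sample = [
        ["Bank of America", "", "", ""],
        ["Description.", "Quantity.", "Unit price.", "Total"],
        ["Xyz.", "10.", "20.", "200"],
        ["ABC.", "5.", "10.", "50"],
        ["TD bank", "", "", ""],
    ]
    ids = tag_table_groups(sample)
    # Keep only the rows that belong to a table.
    kept = [row for row, gid in zip(sample, ids) if gid > 0]
    print(ids)
    print(kept)
```

Keeping the group id on each row, rather than only a keep/drop flag, means the retained rows can still be traced back to the page or statement block they came from.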
