Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Create Calculted Sum and Compare it to Static Total to find OCR Errors

hellyars
13 - Pulsar

So,  I had to build a data set of Congressional marks from some really low quality OCR images.

 

See chart below.

 

My question involves the last two rows with the value Committee Recommended under WHAT -- and specifically the second to law row labeled "Need Calculated Version".   The last row is generated from the OCR scan, however, it is full of errors.  So I need to generate the second to last row and compare against the last row to see if there is a difference .  If the value is >< 0 than there is an OCR error that needs to be corrected.   What is the simplest way to do this.


This is just the walk-down for one line, the real data has hundreds of lines with similar walk downs. 

 

 

 IDNAMEWHOWHATTXTAMOUNT
HARD CODEDN0181-07-02-4162-ASLDTAPBRBudget Request

Request

101595

HARD CODEDN0181-07-02-4162-ASLDTAHACAdjustmentIncrease X-6385
HARD CODEDN0181-07-02-4162-ASLDTAHACAdjustmentDecrease Y-3499
HARD CODEDN0181-07-02-4162-ASLDTAHACSum AdjustmentsSum of all...-9884
NEED CALCULATED VERSIONN0181-07-02-4162-ASLDTAHACCommittee RecommendedHAC Recommended91,711
ALREADY HARD CODEDN0181-07-02-4162-ASLDTAHACCommittee RecommendedHAC Recommended91,711

 

7 REPLIES 7
hellyars
13 - Pulsar

I do know I can probably use some combo of a Sum tool...

atcodedog05
22 - Nova
22 - Nova

Hi @hellyars 

 

I was kind of unsure on what is the calculation used for need calculated row hence i have taken it as sum of above rows.

 

Workflow:

atcodedog05_0-1626803101314.png

I want to separate last rows and rows except last rows into 2 blocks. Since sample tool only let you to skip first row  i am reversing the order with record id and sorting it by descending. I am splitting, summing up and comparing. If it matched it would give a flag "Yes". Assuming you would have to do this for each ID i am using groupby on ID.

 

Hope this helps : )

 

 

hellyars
13 - Pulsar

@atcodedog05  Cool.  Would it be possible to attach the sample workflow.  I would like to see the details behind the tools.

atcodedog05
22 - Nova
22 - Nova

My bad @hellyars . I have updated the post.

hellyars
13 - Pulsar

@atcodedog05Cool. Taking a look.  My question was prompted by the need to develop a series of checks to identify discrepancies between the actual values (what they should be, or total to be in the original PDF source document) versus the values generated by the OCR scan - which were widely off across 50% or more of the scanned data.   

hellyars
13 - Pulsar

@atcodedog05 

 

I want to build a check for HAC Recommended - the 91,711.  My data already contains a "hard coded" value.  This "hard coded" value is the result of an OCR scan.

 

I want to create a new row that calculates HAC Recommended by either adding the sum of all the "Adjustment" Rows from "Budget Request" or sums "Sum of Adjustments" from Request and compares the results against "HAC Recommended".   This will allow me to spot issues where there are OCR errors. 

 

 

atcodedog05
22 - Nova
22 - Nova

Hi @hellyars 

 

Thanks, This gives me more idea about the calculation. Here is the workflow for the usecase.

 

Workflow:

atcodedog05_0-1626847881151.png

 

Hope this helps : )

 

 

Labels