Create Calculted Sum and Compare it to Static Total to find OCR Errors

So, I had to build a data set of Congressional marks from some really low quality OCR images.

See chart below.

My question involves the last two rows with the value Committee Recommended under WHAT -- and specifically the second to law row labeled "Need Calculated Version". The last row is generated from the OCR scan, however, it is full of errors. So I need to generate the second to last row and compare against the last row to see if there is a difference . If the value is >< 0 than there is an OCR error that needs to be corrected. What is the simplest way to do this.

This is just the walk-down for one line, the real data has hundreds of lines with similar walk downs.

	ID	NAME	WHO	WHAT	TXT	AMOUNT
HARD CODED	N0181-07-02-4162-A	SLDTA	PBR	Budget Request	Request	101595
HARD CODED	N0181-07-02-4162-A	SLDTA	HAC	Adjustment	Increase X	-6385
HARD CODED	N0181-07-02-4162-A	SLDTA	HAC	Adjustment	Decrease Y	-3499
HARD CODED	N0181-07-02-4162-A	SLDTA	HAC	Sum Adjustments	Sum of all...	-9884
NEED CALCULATED VERSION	N0181-07-02-4162-A	SLDTA	HAC	Committee Recommended	HAC Recommended	91,711
ALREADY HARD CODED	N0181-07-02-4162-A	SLDTA	HAC	Committee Recommended	HAC Recommended	91,711

Help

Expression

Accepted answers

All comments

hellyars

I do know I can probably use some combo of a Sum tool...

atcodedog05

Hi @hellyars

I was kind of unsure on what is the calculation used for need calculated row hence i have taken it as sum of above rows.

Workflow:

I want to separate last rows and rows except last rows into 2 blocks. Since sample tool only let you to skip first row i am reversing the order with record id and sorting it by descending. I am splitting, summing up and comparing. If it matched it would give a flag "Yes". Assuming you would have to do this for each ID i am using groupby on ID.

Hope this helps : )

calculation match.yxmd

hellyars

@atcodedog05 Cool. Would it be possible to attach the sample workflow. I would like to see the details behind the tools.

hellyars

@atcodedog05

I want to build a check for HAC Recommended - the 91,711. My data already contains a "hard coded" value. This "hard coded" value is the result of an OCR scan.

I want to create a new row that calculates HAC Recommended by either adding the sum of all the "Adjustment" Rows from "Budget Request" or sums "Sum of Adjustments" from Request and compares the results against "HAC Recommended". This will allow me to spot issues where there are OCR errors.

Quick Links

This months top contributors

atcodedog05 19598

Qiu 15867

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703