A normal question / issue / challenge...
How can you search for a specific value in a column when the column position and name might change across input files passed through a batch macro?
Macro reads in multiple files to union large table. Schema variability due to PDF to Text variability. Details below...
CONTEXT:
I have 230 input .csv files (each a page of a very long table).
I used AWS Textract to extract the table data from PDF files.
When using Textract to extract table data the output includes both the table rows and its confidence scores for each table row.
The table data rows will be at the top and the confidence scores at the bottom of each file with a few null() rows between.
The start of the confidence data will also have the phrase "Confidence Scores % (Table Cell)" in the first column in the row before the scores (see image example below).
Importantly, "Confidence" will only ever appear in this context (and only once per file).
Normally, I use a formula tool to create a KEEP field and use an expression if
if startswith([Field_1],'Confidence') then 'No' else null() endif
I then use a Multi-Row Tool with to fill-down "No" to tag all the rows that contain the confidence scores and not table data. I can then filter out the rows containing the confidence data with a Filter Tool.
if isnull([KEEP]) then [Row-1:KEEP] else [KEEP] endif
PROBLEM:
I use a batch macro to union the 230 or so files together.
Due to PDF to Text challenges, the schema is not the same.
The batch macro is configured to Auto Configure by Name setting to allow for these subtle differences.
That means that the value Confidence Scores % (Table Cell) will not always appear in the first column. It may appear in the second (and possibly the third).
It also means that the name of the first column may not always be the same (again due to the challenges of PDF to Text).
IMPORTANT NOTE:
I get rid of the confidence data rows in the batch macro so that the union output only includes table row data and no confidence data rows.
QUESTION:
How can I search for startswidth "Confidence" across multiple fields when I do not know the potential field name?
Can you do something similar to if FieldNumber
and startswidth "Confidence" then TEST = No
something along the lines of (not literally)
if FieldNumber <3 and startswidth "Confidence" then TEST = No
