This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I have a workflow that is retrieving data from a SQL Server 2000 table and writing it to an Oracle 12c database. When attempting to write some large string data to Oracle, Alteryx designer was returning the following error:
Output Data (2) DataWrap2OCI::SendBatch: ORA-24801: illegal parameter value in OCI lob function¶¶ Insert into "IR_TESTVARCHAR"("LargeTextData","SmallTextData","BadLargeText") Values (EMPTY_CLOB(),:A2,EMPTY_CLOB());
I found a couple of posts related to this error and similar issues with writing CLOB data, but hadn't found a workflow that resolved it, other than suggestions of using different drivers such as ODBC.
My investigation revealed that whenever Alteryx attempts to create an oracle table and insert character data that exceeds a length of 1023 characters, rather than write it as a varchar2 column, it writes it as a clob. I could not find any way to control this behaviour. Oracle itself can handle varchar2 data up to 4000 (and in some cases, depending on version and settings, 8000) characters.
It turns out that this error occurs when an attempt is made to write an EMPTY STRING to a CLOB in Oracle. NB, an empty string is different to a NULL. If we try to write a NULL, the error does not occur.
To show this error occurring, I produced the following simplified demonstration workflow:
In the above demo workflow, a text input contain fields with some arbitrary data which would not cause any issues, and then a formula tool has been added to create a field called BadLargeText set to '' (an empty string). The data type for BadLargeText has been defined in the formula tool as V_String (2000):
When the workflow is run, the output to Oracle (in this case using OCI) fails with the error shown above.
If the formula tool were instead re-coded as follows:
[so that instead of an empty string, the field were set to null() ], the workflow runs without error. So the error is the direct result of the treatment of writing an empty string to a CLOB in Oracle.
For my purposes, I am happy to store NULL in my database in place of an empty string. (In Oracle, there is no such thing as an empty string anyway, when writing to varchar2 columns: an attempt to place an empty string into a varchar2 column will result in Oracle writing a null there anyway), so I am of the opinion that it won't matter to me if I use the same treatment for CLOB data.
Of course, this is a demo workflow and in reality, I don't have the luxury of simply changing a formula tool to "null"
The trick then is how to handle this both gracefully and generically? I wanted a section of workflow that will handle this situation with any large string (clob) field thrown at it, without any prior knowledge of the field name. At some point, I'll probably create this as a "handleEmptyClobs" macro for use elsewhere.
The following workflow is my solution:
The workflow obtains the field info for the dataset, and extracts NAME, TYPE and SIZE. If there is a String field of 1K in length or greater, it performs a dynamic replace on it. If the value is an empty string, it replaces it with null, otherwise it just leaves it as the current value. The data can then be written to Oracle.
I have uploaded the workflow, in case others might find it useful. [Obviously you'd need your own Oracle db to test against... ]
Of course if anybody knows of any alternative (simpler) ways of handling this, or spots any flaws in this approach, please feel free to share!
I suggest using the Multi-Field Formula tool as it will allow you to remove the blank string values for a whole file at once. You can filter on different column types and select any or all columns. Note that the Data Cleansing tool uses a macro with Multi-Field Formula tools containing if then statements to match the replace and remove checkboxes selected in the tool.
I used this expression in testing:
IF IsString([_CurrentField_]) and IsEmpty([_CurrentField_]) THEN Null() ELSE [_CurrentField_] ENDIF
If this is a common problem, you may want to customize the macro for the Data Cleansing tool, and include your own checkbox in the tool for this issue. There is a recorded live training that shows how to do this. The macro editing discussion starts after minute 27 in this recording: Using the Data Cleansing Macro.
The multifield tool seems to do the job fine. I have looked at using the data cleanse tool and although I have successfully modified the macro to perform the empty-null conversion, I am not sure how to modify it to allow dynamic field selection.
It appears that I always have to configure it with the specific field names selected (there isn't the 'unknown' option on the configuration)
While the macro for the Data Cleansing tool can be customized to include new cleansing options, it is limited to cleansing the columns manually selected in the Select Fields to Cleanse section. Dynamic or automated column selection is not a capability built into the existing macro.
A macro could be configured to start with a List Box from the Interface tool bar that gets its list of column values from an external source. Using an external source file could be a time saver, but the files with the different lists of columns would still have to be created, and a file selected at run time.
So this might not be the best way of achieving the result, but having a play with DataCleanse macro, and I thought what about introducing a new checkbox "Apply to All Fields". I coudn't work out how to get the True/False value of this checkbox into every one of the other formulae, so instead, I used it to modify a new column added via a formula tool into the dataflow.
Here is a section of the modification to show what I mean... hopefully you can see how this fits into the rest of the original datacleanse macro, which I have removed from the diagram for simplicity
The configuration for the macro now contains the additional options at the bottom (I'd probably move the Apply to All String Fields up at the top in reality, but you get the idea)
So the macro has been modified so that we have here is a formula tool that creates a new column on the data flow with the name "CTRL_UPDATE_ALL_STRINGS", and a default value of "N".
In the new action connected to it, we have the formula : "if '" + [#1]+ "'=='True' then 'Y' else 'N' endif"
Then in each of the subsequent action tools that apply to strings, the formula is changed as appropriate to include the text
"... 'IF [_CurrentFieldName_] IN(' + [#3] + ') or ([_CurrentFieldName_] !="CTRL_UPDATE_ALL_STRINGS" and [CTRL_UPDATE_ALL_STRINGS]=="Y") THEN ...."
So now, we have each formula acting on the particular column if either it was ticked in the control's configuration, or the "Apply to All String Fields" box was ticked.
At the far end, before macro output, the control column is stripped from the dataflow again using a SELECT tool.
et voila... my data cleanse tool now handles any column name I throw at it without further configuration.
It appears to do the job from my very basic testing.
What do you think? Is there a better way of passing the control parameter into the other Action controls without modifying the dataflow in the way that I have?