Bug with CSV files containing quoted fields

360Andy
7 - Meteor

We have found what appears to be a bug in Alteryx.

If you load data from a CSV file containing quoted fields, then any blank fields get a value from a previous record.

 

The simplest way to replicate the bug is to use the SampleOutput component included in the SDKSample. It exhibits the problem out of the box, with a workflow as simple as an Input Data component linked to the SampleOutput component.

[Image: SampleOutput.png]

 

 

Input example1.csv:
"10","MR W DAYTON","PALMER AIR CHARTERS, INC","SUITE 106","7350 AIRPORT ROAD","WILMINGTON","DELAWARE","19801","212 249 7586"
"11","MR K SIPPLE","FEIDER COMPANIES INC","POST OFFICE BOX 56259","FEIDER PARKWAY","MINNEAPOLIS","MINNESOTA","55459",""
"12","TAMMY TASAN","ANDERSEN LAMB INC","2300 MARIE VICTORIN/05CAZ","LONGUEUIL QUEBEC","","","",""

Output example1output.csv (the extraneous fields are the last field of record 11 and the last four fields of record 12):
10,MR W DAYTON,PALMER AIR CHARTERS, INC,SUITE 106,7350 AIRPORT ROAD,WILMINGTON,DELAWARE,19801,212 249 7586
11,MR K SIPPLE,FEIDER COMPANIES INC,POST OFFICE BOX 56259,FEIDER PARKWAY,MINNEAPOLIS,MINNESOTA,55459,212 249 7586
12,TAMMY TASAN,ANDERSEN LAMB INC,2300 MARIE VICTORIN/05CAZ,LONGUEUIL QUEBEC,MINNEAPOLIS,MINNESOTA,55459,212 249 7586

Does anyone know of a workaround for this issue?

19 REPLIES
cmcclellan
13 - Pulsar

I think it must be a problem with the SampleOutput macro.  I don't have that in my installation, but I used a Browse tool and checked the Input tool and they are both reading the data perfectly.

360Andy
7 - Meteor

SampleOutput is not a macro, it's a C++ SDK example.

The problem is in the SDK class FieldBase. Its accessor methods for fetching a field value, e.g. GetAsWString(), return the value from a previous record when they should return null.

 

There is nothing that I can see wrong with the SampleOutput code:

 

for (unsigned x=0; x<m_recordInfoIn.NumFields(); x++)
{
    const FieldBase* pField = m_recordInfoIn[x];
    if (x > 0)
        m_outfile << m_separator;
    TFieldVal<WStringVal> field = pField->GetAsWString(pRecord);
    if (!field.bIsNull)
        m_outfile << field.value.pValue;
}

 

 

It's just that the FieldBase class is doing something wrong.

The Browse tool doesn't have this problem, so it is obviously doing something different, but using these FieldBase class methods is the only way for a Custom Component developer to access the field values.

 

MichaelCh
Alteryx

What does II_PushRecord look like on the SampleOutput tool?

360Andy
7 - Meteor

That's where the code above was taken from. Here's the whole method:

 

long SampleOutputInterface::II_PushRecord(const RecordData * pRecord)
{
    // Our input is providing us with a new record to process.

    // If this is the first record, then we need to open (or create) the
    // file, then write the field names as the first record.
    if (!m_outfile.is_open())
    {
        m_outfile.open(ConvertToAString(m_strFilename));
        if (!m_outfile.is_open())
            throw Error(L"Cannot open the specified file for writing");

        // Write the field headers.
        for (unsigned x=0; x<m_recordInfoIn.NumFields(); x++)
        {
            const FieldBase* pField = m_recordInfoIn[x];
            if (x > 0)
                m_outfile << m_separator;
            m_outfile << pField->GetFieldName().c_str();
        }
        m_outfile << L"\n";
    }

    // Write the record.
    for (unsigned x=0; x<m_recordInfoIn.NumFields(); x++)
    {
        const FieldBase* pField = m_recordInfoIn[x];
        if (x > 0)
            m_outfile << m_separator;
        TFieldVal<WStringVal> field = pField->GetAsWString(pRecord);
        if (!field.bIsNull)
            m_outfile << field.value.pValue;
    }
    m_outfile << L"\n";

    // Return 1 to signify that we processed the incoming record properly.
    return 1;
}

MichaelCh
Alteryx

There's something fishy going on here but, like you, I can't tell what it is. I suggest stepping through it in a debugger and checking the values at each iteration of that for loop. Maybe you'll figure it out; if not, you'll at least be able to narrow the problem down to something more specific.

MichaelCh
Alteryx

@Ned and I looked into this a little bit more. We both feel like there must be something else going on here, but I did come away with one idea. What happens if you replace

 

        if (!field.bIsNull)
            m_outfile << field.value.pValue;

with

 

        if (!field.bIsNull)
            m_outfile.write(field.value.pValue, field.value.nLength);

 

cmcclellan
13 - Pulsar

I'm probably stating the bleeding obvious here, and I don't know C++ at all, so I can't offer any code suggestions.

 

BUT ... the problem is basically that the old values are being "carried over" to the next record. I can't see the code that wipes out the values for the previous record before loading in the values for the next record.

 

Would the solution be as simple as adding some code that loops through all the fields and clears them before adding the new values in?

MichaelCh
Alteryx

The API for the record classes in the C++ SDK is a little odd. The fields are used to pull data out of pRecord, which represents the data passed in from the upstream tool. So nothing needs to be reset in the tool, because the upstream tool has already passed new data in. The problem here looks like some kind of caching issue in the FieldBase object. My suspicion (as implied by my last post) is that the string returned from GetAsWString is somehow not being null-terminated correctly, but that the length is still correct. If that's not it, I'm not sure what's going on.

360Andy
7 - Meteor

Thanks for your help, guys.

I have stepped through the SampleOutput code but can't find any way to detect that the value being returned is bogus.

 

Looking at the documentation for FieldBase, I see there is a method called GetNull():

"This function determines whether the field's value in the specified RecordData structure is NULL."

 

I changed the SampleOutput code to use this but it doesn't make any difference.

 

        if (!pField->GetNull(pRecord))
        {
            TFieldVal<WStringVal> field = pField->GetAsWString(pRecord);
            if (!field.bIsNull)
                m_outfile << field.value.pValue;
        }

 

(Edit: I thought I'd found the solution here, but actually I'd already tried this back in September)