<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Python SDK Batch Processing in Dev Space</title>
    <link>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/94733#M155</link>
    <description>&lt;P&gt;I've built out an optimization algorithm in Python and I'm preparing to integrate it into the Python SDK in Alteryx. So far, the documentation I've seen seems to be geared towards row-by-row processing of data, but my algorithm processes data in a batch format (all data must be present first). Does Alteryx have any examples/best practice suggestions for how to handle this using the SDK?&lt;/P&gt;</description>
    <pubDate>Tue, 05 Dec 2017 15:51:50 GMT</pubDate>
    <dc:creator>jraad</dc:creator>
    <dc:date>2017-12-05T15:51:50Z</dc:date>
    <item>
      <title>Python SDK Batch Processing</title>
      <link>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/94733#M155</link>
      <description>&lt;P&gt;I've built out an optimization algorithm in Python and I'm preparing to integrate it into the Python SDK in Alteryx. So far, the documentation I've seen seems to be geared towards row-by-row processing of data, but my algorithm processes data in a batch format (all data must be present first). Does Alteryx have any examples/best practice suggestions for how to handle this using the SDK?&lt;/P&gt;</description>
      <pubDate>Tue, 05 Dec 2017 15:51:50 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/94733#M155</guid>
      <dc:creator>jraad</dc:creator>
      <dc:date>2017-12-05T15:51:50Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK Batch Processing</title>
      <link>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/94754#M156</link>
      <description>&lt;P&gt;&lt;a href="https://community.alteryx.com/t5/user/viewprofilepage/user-id/3352"&gt;@jchadwick&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.alteryx.com/t5/user/viewprofilepage/user-id/3835"&gt;@stevea&lt;/a&gt;&amp;nbsp;what's your take on this?&lt;/P&gt;</description>
      <pubDate>Tue, 05 Dec 2017 16:39:18 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/94754#M156</guid>
      <dc:creator>TashaA</dc:creator>
      <dc:date>2017-12-05T16:39:18Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK Batch Processing</title>
      <link>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/95150#M159</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.alteryx.com/t5/user/viewprofilepage/user-id/19947"&gt;@jraad&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Great observation! The reference to each record will only exist in the ii_push_record method. The only way to preserve the incoming data there is to store it in memory in a data structure. Then you can apply your algorithm to that data and push the new records to the output anchor in the ii_close() method. Hope this helps!&lt;/P&gt;</description>
      <pubDate>Thu, 07 Dec 2017 17:34:28 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/95150#M159</guid>
      <dc:creator>Ozzie</dc:creator>
      <dc:date>2017-12-07T17:34:28Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK Batch Processing</title>
      <link>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/95537#M163</link>
      <description>&lt;P&gt;Ozzie has it right. As records come in via &lt;FONT face="courier new,courier"&gt;ii_push_record&lt;/FONT&gt; you can store them either in memory or in a temporary file. After all the records from an input have been sent through, &lt;FONT face="courier new,courier"&gt;ii_close&lt;/FONT&gt; will be called, at which point you can run your records through your batch process.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 16:14:02 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/95537#M163</guid>
      <dc:creator>MichaelCh</dc:creator>
      <dc:date>2017-12-11T16:14:02Z</dc:date>
    </item>
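The buffer-in-ii_push_record, process-in-ii_close pattern described above can be sketched outside Alteryx. BatchBuffer and batch_fn are illustrative names, not SDK API, and in a real tool the engine drives these calls:

```python
from collections import deque

class BatchBuffer:
    """Minimal sketch of the buffer-then-process pattern.

    The Alteryx engine calls ii_push_record once per row and ii_close
    at the end; here both are plain methods so the idea runs standalone.
    """

    def __init__(self, batch_fn):
        self.records = deque()    # holds every incoming row until close
        self.batch_fn = batch_fn  # the batch algorithm to run over all rows

    def ii_push_record(self, record):
        self.records.append(record)  # just buffer; no processing yet
        return True

    def ii_close(self):
        # All rows are present now, so the batch algorithm can run.
        return self.batch_fn(list(self.records))

# Usage: sorted stands in for the optimization algorithm.
buf = BatchBuffer(batch_fn=sorted)
for row in [3, 1, 2]:
    buf.ii_push_record(row)
result = buf.ii_close()
```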
    <item>
      <title>Re: Python SDK Batch Processing</title>
      <link>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/181099#M475</link>
      <description>&lt;P&gt;Does anyone have an example of writing/reading from a temp file in this context?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Greg&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jul 2018 16:53:20 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/181099#M475</guid>
      <dc:creator>gbonnette</dc:creator>
      <dc:date>2018-07-09T16:53:20Z</dc:date>
    </item>
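As a minimal, non-Alteryx sketch of the temp-file approach Greg asks about (spool_records is an illustrative helper, not part of the SDK): each pickle.dump call stands in for one ii_push_record, and the read-back loop stands in for the work done in ii_close:

```python
import pickle
import tempfile

def spool_records(records):
    """Write records to a temp file one at a time, then read them
    all back in order for batch processing."""
    tmp = tempfile.TemporaryFile()      # opened in binary mode, as pickle needs
    for rec in records:
        pickle.dump(rec, tmp)           # append one pickled record per call
    tmp.seek(0)                         # rewind before reading everything back
    out = []
    while True:
        try:
            out.append(pickle.load(tmp))
        except EOFError:                # no more pickled records in the file
            break
    tmp.close()                         # TemporaryFile is deleted on close
    return out
```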
    <item>
      <title>Re: Python SDK Batch Processing</title>
      <link>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/186512#M519</link>
      <description>&lt;P&gt;&lt;a href="https://community.alteryx.com/t5/user/viewprofilepage/user-id/23542"&gt;@gbonnette&lt;/a&gt;&amp;nbsp;I don't know that a temp file is a good solution. Writing and reading from disk can be expensive in terms of processing time. It's best to use a collection object like a list or deque. Slightly more than pseudocode, reduced to just the relevant parts for buffering input records:&lt;/P&gt;&lt;PRE&gt;from collections import deque

import AlteryxPythonSDK as Sdk


class IncomingInterface:
    def __init__(self, parent: object):
        self.records = deque()
        self.record_info_in = None

    def ii_init(self, record_info_in: object) -&amp;gt; bool:
        self.record_info_in = record_info_in
        self.record_info_in_clone = record_info_in.clone()

        # Instantiate a new instance of the RecordCopier class.
        self.record_copier = Sdk.RecordCopier(self.record_info_in_clone, self.record_info_in)

        # Map each column of the input to where we want it in the output.
        for index in range(self.record_info_in.num_fields):
            # Adding a field index mapping.
            self.record_copier.add(index, index)

        # Let the record copier know that all field mappings have been added.
        self.record_copier.done_adding()

        return True

    def ii_push_record(self, in_record: object) -&amp;gt; bool:
        # Create a new, empty record creator based on the cloned record layout.
        record_creator = self.record_info_in_clone.construct_record_creator()

        # Copy the data from the incoming record into the buffered record.
        record_creator.reset()
        self.record_copier.copy(record_creator, in_record)

        # Buffer the record creator for later batch processing.
        self.records.appendleft(record_creator)

        return True

    def ii_close(self):
        # Process the buffered records in a loop, oldest first.
        while len(self.records) &amp;gt; 0:
            buffer_record = self.records.pop().finalize_record()
            # do something with the record&lt;/PRE&gt;</description>
      <pubDate>Tue, 24 Jul 2018 15:23:49 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/186512#M519</guid>
      <dc:creator>jwalder</dc:creator>
      <dc:date>2018-07-24T15:23:49Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK Batch Processing</title>
      <link>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/187019#M521</link>
      <description>&lt;P&gt;It is true that writing and reading from disk can be expensive. But keep in mind there is no real upper bound on the number of records that might pass through a tool. Trying to keep them all in memory may be prohibitive or even impossible. The ideal solution is to keep the records in memory up to a certain threshold and then start writing/reading from disk instead. We have some utilities internally that do this all seamlessly, and I've spoken with &lt;a href="https://community.alteryx.com/t5/user/viewprofilepage/user-id/3529"&gt;@TashaA&lt;/a&gt; about making it available in the Python SDK.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jul 2018 16:44:33 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/187019#M521</guid>
      <dc:creator>MichaelCh</dc:creator>
      <dc:date>2018-07-24T16:44:33Z</dc:date>
    </item>
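The hybrid approach MichaelCh describes can be sketched as follows; SpillBuffer, push, and drain are illustrative names, not an Alteryx API, and a tiny threshold is used so the spill path is easy to exercise:

```python
import pickle
import tempfile

class SpillBuffer:
    """Buffer rows in memory up to a threshold, then spill to a temp file.

    Sketch of the idea only: a real implementation would also bound the
    pickle format, batch the disk writes, and handle cleanup on error.
    """

    def __init__(self, max_in_memory=1000):
        self.max_in_memory = max_in_memory
        self.memory = []
        self.spill = None  # temp file, created only if the threshold is hit

    def push(self, record):
        if self.spill is None and len(self.memory) >= self.max_in_memory:
            # Threshold reached: move the in-memory rows to disk first,
            # so arrival order is preserved in the file.
            self.spill = tempfile.TemporaryFile()
            for rec in self.memory:
                pickle.dump(rec, self.spill)
            self.memory = []
        if self.spill is not None:
            pickle.dump(record, self.spill)
        else:
            self.memory.append(record)

    def drain(self):
        """Return all rows in arrival order, reading from disk if we spilled."""
        if self.spill is None:
            return list(self.memory)
        self.spill.seek(0)
        out = []
        while True:
            try:
                out.append(pickle.load(self.spill))
            except EOFError:
                break
        self.spill.close()
        return out
```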
    <item>
      <title>Re: Python SDK Batch Processing</title>
      <link>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/187030#M522</link>
      <description>&lt;P&gt;True enough. Disk is also a finite resource in most physical or virtual environments, though. Realizing "no real upper bound" requires other technology, like Snowflake or Redshift Spectrum, and set-based operations instead of the cursors we are effectively talking about here.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jul 2018 17:01:50 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/187030#M522</guid>
      <dc:creator>jwalder</dc:creator>
      <dc:date>2018-07-24T17:01:50Z</dc:date>
    </item>
    <item>
      <title>Re: Python SDK Batch Processing</title>
      <link>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/187884#M523</link>
      <description>&lt;P&gt;Exposing those utilities to the SDK would be awesome&amp;nbsp;&lt;a href="https://community.alteryx.com/t5/user/viewprofilepage/user-id/7428"&gt;@MichaelCh&lt;/a&gt;!&amp;nbsp; I say with 99% confidence that you guys will be able to handle memory management much better than we ever would...and that's not something I want to be good at, anyway.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jul 2018 23:06:38 GMT</pubDate>
      <guid>https://community.alteryx.com/t5/Dev-Space/Python-SDK-Batch-Processing/m-p/187884#M523</guid>
      <dc:creator>tlarsen7572</dc:creator>
      <dc:date>2018-07-24T23:06:38Z</dc:date>
    </item>
  </channel>
</rss>

