Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Limit Rows using Start Date

jagjit_singh
8 - Asteroid

Hi All,

 

I'm preparing reports with the ability show daily status of properties based on the property start and end date. The downside is that the workflow takes 10 mins to finish and the huge size of the alteryx database file.

 

To increase the workflow runtime, I use the start date for the property to start from a particular date for example 01/07/2015. Please let me know if this is the right approach if i want to build a report to show active properties on a daily basis within a particular period, lets say from 01/07/2015 - 31/08/2016

 

I have attached the sample workflow

 

Thanks

Jag

4 REPLIES 4
MarqueeCrew
20 - Arcturus
20 - Arcturus

@jagjit_singh,

 

Sounds like you've been watching YouTube:  https://www.youtube.com/watch?v=hcAcZ5ZAZPg ( @TatianaS sent me the link to this video)

 

Suggestions:

  1. Is your incoming data really from Excel?  If you are performing any data hygiene prior to this and have an opportunity to access from yxdb formats, the workflow will load data and execute faster.
  2. Instead of using a multi-row formula use a standard formula.  You aren't referencing multiple rows.
    IIF(IsNull([End date]),DateTimeLastOfMonth(),[End date])
  3. Instead of calculating a "NEW" End Date, I update the existing end date field.  I wouldn't add a new column of data if I don't need both fields later in calculations.  Less Fields on each record will make the workflow run faster.
  4. Generate rows:  
    1. I think that you're looking for an Initialization Expression that looks something like:
      MAX("2016-07-01", [Start Date])
    2. The condition expression that would bring you to then end of August, 2016 could be:
      [Daily] <= MIN([End Date],"2016-08-31")
    3. You would want to externalize these values and either use a macro to update the hard-coded dates (likely faster) or append the From/To dates to every record (this is slower) and compare the limits to the fields instead of the hard-coded values.
    4. As a general rule, after your initial read of data you would have a SELECT tool and would get rid of ALL fields that are in your data which are not used in the report.  Since your example has no data content, I don't know if this is applicable.
    5. Post the generation of the daily field, if your start date and end date fields are not needed, you should put a select following the GENERATE ROWS tool and drop the fields.
    6. A minor enhancement that might show improvements on a large set of data is to autofield the data types.  The incoming Prop Code is read as a DOUBLE, yet it has a domain of 1001 and 1002.  If the data type of Int16 (limit = 32,767) were used, then the data is reduced from 62b to 50b (a 20% reduction in data).

From the runtime tab, you can enable performance profiling.  The largest amount of time is seen in READING data and BROWSING/OUTPUT of data.  The multi-row formula replacement changed the formula time from 8.91ms to .42ms.  I think that you'll see an improvement in running with larger sets of data with that one change.  The generate rows are unequal, because I only generate rows that are contained in the desired time-frame.  If that's what you are ultimately trying to accomplish, it is better to NOT generate them in the first place than to filter the results.  The GENERATE ROWS was reduced from 3.74ms to .83ms.  If my results are not what you wanted, then that difference is moot.

 

  1. Original Profile:
    Info: Browse (4): Profile Time: 23.98ms, 41.26%
    Info: Input Data (1): Profile Time: 21.47ms, 36.96%
    Info: Multi-Row Formula (3): Profile Time: 8.91ms, 15.33%
    Info: Generate Rows (2): Profile Time: 3.74ms, 6.44%
    
  2. Revised Profile:
    Info: Input Data (1): Profile Time: 35.55ms, 55.35%
    Info: Browse (4): Profile Time: 26.81ms, 41.75%
    Info: Generate Rows (2): Profile Time: 0.83ms, 1.30%
    Info: Formula (5): Profile Time: 0.42ms, 0.66%
    Info: Select (6): Profile Time: 0.32ms, 0.50%
    Info: Select (7): Profile Time: 0.29ms, 0.45%

 

Hopefully this will help you.

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
jagjit_singh
8 - Asteroid

Hi Mark,

 

The video did make me laugh but its true.

 

1. My data is coming from an SQL database and data goes back to 2010 but since the report is required from 2015 onwards i put in the manual dates. Once the data is prepare i will be exporting the data to a tableau tde file\publish to tableau server. I noticed when the data is exported to tableau tde the file size is much smaller than alteryx database file. What would be the best approach to read data from sql database?

 

2. For the end date, i like it to automatically increament to the next month ( the active month the workflow is refreshed.) hence the use of the active month end function

 

3.I used multi-row as I wanted to show blanks for end date if the property is not handed back and have a seperate column for Daily Date which i then use to join with the calendar macro to bring in  weeks, months, quarter etc.

 

4. Yeah use the select tool in the begining to remove unwanted columns

 

5. Post generation...did not understand this point. Drop which fields?

 

 

Regards,

Jag

MarqueeCrew
20 - Arcturus
20 - Arcturus

It makes more sense now. 

 

When you write to a large .tde file, Alteryx seems to go through a two-stage output process in the construction of the dataset.  Much of your time may be in the construction of the data extract.  You might want to time a workflow that writes to a .CSV and compare the time for the .TDE.  I think that you'll see that a large amount of time is spent post-Alteryx processing.

 

Post generation, if there are any fields that are not needed, they should be dropped.  If you are using the fields, then you won't need to drop them.  But if you are using a 4 byte field to determine the rental status, you could be creating a boolean field which is 1 Bit (true/false) for Rented and droping the 4-Byte field (1 bit vs 32 bits).

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
jagjit_singh
8 - Asteroid

Hi Mark,

 

This forumula helped limit the rows

 

MAX("2016-07-01", [Start Date])

 

 

Thanks

Jag

Labels