Join the Alteryx Community’s Maveryx Summer Cup event! Compete, network with others, and earn your gold through a series of challenges from July 24th to August 11th. Learn more about the event here.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Fetching Incremental data from source in excel

Maitry_Joshi03
6 - Meteoroid

Problem Statement:

I have a daily updating Excel file. (approx. 100rows)

I load it through input tool and using formula tool do some calculations on it create an output file and save it.

Now, the next day the excel sheets is updated with new data (incremental data) so now it has 150 rows.

But as I have already done calculations for 100rows I want to fetch only the incremental data from the source excel file itself.

So, I want only new 50 rows in my input data run the calculations and get updated in the output file.

I am very new to Alteryx, and I could only find ways to load the whole data and then filtering it through various logics as my solution, but I want to fetch only the incremental data from the start. How can I do that in excel?

Attached a screenshot of the sample file and workflow.

In the workflow I want the input data to only give me incremental data for the newer dates through input itself.

7 REPLIES 7
Prometheus
12 - Quasar

@Maitry_Joshi03 When the Excel file updates every day, is there a run date in the data? If so, you could use this to filter out only what you need. If not, you can create your own history and join your historical records to your Excel file. The unjoined records are the new records. These are useful in theory, but it would help us tremendously if you could upload some dummy data as a baseline to work with. 

Rags1982
10 - Fireball

Agree with @Prometheus , Either having a run date in the Excel file, or joining and using un-joined data is the way, but some data would be useful. 

Maitry_Joshi03
6 - Meteoroid

@Prometheus Yes, I have a daily date updated in the file but every day the file gets bigger thus I want only the new dates data to be fetched. The method you told, in my understanding first I would have to load the whole file through input data and then filter it by using the filter tool this is an issue moving forward as the size will keep increasing and to initially load the file through input tool is unnecessary. I want to know is there any way through which I can extract only the newer dates data from the excel files?

Sharing you the screenshot of the sample data and workflow.

 

Screenshot 2023-10-31 183714.pngScreenshot 2023-10-31 183913.png

Prometheus
12 - Quasar

@Maitry_Joshi03 In this solution, the historical data file (SampleSuperstore10302023.csv) includes all data up to yesterday and is connected to the L input anchor of the Join tool. The file called SampleSuperstore10312023.csv includes all data up to today and is connected to the R input anchor of the Join tool. I joined them on Order Date and Product Name. The data coming out of the L output anchor of the Join tool is new data. I can now process that new data by itself. I can also take the outputs of L and J and Union them to output and overwrite my historical file. Another option there is to take the data coming out of the L anchor and append it to your historical file. That makes more sense with an Excel file. I've typically done this type of historical file comparison with a YXDB, which you cannot append to.

Maitry_Joshi03
6 - Meteoroid

@Prometheus Thank you for the solution and the workflow. The problem I am facing is that you are loading the whole excel/csv files first and then separating it through join tool correct me if I am wrong. What I want is whether it is possible that I get incremental data from the source itself while fetching through input tool or any tool if it's possible? I have one file which gets updated daily and over the months it's size will increase and just to make the process streamline I only want to extract the incremental data from the file from the start itself in my workflow. Is it possible to do so? 

Prometheus
12 - Quasar

@Maitry_Joshi03 This is possible if you're connected to a database. That way you can modify your SQL query with a dynamic date filter. Since it's Excel you'll have to bring it all in. How long does it take to run? I can't imagine that bringing data in from an Excel file will take long, even on your desktop.

CharlieS
17 - Castor
17 - Castor

Hi @Maitry_Joshi03 

 

Unfortunately what you're looking for is not possible with this data source. Here's a few alternatives and pros/cons:

1) One Excel file: What you have now is a csv file, not an Excel file (Excel can open it, but it's not a native Excel file). If you have an xlsx file, you can specify a cell range to open in the Input tool. Of course, this would require you to know the exact row where your daily data starts before opening the file that day (like keep a log of row positions). This might be the most simple approach, but would be unreliable when it's based on position instead of data value, and knowing that position.

 

2) Daily csv files: A balanced approach might be to go to the data source that is creating/updating this and ask that instead of updating the same file, that a new csv file be created each day. This way you can specify the date and only load the day(s) needed with a Dynamic Input tool.

 

3) One Calgary file: Again go back to the file/data source but instead have it write an Alteryx Calgary file. This is a ysdb files with index support. This would all you to specify the date and only return the rows you like. An index on the data is necessary so the workflow knows exactly which rows are associated with which data values. So you would create an index on the [Date] field when writing the file each day. 

 

4) Database: As others have mentioned, store this data in a database. Like the Calgary file, databases support indices so it knows what rows to return when specific dates are requested. 

 

Does this make sense? With the csv file you are currently working with, there's no way to know what rows to load, or a way to partially load them from that file type.

Labels