Hello Alteryx,
I recently started using Alteryx and honestly it has been a great learning experience. I am currently working on a file and need some help with selecting some data. The attached data file represents a structure of the file which I will be using. The spreadsheet has several tables in it and I want to select the information listed under the transaction summary table.
Transaction Summary | ||||||
Trans Type | Trans Code | ID | CSA | New CSA | Dep S | Dep Name |
EE | F1 | 12345061314561 | 0104474-010-00000 | |||
D | S1 | 12345061314561 | 0104474-010-00000 | 1 | Jane Doe | |
D | S1 | 12345061314561 | 0104474-010-00000 | 2 | John Doe | |
Term | T | 31655474864298 | 0686745-010-00000 |
The attached file is just a sample file and I will have to select this data from over 100 files which has same structure. My goal is to develop a script which will just extract the information under "Transaction Summary" Table. Since I am new to using Alteryx, I am hoping someone on the forum might be able to help me. Looking forward to your responses.
Solved! Go to Solution.
As long as the cell range is in the same place in each file (i.e. it is always in cells A59:M63 as shown in your sample file), you should be able to modify the Table or Query section of the Input Data tool configuration to reference only the cells in that range. You can also name the range in Excel, if that's an option for you in your template, and that named range name will appear on the list of items you can select where you would usually designate a sheet name.
So when you are prompted to select a sheet when connecting to your initial file in the Input Tool, select the sheet, then switch it to SQL Editor view and add the cell range reference to the end of the statement (before the ' mark but after the $ sign designating the end of the sheet name you have selected). This will bring in the cells only in that data range from your files.
This will work if you change your filename to a * wildcard as well, if you're trying to bring in multiple workbooks into the same workflow all at once. Will cycle through each file in that location and bring in only the data that is located in the range specified in your configuration settings.
Hope that makes sense... let me know if you need more info!
Cheers,
NJ
Hello Nicole - Thanks for the prompt response. Unfortunately the cell range is variable and will change in other spreadsheets. I am thinking that there might be a way to use "transaction Summary" header as one of the unique elements and fetch the data under that header.
Ah yes, that does make it a bit trickier :) But not impossible! (Few things are!)
See attached/image below. You can use some creative filtering & appending to figure out which line the table starts on (based on your comment about starting with Transaction Summary line), and use that to figure out which data to eliminate to isolate just the table you need.
Give it a shot, let me know if it works for you!
NJ
There are course many ways to approach this. The first step is to identify the logic you want to use to identify a table, this can be as complex as you can imagine.
The logic I went with for the attached is: A row that is fully blank denotes a change in table.
I commented the attached workflow (opens in v11+)
The first section is loading the data, all sheets from all Excel files based on the Directory tool settings, in flexible baseline structure that enables lots of different logic to be built upon, and it uses the attached macro. For more details, on this see my presentation titled "Techniques for Handling Data from Excel and Other Messy Unpredictable Files" from https://community.alteryx.com/t5/Inspire-2017-Buzz/Inspire-2017-sessions-available-on-demand/m-p/711...
This next section generate the Table ID based on logic, and a subsection to pull out the Top Left Cell Value:
This final section may need to be copy-pasted for each table you want to pull out, since each table may have different header logic.
User | Count |
---|---|
19 | |
14 | |
13 | |
9 | |
8 |