Okay, I have an annual file.
The format of the filename is "fy##-name.abc"
Within each file year's are identified as Prior Year, Current Year, Year One, Year Two, Year Three, Year Four, Year Five.
How can I build a formula that dynamically uses the two ## digits in the file name to convert Prior Year to ## - 2, Current Year to ## -1, Year One to ##, Year Two to ## + 1, etc.?
Solved! Go to Solution.
I think your best bet here is to use information from the Directory Tool to grab the year.
I've attached a sample workflow that uses this method:
What I've done is use Directory Tool -> Dynamic input to read my data in (you'll likely have to make the directory tool look for more specific files, or add a filtering step before the Dynamic input/parse tool.
I then create a target column, with the name current year, so I can match to the transposed columns of the input file. Once I make this match, I use the multi-row formula to rename them. I've created a duplicate of the Name column here, but I don't think it's necessary in general, I just wanted to show that the numbers were generated correctly.
For this it is important that the "Values for Rows that don't Exist" is set to NULL, since I use a Null check to set Prior Year correctly.
After removing unnecessary information, I CrossTab back into the original format.
Let me know if you have any questions, or this workflow doesn't work for you for whatever reason.
Cheers!
I can kind of follow what you are trying to do. But, I think there might be another challenge. The PY, CY, Yr1... fields are not available at source input. I have to parse to extract them.
Once parsed, the data looks like this.
PY | CY | Yr1 | Yr2 | Yr3 | Yr4 | |
fy15_filename | 10 | 15 | 20 | 25 | 30 | 35 |
fy16_filename | 10 | 15 | 20 | 25 | 30 | 35 |
fy17_filename | 10 | 15 | 20 | 25 | 30 | 35 |
fy18_filename | 10 | 15 | 20 | 25 | 30 | 35 |
Individually, I simply use the select tool to rename PY, CY, Yr1... to the correct year. But, I am trying to avoid building a long lengthy formula that has to be updated every year.
14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | |
fy15_filename | 10 | 15 | 20 | 25 | 30 | 35 | 40 | |||
fy16_filename | 10 | 15 | 20 | 25 | 30 | 35 | 40 | |||
fy17_filename | 10 | 15 | 20 | 25 | 30 | 35 | 40 | |||
fy18_filename | 10 | 15 | 20 | 25 | 30 | 35 | 40 |
I think you may still be able to do something like what I've done above, you'll just need to use the GroupBy feature and group by Filename.
Instead of using the find/replace Tool, I'm just joining in the results of my Regex before the Transpose so I can group by year, then my formula just creates that same column that the previous find replace did, replacing Current Year with the year, and setting all the others to the Name column. The multi-row tool is unchanged, and the CrossTab just needs to be set to GroupBy the RegExOut field as well.
I've attached an updated workflow, let me know if this helps.
Sorry, don't mean to be retarded. The example has no input data. So, I can't see what you are joining. And, the real data has dozens of fields. So, its throwing me a curve ball.
I've just created some files with your naming scheme filled with 1s. as the data.
If you can add some mock data of what an input to this process would look like, I could put together a more realistic example workflow.
When you say it has dozens of fields, are you saying that there is like Year+12, or you're saying that there are multiple fields that need to be transformed like this?
i.e. you'd have:
Prev Year | Current | Year `1 | Year 2 | Year 3 | Year 4 | Year 5 | |
fy15_name_field1 | 49 | 51 | 52 | 53 | 55 | 58 | 62 |
fy15_name_field2 | 23 | 25 | 26 | 27 | 28 | 30 | 35 |
fy16_name_field1 | 51 | 52 | 56 | 57 | 59 | 63 | 68 |
fy16_name_field2 | 25 | 28 | 29 | 31 | 35 | 36 | 39 |
fy17_name_field1 | 52 | 54 | 55 | 58 | 60 | 61 | 62 |
If so, I think you'll want to find a way to get this field name into the workflow as a field, and group by it as I did with the dates. I can help you out if I can see a more realistic example of the data.
Hope this helps,
Okay, definitely not there ...but some small progress
I switched the filetype input from xlsx to my the xml.
The filenames are cy##_1234abc.zzz. I want the ## after cy.
I stripped everything going into the RegEx tool except filename.
I then reset the RegEx tool to (d\{2}) from (d\{4}). This grabs the 2 digits that follow cy (aka fy).
I can't sort out what the Join tool is doing just yet. It is current set on Join by Record position, which is only finding 4 joins and spitting out 29+ left.
I had the join tool on Record Position because there is one date I want to add per file. so the first file corresponds with the first date, the second with the second, etc.
The reason it's spitting out the 29+ is because there are multiple rows for each input, obviously something I should have accounted for in my workflow.
Are you able to get any sort of identifier for the file that a row comes from? This could be used in conjunction with the RecordID tool to join that identifier on RecordID instead of by position.
Is the number of rows in each file the same every time? If so, you could use the RecordID tool and a Formula tool to change it so that the first, lets say 10 rows are assigned to 1, the next 10 are assigned to 2, etc. Then, join on the RecordID from the RegEx.
The formula you'd want here is: CEIL([RecordID]/<numberofrowsperrecord>)
The attached workflow implements this, but again this will only work if the number of rows per file is constant across all files.
Let me try a different angle.
I can input all the files together in one step using *.zzz.
File Name aside, there is a string field called Calendar Year. It's just the 4 digit year.
So, I can get all the files into the data stream in one fell swoop.
This is still dummy data. But this is essentially what the data looks like going into the two (2) Select tools before the Union. (The workflow was originally developed to accept 1 year. In that case, PY, CY, Yr1... are annually changed using the Select tools.)
File Name | Calendar Year (V_String) | Project | Prior Year | Current Year | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 |
cy15_idnmbr.zzz | 2015 | Tacos | $ 10 | $ 12 | $ 14 | $ 16 | $ 18 | $ 20 | $ 22 |
cy16_idnmbr.zzz | 2016 | Tacos | $ 12 | $ 14 | $ 15 | $ 17 | $ 19 | $ 21 | $ 23 |
cy17_idnmbr.zzz | 2017 | Tacos | $ 13 | $ 12 | $ 15 | $ 18 | $ 19 | $ 22 | $ 23 |
cy15_idnmbr.zzz | 2018 | Tacos | $ 17 | $ 15 | $ 19 | $ 22 | $ 24 | $ 23 | $ 26 |
It seems like calendar year isn't quite a unique identifier for the file, since the 2015 file has both 2018 and 2015.
If the calendar year 20## was always the same as the cy##, then, you could create the Calendar year from your RegEx output by adding the string "20" in front.
However, in this case I'm not sure you really need to do anything with the file name, so I must be missing something. You could simply do the transposing, renaming, and cross-tabbing using the Calendar Year field as the Grouping Field.
I'm sure I've missed something here so please let me know what it is,
Cheers!
EDIT: