I have a set of data that lists all of the dividend transactions for hundreds of different stocks (each with a corresponding unique CUSIP#), but many of these stocks pay multiple dividends throughout the year, so this data has anywhere from 10 to 2000 entries per stock. If I build a formula and append a "# to Sample" field to that data set, is there anyway I can sample "X" number of dividend records for a given stock based on that value in the "# to Sample" field? Within the standard sample tool, I don't see a sample based on formula. If it let me put N equal to a variable or field (i.e. "# to Sample") then it would be problem solved I guess.
Any thoughts?
That # to Sample would be calculated using a summarize function and then a formula to calculate how many per CUSIP, and then I guess I would run the sample off of the raw data.
Hopefully this makes sense. TIA for your help
Solved! Go to Solution.
I'm not sure if there's a way to do it exactly as you've explained, but I've done something similar where I grouped by CUSIP, and then sampled a given percent of the records. This should allow you to sample, lets say 50% of each stock's entries, or 1 for every 5 records. That being said, this uses the standard Sample Tool, and so it will not be possible to ensure you get the exact same sample more than once (set a seed).
If you can give a more precise explanation of what sort of # to sample you'd be looking for, it might be easier to say whether or not its feasible to do it this way.
Let me know if this helps,
Cheers!
Hmm.....I might be misunderstanding, but it sounds like a batch macro might work. What you would do is have your table of raw data with a CUSIP# field. You also have a table that lists each CUSIP# uniquely, and the number of random records you want for each unique CUSIP#.
What you would do is take that data that has the number of random records for each unique CUSIP#, and feed it into a "tool" that asks you to identify your CUSIP# field as well as the number of random records field. You run it and the output is a union of each iteration of the CUSIP#-random-records-to-grab combination from the raw file.
Does that sound like what you're looking for? If so, I've attached the guts of it (the module and the macro).
Your understanding sounds right, I'm taking a look at this now. Will keep you posted on whether its what I need. Thanks for your help!
Its more for an audit-type use so it all depends on the dollar value, so since each CUSIP would have varying $ amounts, they would all have different sample sizes. I think the post below is more of what I would need but still digging and testing to see. Thanks a lot for your help and time though!
Thank you for posting your Alteryx workflow in the comment.
I have just recently started with Alteryx and I am faced with a challenge, that I would need to create an output so that each location has randomly selected 3 different contract numbers. In the attached workflow you can see, that I have file "Records_revB" where my locations and number of records to grab per location data is placed and then I have "test_revB" where I have the data I want to sample.
I have tried to generate sampled data based on field "LocationLang" and "Location", but the latter does not give me any result even if I put the in the macro the filter "[Location]==7201" I get 0 results. Even though it reads in the 79 different groups.
When going with "LocationLang" I get 79*3=237 records but all for the same location I have in the filter tools condition [LocationLang]=="7201 KC KÄRRA".
I have also attached my workflow in the comments, but would very glad if you could be able to identify or explain to me why my workflow is not producing the same results that your workflow is producing effortlessly.
Hello,
Your assets in your workflow didn't come through. In your module, go to Options, Export workflow. Follow the prompts. A YXZP file will be created. Re-post with the YXZP attached.
Thanks.
Hi
Thanks for the reply. Please find the workflow package below.
I really appreciate the help!
Best regards
Jaanis
Okay, a couple things. First, what you did was great and will work just fine. Just need a couple tweaks in your macro that I'll go over shortly. Second, I built a macro for myself some time ago that will allow you to just connect the revB file to it, and not have to worry about creating a records_revB file.
For your macro, in the action tool for the location, what you're doing is telling Alteryx to replace something that doesn't exist. You have it replacing "12345" in the line "7201 KC KARRA". It's not doing anything because there is no "12345" in the line "7201 KC KARRA". In order to make the action tool do what you want here, you need to replace 7201 KC KÄRRA (without the quotes). So it should look like this:
Same thing with the other action tool. You don't have the correct line at the top selected, and you're not replacing the correct value. This is what is should look like:
If you make those changes, then it should work just fine.
That being said, I attached the macro I built a while back. Just connect it directly to your test_revB.xlsx file, select "3" for the Random N Records, and in the grouping section, select the field from which you want to select the random records (here, either Location or LocationLang):
Your macro worked perfectly! Thank you for that. It seems at first glance very technical for my eyes so I have to educate myself and then hopefully I am able to read it and maybe even produce something similar.
On another note, after applying the changes to my Test.yxmd workflow I have gotten the following error:
"Test_Macro (8) Record #79: Tool #2: Parse Error at char(5): Parse Error."
After looking into the action tool #2 I could see that since I was using LocationLang, which is a 4 digit number+space+text, therefore the 5th digit was space. Could you perhaps shed light, why I got this error even though the whole text should be interpreted as is between quotes.
I might as well be far from the truth regarding char(5) parse error, but that was something I could figure out.
BUT when I changed the grouping to Location a 4 digit double, then my workflow worked as well, which was a nice feeling I must say.
Nevertheless I must compliment you on your vast skills and your kindness to help.