Alteryx Designer Desktop Discussions

AbdulBalogun · 04-18-2024

I have code that works: it reads all the file names, and the contents of those files, in my S3 bucket for a specific date, puts them into a data frame, and prints that data frame.

Code below:

 

session = boto3.Session( aws_access_key_id='', aws_secret_access_key='') 

s3 = session.resource('')

#body = 


#Create an empty dict
data={"File Name":[],"Contents":[]}




my_bucket = s3.Bucket('bucketname')
for i in my_bucket.objects.filter(Prefix="2024-04-02"):
    #Fill the list with values
    data["File Name"].append(i.key)
    data["Contents"].append(i.get()['Body'].read())
#print(data)
s3DF = pd.DataFrame.from_dict(data)

#print(s3DF)
Alteryx.write(s3DF,1)

What I'm struggling with now is creating a loop. Instead of using a static date as a filter, I have an input file with a list of dates that I'm using as an input to my Python tool. I want my code to create a max date field by reading the max date in that input file, create a new field called today's date that holds today's date, and then calculate the difference in days between my max date and today's date.

Then I want to use that information in my code so that, instead of bringing in the data for a specific date using a string prefix, it brings in data for all the days from my max date up until today (I'm assuming the best and most efficient way to do that is with a loop). Then, similarly to the old code, it should put that data into a data frame and print the data.

I have some pseudocode below that outlines my intended logic:

session = boto3.Session( aws_access_key_id='', aws_secret_access_key='') 

s3 = session.resource('')

#Create a list/dict that contains the names of existing files
# prev_data = {<READ FILE>}

#Create an empty dict
data={"File Name":[],"Contents":[]}


#Create a prefix variable
#Figure out last date on files
#Extract that date and loop from that date to today
#lastFileDate = MAX(PARSING(FileName))
#todaysDate = Today()
#DateDiff = todaysDate-lastFileDate (EXAMPLE: 4/12 - 4/9 = 3)

my_bucket = s3.Bucket('Bucketname')
for j in range(DateDiff):
    prefix_config = DATEADD(j,lastFileDate)
    finalPrefix = STRING(prefix_config)  #Example: "4-9-2024", "4-10-2024", "4-11-2024"
    for i in my_bucket.objects.filter(Prefix=finalPrefix):
        #IF NOT(i.key in prev_data) THEN do stuff below
        
        #Fill the list with values
        data["File Name"].append(i.key)
        data["Contents"].append(i.get()['Body'].read())
#print(data)
s3DF = pd.DataFrame.from_dict(data)

#print(s3DF)
Alteryx.write(s3DF,1)

However, I keep running into syntax errors. Does anybody know if this is possible and, if so, how I would accomplish it in Python code? I've attached a sample input file with dates below if needed.

Thanks

apathetichell · 04-18-2024

I'm scratching my head on how Alteryx comes into play here. If you want to do this in Python, do it in Python. If you want to do it in some combination, calculate the date difference outside the Python tool using abs(DateTimeDiff(DateTimeNow(), [date], "days")). You can then feed that into a Summarize tool to collect the max date you need, and feed that into Python. You can use the Alteryx Python tool to read that in and loop through your max date column.
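For the Python-only route, the max date and the day difference can be sketched with the standard library alone. This is a minimal sketch, not a tested Alteryx workflow: `max_date_and_gap` and the sample values are hypothetical, and it assumes the dates arrive as `YYYY-MM-DD` strings. Inside the Python tool they would come from the incoming connection, for example a column of the data frame returned by `Alteryx.read("#1")`.

```python
from datetime import date, datetime


def max_date_and_gap(date_strings, today=None, fmt="%Y-%m-%d"):
    """Return (max_date, today, days_between) for a list of date strings.

    `fmt` is an assumption about how the input file stores its dates.
    """
    today = today or date.today()
    parsed = [datetime.strptime(s, fmt).date() for s in date_strings]
    last = max(parsed)
    return last, today, (today - last).days


# Hypothetical values standing in for the date column of the input file.
sample = ["2024-04-07", "2024-04-09", "2024-04-08"]
last, today, gap = max_date_and_gap(sample, today=date(2024, 4, 12))
print(last, today, gap)  # 2024-04-09 2024-04-12 3
```

Passing `today` explicitly only makes the example reproducible; leaving it out falls back to `date.today()`.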

What kind of errors are you seeing? Are you looking for Python help or Alteryx help?
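The loop itself could look roughly like the sketch below, which builds one prefix per day with `datetime.timedelta` and reuses the listing calls from the working snippet. The helper names `date_prefixes` and `collect_objects` and the `seen_keys` argument are made up for illustration. It assumes object keys start with a `YYYY-MM-DD` date, as in the working `Prefix="2024-04-02"` filter, and it includes today, whereas `range(DateDiff)` in the pseudocode would stop one day short.

```python
from datetime import date, timedelta


def date_prefixes(start, end, fmt="%Y-%m-%d"):
    """Yield one prefix string per day from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield (start + timedelta(days=offset)).strftime(fmt)


def collect_objects(bucket, prefixes, seen_keys=()):
    """Read every object under each prefix, skipping keys already seen.

    `bucket` is expected to behave like a boto3 Bucket resource, i.e. to
    offer bucket.objects.filter(Prefix=...) as in the working snippet.
    """
    seen = set(seen_keys)
    data = {"File Name": [], "Contents": []}
    for prefix in prefixes:
        for obj in bucket.objects.filter(Prefix=prefix):
            if obj.key in seen:
                continue  # already loaded in an earlier run
            data["File Name"].append(obj.key)
            data["Contents"].append(obj.get()["Body"].read())
    return data


# Prefixes for a hypothetical last file date of 2024-04-09 and a run on 2024-04-12.
print(list(date_prefixes(date(2024, 4, 9), date(2024, 4, 12))))
# ['2024-04-09', '2024-04-10', '2024-04-11', '2024-04-12']
```

Inside the Python tool, the returned dict can go into `pd.DataFrame.from_dict(data)` and `Alteryx.write(s3DF, 1)` exactly as in the working code.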
