Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Login and download file from site with crsf

jason_scarlett
10 - Fireball

Requirement

Download a csv file from a website that requires a login (and is blocking cross site scripting - crsf).

 

Solution Attempt 1 - use curl

A simple curl command from DOS seemed to fail due to csrf token requirement. After many hours of searching and trial and error, I wasn't able to build the command myself. I was able to use Chrome+F12+copy as cURL (cmd) ... to get the actual code that works. The code seems to hardcode the csrf token + cookie? + username/password into the cURL command, but I don't really understand it, and the next day it stops working.

 

From another post (https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Using-Download-Tool-to-get-file-from-s...)  @Troy 

curl -c cookies.txt -X POST -F username=<Your_UserName> -F password=<Your_Password> https://<URL of login page> - H "Accept: application/json" --next -X GET https://<URL>/internal/export_Report_EstimatedBilling?format=excel --output text.xls

.. doesn't work in my case.

 

Solution Attempt 2 - use the download tool

Use the Download tool, add username/password, but it gives errors that I don't understand. I suspect it is a similar issue as above. Maybe the csrf token is not being sent.

 

Any thoughts/approaches on how to do this better?

 

 

HTTP/1.1 302 Found
Connection: keep-alive
Server: skilljar
Date: Wed, 15 Jan 2020 15:25:59 GMT
Location: /auth/login?next=/analytics/enrollments.csv%3Fdraw%3D1%26start%3D0%26length%3D25%26skip_total_count%3Dtrue%26order%255B0%255D%255Bcolumn%255D%3D5%26order%255B0%255D%255Bdir%255D%3Ddesc%26registered_at%3Dall%26completed_at%3Dall%26latest_activity_at%3Dall
Vary: Cookie
X-Ua-Compatible: IE=edge
P3p: CP="This is not a P3P policy!"
Pragma: no-cache
Cache-Control: no-cache, no-store, must-revalidate, private
Content-Type: text/html; charset=utf-8
Expires: Wed, 15 Jan 2020 15:25:59 GMT
Set-Cookie: sj_ubid=bb8c7d95-0590-4690-9b92-9aaee94aad49; expires=Tue, 10-Jan-2040 15:25:59 GMT; httponly; Max-Age=630720000; Path=/; secure
Content-Length: 0
Via: 1.1 vegur

HTTP/1.1 302 Found
Connection: keep-alive
Server: skilljar
Date: Wed, 15 Jan 2020 15:25:59 GMT
Content-Type: text/html; charset=utf-8
X-Ua-Compatible: IE=edge
Location: https://dashboard.skilljar.com/login?next=%2Fanalytics%2Fenrollments.csv%3Fdraw%3D1%26start%3D0%26le...
P3p: CP="This is not a P3P policy!"
Expires: Wed, 15 Jan 2020 15:25:59 GMT
Pragma: no-cache
Vary: Cookie
Cache-Control: no-cache, no-store, must-revalidate, private
Set-Cookie: sj_ubid=52247fc9-ff79-46d6-bfbb-830cb27d69fd; expires=Tue, 10-Jan-2040 15:25:59 GMT; httponly; Max-Age=630720000; Path=/; secure
Content-Length: 0
Via: 1.1 vegur

HTTP/1.1 200 OK
Connection: keep-alive
Server: skilljar
Date: Wed, 15 Jan 2020 15:25:59 GMT
Vary: Cookie
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-Xss-Protection: 1
X-Ua-Compatible: IE=edge
P3p: CP="This is not a P3P policy!"
Pragma: no-cache
Cache-Control: no-cache, no-store, must-revalidate, private
Content-Type: text/html; charset=utf-8
Content-Security-Policy: frame-ancestors 'self' academy.skilljar.com
Expires: Wed, 15 Jan 2020 15:25:59 GMT
Set-Cookie: csrftoken=X4J5NsP4AehPs6u59tLwTOVXXbvYPlmPeezhN7GF69Y5MSeMjyEZ2z3CEKr2BaTw; expires=Wed, 13-Jan-2021 15:25:59 GMT; Max-Age=31449600; Path=/; secure
Set-Cookie: sj_ubid=6db248e3-241a-49f4-b538-0cb35dc24adf; expires=Tue, 10-Jan-2040 15:25:59 GMT; httponly; Max-Age=630720000; Path=/; secure
Content-Length: 4004
Via: 1.1 vegur

 

 

2 REPLIES 2
jason_scarlett
10 - Fireball

A colleague of mine @klonergan  solved it using python.

I will sanitize and post the solution. 

jason_scarlett
10 - Fireball

Run this via a python tool:

 

# List all non-standard packages to be imported by your 
# script here (only missing packages will be installed)
from ayx import Package
#Package.installPackages(['pandas','numpy'])

from ayx import Alteryx
import requests
import pandas as pd
from io import StringIO

user = 'your.email@address.ca'
password = 'your_password_for_Skill_jar_login'
site_url = 'https://dashboard.skilljar.com/login'

# Read in fileURL
file_url = Alteryx.read("#1")
print(file_url)

#parameters for csv download
params = {'draw':'1',
          'start':'0',
          'length':'25',
          'skip_total_count':'true',
          'order[0][column]':'5',
          'order[0][dir]':'desc',
          'registered_at':'all',
          'completed_at':'all',
          'latest_activity_at':'all'}

#didn't work without required referer header ...
#https://stackoverflow.com/questions/13567507/passing-csrftoken-with-python-requests
#looked at Chrome inspection to get required field names (email/password1/csrfmiddlewaretoken)
with requests.Session() as session:
    session.get(site_url)
    csrftoken = session.cookies['csrftoken']
    login_data = dict(email=user, password1=password, csrfmiddlewaretoken=csrftoken)
    post = session.post(site_url, data=login_data, headers=dict(Referer='https://dashboard.skilljar.com/'))
    data = session.get(file_url.FileURL[0],data=login_data, params=params)

#make a pandas dataframe for exporting to Alteryx
df = pd.read_csv(StringIO(data.text))
#columns after 13th are all null/garbage
df = df.iloc[:, : 13]
#Output to Alteryx workflow/anchor 1
Alteryx.write(df,1)

 

Labels