
SOLVED

Login and download file from site with CSRF

jason_scarlett
10 - Fireball

Requirement

Download a CSV file from a website that requires a login and uses CSRF (cross-site request forgery) protection.

 

Solution Attempt 1 - use curl

A simple curl command from the command prompt failed, apparently because of the CSRF token requirement. After many hours of searching and trial and error, I wasn't able to build the command myself. I was able to use Chrome + F12 + "Copy as cURL (cmd)" to get a command that actually works. It seems to hardcode the CSRF token, the cookie (?), and the username/password into the cURL command, but I don't really understand it, and by the next day it stops working.

 

From another post (https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Using-Download-Tool-to-get-file-from-s...) by @Troy:

curl -c cookies.txt -X POST -F username=<Your_UserName> -F password=<Your_Password> https://<URL of login page> -H "Accept: application/json" --next -X GET https://<URL>/internal/export_Report_EstimatedBilling?format=excel --output text.xls

... but it doesn't work in my case.

 

Solution Attempt 2 - use the download tool

Using the Download tool with the username/password added gives errors that I don't understand. I suspect it is the same issue as above: the CSRF token may not be getting sent.
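
For illustration, here is a minimal sketch of what a login request that does carry the CSRF token might look like. It only builds the request and does not send it. The form field names (email/password1/csrfmiddlewaretoken) and the Referer value are taken from the Python reply further down this thread; the helper name build_login_request and the placeholder token are hypothetical, and the exact headers the site requires are an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(login_url, email, password, csrf_token, referer):
    """Build (but do not send) a form login POST that carries the CSRF token."""
    # The token is sent twice: once as a hidden form field, once as a cookie.
    form = {
        "email": email,
        "password1": password,
        "csrfmiddlewaretoken": csrf_token,
    }
    headers = {
        "Cookie": f"csrftoken={csrf_token}",
        "Referer": referer,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode(form).encode("ascii")
    return Request(login_url, data=body, headers=headers, method="POST")


req = build_login_request(
    "https://dashboard.skilljar.com/login",
    "your.email@address.ca",
    "your_password",
    "token-from-the-csrftoken-cookie",  # placeholder, not a real token
    "https://dashboard.skilljar.com/",
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

If the token is missing from either place, or belongs to a different session, a site with this kind of protection would be expected to reject the POST, which may be what the Download tool is running into.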

 

Any thoughts/approaches on how to do this better?

 

 

HTTP/1.1 302 Found
Connection: keep-alive
Server: skilljar
Date: Wed, 15 Jan 2020 15:25:59 GMT
Location: /auth/login?next=/analytics/enrollments.csv%3Fdraw%3D1%26start%3D0%26length%3D25%26skip_total_count%3Dtrue%26order%255B0%255D%255Bcolumn%255D%3D5%26order%255B0%255D%255Bdir%255D%3Ddesc%26registered_at%3Dall%26completed_at%3Dall%26latest_activity_at%3Dall
Vary: Cookie
X-Ua-Compatible: IE=edge
P3p: CP="This is not a P3P policy!"
Pragma: no-cache
Cache-Control: no-cache, no-store, must-revalidate, private
Content-Type: text/html; charset=utf-8
Expires: Wed, 15 Jan 2020 15:25:59 GMT
Set-Cookie: sj_ubid=bb8c7d95-0590-4690-9b92-9aaee94aad49; expires=Tue, 10-Jan-2040 15:25:59 GMT; httponly; Max-Age=630720000; Path=/; secure
Content-Length: 0
Via: 1.1 vegur

HTTP/1.1 302 Found
Connection: keep-alive
Server: skilljar
Date: Wed, 15 Jan 2020 15:25:59 GMT
Content-Type: text/html; charset=utf-8
X-Ua-Compatible: IE=edge
Location: https://dashboard.skilljar.com/login?next=%2Fanalytics%2Fenrollments.csv%3Fdraw%3D1%26start%3D0%26le...
P3p: CP="This is not a P3P policy!"
Expires: Wed, 15 Jan 2020 15:25:59 GMT
Pragma: no-cache
Vary: Cookie
Cache-Control: no-cache, no-store, must-revalidate, private
Set-Cookie: sj_ubid=52247fc9-ff79-46d6-bfbb-830cb27d69fd; expires=Tue, 10-Jan-2040 15:25:59 GMT; httponly; Max-Age=630720000; Path=/; secure
Content-Length: 0
Via: 1.1 vegur

HTTP/1.1 200 OK
Connection: keep-alive
Server: skilljar
Date: Wed, 15 Jan 2020 15:25:59 GMT
Vary: Cookie
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-Xss-Protection: 1
X-Ua-Compatible: IE=edge
P3p: CP="This is not a P3P policy!"
Pragma: no-cache
Cache-Control: no-cache, no-store, must-revalidate, private
Content-Type: text/html; charset=utf-8
Content-Security-Policy: frame-ancestors 'self' academy.skilljar.com
Expires: Wed, 15 Jan 2020 15:25:59 GMT
Set-Cookie: csrftoken=X4J5NsP4AehPs6u59tLwTOVXXbvYPlmPeezhN7GF69Y5MSeMjyEZ2z3CEKr2BaTw; expires=Wed, 13-Jan-2021 15:25:59 GMT; Max-Age=31449600; Path=/; secure
Set-Cookie: sj_ubid=6db248e3-241a-49f4-b538-0cb35dc24adf; expires=Tue, 10-Jan-2040 15:25:59 GMT; httponly; Max-Age=630720000; Path=/; secure
Content-Length: 4004
Via: 1.1 vegur
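
The trace above can be read with a few lines of standard-library Python. This sketch decodes the Location header of the first 302 to show which URL the site was trying to send the client back to after login, and pulls the csrftoken value out of the Set-Cookie header of the final 200 response. Both strings are copied from the headers above; the variable names are only illustrative.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Location header from the first 302 response above.
location = (
    "/auth/login?next=/analytics/enrollments.csv%3Fdraw%3D1%26start%3D0"
    "%26length%3D25%26skip_total_count%3Dtrue"
    "%26order%255B0%255D%255Bcolumn%255D%3D5"
    "%26order%255B0%255D%255Bdir%255D%3Ddesc"
    "%26registered_at%3Dall%26completed_at%3Dall%26latest_activity_at%3Dall"
)

# The original request is URL-encoded inside the "next" parameter.
next_target = parse_qs(urlsplit(location).query)["next"][0]
target = urlsplit(next_target)
csv_params = {key: values[0] for key, values in parse_qs(target.query).items()}
print(target.path)
print(csv_params)

# Set-Cookie header from the final 200 response above.
set_cookie = (
    "csrftoken=X4J5NsP4AehPs6u59tLwTOVXXbvYPlmPeezhN7GF69Y5MSeMjyEZ2z3CEKr2BaTw; "
    "expires=Wed, 13-Jan-2021 15:25:59 GMT; Max-Age=31449600; Path=/; secure"
)
cookie = SimpleCookie()
cookie.load(set_cookie)
token = cookie["csrftoken"].value
print(token)
```

In other words, the request for the CSV was bounced to the login page, and the login page is where the csrftoken cookie gets issued.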

 

 

2 REPLIES
jason_scarlett
10 - Fireball

A colleague of mine, @klonergan, solved it using Python.

I will sanitize and post the solution.

jason_scarlett
10 - Fireball

Run this via a Python tool:

 

# List all non-standard packages to be imported by your 
# script here (only missing packages will be installed)
from ayx import Package
#Package.installPackages(['pandas','numpy'])

from ayx import Alteryx
import requests
import pandas as pd
from io import StringIO

user = 'your.email@address.ca'
password = 'your_password_for_Skill_jar_login'
site_url = 'https://dashboard.skilljar.com/login'

# Read in fileURL
file_url = Alteryx.read("#1")
print(file_url)

#parameters for csv download
params = {'draw':'1',
          'start':'0',
          'length':'25',
          'skip_total_count':'true',
          'order[0][column]':'5',
          'order[0][dir]':'desc',
          'registered_at':'all',
          'completed_at':'all',
          'latest_activity_at':'all'}

# The login POST didn't work without the Referer header; see
# https://stackoverflow.com/questions/13567507/passing-csrftoken-with-python-requests
# The required form field names (email/password1/csrfmiddlewaretoken) were taken from Chrome's inspector.
with requests.Session() as session:
    session.get(site_url)
    csrftoken = session.cookies['csrftoken']
    login_data = dict(email=user, password1=password, csrfmiddlewaretoken=csrftoken)
    post = session.post(site_url, data=login_data, headers=dict(Referer='https://dashboard.skilljar.com/'))
    data = session.get(file_url.FileURL[0],data=login_data, params=params)

#make a pandas dataframe for exporting to Alteryx
df = pd.read_csv(StringIO(data.text))
#columns after 13th are all null/garbage
df = df.iloc[:, : 13]
#Output to Alteryx workflow/anchor 1
Alteryx.write(df,1)
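
One possible safeguard, sketched here as an assumption rather than as part of the original solution: the headers in the question show that an unauthenticated request ends in a 200 response with Content-Type text/html (the login page), so a failed login would not raise on its own and the HTML would be handed to pd.read_csv. The helper name ensure_csv_response is hypothetical, and the content type the site uses for the real CSV export is not known, so the check only rejects HTML and login URLs.

```python
def ensure_csv_response(status_code, content_type, final_url):
    """Raise if the response looks like the login page instead of the CSV export."""
    if status_code != 200:
        raise RuntimeError(f"Download failed with HTTP status {status_code}")
    # A bounced request ends on the login page, which is served as text/html.
    if "text/html" in content_type.lower() or "/login" in final_url:
        raise RuntimeError(f"Got a login/HTML page instead of CSV: {final_url}")


# Values mirroring the unauthenticated trace in the question.
try:
    ensure_csv_response(
        200,
        "text/html; charset=utf-8",
        "https://dashboard.skilljar.com/login",
    )
except RuntimeError as err:
    print("Rejected:", err)
```

With the requests response from the script above, this could be called as ensure_csv_response(data.status_code, data.headers.get("Content-Type", ""), data.url) just before the pd.read_csv line.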

 
