Alteryx Designer Desktop Discussions

Web scraping: How do I add "Country" and "Competition" columns to the output? (Python question)

HW1

This is not a Designer question, so apologies for posting it here. I asked it in the Off-Topic forum but did not get any response; I guess it's not visited as much as this one.

I started with the Alteryx Python tool, but I find pure Python easier and much more manageable in this case, hence the detour.

I have code that scrapes OddsPortal using Selenium:

```python
from io import StringIO

import pandas as pd
from selenium import webdriver

browser = webdriver.Chrome()


class GameData:

    def __init__(self):
        self.dates = []
        self.games = []
        self.scores = []
        self.home_odds = []
        self.draw_odds = []
        self.away_odds = []


def parse_data(url):
    browser.get(url)
    # Newer pandas versions deprecate passing literal HTML to read_html, so wrap it in StringIO
    df = pd.read_html(StringIO(browser.page_source), header=0)[0]
    game_data = GameData()
    game_date = None
    for row in df.itertuples():
        # row[0] is the DataFrame index; row[1] is the first table column
        if not isinstance(row[1], str):
            continue
        elif ':' not in row[1]:
            # A date header row (no kick-off time) sets the date for the games below it
            game_date = row[1].split('-')[0]
            continue
        game_data.dates.append(game_date)
        game_data.games.append(row[2])
        game_data.scores.append(row[3])
        game_data.home_odds.append(row[4])
        game_data.draw_odds.append(row[5])
        game_data.away_odds.append(row[6])

    return game_data


# A list (not a set) keeps the URLs in a predictable order
urls = ["https://www.oddsportal.com/soccer/australia/a-league/results/",
        "https://www.oddsportal.com/soccer/europe/champions-league/results/",
        "https://www.oddsportal.com/soccer/europe/europa-league/results/"]

if __name__ == '__main__':

    frames = []
    try:
        for url in urls:
            game_data = parse_data(url)
            frames.append(pd.DataFrame(game_data.__dict__))
    finally:
        browser.quit()  # always close the Chrome session

    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    results = pd.concat(frames, ignore_index=True)
```

The output is in the format:

|    |   Unnamed: 0 | dates       | games                    | scores   |   home_odds |   draw_odds |   away_odds |
|----|--------------|-------------|--------------------------|----------|-------------|-------------|-------------|
|  0 |            0 | 24 Feb 2018 | Slovacko - Sparta Prague | 1:1      |        4.27 |        3.14 |        1.93 |
|  1 |            1 | 24 Feb 2018 | Brno - Sigma Olomouc     | 1:0      |        2.93 |        3.14 |        2.45 |
|  2 |            2 | 24 Feb 2018 | Liberec - Mlada Boleslav | 1:0      |        1.91 |        3.46 |        3.89 |
|  3 |            3 | 23 Feb 2018 | Dukla Prague - Jablonec  | 0:1      |        2.65 |        3.25 |        2.6  |
|  4 |            4 | 18 Feb 2018 | Sparta Prague - Liberec  | 2:0      |        1.51 |        3.86 |        6.67 |

How can I add "Country" and "Competition" columns to the output?
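
Since every URL in `urls` follows the pattern `/soccer/<country>/<competition>/results/`, one option is to skip scraping the page for this and derive both values from the URL being parsed. The sketch below assumes that pattern holds; `league_from_url` is a hypothetical helper name. Note that it returns URL slugs (e.g. `a-league`), and for the European cups the "country" slug is `europe`.

```python
from urllib.parse import urlparse


def league_from_url(url):
    """Hypothetical helper: return (country, competition) slugs from a results URL.

    Assumes the path looks like /soccer/<country>/<competition>/results/,
    as in the OddsPortal URLs in the question; other shapes raise ValueError.
    """
    # Non-empty path segments, e.g. ['soccer', 'australia', 'a-league', 'results']
    parts = [p for p in urlparse(url).path.split('/') if p]
    if len(parts) < 3:
        raise ValueError(f"Unexpected URL path: {url}")
    return parts[1], parts[2]


if __name__ == '__main__':
    for url in ["https://www.oddsportal.com/soccer/australia/a-league/results/",
                "https://www.oddsportal.com/soccer/europe/champions-league/results/"]:
        print(league_from_url(url))
```

Inside the `for url in urls:` loop, the pair could then be attached to every row of that page with `pd.DataFrame(game_data.__dict__).assign(country=country, competition=competition)`; `DataFrame.assign` broadcasts a scalar value to all rows.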

The page's HTML (viewed with Inspect Element) contains the league information, but I am unsure how to extract it.

[Screenshot: Inspect Element]
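
If the display names shown on the page are preferred over URL slugs, a rough sketch with the standard-library `html.parser` can pull them out of `browser.page_source`. Because the screenshot is not available here, it assumes the league appears as breadcrumb links inside an element with `id="breadcrumb"` and that the last two links are the country and the competition. That id and the `BreadcrumbParser` / `league_from_html` names are hypothetical, so check them against the Inspect Element view.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not affect the depth count
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class BreadcrumbParser(HTMLParser):
    """Collect link texts inside the element whose id is container_id.

    The default id "breadcrumb" is an assumption, not confirmed from the page.
    """

    def __init__(self, container_id="breadcrumb"):
        super().__init__()
        self.container_id = container_id
        self.depth = 0        # > 0 while inside the container element
        self.in_link = False  # True while inside an <a> within the container
        self.items = []       # one entry per <a> found in the container

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag in VOID_TAGS:
                return
            self.depth += 1
            if tag == 'a':
                self.in_link = True
                self.items.append('')
        elif dict(attrs).get('id') == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if tag == 'a':
                self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.items[-1] += data


def league_from_html(page_source):
    """Return (country, competition) from the breadcrumb, or (None, None)."""
    parser = BreadcrumbParser()
    parser.feed(page_source)
    crumbs = [c.strip() for c in parser.items if c.strip()]
    # Assumes a trail like [..., 'Soccer', 'Australia', 'A-League']
    if len(crumbs) < 2:
        return None, None
    return crumbs[-2], crumbs[-1]
```

Calling `league_from_html(browser.page_source)` right after `browser.get(url)` in `parse_data` would give one (country, competition) pair per page, which can then be added as two constant columns of that page's DataFrame.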