
Alteryx Designer Desktop Discussions


How do I add "Country" & "Competition" columns to the output (Python Question)

HW1
9 - Comet

I have code that scrapes OddsPortal using Selenium:

 

from selenium import webdriver
import pandas as pd

browser = webdriver.Chrome()

class GameData:

    def __init__(self):
        self.dates = []
        self.games = []
        self.scores = []
        self.home_odds = []
        self.draw_odds = []
        self.away_odds = []


def parse_data(url):
    browser.get(url)
    df = pd.read_html(browser.page_source, header=0)[0]
    game_data = GameData()
    game_date = None
    for row in df.itertuples():
        if not isinstance(row[1], str):
            continue
        elif ':' not in row[1]:
            game_date = row[1].split('-')[0]
            continue
        game_data.dates.append(game_date)
        game_data.games.append(row[2])
        game_data.scores.append(row[3])
        game_data.home_odds.append(row[4])
        game_data.draw_odds.append(row[5])
        game_data.away_odds.append(row[6])

    return game_data


# A list (rather than a set) keeps the scrape order deterministic
urls = [
    "https://www.oddsportal.com/soccer/australia/a-league/results/",
    "https://www.oddsportal.com/soccer/europe/champions-league/results/",
    "https://www.oddsportal.com/soccer/europe/europa-league/results/",
]

if __name__ == '__main__':

    frames = []

    for url in urls:
        game_data = parse_data(url)
        frames.append(pd.DataFrame(game_data.__dict__))

    # DataFrame.append was removed in pandas 2.0; concatenate once instead
    results = pd.concat(frames, ignore_index=True)

 

The output is in the format:

|    |   Unnamed: 0 | dates       | games                    | scores   |   home_odds |   draw_odds |   away_odds |
|----|--------------|-------------|--------------------------|----------|-------------|-------------|-------------|
|  0 |            0 | 24 Feb 2018 | Slovacko - Sparta Prague | 1:1      |        4.27 |        3.14 |        1.93 |
|  1 |            1 | 24 Feb 2018 | Brno - Sigma Olomouc     | 1:0      |        2.93 |        3.14 |        2.45 |
|  2 |            2 | 24 Feb 2018 | Liberec - Mlada Boleslav | 1:0      |        1.91 |        3.46 |        3.89 |
|  3 |            3 | 23 Feb 2018 | Dukla Prague - Jablonec  | 0:1      |        2.65 |        3.25 |        2.6  |
|  4 |            4 | 18 Feb 2018 | Sparta Prague - Liberec  | 2:0      |        1.51 |        3.86 |        6.67 |

How can I add "Country" and "Competition" columns to the output?

The page's HTML (viewed with Inspect Element) contains the league information, but I am unsure how to extract it.

 

[Image: MDuYp.png]

 

Also, is there any way to derive the "Competition" from the URL? The URL contains both the "Country" and the "Competition", but I am too new to this to make the best use of that information.
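
One way to read both values from the URL is to split its path. This is a minimal sketch, assuming every URL follows the /soccer/<country>/<competition>/results/ layout of the three URLs above; the function name country_and_competition is made up for illustration, and the values come back as lowercase URL slugs rather than display names.

```python
from urllib.parse import urlparse


def country_and_competition(url):
    """Return (country, competition) slugs from an OddsPortal results URL.

    Assumes the path looks like /soccer/<country>/<competition>/results/.
    """
    # Drop empty pieces produced by the leading and trailing slashes
    parts = [p for p in urlparse(url).path.split("/") if p]
    # e.g. ['soccer', 'australia', 'a-league', 'results']
    if len(parts) < 3:
        raise ValueError(f"Unexpected URL layout: {url}")
    return parts[1], parts[2]


country, competition = country_and_competition(
    "https://www.oddsportal.com/soccer/australia/a-league/results/"
)
print(country, competition)  # australia a-league
```

Inside the loop over the URLs, the two values could then be assigned to the per-URL DataFrame (for example result["Country"] = country and result["Competition"] = competition) before it is combined with the others; pandas repeats a single value on every row.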

 

1 REPLY
pedrodrfaria
13 - Pulsar

Hi @HW1 

 

You can locate both the country and the competition using their full XPaths.

 

Use Selenium's find-element-by-XPath function (find_element_by_xpath, or find_element(By.XPATH, ...) in Selenium 4) to retrieve that information as well.

 

XPATH for Country:

/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[3]/div[2]/div/div[1]/div/h2/span 

 

XPATH for Competition:

/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/h1

 

Work that into your Python script and you can append both values to your output table as extra columns.
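
As a rough sketch of that step, the helper below reads both elements from the page the browser currently has loaded. The function name read_league_info is hypothetical, and the XPaths are the ones quoted above; they depend on the page layout at the time of posting and may no longer match. The sketch uses the Selenium 4 form of the call, since the older find_element_by_xpath helpers were deprecated in Selenium 4 and later removed.

```python
# XPaths copied from the reply above; tied to the page layout at the time
# of posting, so treat them as assumptions that may need updating.
COUNTRY_XPATH = (
    "/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]"
    "/div[3]/div[2]/div/div[1]/div/h2/span"
)
COMPETITION_XPATH = (
    "/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/h1"
)


def read_league_info(browser):
    """Return (country, competition) text from the currently loaded page.

    `browser` is a Selenium WebDriver. The string "xpath" is the value of
    selenium.webdriver.common.by.By.XPATH, so By.XPATH works here as well.
    """
    country = browser.find_element("xpath", COUNTRY_XPATH).text.strip()
    competition = browser.find_element("xpath", COMPETITION_XPATH).text.strip()
    return country, competition
```

Calling read_league_info(browser) right after browser.get(url) in parse_data would give the two values for that page, which can then be stored alongside the dates, games, and odds.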
