I have code that scrapes OddsPortal using Selenium:

    from selenium import webdriver
    import pandas as pd

    browser = webdriver.Chrome()


    class GameData:
        """Holds one list per output column."""

        def __init__(self):
            self.dates = []
            self.games = []
            self.scores = []
            self.home_odds = []
            self.draw_odds = []
            self.away_odds = []


    def parse_data(url):
        browser.get(url)
        df = pd.read_html(browser.page_source, header=0)[0]
        game_data = GameData()
        game_date = None
        for row in df.itertuples():
            if not isinstance(row[1], str):
                continue
            elif ':' not in row[1]:
                # Rows without a kick-off time are date headers
                game_date = row[1].split('-')[0]
                continue
            game_data.dates.append(game_date)
            game_data.games.append(row[2])
            game_data.scores.append(row[3])
            game_data.home_odds.append(row[4])
            game_data.draw_odds.append(row[5])
            game_data.away_odds.append(row[6])
        return game_data


    urls = {"https://www.oddsportal.com/soccer/australia/a-league/results/",
            "https://www.oddsportal.com/soccer/europe/champions-league/results/",
            "https://www.oddsportal.com/soccer/europe/europa-league/results/"}

    if __name__ == '__main__':
        results = None
        for url in urls:
            game_data = parse_data(url)
            result = pd.DataFrame(game_data.__dict__)
            if results is None:
                results = result
            else:
                # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
                results = pd.concat([results, result], ignore_index=True)

The output is in the format:
| | Unnamed: 0 | dates | games | scores | home_odds | draw_odds | away_odds |
|----|--------------|-------------|--------------------------|----------|-------------|-------------|-------------|
| 0 | 0 | 24 Feb 2018 | Slovacko - Sparta Prague | 1:1 | 4.27 | 3.14 | 1.93 |
| 1 | 1 | 24 Feb 2018 | Brno - Sigma Olomouc | 1:0 | 2.93 | 3.14 | 2.45 |
| 2 | 2 | 24 Feb 2018 | Liberec - Mlada Boleslav | 1:0 | 1.91 | 3.46 | 3.89 |
| 3 | 3 | 23 Feb 2018 | Dukla Prague - Jablonec | 0:1 | 2.65 | 3.25 | 2.6 |
| 4 | 4 | 18 Feb 2018 | Sparta Prague - Liberec | 2:0 | 1.51 | 3.86 | 6.67 |
How can I add "Country" and "Competition" columns to the output?
Inspecting the page shows the league information, but I am unsure how to extract it.
Also, is there any way I can set the "Competition" from the URL? The URL contains both the country and the competition, but I am too new to this to make the best use of that information.
Hi @HW1
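You can locate both the country and the competition using their full XPaths.
Use Selenium's find-element-by-XPath call (`find_element_by_xpath` in Selenium 3, `find_element(By.XPATH, ...)` in Selenium 4) and read the element's text to get that information as well.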
XPATH for Country:
/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[3]/div[2]/div/div[1]/div/h2/span
XPATH for Competition:
/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/h1
Work that into your Python code and you can add both values to your table as extra columns.
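To make that concrete, here is a minimal sketch of both options: taking the values from the URL, as the question suggests, or reading them from the page with the XPaths above. The function names `league_from_url` and `league_from_page` are made up for illustration. The URL version assumes the path always looks like `<sport>/<country>/<competition>/results/`, and the page version assumes the absolute XPaths still match the current page layout, which may well not hold after a site redesign.

```python
from urllib.parse import urlparse

# XPaths quoted in the answer; absolute paths like these are brittle
# and stop matching as soon as the page layout changes.
COUNTRY_XPATH = ("/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]"
                 "/div[3]/div[2]/div/div[1]/div/h2/span")
COMPETITION_XPATH = ("/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]"
                     "/div[2]/div[1]/h1")


def league_from_url(url):
    """Return (country, competition) taken from the URL path.

    Assumes the path is <sport>/<country>/<competition>/results/.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    country, competition = parts[1], parts[2]
    # Turn slugs such as "champions-league" into "Champions League"
    return (country.replace("-", " ").title(),
            competition.replace("-", " ").title())


def league_from_page(driver):
    """Return (country, competition) read from the loaded page.

    `driver` is a Selenium 4 WebDriver; the string "xpath" is the
    value of By.XPATH, so no extra import is needed here.
    """
    country = driver.find_element("xpath", COUNTRY_XPATH).text
    competition = driver.find_element("xpath", COMPETITION_XPATH).text
    return country, competition
```

In the loop over `urls`, you could call `league_from_url(url)` once per URL and then set `result["country"] = country` and `result["competition"] = competition` on each DataFrame before combining them, since assigning a single value to a column fills every row.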