This is not a Designer question; apologies for posting it here. I asked it in the Off-Topic forum but did not get any response, probably because that forum is not as busy as this one.
I started with the Alteryx Python tool, but I find plain Python easier and much more manageable in this case, hence the detour.
I have code that scrapes OddsPortal using Selenium:
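```python
import io

import pandas as pd
from selenium import webdriver

browser = webdriver.Chrome()


class GameData:
    """Column-wise container for the games scraped from one results page."""

    def __init__(self):
        self.dates = []
        self.games = []
        self.scores = []
        self.home_odds = []
        self.draw_odds = []
        self.away_odds = []


def parse_data(url):
    browser.get(url)
    # Passing a literal HTML string to read_html is deprecated in pandas 2.1+, so wrap it in StringIO
    df = pd.read_html(io.StringIO(browser.page_source), header=0)[0]
    game_data = GameData()
    game_date = None
    for row in df.itertuples():
        # row[0] is the DataFrame index; row[1] is the first table column
        if not isinstance(row[1], str):
            continue  # skip blank/spacer rows
        elif ':' not in row[1]:
            # No kick-off time in the cell, so this is a date header row; keep the date part
            game_date = row[1].split('-')[0].strip()
            continue
        game_data.dates.append(game_date)
        game_data.games.append(row[2])
        game_data.scores.append(row[3])
        game_data.home_odds.append(row[4])
        game_data.draw_odds.append(row[5])
        game_data.away_odds.append(row[6])
    return game_data


# A list (not a set) keeps the scraping order the same on every run
urls = ["https://www.oddsportal.com/soccer/australia/a-league/results/",
        "https://www.oddsportal.com/soccer/europe/champions-league/results/",
        "https://www.oddsportal.com/soccer/europe/europa-league/results/"]

if __name__ == '__main__':
    frames = []
    try:
        for url in urls:
            game_data = parse_data(url)
            frames.append(pd.DataFrame(game_data.__dict__))
    finally:
        browser.quit()  # close Chrome even if a page fails to load or parse
    # DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
    results = pd.concat(frames, ignore_index=True)
    print(results)
```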
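To check how the date-tracking loop in `parse_data` behaves without launching Chrome, here is a minimal offline sketch. `rows_to_games` is a hypothetical helper that repeats the same logic on a hand-built DataFrame standing in for what `pd.read_html` returns; the column names, the kick-off times and the text after the dash in the first date row are made-up assumptions, not OddsPortal's real markup.

```python
import pandas as pd


def rows_to_games(df):
    """Carry the most recent date header forward onto each game row.

    Same rule as parse_data: a first-column cell without ':' is a date
    header, a cell containing ':' (a kick-off time) is a game row.
    """
    records = []
    game_date = None
    for row in df.itertuples():
        if not isinstance(row[1], str):
            continue  # blank/NaN spacer rows
        if ':' not in row[1]:
            game_date = row[1].split('-')[0].strip()
            continue
        records.append({
            'dates': game_date,
            'games': row[2],
            'scores': row[3],
            'home_odds': row[4],
            'draw_odds': row[5],
            'away_odds': row[6],
        })
    return pd.DataFrame(records)


# Hand-made stand-in for the scraped table (illustrative values only)
sample = pd.DataFrame({
    'time': ['24 Feb 2018 - 1X2', '15:00', '17:00', None, '23 Feb 2018', '18:00'],
    'match': [None, 'Slovacko - Sparta Prague', 'Brno - Sigma Olomouc',
              None, None, 'Dukla Prague - Jablonec'],
    'score': [None, '1:1', '1:0', None, None, '0:1'],
    'home': [None, 4.27, 2.93, None, None, 2.65],
    'draw': [None, 3.14, 3.14, None, None, 3.25],
    'away': [None, 1.93, 2.45, None, None, 2.6],
})
print(rows_to_games(sample))
```

Because the date is only stored when a header row is seen, every game row inherits the last header above it; this is why the header rows must appear before their games in the scraped table.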
The output is in the format:
| | Unnamed: 0 | dates | games | scores | home_odds | draw_odds | away_odds |
|----|--------------|-------------|--------------------------|----------|-------------|-------------|-------------|
| 0 | 0 | 24 Feb 2018 | Slovacko - Sparta Prague | 1:1 | 4.27 | 3.14 | 1.93 |
| 1 | 1 | 24 Feb 2018 | Brno - Sigma Olomouc | 1:0 | 2.93 | 3.14 | 2.45 |
| 2 | 2 | 24 Feb 2018 | Liberec - Mlada Boleslav | 1:0 | 1.91 | 3.46 | 3.89 |
| 3 | 3 | 23 Feb 2018 | Dukla Prague - Jablonec | 0:1 | 2.65 | 3.25 | 2.6 |
| 4 | 4 | 18 Feb 2018 | Sparta Prague - Liberec | 2:0 | 1.51 | 3.86 | 6.67 |
How can I add "Country" and "Competition" columns to the output?
Inspecting the page with the browser's developer tools shows the league information, but I am unsure how to extract it.
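One way to get both values without locating anything in the page is to read them from the URL you already pass to `parse_data`. Below is a minimal sketch, assuming every results URL follows the `/<sport>/<country>/<competition>/results/` pattern of the three URLs above. `league_from_url` and `add_league_columns` are hypothetical helper names, and the values they produce are URL slugs (e.g. `a-league`), not the display names shown on the page.

```python
from urllib.parse import urlparse

import pandas as pd


def league_from_url(url):
    """Return (country, competition) slugs from an OddsPortal results URL.

    Assumption: the path looks like /<sport>/<country>/<competition>/results/.
    """
    parts = [p for p in urlparse(url).path.split('/') if p]
    if len(parts) < 3:
        raise ValueError(f"unexpected URL layout: {url}")
    _sport, country, competition = parts[:3]
    return country, competition


def add_league_columns(result, url):
    """Return a copy of one page's results with 'country' and 'competition' columns."""
    country, competition = league_from_url(url)
    # assign() repeats each scalar on every row and appends the new columns at the end
    return result.assign(country=country, competition=competition)


if __name__ == '__main__':
    demo = pd.DataFrame({'games': ['Home - Away'], 'scores': ['1:0']})
    print(add_league_columns(
        demo, "https://www.oddsportal.com/soccer/australia/a-league/results/"))
```

Inside the `for url in urls` loop, wrap each page's `pd.DataFrame(game_data.__dict__)` in `add_league_columns(..., url)` so every row is tagged before the pages are combined. Note that the Champions League and Europa League URLs have `europe` in the country position, so for international competitions that column holds a region. If you need the exact names displayed on the page, you would have to take the locator of the league element you found with Inspect and read its text through Selenium instead.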