
How do I find a tag using BeautifulSoup?

I am trying to scrape a German billiard league website for results, so that I can tabulate them and submit them to a rating system that requires a specific format. I am no Python expert, but I have worked out how to pull a complete list of links from the root league page and how to get the date and the home and visitor teams. Now I am trying to capture the individual match data.

Here is the relevant HTML:

<tr>
<td colspan="3" nowrap="" rowspan="2" width="100"><b>
                                Spiel 2<br/>8-Ball                      </b>
</td>
<td class="home up" colspan="6" valign="top">Christian Fachinger</td>
<td class="visitor up" colspan="7" valign="top">Michael Schneider</td>
</tr>
<tr>
<td class="home down" colspan="6" valign="top">7</td>
<td class="visitor down" colspan="7" valign="top">4</td>

Site: https://hbu.billardarea.de/cms_leagues/matchday/344947

I am trying to find the "td" tag that contains the text "Spiel 2". From there I should be able to pull the game type ("8-Ball") and then move on to capturing the data inside the cells with the relevant "class" attributes. For the life of me, I cannot get a result back: I have tried many permutations of soup commands, but I get either "None" or "[]". I think it might have something to do with the extra whitespace, but even with various regex-based searches I have not been able to select this td tag for further data gathering.

What am I doing wrong? I know this is not the most efficient code; this is the first time I have tried to write a web scraper, and I am a Python newbie in general.

```python
import requests
import re
import os
from bs4 import BeautifulSoup

URL = "https://hbu.billardarea.de/cms_leagues/plan/7870/10406"

def import_all_links():
    page = requests.get(URL).text
    soup = BeautifulSoup(page, "html.parser")
    path = soup.select("a[href*=matchday]")

    # Append every matchday link to league.txt, one URL per line.
    with open("league.txt", "a") as file1:
        for link in path:
            file1.write("https://hbu.billardarea.de" + link['href'] + '\n')

def get_date():
    links_file = open(r'C:\Users\Russ\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Python 3.10\league.txt', "r")
    for day_link in links_file:
        day_link = day_link.rstrip("\n")
        soup = requests.get(day_link).text
        day_links_parse = BeautifulSoup(soup, "html.parser")
        date = day_links_parse.select('label:contains(Datum)')
        league = day_links_parse.select('label:contains(Saison)')
        home = day_links_parse.find(attrs={"class": "home"}).text
        home = home.partition(":")[2]
        visitor = day_links_parse.find(attrs={"class": "visitor"}).text
        visitor = visitor.partition(":")[2]
        print(day_links_parse)
        play_table = day_links_parse.td.find_all(text=re.compile('Spiel 2'))  # <<<<< Issue
        print(play_table)  # <<<<< Returns 0 results

        # Convert the date from DD.MM.YYYY to MM\DD\YYYY.
        for item in date:
            date = item.next_sibling.next_sibling.text
            date = date.partition(" ")[0]
            date = date.split(".")
            date = date[1] + "\\" + date[0] + "\\" + date[2]
        for item in league:
            league = item.next_sibling.next_sibling.text
            league = league.partition(" ")[0]

        print(date, ",", league, ",", home, " (H) vs ", visitor, "(V)", sep='')

import_all_links()
get_date()
```
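
One likely cause of the empty result: `day_links_parse.td` is shorthand for the first `<td>` in the whole document, so `find_all` only searches inside that single cell, which is probably not the "Spiel 2" cell. Below is a minimal sketch of an alternative, run against the HTML excerpt from the question rather than the live site. It searches the whole document for the text node and then walks up to the enclosing cell and row. The names `HTML`, `parse_game`, and `cell_text` are hypothetical, the excerpt is wrapped in a `<table>` and its second row is closed so that the sample is well formed, and the code assumes that the row with the scores directly follows the row with the player names, as it does in the excerpt.

```python
import re

from bs4 import BeautifulSoup

# The HTML fragment from the question, wrapped in a <table> and with the
# second row closed so that the sample is well formed (an assumption).
HTML = """
<table>
<tr>
<td colspan="3" nowrap="" rowspan="2" width="100"><b>
                                Spiel 2<br/>8-Ball                      </b>
</td>
<td class="home up" colspan="6" valign="top">Christian Fachinger</td>
<td class="visitor up" colspan="7" valign="top">Michael Schneider</td>
</tr>
<tr>
<td class="home down" colspan="6" valign="top">7</td>
<td class="visitor down" colspan="7" valign="top">4</td>
</tr>
</table>
"""


def parse_game(soup, label):
    """Hypothetical helper: return one game's details as a dict, or None."""
    # Search the whole document for a text node containing the label.
    # re.escape keeps the label literal; the trailing \b stops "Spiel 2"
    # from also matching "Spiel 20".
    text_node = soup.find(string=re.compile(re.escape(label) + r"\b"))
    if text_node is None:
        return None

    label_cell = text_node.find_parent("td")
    # stripped_strings yields the text pieces around <br/> without padding.
    parts = list(label_cell.stripped_strings)

    # Assumption: names are in the label's row, scores in the next row.
    names_row = label_cell.find_parent("tr")
    scores_row = names_row.find_next_sibling("tr")

    def cell_text(row, css_class):
        # class_ matches any one of a tag's classes, so "home" finds "home up".
        return row.find("td", class_=css_class).get_text(strip=True)

    return {
        "label": parts[0],
        "game": parts[1] if len(parts) > 1 else None,
        "home": cell_text(names_row, "home"),
        "visitor": cell_text(names_row, "visitor"),
        "home_score": cell_text(scores_row, "home"),
        "visitor_score": cell_text(scores_row, "visitor"),
    }


soup = BeautifulSoup(HTML, "html.parser")
result = parse_game(soup, "Spiel 2")
print(result)
```

In `get_date()`, the same helper could be called as `parse_game(day_links_parse, "Spiel 2")`. Whether the live page has exactly this layout is not verified here, so the row and class lookups may need adjusting.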


source https://stackoverflow.com/questions/70475129/how-do-i-find-a-tag-using-beautifulsoup
