I want to get the text from specific articles (more than one), so here is the function for that:
def get_article():
    for url in get_href():
        options = webdriver.ChromeOptions()
        options.add_argument("--ignore-certificate-error")
        options.add_argument("--ignore-ssl-errors")
        service = Service(executable_path='chromedriver.exe')
        driver = webdriver.Chrome(service=service, options=options)
        driver.get(url)
        time.sleep(4)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        driver.minimize_window()
        text1 = url.get('div.Paragraph__component > span')
        print(text1)
The error I get is:
Traceback (most recent call last):
  File "c:\Users\user\Desktop\Informatik\Praktik\Projekte\Python\stiil_working_on\news_automation\try.py", line 116, in <module>
    get_article()
  File "c:\Users\user\Desktop\Informatik\Praktik\Projekte\Python\stiil_working_on\news_automation\try.py", line 99, in get_article
    text1 = url.get('div.Paragraph__component > span')
AttributeError: 'str' object has no attribute 'get'
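The AttributeError can be reproduced in isolation. In this minimal sketch, `url` is a hypothetical stand-in for one of the strings returned by get_href(); a plain Python string has no `get` method, which is what the final line of the traceback reports.

```python
# Hypothetical stand-in for one URL string returned by get_href()
url = "https://www.cnn.com/example-article"

try:
    # Same call as in get_article(): .get() is invoked on a str, not on parsed HTML
    url.get("div.Paragraph__component > span")
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```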
What I want to do in this function is use each URL returned by get_href():
def get_href():
    all_results = []
    for h3 in soup.select('h3.cnn-search__result-headline > a'):
        title = h3.text
        url_ = h3.get('href')
        abs_url = 'https:' + url_
        all_results.append(abs_url)
    return all_results
and then open it and scrape the article text from it, but it's not working and I can't figure out why. Does anyone know how to do it?
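One possible direction, as a rough sketch rather than a confirmed fix: run the CSS selector against the parsed page (`soup`) instead of the URL string. The helper name `extract_article_text` and the `sample_html` snippet are assumptions made for illustration, and whether `div.Paragraph__component > span` matches the real CNN markup has not been verified here.

```python
from bs4 import BeautifulSoup


def extract_article_text(page_html: str) -> str:
    """Return the text of every matching paragraph span, one per line."""
    soup = BeautifulSoup(page_html, "html.parser")
    # Select on the parsed document, not on the URL string
    spans = soup.select("div.Paragraph__component > span")
    return "\n".join(span.get_text(strip=True) for span in spans)


# Hypothetical HTML standing in for driver.page_source
sample_html = """
<div class="Paragraph__component"><span>First paragraph.</span></div>
<div class="Paragraph__component"><span>Second paragraph.</span></div>
"""

article_text = extract_article_text(sample_html)
print(article_text)
```

Inside get_article(), this would presumably be called as `extract_article_text(driver.page_source)` after `driver.get(url)`. Creating the driver once before the loop and calling `driver.quit()` after it would also avoid launching a new browser for every URL.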
source https://stackoverflow.com/questions/71517696/webscraping-from-cnn-function-to-get-text-from-a-article-error-in-python