I have the following Python code:
from bs4 import BeautifulSoup
import pandas as pd
import requests

data0 = []
data1 = []
response = requests.get(
    "https://www.comicshoplocator.com/StoreLocatorPremier?query=75077&showCsls=true"
)
soup = BeautifulSoup(response.text, "html.parser")

# Collect every store name on the results page.
for tag in soup.find_all('div', class_="LocationName"):
    title = tag.text
    data0.append({
        'title': title
    })

# Follow each "Shop Profile" link and collect the website from the profile page.
for button in soup.find_all('div', class_="LocationDetails"):
    for childdiv in button.find_all('div', class_="LocationShopProfile"):
        for zb in childdiv.find_all('a'):
            if zb.get_text() == 'Shop Profile':
                website = zb.get('href')
                forsite = requests.get('https://www.comicshoplocator.com/' + website)
                soup = BeautifulSoup(forsite.text, "html.parser")
                for tag in soup.find_all('div', class_="StoreWeb"):
                    site = tag.text.replace('Web: http://', '')
                    data1.append({
                        'site': site
                    })

# Names and websites are stored in two independent lists.
df = pd.DataFrame(columns=['Name', 'Website'])
df[df.columns[0]] = pd.DataFrame(data0)
df[df.columns[1]] = pd.DataFrame(data1)
print(df)
The output I get is:
   Name                        Website
0  TWENTY ELEVEN COMICS        WWW.TWENTYELEVENCOMICS.COM
1  READ COMICS                 www.boomerangcomics.com
2  BOOMERANG COMICS            www.facebook.com/morefuncomics
3  MORE FUN COMICS AND GAMES   www.madnesscomicsandgames.com
4  MADNESS COMICS & GAMES      NaN
5  SANCTUARY BOOKS AND GAMES   NaN
The expected output is:
   Name                        Website
0  TWENTY ELEVEN COMICS        WWW.TWENTYELEVENCOMICS.COM
1  READ COMICS                 NaN
2  BOOMERANG COMICS            www.boomerangcomics.com
3  MORE FUN COMICS AND GAMES   www.facebook.com/morefuncomics
4  MADNESS COMICS & GAMES      www.madnesscomicsandgames.com
5  SANCTUARY BOOKS AND GAMES   NaN
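Some stores may not have a "LocationShopProfile" or "StoreWeb" element. When one is missing, nothing is appended to the website list, so every later website shifts up a row and the second column ends up in the wrong order.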
How can I fix that?
Thanks
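One possible direction, as a minimal sketch rather than a tested fix for this site: build one record per store and write None when the website is missing, instead of filling two separate lists. The helper build_rows, the sample store tuples, and the sites lookup below are hypothetical stand-ins for the scraped data. In the real scraper the name and the profile link would have to be read from the same store container, which depends on page markup that is not verified here.

```python
def build_rows(stores, lookup_site):
    """Return one row per store, keeping the name and website together.

    stores: iterable of (name, profile_href) pairs; profile_href is None
            when the store has no "Shop Profile" link (hypothetical input).
    lookup_site: callable mapping a profile href to a website string, or
                 None when the profile page lists no website.
    """
    rows = []
    for name, profile_href in stores:
        # Only look up a website when a profile link exists; otherwise
        # record None so the row count always matches the store count.
        site = lookup_site(profile_href) if profile_href else None
        rows.append({"Name": name, "Website": site})
    return rows


# Hypothetical sample data standing in for the scraped results.
stores = [
    ("TWENTY ELEVEN COMICS", "/profile/1"),
    ("READ COMICS", None),
    ("BOOMERANG COMICS", "/profile/3"),
]
sites = {
    "/profile/1": "WWW.TWENTYELEVENCOMICS.COM",
    "/profile/3": "www.boomerangcomics.com",
}

rows = build_rows(stores, sites.get)
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame(rows) should then give both columns in a single step, with stores that lack a website showing a missing value in their own row instead of shifting the rows below them.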
source https://stackoverflow.com/questions/72999097/parse-data-with-no-class