I want to scrape the product images for every product listed at https://society6.com/art/i-already-want-to-take-a-nap-tomorrow-pink.
Step 1: I select each div with class_='card_card__l44w', which holds a product link. Step 2: I parse the href of each product.
The problem is that I get back only the first 15 product links instead of all 44.
==============================
Second, when I open each product link I grab the JSON from the page and read ['product']['response']['product']['data']['attributes']['media_map'].
Under the media_map key there are many other keys, such as b, c, d, e, f, g. Each one contains a src with an image link, and I only want the .jpg links from every key. My code is below.
import requests
import json
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import pandas as pd

baseurl = 'https://society6.com/'
headers = {
    "User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'
}

# Collect the product links from the listing page.
r = requests.get('https://society6.com/art/flamingo-cone501586', headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
productslist = soup.find_all('div', class_='card_card__l44w')
productlinks = []
for item in productslist:
    for link in item.find_all('a', href=True):
        # urljoin avoids a doubled slash when the href already starts with '/'.
        productlinks.append(urljoin(baseurl, link['href']))

# Visit each product page and read media_map from its embedded JSON.
newlist = []
for link in productlinks:
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'lxml')
    # Fragile: depends on the script's position and a fixed 24-character prefix.
    scripts = soup.find_all('script')[9].text.strip()[24:]
    data = json.loads(scripts)
    url = data['product']['response']['product']['data']['attributes']['media_map']
    detail = {
        'links': url
    }
    newlist.append(detail)
    print('saving')

df = pd.DataFrame(newlist)
df.to_csv('haja.csv')
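To make the second part concrete, here is a rough sketch of what I am trying to do with media_map. It is not tested against the real site: the sample text, the `window.__DATA__` prefix, and the helper names `load_embedded_json` and `collect_jpg_links` are made up for illustration, and I am assuming each key holds a dict (or a list of dicts) with a src entry. The first helper finds the JSON by its opening brace instead of a fixed [24:] slice, and the second walks the nested structure and keeps only the src values that end in .jpg.

```python
import json


def load_embedded_json(script_text):
    """Decode the first JSON object found in a script's text.

    Searches for the first '{' rather than relying on a fixed offset.
    """
    start = script_text.index('{')
    # raw_decode stops at the end of the object and ignores trailing text.
    obj, _ = json.JSONDecoder().raw_decode(script_text[start:])
    return obj


def collect_jpg_links(node):
    """Walk nested dicts/lists and return every 'src' value ending in .jpg."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == 'src' and isinstance(value, str):
                # Ignore any query string when checking the extension.
                if value.split('?')[0].lower().endswith('.jpg'):
                    found.append(value)
            else:
                found.extend(collect_jpg_links(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(collect_jpg_links(value))
    return found


# Made-up sample standing in for the text of the page's script tag.
sample_script = (
    'window.__DATA__ = {"media_map": {'
    '"b": {"src": "https://example.com/b.jpg"}, '
    '"c": {"src": "https://example.com/c.webp"}, '
    '"d": [{"src": "https://example.com/d.JPG?w=100"}]}};'
)
media_map = load_embedded_json(sample_script)['media_map']
jpg_links = collect_jpg_links(media_map)
print(jpg_links)
```

In my loop this would mean storing `collect_jpg_links(url)` in `detail` instead of the whole media_map dict, assuming the real data has the shape described above.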
[1]: https://i.stack.imgur.com/qdhXP.png
source https://stackoverflow.com/questions/74137958/python-requests-for-load-more-data