Scraping a URL address for reviews

I need to extract the reviews for a product on this site, starting from the product's URL. For each review I want the username, date, text, and score. However, the script fails on the first page and I keep getting this error: Failed to retrieve reviews for page 1. Error: "Connection broken: InvalidChunkLength(got length b'', 0 bytes read)"; "InvalidChunkLength(got length b'', 0 bytes read)". I tried adding a time delay, but it still doesn't work. How can I modify the code to fix this?

import json
import requests
from bs4 import BeautifulSoup

url = "https://www.emag.ro/covor-antiderapant-negru-poliester-80-x-300-cm-c027-80x300/pd/DBY5YJMBM/?ref=sponsored_products_fill_a_b_5_3&provider=rec&recid=rec_73_c449bb3e50b63cc8f6da4a42a31af359f6cbfb3c547bc5748cb6d45501a29685_1684315709&scenario_ID=73&aid=034a897a-956c-11ed-9004-0ab644dfda7c&oid=89847310"

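# Review listing endpoint; the script expects it to return one page of reviews as JSON.
review_url = "https://www.emag.ro/review/get-review-listing-page?id={product_id}&page={page}"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/112.0'}

# The product ID is the path segment right after "/pd/" (here: DBY5YJMBM).
product_id = url.split("/pd/")[1].split("/")[0]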

reviews = []

page = 1
# Request pages one at a time until a request fails or a page has no reviews.
while True:
    r_url = review_url.format(product_id=product_id, page=page)
    try:
        response = requests.get(r_url, headers=headers)
        response.raise_for_status()  # raise on HTTP 4xx/5xx responses
        data = response.json()
    except (requests.RequestException, json.JSONDecodeError) as e:
        print(f"Failed to retrieve reviews for page {page}. Error: {e}")
        break

    # An empty page means there are no more reviews to fetch.
    if not data['reviews']:
        break

    # Keep only the fields of interest from each review.
    for r in data['reviews']:
        review_text = r['content']
        author = r['author']['name']
        date = r['date']
        score = r['rating']
        reviews.append({"author": author, "date": date, "review_text": review_text, "score": score})

    page += 1

# Save all collected reviews to a JSON file.
with open('reviews.json', 'w') as f:
    json.dump(reviews, f, indent=4)
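
One thing that might be worth trying, offered as a sketch and not as a confirmed fix: the "Connection broken: InvalidChunkLength" message is what `requests` reports (as `requests.exceptions.ChunkedEncodingError`) when the server ends a chunked response early, so the loop could retry a failed page a few times with a growing delay instead of breaking on the first failure. The helper name `retry_call` and its parameters are hypothetical, and this has not been tested against the site in the question. If the server is deliberately cutting off automated clients, retrying alone will not help.

```python
import time


def retry_call(func, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() and return its result, retrying on the given exceptions.

    Waits base_delay, then 2 * base_delay, and so on between attempts, and
    re-raises the last exception once `attempts` calls have failed.
    `retry_call` is a hypothetical helper, not part of requests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))


# Possible use inside the while loop from the question (assumes `requests`,
# `r_url`, and `headers` are defined as above):
#
# data = retry_call(
#     lambda: requests.get(r_url, headers=headers, timeout=30).json(),
#     (requests.exceptions.ChunkedEncodingError,
#      requests.exceptions.ConnectionError),
# )
```

The helper takes the request as a callable so the retry logic stays independent of `requests` and can be exercised without network access. The `timeout=30` value in the commented usage is an arbitrary choice.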


source https://stackoverflow.com/questions/76276163/scraping-a-url-address-for-reviews
