Fastest way to find the size of the union of two lists/sets? Coding Efficiency

I am conducting research on the relationships between different communities on Reddit. As part of my data collection I have about 49,000 CSVs, one for each sub I scraped. From all the posts I could get, I recorded each commenter along with their total karma and number of comments (see pic for format). Each CSV contains all the commenters I have collected for that individual sub.

I want to take each sub, compare it to another sub, and count how many users they have in common. I don't need a list of which users they are, just the number the two have in common. I also want to set a karma threshold; in the code I currently have, it is set to more than 30 karma.

In my code I prepare two lists of users who are over the threshold, convert both lists to sets, and then "&" them together to get the intersection.

Since I am re-reading the same CSVs over and over again to compare them to each other, can I prepare them to make this more efficient, for example by sorting user names alphabetically? I plan on redoing all the CSVs beforehand to get rid of the "removal users", which are all bots. Should I sort by karma instead of by user, so that rather than checking whether karma is above a certain level the code only reads up to a certain point? Memory usage is not a problem, only speed. I have billions of comparisons to make, so any reduction in time is appreciated. Here is my code:

import datatable as dt
import pandas as pd

# Bot accounts to exclude; a set keeps the isin() lookup fast
remove_list = dt.fread('D:/Spring 2023/red/c7/removal_users.csv')
drops = set(remove_list['C0'].to_list()[0])

def get_scores(current, threshold=30):
    # Read the headerless CSV once and name its columns
    sub = pd.read_csv(current, sep=',', header=None)
    sub.columns = ['user', 'score']
    # Drop bot accounts, then keep users above the karma threshold
    sub = sub[~sub['user'].isin(drops)]
    sub = sub[sub['score'] > threshold]
    return list(sub['user'])

def get_name(current):
    # 'some/path/sub.csv' -> 'sub'
    return current.split('/')[-1].split('.')[-2]

def common_users(list1, list2):
    # Number of users that appear in both lists
    return len(set(list1) & set(list2))
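
One way to avoid re-reading the same CSVs for every comparison might be to build each sub's user set once, keep all the sets in a dictionary, and only intersect them in the pairwise loop. The sketch below is not the original code: the helper names `qualifying_users`, `load_user_set`, and `count_common_pairs` are hypothetical, it assumes each CSV has exactly two headerless columns (user, score), and it uses small in-memory rows in place of real files.

```python
import csv
from itertools import combinations

def qualifying_users(rows, drops, threshold=30):
    """Return the users in (user, score) rows that pass the karma threshold."""
    return frozenset(
        user for user, score in rows
        if float(score) > threshold and user not in drops
    )

def load_user_set(path, drops, threshold=30):
    """Read one headerless 'user,score' CSV (assumed layout) a single time."""
    with open(path, newline="") as f:
        return qualifying_users(csv.reader(f), drops, threshold)

def count_common_pairs(user_sets):
    """Yield (name_a, name_b, shared_count) for each unordered pair of subs."""
    for (name_a, set_a), (name_b, set_b) in combinations(user_sets.items(), 2):
        yield name_a, name_b, len(set_a & set_b)

# Hypothetical in-memory rows standing in for three sub CSVs
drops = {"AutoModerator"}
user_sets = {
    "sub_a": qualifying_users([("alice", 50), ("bob", 10), ("carol", 99)], drops),
    "sub_b": qualifying_users([("alice", 31), ("carol", 45), ("AutoModerator", 900)], drops),
    "sub_c": qualifying_users([("dave", 75), ("carol", 30)], drops),
}
results = list(count_common_pairs(user_sets))
```

With this layout the filtering, bot removal, and set construction happen once per sub rather than once per comparison, so the loop over roughly 1.2 billion pairs (for about 49,000 subs) only performs the intersection itself. Python sets are hash-based, so sorting the user names alphabetically would not speed up the `&` step.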


source https://stackoverflow.com/questions/76576093/fastest-way-to-find-the-size-of-the-union-of-two-lists-sets-coding-efficiency
