
For loop for finding and storing words that are present in a supplied word dataset raises NameError

I have a list named 'result', shown below:

    >>> result
    [[['apple'], ['banana'], ['green', 'grapes'], nan], [['orange'], ['hat'], ['party', 'hat', '2'], nan], [['blue'], ['navy'], ['red', 't'], ['angry']]]

I'm using gensim to match the words in this list against a pretrained word2vec model and get the corresponding vectors.

Given that pretrained_model.key_to_index is structured as shown below, I used the following code to keep only the words in 'result' that are present in the pretrained model ('pretrained_model') and to filter out the words that are not.

    >>> pretrained_model.key_to_index
    {'</s>': 0,
     'in': 1,
     'for': 2,
     'that': 3,
     'is': 4,
     'on': 5,
     '##': 6,
     'The': 7,
     'with': 8,
     'said': 9,
     'was': 10,
     'the': 11,
     'at': 12,
     ...}

   


    import gensim

    pretrained_model = gensim.models.KeyedVectors.load_word2vec_format('Downloads/GoogleNews-vectors-negative300.bin', binary=True)

    vocabulary = pretrained_model.key_to_index

    len(vocabulary)  # returns 3000000

    # keep only the words in 'result' that appear in the model's vocabulary
    documents = []
    for x in result:
        document = [i for i in j for j in x if i in pretrained_model.key_to_index]
        documents.append(document)

Now 'documents' should contain only the words that are present in the pretrained model's vocabulary.

So the desired output 'documents' would look like this:

    [[['apple'], ['banana'], ['green', 'grapes']], [['orange'], ['hat'], ['party', 'hat']], [['blue'], ['navy'], ['red', 't'], ['angry']]]
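A minimal sketch of the transformation described above (filter each inner list, drop the nan entries, keep the nesting), assuming each inner entry is either a plain Python list of strings or a float NaN, as the printed 'result' suggests. So that it runs without downloading the model, it uses a small hypothetical dict 'vocab' as a stand-in for pretrained_model.key_to_index; the `word in vocab` test checks dict keys in both cases.

```python
# Hypothetical stand-in for pretrained_model.key_to_index: only the keys matter
# for the `in` check. '2' is deliberately left out to mimic an out-of-vocabulary word.
vocab = {'apple': 0, 'banana': 1, 'green': 2, 'grapes': 3, 'orange': 4,
         'hat': 5, 'party': 6, 'blue': 7, 'navy': 8, 'red': 9, 't': 10,
         'angry': 11}

nan = float('nan')
result = [[['apple'], ['banana'], ['green', 'grapes'], nan],
          [['orange'], ['hat'], ['party', 'hat', '2'], nan],
          [['blue'], ['navy'], ['red', 't'], ['angry']]]


def filter_document(doc, vocab):
    """Keep only in-vocabulary words, preserving the inner-list structure."""
    filtered = []
    for phrase in doc:
        # Skip NaN placeholders: they are floats, not lists, and not iterable.
        if not isinstance(phrase, list):
            continue
        kept = [word for word in phrase if word in vocab]
        # Assumption: a phrase with no in-vocabulary words is dropped entirely.
        if kept:
            filtered.append(kept)
    return filtered


documents = [filter_document(x, vocab) for x in result]
print(documents)
# [[['apple'], ['banana'], ['green', 'grapes']], [['orange'], ['hat'], ['party', 'hat']], [['blue'], ['navy'], ['red', 't'], ['angry']]]
```

With the real model, pretrained_model.key_to_index would be passed as 'vocab'. Dropping empty phrases is a choice the question does not specify; remove the `if kept:` check to keep them as empty lists instead.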

However, the code above raises the following NameError:

    NameError                                 Traceback (most recent call last)
    /var/folders/jd/lh_mnln92n17ysb4p01g000gn/T/ipykernel_2855/2806541.py in <module>
          1 documents = []
          2 for x in result:
    ----> 3     document = [i for i in j for j in x if i in pretrained_model.key_to_index]
          4     documents.append(document)
          5 #now this document have only those words which are present in our model's vocab

    NameError: name 'j' is not defined
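The error comes from the order of the `for` clauses. In a list comprehension they are read left to right, exactly like nested `for` loops, so `for i in j` comes first and `j` is looked up before `for j in x` has bound it. The sketch below swaps the clauses into loop order, using a hypothetical toy 'vocab' set in place of pretrained_model.key_to_index; it also needs an `isinstance` check, since once the NameError is gone, iterating over a nan float would raise a TypeError instead.

```python
# Hypothetical toy vocabulary standing in for pretrained_model.key_to_index.
vocab = {'apple', 'banana', 'green', 'grapes'}
x = [['apple'], ['banana'], ['green', 'grapes', 'zzz'], float('nan')]

# Explicit nested loops: the outer loop (over j) must come first.
flat = []
for j in x:
    if isinstance(j, list):  # skip NaN, which is a float and not iterable
        for i in j:
            if i in vocab:
                flat.append(i)

# The same logic as a comprehension: clauses appear in the same order as the loops.
flat_comp = [i for j in x if isinstance(j, list) for i in j if i in vocab]
print(flat_comp)  # ['apple', 'banana', 'green', 'grapes']
```

Note that this flattens each document into a single list of words; it does not preserve the inner lists shown in the desired output.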

Can anyone help me with this, please? Any help would be greatly appreciated!



source https://stackoverflow.com/questions/73805534/for-loop-for-finding-and-storing-words-that-is-present-in-supplied-word-dataset
