Encoding Multiple Categorical Data with Python using sklearn.preprocessing.LabelEncoder() takes too much processing time on 2D array inputs
Suppose I am trying to encode a feature called title. In a single record, title can contain several words, e.g. title = ['Apple', 'Jobs']. To illustrate:
ID title
0 ['Apple', 'Jobs']
1 ['Wozniak']
2 ['Apple', 'Wozniak']
3 ['Jobs', 'Wozniak']
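To make the discussion reproducible, here is a minimal sketch of the example data, assuming each cell of title holds a plain Python list of strings and ID is an ordinary column. The real DataFrame is much larger; this is only a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical reconstruction of the example table: each 'title' cell is a list of words.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "title": [["Apple", "Jobs"], ["Wozniak"], ["Apple", "Wozniak"], ["Jobs", "Wozniak"]],
})

print(df)
```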
As you can see, my unique values are:
unique = ['Apple','Jobs','Wozniak']
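The unique list does not have to be typed by hand. A small sketch (same hypothetical data as the table above) flattens every row's list and keeps the sorted distinct words, which matches the sorted order LabelEncoder uses for its classes:

```python
from itertools import chain

import pandas as pd

# Hypothetical data mirroring the example table.
df = pd.DataFrame({"title": [["Apple", "Jobs"], ["Wozniak"], ["Apple", "Wozniak"], ["Jobs", "Wozniak"]]})

# chain.from_iterable walks each row's list in turn; set() dedupes, sorted() orders them.
unique = sorted(set(chain.from_iterable(df["title"])))
print(unique)  # ['Apple', 'Jobs', 'Wozniak']
```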
Previously I was using LabelEncoder like this:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(unique)
encoded = []
for i in df['title'].index:
    # transform() is called once per row
    encoded.append(le.transform(df['title'][i]))
df['title'] = encoded
This gave me something like:
ID title
0 [782, 256]
1 [331]
2 [782, 331]
3 [256, 331]
This was exactly what I wanted, but it takes too much time because I have a very large number of values to iterate over and encode. I am therefore looking for a smarter approach, preferably one with lower time complexity or at least a shorter running time.
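One common way to remove the per-row overhead (a sketch, not an answer taken from the original thread) is to flatten all titles into a single list, call LabelEncoder once on that list, and then cut the resulting code array back into per-row pieces with np.split, using the cumulative row lengths as split points. The DataFrame below is a hypothetical stand-in for the real data, so the codes come out as 0, 1, 2 rather than the 782/256/331 shown above.

```python
from itertools import chain

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical example data: each 'title' cell is a list of words.
df = pd.DataFrame({"title": [["Apple", "Jobs"], ["Wozniak"], ["Apple", "Wozniak"], ["Jobs", "Wozniak"]]})
titles = df["title"].tolist()

# 1) Remember how many words each row has, and flatten all rows into one list.
lengths = np.array([len(t) for t in titles])
flat = list(chain.from_iterable(titles))

# 2) Fit and transform ONCE over the flat list instead of once per row.
le = LabelEncoder()
codes = le.fit_transform(flat)

# 3) np.cumsum(lengths)[:-1] gives the start index of every row after the first,
#    so np.split returns one code array per original row.
df["title"] = np.split(codes, np.cumsum(lengths)[:-1])

print(df)
```

The Python-level work left is building lengths and flat, which is cheap; transform(), with its input validation, now runs a single time over all words rather than once per row. Rows with an empty list still work, because a zero length simply produces an empty piece from np.split.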
source https://stackoverflow.com/questions/74843758/encoding-multiple-categorical-data-with-python-using-sklearn-preprocessing-label