I have data that comes from the stdout (sys.stdout) of a mapper.py program, as follows:
Input (stdout of the previous mapper.py):
chevy, {mod: spark | col: brown}
chevy, {mod: equinox | col: red}
honda, {mod:civic | col:black}
honda, {mod:accord | col:white}
honda, {mod:crv | col:pink}
honda, {mod:hrv | col:gray}
toyota, {mod:corola | col:white}
I would like to write a reducer.py, or perhaps another mapper, that takes this input and produces output such as:
Expected output:
chevy, {mod: spark | col: brown | total:2}
chevy, {mod: equinox | col: red | total:2}
honda, {mod:civic | col:black | total:4}
honda, {mod:accord | col:white | total:4}
honda, {mod:crv | col:pink | total:4}
honda, {mod:hrv | col:gray | total:4}
toyota, {mod:corola | col:white | total:1}
The total counts only the keys (car brands): chevy appears twice, honda appears 4 times, and toyota appears once.
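As a minimal sketch of what "total per key" means, assuming the whole mapper output fits in memory and every line has the form "brand, {...}" ending in a closing brace, one can count the brands in a first pass with collections.Counter and then annotate each record in a second pass. The function name add_totals_in_memory is hypothetical, not part of the original code.

```python
import sys
from collections import Counter


def add_totals_in_memory(lines):
    """Return each 'brand, {...}' record with ' | total:N' added inside the braces.

    N is the number of records sharing that brand. This is a two-pass,
    in-memory approach (an assumption: the input fits in memory), so the
    input order does not matter.
    """
    # Split each non-blank line at the first comma into [brand, value].
    records = [line.strip().split(",", 1) for line in lines if line.strip()]
    totals = Counter(brand for brand, _ in records)  # first pass: count per brand
    out = []
    for brand, value in records:  # second pass: annotate every record
        body = value.rstrip()[:-1]  # drop the closing '}' so the total goes inside
        out.append(f"{brand},{body} | total:{totals[brand]}}}")
    return out


if __name__ == "__main__":
    for line in add_totals_in_memory(sys.stdin):
        print(line)
```

Because every record is held in memory, this only suits small inputs or a quick local check of the expected output.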
I tried writing a reducer.py program, but it did not work. The program I wrote looks like this:
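import sys

curr_k = None
curr_v = None
k = None
curr_count = 0
for car in sys.stdin:
    car = car.strip()
    if not car:
        continue  # skip blank lines
    # The key is the brand before the first comma; the value is the {...} record.
    k, v = car.split(',', 1)
    if curr_k == k:
        # Prints the previous record with the count seen so far (a running count).
        print(curr_k, curr_v, 'total:', curr_count)
        curr_count += 1
    else:
        if curr_k:
            # Key changed: print the last record of the previous key.
            print(curr_k, curr_v, 'total:', curr_count)
        curr_k = k
        curr_count = 1
    curr_v = v  # remember this record so it is printed on the next iteration
# Print the final record of the last key (skipped if there was no input).
if curr_k is not None:
    print(curr_k, curr_v, 'total:', curr_count)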
The code above gave me the following result:
chevy, {mod: spark | col: brown | total:1}
chevy, {mod: equinox | col: red | total:2}
honda, {mod:civic | col:black | total:1}
honda, {mod:accord | col:white | total:2}
honda, {mod:crv | col:pink | total:3}
honda, {mod:hrv | col:gray | total:4}
toyota, {mod:corola | col:white | total:1}
But that is a running count per brand, not the per-brand total I am looking for.
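The running counts appear because each record is printed as soon as the next line is read, before the reducer knows how many records its brand has in total. One way to fix this while keeping the same curr_k streaming pattern is sketched below, assuming the reducer receives lines grouped (sorted) by brand: buffer the current brand's values and print them only when the brand changes or the input ends, at which point the buffer length is the final total. The names reduce_with_totals and emit are hypothetical.

```python
import sys


def emit(key, values):
    """Yield every buffered record for one key with the key's final total."""
    total = len(values)
    for v in values:
        # v looks like ' {mod: ... | col: ...}'; insert the total before the '}'.
        yield f"{key},{v.rstrip()[:-1]} | total:{total}}}"


def reduce_with_totals(lines):
    """Streaming reducer: assumes lines for the same brand arrive adjacent."""
    curr_k = None
    buffer = []  # values seen so far for curr_k
    for car in lines:
        car = car.strip()
        if not car:
            continue
        k, v = car.split(",", 1)
        if k != curr_k and curr_k is not None:
            # The previous brand is complete, so len(buffer) is its total.
            yield from emit(curr_k, buffer)
            buffer = []
        curr_k = k
        buffer.append(v)
    if curr_k is not None:
        yield from emit(curr_k, buffer)  # flush the last brand


if __name__ == "__main__":
    for line in reduce_with_totals(sys.stdin):
        print(line)
```

Only one brand's records are held in memory at a time. If the job runs more than one reducer, the mapper would also need to emit the brand as the key (by default Hadoop Streaming treats the text before the first tab as the key) so that all records for a brand reach the same reducer.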
Source: https://stackoverflow.com/questions/74173010/adding-count-information-to-mapreduce-output