Quick way to create a pivot-table with totals from 2 dimensional data-points (part-id, year, units)?
I read out a set of data from an SQL table using pypyodbc... The result from cur.fetchall() is this :
sqldata=[
('part00', '2023', 15), ('part01', '2023', 4), ('part02', '2023', 1), ('part00', '2022', 72),
('part03', '2022', 48), ('part04', '2022', 22), ('part05', '2022', 16), ('part06', '2022', 16),
('part07', '2022', 15), ('part08', '2022', 12), ('part09', '2022', 6), ('part10', '2022', 4),
('part01', '2022', 4), ('part11', '2022', 3), ('part12', '2022', 3), ('part13', '2022', 2),
('part14', '2022', 1), ('part15', '2022', 1), ('part02', '2022', 1), ('part16', '2022', 1),
('part04', '2021', 35), ('part00', '2021', 20), ('part08', '2021', 18), ('part05', '2021', 3),
('part07', '2021', 3), ('part09', '2021', 1), ('part16', '2021', 1), ('part03', '2021', 1),
('part04', '2020', 43), ('part08', '2020', 15), ('part10', '2020', 9), ('part00', '2020', 9),
('part07', '2020', 6), ('part05', '2020', 4), ('part17', '2020', 4), ('part03', '2020', 4),
('part18', '2020', 3), ('part19', '2020', 2), ('part16', '2020', 2), ('part12', '2020', 2),
('part20', '2020', 1), ('part09', '2020', 1), ('part21', '2020', 1), ('part00', '2019', 37),
('part04', '2019', 18), ('part05', '2019', 8), ('part11', '2019', 5), ('part10', '2019', 4),
('part22', '2019', 4), ('part16', '2019', 4), ('part23', '2019', 3), ('part03', '2019', 3),
('part24', '2019', 3), ('part25', '2019', 2), ('part09', '2019', 2), ('part21', '2019', 2),
('part20', '2019', 1), ('part13', '2019', 1), ('part14', '2019', 1), ('part19', '2019', 1),
('part12', '2019', 1), ('part01', '2019', 1), ('part26', '2019', 1), ('part15', '2019', 1),
('part04', '2018', 47), ('part00', '2018', 27), ('part13', '2018', 10), ('part12', '2018', 6),
('part07', '2018', 4), ('part21', '2018', 3), ('part09', '2018', 2), ('part05', '2018', 2),
('part15', '2018', 1), ('part04', '2017', 37), ('part00', '2017', 17), ('part12', '2017', 15),
('part13', '2017', 7), ('part19', '2017', 6), ('part25', '2017', 2), ('part07', '2017', 2),
('part03', '2017', 2), ('part27', '2017', 2), ('part21', '2017', 1), ('part09', '2017', 1),
('part28', '2017', 1), ('part04', '2016', 49), ('part00', '2016', 29), ('part12', '2016', 12),
('part02', '2016', 5), ('part15', '2016', 2), ('part24', '2016', 2), ('part08', '2016', 2),
('part29', '2016', 1), ('part01', '2016', 1), ('part21', '2016', 1), ('part13', '2016', 1),
('part28', '2016', 1), ('part25', '2016', 1), ('part04', '2015', 30), ('part00', '2015', 15),
('part15', '2015', 8), ('part28', '2015', 5), ('part09', '2015', 4), ('part12', '2015', 2),
('part02', '2015', 2), ('part01', '2015', 1), ('part13', '2015', 1), ('part29', '2015', 1),
('part00', '2014', 6), ('part12', '2014', 2), ('part28', '2014', 1)
]
Based on this, I currently have my second attempt (a lot faster than the first) to create a table on the data.
This time by first creating "order" of the above - using a loop - inserting the data into a numpy array, then looping through the array to get:
2023 2022 2021 2020 2019 2018 2017 2016 2015
Part00: - 72 20 9 37 27 17 29 15 . 226
Part01: 15 4 - - 1 - - 1 1 . 22
Part02: 4 1 - - - - - 5 2 . 12
Part03: 1 - 1 4 3 - 2 - - . 11
Part04: - 48 35 43 18 47 37 49 30 . 307
Part05: - 22 3 4 8 2 - - - . 39
Part06: - 16 - - - - - - - . 16
Part07: - 16 3 6 - 4 2 - - . 31
Part08: - 15 18 15 - - - 2 - . 50
Part09: - 12 1 1 2 2 1 - 4 . 23
Part10: - 6 - 9 4 - - - - . 19
Part11: - 4 - - 5 - - - - . 9
...
Total: 908
Not all ^^^ shown as The S.O. Editor somehow barfed on that portion (WHY?!)
... but I still feel that my code is "non-pythonic"; as it essentially is looping over the the data items several times... might there be a more compact way to actually create the above table,
Part of the problem is sorting: so additionally have it EASILY sorted by the numbers; most basically by the right hand side sums, *or by some other "smart" order, even parametrically selectable in the end - that might display it in a more "overview-able" manner?
Might it be numpy isn't the best suited tool?
Or is there a quicker way to insert the data?
Maybe pandas is easier, more convenient?
I have limited experience with either of these.
So.
- Avoiding looping, a more
pythonicway to create a python data structure of the source data? - Given 1: Simple means to display the table, sorted and summed in a reconfigurable way - e.g. avoiding loops, with the least possible code.
I'm currently working on this using pandas, a simple
import pandas as pd
df=pd.DataFrame(sqldata)
print(df)
is a good start i believe.
From here rearranging the data in the df DataFrame (to get the table above) seems doable.
Working on it...
... so a list of parts and years; do I REALLY have to create a double-loop based on these?
partslist=sorted(list(set(df[0])))
print(partslist)
years=sorted(list(set(df[1])))
print(years)
... I imagine there is some slice-based transformation in pandas that will create the above "Matrix". Then; allowing that matrix to be manipulated based on parameters e.g. given on the command line at start.
Reading https://pandas.pydata.org/docs/user_guide/indexing.html, but cannot see the right tree because of the woods...
source https://stackoverflow.com/questions/75175477/quick-way-to-create-a-pivot-table-with-totals-from-2-dimensional-data-points-pa
Comments
Post a Comment