Quick way to create a pivot-table with totals from 2 dimensional data-points (part-id, year, units)?
I read out a set of data from an SQL table using pypyodbc
... The result from cur.fetchall()
is this :
sqldata=[ ('part00', '2023', 15), ('part01', '2023', 4), ('part02', '2023', 1), ('part00', '2022', 72), ('part03', '2022', 48), ('part04', '2022', 22), ('part05', '2022', 16), ('part06', '2022', 16), ('part07', '2022', 15), ('part08', '2022', 12), ('part09', '2022', 6), ('part10', '2022', 4), ('part01', '2022', 4), ('part11', '2022', 3), ('part12', '2022', 3), ('part13', '2022', 2), ('part14', '2022', 1), ('part15', '2022', 1), ('part02', '2022', 1), ('part16', '2022', 1), ('part04', '2021', 35), ('part00', '2021', 20), ('part08', '2021', 18), ('part05', '2021', 3), ('part07', '2021', 3), ('part09', '2021', 1), ('part16', '2021', 1), ('part03', '2021', 1), ('part04', '2020', 43), ('part08', '2020', 15), ('part10', '2020', 9), ('part00', '2020', 9), ('part07', '2020', 6), ('part05', '2020', 4), ('part17', '2020', 4), ('part03', '2020', 4), ('part18', '2020', 3), ('part19', '2020', 2), ('part16', '2020', 2), ('part12', '2020', 2), ('part20', '2020', 1), ('part09', '2020', 1), ('part21', '2020', 1), ('part00', '2019', 37), ('part04', '2019', 18), ('part05', '2019', 8), ('part11', '2019', 5), ('part10', '2019', 4), ('part22', '2019', 4), ('part16', '2019', 4), ('part23', '2019', 3), ('part03', '2019', 3), ('part24', '2019', 3), ('part25', '2019', 2), ('part09', '2019', 2), ('part21', '2019', 2), ('part20', '2019', 1), ('part13', '2019', 1), ('part14', '2019', 1), ('part19', '2019', 1), ('part12', '2019', 1), ('part01', '2019', 1), ('part26', '2019', 1), ('part15', '2019', 1), ('part04', '2018', 47), ('part00', '2018', 27), ('part13', '2018', 10), ('part12', '2018', 6), ('part07', '2018', 4), ('part21', '2018', 3), ('part09', '2018', 2), ('part05', '2018', 2), ('part15', '2018', 1), ('part04', '2017', 37), ('part00', '2017', 17), ('part12', '2017', 15), ('part13', '2017', 7), ('part19', '2017', 6), ('part25', '2017', 2), ('part07', '2017', 2), ('part03', '2017', 2), ('part27', '2017', 2), ('part21', '2017', 1), ('part09', '2017', 1), ('part28', '2017', 1), ('part04', '2016', 49), ('part00', '2016', 29), ('part12', '2016', 12), ('part02', '2016', 5), ('part15', '2016', 2), ('part24', '2016', 2), ('part08', '2016', 2), ('part29', '2016', 1), ('part01', '2016', 1), ('part21', '2016', 1), ('part13', '2016', 1), ('part28', '2016', 1), ('part25', '2016', 1), ('part04', '2015', 30), ('part00', '2015', 15), ('part15', '2015', 8), ('part28', '2015', 5), ('part09', '2015', 4), ('part12', '2015', 2), ('part02', '2015', 2), ('part01', '2015', 1), ('part13', '2015', 1), ('part29', '2015', 1), ('part00', '2014', 6), ('part12', '2014', 2), ('part28', '2014', 1) ]
Based on this, I currently have my second attempt (a lot faster than the first) to create a table on the data.
This time by first creating "order" of the above - using a loop - inserting the data into a numpy array, then looping through the array to get:
2023 2022 2021 2020 2019 2018 2017 2016 2015 Part00: - 72 20 9 37 27 17 29 15 . 226 Part01: 15 4 - - 1 - - 1 1 . 22 Part02: 4 1 - - - - - 5 2 . 12 Part03: 1 - 1 4 3 - 2 - - . 11 Part04: - 48 35 43 18 47 37 49 30 . 307 Part05: - 22 3 4 8 2 - - - . 39 Part06: - 16 - - - - - - - . 16 Part07: - 16 3 6 - 4 2 - - . 31 Part08: - 15 18 15 - - - 2 - . 50 Part09: - 12 1 1 2 2 1 - 4 . 23 Part10: - 6 - 9 4 - - - - . 19 Part11: - 4 - - 5 - - - - . 9 ... Total: 908
Not all ^^^ shown as The S.O. Editor somehow barfed on that portion (WHY?!)
... but I still feel that my code is "non-pythonic"; as it essentially is looping over the the data items several times... might there be a more compact way to actually create the above table,
Part of the problem is sorting: so additionally have it EASILY sorted by the numbers; most basically by the right hand side sums, *or by some other "smart" order, even parametrically selectable in the end - that might display it in a more "overview-able" manner?
Might it be numpy
isn't the best suited tool?
Or is there a quicker way to insert the data?
Maybe pandas
is easier, more convenient?
I have limited experience with either of these.
So.
- Avoiding looping, a more
pythonic
way to create a python data structure of the source data? - Given 1: Simple means to display the table, sorted and summed in a reconfigurable way - e.g. avoiding loops, with the least possible code.
I'm currently working on this using pandas
, a simple
import pandas as pd
df=pd.DataFrame(sqldata)
print(df)
is a good start i believe.
From here rearranging the data in the df
DataFrame (to get the table above) seems doable.
Working on it...
... so a list of parts and years; do I REALLY have to create a double-loop based on these?
partslist=sorted(list(set(df[0])))
print(partslist)
years=sorted(list(set(df[1])))
print(years)
... I imagine there is some slice-based transformation in pandas that will create the above "Matrix". Then; allowing that matrix to be manipulated based on parameters e.g. given on the command line at start.
Reading https://pandas.pydata.org/docs/user_guide/indexing.html, but cannot see the right tree because of the woods...
source https://stackoverflow.com/questions/75175477/quick-way-to-create-a-pivot-table-with-totals-from-2-dimensional-data-points-pa
Comments
Post a Comment