Quick way to create a pivot-table with totals from 2 dimensional data-points (part-id, year, units)?

I read out a set of data from an SQL table using pypyodbc... The result from cur.fetchall() is this :

sqldata=[
('part00', '2023', 15), ('part01', '2023', 4), ('part02', '2023', 1), ('part00', '2022', 72), 
('part03', '2022', 48), ('part04', '2022', 22), ('part05', '2022', 16), ('part06', '2022', 16), 
('part07', '2022', 15), ('part08', '2022', 12), ('part09', '2022', 6), ('part10', '2022', 4), 
('part01', '2022', 4), ('part11', '2022', 3), ('part12', '2022', 3), ('part13', '2022', 2), 
('part14', '2022', 1), ('part15', '2022', 1), ('part02', '2022', 1), ('part16', '2022', 1), 
('part04', '2021', 35), ('part00', '2021', 20), ('part08', '2021', 18), ('part05', '2021', 3), 
('part07', '2021', 3), ('part09', '2021', 1), ('part16', '2021', 1), ('part03', '2021', 1), 
('part04', '2020', 43), ('part08', '2020', 15), ('part10', '2020', 9), ('part00', '2020', 9), 
('part07', '2020', 6), ('part05', '2020', 4), ('part17', '2020', 4), ('part03', '2020', 4), 
('part18', '2020', 3), ('part19', '2020', 2), ('part16', '2020', 2), ('part12', '2020', 2), 
('part20', '2020', 1), ('part09', '2020', 1), ('part21', '2020', 1), ('part00', '2019', 37), 
('part04', '2019', 18), ('part05', '2019', 8), ('part11', '2019', 5), ('part10', '2019', 4), 
('part22', '2019', 4), ('part16', '2019', 4), ('part23', '2019', 3), ('part03', '2019', 3), 
('part24', '2019', 3), ('part25', '2019', 2), ('part09', '2019', 2), ('part21', '2019', 2), 
('part20', '2019', 1), ('part13', '2019', 1), ('part14', '2019', 1), ('part19', '2019', 1), 
('part12', '2019', 1), ('part01', '2019', 1), ('part26', '2019', 1), ('part15', '2019', 1), 
('part04', '2018', 47), ('part00', '2018', 27), ('part13', '2018', 10), ('part12', '2018', 6), 
('part07', '2018', 4), ('part21', '2018', 3), ('part09', '2018', 2), ('part05', '2018', 2), 
('part15', '2018', 1), ('part04', '2017', 37), ('part00', '2017', 17), ('part12', '2017', 15), 
('part13', '2017', 7), ('part19', '2017', 6), ('part25', '2017', 2), ('part07', '2017', 2), 
('part03', '2017', 2), ('part27', '2017', 2), ('part21', '2017', 1), ('part09', '2017', 1), 
('part28', '2017', 1), ('part04', '2016', 49), ('part00', '2016', 29), ('part12', '2016', 12), 
('part02', '2016', 5), ('part15', '2016', 2), ('part24', '2016', 2), ('part08', '2016', 2), 
('part29', '2016', 1), ('part01', '2016', 1), ('part21', '2016', 1), ('part13', '2016', 1), 
('part28', '2016', 1), ('part25', '2016', 1), ('part04', '2015', 30), ('part00', '2015', 15), 
('part15', '2015', 8), ('part28', '2015', 5), ('part09', '2015', 4), ('part12', '2015', 2), 
('part02', '2015', 2), ('part01', '2015', 1), ('part13', '2015', 1), ('part29', '2015', 1), 
('part00', '2014', 6), ('part12', '2014', 2), ('part28', '2014', 1)
]

Based on this, I currently have my second attempt (a lot faster than the first) to create a table on the data.
This time by first creating "order" of the above - using a loop - inserting the data into a numpy array, then looping through the array to get:

          2023  2022  2021  2020  2019  2018  2017  2016  2015
Part00:      -    72    20     9    37    27    17    29    15  . 226
Part01:     15     4     -     -     1     -     -     1     1  . 22
Part02:      4     1     -     -     -     -     -     5     2  . 12
Part03:      1     -     1     4     3     -     2     -     -  . 11
Part04:      -    48    35    43    18    47    37    49    30  . 307
Part05:      -    22     3     4     8     2     -     -     -  . 39
Part06:      -    16     -     -     -     -     -     -     -  . 16
Part07:      -    16     3     6     -     4     2     -     -  . 31
Part08:      -    15    18    15     -     -     -     2     -  . 50
Part09:      -    12     1     1     2     2     1     -     4  . 23
Part10:      -     6     -     9     4     -     -     -     -  . 19
Part11:      -     4     -     -     5     -     -     -     -  . 9
...
Total: 908

Not all ^^^ shown as The S.O. Editor somehow barfed on that portion (WHY?!)

... but I still feel that my code is "non-pythonic"; as it essentially is looping over the the data items several times... might there be a more compact way to actually create the above table,

Part of the problem is sorting: so additionally have it EASILY sorted by the numbers; most basically by the right hand side sums, *or by some other "smart" order, even parametrically selectable in the end - that might display it in a more "overview-able" manner?

Might it be numpy isn't the best suited tool?
Or is there a quicker way to insert the data?

Maybe pandas is easier, more convenient?
I have limited experience with either of these.

So.

Avoiding looping, a more pythonic way to create a python data structure of the source data?
Given 1: Simple means to display the table, sorted and summed in a reconfigurable way - e.g. avoiding loops, with the least possible code.

I'm currently working on this using pandas, a simple

import pandas as pd
df=pd.DataFrame(sqldata)
print(df)

is a good start i believe.

From here rearranging the data in the df DataFrame (to get the table above) seems doable.

Working on it...

... so a list of parts and years; do I REALLY have to create a double-loop based on these?

partslist=sorted(list(set(df[0])))
print(partslist)

years=sorted(list(set(df[1])))
print(years)

... I imagine there is some slice-based transformation in pandas that will create the above "Matrix". Then; allowing that matrix to be manipulated based on parameters e.g. given on the command line at start.

Reading https://pandas.pydata.org/docs/user_guide/indexing.html, but cannot see the right tree because of the woods...

source https://stackoverflow.com/questions/75175477/quick-way-to-create-a-pivot-table-with-totals-from-2-dimensional-data-points-pa

StacksPedia

Search This Blog

Quick way to create a pivot-table with totals from 2 dimensional data-points (part-id, year, units)?

Labels

Comments

Post a Comment

Popular posts from this blog

How to show number of registered users in Laravel based on usertype?

Why is my reports service not connecting?

ValueError: X has 10 features, but LinearRegression is expecting 1 features as input