Skip to main content

Quick way to create a pivot-table with totals from 2 dimensional data-points (part-id, year, units)?

I read out a set of data from an SQL table using pypyodbc... The result from cur.fetchall() is this :

sqldata=[
('part00', '2023', 15), ('part01', '2023', 4), ('part02', '2023', 1), ('part00', '2022', 72), 
('part03', '2022', 48), ('part04', '2022', 22), ('part05', '2022', 16), ('part06', '2022', 16), 
('part07', '2022', 15), ('part08', '2022', 12), ('part09', '2022', 6), ('part10', '2022', 4), 
('part01', '2022', 4), ('part11', '2022', 3), ('part12', '2022', 3), ('part13', '2022', 2), 
('part14', '2022', 1), ('part15', '2022', 1), ('part02', '2022', 1), ('part16', '2022', 1), 
('part04', '2021', 35), ('part00', '2021', 20), ('part08', '2021', 18), ('part05', '2021', 3), 
('part07', '2021', 3), ('part09', '2021', 1), ('part16', '2021', 1), ('part03', '2021', 1), 
('part04', '2020', 43), ('part08', '2020', 15), ('part10', '2020', 9), ('part00', '2020', 9), 
('part07', '2020', 6), ('part05', '2020', 4), ('part17', '2020', 4), ('part03', '2020', 4), 
('part18', '2020', 3), ('part19', '2020', 2), ('part16', '2020', 2), ('part12', '2020', 2), 
('part20', '2020', 1), ('part09', '2020', 1), ('part21', '2020', 1), ('part00', '2019', 37), 
('part04', '2019', 18), ('part05', '2019', 8), ('part11', '2019', 5), ('part10', '2019', 4), 
('part22', '2019', 4), ('part16', '2019', 4), ('part23', '2019', 3), ('part03', '2019', 3), 
('part24', '2019', 3), ('part25', '2019', 2), ('part09', '2019', 2), ('part21', '2019', 2), 
('part20', '2019', 1), ('part13', '2019', 1), ('part14', '2019', 1), ('part19', '2019', 1), 
('part12', '2019', 1), ('part01', '2019', 1), ('part26', '2019', 1), ('part15', '2019', 1), 
('part04', '2018', 47), ('part00', '2018', 27), ('part13', '2018', 10), ('part12', '2018', 6), 
('part07', '2018', 4), ('part21', '2018', 3), ('part09', '2018', 2), ('part05', '2018', 2), 
('part15', '2018', 1), ('part04', '2017', 37), ('part00', '2017', 17), ('part12', '2017', 15), 
('part13', '2017', 7), ('part19', '2017', 6), ('part25', '2017', 2), ('part07', '2017', 2), 
('part03', '2017', 2), ('part27', '2017', 2), ('part21', '2017', 1), ('part09', '2017', 1), 
('part28', '2017', 1), ('part04', '2016', 49), ('part00', '2016', 29), ('part12', '2016', 12), 
('part02', '2016', 5), ('part15', '2016', 2), ('part24', '2016', 2), ('part08', '2016', 2), 
('part29', '2016', 1), ('part01', '2016', 1), ('part21', '2016', 1), ('part13', '2016', 1), 
('part28', '2016', 1), ('part25', '2016', 1), ('part04', '2015', 30), ('part00', '2015', 15), 
('part15', '2015', 8), ('part28', '2015', 5), ('part09', '2015', 4), ('part12', '2015', 2), 
('part02', '2015', 2), ('part01', '2015', 1), ('part13', '2015', 1), ('part29', '2015', 1), 
('part00', '2014', 6), ('part12', '2014', 2), ('part28', '2014', 1)
]
 

Based on this, I currently have my second attempt (a lot faster than the first) to create a table on the data.
This time by first creating "order" of the above - using a loop - inserting the data into a numpy array, then looping through the array to get:

          2023  2022  2021  2020  2019  2018  2017  2016  2015
Part00:      -    72    20     9    37    27    17    29    15  . 226
Part01:     15     4     -     -     1     -     -     1     1  . 22
Part02:      4     1     -     -     -     -     -     5     2  . 12
Part03:      1     -     1     4     3     -     2     -     -  . 11
Part04:      -    48    35    43    18    47    37    49    30  . 307
Part05:      -    22     3     4     8     2     -     -     -  . 39
Part06:      -    16     -     -     -     -     -     -     -  . 16
Part07:      -    16     3     6     -     4     2     -     -  . 31
Part08:      -    15    18    15     -     -     -     2     -  . 50
Part09:      -    12     1     1     2     2     1     -     4  . 23
Part10:      -     6     -     9     4     -     -     -     -  . 19
Part11:      -     4     -     -     5     -     -     -     -  . 9
...
Total: 908

Not all ^^^ shown as The S.O. Editor somehow barfed on that portion (WHY?!)

... but I still feel that my code is "non-pythonic"; as it essentially is looping over the the data items several times... might there be a more compact way to actually create the above table,

Part of the problem is sorting: so additionally have it EASILY sorted by the numbers; most basically by the right hand side sums, *or by some other "smart" order, even parametrically selectable in the end - that might display it in a more "overview-able" manner?

Might it be numpy isn't the best suited tool?
Or is there a quicker way to insert the data?

Maybe pandas is easier, more convenient?
I have limited experience with either of these.

So.

  1. Avoiding looping, a more pythonic way to create a python data structure of the source data?
  2. Given 1: Simple means to display the table, sorted and summed in a reconfigurable way - e.g. avoiding loops, with the least possible code.

I'm currently working on this using pandas, a simple

import pandas as pd
df=pd.DataFrame(sqldata)
print(df)

is a good start i believe.

From here rearranging the data in the df DataFrame (to get the table above) seems doable.

Working on it...


... so a list of parts and years; do I REALLY have to create a double-loop based on these?
partslist=sorted(list(set(df[0])))
print(partslist)

years=sorted(list(set(df[1])))
print(years)

... I imagine there is some slice-based transformation in pandas that will create the above "Matrix". Then; allowing that matrix to be manipulated based on parameters e.g. given on the command line at start.

Reading https://pandas.pydata.org/docs/user_guide/indexing.html, but cannot see the right tree because of the woods...



source https://stackoverflow.com/questions/75175477/quick-way-to-create-a-pivot-table-with-totals-from-2-dimensional-data-points-pa

Comments

Popular posts from this blog

ValueError: X has 10 features, but LinearRegression is expecting 1 features as input

So, I am trying to predict the model but its throwing error like it has 10 features but it expacts only 1. So I am confused can anyone help me with it? more importantly its not working for me when my friend runs it. It works perfectly fine dose anyone know the reason about it? cv = KFold(n_splits = 10) all_loss = [] for i in range(9): # 1st for loop over polynomial orders poly_order = i X_train = make_polynomial(x, poly_order) loss_at_order = [] # initiate a set to collect loss for CV for train_index, test_index in cv.split(X_train): print('TRAIN:', train_index, 'TEST:', test_index) X_train_cv, X_test_cv = X_train[train_index], X_test[test_index] t_train_cv, t_test_cv = t[train_index], t[test_index] reg.fit(X_train_cv, t_train_cv) loss_at_order.append(np.mean((t_test_cv - reg.predict(X_test_cv))**2)) # collect loss at fold all_loss.append(np.mean(loss_at_order)) # collect loss at order plt.plot(np.log(al...

Sorting large arrays of big numeric stings

I was solving bigSorting() problem from hackerrank: Consider an array of numeric strings where each string is a positive number with anywhere from to digits. Sort the array's elements in non-decreasing, or ascending order of their integer values and return the sorted array. I know it works as follows: def bigSorting(unsorted): return sorted(unsorted, key=int) But I didnt guess this approach earlier. Initially I tried below: def bigSorting(unsorted): int_unsorted = [int(i) for i in unsorted] int_sorted = sorted(int_unsorted) return [str(i) for i in int_sorted] However, for some of the test cases, it was showing time limit exceeded. Why is it so? PS: I dont know exactly what those test cases were as hacker rank does not reveal all test cases. source https://stackoverflow.com/questions/73007397/sorting-large-arrays-of-big-numeric-stings

How to load Javascript with imported modules?

I am trying to import modules from tensorflowjs, and below is my code. test.html <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>Document</title </head> <body> <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script> <script type="module" src="./test.js"></script> </body> </html> test.js import * as tf from "./node_modules/@tensorflow/tfjs"; import {loadGraphModel} from "./node_modules/@tensorflow/tfjs-converter"; const MODEL_URL = './model.json'; const model = await loadGraphModel(MODEL_URL); const cat = document.getElementById('cat'); model.execute(tf.browser.fromPixels(cat)); Besides, I run the server using python -m http.server in my command prompt(Windows 10), and this is the error prompt in the console log of my browser: Failed to loa...