I am trying to model the distribution of how many times the order book of a particular stock is updated within a given time interval.
My problem is a data-engineering one in pandas. Because I only keep trading days and trading hours, the dataset has many gaps, so the data does not appear continuous when plotted. The graph below shows this:
The large gaps are weekends and the small gaps are the hours outside the trading session. Zoomed in, each small black square would look something like this:
The dataframe looks like this:
arrivalTime             value   date
0 days 09:30:02.231         1   2021-05-03
0 days 09:30:02.981         3   2021-05-03
0 days 09:30:02.999        99   2021-05-03
0 days 09:30:10.284        11   2021-05-03
0 days 09:30:10.293        92   2021-05-03
...                       ...   ...
0 days 15:59:42.654        82   2021-05-28
0 days 15:59:42.655        19   2021-05-28
0 days 15:59:42.651       122   2021-05-28
0 days 15:59:42.941       199   2021-05-28
0 days 15:59:44.721        19   2021-05-28
What I need is for each trading day to continue exactly where the previous one ended, so that the overnight and weekend gaps disappear from the time axis. Let me know if you have any questions.
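To make the requirement concrete, here is a minimal sketch of one possible way to build such a gap-free time axis. It assumes a fixed 09:30 to 16:00 session (inferred from the sample rows, not confirmed), that `arrivalTime` is a timedelta and `date` a datetime column, and it uses made-up sample values. The names `SESSION_OPEN`, `SESSION_LEN`, `add_continuous_time` and the `continuous` column are hypothetical.

```python
import pandas as pd

# Hypothetical sample with the same columns as the dataframe above
# (two trading days separated by a weekend; values are illustrative only).
df = pd.DataFrame({
    "arrivalTime": pd.to_timedelta([
        "09:30:02.231", "15:59:44.721",   # Friday
        "09:30:02.981", "15:59:42.654",   # following Monday
    ]),
    "value": [1, 19, 3, 82],
    "date": pd.to_datetime(["2021-05-07", "2021-05-07",
                            "2021-05-10", "2021-05-10"]),
})

# Assumption: every session runs 09:30-16:00, i.e. 6.5 hours.
SESSION_OPEN = pd.Timedelta(hours=9, minutes=30)
SESSION_LEN = pd.Timedelta(hours=6, minutes=30)


def add_continuous_time(frame):
    """Add a 'continuous' column: time elapsed in trading hours only."""
    out = frame.sort_values(["date", "arrivalTime"]).reset_index(drop=True)
    # Number the trading days 0, 1, 2, ... in date order,
    # ignoring how many calendar days lie between them.
    codes, _ = pd.factorize(out["date"], sort=True)
    day_idx = pd.Series(codes, index=out.index)
    # Offset of each day's open on the compressed axis, plus time since the open.
    out["continuous"] = day_idx * SESSION_LEN + (out["arrivalTime"] - SESSION_OPEN)
    return out


result = add_continuous_time(df)
print(result[["date", "arrivalTime", "continuous"]])
```

With this, the first update of each day lands right after the 16:00 mark of the previous trading day. Plotting `value` against `result["continuous"].dt.total_seconds()` would then show no overnight or weekend gaps, although the axis ticks would need to be relabelled by hand to show real dates.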
Thanks!
source https://stackoverflow.com/questions/68535354/remove-lags-gaps-in-time-series-data-for-a-particular-dataframe-in-pandas
