Reshaping and combining data from netCDF in Python

I'm currently reading a netCDF file with xarray in Python that contains 3-hourly temperature (t2m) data. The data has dimensions (time: 2920, latitude: 189, longitude: 521), i.e. shape (2920, 189, 521), which covers one year. I have 30 of these files, 2 GB each.

longitude (longitude) float32         -170.0 -169.8 ... -40.25 -40.0
latitude  (latitude)  float32         82.0 81.75 81.5 ... 35.5 35.25 35.0
time      (time)      datetime64[ns]  1979-01-01T01:00:00 ... 1979-12-...
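The reshaped table will have one row per (time, latitude, longitude) combination, so multiplying out the dimensions gives a rough idea of its size. This is a quick check that assumes 4-byte float32 values for t2m. The question does not state the variable's dtype.

```python
# Back-of-envelope size of the long-format table built from the dimensions above
n_time, n_lat, n_lon = 2920, 189, 521   # 2920 = 365 days x 8 three-hourly steps
n_files = 30                            # one file per year

rows_per_file = n_time * n_lat * n_lon
rows_total = rows_per_file * n_files

# Assumption: t2m held as float32 (4 bytes per value); index columns not counted
t2m_bytes_per_file = rows_per_file * 4

print(f"rows per file: {rows_per_file:,}")                      # rows per file: 287,529,480
print(f"rows for {n_files} files: {rows_total:,}")              # rows for 30 files: 8,625,884,400
print(f"t2m values per file: {t2m_bytes_per_file / 1e9:.2f} GB")  # t2m values per file: 1.15 GB
```

The time, lat and lon columns come on top of this, so each per-year DataFrame will be several times larger than the t2m values alone.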

I would like to reshape this data into a format that I can pass to scikit-learn's

sklearn.model_selection.train_test_split

That is, I would like to generate the following DataFrame for each file/year:

index   time                  lat   lon       t2m
0       1979-01-01T00:00:00   35    -170      270
1       1979-01-01T00:00:00   35    -169.75   269
2       1979-01-01T00:00:00   35    -169.5    271
...
n-1     1979-12-31T21:00:00   82    -40.25    241
n       1979-12-31T21:00:00   82    -40       244

Note that there would be 521 rows with lat=35 (one per longitude) before moving on to the next latitude value. After all 189 latitude values, we move on to the next timestep and repeat until finished.
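The row order described above is what row-major (C-order) flattening of a (time, latitude, longitude) array produces: longitude varies fastest, then latitude, then time. The coordinate listing shows latitude stored north to south (82 down to 35), while the desired table starts at lat=35, so latitude has to be sorted ascending first. Below is a minimal sketch using only NumPy and pandas on a tiny synthetic grid. The coordinate values and timesteps are stand-ins, not the real data.

```python
import numpy as np
import pandas as pd

# Tiny synthetic grid with the same layout as the file: (time, latitude, longitude),
# with latitude stored north to south as in the coordinate listing.
time = pd.date_range("1979-01-01", periods=2, freq="D")  # stand-in timesteps
lat = np.array([82.0, 81.75, 81.5], dtype=np.float32)
lon = np.array([-170.0, -169.75], dtype=np.float32)
t2m = np.arange(time.size * lat.size * lon.size, dtype=np.float32).reshape(
    time.size, lat.size, lon.size
)

# The requested table starts at the southernmost latitude, so sort latitude ascending
# and reorder the latitude axis of the data to match.
order = np.argsort(lat)
lat_sorted = lat[order]
t2m_sorted = t2m[:, order, :]

# C-order ravel(): longitude fastest, then latitude, then time.
# The coordinate columns are repeated/tiled so each row lines up with its value.
n_time, n_lat, n_lon = t2m_sorted.shape
df = pd.DataFrame({
    "time": np.repeat(time.values, n_lat * n_lon),
    "lat": np.tile(np.repeat(lat_sorted, n_lon), n_time),
    "lon": np.tile(lon, n_time * n_lat),
    "t2m": t2m_sorted.ravel(),
})
print(df.head(6))
```

Because ravel() on a contiguous array is a view rather than a copy, this avoids duplicating the t2m values. The coordinate columns are still materialised in full.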

I assume there is a way to achieve this with some combination of melting and reshaping of the xarray Dataset (ds), but I've yet to find anything that works. Any advice would be appreciated.
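One way to do this within xarray is to sort latitude and then call DataArray.to_series(), which flattens the values and indexes them by a MultiIndex of the coordinates in the array's dimension order. reset_index() then turns that index into ordinary columns, which is effectively the "melt". The sketch below makes a few assumptions: the variable is named t2m with dimensions time, latitude and longitude, as the coordinate listing suggests. The helper name to_long_frame and the file pattern in the comments are hypothetical, and the demo runs on a small synthetic Dataset rather than the real files.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_long_frame(ds: xr.Dataset, var: str = "t2m") -> pd.DataFrame:
    """Flatten a (time, latitude, longitude) variable to one row per timestep and grid cell."""
    # Fix the dimension order explicitly: time outermost, longitude fastest.
    da = ds[var].transpose("time", "latitude", "longitude")
    # Latitude is stored 82 -> 35; sort ascending so each timestep starts at lat=35.
    da = da.sortby("latitude")
    # to_series() gives a Series indexed by (time, latitude, longitude);
    # reset_index() turns those index levels into columns.
    df = da.to_series().reset_index()
    return df.rename(columns={"latitude": "lat", "longitude": "lon"})


# Small synthetic dataset with the same structure as one yearly file.
time = pd.date_range("1979-01-01", periods=2, freq="D")
lat = np.array([82.0, 81.75, 81.5], dtype=np.float32)
lon = np.array([-170.0, -169.75], dtype=np.float32)
values = np.arange(12, dtype=np.float32).reshape(2, 3, 2)
ds = xr.Dataset(
    {"t2m": (("time", "latitude", "longitude"), values)},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

df = to_long_frame(ds)
print(df.head(6))

# Combining the yearly files (hypothetical file pattern; needs `import glob`):
# frames = [to_long_frame(xr.open_dataset(p)) for p in sorted(glob.glob("t2m_*.nc"))]
# combined = pd.concat(frames, ignore_index=True)
```

Building one DataFrame per yearly file, as the question already plans, keeps peak memory closer to a single year's worth of rows. Concatenating all 30 years into one DataFrame may not fit in memory on a typical machine.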

Source: https://stackoverflow.com/questions/73492964/reshaping-and-combining-data-from-netcdf-in-python