I'm currently using xarray in Python to read in a netCDF file containing 3-hourly temperature (t2m) data. The data has dimensions (time: 2920, latitude: 189, longitude: 521), i.e. shape (2920, 189, 521), which represents one year of data. I have 30 of these files, 2 GB each.
longitude (longitude) float32 -170.0 -169.8 ... -40.25 -40.0
latitude (latitude) float32 82.0 81.75 81.5 ... 35.5 35.25 35.0
time (time) datetime64[ns] 1979-01-01T01:00:00 ... 1979-12-...
I would like to reshape this data into a format which I can feed into scikit-learn's sklearn.model_selection.train_test_split, i.e. I would like to generate the following DataFrame for each file/year:
index  time                 lat  lon      t2m
0      1979-01-01T00:00:00  35   -170     270
1      1979-01-01T00:00:00  35   -169.75  269
2      1979-01-01T00:00:00  35   -169.5   271
...
n-1    1979-12-31T21:00:00  82   -40.25   241
n      1979-12-31T21:00:00  82   -40      244
Note that we would have 521 lat=35 rows (one per longitude) before moving on to the next latitude value. After going through all 189 latitude values, we move to the next timestep and repeat until finished.
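To make the intended row order concrete, here is a minimal sketch using only the standard library. The coordinate values are toy stand-ins (not the real grid), and the names `times`, `lats`, `lons`, `rows`, and `n_rows` are illustrative. It shows longitude varying fastest, then latitude, then time, and computes how many rows one real file would produce.

```python
from itertools import product

# Toy coordinate values for illustration only (assumed, not the real grid).
times = ["1979-01-01T00:00", "1979-01-01T03:00"]
lats = [35.0, 35.25]
lons = [-170.0, -169.75, -169.5]

# itertools.product varies its last argument fastest, so the order is:
# every longitude for one latitude, then the next latitude, then the next time.
rows = list(product(times, lats, lons))

for row in rows[:4]:
    print(row)

# Row count for one real file with shape (2920, 189, 521).
n_rows = 2920 * 189 * 521
print(n_rows)
```

With the real dimensions, each yearly file corresponds to 287,529,480 rows in the long format.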
I assume there is a way to achieve this with some combination of melting and reshaping the xarray Dataset, but I have yet to find anything that works. Any advice would be appreciated.
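For reference, one candidate approach is xarray's `DataArray.to_dataframe()` followed by `reset_index()`. The sketch below is untested against the real files. It builds a tiny in-memory Dataset that assumes the same dimension names (`time`, `latitude`, `longitude`) and variable name (`t2m`), and the helper name `year_to_frame` is hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny stand-in for one yearly file (assumed layout: time, latitude, longitude).
times = pd.to_datetime(["1979-01-01 00:00", "1979-01-01 03:00"])
lats = np.array([35.5, 35.25, 35.0], dtype="float32")  # descending, as in the file
lons = np.array([-170.0, -169.75], dtype="float32")
values = np.arange(12, dtype="float32").reshape(2, 3, 2) + 270.0

ds = xr.Dataset(
    {"t2m": (("time", "latitude", "longitude"), values)},
    coords={"time": times, "latitude": lats, "longitude": lons},
)
# For a real file this would instead be something like
# ds = xr.open_dataset("path/to/one_year.nc")  # hypothetical path


def year_to_frame(ds):
    """Flatten t2m into long format: one row per (time, lat, lon)."""
    # Sort latitude ascending so rows start at the lowest latitude.
    ds = ds.sortby("latitude")
    # to_dataframe() builds a MultiIndex in dimension order, so time varies
    # slowest and longitude fastest; reset_index() turns it into columns.
    df = ds["t2m"].to_dataframe().reset_index()
    return df.rename(columns={"latitude": "lat", "longitude": "lon"})


df = year_to_frame(ds)
print(df)
```

A full year yields one row per grid cell per timestep, so the resulting DataFrame is large. Whether it fits in memory is not checked here.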
source https://stackoverflow.com/questions/73492964/reshaping-and-combining-data-from-netcdf-in-python