TripID time Latitude Longitude
42 7 52.9 4.4
42 8 53.0 4.6
42 9 53.0 4.7 * missing value
42 10 53.1 4.9
42 11 53.2 4.9
42 12 53.3 5.3 * missing value
42 15 53.7 5.6
5 9 53.0 4.5
5 10 53.0 4.7 * missing value
5 11 53.2 5.0
5 12 53.4 5.2
5 14 53.6 5.3 * missing value
5 17 53.4 5.5
5 18 53.3 5.7
34 19 53.0 4.5
34 20 53.0 4.7
34 24 53.9 4.8 ** value already exists
34 25 53.8 4.9
34 27 53.8 5.3
34 28 53.8 5.3 * missing value
34 31 53.7 5.6
34 32 53.6 5.7
import numpy as np
import pandas as pd
#import data
df = pd.read_csv("test.txt", delim_whitespace=True)
#set floating point output precision to prevent excessively long columns
pd.set_option("display.precision", 2)
#remember original column order
cols = df.columns
#define the sector borders
sectors = [4.8, 5.4]
#create all combinations of sector borders and TripIDs
dfborders = pd.DataFrame(index = pd.MultiIndex.from_product([df.TripID.unique(), sectors], names = ["TripID", "Longitude"])).reset_index()
#delete those combinations of TripID and Longitude that already exist in the original dataframe
dfborders = pd.merge(df, dfborders, on = ["TripID", "Longitude"], how = "right")
dfborders = dfborders[dfborders.isnull().any(axis = 1)]
#insert missing data points
df = pd.concat([df, dfborders])
#and sort dataframe to insert the missing data points in the right position
df = df[cols].groupby("TripID", sort = False).apply(pd.DataFrame.sort_values, ["Longitude", "time", "Latitude"])
#temporarily set longitude as index for value-based interpolation
df.set_index(["Longitude"], inplace = True, drop = False)
#interpolate group-wise
df = df.groupby("TripID", sort = False).apply(lambda g: g.interpolate(method = "index"))
#create sector ID column assuming that longitude is between -180 and +180
df["SectorID"] = np.digitize(df["Longitude"], bins = [-180] + sectors + [180])
#and reset index
df.reset_index(drop = True, inplace = True)
print(df)
TripID time Latitude Longitude SectorID
0 42 7.00 52.90 4.4 1
1 42 8.00 53.00 4.6 1
2 42 9.00 53.00 4.7 1
3 42 9.50 53.05 4.8 2 * interpolated data point
4 42 10.00 53.10 4.9 2
5 42 11.00 53.20 4.9 2
6 42 12.00 53.30 5.3 2
7 42 13.00 53.43 5.4 3 * interpolated data point
8 42 15.00 53.70 5.6 3
9 5 9.00 53.00 4.5 1
10 5 10.00 53.00 4.7 1
11 5 10.33 53.07 4.8 2 * interpolated data point
12 5 11.00 53.20 5.0 2
13 5 12.00 53.40 5.2 2
14 5 14.00 53.60 5.3 2
15 5 15.50 53.50 5.4 3 * interpolated data point
16 5 17.00 53.40 5.5 3
17 5 18.00 53.30 5.7 3
18 34 19.00 53.00 4.5 1
19 34 20.00 53.00 4.7 1
20 34 24.00 53.90 4.8 2
21 34 25.00 53.80 4.9 2
22 34 27.00 53.80 5.3 2
23 34 28.00 53.80 5.3 2
24 34 29.00 53.77 5.4 3 * interpolated data point
25 34 31.00 53.70 5.6 3
26 34 32.00 53.60 5.7 3
现在请注意。我不知道,如何添加丢失的行。我会问一个问题,怎么做。如果我得到答案,我会在这里更新我的。在此之前,副作用是表在每个
TripID
Longitude
假设是