代码之家  ›  专栏  ›  技术社区  ›  Jan Kaiser

根据组插入数据帧中的特定条目

  •  2
  • Jan Kaiser  · 技术社区  · 7 年前

    您可以在下图中看到我为这次旅行决定的边界。穿越边界的经度总是正好在边界线上。我需要确定的是这条线被船划过的纬度。

    3 42 10 53.1 4.9 2 5 5 10 53.0 4.7 1 0 42 7 52.9 4.4 1 8 42 9.5 53.05 4.8 1 3 42 10 53.1 4.9 2 4 5 9 53.0 4.5 1 5 5 10 53.0 4.7 1 9 5 10.3 53.06 4.8 1 6 5 11 53.2 5.0 2 7 5 12 53.3 5.2 2

    
    

    我也会很高兴并且能够得到这样的结果:

    离开时间 42 1 53.05 4.8 9.5 5 1 53.06 4.8 10.3

    如果我对问题的描述不太清楚,请提问。

    Map of my borders

           TripID  time  Latitude Longitude  SectorID
    0      42       7    52.9     4.4        1
    1      42       8    53.0     4.6        1
    2      42       9    53.0     4.7        1
    3      42      10    53.1     4.9        2
    4       5       9    53.0     4.5        1
    5       5      10    53.0     4.7        1
    6       5      11    53.2     5.0        2
    7       5      12    53.3     5.2        2
    

    df.groupby(['TripID', 'SectorID'])

    interpolate但是添加条目对我来说需要大约一个小时,并且插入缺失的值会立即崩溃。

           TripID  time  Latitude Longitude  SectorID
    0      42       7    52.9     4.4        1
    1      42       8    53.0     4.6        1
    2      42       9    53.0     4.7        1
    8      42     9.5   53.05     4.8        1
    3      42      10    53.1     4.9        2
    4       5       9    53.0     4.5        1
    5       5      10    53.0     4.7        1
    9       5    10.3   53.06     4.8        1
    6       5      11    53.2     5.0        2
    7       5      12    53.3     5.2        2
    

     TripID  SectorID  leave_lat  leave_lon  leave_time
     42      1         53.05      4.8        9.5
     5       1         53.06      4.8        10.3
    

    2 回复  |  直到 6 年前
        1
  •  1
  •   Mr. T Andres Pinzon    6 年前

    TripID  time  Latitude Longitude  
    42       7    52.9     4.4        
    42       8    53.0     4.6        
    42       9    53.0     4.7 * missing value
    42      10    53.1     4.9 
    42      11    53.2     4.9         
    42      12    53.3     5.3 * missing value
    42      15    53.7     5.6    
    5        9    53.0     4.5        
    5       10    53.0     4.7  * missing value
    5       11    53.2     5.0       
    5       12    53.4     5.2        
    5       14    53.6     5.3  * missing value
    5       17    53.4     5.5        
    5       18    53.3     5.7  
    34      19    53.0     4.5  
    34      20    53.0     4.7          
    34      24    53.9     4.8  ** value already exists
    34      25    53.8     4.9        
    34      27    53.8     5.3        
    34      28    53.8     5.3  * missing value
    34      31    53.7     5.6        
    34      32    53.6     5.7 
    

    import numpy as np
    import pandas as pd
    
    #import data
    df = pd.read_csv("test.txt", delim_whitespace=True)
    
    #set floating point output precision to prevent excessively long columns
    pd.set_option("display.precision", 2)
    #remember original column order
    cols = df.columns
    #define the sector borders
    sectors = [4.8, 5.4]
    
    #create all combinations of sector borders and TripIDs
    dfborders = pd.DataFrame(index = pd.MultiIndex.from_product([df.TripID.unique(), sectors], names = ["TripID", "Longitude"])).reset_index()
    #delete those combinations of TripID and Longitude that already exist in the original dataframe
    dfborders = pd.merge(df, dfborders, on = ["TripID", "Longitude"], how = "right")
    dfborders = dfborders[dfborders.isnull().any(axis = 1)]
    #insert missing data points
    df = pd.concat([df, dfborders])
    #and sort dataframe to insert the missing data points in the right position
    df = df[cols].groupby("TripID", sort = False).apply(pd.DataFrame.sort_values, ["Longitude", "time", "Latitude"])
    
    #temporarily set longitude as index for value-based interpolation
    df.set_index(["Longitude"], inplace = True, drop = False)
    #interpolate group-wise
    df = df.groupby("TripID", sort = False).apply(lambda g: g.interpolate(method = "index"))
    #create sector ID column assuming that longitude is between -180 and +180
    df["SectorID"] = np.digitize(df["Longitude"], bins = [-180] + sectors + [180])
    #and reset index
    df.reset_index(drop = True, inplace = True)
    print(df)
    

        TripID   time  Latitude  Longitude  SectorID
    0       42   7.00     52.90        4.4         1
    1       42   8.00     53.00        4.6         1
    2       42   9.00     53.00        4.7         1
    3       42   9.50     53.05        4.8         2 * interpolated data point
    4       42  10.00     53.10        4.9         2
    5       42  11.00     53.20        4.9         2
    6       42  12.00     53.30        5.3         2
    7       42  13.00     53.43        5.4         3 * interpolated data point
    8       42  15.00     53.70        5.6         3
    9        5   9.00     53.00        4.5         1
    10       5  10.00     53.00        4.7         1
    11       5  10.33     53.07        4.8         2 * interpolated data point
    12       5  11.00     53.20        5.0         2
    13       5  12.00     53.40        5.2         2
    14       5  14.00     53.60        5.3         2
    15       5  15.50     53.50        5.4         3 * interpolated data point
    16       5  17.00     53.40        5.5         3
    17       5  18.00     53.30        5.7         3
    18      34  19.00     53.00        4.5         1
    19      34  20.00     53.00        4.7         1
    20      34  24.00     53.90        4.8         2
    21      34  25.00     53.80        4.9         2
    22      34  27.00     53.80        5.3         2
    23      34  28.00     53.80        5.3         2
    24      34  29.00     53.77        5.4         3 * interpolated data point
    25      34  31.00     53.70        5.6         3
    26      34  32.00     53.60        5.7         3
    

    现在请注意。我不知道,如何添加丢失的行。我会问一个问题,怎么做。如果我得到答案,我会在这里更新我的。在此之前,副作用是表在每个 TripID Longitude 假设是

        2
  •  0
  •   Jan Kaiser    6 年前

    df

            TripID  time  Latitude Longitude  SectorID
    0      42       7    52.9     4.4        1
    1      42       8    53.0     4.6        1
    2      42       9    53.0     4.7        1
    3      42      10    53.1     4.9        2
    4       5       9    53.0     4.5        1
    5       5      10    53.0     4.7        1
    6       5      11    53.2     5.0        2
    7       5      12    53.3     5.2        2
    

    df = df.sort_values('time')
    
    df['next_lat'] = df.groupby('TripID')['Latitude'].shift(-1)
    df['next_lon'] = df('TripID')['Longitude'].shift(-1)
    df['next_time'] = df('TripID')['time'].shift(-1)
    df['next_sector_id'] = df('TripID')['sector'].shift(-1)
    df = df.sort_values(['TripID', 'time'])
    df['next_trip_id'] = df['TripID'].shift(-1)
    
    lasts = df[df['SectorID'] != df['next_sector_id']]
    
    lasts.loc[lasts['SectorID'] == '1', 'sector_leave_lon'] = 4.8
    
    lasts.loc[lasts['sector'] == '2', 'sector_leave_lat'] = lasts.loc[lasts['sector'] == '2', 'Latitude'] + ((lasts.loc[lasts['sector'] == '2', 'sector_leave_lon'] - lasts.loc[lasts['sector'] == '2', 'Longitude']) / (lasts.loc[lasts['sector'] == '2', 'next_lon'] - lasts.loc[lasts['sector'] == '2', 'Longitude'])) * (lasts.loc[lasts['sector'] == '2', 'next_lon'] - lasts.loc[lasts['sector'] == '2', 'Longitude'])
    
    lasts.loc[lasts['sector'] == '2', 'sector_leave_time'] = lasts.loc[lasts['sector'] == '2', 'time'] + ((lasts.loc[lasts['sector'] == '2', 'sector_leave_lon'] - lasts.loc[lasts['sector'] == '2', 'Longitude']) / (lasts.loc[lasts['sector'] == '2', 'next_lon'] - lasts.loc[lasts['sector'] == '2', 'Longitude'])) * (lasts.loc[lasts['sector'] == '2', 'next_time'] - lasts.loc[lasts['sector'] == '2', 'time'])
    
    df['sector_leave_lat'] = lasts['sector_leave_lat']
    df['sector_leave_time'] = lasts['sector_leave_time']
    
    df['sector_leave_lat'] = df(['TripID', 'sector'])['sector_leave_lat'].transform('last')
    df['sector_leave_time'] = df(['TripID', 'sector'])['sector_leave_time'].transform('last')
    
    df = df.drop(['next_lat', 'next_lon', 'next_time', 'next_sector_id', 'next_trip_id'], axis = 1)
    

            TripID  time  Latitude Longitude  SectorID  sector_leave_lat  sector_leave_time
    0      42       7    52.9     4.4        1          53.05              9.5
    1      42       8    53.0     4.6        1          53.05              9.5
    2      42       9    53.0     4.7        1          53.05              9.5
    3      42      10    53.1     4.9        2          NaN               NaN
    4       5       9    53.0     4.5        1          53.06             10.3
    5       5      10    53.0     4.7        1          53.06             10.3
    6       5      11    53.2     5.0        2          NaN               NaN
    7       5      12    53.3     5.2        2          NaN               NaN
    

    我希望这能帮助那些实际解决方案无法解决问题的人。