
How do I impute NaN values based on the values of another column?

pandas python

stone rock · Tech Community · 6 years ago

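My DataFrame has two columns:

1) Work experience (years)

2) Company type

I want to impute the company type column based on the work experience column. The company type column contains NaN values, which I want to fill in according to work experience. The work experience column has no missing values.

Here Work_exp is numerical data and company_type is categorical data.

Sample data: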

Work_exp      company_type
   10            PvtLtd
   0.5           startup
   6           Public Sector
   8               NaN
   1             startup
   9              PvtLtd
   4               NaN
   3           Public Sector
   2             startup
   0               NaN

I have already decided on the thresholds for imputing the NaN values.

Startup if work_exp < 2yrs
Public sector if work_exp > 2yrs and <8yrs
PvtLtd if work_exp >8yrs

Based on the threshold criteria above, how can I impute the missing categorical values in the company_type column?

2 replies | as of 6 years ago

jpp 6 years ago

You can use numpy.select together with numpy.where:

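import numpy as np

# define conditions and values
conditions = [df['Work_exp'] < 2, df['Work_exp'].between(2, 8), df['Work_exp'] > 8]
values = ['Startup', 'PublicSector', 'PvtLtd']

# apply logic where company_type is null
# (explicit string default: recent NumPy versions reject mixing string
#  choices with the implicit integer default of 0)
df['company_type'] = np.where(df['company_type'].isnull(),
                              np.select(conditions, values, default='Unknown'),
                              df['company_type'])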

print(df)

   Work_exp  company_type
0      10.0        PvtLtd
1       0.5       startup
2       6.0  PublicSector
3       8.0  PublicSector
4       1.0       startup
5       9.0        PvtLtd
6       4.0  PublicSector
7       3.0  PublicSector
8       2.0       startup
9       0.0       Startup
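
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question. The helper name `impute_company_type` and the `'Unknown'` fallback are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Work_exp": [10, 0.5, 6, 8, 1, 9, 4, 3, 2, 0],
    "company_type": ["PvtLtd", "startup", "Public Sector", np.nan, "startup",
                     "PvtLtd", np.nan, "Public Sector", "startup", np.nan],
})


def impute_company_type(frame):
    """Return a copy with missing company_type filled from Work_exp thresholds.

    Hypothetical helper: the name and the 'Unknown' fallback are assumptions.
    """
    exp = frame["Work_exp"]
    conditions = [exp < 2, exp.between(2, 8), exp > 8]
    values = ["Startup", "PublicSector", "PvtLtd"]
    # A string default keeps every choice the same dtype
    guessed = np.select(conditions, values, default="Unknown")

    out = frame.copy()
    # Keep existing labels, use the guess only where the label is missing
    out["company_type"] = np.where(out["company_type"].isnull(),
                                   guessed,
                                   out["company_type"])
    return out


result = impute_company_type(df)
print(result)
```

Only the three rows that were NaN change. Existing labels such as "startup" keep their original spelling, so the column can end up with mixed capitalisation unless the labels are normalised separately.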

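pd.Series.between includes both the start and end values by default and works with float values. Use the inclusive parameter to omit the boundaries: inclusive="neither" in pandas 1.3 and later (older versions accepted inclusive=False).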

s = pd.Series([2, 2.5, 4, 4.5, 5])

s.between(2, 4.5)

0     True
1     True
2     True
3     True
4    False
dtype: bool
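
To make the boundary behaviour concrete, the following sketch compares three settings of `inclusive` on the same Series. It assumes pandas 1.3 or later, where `inclusive` takes the strings "both", "neither", "left" or "right".

```python
import pandas as pd

s = pd.Series([2, 2.5, 4, 4.5, 5])

both = s.between(2, 4.5)                          # default: 2 <= x <= 4.5
neither = s.between(2, 4.5, inclusive="neither")  # strict: 2 < x < 4.5
left = s.between(2, 4.5, inclusive="left")        # 2 <= x < 4.5

# Show the three masks side by side
print(pd.DataFrame({"value": s, "both": both, "neither": neither, "left": left}))
```

With "neither", the values 2 and 4.5 drop out. This matches the strict thresholds in the question, which leave exactly 2 and exactly 8 years unassigned.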

gyx-hh 6 years ago

Great answer @jpp. I just want to add a different approach here, using pandas.cut().

df['company_type'] = pd.cut(
    df.Work_exp,
    bins=[0,2,8,100],
    right=False,
    labels=['Startup', 'Public', 'Private']
)



   Work_exp company_type
0      10.0      Private
1       0.5      Startup
2       6.0       Public
3       8.0      Private
4       1.0      Startup
5       9.0      Private
6       4.0       Public
7       3.0       Public
8       2.0       Public
9       0.0      Startup
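
Note that the pd.cut snippet above overwrites every value in company_type, including the labels that were already present. A possible variant that fills only the missing entries is sketched below. The bins and labels are taken from the answer; combining them with `fillna` is an assumption about what the asker wants.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Work_exp": [10, 0.5, 6, 8, 1, 9, 4, 3, 2, 0],
    "company_type": ["PvtLtd", "startup", "Public Sector", np.nan, "startup",
                     "PvtLtd", np.nan, "Public Sector", "startup", np.nan],
})

# right=False makes the bins [0, 2), [2, 8), [8, 100)
binned = pd.cut(df["Work_exp"],
                bins=[0, 2, 8, 100],
                right=False,
                labels=["Startup", "Public", "Private"])

# Convert the categorical result to plain objects, then fill only the NaN rows;
# fillna aligns on the index, so existing labels are left untouched
df["company_type"] = df["company_type"].fillna(binned.astype(object))
print(df)
```

With these bins a value of exactly 8 falls into "Private", whereas between(2, 8) in the first answer assigns it to the public sector, so the two answers differ at that boundary.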