
Compare two columns, one of floats and one of strings, to get the matching values

string-comparison pandas python-3.x python

Sudhi · Tech Community · 6 years ago

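I have two dataframes, and two of their columns are the important ones: one column holds float64 values and the other holds strings. The two dataframes have different sizes.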

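I want to compare the Number and Item columns together and identify only the rows that match, i.e. rows whose (Number, Item) pair appears in both dataframes.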

import pandas as pd

df1 = pd.DataFrame({'Number': [1.0, 3.0, 4.0, 5.0, 8.0, 12.0, 32.0, 58.0], 'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil', 'toolkit', 'box', 'fork']})

df2 = pd.DataFrame({'Number':[3.0,4.0,8.0,12.0,15.0,32.0,54.0,58.0,72.0], 'Item':['Watch','Pen','Pencil','Eraser','bottle','box','toolkit','fork','Phone']})

df1
   Number     Item
0     1.0    Phone
1     3.0    Watch
2     4.0      Pen
3     5.0   Pencil
4     8.0   Pencil
5    12.0  toolkit
6    32.0      box
7    58.0     fork

df2
   Number     Item
0     3.0    Watch
1     4.0      Pen
2     8.0   Pencil
3    12.0   Eraser
4    15.0   bottle
5    32.0      box
6    54.0  toolkit
7    58.0     fork
8    72.0    Phone

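I tried a for loop, but the loop is very long and seems like a bad way to do this. I tried a mask operation, but I am not sure how to implement it. I would appreciate help with the shortest way to do this. The expected output is: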

      Item  Matching  Number
0    Phone  No Match     1.0
1    Watch   Matched     3.0
2      Pen   Matched     4.0
3   Pencil  No Match     5.0
4   Pencil   Matched     8.0
5  toolkit  No Match    12.0
6      box   Matched    32.0
7     fork   Matched    58.0

3 answers | last activity 6 years ago

Sreeram TP · 6 years ago

Use loc and isin as shown below:

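df = df1.copy()
df['Matching'] = 'No Match'

# Note: the two isin() checks are independent. A row is flagged when its Number
# occurs anywhere in df2 and its Item occurs anywhere in df2, even if no single
# df2 row has that exact pair (this is why 12.0/toolkit comes out as Matched).
df.loc[df.Number.isin(df2.Number) & df.Item.isin(df2.Item), 'Matching'] = 'Matched'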

Number  Item     Matching
1.0     Phone    No Match
3.0     Watch    Matched
4.0     Pen      Matched
5.0     Pencil   No Match
8.0     Pencil   Matched
12.0    toolkit  Matched
32.0    box      Matched
58.0    fork     Matched
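The output above flags 12.0/toolkit as Matched, which differs from the expected output in the question, because each column is tested on its own. One way to test whole (Number, Item) pairs with isin, offered here as an alternative sketch rather than the answerer's code, is to build a MultiIndex from both key columns with set_index and call isin on that. The data is the question's df1 and df2.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Number': [1.0, 3.0, 4.0, 5.0, 8.0, 12.0, 32.0, 58.0],
                    'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil', 'toolkit', 'box', 'fork']})
df2 = pd.DataFrame({'Number': [3.0, 4.0, 8.0, 12.0, 15.0, 32.0, 54.0, 58.0, 72.0],
                    'Item': ['Watch', 'Pen', 'Pencil', 'Eraser', 'bottle', 'box', 'toolkit', 'fork', 'Phone']})

keys = ['Number', 'Item']

# Turn each frame's (Number, Item) columns into a MultiIndex of tuples,
# so isin() checks the pair as a whole instead of each column separately.
left_pairs = df1.set_index(keys).index
right_pairs = df2.set_index(keys).index
mask = left_pairs.isin(right_pairs)  # boolean NumPy array, one entry per df1 row

df = df1.copy()
df['Matching'] = np.where(mask, 'Matched', 'No Match')
print(df)
```

With pair-wise matching, 12.0/toolkit becomes No Match, because df2 only has toolkit at 54.0. Note that this still compares the floats exactly.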

jezrael · 6 years ago

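Multiply by 1000, convert to integer, and then use merge with a left join. Matching the floats directly can be problematic, because the float precision of the two columns may differ: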

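# Integer keys: scale by 1000 and round before casting, so a tiny float error
# (e.g. 7.9999999 instead of 8.0) does not get truncated to a different key
df1['Number1'] = df1['Number'].mul(1000).round().astype(int)
df2['Number1'] = df2['Number'].mul(1000).round().astype(int)

# Left join on the string column plus the integer key; indicator=True adds a
# _merge column telling whether each df1 row found a partner in df2
df = pd.merge(df1, df2.drop(columns='Number'), how='left', on=['Item', 'Number1'], indicator=True)
df['Matching'] = df['_merge'].map({'left_only': 'No Match', 'both': 'Match'})

df = df.drop(['Number1', '_merge'], axis=1)
print(df)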

   Number     Item  Matching
0     1.0    Phone  No Match
1     3.0    Watch     Match
2     4.0      Pen     Match
3     5.0   Pencil  No Match
4     8.0   Pencil     Match
5    12.0  toolkit  No Match
6    32.0      box     Match
7    58.0     fork     Match
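To illustrate why the answer merges on integer keys, here is a small sketch with hypothetical data: a value computed as 1.1 * 3 is stored as 3.3000000000000003, so an exact merge against 3.3 fails, while merging on scaled, rounded integer keys succeeds. The helper with_key and the scale of 1000 are illustrative assumptions. np.rint is used so that products landing just below or just above a whole number map to the same key; astype(int) on its own truncates toward zero.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: right's Watch value is the result of 1.1 * 3,
# which in binary floating point is 3.3000000000000003, not 3.3.
left = pd.DataFrame({'Number': [3.3, 4.0], 'Item': ['Watch', 'Pen']})
right = pd.DataFrame({'Number': [1.1 * 3, 4.0], 'Item': ['Watch', 'Pen']})

# Exact merge on the raw floats: the Watch row finds no partner.
exact = pd.merge(left, right, how='left', on=['Number', 'Item'], indicator=True)

def with_key(df, scale=1000):
    """Hypothetical helper: add an integer key column Number1."""
    out = df.copy()
    # Round to the nearest integer before casting, instead of truncating.
    out['Number1'] = np.rint(out['Number'] * scale).astype('int64')
    return out

# Merge on the string column plus the integer key, as in the answer.
keyed = pd.merge(with_key(left),
                 with_key(right).drop(columns='Number'),
                 how='left', on=['Item', 'Number1'], indicator=True)

print(exact['_merge'].tolist())  # Watch is left_only
print(keyed['_merge'].tolist())  # both rows match
```

The scale factor decides the tolerance: with 1000, values that agree to about three decimal places get the same key.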

jpp · 6 years ago

Use pd.merge with indicator=True:

res = pd.merge(df1, df2, how='left', indicator=True)

print(res)

      Item  Number     _merge
0    Phone     1.0  left_only
1    Watch     3.0       both
2      Pen     4.0       both
3   Pencil     5.0  left_only
4   Pencil     8.0       both
5  toolkit    12.0  left_only
6      box    32.0       both
7     fork    58.0       both

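In general, avoid explicit for loops when a purpose-built method is available, since such methods are usually optimized for performance. If you like, you can replace the strings with a dictionary mapping: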

d = {'left_only': 'No Match', 'both': 'Matched'}
res['_merge'] = res['_merge'].map(d)
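One caveat with a left merge: it returns one row per matching df2 row, so if df2 contains the same (Number, Item) pair more than once, the corresponding df1 row is repeated. A minimal sketch, using small hypothetical frames rather than the question's data, that drops duplicate key pairs from df2 before merging and then, since the question asks for only the matching rows, filters on the _merge column:

```python
import pandas as pd

df1 = pd.DataFrame({'Number': [3.0, 5.0], 'Item': ['Watch', 'Pencil']})
# Hypothetical df2 in which the (3.0, 'Watch') pair appears twice.
df2 = pd.DataFrame({'Number': [3.0, 3.0, 8.0], 'Item': ['Watch', 'Watch', 'Pencil']})

keys = ['Number', 'Item']

# Plain left merge: the Watch row of df1 appears once per matching df2 row.
naive = pd.merge(df1, df2, how='left', on=keys, indicator=True)

# Keep each key pair of df2 only once, so the result has one row per df1 row.
res = pd.merge(df1, df2[keys].drop_duplicates(), how='left', on=keys, indicator=True)
res['Matching'] = res['_merge'].map({'left_only': 'No Match', 'both': 'Matched'})

# Only the rows of df1 that found a partner in df2.
matched = res.loc[res['_merge'] == 'both', keys]

print(len(naive), len(res))
print(res)
print(matched)
```

Restricting df2 to the key columns before drop_duplicates also keeps any other df2 columns from being pulled into the result.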
