How to fix numpy TypeError: unsupported operand type(s) for -: 'str' and 'str'

machine-learning numpy python

haroon khan · Tech Community · 6 years ago

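I have been trying to implement a polynomial regression model in Python in the Spyder IDE. Everything worked fine until the end, when I added a call to NumPy's arange function and got the following error: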

import pandas as pd 
import matplotlib.pyplot as plt
import numpy as np

dataset = pd.read_csv("Position_Salaries.csv")
X = dataset.iloc[:, 1:2]
y = dataset.iloc[:, 2]

#fitting the linear regression model
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X,y)

#fitting the polynomial linear Regression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg2 = LinearRegression()
lin_reg2.fit(X_poly,y)

#visualising the linear regression results
plt.scatter(X,y ,color = 'red')
plt.plot(X,lin_reg.predict(X), color='blue')
plt.title('linear regression model')
plt.xlabel('positive level')
plt.ylabel('salary')
plt.show()

# the code fails here, on the np.arange line
# visualising the polynomial results
X_grid = np.arange(min(X),max(X), 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X,y ,color = 'red')
plt.plot(X_grid,lin_reg2.predict( poly_reg.fit_transform(X_grid)), color='blue')
plt.title('linear regression model')
plt.xlabel('positive level')
plt.ylabel('salary')
plt.show()

It should run without any errors!

TypeError                                 Traceback (most recent call last)

<ipython-input-24-428026f3698c> in <module>()
----> 1 x_grid = np.arange(min(x),max(x),0.1)
      2 print(x_grid, x)
      3 x_grid = x_grid.reshape((len(x_grid),1))
      4 
      5 plt.scatter(x, y, color = 'red')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

2 replies | until 5 years ago

hpaulj 6 years ago

np.arange(min(X),max(X), 0.1)

This must be failing because min(X) and max(X) are strings.

In [385]: np.arange('123','125')                                                                                
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-385-0a55b396a7c3> in <module>
----> 1 np.arange('123','125')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Since X is a pandas object (a DataFrame or a Series?), that is not surprising:

X = dataset.iloc[:, 1:2]

np.arange(np.array('123'), np.array('125')) raises a different error, about the 'U3' dtype.

The fact that the LinearRegression calls work with this X is a bit puzzling, but I don't know how it sanitizes its inputs.

In any case, I would check min(X) before the arange call and look at its value and type. If it is a string, then explore X further.
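
That check could look roughly like the sketch below. The small DataFrame is a hypothetical stand-in for Position_Salaries.csv (its values are invented for illustration); only the slicing expression is taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv (values are made up).
dataset = pd.DataFrame({
    "Position": ["Analyst", "Manager", "CEO"],
    "Level": [1, 2, 3],
    "Salary": [45000, 60000, 100000],
})
X = dataset.iloc[:, 1:2]  # same slice as in the question

start = min(X)                    # what np.arange would receive as `start`
print(repr(start), type(start))   # inspect both the value and its type
print(type(X))                    # DataFrame or Series?
print(X.dtypes)                   # dtype of every column in X
```

If the printed type is str, the problem lies in what min(X) iterates over, not in np.arange itself.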

In a comment you say: there are two columns and all have integers from 1-10 and 45k to 100k. Is 45k an integer or a string?

Let's run a test on a dummy dataframe:

In [392]: df = pd.DataFrame([[1,45000],[2,46000],[3,47000]], columns=('A','B'))                                 
In [393]: df                                                                                                    
Out[393]: 
   A      B
0  1  45000
1  2  46000
2  3  47000
In [394]: min(df)                                                                                               
Out[394]: 'A'
In [395]: max(df)                                                                                               
Out[395]: 'B'

min and max return strings here, taken from the column names.
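
The reason, as far as I can tell, is that iterating over a DataFrame yields its column labels, and the built-in min and max simply iterate over their argument. A minimal sketch using the same dummy dataframe:

```python
import pandas as pd

df = pd.DataFrame([[1, 45000], [2, 46000], [3, 47000]], columns=("A", "B"))

labels = list(df)          # iterating a DataFrame yields its column labels
smallest = min(df)         # so the built-in min compares the labels
col_min = df["A"].min()    # the numeric minimum of one column

print(labels, smallest, col_min)
```

Selecting a single column first, then calling its .min() method, gives a number instead of a label.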

The fit functions are probably working with the array values of the dataframe:

In [397]: df.to_numpy()                                                                                         
Out[397]: 
array([[    1, 45000],
       [    2, 46000],
       [    3, 47000]])

Don't assume things should work! Test, debug, and print suspicious values.

The NumPy versions of min/max, in contrast, return numeric values per column:

In [399]: np.min(df)      # delegates to df.min()                                                                                      
Out[399]: 
A        1
B    45000
dtype: int64
In [400]: np.max(df)                                                                                            
Out[400]: 
A        3
B    47000
dtype: int64

Though neither of these is a suitable input to arange.

What exactly do you intend to produce with this arange call?

arange over the range of one column of the dataframe works:

In [405]: np.arange(np.min(df['A']), np.max(df['A']),.1)                                                        
Out[405]: 
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
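
Applied to the question's code, the same idea might look like the sketch below. The DataFrame is a hypothetical stand-in for Position_Salaries.csv, and selecting the column by position with X.iloc[:, 0] is my assumption, chosen so the code does not depend on the column name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv (values are made up).
dataset = pd.DataFrame({
    "Position": ["Analyst", "Consultant", "Manager", "CEO"],
    "Level": [1, 2, 3, 4],
    "Salary": [45000, 50000, 60000, 100000],
})
X = dataset.iloc[:, 1:2]   # one-column DataFrame, as in the question

level = X.iloc[:, 0]       # the single column as a Series
# Series.min()/max() return numbers, so arange gets numeric bounds.
X_grid = np.arange(level.min(), level.max(), 0.1)
X_grid = X_grid.reshape(-1, 1)   # column vector, the shape predict() expects

print(X_grid.shape)
```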

Linda 5 years ago

X_grid = np.arange(min(X['Level']), max(X['Level']), 0.01, dtype=float)
X_grid = X_grid.reshape((len(X_grid), 1))

#plotting
plt.scatter(X,y, color = 'red')
plt.plot(X_grid, lin_reg2.predict(poly_reg.fit_transform(X_grid)), color = 'blue')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Position Level')
plt.ylabel('Salary')

JChao 6 years ago

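You need to make sure your inputs are of the correct type. It looks to me as if both operands are of type str. Maybe convert them with float(x) or a similar function?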
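
If a column really does hold strings, a conversion along these lines may help. This is only a sketch: the sample values are invented, and pd.to_numeric with errors="coerce" is one option among several, not something the question's data is known to need.

```python
import pandas as pd

# Hypothetical column that was read in as strings.
raw = pd.Series(["1", "2", "10"], name="Level")

lexical_max = max(raw)        # strings compare lexicographically
as_float = raw.astype(float)  # convert every entry to a float
numeric_max = as_float.max()

# Entries that cannot be parsed (such as "45k") become NaN instead of raising.
messy = pd.Series(["45000", "45k"])
coerced = pd.to_numeric(messy, errors="coerce")

print(lexical_max, numeric_max)
print(coerced)
```

Note that the largest string is "2", not "10", which is one more reason to convert before computing a range.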

Akaisteph7 6 years ago

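You should check what is in X and y. They are probably Series objects containing strings. What you need to do is extract the values in X and y and convert them to float/int before doing anything with them.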

X = dataset.iloc[:, 1:2].astype(float)
y = dataset.iloc[:, 2].astype(float)
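
One caveat worth checking: the conversion changes the dtypes, but with the 1:2 slice X is still a DataFrame, so the built-in min(X) still iterates over column labels. The sketch below uses a hypothetical stand-in for the CSV with string-typed numbers (invented values) to show both effects.

```python
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv with numbers stored as strings.
dataset = pd.DataFrame({
    "Position": ["Analyst", "Manager", "CEO"],
    "Level": ["1", "2", "3"],
    "Salary": ["45000", "60000", "100000"],
})

X = dataset.iloc[:, 1:2].astype(float)
y = dataset.iloc[:, 2].astype(float)

print(X.dtypes, y.dtype)        # both are float64 after the conversion
label = min(X)                  # still a column label, because X is a DataFrame
level_min = X["Level"].min()    # numeric minimum of the column

print(repr(label), level_min)
```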

jwpfox Amit 5 years ago

Use this:

x = dataset.iloc[:, 1:2].values

y = dataset.iloc[:, -1:].values

because you only need to take the values for x and y.

Using dataset.iloc[].values leaves the Level and Salary column names out of the x and y datasets.

Hoppo Guest 5 years ago

Replace X = dataset.iloc[:, 1:2] and y = dataset.iloc[:, 2]

with

X = dataset.iloc[:, 1:2].values and y = dataset.iloc[:, 2].values
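
A short sketch of what .values changes, again on a hypothetical stand-in for the CSV (invented values). Using the array methods X.min() and X.max() instead of the built-in min/max is my suggestion here, since they return plain scalars for the 2-D array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv (values are made up).
dataset = pd.DataFrame({
    "Position": ["Analyst", "Manager", "CEO"],
    "Level": [1, 2, 3],
    "Salary": [45000, 60000, 100000],
})

X = dataset.iloc[:, 1:2].values   # NumPy array of shape (n, 1), no column labels
y = dataset.iloc[:, 2].values     # NumPy array of shape (n,)

# The array methods return scalars, which arange accepts as bounds.
X_grid = np.arange(X.min(), X.max(), 0.1).reshape(-1, 1)

print(type(X), X.shape, y.shape, X_grid.shape)
```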