
Calculating the root mean squared error (RMSE) for a regression

linear-regression regression dataframe pandas python

gabboshow · 6 years ago

Suppose I have the following pandas DataFrame for a regression analysis.

import pandas
import math
import numpy

df = pandas.DataFrame(numpy.random.randint(0,100,size=(100, 2)), columns=['labels','predictions'])

I now want to compute the RMSE as

math.sqrt(numpy.mean((df["predictions"] - df["labels"]) ** 2))

separately for each interval of 7 units of the label values.
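To make the target concrete, here is a minimal sketch of that calculation for a single interval. The hand-made frame `example` and the bounds `low` and `high` are illustrative assumptions, not part of the data above.

```python
import math

import numpy
import pandas

# Hypothetical toy data so the result can be checked by hand.
example = pandas.DataFrame({"labels": [1, 4, 6, 9], "predictions": [2, 2, 9, 9]})

# Keep only rows whose label falls in the half-open interval [low, high).
low, high = 0, 7
in_interval = example[(example["labels"] >= low) & (example["labels"] < high)]

# RMSE = sqrt(mean((prediction - label)^2)) over the selected rows.
rmse_single = math.sqrt(
    numpy.mean((in_interval["predictions"] - in_interval["labels"]) ** 2)
)
print(rmse_single)
```

Three rows fall in the interval, with errors 1, -2 and 3, so the value is the square root of 14/3.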

My own code below is very ugly, but it does the job. It would be great if you could help me make it more Pythonic.

# define step
step = 7
# initialize counter
idx = 0
# initialize empty dataframe
rmse = pandas.DataFrame(columns=['bout' , 'rmse'],index=range(0,len(range(int(df['labels'].min())+step,int(df['labels'].max()),step))))

# start loop to calculate rmse every 7 units
for i in range(int(df['labels'].min())+step,int(df['labels'].max()),step):

    # select values in interval
    df_bout = df[(df['labels']>=i-step) & (df['labels']<i)]

    # calculate rmse in interval
    rmse.loc[idx] = [str(i-step)+'-'+str(i),math.sqrt(numpy.mean((df_bout.predictions - df_bout.labels) ** 2))]

    # increment counter
    idx = idx + 1

1 answer | active 6 years ago

Lukas Thaler · 6 years ago

I apologize for the initial misunderstanding. The following snippet gives the result you want:

from sklearn.metrics import mean_squared_error
import pandas
import math
import numpy

df = pandas.DataFrame(numpy.random.randint(0, 100, size = (100, 2)), columns = ['labels','predictions']).sort_values(by = 'labels', ascending = True)
def rmse(df):
    return numpy.sqrt(mean_squared_error(df['labels'], df['predictions']))

output = df.groupby(numpy.floor(numpy.array(df['labels'] / 7))).apply(rmse)
rmse_df = pandas.DataFrame({'bout': [str(int(output.index[i] * 7)) + ' - ' + str(int(output.index[i + 1] * 7)) for i in range(output.shape[0] - 1)], 'rmse': output.values[:-1]})

If you want to change the step size dynamically, you can replace the 7s in my code with a variable `step`.
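As a rough sketch of that suggestion, the step can be passed as a function argument. The function name `rmse_by_interval` is hypothetical, and this variant differs from the snippet above in two ways: it labels each bin from its own index, so an empty interval cannot shift the labels, and it keeps the last (possibly partial) interval rather than dropping it. Bins start at 0, not at the minimum label.

```python
import numpy as np
import pandas as pd


def rmse_by_interval(df, step=7):
    """Return one row per label interval [k*step, (k+1)*step) with its RMSE."""
    # Integer bin index for each row; bins are aligned to multiples of `step`.
    bins = (df["labels"] // step).astype(int)
    # Squared error per row, averaged within each bin, then square-rooted.
    squared_error = (df["predictions"] - df["labels"]) ** 2
    rmse = np.sqrt(squared_error.groupby(bins).mean())
    return pd.DataFrame({
        "bout": [f"{k * step} - {(k + 1) * step}" for k in rmse.index],
        "rmse": rmse.to_numpy(),
    })


# Same shape of random data as in the question, seeded for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 2)), columns=["labels", "predictions"])
rmse_df = rmse_by_interval(df, step=7)
print(rmse_df)
```

Calling `rmse_by_interval(df, step=10)` would give the same table for intervals of 10 units.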
