
Computing precision and recall for a multi-class dataset from a CSV

precision-recall machine-learning python

Steve · 8 years ago

I need to compute precision and recall from a CSV that contains multi-class classification results.

More specifically, my CSV is structured as follows (true class first, predicted class second):

real_class1, classified_class1
real_class2, classified_class3
real_class3, classified_class4
real_class4, classified_class2

There are six classes in total.

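In the binary case, I have no trouble understanding how to count true positives, false positives, true negatives, and false negatives. For a multi-class problem, though, I don't know how to proceed.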

Could someone show me an example, preferably in Python?

1 answer | 8 years ago

-2

Aso Strife · 8 years ago

As suggested in the comments, you have to build the confusion matrix and then follow these steps:

(I assume you are using Spark to get better performance for the machine-learning processing.)

from __future__ import division  # only needed on Python 2

import numpy as np
from pyspark import SparkConf, SparkContext
from sklearn.metrics import confusion_matrix


def getFirstColumn(line):
    # True class: first comma-separated field, surrounding whitespace removed
    parts = line.split(',')
    return parts[0].strip()


def getSecondColumn(line):
    # Predicted class: second comma-separated field, surrounding whitespace removed
    parts = line.split(',')
    return parts[1].strip()


# Initialization
conf = SparkConf()
conf.setAppName("ConfusionMatrixPrecisionRecall")

sc = SparkContext(conf=conf)  # SparkContext

data = sc.textFile('YOUR_FILE_PATH')  # Load dataset

y_true = data.map(getFirstColumn).collect()  # true classes (first column)
y_pred = data.map(getSecondColumn).collect()  # predicted classes (second column)

# Use a separate name so the imported confusion_matrix function is not shadowed.
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print("Confusion matrix:\n%s" % cm)

# The True Positives are simply the diagonal elements
TP = np.diag(cm)
print("\nTP:\n%s" % TP)

# The False Positives are the sum of the respective column, minus the diagonal (i.e. TP) element
FP = np.sum(cm, axis=0) - TP
print("\nFP:\n%s" % FP)

# The False Negatives are the sum of the respective row, minus the diagonal (i.e. TP) element
FN = np.sum(cm, axis=1) - TP
print("\nFN:\n%s" % FN)

# The True Negatives for class i are everything outside row i and column i
num_classes = cm.shape[0]  # number of distinct labels found in the data
TN = []

for i in range(num_classes):
    temp = np.delete(cm, i, 0)    # delete ith row
    temp = np.delete(temp, i, 1)  # delete ith column
    TN.append(temp.sum())
print("\nTN:\n%s" % TN)

# Per-class precision and recall. A class that is never predicted (TP + FP == 0)
# or never occurs as a true class (TP + FN == 0) yields nan and a NumPy runtime warning.
precision = TP / (TP + FP)
recall = TP / (TP + FN)

print("\nPrecision:\n%s" % precision)

print("\nRecall:\n%s" % recall)
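If the file fits in memory, Spark is probably not needed for this. Below is a minimal sketch of the same per-class counting using only the Python 3 standard library. It assumes a two-column CSV with no header, true class first and predicted class second. The function name `per_class_precision_recall` and the sample labels are made up for illustration and do not come from the question.

```python
import csv
import io
from collections import Counter


def per_class_precision_recall(rows):
    """rows: iterable of (true_label, predicted_label) pairs."""
    tp = Counter()  # predicted as c and truly c
    fp = Counter()  # predicted as c but truly another class
    fn = Counter()  # truly c but predicted as another class
    labels = set()
    for true_label, pred_label in rows:
        labels.update((true_label, pred_label))
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1
            fn[true_label] += 1

    result = {}
    for c in sorted(labels):
        predicted = tp[c] + fp[c]  # column sum of the confusion matrix
        actual = tp[c] + fn[c]     # row sum of the confusion matrix
        precision = tp[c] / predicted if predicted else float("nan")
        recall = tp[c] / actual if actual else float("nan")
        result[c] = (precision, recall)
    return result


# Made-up sample data in the same "true, predicted" layout as the question
sample = io.StringIO(
    "cat, cat\n"
    "cat, dog\n"
    "dog, dog\n"
    "dog, dog\n"
    "bird, cat\n"
    "bird, bird\n"
)
reader = csv.reader(sample, skipinitialspace=True)
rows = [(row[0].strip(), row[1].strip()) for row in reader if row]
metrics = per_class_precision_recall(rows)
for label, (precision, recall) in metrics.items():
    print("%s: precision=%.3f recall=%.3f" % (label, precision, recall))
```

To run it on a real file, replace the `io.StringIO(...)` sample with `open(path, newline="")`. If scikit-learn is available anyway, `precision_score(y_true, y_pred, average=None)` and `recall_score(y_true, y_pred, average=None)` from `sklearn.metrics` should return the same per-class values without building the counts by hand.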
