
scikit-learn - ROC curve with confidence intervals

  •  36
  • user2836189  · 12 years ago

    I am able to get a ROC curve using scikit-learn with fpr, tpr, thresholds = metrics.roc_curve(y_true, y_pred, pos_label=1), where y_true is a list of values based on my gold standard (i.e., 0 for negative and 1 for positive cases) and y_pred is a corresponding list of scores (e.g., 0.053497243, 0.008521122, 0.022781548, 0.101885263, 0.012913795, 0.0, 0.042881547 [...])

    I am trying to figure out how to add confidence intervals to that curve, but haven't found any easy way to do this with sklearn.

    2 Answers  |  latest 9 years ago
        1
  •  41
  •   ogrisel  · 4 years ago

    You can bootstrap the ROC computation (sample with replacement new versions of y_true / y_pred out of the original y_true / y_pred and recompute a new value for roc_curve each time) and estimate a confidence interval this way.

    To take the variability induced by the train/test split into account, you can also use the ShuffleSplit CV iterator many times, fit a model on each train split, generate y_pred for each model, and thereby collect an empirical distribution of roc_curves for them, and finally compute confidence intervals for those.
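The ShuffleSplit idea above can be sketched as follows. Note this is a minimal sketch, not the original author's code: the synthetic dataset and the LogisticRegression model are illustrative assumptions, and only the per-split AUC is collected (the full roc_curve could be collected at the marked line instead):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ShuffleSplit

# Illustrative data and model: replace with your own
X, y = make_classification(n_samples=500, random_state=0)
cv = ShuffleSplit(n_splits=20, test_size=0.25, random_state=0)

aucs = []
for train_idx, test_idx in cv.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    y_score = model.predict_proba(X[test_idx])[:, 1]
    # roc_curve(y[test_idx], y_score) could be collected here to plot a band
    aucs.append(roc_auc_score(y[test_idx], y_score))

aucs = np.sort(aucs)
# empirical bounds across splits (with only 20 splits these are min/max)
lower = aucs[int(0.025 * len(aucs))]
upper = aucs[int(0.975 * len(aucs))]
print("AUC over splits: {:0.3f} [{:0.3f} - {:0.3f}]".format(
    np.mean(aucs), lower, upper))
```

With so few splits the percentile bounds degenerate to the min/max AUC; more splits give a smoother interval at the cost of training more models.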

    Edit: bootstrapping in Python

    Here is an example that bootstraps the ROC AUC score out of the predictions of a single model. I chose to bootstrap the ROC AUC to make it easier to follow as a Stack Overflow answer, but it can be adapted to bootstrap the whole curve instead:

    import numpy as np
    from scipy.stats import sem
    from sklearn.metrics import roc_auc_score
    
    y_pred = np.array([0.21, 0.32, 0.63, 0.35, 0.92, 0.79, 0.82, 0.99, 0.04])
    y_true = np.array([0,    1,    0,    0,    1,    1,    0,    1,    0   ])
    
    print("Original ROC area: {:0.3f}".format(roc_auc_score(y_true, y_pred)))
    
    n_bootstraps = 1000
    rng_seed = 42  # control reproducibility
    bootstrapped_scores = []
    
    rng = np.random.RandomState(rng_seed)
    for i in range(n_bootstraps):
        # bootstrap by sampling with replacement on the prediction indices
        indices = rng.randint(0, len(y_pred), len(y_pred))
        if len(np.unique(y_true[indices])) < 2:
            # We need at least one positive and one negative sample for ROC AUC
            # to be defined: reject the sample
            continue
    
        score = roc_auc_score(y_true[indices], y_pred[indices])
        bootstrapped_scores.append(score)
        print("Bootstrap #{} ROC area: {:0.3f}".format(i + 1, score))
    

    You can see that we need to reject some invalid resamples. However, on real data with many predictions this is a very rare event and should not impact the confidence interval significantly (you can try varying rng_seed to check).

    Here is the histogram:

    import matplotlib.pyplot as plt
    plt.hist(bootstrapped_scores, bins=50)
    plt.title('Histogram of the bootstrapped ROC AUC scores')
    plt.show()
    

    [Figure: histogram of the bootstrapped ROC AUC scores]

    Note that the resampled scores are censored in the [0 - 1] range, causing a high number of scores in the last bin.

    To get a confidence interval, one can sort the samples:

    sorted_scores = np.array(bootstrapped_scores)
    sorted_scores.sort()
    
    # Computing the lower and upper bound of the 90% confidence interval
    # You can change the bounds percentiles to 0.025 and 0.975 to get
    # a 95% confidence interval instead.
    confidence_lower = sorted_scores[int(0.05 * len(sorted_scores))]
    confidence_upper = sorted_scores[int(0.95 * len(sorted_scores))]
    print("Confidence interval for the score: [{:0.3f} - {:0.3}]".format(
        confidence_lower, confidence_upper))
    

    which gives:

    Confidence interval for the score: [0.444 - 1.0]
    

    The confidence interval is very wide, but this is probably a consequence of my choice of predictions (3 mistakes out of 9 predictions) and the small total number of predictions.

    One more remark on the plot: the scores are quantized (many empty histogram bins). This is a consequence of the small number of predictions. One could introduce a bit of Gaussian noise on the scores (or the y_pred values) to smooth the distribution and make the histogram look better. But then the choice of the smoothing bandwidth is tricky.
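A minimal sketch of that jittering idea (the `scores` array and the bandwidth `sigma` below are illustrative stand-ins for the bootstrapped scores above, and clipping back into [0, 1] is my own assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for the quantized bootstrapped AUC scores collected above:
# only 13 distinct values, so the histogram shows isolated spikes
scores = rng.choice(np.round(np.linspace(0.4, 1.0, 13), 3), size=1000)

sigma = 0.01  # smoothing bandwidth: ad hoc and admittedly tricky to pick
smoothed = np.clip(scores + rng.normal(0.0, sigma, size=scores.shape), 0.0, 1.0)
# `smoothed` now has a near-continuous distribution instead of 13 spikes
```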

    Finally, as mentioned earlier, this confidence interval is specific to your training set. To get a better estimate of the variability of the ROC induced by your model class and parameters, you should do iterated cross-validation instead. However, this is often much more costly, as you need to train a new model for each random train/test split.

    Edit: since I first wrote this reply, there is a bootstrap implementation directly in scipy:

    https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bootstrap.html
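As a sketch of how scipy.stats.bootstrap could be applied here (the synthetic y_true / y_pred below are illustrative placeholders, made large enough that single-class resamples are vanishingly rare):

```python
import numpy as np
from scipy.stats import bootstrap
from sklearn.metrics import roc_auc_score

# Illustrative predictions: scores loosely correlated with the labels
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=200)
y_pred = np.clip(0.5 * y_true + rng.normal(0.25, 0.2, size=200), 0.0, 1.0)

# paired=True resamples (y_true, y_pred) pairs together, the same
# scheme as the manual bootstrap loop above
res = bootstrap(
    (y_true, y_pred),
    roc_auc_score,
    paired=True,
    vectorized=False,
    n_resamples=1000,
    confidence_level=0.95,
    method="percentile",
    random_state=42,
)
print("95% CI:", res.confidence_interval)
```

The `method="percentile"` choice mirrors the sorted-percentiles approach above; scipy's default BCa method applies an additional bias correction.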

        2
  •  29
  •   Raul  · 4 years ago

    DeLong solution [no bootstrapping]

    As some here have suggested, the pROC package in R comes in very handy for ROC AUC confidence intervals out of the box, but that package is not available in Python. According to the pROC documentation, confidence intervals are calculated via DeLong:

    DeLong is an asymptotically exact method to evaluate the uncertainty of an AUC (DeLong et al. (1988)). Since version 1.9, pROC uses the algorithm proposed by Sun and Xu (2014) which has an O(N log N) complexity and is always faster than bootstrapping. By default, pROC will choose the DeLong method whenever possible.

    Luckily for us, Yandex Data School has a Fast DeLong implementation in their public repo:

    https://github.com/yandexdataschool/roc_comparison

    So all credit for the DeLong implementation used in this example goes to them. Here is how you get the CI via DeLong:

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    Created on Tue Nov  6 10:06:52 2018
    
    @author: yandexdataschool
    
    Original Code found in:
    https://github.com/yandexdataschool/roc_comparison
    
    updated: Raul Sanchez-Vazquez
    """
    
    import numpy as np
    import scipy.stats
    from scipy import stats
    
    # AUC comparison adapted from
    # https://github.com/Netflix/vmaf/
    def compute_midrank(x):
        """Computes midranks.
        Args:
           x - a 1D numpy array
        Returns:
           array of midranks
        """
        J = np.argsort(x)
        Z = x[J]
        N = len(x)
        T = np.zeros(N, dtype=np.float64)
        i = 0
        while i < N:
            j = i
            while j < N and Z[j] == Z[i]:
                j += 1
            T[i:j] = 0.5*(i + j - 1)
            i = j
        T2 = np.empty(N, dtype=np.float64)
        # Note(kazeevn) +1 is due to Python using 0-based indexing
        # instead of 1-based in the AUC formula in the paper
        T2[J] = T + 1
        return T2
    
    
    def compute_midrank_weight(x, sample_weight):
        """Computes midranks with sample weights.
        Args:
           x - a 1D numpy array
           sample_weight - a 1D numpy array of sample weights
        Returns:
           array of midranks
        """
        J = np.argsort(x)
        Z = x[J]
        cumulative_weight = np.cumsum(sample_weight[J])
        N = len(x)
        T = np.zeros(N, dtype=np.float64)
        i = 0
        while i < N:
            j = i
            while j < N and Z[j] == Z[i]:
                j += 1
            T[i:j] = cumulative_weight[i:j].mean()
            i = j
        T2 = np.empty(N, dtype=np.float64)
        T2[J] = T
        return T2
    
    
    def fastDeLong(predictions_sorted_transposed, label_1_count, sample_weight):
        if sample_weight is None:
            return fastDeLong_no_weights(predictions_sorted_transposed, label_1_count)
        else:
            return fastDeLong_weights(predictions_sorted_transposed, label_1_count, sample_weight)
    
    
    def fastDeLong_weights(predictions_sorted_transposed, label_1_count, sample_weight):
        """
        The fast version of DeLong's method for computing the covariance of
        unadjusted AUC.
        Args:
           predictions_sorted_transposed: a 2D numpy.array[n_classifiers, n_examples]
              sorted such as the examples with label "1" are first
        Returns:
           (AUC value, DeLong covariance)
        Reference:
         @article{sun2014fast,
           title={Fast Implementation of DeLong's Algorithm for
               Comparing the Areas Under Correlated Receiver Operating Characteristic Curves},
           author={Xu Sun and Weichao Xu},
           journal={IEEE Signal Processing Letters},
           volume={21},
           number={11},
           pages={1389--1393},
           year={2014},
           publisher={IEEE}
         }
        """
        # Short variables are named as they are in the paper
        m = label_1_count
        n = predictions_sorted_transposed.shape[1] - m
        positive_examples = predictions_sorted_transposed[:, :m]
        negative_examples = predictions_sorted_transposed[:, m:]
        k = predictions_sorted_transposed.shape[0]
    
        tx = np.empty([k, m], dtype=np.float64)
        ty = np.empty([k, n], dtype=np.float64)
        tz = np.empty([k, m + n], dtype=np.float64)
        for r in range(k):
            tx[r, :] = compute_midrank_weight(positive_examples[r, :], sample_weight[:m])
            ty[r, :] = compute_midrank_weight(negative_examples[r, :], sample_weight[m:])
            tz[r, :] = compute_midrank_weight(predictions_sorted_transposed[r, :], sample_weight)
        total_positive_weights = sample_weight[:m].sum()
        total_negative_weights = sample_weight[m:].sum()
        pair_weights = np.dot(sample_weight[:m, np.newaxis], sample_weight[np.newaxis, m:])
        total_pair_weights = pair_weights.sum()
        aucs = (sample_weight[:m]*(tz[:, :m] - tx)).sum(axis=1) / total_pair_weights
        v01 = (tz[:, :m] - tx[:, :]) / total_negative_weights
        v10 = 1. - (tz[:, m:] - ty[:, :]) / total_positive_weights
        sx = np.cov(v01)
        sy = np.cov(v10)
        delongcov = sx / m + sy / n
        return aucs, delongcov
    
    
    def fastDeLong_no_weights(predictions_sorted_transposed, label_1_count):
        """
        The fast version of DeLong's method for computing the covariance of
        unadjusted AUC.
        Args:
           predictions_sorted_transposed: a 2D numpy.array[n_classifiers, n_examples]
              sorted such as the examples with label "1" are first
        Returns:
           (AUC value, DeLong covariance)
        Reference:
         @article{sun2014fast,
           title={Fast Implementation of DeLong's Algorithm for
               Comparing the Areas Under Correlated Receiver Operating
                  Characteristic Curves},
           author={Xu Sun and Weichao Xu},
           journal={IEEE Signal Processing Letters},
           volume={21},
           number={11},
           pages={1389--1393},
           year={2014},
           publisher={IEEE}
         }
        """
        # Short variables are named as they are in the paper
        m = label_1_count
        n = predictions_sorted_transposed.shape[1] - m
        positive_examples = predictions_sorted_transposed[:, :m]
        negative_examples = predictions_sorted_transposed[:, m:]
        k = predictions_sorted_transposed.shape[0]
    
        tx = np.empty([k, m], dtype=np.float64)
        ty = np.empty([k, n], dtype=np.float64)
        tz = np.empty([k, m + n], dtype=np.float64)
        for r in range(k):
            tx[r, :] = compute_midrank(positive_examples[r, :])
            ty[r, :] = compute_midrank(negative_examples[r, :])
            tz[r, :] = compute_midrank(predictions_sorted_transposed[r, :])
        aucs = tz[:, :m].sum(axis=1) / m / n - float(m + 1.0) / 2.0 / n
        v01 = (tz[:, :m] - tx[:, :]) / n
        v10 = 1.0 - (tz[:, m:] - ty[:, :]) / m
        sx = np.cov(v01)
        sy = np.cov(v10)
        delongcov = sx / m + sy / n
        return aucs, delongcov
    
    
    def calc_pvalue(aucs, sigma):
        """Computes log(10) of p-values.
        Args:
           aucs: 1D array of AUCs
           sigma: AUC DeLong covariances
        Returns:
           log10(pvalue)
        """
        l = np.array([[1, -1]])
        z = np.abs(np.diff(aucs)) / np.sqrt(np.dot(np.dot(l, sigma), l.T))
        return np.log10(2) + scipy.stats.norm.logsf(z, loc=0, scale=1) / np.log(10)
    
    
    def compute_ground_truth_statistics(ground_truth, sample_weight):
        assert np.array_equal(np.unique(ground_truth), [0, 1])
        order = (-ground_truth).argsort()
        label_1_count = int(ground_truth.sum())
        if sample_weight is None:
            ordered_sample_weight = None
        else:
            ordered_sample_weight = sample_weight[order]
    
        return order, label_1_count, ordered_sample_weight
    
    
    def delong_roc_variance(ground_truth, predictions, sample_weight=None):
        """
        Computes ROC AUC variance for a single set of predictions
        Args:
           ground_truth: np.array of 0 and 1
           predictions: np.array of floats of the probability of being class 1
        """
        order, label_1_count, ordered_sample_weight = compute_ground_truth_statistics(
            ground_truth, sample_weight)
        predictions_sorted_transposed = predictions[np.newaxis, order]
        aucs, delongcov = fastDeLong(predictions_sorted_transposed, label_1_count, ordered_sample_weight)
        assert len(aucs) == 1, "There is a bug in the code, please forward this to the developers"
        return aucs[0], delongcov
    
    
    alpha = .95
    y_pred = np.array([0.21, 0.32, 0.63, 0.35, 0.92, 0.79, 0.82, 0.99, 0.04])
    y_true = np.array([0,    1,    0,    0,    1,    1,    0,    1,    0   ])
    
    auc, auc_cov = delong_roc_variance(
        y_true,
        y_pred)
    
    auc_std = np.sqrt(auc_cov)
    lower_upper_q = np.abs(np.array([0, 1]) - (1 - alpha) / 2)
    
    ci = stats.norm.ppf(
        lower_upper_q,
        loc=auc,
        scale=auc_std)
    
    ci[ci > 1] = 1
    
    print('AUC:', auc)
    print('AUC COV:', auc_cov)
    print('95% AUC CI:', ci)
    

    Output:

    AUC: 0.8
    AUC COV: 0.028749999999999998
    95% AUC CI: [0.46767194, 1.]
    

    I also checked that this implementation matches the pROC results obtained from R:

    library(pROC)
    
    y_true = c(0,    1,    0,    0,    1,    1,    0,    1,    0)
    y_pred = c(0.21, 0.32, 0.63, 0.35, 0.92, 0.79, 0.82, 0.99, 0.04)
    
    # Build a ROC object and compute the AUC
    roc = roc(y_true, y_pred)
    roc
    

    Output:

    Call:
    roc.default(response = y_true, predictor = y_pred)
    
    Data: y_pred in 5 controls (y_true 0) < 4 cases (y_true 1).
    Area under the curve: 0.8
    

    and then

    # Compute the Confidence Interval
    ci(roc)
    

    Output:

    95% CI: 0.4677-1 (DeLong)