
Building a Tweedie regression model with xgboost

  •  2
  • Bastien  · Tech community  · 7 years ago

    I want to use xgboost to build a Tweedie model, but I get an obscure error message.

    Here is a reproducible example:

    Prepare the data:

    library(xgboost)
    library(dplyr)
    
    set.seed(123)
    xx <- rpois(5000, 0.02)
    xx[xx>0] <- rgamma(sum(xx>0), 50)
    
    yy <- matrix(rnorm(15000), 5000,3, dimnames = list(1:5000, c("a", "b", "c")))
    
    train_test <- sample(c(0,1), 5000, replace = T)
    

    Prepare xgboost. The important settings are objective = 'reg:tweedie', eval_metric = "tweedie-nloglik", and tweedie_variance_power = 1.2:

    dtrain <- xgb.DMatrix(
      data = yy %>% subset(train_test == 0),
      label = xx %>% subset(train_test == 0)
    )
    
    dtest <- xgb.DMatrix(
      data = yy %>% subset(train_test == 1),
      label = xx %>% subset(train_test == 1)
    )
    
    watchlist <- list(eval = dtest, train = dtrain)
    
    param <- list(max.depth = 2,
                  eta = 0.3,
                  nthread = 1,
                  silent = 1,
                  objective = 'reg:tweedie',
                  eval_metric = "tweedie-nloglik",
                  tweedie_variance_power = 1.2)
    

    Finally, call xgboost:

    resBoost <- xgb.train(params = param, data=dtrain, nrounds = 20, watchlist=watchlist)
    

    This produces an obscure error message:

    Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) :
      [17:59:18] amalgamation/../src/metric/elementwise_metric.cc:168: Check failed: param != nullptr tweedie-nloglik must be in formattweedie-nloglik@rho
    
    Stack trace returned 10 entries:
    [bt] (0) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(dmlc::StackTrace[abi:cxx11]()+0x1bc) [0x7f1f0ce742ac]
    [bt] (1) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f1f0ce74e88]
    [bt] (2) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(xgboost::metric::EvalTweedieNLogLik::EvalTweedieNLogLik(char const*)+0x1eb) [0x7f1f0cea00db]
    [bt] (3) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(+0x68ef1) [0x7f1f0ce78ef1]
    [bt] (4) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(xgboost::Metric::Create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x263) [0x7f1f0ce7ede3]
    [bt] (5) /usr/local/lib/R/site-library/xgboost/libs/xgboost.so(xgboost::LearnerImpl::Configure(std::vector<std::pair
    

    The problem seems to be tied to the parameter eval_metric = "tweedie-nloglik", because if I change eval_metric to logloss, training runs:

    param$eval_metric <- "logloss"
    resBoost <- xgb.train(params = param, data=dtrain, nrounds = 20, watchlist=watchlist)
    [1]     eval-logloss:0.634391   train-logloss:0.849734
    [2]     eval-logloss:0.634391   train-logloss:0.849734
    ...
    

    Any idea how to use eval_metric = "tweedie-nloglik", which seems to me the most appropriate metric here? Thanks.

    1 answer  |  as of 7 years ago
  •  3
  •   Eran Moshe    7 years ago

    TL;DR: thanks to Frans Rodenburg's comment: use eval_metric = "tweedie-nloglik@1.2"

    I looked at the implementation of the tweedie eval metric (I don't even know what Tweedie is) at the following link

    Tweedie:

    struct EvalTweedieNLogLik: public EvalEWiseBase<EvalTweedieNLogLik> {
      explicit EvalTweedieNLogLik(const char* param) {
        CHECK(param != nullptr)
            << "tweedie-nloglik must be in format tweedie-nloglik@rho";
        rho_ = atof(param);
        CHECK(rho_ < 2 && rho_ >= 1)
            << "tweedie variance power must be in interval [1, 2)";
        std::ostringstream os;
        os << "tweedie-nloglik@" << rho_;
        name_ = os.str();
      }
      const char *Name() const override {
        return name_.c_str();
      }
      inline bst_float EvalRow(bst_float y, bst_float p) const {
        bst_float a = y * std::exp((1 - rho_) * std::log(p)) / (1 - rho_);
        bst_float b = std::exp((2 - rho_) * std::log(p)) / (2 - rho_);
        return -a + b;
      }
     protected:
      std::string name_;
      bst_float rho_;
    };
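
To make the per-row formula in EvalRow concrete, here is a minimal standalone sketch of the same arithmetic. The function name TweedieNLogLikRow is hypothetical and not part of xgboost. It assumes p > 0 and 1 < rho < 2, so that neither denominator is zero.

```cpp
#include <cmath>

// Hypothetical standalone version of the EvalRow arithmetic quoted above.
// y   : observed label
// p   : predicted mean (assumed > 0)
// rho : Tweedie variance power (assumed strictly between 1 and 2)
double TweedieNLogLikRow(double y, double p, double rho) {
  // p^(1 - rho) and p^(2 - rho), written via exp/log as in the quoted code.
  const double a = y * std::exp((1.0 - rho) * std::log(p)) / (1.0 - rho);
  const double b = std::exp((2.0 - rho) * std::log(p)) / (2.0 - rho);
  return -a + b;
}
```

The value depends on rho, so the metric cannot be evaluated without it. That is presumably why the metric name has to carry the value.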
    

    Logloss:

    struct EvalLogLoss : public EvalEWiseBase<EvalLogLoss> {
      const char *Name() const override {
        return "logloss";
      }
      inline bst_float EvalRow(bst_float y, bst_float py) const {
        const bst_float eps = 1e-16f;
        const bst_float pneg = 1.0f - py;
        if (py < eps) {
          return -y * std::log(eps) - (1.0f - y)  * std::log(1.0f - eps);
        } else if (pneg < eps) {
          return -y * std::log(1.0f - eps) - (1.0f - y)  * std::log(eps);
        } else {
          return -y * std::log(py) - (1.0f - y) * std::log(pneg);
        }
      }
    };
    

    It looks like EvalTweedieNLogLik expects an argument called param. Your error message points at exactly this check:

    CHECK(param != nullptr)
        << "tweedie-nloglik must be in format tweedie-nloglik@rho";
    

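    When I compare it with EvalLogLoss, the relevant difference is that EvalLogLoss takes no argument, which is why it works.

    Thanks to @Frans Rodenburg's comment, I kept searching and found an example of how to use it here .

    Use eval_metric = "tweedie-nloglik@1.2"

    I made the same mistake when I first read these lines in the xgboost documentation:

    tweedie-nloglik: negative log-likelihood for Tweedie regression (at a specified value of the tweedie_variance_power parameter)

    That wording may only apply to Python.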
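
As a rough illustration of why the bare name fails, here is a minimal sketch that assumes the metric name is split at the first '@' and that the constructor receives no parameter when there is no '@'. ParsedMetric, SplitMetricName and IsUsableTweedieMetric are hypothetical names for this example, not xgboost's API. The two conditions mirror the CHECKs in the EvalTweedieNLogLik constructor quoted earlier.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper type for illustration; not part of xgboost's API.
struct ParsedMetric {
  std::string name;   // text before '@'
  bool has_param;     // false when the name contains no '@'
  std::string param;  // text after '@' (empty when has_param is false)
};

// Split "name@param" at the first '@'.
ParsedMetric SplitMetricName(const std::string& full) {
  const std::string::size_type pos = full.find('@');
  if (pos == std::string::npos) {
    return ParsedMetric{full, false, ""};
  }
  return ParsedMetric{full.substr(0, pos), true, full.substr(pos + 1)};
}

// Applies the same two conditions as the quoted CHECKs:
// a parameter must be present, and rho must lie in [1, 2).
bool IsUsableTweedieMetric(const std::string& full) {
  const ParsedMetric m = SplitMetricName(full);
  if (m.name != "tweedie-nloglik" || !m.has_param) {
    return false;
  }
  const double rho = std::atof(m.param.c_str());
  return rho >= 1.0 && rho < 2.0;
}
```

Under these assumptions, "tweedie-nloglik" is rejected because it carries no parameter, "tweedie-nloglik@1.2" is accepted, and "tweedie-nloglik@2.5" is rejected by the range check.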
