
Exporting statsmodels summary() to .png

  • Giampaolo Levorato · Tech Community · 10 months ago

    I trained a GLM as follows:

        import statsmodels.api as sm
        import statsmodels.formula.api as smf
        fitGlm = smf.glm(listOfInModelFeatures, family=sm.families.Binomial(),
                         data=train, freq_weights=train['sampleWeight']).fit()
    

    The results look good:

    print(fitGlm.summary())
    
                     Generalized Linear Model Regression Results                  
    ==============================================================================
    Dep. Variable:                 Target   No. Observations:              1065046
    Model:                            GLM   Df Residuals:               4361437.81
    Model Family:                Binomial   Df Model:                            7
    Link Function:                  Logit   Scale:                          1.0000
    Method:                          IRLS   Log-Likelihood:            -6.0368e+05
    Date:                Sun, 25 Aug 2024   Deviance:                   1.2074e+06
    Time:                        09:03:54   Pearson chi2:                 4.12e+06
    No. Iterations:                     8   Pseudo R-squ. (CS):             0.1716
    Covariance Type:            nonrobust                                         
    ===========================================================================================
                                  coef    std err          z      P>|z|      [0.025      0.975]
    -------------------------------------------------------------------------------------------
    Intercept                   3.2530      0.003   1074.036      0.000       3.247       3.259
    feat1                       0.6477      0.004    176.500      0.000       0.641       0.655
    feat2                       0.3939      0.006     71.224      0.000       0.383       0.405
    feat3                       0.1990      0.007     28.294      0.000       0.185       0.213
    feat4                       0.4932      0.009     54.614      0.000       0.476       0.511
    feat5                       0.4477      0.005     90.323      0.000       0.438       0.457
    feat6                       0.3031      0.005     57.572      0.000       0.293       0.313
    feat7                       0.3711      0.004     87.419      0.000       0.363       0.379
    ===========================================================================================
    

    Then I tried to export the summary() output to a .png, as suggested here:

    Python: How to save statsmodels results as image file?

    So I wrote this code:

        import matplotlib.pyplot as plt

        fig, ax = plt.subplots(figsize=(16, 8))
        summary = []
        fitGlm.summary(print_fn=lambda x: summary.append(x))
        summary = '\n'.join(summary)
        ax.text(0.01, 0.05, summary, fontfamily='monospace', fontsize=12)
        ax.axis('off')
        plt.tight_layout()
        plt.savefig('output.png', dpi=300, bbox_inches='tight')
    

    But I got this error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    Cell In[57], line 57
         55 fig, ax = plt.subplots(figsize=(16, 8))
         56 summary = []
    ---> 57 fitGlm.summary(print_fn=lambda x: summary.append(x))
         58 summary = '\n'.join(summary)
         59 ax.text(0.01, 0.05, summary, fontfamily='monospace', fontsize=12)
    
    TypeError: GLMResults.summary() got an unexpected keyword argument 'print_fn'
    

    It looks like print_fn is not recognized by statsmodels?

    Can anyone help me?

  • DanielL · 10 months ago

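    I set up a test to see where print_fn could be used. I also checked the solution posted on the linked question, but I couldn't find print_fn anywhere in the statsmodels documentation: GLMResults.summary() accepts no such argument and simply returns a Summary object.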

    I tried converting the summary to a table so that it can be saved as a PNG:

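    import io

    import matplotlib.pyplot as plt
    import pandas as pd

    # `model` is the fitted results object (fitGlm in the question).
    # Convert the summary table to a pandas DataFrame;
    # change tables[0] to tables[1] to get the coefficient table.
    # pd.read_html needs lxml (or bs4 + html5lib); wrapping the HTML in StringIO
    # avoids the deprecated literal-HTML-string input in recent pandas.
    html = model.summary().tables[0].as_html()
    summary_df = pd.read_html(io.StringIO(html), header=0)[0]

    # Get the headers (no index_col, so the first column of row labels is kept);
    # blank out pandas' "Unnamed: 0" placeholder for an empty header cell
    headers = ['' if c.startswith('Unnamed') else c for c in map(str, summary_df.columns)]

    # Convert the DataFrame to a list of lists (empty cells become '') and add the headers
    summary_list = [headers] + summary_df.fillna('').values.tolist()

    # Create a new figure and remove the axes
    fig, ax = plt.subplots()
    ax.axis('off')

    # Add a table to the figure
    table = ax.table(cellText=summary_list, loc='center')

    # Use a fixed font size and stretch the rows a bit
    table.auto_set_font_size(False)
    table.set_fontsize(10)
    table.scale(1, 1.5)

    # Save the figure as a PNG file
    fig.savefig('summary2.png', dpi=300, bbox_inches='tight')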
    

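    In my opinion, saving data as a PNG is an unusual choice: numbers inside an image can't be copied, searched or parsed, which makes the information harder to share and reuse. statsmodels already lets you export the summary as CSV and LaTeX. If you are doing this by hand, I'd suggest exporting to CSV and copy-pasting it as an image, saving it as a .txt file, or even just taking a screenshot.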

    For reference:

    # `model` is the fitted results object (fitGlm in the question)
    summary = model.summary()

    # save as csv
    with open('summary.csv', 'w') as file:
        file.write(summary.as_csv())

    # save as txt
    with open('summary.txt', 'w') as file:
        file.write(summary.as_text())