How to provide a reproducible copy of your existing DataFrame?

jupyter-notebook dataframe pandas python-3.x python

Trenton McKinney ivirshup · 7 years ago

2018-09-18_reproducible_dataframe.ipynb

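This question was previously marked as a duplicate of How to make good reproducible pandas examples.
- Go to that question if you need to make synthetic (fake) data to share.
- The other question and its associated answers cover how to create a reproducible DataFrame.
- This question covers how to copy an existing DataFrame with `.to_clipboard`.

This may seem like an obvious question. However, many of the users asking questions about pandas are new and inexperienced.
A critical component of asking a question is How to create a Minimal, Complete, and Verifiable example, which explains the "what" and the "why", but not the "how".

For example, as the OP:

Think of this as if you have loaded a file and only need to share a small part of it to reproduce the error.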

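import pandas as pd
import numpy as np
from datetime import datetime
from string import ascii_lowercase as al

# fix the seed so the random integers are the same on every run
np.random.seed(365)
rows = 15
cols = 2
data = np.random.randint(0, 10, size=(rows, cols))
# daily dates starting today; 'D' is the canonical spelling of the daily frequency
index = pd.bdate_range(datetime.today(), freq='D', periods=rows)

# column names are the first `cols` lowercase letters: 'a', 'b'
df = pd.DataFrame(data=data, index=index, columns=list(al[:cols]))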

            a  b
2020-07-30  2  4
2020-07-31  1  5
2020-08-01  2  2
2020-08-02  9  8
2020-08-03  4  0
2020-08-04  3  3
2020-08-05  7  7
2020-08-06  7  0
2020-08-07  8  4
2020-08-08  3  2
2020-08-09  6  2
2020-08-10  6  8
2020-08-11  9  6
2020-08-12  1  6
2020-08-13  5  7

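The DataFrame could be followed by some other code that produces an error or does not produce the desired result.

What should be provided when asking a question on Stack Overflow:

- A well-written, coherent question, as formatted text
- The code that produces the error, as formatted text
- The entire error traceback, as formatted text
- Potentially, the current and expected results, as formatted text, or as an image if it is a plot
- The data, in an easily usable form, as formatted text

Do not add your data as an answer to this question.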

2 answers

cs95 abhishek58g · 5 years ago

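First: do not post images of data; text only, please.

Second: do not paste data in the comments section or as an answer; edit your question instead.

How to quickly provide sample data from a DataFrame

There is more than one way to answer this question. However, this answer is not an exhaustive solution. It provides the simplest method possible.

- Provide a link to a shareable dataset (perhaps on GitHub, or a shared file on Google). This is particularly useful if it is a large dataset and the goal is to optimize some method. The drawback is that the data may no longer be available in the future, which reduces the usefulness of the post.
- Provide the output of `df.head(10).to_clipboard(sep=',', index=True)`.

Code:

Provide the output of pandas.DataFrame.to_clipboard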

df.head(10).to_clipboard(sep=',', index=True)

Note: when the previous line of code is executed, no output appears.
Paste the clipboard into a code block in your Stack Overflow question:

,a,b
2020-07-30,2,4
2020-07-31,1,5
2020-08-01,2,2
2020-08-02,9,8
2020-08-03,4,0
2020-08-04,3,3
2020-08-05,7,7
2020-08-06,7,0
2020-08-07,8,4
2020-08-08,3,2

Someone trying to answer your question can copy this to their clipboard, and then:

df = pd.read_clipboard(sep=',')
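
When no system clipboard is available, the same pasted text can be parsed directly. The following is a minimal sketch, not part of the original answer: it assumes the pasted block is held in a string (the name `csv_text` is hypothetical) and swaps in `pd.read_csv` with `io.StringIO` in place of `pd.read_clipboard`. Passing `index_col=0` and `parse_dates=True` is one way to turn the unnamed first column back into a datetime index.

```python
import io

import pandas as pd

# Hypothetical: the first rows of the pasted sample, held in a string
csv_text = """,a,b
2020-07-30,2,4
2020-07-31,1,5
2020-08-01,2,2
"""

# Treat the unnamed first column as the index and parse it as dates
df_sample = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(df_sample)
print(df_sample.index.dtype)
```

The same `index_col` and `parse_dates` keywords can be tried with `pd.read_clipboard`, which forwards extra keyword arguments to the CSV reader.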

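To share rows other than `.head(10)`, use `.iloc`.
The following example selects the rows at positions 3 through 11 (zero-based; the slice end, 12, is excluded) and all columns: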

df.iloc[3:12, :].to_clipboard(sep=',')
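
To preview what such a slice contains before copying it, the text can be inspected with `to_csv`. This is only a sketch under stated assumptions: `df_demo` is a hypothetical stand-in frame with made-up values, and `to_csv(sep=',')` is used here on the assumption that it yields text comparable to what `to_clipboard(sep=',')` places on the clipboard.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: 15 daily rows, two integer columns
df_demo = pd.DataFrame(
    np.arange(30).reshape(15, 2),
    index=pd.date_range("2020-07-30", periods=15, freq="D"),
    columns=["a", "b"],
)

# Positions 3..11 inclusive: the slice end (12) is excluded, so 9 rows
subset = df_demo.iloc[3:12, :]

# Comma-separated text of the slice, index included
csv_text = subset.to_csv(sep=",")
print(csv_text)
```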

Additional reference: `pd.read_clipboard`

If `.to_clipboard()` does not work,
use `.to_dict()`:

# if you have a datetime column, convert it to a str
df['date'] = df['date'].astype('str')

# if you have a datetime index, convert it to a str
df.index = df.index.astype('str')

# output to a dict
df.head(10).to_dict(orient='index')

# which will look like
{'2020-07-30': {'a': 2, 'b': 4},
 '2020-07-31': {'a': 1, 'b': 5},
 '2020-08-01': {'a': 2, 'b': 2},
 '2020-08-02': {'a': 9, 'b': 8},
 '2020-08-03': {'a': 4, 'b': 0},
 '2020-08-04': {'a': 3, 'b': 3},
 '2020-08-05': {'a': 7, 'b': 7},
 '2020-08-06': {'a': 7, 'b': 0},
 '2020-08-07': {'a': 8, 'b': 4},
 '2020-08-08': {'a': 3, 'b': 2}}

# copy the previous dict and paste it into a code block on SO
# the dict can be converted back to a dataframe with
# df = pd.DataFrame.from_dict(d, orient='index')  # d is the name of the dict
# convert the datetime column or index back to datetime
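
The reverse direction, as seen by someone answering the question, can be sketched as follows. The dict name `d` follows the comment above; `df_from_dict` is a hypothetical name, and only the first three rows of the sample are reproduced to keep the example short.

```python
import pandas as pd

# The dict as pasted from the question (first three rows only)
d = {'2020-07-30': {'a': 2, 'b': 4},
     '2020-07-31': {'a': 1, 'b': 5},
     '2020-08-01': {'a': 2, 'b': 2}}

# Outer keys become the index, inner keys become the columns
df_from_dict = pd.DataFrame.from_dict(d, orient='index')

# The index arrives as strings; convert it back to datetime
df_from_dict.index = pd.to_datetime(df_from_dict.index)
print(df_from_dict)
```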

Using .to_dict():
- How to efficiently build and share a sample dataframe?
- How to make good reproducible pandas examples

Yuca · 7 years ago

If you do print(df.head(20)), pd.read_clipboard() will load the data into a DataFrame. This approach applies to posts in pandas multiindex
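
A rough sketch of why printed output can be read back: `pd.read_clipboard` splits on whitespace by default (`sep=r'\s+'`). The example below is an illustration rather than the answerer's code; it assumes the printed text is held in a string (the name `printed` is hypothetical) and uses `pd.read_csv` with the same separator as a stand-in for the clipboard call.

```python
import io

import pandas as pd

# Hypothetical: printed DataFrame output copied from a question
printed = "\n".join([
    "            a  b",
    "2020-07-30  2  4",
    "2020-07-31  1  5",
])

# Whitespace-separated parsing; data rows have one more field than the
# header, so the first field of each row is used as the index
df_printed = pd.read_csv(io.StringIO(printed), sep=r"\s+")
print(df_printed)
```

Read this way, the index holds plain strings, so it would still need `pd.to_datetime` to become a datetime index.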