
Converting suds objects to a DataFrame with pandas

  •  1
  • Davide Lorino  · 7 years ago

    I have a list that looks like this:

    `[(deliveryObject){
       id = "0bf003ee0000000000000000000002a11cb6"
       start = 2019-01-02 09:30:00
       messageId = "68027b94b892396ed29581cde9ad07ff"
       status = "sent"
       type = "normal"
       }, (deliveryObject){
       id = "0bf0BE3ABFFDF8744952893782139E82793B"
       start = 2018-12-29 23:00:00
       messageId = "0bc403eb0000000000000000000000113404"
       status = "sent"
       type = "transactional"
     }, (deliveryObject){
       id = "0bf0702D03CB42D848CBB0B0AF023A87FA65"
       start = 2018-12-29 23:00:00
       messageId = "0bc403eb0000000000000000000000113403"
       status = "sent"
       type = "transactional"
       }
    ]`
    

    When I call type() on it, Python tells me it is a list.

    When I convert it with pd.DataFrame(df), the result looks like this:

    my list as a dataframe

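    Can anyone help? The DataFrame should have column names such as "id", "start", "messageId" and so on, but instead those names appear as the first element of each cell, and the columns are labelled 0, 1, 2, etc.

    Any help is appreciated, thanks!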

    3 Answers
        1
  •  1
  •   fcsr    7 years ago

    If this is for Bronto, and you are talking to it over SOAP with suds, then each deliveryObject is just a suds object.

    You can do this:

    import pandas as pd
    from suds.client import Client
    
    # list_of_deliveryObjects is the list returned by the SOAP call.
    # Its printed form (not valid Python syntax) looks like this:
    #
    # [(deliveryObject){
    #    id = "0bf003ee0000000000000000000002a11cb6"
    #    start = 2019-01-02 09:30:00
    #    messageId = "68027b94b892396ed29581cde9ad07ff"
    #    status = "sent"
    #    type = "normal"
    #    }, (deliveryObject){
    #    id = "0bf0BE3ABFFDF8744952893782139E82793B"
    #    start = 2018-12-29 23:00:00
    #    messageId = "0bc403eb0000000000000000000000113404"
    #    status = "sent"
    #    type = "transactional"
    #  }, (deliveryObject){
    #    id = "0bf0702D03CB42D848CBB0B0AF023A87FA65"
    #    start = 2018-12-29 23:00:00
    #    messageId = "0bc403eb0000000000000000000000113403"
    #    status = "sent"
    #    type = "transactional"
    #    }
    # ]
    
    # Client.dict turns one suds object into a plain dict,
    # so each object becomes one row of the DataFrame.
    data = [Client.dict(suds_object) for suds_object in list_of_deliveryObjects]
    df = pd.DataFrame(data)
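
    To see why converting each object to a dict first gives named columns, here is a minimal sketch that runs without suds installed. FakeDeliveryObject and to_records are hypothetical names invented for this illustration. The sketch assumes that iterating over a suds object yields (name, value) pairs, which would also explain the 0, 1, 2 columns in the question; it is not the suds implementation itself.

```python
class FakeDeliveryObject:
    """Hypothetical stand-in for a suds object (assumption: iteration yields (name, value) pairs)."""

    def __init__(self, **fields):
        self._fields = fields

    def __iter__(self):
        # Each item is a (field name, field value) tuple.
        return iter(self._fields.items())


def to_records(objects):
    """Turn each object into a plain dict, one dict per future DataFrame row."""
    return [dict(obj) for obj in objects]


deliveries = [
    FakeDeliveryObject(id="0bf003ee0000000000000000000002a11cb6", status="sent", type="normal"),
    FakeDeliveryObject(id="0bf0BE3ABFFDF8744952893782139E82793B", status="sent", type="transactional"),
]

records = to_records(deliveries)
print(records[0]["status"])
```

    Passing records to pd.DataFrame then uses the dict keys as column names, instead of treating each (name, value) tuple as a cell.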
    
        2
  •  1
  •   vladsiv    7 years ago

    OK, this one isn't pretty, but it works. I converted your list to a string:

    import re
    import pandas as pd
    
    x = """[(deliveryObject){
       id = "0bf003ee0000000000000000000002a11cb6"
       start = 2019-01-02 09:30:00
       messageId = "68027b94b892396ed29581cde9ad07ff"
       status = "sent"
       type = "normal"
       }, (deliveryObject){
       id = "0bf0BE3ABFFDF8744952893782139E82793B"
       start = 2018-12-29 23:00:00
       messageId = "0bc403eb0000000000000000000000113404"
       status = "sent"
       type = "transactional"
     }, (deliveryObject){
       id = "0bf0702D03CB42D848CBB0B0AF023A87FA65"
       start = 2018-12-29 23:00:00
       messageId = "0bc403eb0000000000000000000000113403"
       status = "sent"
       type = "transactional"
       }
    ]"""
    

    Then I used regex to turn it into a list of dictionaries:

    a = re.sub(' =', ':', x)
    a = re.sub('\(deliveryObject\)', '', a)
    
    for x in ['id', 'start', 'messageId', 'status', 'type']:
        a = re.sub(x, '\''+x+'\'', a)
    
    a = re.sub("(?<=[\"0])\n(?= +?[\'])", '\n,', a)
    a = re.sub('(?<=[0])\n(?=,)', '\"\n', a)
    a = re.sub('(?<=[:]) (?=[0-9])', ' \"', a)
    a = re.sub('(?<= )\"(?=[\w])', '[\"', a)
    a = re.sub('(?<=[\w])\"(?=\n)', '\"]', a)
    

    Now you have a list of dictionaries. The first row looks like this:

    list_of_dict = eval(a)
    df = pd.DataFrame(list_of_dict[0])
    print(df.head())
    
                                         id                start                         messageId status    type
    0  0bf003ee0000000000000000000002a11cb6  2019-01-02 09:30:00  68027b94b892396ed29581cde9ad07ff   sent  normal
    

    Append the remaining dictionaries from the list in the same way.

    Please feel free to improve my regex; I know it looks terrible.
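
    As one possible tidier take on the same idea, the sketch below parses the printed text straight into dictionaries without eval. BLOCK_RE, FIELD_RE and parse_repr are hypothetical names made up here, and the sketch assumes one "name = value" field per line and no "}" inside any value. Every value comes out as a string.

```python
import re

text = """[(deliveryObject){
   id = "0bf003ee0000000000000000000002a11cb6"
   start = 2019-01-02 09:30:00
   messageId = "68027b94b892396ed29581cde9ad07ff"
   status = "sent"
   type = "normal"
   }, (deliveryObject){
   id = "0bf0BE3ABFFDF8744952893782139E82793B"
   start = 2018-12-29 23:00:00
   messageId = "0bc403eb0000000000000000000000113404"
   status = "sent"
   type = "transactional"
 }
]"""

# One block per object: everything between "{" and the next "}".
BLOCK_RE = re.compile(r"\(deliveryObject\)\{(.*?)\}", re.DOTALL)
# One field per line: a name, "=", then the value up to the end of the line.
FIELD_RE = re.compile(r"^[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def parse_repr(raw):
    """Return one dict per (deliveryObject){...} block found in raw."""
    records = []
    for block in BLOCK_RE.findall(raw):
        # Strip the surrounding double quotes from quoted values.
        record = {name: value.strip('"') for name, value in FIELD_RE.findall(block)}
        records.append(record)
    return records


records = parse_repr(text)
print(records[0]["start"])
```

    The resulting list can be handed to pd.DataFrame(records) to get one row per object in a single step.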

        3
  •  0
  •   IftahP    7 years ago

    I did it like this:

    import pandas as pd
    lst =[{
       'id':"0bf003ee0000000000000000000002a11cb6",
       'start' : "2019-01-02 09:30:00",
       'messageId': "68027b94b892396ed29581cde9ad07ff",
       'status' : "sent",
       'type' : "normal"
       },{
       'id' :  "0bf0BE3ABFFDF8744952893782139E82793B",
       'start' :  "2018-12-29 23:00:00",
       'messageId' :  "0bc403eb0000000000000000000000113404",
       'status' :  "sent",
       'type' :  "transactional"
     }, {
       'id' :  "0bf0702D03CB42D848CBB0B0AF023A87FA65",
       'start' :  "2018-12-29 23:00:00",
       'messageId' :  "0bc403eb0000000000000000000000113403",
       'status' :  "sent",
       'type' :  "transactional"
       }]
    df = pd.DataFrame(lst)
    df
    

    And got this (see also the attached image):

                                         id                             messageId                start  status           type
    0  0bf003ee0000000000000000000002a11cb6      68027b94b892396ed29581cde9ad07ff  2019-01-02 09:30:00    sent         normal
    1  0bf0BE3ABFFDF8744952893782139E82793B  0bc403eb0000000000000000000000113404  2018-12-29 23:00:00    sent  transactional
    2  0bf0702D03CB42D848CBB0B0AF023A87FA65  0bc403eb0000000000000000000000113403  2018-12-29 23:00:00    sent  transactional
    

    Result
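
    Two follow-ups may be useful once the rows are plain dicts. The sketch below is only an illustration with a shortened copy of the data; the variable names records and columns are made up here. It passes columns= so the column order is explicit (the alphabetical order in the output above looks like what older pandas versions produced from dict keys, though that is an assumption), and it converts the "start" strings to real timestamps with pd.to_datetime.

```python
import pandas as pd

# Shortened copy of the rows above; every value starts out as a string.
records = [
    {"id": "0bf003ee0000000000000000000002a11cb6", "start": "2019-01-02 09:30:00",
     "status": "sent", "type": "normal"},
    {"id": "0bf0BE3ABFFDF8744952893782139E82793B", "start": "2018-12-29 23:00:00",
     "status": "sent", "type": "transactional"},
]

# Fix the column order explicitly rather than relying on the pandas default.
columns = ["id", "start", "status", "type"]
df = pd.DataFrame(records, columns=columns)

# Parse the "start" strings into datetime values so they sort and filter as dates.
df["start"] = pd.to_datetime(df["start"])

print(df.dtypes)
```

    With a datetime column, expressions such as df[df["start"] >= "2019-01-01"] compare by date rather than by string.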