
How do I perform memory management operations synchronously?

  • RustyShackleford  ·  Tech Community  ·  7 years ago

    I have an application that keeps crashing, which I believe is because there is too much data in a list.

    Below is the code that calls the API and saves the results to a list. lst1 is the list of ids I pass to the API. I have to account for the case where the HTTP request times out, or else build a mechanism that clears the list of appended data as I work through lst1.

    import requests
    import pandas as pd
    import xml.etree.ElementTree as ET
    from bs4 import BeautifulSoup 
    import time
    from concurrent import futures
    
    lst1=[1,2,3]
    
    lst =[]
    
    for i in lst1:
        url = 'urlId={}'.format(i)
        while True:
            try:
                xml_data1 = requests.get(url).text
                print(xml_data1)
                break
            except requests.exceptions.RequestException as e:
                print(e)
        lst.append(xml_data1)
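
    The loop above retries forever and never sets a timeout, so one bad id can hang or hammer the endpoint indefinitely. Below is a minimal sketch of the timeout handling mentioned above, assuming a capped number of retries is acceptable; the fetch_xml helper name, the 3-attempt cap, the 30-second timeout, and the exponential backoff are assumptions rather than part of the original code, and the sketch reuses the requests and time imports from the top of the script.

    def fetch_xml(url, attempts=3, timeout=30):
        # Hypothetical helper: retry a fixed number of times instead of looping forever.
        for attempt in range(attempts):
            try:
                # timeout makes a hung request raise instead of blocking the loop
                return requests.get(url, timeout=timeout).text
            except requests.exceptions.RequestException as e:
                print(e)
                time.sleep(2 ** attempt)  # back off briefly before retrying
        return None  # caller can skip ids whose requests never succeed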
    

    I was thinking that if I could apply the function below to unpack each piece of XML into a dataframe and perform the required dataframe operations while clearing the list lst of appended data, it would free up memory. If that is not the right approach, I am open to any suggestion that keeps the code or application from crashing because of what I believe is too much XML data in the list:

    def create_dataframe(xml):
        soup = BeautifulSoup(xml, "xml")
        # Get Attributes from all nodes
        attrs = []
        for elm in soup():  # soup() is equivalent to soup.find_all()
            attrs.append(elm.attrs)
        # Since you want the data in a dataframe, it makes sense for each field to be a new row consisting of all the other node attributes
        fields_attribute_list = [x for x in attrs if 'Id' in x.keys()]
        other_attribute_list = [x for x in attrs if 'Id' not in x.keys() and x != {}]
        # Make a single dictionary with the attributes of all nodes except for the `Field` nodes.
        attribute_dict = {}
        for d in other_attribute_list:
            for k, v in d.items():
                attribute_dict.setdefault(k, v)
        # Update each field row with attributes from all other nodes.
        full_list = []
        for field in fields_attribute_list:
            field.update(attribute_dict)
            full_list.append(field)
        # Make Dataframe
        df = pd.DataFrame(full_list)
        return df
    
    
    with futures.ThreadPoolExecutor() as executor:  # Or use ProcessPoolExecutor
        df_list = executor.map(create_dataframe, lst)
    
    full_df = pd.concat(df_list)
    print(full_df)
    
    
    #final pivoted dataframe
    final_df = pd.pivot_table(full_df, index='Id', columns='FieldTitle', values='Value', aggfunc='first').reset_index()
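
    To act on the memory idea above without keeping every raw XML string in lst, the fetch and parse steps could be fused so that each response is converted to a (much smaller) dataframe as soon as it arrives and the XML string can be garbage-collected right away. The sketch below maps over the ids in lst1 directly; fetch_and_parse is a hypothetical helper, and the retry/timeout handling shown earlier is omitted for brevity.

    def fetch_and_parse(i):
        # Hypothetical helper: fetch one id and turn the response into a dataframe
        # immediately, so the raw XML never has to sit in a shared list.
        url = 'urlId={}'.format(i)
        xml_data = requests.get(url).text
        return create_dataframe(xml_data)

    with futures.ThreadPoolExecutor() as executor:
        # map over the ids; only the per-id dataframes are kept, never all the XML
        df_list = executor.map(fetch_and_parse, lst1)

    full_df = pd.concat(df_list)
    # the pivot_table call above can then be applied to this full_df as before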
    
    0 replies  |  as of 7 years ago