
How do I perform memory management operations synchronously?

  • RustyShackleford  ·  Tech Community  ·  7 years ago

    I have an application that keeps crashing, which I believe is because there is too much data in a list.

    Below is the code that calls the API and saves the results to a list. lst1 is the list of ids I pass to the API. I have to account for the case where the HTTP request times out, or else build a mechanism that clears the list of appended data as I work through lst1.

    import requests
    import pandas as pd
    import xml.etree.ElementTree as ET
    from bs4 import BeautifulSoup 
    import time
    from concurrent import futures
    
    lst1=[1,2,3]
    
    lst =[]
    
    for i in lst1:
        url = 'urlId={}'.format(i)
        while True:
            try:
                xml_data1 = requests.get(url).text
                print(xml_data1)
                break
            except requests.exceptions.RequestException as e:
                print(e)
        lst.append(xml_data1)
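
    The loop above retries forever and never sets a timeout, so one bad id can hang or hammer the endpoint indefinitely. Below is a minimal sketch of the timeout handling mentioned above, assuming a capped number of retries is acceptable; the fetch_xml helper name, the 3-attempt cap, the 30-second timeout, and the exponential backoff are assumptions rather than part of the original code, and the sketch reuses the requests and time imports from the top of the script.

    def fetch_xml(url, attempts=3, timeout=30):
        # Hypothetical helper: retry a fixed number of times instead of looping forever.
        for attempt in range(attempts):
            try:
                # timeout makes a hung request raise instead of blocking the loop
                return requests.get(url, timeout=timeout).text
            except requests.exceptions.RequestException as e:
                print(e)
                time.sleep(2 ** attempt)  # back off briefly before retrying
        return None  # caller can skip ids whose requests never succeed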
    

    I was thinking that if I could apply the function below to unpack each piece of XML into a dataframe and perform the required dataframe operations while clearing the list lst of appended data, it would free up memory. If that is not the right approach, I am open to any suggestion that keeps the code or application from crashing because of what I believe is too much XML data in the list:

    def create_dataframe(xml):
        soup = BeautifulSoup(xml, "xml")
        # Get Attributes from all nodes
        attrs = []
        for elm in soup():  # soup() is equivalent to soup.find_all()
            attrs.append(elm.attrs)
        # Since you want the data in a dataframe, it makes sense for each field to be a new row consisting of all the other node attributes
        fields_attribute_list = [x for x in attrs if 'Id' in x.keys()]
        other_attribute_list = [x for x in attrs if 'Id' not in x.keys() and x != {}]
        # Make a single dictionary with the attributes of all nodes except for the `Field` nodes.
        attribute_dict = {}
        for d in other_attribute_list:
            for k, v in d.items():
                attribute_dict.setdefault(k, v)
        # Update each field row with attributes from all other nodes.
        full_list = []
        for field in fields_attribute_list:
            field.update(attribute_dict)
            full_list.append(field)
        # Make Dataframe
        df = pd.DataFrame(full_list)
        return df
    
    
    with futures.ThreadPoolExecutor() as executor:  # Or use ProcessPoolExecutor
        df_list = executor.map(create_dataframe, lst)
    
    full_df = pd.concat(df_list)
    print(full_df)
    
    
    #final pivoted dataframe
    final_df = pd.pivot_table(full_df, index='Id', columns='FieldTitle', values='Value', aggfunc='first').reset_index()
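
    To act on the memory idea above without keeping every raw XML string in lst, the fetch and parse steps could be fused so that each response is converted to a (much smaller) dataframe as soon as it arrives and the XML string can be garbage-collected right away. The sketch below maps over the ids in lst1 directly; fetch_and_parse is a hypothetical helper, and the retry/timeout handling shown earlier is omitted for brevity.

    def fetch_and_parse(i):
        # Hypothetical helper: fetch one id and turn the response into a dataframe
        # immediately, so the raw XML never has to sit in a shared list.
        url = 'urlId={}'.format(i)
        xml_data = requests.get(url).text
        return create_dataframe(xml_data)

    with futures.ThreadPoolExecutor() as executor:
        # map over the ids; only the per-id dataframes are kept, never all the XML
        df_list = executor.map(fetch_and_parse, lst1)

    full_df = pd.concat(df_list)
    # the pivot_table call above can then be applied to this full_df as before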
    
    0 replies  |  as of 7 years ago