Enforcing a per-IP concurrent thread limit in a WSGI/Apache application

mod-wsgi flask apache multithreading python

Jérôme · 7 years ago

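We are running a Flask application that exposes data stored in a database. It returns a lot of 503 errors. My understanding is that those are generated by Apache when the maximum number of concurrent threads is reached.

The root cause is most probably the application performing poorly, but at this stage we can't afford much more development time, so I'm looking for a cheap deployment configuration hack to mitigate the issue.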

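Data providers are sending data at a high rate. I believe their program gets a lot of 503 responses and simply catches them and retries until it succeeds.
Data consumers use the application at a much lower rate, and I'd like them not to be bothered by these issues.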

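I'm thinking of limiting the number of concurrent accesses from each provider's IP. The providers may get a lower throughput, but they would live with it as they already do now, and it would make life easier for casual consumers.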

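I identified mod_limitipconn, which seems to be tailored for this.

mod_limitipconn [...] allows administrators to limit the number of simultaneous requests permitted from a single IP address.

I'd like to be sure I understand how it works and how the limits are set.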

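I always assumed there was a maximum of 5 simultaneous connections because of the WSGI setting threads=5. But then I read Processes and Threading in the mod_wsgi documentation, and now I'm confused.

Considering the configuration below, are these assumptions correct?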

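Only one instance of the application is running at a time.
A maximum of 5 concurrent threads can be spawned.
When 5 requests are being handled and a sixth request arrives, the client gets a 503.
Limiting the number of simultaneous requests from IP x.x.x.x to 3 at the Apache level ensures that this IP can use only 3 of those 5 threads, leaving 2 for other IPs.
Raising the number of threads in the WSGI config could help share the connection pool among clients by providing more granularity in the rate limits (each of 4 providers could be limited to 3, keeping 5 more, for a total of 17), but it would not improve overall performance, even if the server has idle cores, because the Python GIL prevents several threads from running at the same time.
Raising the number of threads to a high value such as 100 could make requests take longer but would limit 503 responses. It might even be enough if clients do not set their own concurrent request limit too high, and if they don't, I can enforce that with mod_limitipconn.
Raising the number of threads too much would make requests so slow that clients would get timeouts instead of 503 responses, which is not really better.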
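The thread arithmetic in the fifth assumption can be checked with a small sketch. It assumes that the only inputs are the per-IP caps (as mod_limitipconn would enforce them) and a number of threads held back for everyone else; `threads_needed` is a hypothetical helper, not part of mod_wsgi or Apache.

```python
def threads_needed(per_provider_limits, reserved_for_others):
    """Threads required so that capped providers can never use up every thread.

    per_provider_limits: maximum simultaneous requests allowed for each provider IP.
    reserved_for_others: threads that should stay available to all other clients.
    """
    return sum(per_provider_limits) + reserved_for_others


# Four providers capped at 3 concurrent requests each, 5 threads kept for consumers.
print(threads_needed([3, 3, 3, 3], 5))  # 17
```

With the current threads=5 and a single provider capped at 3, the same arithmetic leaves 2 threads for other clients.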

The current configuration is below. I'm not sure what matters.

apachectl -V :

Server version: Apache/2.4.25 (Debian)
Server built:   2018-06-02T08:01:13
Server's Module Magic Number: 20120211:68
Server loaded:  APR 1.5.2, APR-UTIL 1.5.4
Compiled using: APR 1.5.2, APR-UTIL 1.5.4
Architecture:   64-bit
Server MPM:     event
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)

/etc/apache2/apache2.conf :

# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive On

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100

/etc/apache2/mods-available/mpm_worker.conf (but this shouldn't matter with the event MPM, right?):

<IfModule mpm_worker_module>
        StartServers                     2
        MinSpareThreads          25
        MaxSpareThreads          75
        ThreadLimit                      64
        ThreadsPerChild          25
        MaxRequestWorkers         150
        MaxConnectionsPerChild   0
</IfModule>

/etc/apache2/sites-available/my_app.conf :

WSGIDaemonProcess my_app threads=5

2 answers

Fine · 7 years ago

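"I'd like them not to be bothered": in that case, separate the data providers' requests from the data consumers' requests (I'm not familiar with Apache, so what follows is an overall approach rather than a production-ready configuration):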

<VirtualHost *>
    ServerName example.com

    WSGIDaemonProcess consumers user=user1 group=group1 threads=5
    WSGIDaemonProcess providers user=user1 group=group1 threads=5
    WSGIScriptAliasMatch ^/consumers_urls/.* /path_to_your_app/consumers.wsgi process-group=consumers
    WSGIScriptAliasMatch ^/providers_urls/.* /path_to_your_app/providers.wsgi process-group=providers

    ...

</VirtualHost>

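By limiting the number of requests per IP, you may hurt the user experience and still not solve your problem. Note, for example, that because of the way NAT and ISPs work, many independent users may share the same IP.

Also, it is odd to have ThreadsPerChild=25 but WSGIDaemonProcess my_app threads=5. Are you sure that, with this configuration, all the threads created by Apache will be used by the WSGI server?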

Jérôme · 6 years ago

I ended up going with a different approach. I added a limiter in the application code to take care of this.

"""Concurrency requests limiter

Inspired by Flask-Limiter
"""

from collections import defaultdict
from threading import BoundedSemaphore
from functools import wraps

from flask import request
from werkzeug.exceptions import TooManyRequests


# From flask-limiter
def get_remote_address():
    """Get IP address for the current request (or 127.0.0.1 if none found)

    This won't work behind a proxy. See flask-limiter docs.
    """
    return request.remote_addr or '127.0.0.1'


class NonBlockingBoundedSemaphore(BoundedSemaphore):
    def __enter__(self):
        ret = self.acquire(blocking=False)
        if ret is False:
            raise TooManyRequests(
                'Only {} concurrent request(s) allowed'
                .format(self._initial_value))
        return ret


class ConcurrencyLimiter:

    def __init__(self, app=None, key_func=get_remote_address):
        self.app = app
        self.key_func = key_func
        if app is not None:
            self.init_app(app)

    def init_app(self, app):
        self.app = app
        app.extensions = getattr(app, 'extensions', {})
        app.extensions['concurrency_limiter'] = {
            'semaphores': defaultdict(dict),
        }

    def limit(self, max_concurrent_requests=1):
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                # Limiter not initialized
                if self.app is None:
                    return func(*args, **kwargs)
                identity = self.key_func()
                sema = self.app.extensions['concurrency_limiter'][
                    'semaphores'][func].setdefault(
                        identity,
                        NonBlockingBoundedSemaphore(max_concurrent_requests)
                    )
                with sema:
                    return func(*args, **kwargs)
            return wrapper
        return decorator


limiter = ConcurrencyLimiter()


def init_app(app):
    """Initialize limiter"""

    limiter.init_app(app)
    if app.config['AUTHENTICATION_ENABLED']:
        from h2g_platform_core.api.extensions.auth import get_identity
        limiter.key_func = get_identity

Then all I need to do is apply the decorator to my views:

@limiter.limit(1)  # One concurrent request per user
def get(...):
    ...

In practice, I only protect the views that generate high traffic.
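The limiter rejects an extra request instead of making it wait because NonBlockingBoundedSemaphore calls acquire(blocking=False). A minimal standard-library sketch of that behaviour, with a plain BoundedSemaphore standing in for the per-user semaphore (the variable names are illustrative only):

```python
from threading import BoundedSemaphore

sema = BoundedSemaphore(1)  # one slot, as with limit(1)

first = sema.acquire(blocking=False)   # slot is free: returns True
second = sema.acquire(blocking=False)  # slot is taken: returns False immediately
sema.release()                         # the first holder finishes
third = sema.acquire(blocking=False)   # slot is free again: returns True
sema.release()

print(first, second, third)  # True False True
```

The False return value is the case the limiter turns into a TooManyRequests error.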

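Doing this in application code is nice because I can set a limit per authenticated user rather than per IP.

To do so, I just need to replace the default get_remote_address in key_func with a function that returns a unique identity for the user.

Note that this sets a separate limit for each view function. If the limit needed to be global, it could be implemented differently. In fact, that would be even simpler.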
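For illustration, here is one way a global variant might look. This is a sketch under assumptions, not the code I actually run: it is framework-agnostic, `GlobalConcurrencyLimiter` and `TooManyConcurrentRequests` are hypothetical names (the latter stands in for werkzeug's TooManyRequests), and the key function returns a hard-coded example address instead of reading the Flask request.

```python
from functools import wraps
from threading import BoundedSemaphore, Lock


class TooManyConcurrentRequests(Exception):
    """Hypothetical stand-in for werkzeug.exceptions.TooManyRequests."""


class GlobalConcurrencyLimiter:
    """Keeps one semaphore per identity, shared by every decorated function."""

    def __init__(self, key_func, max_concurrent_requests=1):
        self.key_func = key_func
        self.max_concurrent_requests = max_concurrent_requests
        self._lock = Lock()
        self._semaphores = {}

    def _semaphore_for(self, identity):
        # The lock makes sure two threads never create two semaphores for one identity.
        with self._lock:
            if identity not in self._semaphores:
                self._semaphores[identity] = BoundedSemaphore(
                    self.max_concurrent_requests)
            return self._semaphores[identity]

    def limit(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            sema = self._semaphore_for(self.key_func())
            # Reject at once instead of queueing behind the running request.
            if not sema.acquire(blocking=False):
                raise TooManyConcurrentRequests(
                    'Only {} concurrent request(s) allowed'
                    .format(self.max_concurrent_requests))
            try:
                return func(*args, **kwargs)
            finally:
                sema.release()
        return wrapper


# Assumption: every call comes from the same client (example address).
limiter = GlobalConcurrencyLimiter(key_func=lambda: '203.0.113.7')


@limiter.limit
def list_items():
    return 'items'


@limiter.limit
def get_item():
    # Calling another limited view while this one is still running
    # stands in for a second concurrent request from the same client.
    return list_items()
```

Because the semaphores are keyed by identity only, a request to get_item counts against the same budget as a request to list_items. As with the per-view version, the counters live in process memory, so the limit holds per process.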