
How to send Scrapy logs to Logstash

  •  1
  • Yuseferi  · Tech Community  · 7 years ago

    I have set up the ELK stack correctly on my server, and using Python with the snippet below I can send logs to Logstash; everything works fine.

    import logging
    import logstash
    import sys
    
    host = 'localhost'
    
    test_logger = logging.getLogger('python-logstash-logger')
    test_logger.setLevel(logging.INFO)
    # alternative: python-logstash's UDP handler
    # test_logger.addHandler(logstash.LogstashHandler(host, 5959, version=1))
    test_logger.addHandler(logstash.TCPLogstashHandler(host, 5000, version=1))
    
    test_logger.error('python-logstash: test logstash error message.')
    test_logger.info('python-logstash: test logstash info message.')
    test_logger.warning('python-logstash: test logstash warning message.')
    
    # add extra field to logstash message
    extra = {
        'test_string': 'python version: ' + repr(sys.version_info),
        'test_boolean': True,
        'test_dict': {'a': 1, 'b': 'c'},
        'test_float': 1.23,
        'test_integer': 123,
        'test_list': [1, 2, '3'],
    }
    test_logger.info('python-logstash: test extra fields', extra=extra)
    

    **Next**, I want to integrate Logstash with Scrapy.

    Here is part of my spider code:

    # -*- coding: utf-8 -*-
    import scrapy
    import json
    import logging
    from scrapy.xlib.pydispatch import dispatcher
    from scrapy import signals
    from collections import defaultdict
    import time
    from ..helper import Helper
    from ..items import SampleItem
    import requests as py_request
    import logstash
    
    
    class SampleSpider(scrapy.Spider):
        name = 'sample'
        allowed_domains = []
        start_urls = ['https://www.sample.com/']
        duplicate_found = False
        counter = defaultdict(dict)
        cat = 0
        place_code = 0
        categories = {}
        logstash_logger = None
    
        def __init__(self, *args, **kwargs):
            super(SampleSpider, self).__init__(*args, **kwargs)
            # ship this spider's custom log records to Logstash over TCP
            self.logstash_logger = logging.getLogger('scrapy-logger')
            self.logstash_logger.setLevel(logging.INFO)
            self.logstash_logger.addHandler(logstash.TCPLogstashHandler('localhost', 5000, version=1))
            dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
    
        def get_place_code(self):
            return self.place_code
    
        def set_place_code(self, value):
            self.place_code = value
    
        def start_requests(self):
            logging.info(":::>{0} Spider Starting".format(self.name))
            self.logstash_logger.info(":::>{0} Spider Starting".format(self.name))
            self.categories = Helper().get_categories()
            req_timestamp = str(time.time())[:-2]
            for cat in self.categories:
                self.counter[cat['id']] = 0
                logging.info(":::> Start crawling category = {0} ".format(cat['id']))
                self.logstash_logger.info(":::> Start crawling category = {0} ".format(cat['id']))
                start_url = 'https://www.sample.com?c=' + str(cat['id'])
                logging.info(start_url)
                yield scrapy.Request(url=start_url,
                                     method="GET",
                                     callback=self.parse,
                                     meta={'cat': cat['id'], 'requestDateTime': 0, 'counter': 0}
                                     )
    
        def spider_closed(self, spider):
            logging.info(":::>********************************************************************")
            logging.info(":::>{0} Spider Finished.".format(self.name))
            self.logstash_logger.info(":::>{0} Spider Finished.".format(self.name))
    
            total = 0
            for cat_id, value in self.counter.items():
                logging.info("{0} items imported into {1} category".format(value, cat_id))
                self.logstash_logger.info("{0} items imported into {1} category".format(value, cat_id))
                total += value
            logging.info(":::>******** End Summary; Total : {0} items scraped ***********".format(total))
            self.logstash_logger.info(":::>******** End Summary; Total : {0} items scraped ***********".format(total))
    
        def parse(self, response):
            # ... my parsing logic lives here ...
            self.logstash_logger.info('End of Data for category')
    

    I can see my custom log lines in the Scrapyd log, but they never arrive in Logstash:

    2018-08-04 13:42:18 [root] INFO: :::> Start crawling category = 43614 
    2018-08-04 13:42:18 [scrapy-logger] INFO: :::> Start crawling category = 43614 
    

    My question is: why aren't the logs being sent to Logstash? How can I get these Scrapy logs into Logstash?
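
    One detail worth checking when records vanish without any error: python-logstash's TCPLogstashHandler builds on the standard library's logging.handlers.SocketHandler, which swallows connection failures instead of raising them, so an unreachable host (for example, 'localhost' from inside a container while Logstash runs elsewhere) fails silently. A minimal reachability check, reusing the host and port from the question:

    # Sketch: verify the Logstash TCP input is reachable from the spider's
    # environment; the host and port are the ones used in the question.
    import socket

    try:
        sock = socket.create_connection(('localhost', 5000), timeout=5)
        sock.close()
        print('Logstash TCP input is reachable')
    except socket.error as exc:
        print('Cannot reach Logstash: {0}'.format(exc))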

    1 Answer  |  7 years ago
        1
  •  0
  •   Yuseferi    7 years ago

    It turns out I had already done 99% of the work; I just needed to use 'scrapy' as the logger name:

    def __init__(self, *args, **kwargs):
        # attach the Logstash handler to Scrapy's own logger namespace
        self.logstash_logger = logging.getLogger('scrapy')
        self.logstash_logger.addHandler(logstash.TCPLogstashHandler('logstash', 5000, version=1))
        dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
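
    Why this works: Scrapy emits its own records through loggers under the scrapy namespace, so a handler attached there picks them up, and since logstash_logger now *is* the scrapy logger, the spider's custom messages go through that handler as well. Note the snippet also changes the host from 'localhost' to 'logstash' (a typical service name when Logstash runs in a separate Docker container), which may matter on its own in a containerized setup. A variant I would consider safer is to attach the handler to the root logger once and guard against adding it twice when the spider is instantiated repeatedly (e.g. under Scrapyd); a minimal sketch, assuming the same python-logstash setup:

    # Sketch: one handler on the root logger covers plain logging.info()
    # calls, Scrapy's own records, and per-spider loggers alike, since all
    # records propagate to the root logger by default.
    import logging
    import logstash

    root_logger = logging.getLogger()
    root_logger.setLevel(logging.INFO)
    # guard against stacking duplicate handlers across repeated __init__ calls
    if not any(isinstance(h, logstash.TCPLogstashHandler) for h in root_logger.handlers):
        root_logger.addHandler(logstash.TCPLogstashHandler('logstash', 5000, version=1))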
    

    I'm posting the answer here in case it's useful to someone else in the same situation.
