Puppeteer does not recognize link

  • andrey.shedko · Tech Community · 6 years ago

    I'm trying to grab some HTML inside page.evaluate, but I keep getting this error:

    Error: Evaluation failed: ReferenceError: link is not defined
        at __puppeteer_evaluation_script__:8:29
        at ExecutionContext.evaluateHandle (C:\Repositories\auto-grabber-server\node_modules\puppeteer\lib\ExecutionContext.js:124:13)
        at process._tickCallback (internal/process/next_tick.js:68:7)
      -- ASYNC --
        at ExecutionContext.<anonymous> (C:\Repositories\auto-grabber-server\node_modules\puppeteer\lib\helper.js:144:27)
        at ExecutionContext.evaluate (C:\Repositories\auto-grabber-server\node_modules\puppeteer\lib\ExecutionContext.js:58:31)
        at ExecutionContext.<anonymous> (C:\Repositories\auto-grabber-server\node_modules\puppeteer\lib\helper.js:145:23)
        at Frame.evaluate (C:\Repositories\auto-grabber-server\node_modules\puppeteer\lib\FrameManager.js:447:20)
        at process._tickCallback (internal/process/next_tick.js:68:7)
      -- ASYNC --
        at Frame.<anonymous> (C:\Repositories\auto-grabber-server\node_modules\puppeteer\lib\helper.js:144:27)
        at Page.evaluate (C:\Repositories\auto-grabber-server\node_modules\puppeteer\lib\Page.js:777:43)
        at Page.<anonymous> (C:\Repositories\auto-grabber-server\node_modules\puppeteer\lib\helper.js:145:23)
        at zrGrabber.StartGrabbingHtml (C:\Repositories\auto-grabber-server\grabbers\zr.grabber.js:52:40)
        at process._tickCallback (internal/process/next_tick.js:68:7)
    

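    The link is passed to the StartGrabbingHtml function, but then I get the error shown above. I suspect something is wrong with the async code, but I can't figure out exactly what.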

    const puppeteer = require("puppeteer");
    let links = [];
    const Mongo = require('./../db/mongo');
    const zrLinks = [
        "https://www.zr.ru/stories/consultant/optimalno/",
        "https://www.zr.ru/news/avtomobili/",
        "https://www.zr.ru/stories/prezentaciya-car/new/"
    ];
    
    module.exports = class zrGrabber {
        async startGrabbingLinks() {
            try {
                for (let i = 0; i < zrLinks.length; i++) {
                    const browser = await puppeteer.launch();
                    const page = await browser.newPage();
                    await page.goto(zrLinks[i], {
                        waitUntil: 'load',
                        timeout: 0
                    });
                    const result = await page.evaluate(() => {
                        const links = document.querySelectorAll('div.head > h2 > a')
                        return [...links].map(link => link.href);
                    });
                    await page.close();
                    await browser.close();
                    links = [...links, ...result];
                }
                const db = new Mongo();
                for (let i = 0; i < links.length; i++) {
                    // if link already in database skip grabbing
                    const found = await db.findLink(links[i]);
                    if (found) {
                        continue;
                    }
                    // else grab and write link to database
                    await this.StartGrabbingHtml(links[i])
                }
            } catch (err) {
                console.log(err)
            }
        }
    
        async StartGrabbingHtml(link) {
            try {
                const browser = await puppeteer.launch();
                const page = await browser.newPage();
                await page.goto(link, {
                    waitUntil: 'load',
                    timeout: 0
                });
                const article = await page.evaluate(() => { // error throwing here
                    const date = document.querySelector('#storyDetailArticle > time').innerHTML;
                    const name = document.querySelector('#storyDetailArticle > h1').innerHTML;
                    const description = document.querySelector('#storyDetailArticle > div.stroy_announcement > h3').innerHTML;
                    const author = document.querySelector('#storyDetailArticle > div.announcement_author.story_author.no_preview > div').innerHTML;
                    const content = document.querySelector('#storyDetailArticle > div.stroy_content').innerHTML;
                    return {
                        source: link,
                        date: date,
                        name: name,
                        description: description,
                        author: author,
                        content: content
                    };
                });
                console.log(article)
                const db = new Mongo();
                await db.insertOne(article);
                await page.close();
                await browser.close();
            } catch (err) {
                console.log(err)
            }
        }
    }
    

    What am I doing wrong here?

  • Thomas Dondorf · 6 years ago

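    You need to pass link into page.evaluate as an argument. The function you hand to page.evaluate is serialized and executed inside the browser page, not in Node.js, so it cannot see variables from the surrounding Node.js scope such as link. Values must be passed in explicitly: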

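    const article = await page.evaluate(link => {
        // link is now a parameter of the page function, so it exists in the browser context
        // ... the document.querySelector calls stay the same ...
        return { source: link /* , date, name, description, author, content */ };
    }, link);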
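To see why the closure version fails with "link is not defined" while the argument version works, here is a small Node.js sketch that imitates the mechanism with the built-in vm module. evaluateInIsolatedContext is a hypothetical helper, not Puppeteer's implementation: it assumes, as a simplification, that only the function's source text and JSON-serializable arguments cross into a separate JavaScript context, and it fakes document with a stub object.

```javascript
const vm = require("vm");

// Hypothetical stand-in for page.evaluate (NOT Puppeteer's real code).
// Only the function's source text and JSON-serialized arguments are sent
// into a fresh context, so Node-side variables are not visible there.
function evaluateInIsolatedContext(pageFunction, ...args) {
  const source = `(${pageFunction.toString()})(...${JSON.stringify(args)})`;
  // The sandbox object supplies the new context's globals; here only a stub document.
  return vm.runInNewContext(source, { document: { title: "stub page" } });
}

const link = "https://www.zr.ru/news/avtomobili/";

// 1. Closing over link: the function text mentions link, but the isolated
//    context has no such variable, so evaluation throws a ReferenceError.
let closureError = null;
try {
  evaluateInIsolatedContext(() => ({ source: link }));
} catch (err) {
  closureError = err.message;
}
console.log(closureError);

// 2. Passing link as an argument: its value is serialized and arrives as
//    the function's parameter inside the isolated context.
const article = evaluateInIsolatedContext(
  (link) => ({ source: link, title: document.title }),
  link
);
console.log(article.source);
```

The same rule applies to the real page.evaluate: anything the page function needs from Node.js (here, link) has to travel as an argument, and the returned object has to be serializable to come back.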
    