Welcome to MLink Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
267 views
in Technique[技术] by (71.8m points)

javascript - Trying to scrape websites using puppeteer and getting back empty objects

I began learning Puppeteer today and ran into a problem. I am trying to build a COVID tracker and want to scrape data from Worldometers. But when I try to get the information back, I get an array of empty objects. The number of objects matches the number of tags with that class, but none of them contains any information. Here is my code:

const puppeteer = require("puppeteer")
async function getCovidCases(){
    const browser = await puppeteer.launch({
        defaultViewport: null,
        headless: false,
        slowMo: 250
    })
    const page = await browser.newPage()
    const url = "https://www.worldometers.info/coronavirus/#countries"
    await page.goto(url, {waitUntil: 'networkidle0'})
    await page.waitForSelector(".navbar-nav", {visible: true})
    const results = await page.$$eval(".navbar-nav", rows => {
        return rows
    })
    await console.log(results)
}
getCovidCases()

Does anyone know what to do?


Whoever fights with dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss gazes back…

1 Answer

0 votes
by (71.8m points)

Based on the selector, I assume that at this step you are interested in the navbar items.

    const results = await page.$$eval(".navbar-nav", navBars => {
      return navBars.map(navBar => {
        const anchors = Array.from(navBar.getElementsByTagName('a'));
        return anchors.map(anchor => anchor.innerText);
      });
    })

This yields [ [ 'Coronavirus', 'Population' ] ] and might be useful for you.

Use $eval if you expect only one element and $$eval if you expect multiple elements. In the callback you have a reference to the DOM element (or, with $$eval, an array of elements), but you cannot return it directly. The callback runs in the browser, so anything you console.log there shows up in the browser's console, not in the Node.js terminal. Whatever you return is sent back to Node.js, and it needs to be serializable. A DOM element such as navBar is converted to an empty object, which is not what you want. That's why I map over the elements and convert each one to a string (innerText).
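
The empty objects can be reproduced without a browser. The sketch below is only an analogy: it assumes that the hop from the page to Node.js behaves roughly like a JSON round trip. FakeElement and roundTrip are hypothetical names invented for this illustration and are not part of Puppeteer.

```javascript
// Hypothetical stand-in for a DOM element. Like a real element, it exposes
// its data through an accessor on the prototype, so the instance itself
// has no own enumerable properties.
class FakeElement {
  #text;
  constructor(text) {
    this.#text = text;
  }
  get innerText() {
    return this.#text;
  }
}

// Assumption: a rough model of sending a value from the page back to Node.js.
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

const rows = [new FakeElement("Coronavirus"), new FakeElement("Population")];

// Returning the elements themselves: nothing serializable survives.
const returnedElements = roundTrip(rows); // [ {}, {} ]

// Returning plain strings extracted from the elements: the data survives.
const returnedStrings = roundTrip(rows.map(row => row.innerText)); // [ 'Coronavirus', 'Population' ]

console.log(returnedElements, returnedStrings);
```

The first result has the same shape as the output in the question: one empty object per matched element. The second mirrors the fix, where the callback extracts plain values before returning.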

If you want to scrape other data, you should use another selector (.nav-bar).
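
As a rough sketch of what scraping other data could look like, the same pattern applies to table rows. ROW_SELECTOR below is an assumption about the page's markup and has not been verified against the live site, so check it in the browser's dev tools first. extractRows and getCountryRows are hypothetical helper names, and page is the Puppeteer page object from the question.

```javascript
// Runs inside the page: turns each table row into an array of trimmed cell text.
// It must not reference Node-side variables, because only the function's
// source is sent to the browser.
const extractRows = rows =>
  rows.map(row =>
    Array.from(row.querySelectorAll("td"), cell => cell.innerText.trim())
  );

// ASSUMPTION: a guess at the selector for the rows of the countries table.
const ROW_SELECTOR = "#main_table_countries_today tbody tr";

// Hypothetical helper: waits for the rows, then returns plain string arrays,
// which are serializable and therefore arrive intact in Node.js.
async function getCountryRows(page) {
  await page.waitForSelector(ROW_SELECTOR);
  return page.$$eval(ROW_SELECTOR, extractRows);
}
```

Inside getCovidCases, this could be called as `const table = await getCountryRows(page)` after the page.goto call.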

