Puppeteer: A Node.js-Based Web Scraping Library

Author: System   Date: August 17, 2024   Categories: All, nodejs   Word count: 644


The following example uses the puppeteer library to scrape a web page in the simplest possible way:




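const puppeteer = require('puppeteer');

async function fetchHTML(url) {
    // Launch a (headless by default) browser
    const browser = await puppeteer.launch();
    try {
        // Open a new page (tab)
        const page = await browser.newPage();
        // Navigate to the URL
        await page.goto(url);
        // Get the page's current DOM serialized as HTML
        return await page.content();
    } finally {
        // Always close the browser, even if navigation or extraction fails
        await browser.close();
    }
}

// Usage example
fetchHTML('https://example.com')
    .then(html => console.log(html))
    .catch(err => console.error('Scrape failed:', err));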

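This code first imports the puppeteer library and then defines an async function, fetchHTML, which takes a URL as its argument. The function launches a browser, opens a new page, navigates to the given URL, reads the page's HTML content, closes the browser, and returns the HTML. The last line shows how to call the function.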
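The function above launches and closes a whole browser for every URL and returns the raw HTML. When you scrape several pages, a common alternative is to launch the browser once and extract only the data you need from inside the page. The sketch below illustrates that approach under some assumptions. `scrapeLinks` is a hypothetical helper name. The `waitUntil: 'networkidle2'` setting and the `a` selector are illustrative choices, not part of the original article. The helper takes an already-launched Puppeteer browser as a parameter.

```javascript
// Minimal sketch (hypothetical helper): visit several URLs with ONE
// already-launched browser and return structured data instead of raw HTML.
// `browser` is assumed to be the object returned by puppeteer.launch().
async function scrapeLinks(browser, urls) {
  const results = [];
  for (const url of urls) {
    const page = await browser.newPage();
    try {
      // 'networkidle2' treats navigation as finished once there have been
      // no more than 2 network connections for at least 500 ms, giving
      // client-side scripts time to render content.
      await page.goto(url, { waitUntil: 'networkidle2' });
      const title = await page.title();
      // $$eval runs the callback inside the page on all elements matching 'a'.
      const links = await page.$$eval('a', anchors =>
        anchors.map(a => ({ text: a.textContent.trim(), href: a.href }))
      );
      results.push({ url, title, links });
    } finally {
      // Close the tab even if navigation or extraction throws.
      await page.close();
    }
  }
  return results;
}

// Usage (requires `npm install puppeteer`):
// const puppeteer = require('puppeteer');
// (async () => {
//   const browser = await puppeteer.launch();
//   try {
//     const data = await scrapeLinks(browser, ['https://example.com']);
//     console.log(JSON.stringify(data, null, 2));
//   } finally {
//     await browser.close();
//   }
// })();
```

Because the helper receives the browser instead of launching it, the caller controls the browser's lifetime. The helper can also be exercised with a stub object in tests. It processes URLs one at a time, which keeps resource use predictable; you could add a concurrency limit if throughput matters.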
