Writing a Web Crawler in Node.js

Author: System | Date: August 23, 2024 | Categories: All, Crawlers | Word count: 815


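To write a simple crawler in Node.js, you can use axios to send HTTP requests and cheerio to parse the HTML that comes back. The example below fetches a web page and collects every link on it.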

First, make sure the required packages are installed:




npm install axios cheerio

Then write the crawler code:




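const axios = require('axios');
const cheerio = require('cheerio');

// Fetch a page and print the href of every <a> element on it.
async function fetchLinks(url) {
  try {
    // axios resolves with a response object; `data` holds the HTML body.
    const { data } = await axios.get(url);
    // Load the HTML into cheerio to query it with jQuery-style selectors.
    const $ = cheerio.load(data);
    const links = [];

    $('a').each((i, link) => {
      const href = $(link).attr('href');
      // Skip anchors that have no href attribute.
      if (href) {
        links.push(href);
      }
    });

    console.log(links);
  } catch (error) {
    console.error('An error occurred:', error);
  }
}

// Usage example
const url = 'https://example.com'; // replace with the URL you want to crawl
fetchLinks(url);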
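
The href values that fetchLinks collects are raw attribute strings, so they may be relative paths, fragment-only links, or non-HTTP schemes such as mailto:. The following is a minimal sketch of one way to clean them up using Node's built-in URL class. The helper name normalizeLinks and the sample inputs are hypothetical and not part of the original example.

```javascript
// Hypothetical helper (not from the original article): resolves raw href
// values against the page URL, keeps only http(s) links, and removes duplicates.
function normalizeLinks(hrefs, baseUrl) {
  const seen = new Set();
  for (const href of hrefs) {
    let resolved;
    try {
      // The URL constructor resolves relative references against baseUrl.
      resolved = new URL(href, baseUrl);
    } catch {
      continue; // skip values that cannot be parsed as a URL
    }
    // Skip mailto:, javascript:, tel: and other non-HTTP schemes.
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') {
      continue;
    }
    // Links that differ only by fragment point at the same page.
    resolved.hash = '';
    seen.add(resolved.href);
  }
  return [...seen];
}

// Usage with made-up sample values:
console.log(
  normalizeLinks(['/about', 'contact.html', '#top', 'mailto:a@example.com'],
    'https://example.com/docs/')
);
```

Passing the links array from fetchLinks together with the page URL to a helper like this would give a de-duplicated list of absolute URLs, which is usually what a crawler needs before following links.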

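The fetchLinks function above prints the href attribute of every <a> tag on the given page. To scrape different content, change the selector passed to $(). Follow each site's robots.txt rules and terms of use, respect copyright and applicable law, and do not crawl so aggressively that you disrupt the site.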
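
As a rough illustration of the robots.txt point, the sketch below checks a path against the Disallow rules in a robots.txt body that has already been downloaded. It is deliberately simplified and rests on stated assumptions: only groups that apply to "User-agent: *" are read, rules are treated as plain path prefixes, and Allow lines and wildcards are ignored. The function name isPathDisallowed and the sample file are hypothetical. A real crawler would likely be better served by a dedicated robots.txt parser.

```javascript
// Simplified robots.txt check (assumptions: only "User-agent: *" groups,
// prefix-only Disallow rules; Allow lines and wildcards are ignored).
function isPathDisallowed(robotsTxt, path) {
  const disallowed = [];
  let groupMatches = false; // does the current group apply to "*"?
  let lastWasAgent = false; // consecutive User-agent lines share one group

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // drop trailing comments
    const sep = line.indexOf(':');
    if (sep === -1) continue; // blank or malformed line

    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      if (!lastWasAgent) groupMatches = false; // a new group starts here
      groupMatches = groupMatches || value === '*';
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      // An empty Disallow value means "nothing is disallowed".
      if (field === 'disallow' && groupMatches && value !== '') {
        disallowed.push(value);
      }
    }
  }
  return disallowed.some((prefix) => path.startsWith(prefix));
}

// Usage with a made-up robots.txt body:
const sampleRobots = [
  'User-agent: *',
  'Disallow: /private/',
  '',
  'User-agent: OtherBot',
  'Disallow: /',
].join('\n');
console.log(isPathDisallowed(sampleRobots, '/private/data.html'));
console.log(isPathDisallowed(sampleRobots, '/public/index.html'));
```

In practice the robots.txt body would be fetched from the site root before crawling, for example with axios.get, and any URL whose path is disallowed would be skipped.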
