Java - Web Crawler

Author: System   Date: August 11, 2024   Categories: All, Crawler   Word count: 1434





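import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SimpleWebCrawler {

    public static void main(String[] args) throws Exception {
        // URI.create(...).toURL() avoids the URL(String) constructor, deprecated since Java 20
        URL url = URI.create("http://example.com").toURL();
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        // Time out instead of hanging forever on an unresponsive server (milliseconds)
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);

        try {
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                StringBuilder content = new StringBuilder();
                // try-with-resources closes the reader even if reading fails.
                // Assumes a UTF-8 page; a sturdier crawler would use the charset from Content-Type.
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
                    String inputLine;
                    while ((inputLine = in.readLine()) != null) {
                        content.append(inputLine).append('\n');
                    }
                }

                // Process the fetched content
                String webPageContent = content.toString();
                // For example, print the page content
                System.out.println(webPageContent);
            } else {
                System.out.println("GET request failed, response code: " + responseCode);
            }
        } finally {
            // Release the connection whether or not the request succeeded
            connection.disconnect();
        }
    }
}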

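This code shows how to do simple web crawling in Java. It creates a URL object pointing to http://example.com, opens an HTTP connection, and sends a GET request. If the response code is 200 (HTTP_OK), it reads the response body line by line into a string and prints the page content. The reader and the connection are closed once they are no longer needed. For any other response code it prints an error message. This is only a basic crawler example, and a real application usually needs more:

- parsing the HTML to extract links;
- handling redirects. HttpURLConnection follows them automatically only when the protocol stays the same, so a redirect from http to https is not followed.
- downloading pages concurrently with multiple threads or asynchronous requests.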
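Parsing HTML usually comes next, because a crawler needs the links on a page to decide where to go. The sketch below is deliberately simplified. It pulls `href` values out with a regular expression instead of a real HTML parser; a library such as jsoup would be more robust. It then resolves relative links against the page URL with `java.net.URI.resolve` and drops `#fragment` parts. The class `LinkExtractor` and method `extractLinks` are hypothetical names, not part of any library.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {

    // Matches href="..." or href='...' case-insensitively. A regex is NOT a real HTML parser:
    // it ignores <base> tags and unquoted attributes, and does not skip comments or scripts.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns absolute http(s) links in document order, without duplicates or #fragments. */
    public static Set<String> extractLinks(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim()); // relative -> absolute
                String scheme = resolved.getScheme();
                if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
                    continue; // skip mailto:, javascript:, etc.
                }
                String link = resolved.toString();
                int hash = link.indexOf('#');
                links.add(hash >= 0 ? link.substring(0, hash) : link);
            } catch (IllegalArgumentException e) {
                // Malformed href (e.g. contains spaces): skip it rather than fail the whole page
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> <A HREF='https://example.org/x#top'>X</A>"
                + " <a href=\"mailto:someone@example.com\">Mail</a> <a href=\"/about\">Again</a>";
        extractLinks(html, "http://example.com/index.html").forEach(System.out::println);
    }
}
```

Running `main` prints `http://example.com/about` and `https://example.org/x`. The `mailto:` link and the repeated `/about` link are dropped.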
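`HttpURLConnection` follows 3xx responses automatically by default, but only within the same protocol. A redirect from `http://` to `https://` therefore comes back to the caller as a 3xx status and lands in the error branch of the code above. One way to handle this is to turn automatic following off and follow the `Location` header yourself, with a hop limit to stop redirect loops. The sketch below assumes a limit of 5 redirects and UTF-8 pages. `RedirectingFetcher` and `fetch` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class RedirectingFetcher {

    // Assumed hop limit; guards against redirect loops
    private static final int MAX_REDIRECTS = 5;

    /** GETs startUrl, following up to MAX_REDIRECTS redirects by hand (including http -> https). */
    public static String fetch(String startUrl) throws IOException {
        URI current = URI.create(startUrl);
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            String scheme = current.getScheme();
            if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
                throw new IOException("Not an http(s) URL: " + current);
            }
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // handle 3xx responses ourselves
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            try {
                int code = conn.getResponseCode();
                boolean isRedirect = code == 301 || code == 302 || code == 303
                        || code == 307 || code == 308;
                if (isRedirect) {
                    String location = conn.getHeaderField("Location");
                    if (location == null) {
                        throw new IOException("Redirect " + code + " without a Location header");
                    }
                    current = current.resolve(location); // Location may be relative
                    continue;
                }
                if (code != HttpURLConnection.HTTP_OK) {
                    throw new IOException("HTTP " + code + " for " + current);
                }
                try (InputStream in = conn.getInputStream()) {
                    // Assumes a UTF-8 page
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("More than " + MAX_REDIRECTS + " redirects from " + startUrl);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("http://example.com"));
    }
}
```

The loop sends at most `MAX_REDIRECTS + 1` requests. A page that keeps redirecting therefore fails with an `IOException` instead of looping forever.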
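For multithreaded or asynchronous downloading, Java 11 and later include `java.net.http.HttpClient`. Its `sendAsync` method returns a `CompletableFuture`, so many pages can be in flight at once without managing threads by hand. Its `followRedirects(HttpClient.Redirect.NORMAL)` setting also follows redirects across protocols, except from HTTPS down to HTTP. The sketch below is one possible shape, not a complete crawler. It fetches a list of URLs concurrently, keeps only status-200 bodies, and silently drops failures. `AsyncDownloader` and `fetchAll` are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class AsyncDownloader {

    // One shared client: it pools connections and runs requests on its own executor
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects, except https -> http
            .build();

    /** Downloads all URLs concurrently; returns url -> body for responses with status 200. */
    public static Map<String, String> fetchAll(List<String> urls) {
        Map<String, String> bodies = new ConcurrentHashMap<>(); // written from several threads
        CompletableFuture<?>[] downloads = urls.stream()
                .distinct()
                .map(url -> {
                    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                            .timeout(Duration.ofSeconds(10))
                            .GET()
                            .build();
                    return CLIENT.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                            .thenAccept(response -> {
                                if (response.statusCode() == 200) {
                                    bodies.put(url, response.body());
                                }
                            })
                            .exceptionally(error -> null); // this sketch simply drops failed downloads
                })
                .toArray(CompletableFuture[]::new);

        // Wait until every download has either completed or failed
        CompletableFuture.allOf(downloads).join();
        return bodies;
    }

    public static void main(String[] args) {
        Map<String, String> pages = fetchAll(List.of("http://example.com", "https://example.org"));
        pages.forEach((url, body) -> System.out.println(url + " -> " + body.length() + " chars"));
    }
}
```

Starting every request at once is fine for a short list. A real crawler would usually cap concurrency, for example with a `java.util.concurrent.Semaphore` or a bounded executor.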
