A Simple Single-Threaded Java Crawler

Author: System | Date: August 23, 2024 | Categories: All, Crawlers | Word count: 1410


Below is a simple single-threaded web crawler example in Java that uses java.net.HttpURLConnection to make network requests.




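import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SimpleCrawler {

    public static void main(String[] args) {
        HttpURLConnection connection = null;
        try {
            URL url = new URL("http://example.com"); // replace with the page you want to crawl
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            connection.setConnectTimeout(5000); // milliseconds; avoids waiting indefinitely
            connection.setReadTimeout(5000);

            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                StringBuilder content = new StringBuilder();
                // Assumes the page is UTF-8; try-with-resources closes the reader even on errors
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
                    String inputLine;
                    while ((inputLine = in.readLine()) != null) {
                        // readLine() strips line terminators, so add one back
                        content.append(inputLine).append('\n');
                    }
                }

                // Print the page content
                System.out.println(content);
            } else {
                System.out.println("GET request failed with HTTP status " + responseCode);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect(); // release the connection on every code path
            }
        }
    }
}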

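This code creates a simple single-threaded web crawler: it connects to the specified URL, sends a GET request, and prints the content of the server's response. The example does not handle more complex cases such as multithreaded downloading, redirects, cookies, Ajax-loaded content, or crawl-depth control.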
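
Of the cases listed above, crawl-depth control is the easiest to sketch while staying single-threaded. The following is a minimal illustration, not part of the original example: the class name DepthLimitedCrawler, the methods extractLinks and crawl, and the .test URLs are all hypothetical, and the regular expression only catches absolute http(s) links in double-quoted href attributes (a real crawler would more likely use an HTML parser). Page fetching is passed in as a function so that the traversal logic can be tried against in-memory pages without touching the network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, shown only to illustrate depth-limited crawling.
class DepthLimitedCrawler {

    // Naive pattern: absolute http(s) links inside double-quoted href attributes.
    private static final Pattern HREF =
            Pattern.compile("href=\"(https?://[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns the links found in the HTML, in document order.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Breadth-first crawl from startUrl, visiting pages at most maxDepth links away.
    // fetcher maps a URL to its HTML, or returns null if the page is unavailable.
    public static Set<String> crawl(String startUrl, int maxDepth,
                                    Function<String, String> fetcher) {
        Set<String> visited = new LinkedHashSet<>(); // keeps discovery order
        Queue<String> frontier = new ArrayDeque<>();
        visited.add(startUrl);
        frontier.add(startUrl);

        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String url : frontier) {
                String html = fetcher.apply(url);
                if (html == null || depth == maxDepth) {
                    continue; // nothing to expand, or depth limit reached
                }
                for (String link : extractLinks(html)) {
                    if (visited.add(link)) { // add() is false for already-seen URLs
                        next.add(link);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the network: URL -> HTML.
        Map<String, String> pages = Map.of(
                "http://a.test/", "<a href=\"http://b.test/\">b</a> <a href=\"http://c.test/\">c</a>",
                "http://b.test/", "<a href=\"http://d.test/\">d</a>",
                "http://c.test/", "<a href=\"http://a.test/\">back to a</a>",
                "http://d.test/", "no links here");
        System.out.println(crawl("http://a.test/", 1, pages::get));
    }
}
```

To crawl real pages, the fetcher argument could wrap the HttpURLConnection code from SimpleCrawler and return the downloaded content as a string. The visited set also keeps the crawler from looping when pages link back to each other.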
