java-网络爬虫 1
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
public class SimpleWebCrawler {
public static void main(String[] args) throws Exception {
String url = "http://example.com"; // 替换为你想爬取的网站
HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
connection.setRequestMethod("GET");
try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
String inputLine;
StringBuilder content = new StringBuilder();
while ((inputLine = in.readLine()) != null) {
content.append(inputLine);
}
// 打印网页内容
System.out.println(content.toString());
}
}
}
这段代码展示了如何使用Java进行简单的网络爬虫。它创建了一个指向指定URL的HttpURLConnection,然后读取并打印了网页内容。这是一个基本的示例,实际的网络爬虫可能需要处理更复杂的情况,比如多线程下载、页面解析、链接跟踪、robots.txt遵守等。
评论已关闭