Articles published by System

2024-08-13

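To build a high-performance C++ crawler, we can use the libcurl library for network requests, the pugixml library to parse the HTML, and boost::asio for asynchronous I/O. The simplified example below shows how libcurl and pugixml fit together in a basic web crawler; it does not perform any asynchronous I/O with boost::asio. Note that pugixml is an XML parser, so it can only parse pages that are well-formed XHTML; most real-world HTML needs a more lenient HTML parser.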




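#include <iostream>
#include <string>
#include <curl/curl.h>
#include <pugixml.hpp>

using namespace std;

// Callback that libcurl invokes with each chunk of response data;
// it appends the chunk to the std::string passed via CURLOPT_WRITEDATA
size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp) {
    static_cast<std::string*>(userp)->append(static_cast<char*>(contents), size * nmemb);
    return size * nmemb;
}

// Send an HTTP GET request and store the response body in html.
// Returns true on success.
bool sendRequest(const std::string &url, std::string &html) {
    CURL *curl = curl_easy_init();
    if (!curl) {
        cerr << "curl_easy_init() failed" << endl;
        return false;
    }

    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);  // follow HTTP redirects
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &html);

    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK) {
        cerr << "curl_easy_perform() failed: " << curl_easy_strerror(res) << endl;
    }
    curl_easy_cleanup(curl);
    return res == CURLE_OK;
}

// Parse the page and print the href attribute of every <a> element.
// pugixml is an XML parser, so this only works for well-formed (X)HTML.
void parseHtml(const std::string &html) {
    pugi::xml_document doc;
    pugi::xml_parse_result result = doc.load_string(html.c_str());
    if (!result) {  // xml_parse_result converts to true only on success
        cerr << "Failed to parse HTML: " << result.description() << endl;
        return;
    }

    // select_nodes returns xpath_node objects; .node() gives the element
    for (pugi::xpath_node a : doc.select_nodes("//a")) {
        // href is an attribute of <a>, not a child element
        pugi::xml_attribute href = a.node().attribute("href");
        if (href) {
            cout << href.value() << endl;
        }
    }
}

int main() {
    // libcurl global setup, once per program
    curl_global_init(CURL_GLOBAL_DEFAULT);

    // Demo only: a real crawler would process a queue of URLs
    std::string url = "http://example.com";
    std::string html;

    if (sendRequest(url, html)) {
        parseHtml(html);
    }

    curl_global_cleanup();
    return 0;
}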

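In this example, the sendRequest function uses libcurl to send an HTTP GET request and passes the response body to parseHtml, which uses pugixml to parse the page and print every link. This is not a complete crawler: it does not handle multithreading, asynchronous I/O, a URL queue, de-duplication, or crawl-depth control, all of which a high-performance crawler has to address. A real high-performance crawler also needs concurrency control, resource management, and error handling.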
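The URL queue, de-duplication, and depth control listed as missing can be sketched independently of the networking code. The sketch below is written in Python for brevity rather than C++; get_links is a hypothetical stand-in for fetching a page and extracting its links (for example with sendRequest and parseHtml), and the small link graph is made up for illustration.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int) -> List[str]:
    """Breadth-first crawl; returns URLs in the order they are visited."""
    visited = {start_url}              # de-duplication set
    queue = deque([(start_url, 0)])    # URL queue of (url, depth) pairs
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:         # depth control: do not expand further
            continue
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order


# Hypothetical link graph standing in for real fetch-and-parse calls
graph: Dict[str, List[str]] = {
    "/": ["/a", "/b"],
    "/a": ["/", "/c"],
    "/b": ["/c"],
    "/c": ["/d"],
    "/d": [],
}

print(crawl("/", lambda u: graph.get(u, []), max_depth=2))  # ['/', '/a', '/b', '/c']
```

Marking URLs as visited when they are enqueued, rather than when they are fetched, keeps the same URL from entering the queue twice.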


A simple crawler: scraping stock data from Eastmoney

System

2024-08-13

All, web crawler

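Below is a simple example that tries to fetch the data for one stock (for example "600771") on December 30, 2023, the kind of quote data listed on Eastmoney (http://quote.eastmoney.com/center/grid.html). Note that the request URL in the code actually points to NetEase Finance (quotes.money.163.com), not Eastmoney, and that December 30, 2023 was a Saturday, so there is no trading data for that day.




import io

import requests
import pandas as pd

# Stock code (600771 is listed on the Shanghai Stock Exchange)
stock_code = "600771"

# Request headers that mimic a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Historical-quote CSV endpoint. Despite the article title, this URL belongs to
# NetEase Finance (quotes.money.163.com), not Eastmoney, and may no longer be available.
# Assumption: the code parameter is a market prefix (0 = Shanghai, 1 = Shenzhen)
# followed directly by the stock code, with no dot.
url = f'http://quotes.money.163.com/service/chddata.html?code=0{stock_code}&start=20231229&end=20231230'

# Send the request, with a timeout so it cannot hang indefinitely
response = requests.get(url, headers=headers, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # The endpoint returns CSV text (assumed GBK-encoded), not '~'-separated
    # fields, so let pandas parse it; the first column holds the trade date
    response.encoding = 'gbk'
    df = pd.read_csv(io.StringIO(response.text), parse_dates=[0])

    # Print the result
    print(df)
else:
    print(f"Request failed with HTTP status {response.status_code}")

This code first sets the stock code and request headers, then builds the request URL. It sends the request, checks the response status, parses the returned CSV text into a Pandas DataFrame (converting the date column to datetime values), and prints the result.

Notes:

This example requests December 29 to 30, 2023; because December 30 was a Saturday, at most the December 29 row is returned. Adjust the start and end parameters in the URL (format YYYYMMDD) to fetch other periods.
The site may use anti-scraping measures. If the code stops working, you may need to update the request headers to mimic a real browser more closely, or the endpoint itself may have changed.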
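Because the period is controlled by the start and end parameters, a small helper can build the URL from date objects and flag weekend dates, on which the Shanghai and Shenzhen exchanges do not trade. This is a sketch under assumptions: the base URL is the one used in the example, history_url and is_weekend are names made up here, the code value is passed through unchanged, and the weekend check knows nothing about exchange holidays.

```python
from datetime import date

BASE_URL = "http://quotes.money.163.com/service/chddata.html"


def history_url(code: str, start: date, end: date) -> str:
    """Build the query URL; start and end are formatted as YYYYMMDD."""
    return f"{BASE_URL}?code={code}&start={start:%Y%m%d}&end={end:%Y%m%d}"


def is_weekend(d: date) -> bool:
    """weekday() returns 0 for Monday ... 6 for Sunday."""
    return d.weekday() >= 5


# Assumption: "0" + code for a Shanghai-listed stock
print(history_url("0600771", date(2023, 12, 29), date(2023, 12, 30)))
print(is_weekend(date(2023, 12, 30)))  # True: December 30, 2023 was a Saturday
```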

System

2024-08-13

All, python

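Keras is an open-source neural network library written in Python. It was originally designed as a high-level interface on top of TensorFlow, CNTK, or Theano; current releases (Keras 3) run on TensorFlow, JAX, or PyTorch. Keras gives developers a flexible workflow for building neural networks and makes it quick to prototype deep learning models, with support for convolutional networks, recurrent networks, and combinations of the two.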

Installing Keras usually also means installing a deep learning backend (such as TensorFlow). To install Keras itself with pip:




pip install keras

If you use TensorFlow as the backend, install TensorFlow as well. TensorFlow also ships Keras as tf.keras, so no separate TensorFlow-specific Keras package is needed:




pip install tensorflow

Create a simple Sequential model with Keras:




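import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.utils import to_categorical

# Random dummy data so the example runs end to end: 100 features per sample,
# 10 classes. Replace with your real training and test sets.
x_train = np.random.random((1000, 100))
y_train = to_categorical(np.random.randint(10, size=(1000,)), num_classes=10)
x_test = np.random.random((200, 100))
y_test = to_categorical(np.random.randint(10, size=(200,)), num_classes=10)

model = Sequential()
model.add(Input(shape=(100,)))                    # 100-dimensional input
model.add(Dense(units=64, activation='relu'))     # hidden layer
model.add(Dense(units=10, activation='softmax'))  # one probability per class

# categorical_crossentropy expects one-hot labels, hence to_categorical above
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5, batch_size=32)

# Returns [loss, accuracy] on the test set
loss_and_metrics = model.evaluate(x_test, y_test)

# Class probabilities: one row of 10 values per test sample
classes = model.predict(x_test, batch_size=128)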
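To see where this model's size comes from, each Dense layer's parameter count can be worked out by hand: a weight matrix of shape (inputs, units) plus one bias per unit. The plain-Python sketch below does that arithmetic (dense_params is a name made up here); for the 100 → 64 → 10 model above it gives 7,114 trainable parameters, which should agree with what model.summary() reports.

```python
def dense_params(n_in: int, units: int, use_bias: bool = True) -> int:
    """Trainable weights in a Dense layer: an (n_in, units) kernel plus one bias per unit."""
    return n_in * units + (units if use_bias else 0)


hidden = dense_params(100, 64)   # 100 * 64 + 64 = 6464
output = dense_params(64, 10)    # 64 * 10 + 10 = 650
print(hidden, output, hidden + output)  # 6464 650 7114
```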

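Notes:

Make sure your Python environment is set up correctly and is compatible with Keras and the chosen backend.
Install the deep learning framework and Keras versions that match your GPU support and configuration.
Before using Keras, make sure the required dependencies such as NumPy are installed (pip normally installs them automatically).
When training models with Keras, make sure you have enough data and compute resources to handle large models and datasets.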


Python + Selenium: XPath locators explained

System

2024-08-13

All, python

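In Python's Selenium library, XPath is a very powerful way to locate elements. XPath is a language for finding information in XML documents; because an HTML page can be treated as a similar tree of elements, XPath can be used to locate elements in HTML. Since Selenium 4.3 the find_element_by_xpath helpers have been removed, so the examples below use driver.find_element(By.XPATH, ...), which requires from selenium.webdriver.common.by import By.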

Commonly used XPath locator strategies:

Absolute path:

An absolute path is the most direct approach, but any change to the page structure can break it.




from selenium.webdriver.common.by import By
element = driver.find_element(By.XPATH, '/html/body/div/form/input[1]')

Relative path:

A relative path is more robust: instead of spelling out the full path from the root, it only describes the element's position relative to other elements.




element = driver.find_element(By.XPATH, '//form/input[1]')

Attribute:

If an element has a unique attribute (such as id, name, or class), you can locate it directly by that attribute.




element = driver.find_element(By.XPATH, "//input[@id='su']")

Index:

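XPath can also select elements by position; indexes start at 1, not 0. Note that //input[1] selects every input element that is the first input child of its parent, not just the first input on the page; to get the first input in the whole document, use (//input)[1].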




element = driver.find_element(By.XPATH, "//input[1]")
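The per-parent behavior of the [1] index can be checked without a browser. This sketch uses Python's built-in xml.etree.ElementTree on a small hypothetical page; ElementTree supports only a limited subset of XPath (for example, no contains() function and no and/or operators), but for these simple paths a browser evaluating the equivalent expressions should select the same elements.

```python
import xml.etree.ElementTree as ET

# A small, well-formed stand-in for a page with two forms (hypothetical markup)
page = """
<html><body>
  <form id="login"><input id="user"/><input id="pass"/></form>
  <form id="search"><input id="kw"/><input id="su"/></form>
</body></html>
"""
root = ET.fromstring(page.strip())

# [1] is evaluated per parent: the first <input> child of *each* <form>
first_per_parent = [e.get("id") for e in root.findall(".//input[1]")]
print(first_per_parent)  # ['user', 'kw']

# The first <input> in the whole document, like (//input)[1]
print(root.find(".//input").get("id"))  # user

# Attribute predicate followed by '..' (parent), like //input[@id='su']/..
print(root.find(".//input[@id='su']/..").get("id"))  # search
```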

Partial matching:

The contains() function performs partial matching, selecting elements whose value contains the given string.




element = driver.find_element(By.XPATH, "//a[contains(text(),'新闻')]")

Logical operators:

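Conditions can be combined with the and / or operators. Note that @class='su' matches only when the class attribute is exactly su.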




element = driver.find_element(By.XPATH, "//input[@class='su' and @id='su']")

Axes:

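Axes navigate from an element to its parent, children, siblings, and so on. The .. step is shorthand for parent::node(); axes such as following-sibling:: and ancestor:: are written the same way.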




element = driver.find_element(By.XPATH, "//input/..")  # parent element of the input element

Text:

text() matches an element's text content; combined with = it requires an exact match.




element = driver.find_element(By.XPATH, "//a[text()='新闻']")

These are the most common XPath locator strategies; in practice, choose the one that best fits the structure of the page.


Plotting a wind speed and wind field map in Python

System

2024-08-13

All, python




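import matplotlib.pyplot as plt
import numpy as np

# Grid coordinates (e.g. longitude and latitude); replace with your data's grid.
# The ranges here are arbitrary example values.
lon = np.linspace(100, 130, 31)
lat = np.linspace(15, 45, 31)
X, Y = np.meshgrid(lon, lat)

# Wind x (u) and y (v) components: 2-D arrays with the same shape as X and Y.
# Synthetic values for illustration only; replace with your data.
u_wind = 10 * np.cos(np.radians(Y))
v_wind = 5 * np.sin(np.radians(X))

# Wind speed is the magnitude of the (u, v) vector
wind_speed = np.sqrt(u_wind**2 + v_wind**2)

# Set up the figure
plt.figure(figsize=(12, 9))
# Filled contours of wind speed: the first two arguments are grid coordinates
plt.contourf(X, Y, wind_speed, levels=8, cmap='jet')

# Color bar for the speed values
plt.colorbar(label='Wind speed')

# Arrows for wind direction, thinned to every 3rd grid point to reduce clutter
step = 3
plt.quiver(X[::step, ::step], Y[::step, ::step],
           u_wind[::step, ::step], v_wind[::step, ::step])

# Show the figure
plt.show()

This code example shows how to draw a wind speed and wind field map with Matplotlib and NumPy. u_wind and v_wind are 2-D arrays holding the x and y components of the wind on the grid defined by X and Y, and wind_speed is the wind speed at each point. contourf draws filled contours of the wind speed (its first two arguments are the grid coordinates, not the wind components), colorbar adds a color bar, quiver draws arrows showing the wind direction, and show displays the figure.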
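The same u and v components also give the wind direction. The sketch below is an addition not in the original post; it assumes u points east, v points north, and the meteorological convention that direction is where the wind blows from (0° = north, 90° = east). wind_speed_and_direction is a name made up here.

```python
import numpy as np


def wind_speed_and_direction(u, v):
    """Speed and meteorological direction (degrees the wind blows FROM)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)  # same as sqrt(u**2 + v**2)
    # (u, v) points where the wind goes; negate both to get where it comes from
    direction = np.degrees(np.arctan2(-u, -v)) % 360.0
    return speed, direction


# Wind toward the north-east, from the north, and from the west
speed, direction = wind_speed_and_direction([3.0, 0.0, 5.0], [4.0, -5.0, 0.0])
print(speed)
print(direction)
```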


Python - GFP-GAN for real-world face restoration: introduction and usage

System

2024-08-13

All, python




import torch
from torch import nn

class GFP(nn.Module):
    """
    Simplified illustration for face restoration: a convolution followed by
    per-pixel RMS normalization across channels with a learnable scale and shift.
    This is not the actual GFP-GAN architecture.
    """
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', eps=1e-8):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias, padding_mode)
        self.gamma = nn.Parameter(torch.ones(1))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(1))   # learnable shift
        self.eps = eps                             # avoids division by zero

    def forward(self, x):
        output = self.conv(x)
        # Root mean square over the channel dimension, computed per pixel
        norm = torch.sqrt(torch.mean(output ** 2, dim=1, keepdim=True) + self.eps)
        output = self.gamma * output / norm + self.beta
        return output

# Example: apply the GFP module
input_tensor = torch.randn(1, 512, 4, 4)  # assume a 4x4 input feature map
gfp_layer = GFP(512, 512, 3, padding=1)
output_tensor = gfp_layer(input_tensor)
print(output_tensor.shape)  # torch.Size([1, 512, 4, 4])

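This code example defines a GFP module and uses it to process an input feature map. After instantiating the GFP class, it creates a random input feature map, passes it through the module, and prints the output shape. The shape check only confirms that the layer runs and preserves the spatial size; it says nothing about restoration quality. This module is a simplified illustration (a convolution followed by per-pixel RMS normalization across channels with a learnable scale and shift), not the actual GFP-GAN architecture, which uses a pretrained face GAN as a generative facial prior.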
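The normalization in forward() divides each pixel's feature vector by its root mean square across channels. The NumPy sketch below mirrors that step outside PyTorch (pixel_rms_normalize is a name made up here, and the small eps is an assumption added to avoid division by zero); with gamma = 1 and beta = 0, every spatial position ends up with unit RMS across channels.

```python
import numpy as np


def pixel_rms_normalize(x, gamma=1.0, beta=0.0, eps=1e-8):
    """Normalize an (N, C, H, W) array by its RMS over the channel axis, then scale and shift."""
    rms = np.sqrt(np.mean(x ** 2, axis=1, keepdims=True) + eps)
    return gamma * x / rms + beta


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 512, 4, 4))
y = pixel_rms_normalize(x)

# RMS across channels at every pixel: all values are (almost exactly) 1
print(np.sqrt(np.mean(y ** 2, axis=1)).round(6))
```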

System

2024-08-13

All, python

Explanation:

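ModuleNotFoundError: No module named 'PIL' means Python cannot find a module named PIL. PIL (Python Imaging Library) is an image-processing library, but the original PIL has not been maintained since 2009 and does not support Python 3. It has been replaced by Pillow, an actively maintained fork of PIL.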

Solution:

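On Python 3, install Pillow instead of PIL. You can install it with pip (running python -m pip install Pillow makes sure it goes into the interpreter you actually use):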
```
pip install Pillow
```
Pillow is still imported under the package name PIL, so existing imports such as the following keep working unchanged once Pillow is installed:
```
from PIL import Image
```
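Pillow and the original PIL cannot be installed in the same environment. If your project truly depends on PIL itself, it has to run on the old Python 2 interpreter that PIL supports; porting the project to Pillow is almost always the better option.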
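If the error persists after installing Pillow, pip may have installed it into a different interpreter than the one running your code. This diagnostic sketch (module_available is a name made up here) prints the interpreter in use and checks whether PIL is importable there, using only the standard library; installing with that interpreter, as in python -m pip install Pillow, targets the right environment.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if `name` can be imported by the interpreter running this script."""
    return importlib.util.find_spec(name) is not None


# The interpreter actually running this code; pip must install into this one
print(sys.executable)
print("PIL (Pillow) importable:", module_available("PIL"))
```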

System

2024-08-13

All, python

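ArrayList is part of the Java Collections Framework: a resizable-array implementation of the List interface whose size grows and shrinks as elements are added and removed.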

Some commonly used ArrayList methods:

Add elements:




import java.util.ArrayList;
import java.util.Iterator;

ArrayList<String> arrayList = new ArrayList<>();
arrayList.add("Apple");
arrayList.add("Banana");   // [Apple, Banana]

Insert an element at a given index:




arrayList.add(1, "Orange");   // [Apple, Orange, Banana]

Remove an element (the first occurrence of the given object):




arrayList.remove("Apple");   // [Orange, Banana]

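Remove the element at a given index (for an ArrayList<Integer>, remove(1) also removes by index; use remove(Integer.valueOf(1)) to remove by value):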




arrayList.remove(1);   // removes "Banana": [Orange]

Replace an element:




arrayList.set(0, "Mango");   // [Mango]; index 1 would throw IndexOutOfBoundsException here

Get an element:




String first = arrayList.get(0);   // "Mango"

Get the size of an ArrayList:




int size = arrayList.size();

Check whether an ArrayList is empty:




boolean isEmpty = arrayList.isEmpty();

Clear an ArrayList:




arrayList.clear();

Iterate over an ArrayList:




for (String fruit : arrayList) {
    System.out.println(fruit);
}

Or use an Iterator, which also lets you remove elements safely during iteration with iterator.remove():




Iterator<String> iterator = arrayList.iterator();
while (iterator.hasNext()) {
    System.out.println(iterator.next());
}

These are the basic ArrayList operations; choose the methods that fit your needs.
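The automatic growth of an ArrayList comes from a backing array that is replaced by a larger copy when it fills up. In OpenJDK the new capacity is old + (old >> 1), roughly 1.5x, starting from a default capacity of 10, and the capacity does not shrink again unless trimToSize() is called. The Python sketch below is a simplified model of that rule, not Java code (arraylist_capacities is a made-up name, and it ignores that a default-constructed list allocates its array lazily on the first add).

```python
def arraylist_capacities(n_adds: int, initial: int = 10) -> list:
    """Capacities the backing array passes through while n_adds elements are appended."""
    capacity, size = initial, 0
    history = [capacity]
    for _ in range(n_adds):
        if size == capacity:                       # array full: grow by about 1.5x
            capacity = capacity + (capacity >> 1)
            history.append(capacity)
        size += 1
    return history


print(arraylist_capacities(50))  # [10, 15, 22, 33, 49, 73]
```

Growing by a constant factor keeps the average cost of add() constant (amortized O(1)), even though an individual add occasionally copies the whole array.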


Deep data mining in Python: power system load forecasting

System

2024-08-13

All, python




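import pickle

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Assume a CSV with load data and related numeric features, rows in
# chronological order, and the target in the 'actual_load' column
data = pd.read_csv('path_to_data.csv')

# Separate the features from the target variable
X = data.drop(['actual_load'], axis=1)
y = data['actual_load']

# Split into training and test sets. shuffle=False keeps the test set strictly
# later in time than the training set, avoiding look-ahead leakage
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Create and train a random forest regression model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Predict on the test set
y_pred = rf_model.predict(X_test)

# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Save the model
with open('model.pkl', 'wb') as file:
    pickle.dump(rf_model, file)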

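This code shows how to forecast power system load with a random forest model: reading the data, separating the features from the target, training the model, predicting, and evaluating performance. Finally, the model is saved to a binary file for later use.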
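The mean squared error used for evaluation is the average of the squared prediction errors; its square root (RMSE) is in the same units as the load, which is often easier to interpret. The plain-Python sketch below works through the formula on made-up values (mse is a helper defined here, not the scikit-learn function).

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up loads for illustration only
y_true = [100.0, 120.0, 130.0]
y_pred = [110.0, 120.0, 125.0]

error = mse(y_true, y_pred)       # (10**2 + 0**2 + 5**2) / 3 = 125 / 3
print(error, math.sqrt(error))    # MSE and RMSE
```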


Deploying Python on Linux

System

2024-08-13

All, python

To deploy a Python application on Linux, follow these steps:

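Install Python:
Install Python with your Linux package manager. For example, on Ubuntu you can install Python 3, together with the venv and pip modules (packaged separately on Debian and Ubuntu), with:
```
sudo apt update
sudo apt install python3 python3-venv python3-pip
```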
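Create a virtual environment (optional, but recommended):
Use the venv module to create an isolated Python environment and avoid dependency conflicts.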
```
python3 -m venv myenv
source myenv/bin/activate
```
Install the application's dependencies:
In your application directory, use pip to install the required Python packages.
```
pip install -r requirements.txt
```
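Deploy the application:
Copy your application code and related files to the Linux server.
Run the application:
Run your Python application directly on the server, with the virtual environment activated.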
```
python3 app.py
```

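Make sure your application's configuration (database connections, API keys, and so on) matches the Linux server's environment. If the application provides a network service, you may also need to configure firewall rules, run it as a system service, or use an application server such as Gunicorn or uWSGI to manage the application processes.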
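For the Gunicorn option, the configuration file is itself a Python module whose top-level variables are Gunicorn settings. The sketch below assumes a WSGI application (for example a Flask app) exposed as app in app.py, which is hypothetical, and a reverse proxy such as Nginx in front; the values are starting points to adapt, not requirements.

```python
# gunicorn.conf.py: a minimal sketch; adapt the values to your server
import multiprocessing

# Listen only on localhost, assuming a reverse proxy forwards requests
bind = "127.0.0.1:8000"

# The Gunicorn docs suggest (2 x CPU cores) + 1 workers as a starting point
workers = multiprocessing.cpu_count() * 2 + 1

# Restart a worker that has been silent for more than 30 seconds
timeout = 30

# "-" sends logs to stdout/stderr, where systemd/journald can collect them
accesslog = "-"
errorlog = "-"
```

The application would then be started with gunicorn -c gunicorn.conf.py app:app instead of python3 app.py.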
