Python实战指南:精通多目标进化算法 NSGA-II
1. 引言
在工程优化、工业设计和机器学习调参中,常常存在多个冲突目标:
- 汽车设计:燃油效率 vs 加速度
- 投资组合:收益最大化 vs 风险最小化
- 机器学习:模型精度 vs 复杂度
这类问题无法用单一目标函数描述,而是追求Pareto 最优解集。NSGA-II 正是多目标进化优化的经典算法,能高效逼近 Pareto 前沿。
2. NSGA-II 核心原理
NSGA-II (Non-dominated Sorting Genetic Algorithm II) 的核心思想包括:
- 非支配排序(Non-dominated Sorting):区分优劣层次
- 拥挤度距离(Crowding Distance):保持解的多样性
- 精英策略(Elitism):保留历史最优解
2.1 非支配排序原理
定义支配关系:
个体 A 支配 B,当且仅当:
- A 在所有目标上不差于 B
- A 至少在一个目标上优于 B
步骤:
- 计算每个个体被多少个个体支配(domination count)
- 找出支配数为 0 的个体 → 第一前沿 F1
- 从种群中移除 F1,并递归生成下一层 F2
2.2 拥挤度距离计算
用于衡量解集的稀疏程度:
- 对每个目标函数排序
- 边界个体拥挤度设为无穷大
- 内部个体的拥挤度 = 邻居目标差值归一化和
拥挤度大的个体更容易被保留,用于保持解的多样性。
2.3 算法流程图
初始化种群 P0
|
v
计算目标函数值
|
v
非支配排序 + 拥挤度
|
v
选择 + 交叉 + 变异
|
v
合并父代Pt与子代Qt得到Rt
|
v
按前沿层次+拥挤度选前N个
|
v
生成新种群 Pt+1
3. Python 实战:DEAP 实现 NSGA-II
3.1 安装
pip install deap matplotlib numpy
3.2 定义优化问题
我们以经典 ZDT1 问题为例:
$$ f_1(x) = x_1 $$
$$ f_2(x) = g(x) \cdot \Big(1 - \sqrt{\frac{x_1}{g(x)}}\Big) $$
$$ g(x) = 1 + 9 \cdot \frac{\sum_{i=2}^{n} x_i}{n-1} $$
import numpy as np
from deap import base, creator, tools, algorithms
# 定义多目标最小化
creator.create("FitnessMulti", base.Fitness, weights=(-1.0, -1.0))
creator.create("Individual", list, fitness=creator.FitnessMulti)
DIM = 30
toolbox = base.Toolbox()
toolbox.register("attr_float", np.random.rand)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_float, n=DIM)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
# ZDT1目标函数
def evalZDT1(ind):
f1 = ind[0]
g = 1 + 9 * sum(ind[1:]) / (DIM-1)
f2 = g * (1 - np.sqrt(f1 / g))
return f1, f2
toolbox.register("evaluate", evalZDT1)
toolbox.register("mate", tools.cxSimulatedBinaryBounded, low=0, up=1, eta=20)
toolbox.register("mutate", tools.mutPolynomialBounded, low=0, up=1, eta=20, indpb=1.0/DIM)
toolbox.register("select", tools.selNSGA2)
3.3 主程序与可视化
import matplotlib.pyplot as plt
def run_nsga2():
pop = toolbox.population(n=100)
hof = tools.ParetoFront()
# 初始化非支配排序
pop = toolbox.select(pop, len(pop))
for gen in range(200):
offspring = algorithms.varAnd(pop, toolbox, cxpb=0.9, mutpb=0.1)
for ind in offspring:
ind.fitness.values = toolbox.evaluate(ind)
# 合并父代与子代
pop = toolbox.select(pop + offspring, 100)
# 可视化帕累托前沿
F1 = np.array([ind.fitness.values for ind in pop])
plt.scatter(F1[:,0], F1[:,1], c='red')
plt.xlabel('f1'); plt.ylabel('f2'); plt.title("NSGA-II Pareto Front")
plt.grid(True)
plt.show()
run_nsga2()
4. 手写 NSGA-II 核心实现
我们手动实现 非支配排序 和 拥挤度计算。
4.1 非支配排序
def fast_non_dominated_sort(values):
S = [[] for _ in range(len(values))]
n = [0 for _ in range(len(values))]
rank = [0 for _ in range(len(values))]
front = [[]]
for p in range(len(values)):
for q in range(len(values)):
if all(values[p] <= values[q]) and any(values[p] < values[q]):
S[p].append(q)
elif all(values[q] <= values[p]) and any(values[q] < values[p]):
n[p] += 1
if n[p] == 0:
rank[p] = 0
front[0].append(p)
i = 0
while front[i]:
next_front = []
for p in front[i]:
for q in S[p]:
n[q] -= 1
if n[q] == 0:
rank[q] = i+1
next_front.append(q)
i += 1
front.append(next_front)
return front[:-1]
4.2 拥挤度计算
def crowding_distance(values):
size = len(values)
distances = [0.0] * size
for m in range(len(values[0])):
sorted_idx = sorted(range(size), key=lambda i: values[i][m])
distances[sorted_idx[0]] = distances[sorted_idx[-1]] = float('inf')
min_val = values[sorted_idx[0]][m]
max_val = values[sorted_idx[-1]][m]
for i in range(1, size-1):
distances[sorted_idx[i]] += (values[sorted_idx[i+1]][m] - values[sorted_idx[i-1]][m]) / (max_val - min_val + 1e-9)
return distances
4.3 手写核心循环
def nsga2_custom(pop_size=50, generations=50):
# 初始化
pop = [np.random.rand(DIM) for _ in range(pop_size)]
fitness = [evalZDT1(ind) for ind in pop]
for gen in range(generations):
# 生成子代
offspring = [np.clip(ind + np.random.normal(0,0.1,DIM),0,1) for ind in pop]
fitness_offspring = [evalZDT1(ind) for ind in offspring]
# 合并
combined = pop + offspring
combined_fitness = fitness + fitness_offspring
# 非支配排序
fronts = fast_non_dominated_sort(combined_fitness)
new_pop, new_fitness = [], []
for front in fronts:
if len(new_pop) + len(front) <= pop_size:
new_pop.extend([combined[i] for i in front])
new_fitness.extend([combined_fitness[i] for i in front])
else:
distances = crowding_distance([combined_fitness[i] for i in front])
sorted_idx = sorted(range(len(front)), key=lambda i: distances[i], reverse=True)
for i in sorted_idx[:pop_size-len(new_pop)]:
new_pop.append(combined[front[i]])
new_fitness.append(combined_fitness[front[i]])
break
pop, fitness = new_pop, new_fitness
return pop, fitness
pop, fitness = nsga2_custom()
import matplotlib.pyplot as plt
plt.scatter([f[0] for f in fitness], [f[1] for f in fitness])
plt.title("Custom NSGA-II Pareto Front")
plt.show()
5. 高阶应用:机器学习特征选择
目标函数:
- 错误率最小化
- 特征数量最小化
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
data = load_breast_cancer()
X, y = data.data, data.target
def eval_model(ind):
selected = [i for i, g in enumerate(ind) if g>0.5]
if not selected:
return 1.0, len(data.feature_names)
model = DecisionTreeClassifier()
score = 1 - np.mean(cross_val_score(model, X[:,selected], y, cv=5))
return score, len(selected)
将其替换到 toolbox.register("evaluate", eval_model)
即可进行多目标特征选择。
6. 总结
本文深入讲解了 NSGA-II 多目标进化算法:
- 原理:非支配排序、拥挤度距离、精英策略
- 实现:DEAP 快速实现 + 手写核心代码
- 可视化:帕累托前沿绘制
- 应用:特征选择与模型调优
评论已关闭