
Published: 2025-06-24 20:05:32  Author: 北方职教升学中心  Views: 514


Key Points

nn.CrossEntropyLoss() takes logits as input, so the classifier output does not need to be passed through softmax beforehand.
```
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define the input logits and the target labels
logits = torch.tensor([[2.0, 0.5], [0.5, 2.0]])  # raw logits, no softmax applied
target = torch.tensor([0, 1])  # target class indices

# Compute the loss with nn.CrossEntropyLoss (takes logits directly)
criterion_ce = nn.CrossEntropyLoss()
loss_ce = criterion_ce(logits, target)

# Apply log_softmax first, then compute the loss with nn.NLLLoss
log_probs = F.log_softmax(logits, dim=1)
criterion_nll = nn.NLLLoss()
loss_nll = criterion_nll(log_probs, target)

print(f"Loss using nn.CrossEntropyLoss: {loss_ce.item()}")
print(f"Loss using softmax + nn.NLLLoss: {loss_nll.item()}")

# Verify that the two losses are equal
assert torch.allclose(loss_ce, loss_nll), "The losses are not equal, which indicates a mistake in the assumption."
print("The losses are equal, indicating that nn.CrossEntropyLoss internally applies softmax.")
```
```
>>> Loss using nn.CrossEntropyLoss: 0.2014133334159851
>>> Loss using softmax + nn.NLLLoss: 0.2014133334159851
>>> The losses are equal, indicating that nn.CrossEntropyLoss internally applies softmax.
```
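Extension: F.log_softmax()
F.log_softmax is mathematically equivalent to applying the softmax activation and then taking the logarithm log() of the result. It fuses the softmax and log operations into a single step to improve numerical stability and computational efficiency. (On the label notation used in the formulas: in a one-hot encoded label vector, $y_{ic}$ is 1 when the sample belongs to class $c$ and 0 otherwise.)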
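The numerical-stability claim can be made concrete with a small sketch. The logit values below are an assumption chosen only to trigger underflow in float32; they do not come from the original article. With a logit gap of 1000, the softmax probability of the second class underflows to exactly 0, so taking its log afterwards gives -inf, while F.log_softmax returns the finite value -1000.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits with a very large gap (chosen to force underflow in float32)
logits = torch.tensor([[1000.0, 0.0]])

# Two-step version: softmax first, then log. The second probability underflows to 0.
naive = torch.log(F.softmax(logits, dim=1))

# Fused version: computes x_i - log(sum_j exp(x_j)) in a numerically stable way
stable = F.log_softmax(logits, dim=1)

print("log(softmax(x)):", naive)
print("log_softmax(x): ", stable)
```

For moderate logits the two versions agree, which is why the difference only shows up in extreme cases like this one.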
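Table of Contents
- Prerequisites
- nn.CrossEntropyLoss(): Cross-Entropy Loss
  Parameters
  Mathematical Formulas
  Weighted Formula (weight)
  Label Smoothing (label_smoothing)
  Key Points
- Appendix
- References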
Prerequisites
Deep Learning: Some Prerequisite Knowledge About Loss Functions (PyTorch Loss)
nn.CrossEntropyLoss(): Cross-Entropy Loss
torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)
This criterion computes the cross entropy loss between input logits and target.
Parameters
- weight (Tensor, optional): a tensor of shape $(C)$ giving the weight of each class.
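- Total loss:
  Average the losses over all samples (the reduction parameter defaults to 'mean'):
  $\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \ell_i = \frac{1}{N} \sum_{i=1}^{N} -\log(p_{iy_i})$
  If reduction is 'sum', the total loss is the sum of the per-sample losses:
  $\mathcal{L} = \sum_{i=1}^{N} \ell_i = \sum_{i=1}^{N} -\log(p_{iy_i})$
  If reduction is 'none', a tensor of the per-sample losses $\ell_i$ is returned.

If the model output has already been converted to log-probabilities (for example with F.log_softmax), use nn.NLLLoss() (negative log-likelihood loss) instead.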
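As a quick check of the three reduction modes described above, the following sketch compares 'mean' and 'sum' against the per-sample tensor returned by 'none'. The logits and targets are made-up illustration values.

```python
import torch
import torch.nn as nn

# Made-up batch of 3 samples and 2 classes
logits = torch.tensor([[2.0, 0.5], [0.5, 2.0], [1.0, 1.0]])
target = torch.tensor([0, 1, 0])

# 'none' returns one loss per sample; 'mean' and 'sum' reduce that tensor
per_sample = nn.CrossEntropyLoss(reduction="none")(logits, target)
loss_mean = nn.CrossEntropyLoss(reduction="mean")(logits, target)
loss_sum = nn.CrossEntropyLoss(reduction="sum")(logits, target)

print("per-sample losses:", per_sample)
print("mean:", loss_mean.item(), "sum:", loss_sum.item())

# Without class weights, 'mean' and 'sum' are the mean and sum of the per-sample losses
assert torch.allclose(loss_mean, per_sample.mean())
assert torch.allclose(loss_sum, per_sample.sum())
```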

Appendix

This appendix verifies that the mathematical formulas agree with what the functions actually compute.

```
import torch
import torch.nn.functional as F

# Suppose there are two samples, each with three classes
logits = torch.tensor([[1.5, 2.0, 0.5], [1.0, 0.5, 2.5]], requires_grad=True)
targets = torch.tensor([1, 2])

# softmax implemented from the formula
def softmax(x):
    return torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)

# log-softmax implemented from the formula
def log_softmax(x):
    return x - torch.log(torch.exp(x).sum(dim=1, keepdim=True))

# Negative log-likelihood loss (NLLLoss) implemented from the formula
def nll_loss(log_probs, targets):
    N = log_probs.size(0)
    return -log_probs[range(N), targets].mean()

# Cross-entropy loss implemented from the formula
def custom_cross_entropy(logits, targets):
    log_probs = log_softmax(logits)
    return nll_loss(log_probs, targets)

# Cross-entropy loss computed by PyTorch
criterion = torch.nn.CrossEntropyLoss(reduction='mean')
loss_torch = criterion(logits, targets)

# Cross-entropy loss computed from the formula
loss_custom = custom_cross_entropy(logits, targets)

# Print the results
print("Cross-entropy loss computed by PyTorch:", loss_torch.item())
print("Cross-entropy loss implemented from the formula:", loss_custom.item())

# Verify that the results are equal
assert torch.isclose(loss_torch, loss_custom), "Formula verification failed"

# Weighted cross-entropy loss
weights = torch.tensor([0.7, 0.2, 0.1])
criterion_weighted = torch.nn.CrossEntropyLoss(weight=weights, reduction='mean')
loss_weighted_torch = criterion_weighted(logits, targets)

# Weighted cross-entropy loss implemented from the formula
def custom_weighted_cross_entropy(logits, targets, weights):
    log_probs = log_softmax(logits)
    N = logits.size(0)
    weighted_loss = -log_probs[range(N), targets] * weights[targets]
    return weighted_loss.sum() / weights[targets].sum()

loss_weighted_custom = custom_weighted_cross_entropy(logits, targets, weights)

# Print the results
print("Weighted cross-entropy loss computed by PyTorch:", loss_weighted_torch.item())
print("Weighted cross-entropy loss implemented from the formula:", loss_weighted_custom.item())

# Verify that the results are equal
assert torch.isclose(loss_weighted_torch, loss_weighted_custom, atol=1e-6), "Weighted formula verification failed"

# Cross-entropy loss with label smoothing
alpha = 0.1
criterion_label_smoothing = torch.nn.CrossEntropyLoss(label_smoothing=alpha, reduction='mean')
loss_label_smoothing_torch = criterion_label_smoothing(logits, targets)

# Label-smoothed cross-entropy loss implemented from the formula
def custom_label_smoothing_cross_entropy(logits, targets, alpha):
    N, C = logits.size()
    log_probs = log_softmax(logits)
    one_hot = torch.zeros_like(log_probs).scatter(1, targets.view(-1, 1), 1)
    smooth_targets = (1 - alpha) * one_hot + alpha / C
    loss = -(smooth_targets * log_probs).sum(dim=1).mean()
    return loss

loss_label_smoothing_custom = custom_label_smoothing_cross_entropy(logits, targets, alpha)

# Print the results
print("Label-smoothed cross-entropy loss computed by PyTorch:", loss_label_smoothing_torch.item())
print("Label-smoothed cross-entropy loss implemented from the formula:", loss_label_smoothing_custom.item())

# Verify that the results are equal
assert torch.isclose(loss_label_smoothing_torch, loss_label_smoothing_custom, atol=1e-6), "Label smoothing formula verification failed"
```

```
>>> Cross-entropy loss computed by PyTorch: 0.45524317026138306
>>> Cross-entropy loss implemented from the formula: 0.4552431106567383
>>> Weighted cross-entropy loss computed by PyTorch: 0.5048722624778748
>>> Weighted cross-entropy loss implemented from the formula: 0.50487220287323
>>> Label-smoothed cross-entropy loss computed by PyTorch: 0.5469098091125488
>>> Label-smoothed cross-entropy loss implemented from the formula: 0.5469098091125488
```

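No AssertionError is raised, so the verification passes. Label smoothing is a regularization technique that helps avoid overfitting by softening the ground-truth labels to a certain degree.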

```
import torch
import torch.nn.functional as F

# Define the input logits
logits = torch.tensor([[2.0, 1.0, 0.1], [1.0, 3.0, 0.2]])

# Compute log_softmax
log_softmax_result = F.log_softmax(logits, dim=1)

# Compute softmax and log separately
softmax_result = F.softmax(logits, dim=1)
log_result = torch.log(softmax_result)

print("Logits:")
print(logits)
print("\nLog softmax (using F.log_softmax):")
print(log_softmax_result)
print("\nSoftmax result:")
print(softmax_result)
print("\nLog of softmax result:")
print(log_result)

# Verify that the two results are equal
assert torch.allclose(log_softmax_result, log_result), "The results are not equal."
print("\nThe results are equal, indicating that F.log_softmax is equivalent to softmax followed by log.")
```

```
>>> Logits:
>>> tensor([[2.0000, 1.0000, 0.1000],
>>>         [1.0000, 3.0000, 0.2000]])
>>> Log softmax (using F.log_softmax):
>>> tensor([[-0.4170, -1.4170, -2.3170],
>>>         [-2.1791, -0.1791, -2.9791]])
>>> Softmax result:
>>> tensor([[0.6590, 0.2424, 0.0986],
>>>         [0.1131, 0.8360, 0.0508]])
>>> Log of softmax result:
>>> tensor([[-0.4170, -1.4170, -2.3170],
>>>         [-2.1791, -0.1791, -2.9791]])
>>> The results are equal, indicating that F.log_softmax is equivalent to softmax followed by log.
```

The output shows that the result of F.log_softmax matches computing softmax first and then taking the logarithm.

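By default (reduction='mean'), nn.CrossEntropyLoss() computes the average loss over the samples, so the result is already normalized. There is no need to divide it further by batch_size or any other number, unless that factor is meant as a loss weight. The possible values of reduction are 'none', 'mean', and 'sum'.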

reduce (bool, optional): Deprecated.

Class indices must lie in the range $[0, C)$, where $C$ is the number of classes.

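Negative log-likelihood:
Compute the negative log-likelihood:

$\ell_i = -\log(p_{iy_i})$

where $\ell_i$ is the loss of the $i$-th sample and $p_{iy_i}$ is the predicted probability of the $i$-th sample for its true class $y_i$.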

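The target is expected in one of two formats:

Class indices: integer class indices rather than one-hot encodings. If ignore_index is specified, that index is also accepted as a target value (even though it may lie outside the class range).
Usage example:
```
# Example of target with class indices
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
```
Class probabilities: a probability distribution over the classes, for cases where each batch item needs more than a single class label, such as label smoothing.

- label_smoothing (float, optional): the amount of label smoothing, in the range [0.0, 1.0].
- When weight is provided, the loss of each class is scaled by that class's weight, which is useful for class-imbalanced problems.
- For the deprecated size_average argument: when reduction is not 'none', the loss is averaged by default (True); otherwise (False) it is summed.

For the $i$-th sample, the true class label is $y_i$ and the model's output logits are $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iC})$, where $x_{ic}$ is the raw output score (logit) of the $i$-th sample for class $c$.

Label Smoothing (label_smoothing)
If label smoothing is enabled with parameter $\alpha$, the target label $\mathbf{y}_i$ is smoothed:
$\mathbf{y}_i' = (1 - \alpha) \cdot \mathbf{y}_i + \frac{\alpha}{C}$
where $\mathbf{y}_i$ is the original one-hot encoded target label and $\mathbf{y}_i'$ is the smoothed label.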
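The label smoothing formula $\mathbf{y}_i' = (1 - \alpha) \cdot \mathbf{y}_i + \frac{\alpha}{C}$ can also be illustrated by building the smoothed targets by hand and passing them as class probabilities. This is a minimal sketch, assuming a PyTorch version in which nn.CrossEntropyLoss accepts class-probability targets; the logits, targets, and variable names are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

alpha = 0.1
logits = torch.tensor([[1.5, 2.0, 0.5], [1.0, 0.5, 2.5]])
targets = torch.tensor([1, 2])
num_classes = logits.size(1)

# Build y' = (1 - alpha) * y + alpha / C from the one-hot labels
one_hot = F.one_hot(targets, num_classes=num_classes).float()
smooth_targets = (1 - alpha) * one_hot + alpha / num_classes
print(smooth_targets)  # each row still sums to 1

# Built-in smoothing on class indices vs. hand-smoothed class probabilities
loss_builtin = nn.CrossEntropyLoss(label_smoothing=alpha)(logits, targets)
loss_manual = nn.CrossEntropyLoss()(logits, smooth_targets)

print(loss_builtin.item(), loss_manual.item())
assert torch.allclose(loss_builtin, loss_manual)
```

Under these assumptions the two losses coincide, which matches the reading of label smoothing as a replacement of the one-hot target by a softened distribution.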
References
CrossEntropyLoss - Docs

Below is a simple example showing that, with the default reduction='mean', the loss does not depend on the batch size:

```
import torch
import torch.nn as nn

# Define the loss function
criterion = nn.CrossEntropyLoss()

# Define the inputs and target labels
input1 = torch.tensor([[2.0, 0.5], [0.5, 2.0]], requires_grad=True)  # batch size 2
target1 = torch.tensor([0, 1])  # corresponding target labels

input2 = torch.tensor([[2.0, 0.5], [0.5, 2.0], [2.0, 0.5], [0.5, 2.0]], requires_grad=True)  # batch size 4
target2 = torch.tensor([0, 1, 0, 1])  # corresponding target labels

# Compute the losses
loss1 = criterion(input1, target1)
loss2 = criterion(input2, target2)

print(f"Loss with batch size 2: {loss1.item()}")
print(f"Loss with batch size 4: {loss2.item()}")
```

```
>>> Loss with batch size 2: 0.2014133334159851
>>> Loss with batch size 4: 0.2014133334159851
```

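Here input2 is equivalent to torch.cat([input1, input1], dim=0) and target2 is equivalent to torch.cat([target1, target1], dim=0). The batch size was simply doubled, yet the final loss is unchanged, which confirms the earlier statement that the default loss is already averaged over the batch. Instead of the deprecated reduce argument, use the reduction parameter.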

With reduction='none', the output is the tensor of per-sample losses:
$\mathcal{L} = [\ell_1, \ell_2, \ldots, \ell_N] = [-\log(p_{1y_1}), -\log(p_{2y_2}), \ldots, -\log(p_{Ny_N})]$

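Weighted Formula (weight)

If class weights $\mathbf{w} = (w_1, w_2, \ldots, w_C)$ are specified, the total loss with reduction='mean' is a weighted average, normalized by the sum of the weights rather than by $N$:

$\mathcal{L} = \frac{\sum_{i=1}^{N} w_{y_i} \cdot \ell_i}{\sum_{i=1}^{N} w_{y_i}} = \frac{\sum_{i=1}^{N} w_{y_i} \cdot (-\log(p_{iy_i}))}{\sum_{i=1}^{N} w_{y_i}}$

where $w_{y_i}$ is the weight of the true class of the $i$-th sample.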
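The normalization in the weighted formula is easy to get wrong, so here is a small sketch that contrasts dividing by the sum of the weights with dividing by the batch size $N$. It reuses the logits, targets, and weights from the appendix; the variable names by_weight_sum and by_batch_size are introduced here for illustration.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.5, 2.0, 0.5], [1.0, 0.5, 2.5]])
targets = torch.tensor([1, 2])
weights = torch.tensor([0.7, 0.2, 0.1])

# Unweighted per-sample losses l_i and the weight w_{y_i} of each sample's true class
per_sample = nn.CrossEntropyLoss(reduction="none")(logits, targets)
w = weights[targets]

loss_weighted = nn.CrossEntropyLoss(weight=weights, reduction="mean")(logits, targets)

by_weight_sum = (w * per_sample).sum() / w.sum()          # weighted average
by_batch_size = (w * per_sample).sum() / targets.numel()  # naive division by N

print(loss_weighted.item(), by_weight_sum.item(), by_batch_size.item())
assert torch.allclose(loss_weighted, by_weight_sum)
assert not torch.allclose(loss_weighted, by_batch_size)
```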

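The cross-entropy loss is computed in the following steps:

Softmax function:
Apply softmax to the logits to turn them into a probability distribution:
$p_{ic} = \frac{\exp(x_{ic})}{\sum_{j=1}^{C} \exp(x_{ij})}$
where $p_{ic}$ is the predicted probability that the $i$-th sample belongs to class $c$.
- reduction (str, optional): the reduction applied to the output. Default: 'mean'.
- ignore_index (int, optional): if specified, targets equal to this index are ignored and contribute neither to the loss nor to the gradient. Default: -100.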
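The effect of ignore_index can be sketched as follows. The batch below is made up for illustration; the third sample carries the default ignore_index value -100, so it should contribute nothing to the loss or the gradient, and the 'mean' reduction should average over the two remaining samples only.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5], [0.5, 2.0], [1.0, 3.0]], requires_grad=True)
target = torch.tensor([0, 1, -100])  # last sample uses the default ignore_index

criterion = nn.CrossEntropyLoss()  # ignore_index defaults to -100

# Loss over the full batch vs. loss over only the non-ignored samples
loss_all = criterion(logits, target)
loss_kept = criterion(logits[:2], target[:2])
assert torch.allclose(loss_all, loss_kept)

# With reduction='none', the ignored position gets a loss of 0
per_sample = nn.CrossEntropyLoss(reduction="none")(logits, target)
print(per_sample)

# The ignored sample receives no gradient
loss_all.backward()
print(logits.grad)
assert torch.all(logits.grad[2] == 0)
```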

Mathematical Formulas

The appendix verifies that the formulas below agree with the code.
- size_average (bool, optional): Deprecated.
Usage example (target given as class probabilities):
```
# Example of target with class probabilities
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()
```
The performance of this criterion is generally better when target contains class indices, as this allows for optimized computation. Consider providing target as class probabilities only when a single class label per minibatch item is too restrictive.
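To relate the two target formats, the sketch below feeds the same labels once as class indices and once as one-hot class probabilities. It assumes a PyTorch version that supports class-probability targets, and the tensors are illustrative values rather than anything from the original article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[1.5, 2.0, 0.5], [1.0, 0.5, 2.5]])
target_idx = torch.tensor([1, 2])

# One-hot rows are a degenerate probability distribution over the classes
target_prob = F.one_hot(target_idx, num_classes=3).float()

criterion = nn.CrossEntropyLoss()
loss_idx = criterion(logits, target_idx)    # class-index targets (integer dtype)
loss_prob = criterion(logits, target_prob)  # class-probability targets (float dtype)

print(loss_idx.item(), loss_prob.item())
assert torch.allclose(loss_idx, loss_prob)
```

Since both calls give the same value here, class indices are the natural choice whenever each sample has exactly one label, in line with the performance note above.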
With label smoothing, the per-sample loss formula is adjusted accordingly:
$\ell_i = - \sum_{c=1}^{C} y_{ic}' \cdot \log(p_{ic})$
where $y_{ic}'$ is the smoothed label of the $i$-th sample for class $c$, and the unsmoothed $y_{ic}$ is the corresponding entry of $\mathbf{y}_i$, the one-hot encoding of the original label $y_i$.

The precise mathematical definition of log_softmax is:
$\text{log\_softmax}(x_i) = \log\left(\text{softmax}(x_i)\right) = \log\left(\frac{\exp(x_i)}{\sum_j \exp(x_j)}\right) = x_i - \log\left(\sum_j \exp(x_j)\right)$
In code, the equivalent of F.log_softmax can be implemented in the following steps:
Compute softmax.
Take the logarithm of the softmax result.

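Suppose there are $N$ samples, each belonging to one of $C$ classes. For the reduction parameter, 'none' applies no reduction, 'mean' averages the losses over all samples, and 'sum' sums the losses over all samples.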
