Training on the OpenLane dataset #20

Open
onionysy opened this issue Jul 28, 2023 · 4 comments

Comments

@onionysy

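We are training on the public OpenLane dataset, but a CUDA error occurred partway through training. We suspect that the lane categories in the dataset exceed the 21-class limit. However, when processing the dataset with openlane.py, we saw that out-of-range categories are already modified: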

    if lane_results['category'] >= 21:
        lane_results['category'] = 20

We can no longer tell where the problem is. Do you have any suggestions? The error message is as follows:

    2023-07-28 00:20:42,672 - mmseg - INFO - Exp name: anchor3dlane_iter.py
    2023-07-28 00:20:42,673 - mmseg - INFO - Iter [3000/60000] lr: 2.000e-04, eta: 10:10:30, time: 0.642, data_time: 0.014, memory: 11367, batch_positives: 12.7812, batch_negatives: 450.0000, cls_loss: 0.1614, reg_losses_x: 0.0256, reg_losses_z: 0.0040, reg_losses_vis: 0.0297, liou_losses_x: 0.3897, liou_losses_z: 0.2364, cls_loss0: 0.0699, reg_losses_x0: 0.0508, reg_losses_z0: 0.0053, reg_losses_vis0: 0.0249, liou_losses_x0: 0.5498, liou_losses_z0: 0.2677, loss: 1.8151
    2023-07-28 00:20:49,113 - mmseg - INFO - Iter [3010/60000] lr: 2.000e-04, eta: 10:10:24, time: 0.644, data_time: 0.013, memory: 11367, batch_positives: 13.5938, batch_negatives: 450.0000, cls_loss: 0.1474, reg_losses_x: 0.0386, reg_losses_z: 0.0039, reg_losses_vis: 0.0316, liou_losses_x: 0.3700, liou_losses_z: 0.2279, cls_loss0: 0.0648, reg_losses_x0: 0.0706, reg_losses_z0: 0.0048, reg_losses_vis0: 0.0262, liou_losses_x0: 0.5458, liou_losses_z0: 0.2555, loss: 1.7871
    2023-07-28 00:20:55,559 - mmseg - INFO - Iter [3020/60000] lr: 2.000e-04, eta: 10:10:18, time: 0.645, data_time: 0.014, memory: 11367, batch_positives: 13.1438, batch_negatives: 450.0000, cls_loss: 0.1464, reg_losses_x: 0.0312, reg_losses_z: 0.0052, reg_losses_vis: 0.0316, liou_losses_x: 0.3480, liou_losses_z: 0.2309, cls_loss0: 0.0611, reg_losses_x0: 0.0473, reg_losses_z0: 0.0065, reg_losses_vis0: 0.0261, liou_losses_x0: 0.5236, liou_losses_z0: 0.2592, loss: 1.7170
    2023-07-28 00:21:01,921 - mmseg - INFO - Iter [3030/60000] lr: 2.000e-04, eta: 10:10:10, time: 0.636, data_time: 0.014, memory: 11367, batch_positives: 11.4375, batch_negatives: 450.0000, cls_loss: 0.1548, reg_losses_x: 0.0307, reg_losses_z: 0.0042, reg_losses_vis: 0.0278, liou_losses_x: 0.3571, liou_losses_z: 0.2288, cls_loss0: 0.0599, reg_losses_x0: 0.0616, reg_losses_z0: 0.0067, reg_losses_vis0: 0.0242, liou_losses_x0: 0.5342, liou_losses_z0: 0.2702, loss: 1.7603
    2023-07-28 00:21:08,344 - mmseg - INFO - Iter [3040/60000] lr: 2.000e-04, eta: 10:10:04, time: 0.642, data_time: 0.014, memory: 11367, batch_positives: 13.1125, batch_negatives: 450.0000, cls_loss: 0.1414, reg_losses_x: 0.0200, reg_losses_z: 0.0052, reg_losses_vis: 0.0308, liou_losses_x: 0.3512, liou_losses_z: 0.2308, cls_loss0: 0.0537, reg_losses_x0: 0.0501, reg_losses_z0: 0.0058, reg_losses_vis0: 0.0270, liou_losses_x0: 0.5265, liou_losses_z0: 0.2559, loss: 1.6984
    2023-07-28 00:21:14,719 - mmseg - INFO - Iter [3050/60000] lr: 2.000e-04, eta: 10:09:56, time: 0.637, data_time: 0.014, memory: 11367, batch_positives: 13.2000, batch_negatives: 450.0000, cls_loss: 0.1403, reg_losses_x: 0.0277, reg_losses_z: 0.0052, reg_losses_vis: 0.0311, liou_losses_x: 0.3714, liou_losses_z: 0.2311, cls_loss0: 0.0684, reg_losses_x0: 0.0540, reg_losses_z0: 0.0072, reg_losses_vis0: 0.0258, liou_losses_x0: 0.5408, liou_losses_z0: 0.2696, loss: 1.7728
    2023-07-28 00:21:21,130 - mmseg - INFO - Iter [3060/60000] lr: 2.000e-04, eta: 10:09:49, time: 0.641, data_time: 0.013, memory: 11367, batch_positives: 11.3625, batch_negatives: 450.0000, cls_loss: 0.1447, reg_losses_x: 0.0208, reg_losses_z: 0.0035, reg_losses_vis: 0.0295, liou_losses_x: 0.3702, liou_losses_z: 0.2274, cls_loss0: 0.0565, reg_losses_x0: 0.0476, reg_losses_z0: 0.0047, reg_losses_vis0: 0.0268, liou_losses_x0: 0.5384, liou_losses_z0: 0.2676, loss: 1.7376
    2023-07-28 00:21:27,589 - mmseg - INFO - Iter [3070/60000] lr: 2.000e-04, eta: 10:09:44, time: 0.646, data_time: 0.014, memory: 11367, batch_positives: 13.2188, batch_negatives: 450.0000, cls_loss: 0.1481, reg_losses_x: 0.0324, reg_losses_z: 0.0038, reg_losses_vis: 0.0312, liou_losses_x: 0.3801, liou_losses_z: 0.2369, cls_loss0: 0.0596, reg_losses_x0: 0.0729, reg_losses_z0: 0.0042, reg_losses_vis0: 0.0266, liou_losses_x0: 0.5654, liou_losses_z0: 0.2595, loss: 1.8206
    2023-07-28 00:21:33,933 - mmseg - INFO - Iter [3080/60000] lr: 2.000e-04, eta: 10:09:36, time: 0.634, data_time: 0.013, memory: 11367, batch_positives: 13.8812, batch_negatives: 450.0000, cls_loss: 0.1477, reg_losses_x: 0.0295, reg_losses_z: 0.0069, reg_losses_vis: 0.0318, liou_losses_x: 0.3902, liou_losses_z: 0.2495, cls_loss0: 0.0649, reg_losses_x0: 0.0831, reg_losses_z0: 0.0071, reg_losses_vis0: 0.0274, liou_losses_x0: 0.5694, liou_losses_z0: 0.2682, loss: 1.8756
    2023-07-28 00:21:40,287 - mmseg - INFO - Iter [3090/60000] lr: 2.000e-04, eta: 10:09:28, time: 0.635, data_time: 0.013, memory: 11367, batch_positives: 13.5938, batch_negatives: 450.0000, cls_loss: 0.1450, reg_losses_x: 0.0237, reg_losses_z: 0.0068, reg_losses_vis: 0.0308, liou_losses_x: 0.3682, liou_losses_z: 0.2500, cls_loss0: 0.0605, reg_losses_x0: 0.0485, reg_losses_z0: 0.0093, reg_losses_vis0: 0.0261, liou_losses_x0: 0.5408, liou_losses_z0: 0.2832, loss: 1.7929
    2023-07-28 00:21:46,753 - mmseg - INFO - Iter [3100/60000] lr: 2.000e-04, eta: 10:09:22, time: 0.647, data_time: 0.015, memory: 11367, batch_positives: 13.6750, batch_negatives: 450.0000, cls_loss: 0.1374, reg_losses_x: 0.0236, reg_losses_z: 0.0057, reg_losses_vis: 0.0305, liou_losses_x: 0.3791, liou_losses_z: 0.2349, cls_loss0: 0.0578, reg_losses_x0: 0.0576, reg_losses_z0: 0.0067, reg_losses_vis0: 0.0271, liou_losses_x0: 0.5623, liou_losses_z0: 0.2624, loss: 1.7851
    2023-07-28 00:21:53,178 - mmseg - INFO - Iter [3110/60000] lr: 2.000e-04, eta: 10:09:16, time: 0.642, data_time: 0.013, memory: 11367, batch_positives: 13.3875, batch_negatives: 450.0000, cls_loss: 0.1396, reg_losses_x: 0.0203, reg_losses_z: 0.0043, reg_losses_vis: 0.0323, liou_losses_x: 0.3550, liou_losses_z: 0.2296, cls_loss0: 0.0614, reg_losses_x0: 0.0441, reg_losses_z0: 0.0054, reg_losses_vis0: 0.0289, liou_losses_x0: 0.5231, liou_losses_z0: 0.2590, loss: 1.7030
    2023-07-28 00:21:59,601 - mmseg - INFO - Iter [3120/60000] lr: 2.000e-04, eta: 10:09:09, time: 0.642, data_time: 0.013, memory: 11367, batch_positives: 13.2500, batch_negatives: 450.0000, cls_loss: 0.1420, reg_losses_x: 0.0206, reg_losses_z: 0.0036, reg_losses_vis: 0.0315, liou_losses_x: 0.3702, liou_losses_z: 0.2274, cls_loss0: 0.0663, reg_losses_x0: 0.0586, reg_losses_z0: 0.0050, reg_losses_vis0: 0.0270, liou_losses_x0: 0.5430, liou_losses_z0: 0.2599, loss: 1.7553
    2023-07-28 00:22:06,054 - mmseg - INFO - Iter [3130/60000] lr: 2.000e-04, eta: 10:09:03, time: 0.645, data_time: 0.014, memory: 11367, batch_positives: 12.5625, batch_negatives: 450.0000, cls_loss: 0.1473, reg_losses_x: 0.0194, reg_losses_z: 0.0051, reg_losses_vis: 0.0305, liou_losses_x: 0.3650, liou_losses_z: 0.2466, cls_loss0: 0.0599, reg_losses_x0: 0.0498, reg_losses_z0: 0.0066, reg_losses_vis0: 0.0257, liou_losses_x0: 0.5442, liou_losses_z0: 0.2827, loss: 1.7829
    2023-07-28 00:22:12,533 - mmseg - INFO - Iter [3140/60000] lr: 2.000e-04, eta: 10:08:58, time: 0.648, data_time: 0.014, memory: 11367, batch_positives: 12.8063, batch_negatives: 450.0000, cls_loss: 0.1401, reg_losses_x: 0.0304, reg_losses_z: 0.0041, reg_losses_vis: 0.0299, liou_losses_x: 0.3633, liou_losses_z: 0.2325, cls_loss0: 0.0563, reg_losses_x0: 0.0659, reg_losses_z0: 0.0052, reg_losses_vis0: 0.0265, liou_losses_x0: 0.5352, liou_losses_z0: 0.2644, loss: 1.7539
    2023-07-28 00:22:19,005 - mmseg - INFO - Iter [3150/60000] lr: 2.000e-04, eta: 10:08:52, time: 0.647, data_time: 0.014, memory: 11367, batch_positives: 12.8063, batch_negatives: 450.0000, cls_loss: 0.1518, reg_losses_x: 0.0198, reg_losses_z: 0.0054, reg_losses_vis: 0.0323, liou_losses_x: 0.3584, liou_losses_z: 0.2361, cls_loss0: 0.0587, reg_losses_x0: 0.0531, reg_losses_z0: 0.0068, reg_losses_vis0: 0.0268, liou_losses_x0: 0.5368, liou_losses_z0: 0.2713, loss: 1.7572
    2023-07-28 00:22:25,480 - mmseg - INFO - Iter [3160/60000] lr: 2.000e-04, eta: 10:08:47, time: 0.648, data_time: 0.014, memory: 11367, batch_positives: 11.9625, batch_negatives: 450.0000, cls_loss: 0.1476, reg_losses_x: 0.0203, reg_losses_z: 0.0039, reg_losses_vis: 0.0286, liou_losses_x: 0.3592, liou_losses_z: 0.2322, cls_loss0: 0.0604, reg_losses_x0: 0.0450, reg_losses_z0: 0.0062, reg_losses_vis0: 0.0247, liou_losses_x0: 0.5252, liou_losses_z0: 0.2661, loss: 1.7194
    2023-07-28 00:22:31,967 - mmseg - INFO - Iter [3170/60000] lr: 2.000e-04, eta: 10:08:41, time: 0.649, data_time: 0.014, memory: 11367, batch_positives: 13.4187, batch_negatives: 450.0000, cls_loss: 0.1473, reg_losses_x: 0.0182, reg_losses_z: 0.0046, reg_losses_vis: 0.0329, liou_losses_x: 0.3593, liou_losses_z: 0.2498, cls_loss0: 0.0609, reg_losses_x0: 0.0429, reg_losses_z0: 0.0058, reg_losses_vis0: 0.0275, liou_losses_x0: 0.5267, liou_losses_z0: 0.2846, loss: 1.7605
    2023-07-28 00:22:38,414 - mmseg - INFO - Iter [3180/60000] lr: 2.000e-04, eta: 10:08:35, time: 0.645, data_time: 0.014, memory: 11367, batch_positives: 12.9000, batch_negatives: 450.0000, cls_loss: 0.1436, reg_losses_x: 0.0247, reg_losses_z: 0.0040, reg_losses_vis: 0.0299, liou_losses_x: 0.3473, liou_losses_z: 0.2335, cls_loss0: 0.0534, reg_losses_x0: 0.0536, reg_losses_z0: 0.0048, reg_losses_vis0: 0.0250, liou_losses_x0: 0.5128, liou_losses_z0: 0.2604, loss: 1.6928
    2023-07-28 00:22:44,884 - mmseg - INFO - Iter [3190/60000] lr: 2.000e-04, eta: 10:08:30, time: 0.647, data_time: 0.014, memory: 11367, batch_positives: 14.3000, batch_negatives: 450.0000, cls_loss: 0.1401, reg_losses_x: 0.0227, reg_losses_z: 0.0049, reg_losses_vis: 0.0356, liou_losses_x: 0.3705, liou_losses_z: 0.2517, cls_loss0: 0.0573, reg_losses_x0: 0.0490, reg_losses_z0: 0.0060, reg_losses_vis0: 0.0310, liou_losses_x0: 0.5568, liou_losses_z0: 0.2878, loss: 1.8134
    /pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [8,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
    /pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [13,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
    /pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [18,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
    Traceback (most recent call last):
      File "/snap/pycharm-community/342/plugins/python-ce/helpers/pydev/pydevd.py", line 1500, in _exec
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "/snap/pycharm-community/342/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/home/buaa/songyue/Anchor3DLane-main/tools/train.py", line 364, in <module>
        main()
      File "/home/buaa/songyue/Anchor3DLane-main/tools/train.py", line 354, in main
        train(
      File "/home/buaa/songyue/Anchor3DLane-main/tools/train.py", line 242, in train
        runner.run(data_loaders, cfg.workflow)
      File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 144, in run
        iter_runner(iter_loaders[i], **kwargs)
      File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 64, in train
        outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
      File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
        return self.module.train_step(*inputs[0], **kwargs[0])
      File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/lane_detector/anchor_3dlane.py", line 477, in train_step
        losses, other_vars = self(**data_batch)
      File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/lane_detector/anchor_3dlane.py", line 398, in forward
        return self.forward_train(img, mask, img_metas, **kwargs)
      File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
        return old_func(*args, **kwargs)
      File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/lane_detector/anchor_3dlane.py", line 448, in forward_train
        losses, other_vars = self.loss(output, gt_3dlanes, output_aux)
      File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 205, in new_func
        return old_func(*args, **kwargs)
      File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/lane_detector/anchor_3dlane.py", line 411, in loss
        anchor_losses = self.lane_loss(proposals_list, gt_3dlanes)
      File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/losses/lane_loss.py", line 137, in forward
        cls_loss = focal_loss(cls_pred, cls_target)
      File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/losses/kornia_focal.py", line 145, in forward
        return focal_loss(input, target, self.alpha, self.gamma, self.reduction, self.eps)
      File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/losses/kornia_focal.py", line 84, in focal_loss
        target_one_hot: torch.Tensor = one_hot(target, num_classes=input.shape[1], device=input.device, dtype=input.dtype)  # [b, c, h, w]
      File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/losses/kornia_focal.py", line 50, in one_hot
        return one_hot.scatter_(1, labels.unsqueeze(1), 1.0) + eps
    RuntimeError: CUDA error: device-side assert triggered
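The traceback ends in `one_hot` in `kornia_focal.py`, which calls `scatter_` to write 1.0 into column `label` of a tensor with `num_classes=input.shape[1]` columns. The `ScatterGatherKernel.cu` assertion fires when some label falls outside `[0, num_classes)`. Because CUDA errors are reported asynchronously, setting the environment variable `CUDA_LAUNCH_BLOCKING=1` usually makes the traceback point at the failing call more reliably. The following is a minimal, hypothetical host-side check (the helper names are illustrative and not part of Anchor3DLane) that reports the offending labels before they reach the GPU:

```python
from typing import Iterable, List, Tuple


def find_out_of_range_labels(labels: Iterable[int], num_classes: int) -> List[Tuple[int, int]]:
    """Return (position, label) pairs whose label lies outside [0, num_classes).

    one_hot() scatters 1.0 into column `label` of a tensor with `num_classes`
    columns, so every label must satisfy 0 <= label < num_classes.
    """
    return [(i, int(lab)) for i, lab in enumerate(labels) if not 0 <= int(lab) < num_classes]


def check_labels(labels: Iterable[int], num_classes: int) -> None:
    """Raise a readable ValueError on the host instead of a CUDA device-side assert."""
    bad = find_out_of_range_labels(labels, num_classes)
    if bad:
        raise ValueError(f"{len(bad)} label(s) outside [0, {num_classes}); first few: {bad[:5]}")


if __name__ == "__main__":
    # Illustrative labels: with 21 classes the valid range is 0..20, so 21 and -1 are invalid.
    print(find_out_of_range_labels([0, 3, 20, 21, -1], num_classes=21))  # → [(3, 21), (4, -1)]
```

For example, one could call `check_labels(cls_target.flatten().tolist(), cls_pred.shape[1])` just before `focal_loss(cls_pred, cls_target)` in `lane_loss.py`. That placement is an assumption, and copying the tensor to a Python list is slow, so it is meant for a debugging run rather than for every training step.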

@spyflying
Collaborator

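Most likely one sample has a label that is out of range. The line above that modifies the label is there to merge the left and right curbs; it does not check for out-of-range data. You can go through the data once and simply delete the samples whose labels are out of range.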
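Going through the data once can be done with a short script. The sketch below assumes the raw OpenLane annotation layout (one JSON file per frame, with a `lane_lines` list whose entries carry an integer `category`) and applies the same fold as the openlane.py snippet quoted earlier (categories 21 and above become 20), so any category it still reports would be out of range after conversion. The function names, the script name, and the 21-class constant are illustrative assumptions, not part of Anchor3DLane.

```python
import json
import sys
from pathlib import Path
from typing import Dict, List

NUM_CLASSES = 21  # assumed from the issue: valid labels are 0..20


def remap_category(category: int) -> int:
    """Fold categories >= 21 into 20, mirroring the quoted openlane.py snippet."""
    return 20 if category >= 21 else category


def bad_categories(ann: dict, num_classes: int = NUM_CLASSES) -> List[int]:
    """Return the remapped categories of one annotation that fall outside [0, num_classes)."""
    cats = [remap_category(int(lane["category"])) for lane in ann.get("lane_lines", [])]
    return [c for c in cats if not 0 <= c < num_classes]


def find_bad_annotations(root: str, num_classes: int = NUM_CLASSES) -> Dict[str, List[int]]:
    """Map every *.json file under `root` to its out-of-range categories (files with none are skipped)."""
    bad: Dict[str, List[int]] = {}
    for path in sorted(Path(root).rglob("*.json")):
        with open(path, "r", encoding="utf-8") as f:
            ann = json.load(f)
        out = bad_categories(ann, num_classes)
        if out:
            bad[str(path)] = out
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_openlane_labels.py <annotation_root>
    for path, cats in find_bad_annotations(sys.argv[1]).items():
        print(f"{path}: {cats}")
```

The script only reports the offending files; removing them (or the offending lanes) from the training set is left to whatever sample list the conversion step produces.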

@onionysy
Author

Merging the left and right curbs? Do you mean lines 251-252 of Anchor3DLane-main/tools/convert_datasets/openlane.py? That seems different from how I understood it.

@onionysy
Author

[image]

@Champagne1219

Hi, have you solved this problem?
