
YOLOv5 + Monocular Distance Measurement (Python)


  • 1. Setup
  • 2. Ranging Principle
  • 3. Camera Calibration
    • 3.1: Calibration method 1
    • 3.2: Calibration method 2
  • 4. Distance Measurement
    • 4.1 Adding the ranging code
    • 4.2 Detail tweaks (optional)
    • 4.3 Main code
  • 5. Results

Related posts
1. YOLOv7 + monocular distance measurement (Python)
2. YOLOv5 + monocular tracking (Python)
3. YOLOv7 + monocular tracking (Python)
4. YOLOv5 + stereo distance measurement (Python)
5. YOLOv7 + stereo distance measurement (Python)
6. A demo of the final result has been published on Bilibili

Project source code for this post:
Link 1: https://download.csdn.net/download/qq_45077760/87708260
Link 2: https://github.com/up-up-up-up/yolov5_Monocular_ranging

More articles on monocular vision (size measurement, tracking, collision detection, etc.): https://blog.csdn.net/qq_45077760/category_12312107.html

1. Setup

OS: Windows 10
YOLO version: YOLOv5 6.1
Video capture device: Android phone
GPU: NVIDIA 2080 Ti (a CPU works too; the GPU only speeds up inference)

2. Ranging Principle

Monocular ranging is far simpler than stereo ranging: no stereo matching is needed, only the linear conversion below:

D = (F * W) / P

where D is the distance from the target to the camera, F is the camera focal length (which you must calibrate yourself), W is the target's real-world width or height (pedestrian detection typically uses a person's height), and P is the size the target occupies in the image, in pixels.
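As a quick sanity check of the conversion, here are some numbers plugged into the formula (F and W match the values calibrated later in this post; the box height P is hypothetical):

# Worked example of D = (F * W) / P (hypothetical box height P)
F = 1990.0  # focal length in pixels (from calibration)
W = 66.9    # pedestrian height in inches (170 cm / 2.54)
P = 300     # pedestrian bounding-box height in pixels

D_inch = (F * W) / P                  # W is in inches, so D comes out in inches
print(round(D_inch * 2.54 / 100, 2))  # ~11.27 m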
With the principle covered, let's move on to the hands-on part.

3. Camera Calibration

3.1: Calibration method 1

You can use Zhang Zhengyou's calibration method to obtain the camera focal length, as sketched below.
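A minimal sketch of that approach with OpenCV's chessboard calibration: the 9×6 inner-corner pattern and the calib/*.jpg image folder are assumptions, so adapt them to your own chessboard and photos. fx in the recovered camera matrix is the focal length in pixels.

import glob

import cv2
import numpy as np

pattern = (9, 6)  # inner corners per chessboard row/column (assumption)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)  # planar grid, unit squares

obj_points, img_points = [], []  # 3D points in the board frame / 2D points in the image
gray = None
for fname in glob.glob('calib/*.jpg'):  # hypothetical folder of chessboard photos
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print('fx =', mtx[0, 0], 'fy =', mtx[1, 1])  # focal lengths in pixels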

3.2: Calibration method 2

Obtain the focal length directly with code. Film a rectangular object (an A4 sheet) in advance: keep the camera fixed, choose a camera-to-object distance and hold it constant for the whole shot, and use a plain background with no clutter. Then run the recorded video through the following script:

import cv2

win_width = 1920
win_height = 1080
mid_width = int(win_width / 2)
mid_height = int(win_height / 2)
foc = 1990.0     # camera focal length in pixels; tune it per the steps below
real_wid = 9.05  # width of an A4 sheet held in landscape (inches); film the sheet in landscape, camera in landscape too
font = cv2.FONT_HERSHEY_SIMPLEX
w_ok = 1

capture = cv2.VideoCapture('5.mp4')
capture.set(3, win_width)
capture.set(4, win_height)

while True:
    ret, frame = capture.read()
    # frame = cv2.flip(frame, 1)
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    ret, binary = cv2.threshold(gray, 140, 200, 60)  # if the paper's outline is not picked up, adjust these values until the box hugs the paper
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    binary = cv2.dilate(binary, kernel, iterations=2)
    contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    # cv2.drawContours(frame, contours, -1, (0, 255, 0), 2)  # visualize every detected contour

    for c in contours:
        if cv2.contourArea(c) < 1000:  # keep only contours above a minimum area so tiny fluctuations are ignored; with steady lighting and a low-noise camera this filter can be dropped
            continue
        x, y, w, h = cv2.boundingRect(c)  # bounding box of the contour
        if x > mid_width or y > mid_height:
            continue
        if (x + w) < mid_width or (y + h) < mid_height:
            continue
        if h > w:
            continue
        if x == 0 or y == 0:
            continue
        if x == win_width or y == win_height:
            continue
        w_ok = w
        cv2.rectangle(frame, (x + 1, y + 1), (x + w_ok - 1, y + h - 1), (0, 255, 0), 2)

    dis_inch = (real_wid * foc) / (w_ok - 2)
    dis_cm = dis_inch * 2.54
    # os.system("cls")
    # print("Distance : ", dis_cm, "cm")
    frame = cv2.putText(frame, "%.2fcm" % dis_cm, (5, 25), font, 0.8, (0, 255, 0), 2)
    frame = cv2.putText(frame, "+", (mid_width, mid_height), font, 1.0, (0, 255, 0), 2)

    cv2.namedWindow('res', 0)
    cv2.namedWindow('gray', 0)
    cv2.resizeWindow('res', win_width, win_height)
    cv2.resizeWindow('gray', win_width, win_height)
    cv2.imshow('res', frame)
    cv2.imshow('gray', binary)

    c = cv2.waitKey(40)
    if c == 27:  # press Esc to close the windows
        break

cv2.destroyAllWindows()

Repeatedly tune the three parameters in the line ret, binary = cv2.threshold(gray, 140, 200, 60) until the drawn box tightly hugs the object in your video, then adjust foc until the distance printed in the top-left corner matches the actual camera-to-object distance you used when filming.
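As an alternative to hand-tuning foc, the same model can be solved for the focal length directly from one reference shot at a known distance; a small sketch (focal_from_reference and its example numbers are hypothetical, not part of the project):

# Rearranging D = (F * W) / P gives F = (P * D) / W.
def focal_from_reference(pixel_width, known_distance_cm, real_width_inch=9.05):
    """Estimate the focal length in pixels from one shot at a known distance."""
    known_distance_inch = known_distance_cm / 2.54
    return (pixel_width * known_distance_inch) / real_width_inch

# e.g. the A4 sheet measures 440 px wide when filmed from 100 cm away:
print(round(focal_from_reference(440, 100)))  # ~1914 px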
Then write the calibrated focal length into the ranging code in distance.py. Pedestrians are measured by their height here: from D = (F * W) / P, knowing the focal length F, the pedestrian height of 66.9 (in inches, i.e. 170 cm / 2.54) and the pixel height h, we can solve for the camera-to-target distance D. h - 2 is used because the top and bottom edges of the box do not quite touch the object.

foc = 1990.0              # lens focal length in pixels
real_hight_person = 66.9  # pedestrian height in inches
real_hight_car = 57.08    # car height in inches

# Custom monocular-ranging helpers
def person_distance(h):
    dis_inch = (real_hight_person * foc) / (h - 2)
    dis_cm = dis_inch * 2.54
    dis_cm = int(dis_cm)
    dis_m = dis_cm / 100
    return dis_m

def car_distance(h):
    dis_inch = (real_hight_car * foc) / (h - 2)
    dis_cm = dis_inch * 2.54
    dis_cm = int(dis_cm)
    dis_m = dis_cm / 100
    return dis_m
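A quick usage check of these helpers (the 300 px box height is hypothetical):

from distance import person_distance, car_distance

print(person_distance(300))  # ~11.34 m for a person whose box is 300 px tall
print(car_distance(300))     # ~9.68 m for a car whose box is 300 px tall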

4. Distance Measurement

4.1 Adding the ranging code

The ranging logic sits next to the box-drawing code: first read the bounding box's pixel coordinates, then compute the box height in pixels, and finally compute the target distance from D = (F * W) / P.

for *xyxy, conf, cls in reversed(det):
    if save_txt:  # Write to file
        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
        with open(txt_path + '.txt', 'a') as f:
            f.write(('%g ' * len(line)).rstrip() % line + '\n')

    if save_img or save_crop or view_img:  # Add bbox to image
        x1 = int(xyxy[0])  # read the four box coordinates
        y1 = int(xyxy[1])
        x2 = int(xyxy[2])
        y2 = int(xyxy[3])
        h = y2 - y1  # box height in pixels
        if names[int(cls)] == "person":
            c = int(cls)  # integer class
            label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
            dis_m = person_distance(h)  # estimate the pedestrian's distance
            label += f'  {dis_m}m'      # append the distance to the label
            txt = '{0}'.format(label)
            annotator.box_label(xyxy, txt, color=colors(c, True))
        if names[int(cls)] == "car":
            c = int(cls)  # integer class
            label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
            dis_m = car_distance(h)  # estimate the car's distance
            label += f'  {dis_m}m'   # append the distance to the label
            txt = '{0}'.format(label)
            annotator.box_label(xyxy, txt, color=colors(c, True))
        if save_crop:
            save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)
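The person and car branches above differ only in the reference height, so if you add more classes it may be cleaner to fold them into a single lookup table; a sketch (not the repository's code, heights in inches):

REAL_HEIGHTS_INCH = {'person': 66.9, 'car': 57.08}  # reference heights in inches

def object_distance(cls_name, h, foc=1990.0):
    """Monocular distance in metres, or None when the class has no reference height."""
    real_h = REAL_HEIGHTS_INCH.get(cls_name)
    if real_h is None:
        return None
    dis_cm = (real_h * foc) / (h - 2) * 2.54
    return int(dis_cm) / 100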

4.2 Detail tweaks (optional)

The steps above already complete the monocular ranging pipeline; what follows are small cosmetic tweaks that you can skip.
To display results in real time, edit the run configuration of the script and pass --view-img --save-txt as arguments (equivalently, run python detect.py --view-img --save-txt from the command line).
The live window is too large by default, however, so we modified the display code (again optional). Specifically, replace

if view_img:
    cv2.imshow(str(p), im0)
    cv2.waitKey(1)  # 1 millisecond

with:

if view_img:
    cv2.namedWindow("Webcam", cv2.WINDOW_NORMAL)
    cv2.resizeWindow("Webcam", 1280, 720)
    cv2.moveWindow("Webcam", 0, 100)
    cv2.imshow("Webcam", im0)
    cv2.waitKey(1)

4.3 Main code

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
"""
Run inference on images, videos, directories, streams, etc.

Usage - sources:
    $ python path/to/detect.py --weights yolov5s.pt --source 0              # webcam
                                                             img.jpg        # image
                                                             vid.mp4        # video
                                                             path/          # directory
                                                             path/*.jpg     # glob
                                                             'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                                                             'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream

Usage - formats:
    $ python path/to/detect.py --weights yolov5s.pt                 # PyTorch
                                         yolov5s.torchscript        # TorchScript
                                         yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                                         yolov5s.xml                # OpenVINO
                                         yolov5s.engine             # TensorRT
                                         yolov5s.mlmodel            # CoreML (MacOS-only)
                                         yolov5s_saved_model        # TensorFlow SavedModel
                                         yolov5s.pb                 # TensorFlow GraphDef
                                         yolov5s.tflite             # TensorFlow Lite
                                         yolov5s_edgetpu.tflite     # TensorFlow Edge TPU
"""

import argparse
import os
import sys
from pathlib import Path

import cv2
import torch
import torch.backends.cudnn as cudnn

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

from models.common import DetectMultiBackend
from utils.datasets import IMG_FORMATS, VID_FORMATS, LoadImages, LoadStreams
from utils.general import (LOGGER, check_file, check_img_size, check_imshow, check_requirements, colorstr,
                           increment_path, non_max_suppression, print_args, scale_coords, strip_optimizer, xyxy2xywh)
from utils.plots import Annotator, colors, save_one_box
from utils.torch_utils import select_device, time_sync
from distance import person_distance, car_distance


@torch.no_grad()
def run(weights=ROOT / 'yolov5s.pt',  # model.pt path(s)
        source=ROOT / 'data/images',  # file/dir/URL/glob, 0 for webcam
        data=ROOT / 'data/coco128.yaml',  # dataset.yaml path
        imgsz=(640, 640),  # inference size (height, width)
        conf_thres=0.25,  # confidence threshold
        iou_thres=0.45,  # NMS IOU threshold
        max_det=1000,  # maximum detections per image
        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        view_img=False,  # show results
        save_txt=False,  # save results to *.txt
        save_conf=False,  # save confidences in --save-txt labels
        save_crop=False,  # save cropped prediction boxes
        nosave=False,  # do not save images/videos
        classes=None,  # filter by class: --class 0, or --class 0 2 3
        agnostic_nms=False,  # class-agnostic NMS
        augment=False,  # augmented inference
        visualize=False,  # visualize features
        update=False,  # update all models
        project=ROOT / 'runs/detect',  # save results to project/name
        name='exp',  # save results to project/name
        exist_ok=False,  # existing project/name ok, do not increment
        line_thickness=3,  # bounding box thickness (pixels)
        hide_labels=False,  # hide labels
        hide_conf=False,  # hide confidences
        half=False,  # use FP16 half-precision inference
        dnn=False,  # use OpenCV DNN for ONNX inference
        ):
    source = str(source)
    save_img = not nosave and not source.endswith('.txt')  # save inference images
    is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)
    is_url = source.lower().startswith(('rtsp://', 'rtmp://', 'http://', 'https://'))
    webcam = source.isnumeric() or source.endswith('.txt') or (is_url and not is_file)
    if is_url and is_file:
        source = check_file(source)  # download

    # Directories
    save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Load model
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data)
    stride, names, pt, jit, onnx, engine = model.stride, model.names, model.pt, model.jit, model.onnx, model.engine
    imgsz = check_img_size(imgsz, s=stride)  # check image size

    # Half
    half &= (pt or jit or onnx or engine) and device.type != 'cpu'  # FP16 supported on limited backends with CUDA
    if pt or jit:
        model.model.half() if half else model.model.float()

    # Dataloader
    if webcam:
        view_img = check_imshow()
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt)
        bs = len(dataset)  # batch_size
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt)
        bs = 1  # batch_size
    vid_path, vid_writer = [None] * bs, [None] * bs

    # Run inference
    model.warmup(imgsz=(1 if pt else bs, 3, *imgsz), half=half)  # warmup
    dt, seen = [0.0, 0.0, 0.0], 0
    for path, im, im0s, vid_cap, s in dataset:
        t1 = time_sync()
        im = torch.from_numpy(im).to(device)
        im = im.half() if half else im.float()  # uint8 to fp16/32
        im /= 255  # 0 - 255 to 0.0 - 1.0
        if len(im.shape) == 3:
            im = im[None]  # expand for batch dim
        t2 = time_sync()
        dt[0] += t2 - t1

        # Inference
        visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
        pred = model(im, augment=augment, visualize=visualize)
        t3 = time_sync()
        dt[1] += t3 - t2

        # NMS
        pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
        dt[2] += time_sync() - t3

        # Second-stage classifier (optional)
        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            seen += 1
            if webcam:  # batch_size >= 1
                p, im0, frame = path[i], im0s[i].copy(), dataset.count
                s += f'{i}: '
            else:
                p, im0, frame = path, im0s.copy(), getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # im.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # im.txt
            s += '%gx%g ' % im.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            imc = im0.copy() if save_crop else im0  # for save_crop
            annotator = Annotator(im0, line_width=line_thickness, example=str(names))
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(im.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
                        with open(txt_path + '.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img or save_crop or view_img:  # Add bbox to image
                        x1 = int(xyxy[0])  # read the four box coordinates
                        y1 = int(xyxy[1])
                        x2 = int(xyxy[2])
                        y2 = int(xyxy[3])
                        h = y2 - y1  # box height in pixels
                        if names[int(cls)] == "person":
                            c = int(cls)  # integer class
                            label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                            dis_m = person_distance(h)  # estimate the pedestrian's distance
                            label += f'  {dis_m}m'
                            txt = '{0}'.format(label)
                            # annotator.box_label(xyxy, txt, color=(255, 0, 255))
                            annotator.box_label(xyxy, txt, color=colors(c, True))
                        if names[int(cls)] == "car":
                            c = int(cls)  # integer class
                            label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                            dis_m = car_distance(h)  # estimate the car's distance
                            label += f'  {dis_m}m'
                            txt = '{0}'.format(label)
                            # annotator.box_label(xyxy, txt, color=(255, 0, 255))
                            annotator.box_label(xyxy, txt, color=colors(c, True))
                        if save_crop:
                            save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

            # Stream results
            im0 = annotator.result()
            '''
            if view_img:
                cv2.imshow(str(p), im0)
                cv2.waitKey(1)  # 1 millisecond
            '''
            if view_img:
                cv2.namedWindow("Webcam", cv2.WINDOW_NORMAL)
                cv2.resizeWindow("Webcam", 1280, 720)
                cv2.moveWindow("Webcam", 0, 100)
                cv2.imshow("Webcam", im0)
                cv2.waitKey(1)

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'image':
                    cv2.imwrite(save_path, im0)
                else:  # 'video' or 'stream'
                    if vid_path[i] != save_path:  # new video
                        vid_path[i] = save_path
                        if isinstance(vid_writer[i], cv2.VideoWriter):
                            vid_writer[i].release()  # release previous video writer
                        if vid_cap:  # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                        save_path = str(Path(save_path).with_suffix('.mp4'))  # force *.mp4 suffix on results videos
                        vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
                    vid_writer[i].write(im0)

        # Print time (inference-only)
        LOGGER.info(f'{s}Done. ({t3 - t2:.3f}s)')

    # Print results
    t = tuple(x / seen * 1E3 for x in dt)  # speeds per image
    LOGGER.info(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}' % t)
    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}{s}")
    if update:
        strip_optimizer(weights)  # update model (to fix SourceChangeWarning)


def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.pt', help='model path(s)')
    parser.add_argument('--source', type=str, default=ROOT / 'data/images/1.mp4', help='file/dir/URL/glob, 0 for webcam')
    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='(optional) dataset.yaml path')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-crop', action='store_true', help='save cropped prediction boxes')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --classes 0, or --classes 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--visualize', action='store_true', help='visualize features')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default=ROOT / 'runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--line-thickness', default=3, type=int, help='bounding box thickness (pixels)')
    parser.add_argument('--hide-labels', default=False, action='store_true', help='hide labels')
    parser.add_argument('--hide-conf', default=False, action='store_true', help='hide confidences')
    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')
    parser.add_argument('--dnn', action='store_true', help='use OpenCV DNN for ONNX inference')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    print_args(FILE.stem, opt)
    return opt


def main(opt):
    check_requirements(exclude=('tensorboard', 'thop'))
    run(**vars(opt))


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)
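For completeness, the modified detector can also be driven from Python rather than the command line (a sketch; the weight and video paths are placeholders for your own):

from detect import run

# Run the full pipeline with a live window and per-frame .txt results
run(weights='yolov5s.pt', source='data/images/1.mp4', view_img=True, save_txt=True)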

5. Results

Each detected person or car is boxed and labeled with its class, confidence, and estimated distance in metres; the full demo video is on Bilibili (linked above).

