Other Parts Discussed in Thread: TDA4VM
1. Problem: Models produced by edgeai-benchmark in a Docker image on the PC (Python 3.6 + ONNX 1.8) cannot be run by edgeai-benchmark on a TDA4VM board with SDK version 08_05_00_11.
2. Error log:
INFO:20221218-130103: starting – cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx
INFO:20221218-130103: model_path – /opt/edgeai-modelzoo/models/vision/classification/imagenet1k/torchvision/resnet50.onnx
INFO:20221218-130103: model_file – /opt/edgeai-benchmark/work_dirs/modelartifacts/TDA4VM/8bits/cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx/model/resnet50.onnx
INFO:20221218-130103: running – cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx
INFO:20221218-130103: pipeline_config – {'task_type': 'classification', 'dataset_category': 'imagenet', 'calibration_dataset': <edgeai_benchmark.datasets.imagenet.ImageNetCls object at 0xffff697ec910>, 'input_dataset': <edgeai_benchmark.datasets.imagenet.ImageNetCls object at 0xffff697ec370>, 'postprocess': <edgeai_benchmark.postprocess.PostProcessTransforms object at 0xffff697ec5b0>, 'preprocess': <edgeai_benchmark.preprocess.PreProcessTransforms object at 0xffff697ec6a0>, 'session': <edgeai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0xffff697ec490>, 'model_info': {'metric_reference': {'accuracy_top1%': 76.15}, 'model_shortlist': 30}}
libtidl_onnxrt_EP loaded 0x35e11f80
Final number of subgraphs created are : 1, – Offloaded Nodes – 125, Total Nodes – 125
APP: Init … !!!
MEM: Init … !!!
MEM: Initialized DMA HEAP (fd=5) !!!
MEM: Init … Done !!!
IPC: Init … !!!
IPC: Init … Done !!!
REMOTE_SERVICE: Init … !!!
REMOTE_SERVICE: Init … Done !!!
594557.953562 s: GTC Frequency = 200 MHz
APP: Init … Done !!!
594557.953591 s: VX_ZONE_INIT:Enabled
594557.953599 s: VX_ZONE_ERROR:Enabled
594557.953606 s: VX_ZONE_WARNING:Enabled
594557.953975 s: VX_ZONE_INIT:[tivxInitLocal:145] Initialization Done !!!
594557.954018 s: VX_ZONE_INIT:[tivxHostInitLocal:93] Initialization Done for HOST !!!
594558.025423 s: VX_ZONE_ERROR:[ownContextSendCmd:802] Command ack message returned failure cmd_status: -1
594558.025451 s: VX_ZONE_ERROR:[ownContextSendCmd:838] tivxEventWait() failed.
594558.025477 s: VX_ZONE_ERROR:[ownNodeKernelInit:525] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
594558.025495 s: VX_ZONE_ERROR:[ownNodeKernelInit:526] Please be sure the target callbacks have been registered for this core
594558.025511 s: VX_ZONE_ERROR:[ownNodeKernelInit:527] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
594558.025530 s: VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:1 … failed !!!
594558.025553 s: VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
594558.025569 s: VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
TIDL_RT_OVX: ERROR: Verifying TIDL graph … Failed !!!
TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
infer 2/2: cl-6110_onnxrt_imagenet1k_torchvision_resnet50_on| | 0% 0/1| [< ]
594558.079894 s: VX_ZONE_ERROR:[ownContextSendCmd:802] Command ack message returned failure cmd_status: -1
594558.079924 s: VX_ZONE_ERROR:[ownContextSendCmd:838] tivxEventWait() failed.
594558.079938 s: VX_ZONE_ERROR:[ownNodeKernelInit:525] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
594558.079948 s: VX_ZONE_ERROR:[ownNodeKernelInit:526] Please be sure the target callbacks have been registered for this core
594558.079957 s: VX_ZONE_ERROR:[ownNodeKernelInit:527] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
594558.079968 s: VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:1 … failed !!!
594558.079981 s: VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
594558.079990 s: VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
594558.080101 s: VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:799] graph is not in a state required to be scheduled
594558.080111 s: VX_ZONE_ERROR:[vxProcessGraph:734] schedule graph failed
594558.080116 s: VX_ZONE_ERROR:[vxProcessGraph:739] wait graph failed
ERROR: Running TIDL graph … Failed !!!
infer 2/2: cl-6110_onnxrt_imagenet1k_torchvision_resnet50_on| 100%|##########|| 1/1 [00:00<00:00, 18.44it/s]
*** mgg *** description= 2/2 run_dir_base= cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx elapsed_time= 1303.8575649261475 ms
SUCCESS:20221218-130105: benchmark results – {'infer_path': 'cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx', 'accuracy_top1%': 0.0, 'num_subgraphs': 1, 'infer_time_core_ms': 16129.723087, 'infer_time_subgraph_ms': 41.80643, 'ddr_transfer_mb': 82.945088, 'perfsim_time_ms': 0.0, 'perfsim_ddr_transfer_mb': 0.0, 'perfsim_gmacs': 0.0}
594558.119588 s: VX_ZONE_INIT:[tivxHostDeInitLocal:107] De-Initialization Done for HOST !!!
594558.123975 s: VX_ZONE_INIT:[tivxDeInitLocal:223] De-Initialization Done !!!
APP: Deinit … !!!
REMOTE_SERVICE: Deinit … !!!
REMOTE_SERVICE: Deinit … Done !!!
IPC: Deinit … !!!
IPC: DeInit … Done !!!
MEM: Deinit … !!!
DDR_SHARED_MEM: Alloc's: 7 alloc's of 26958100 bytes
DDR_SHARED_MEM: Free's : 7 free's of 26958100 bytes
DDR_SHARED_MEM: Open's : 0 allocs of 0 bytes
DDR_SHARED_MEM: Total size: 536870912 bytes
MEM: Deinit … Done !!!
APP: Deinit … Done !!!
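Note that the run above is reported as SUCCESS even though the TIDL graph failed to verify and accuracy_top1% is 0.0. A minimal sketch of a check that flags this combination in a saved log is given below. It assumes the log is available as plain text with one message per line; `summarize_tidl_log` is a hypothetical helper and is not part of edgeai-benchmark.

```python
import re

def summarize_tidl_log(log_text):
    """Summarize a saved benchmark log (hypothetical helper, not a TI API)."""
    vx_errors = 0
    accuracies = []
    for line in log_text.splitlines():
        # Count OpenVX error messages such as "VX_ZONE_ERROR:[vxVerifyGraph:2109] ..."
        vx_errors += line.count("VX_ZONE_ERROR")
        # Only read accuracy values from the final results line, not from
        # the 'metric_reference' entry inside pipeline_config.
        if "benchmark results" in line:
            tail = line.split("benchmark results", 1)[1]
            found = re.findall(r"'accuracy_[^']*':\s*([0-9]+(?:\.[0-9]+)?)", tail)
            accuracies.extend(float(v) for v in found)
    graph_verify_failed = "Graph verify failed" in log_text
    reported_success = "SUCCESS:" in log_text
    # A "successful" run with a failed graph or zero accuracy deserves a second look.
    suspicious = reported_success and (
        graph_verify_failed or any(a == 0.0 for a in accuracies)
    )
    return {
        "vx_errors": vx_errors,
        "graph_verify_failed": graph_verify_failed,
        "accuracies": accuracies,
        "suspicious": suspicious,
    }
```

With the log above, the returned dictionary would have `graph_verify_failed` and `suspicious` both set to `True`, which makes it clear that the SUCCESS line only means the pipeline ran to completion, not that inference worked.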
Nancy Wang:
I have escalated your question to the English forum, where a product-line expert will support you. Please follow up there promptly.
e2e.ti.com/…/tda4vm-the-edgeai-benchmark-production-model-cannot-run-on-tda4vm
,
Nancy Wang:
There is now a reply on the English forum. Please follow up promptly.
,
Jay Meng:
OK, thanks!
I changed setup_pc.sh according to your answer. Running cl-6110_onnxrt_imagenet1k_torchvision_resnet50_on now succeeds, but other models still fail, for example:
1> Error log when running od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx:
INFO:20221218-081136: starting – od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx
INFO:20221218-081136: model_path – /opt/edgeai-modelzoo/models/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
INFO:20221218-081136: model_file – /opt/edgeai-benchmark/work_dirs/modelartifacts/TDA4VM/8bits/od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx/model/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
Downloading 1/1: /opt/edgeai-modelzoo/models/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
Downloading software-dl.ti.com/…/ssd_mobilenetv2_lite_512x512_20201214_model.onnx to /opt/edgeai-benchmark/work_dirs/modelartifacts/TDA4VM/8bits/od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx/model/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
12795904it [00:53, 240803.44it/s]
Download done for /opt/edgeai-modelzoo/models/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
Traceback (most recent call last):
  File "/opt/edgeai-benchmark/edgeai_benchmark/pipelines/pipeline_runner.py", line 154, in _run_pipeline
    result = cls._run_pipeline_impl(settings, pipeline_config, description)
  File "/opt/edgeai-benchmark/edgeai_benchmark/pipelines/pipeline_runner.py", line 125, in _run_pipeline_impl
    accuracy_result = accuracy_pipeline(description)
  File "/opt/edgeai-benchmark/edgeai_benchmark/pipelines/accuracy_pipeline.py", line 103, in __call__
    self.session.start()
  File "/opt/edgeai-benchmark/edgeai_benchmark/sessions/onnxrt_session.py", line 47, in start
    super().start()
  File "/opt/edgeai-benchmark/edgeai_benchmark/sessions/basert_session.py", line 140, in start
    self.get_model()
  File "/opt/edgeai-benchmark/edgeai_benchmark/sessions/basert_session.py", line 402, in get_model
    optimization_done = self._optimize_model(is_new_file=(not model_file_exists))
  File "/opt/edgeai-benchmark/edgeai_benchmark/sessions/basert_session.py", line 443, in _optimize_model
    from osrt_model_tools.onnx_tools import onnx_model_opt as onnxopt
ModuleNotFoundError: No module named 'osrt_model_tools'
No module named 'osrt_model_tools'
2> Error log when running od-8000_onnxrt_coco_mlperf_ssd_resnet34-ssd1200_onnx:
Final number of subgraphs created are : 1, – Offloaded Nodes – 186, Total Nodes – 186
APP: Init … !!!
MEM: Init … !!!
MEM: Initialized DMA HEAP (fd=5) !!!
MEM: Init … Done !!!
IPC: Init … !!!
IPC: Init … Done !!!
REMOTE_SERVICE: Init … !!!
REMOTE_SERVICE: Init … Done !!!
748472.194157 s: GTC Frequency = 200 MHz
APP: Init … Done !!!
748472.194182 s: VX_ZONE_INIT:Enabled
748472.194190 s: VX_ZONE_ERROR:Enabled
748472.194197 s: VX_ZONE_WARNING:Enabled
748472.194537 s: VX_ZONE_INIT:[tivxInitLocal:145] Initialization Done !!!
748472.194579 s: VX_ZONE_INIT:[tivxHostInitLocal:93] Initialization Done for HOST !!!
748472.255784 s: VX_ZONE_ERROR:[ownContextSendCmd:802] Command ack message returned failure cmd_status: -1
748472.255811 s: VX_ZONE_ERROR:[ownContextSendCmd:838] tivxEventWait() failed.
748472.255824 s: VX_ZONE_ERROR:[ownNodeKernelInit:525] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
748472.255834 s: VX_ZONE_ERROR:[ownNodeKernelInit:526] Please be sure the target callbacks have been registered for this core
748472.255843 s: VX_ZONE_ERROR:[ownNodeKernelInit:527] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
748472.255853 s: VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:3 … failed !!!
748472.255865 s: VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
748472.255874 s: VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
TIDL_RT_OVX: ERROR: Verifying TIDL graph … Failed !!!
TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
infer 3/50: od-8000_onnxrt_coco_mlperf_ssd_resnet34-ssd1200_| | 0% 0/1| [< ]
748472.369270 s: VX_ZONE_ERROR:[ownContextSendCmd:802] Command ack message returned failure cmd_status: -1
748472.369299 s: VX_ZONE_ERROR:[ownContextSendCmd:838] tivxEventWait() failed.
748472.369313 s: VX_ZONE_ERROR:[ownNodeKernelInit:525] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
748472.369322 s: VX_ZONE_ERROR:[ownNodeKernelInit:526] Please be sure the target callbacks have been registered for this core
748472.369331 s: VX_ZONE_ERROR:[ownNodeKernelInit:527] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
748472.369341 s: VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:3 … failed !!!
748472.369354 s: VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
748472.369363 s: VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
748472.369496 s: VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:799] graph is not in a state required to be scheduled
748472.369507 s: VX_ZONE_ERROR:[vxProcessGraph:734] schedule graph failed
748472.369517 s: VX_ZONE_ERROR:[vxProcessGraph:739] wait graph failed
ERROR: Running TIDL graph … Failed !!!
infer 3/50: od-8000_onnxrt_coco_mlperf_ssd_resnet34-ssd1200_| 100%|##########|| 1/1 [00:00<00:00, 8.53it/s]
*** mgg *** description= 3/50 run_dir_base= od-8000_onnxrt_coco_mlperf_ssd_resnet34-ssd1200_onnx elapsed_time= 3147.7415561676025 ms
SUCCESS:20221218-075935: benchmark results – {'infer_path': 'od-8000_onnxrt_coco_mlperf_ssd_resnet34-ssd1200_onnx', 'accuracy_ap[.5:.95]%': 0.0, 'accuracy_ap50%': 0.0, 'num_subgraphs': 1, 'infer_time_core_ms': 7310027617237.317, 'infer_time_subgraph_ms': 34.935245, 'ddr_transfer_mb': 74.958144, 'perfsim_time_ms': 0.0, 'perfsim_ddr_transfer_mb': 0.0, 'perfsim_gmacs': 0.0}
748472.899576 s: VX_ZONE_INIT:[tivxHostDeInitLocal:107] De-Initialization Done for HOST !!!
748472.903047 s: VX_ZONE_INIT:[tivxDeInitLocal:223] De-Initialization Done !!!
APP: Deinit … !!!
REMOTE_SERVICE: Deinit … !!!
REMOTE_SERVICE: Deinit … Done !!!
IPC: Deinit … !!!
IPC: DeInit … Done !!!
MEM: Deinit … !!!
DDR_SHARED_MEM: Alloc's: 9 alloc's of 25549012 bytes
DDR_SHARED_MEM: Free's : 9 free's of 25549012 bytes
DDR_SHARED_MEM: Open's : 0 allocs of 0 bytes
DDR_SHARED_MEM: Total size: 536870912 bytes
MEM: Deinit … Done !!!
APP: Deinit … Done !!!
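The first of the two failures above (od-8020) is an ordinary Python import error rather than a TIDL runtime error: basert_session.py tries to import osrt_model_tools and the interpreter cannot find it. A quick way to confirm whether the package is visible to the interpreter that launches the benchmark is sketched below. `missing_modules` is a hypothetical helper; only the module name is taken from the traceback.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that this interpreter cannot import.

    Only pass top-level names (no dots): find_spec() tries to import the
    parent package of a dotted name and raises if that parent is missing.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    # Module name taken from the ModuleNotFoundError in the log above.
    print(missing_modules(["osrt_model_tools"]))
```

If the name is printed, the package is not installed in the environment being used. An empty list means the import problem lies elsewhere, for example in a different interpreter or virtual environment being picked up at run time.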
,
Nancy Wang:
Followed up.
,
Jay Meng:
OK, thanks!
Now there is a new error. Detailed log:
INFO:20230518-081053: starting process on parallel_device – 0
INFO:20230518-081053: starting – cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_onnx
INFO:20230518-081053: model_path – /home/cambricon/work/ai/edgeai-modelzoo/models/vision/classification/imagenet1k/torchvision/mobilenetv2.onnx
INFO:20230518-081053: model_file – /home/cambricon/work/ai/edgeai-benchmark/work_dirs/modelartifacts/TDA4VM/8bits/cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_onnx/model/mobilenetv2.onnx
INFO:20230518-081053: running – cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_onnx
INFO:20230518-081053: pipeline_config – {'task_type': 'classification', 'dataset_category': 'imagenet', 'calibration_dataset': <edgeai_benchmark.datasets.imagenet.ImageNetCls object at 0x7f81afe78dd0>, 'input_dataset': <edgeai_benchmark.datasets.imagenet.ImageNetCls object at 0x7f81afe78810>, 'postprocess': <edgeai_benchmark.postprocess.PostProcessTransforms object at 0x7f81ad429d10>, 'preprocess': <edgeai_benchmark.preprocess.PreProcessTransforms object at 0x7f81ad429c50>, 'session': <edgeai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0x7f816a32b0d0>, 'model_info': {'metric_reference': {'accuracy_top1%': 69.76}, 'model_shortlist': 30}}
INFO:20230518-081053: infer – cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_onnx – this may take some time…
libtidl_onnxrt_EP loaded 0x55f6367c7060
******** WARNING ******* : Could not open /home/cambricon/work/ai/edgeai-benchmark/work_dirs/modelartifacts/TDA4VM/8bits/cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_onnx/artifacts/allowedNode.txt for reading… Entire model will run on ARM without any delegation to TIDL !
Final number of subgraphs created are : 1, – Offloaded Nodes – 0, Total Nodes – 0
infer : cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_on| 0%| || 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/pipeline_runner.py", line 154, in _run_pipeline
    result = cls._run_pipeline_impl(settings, pipeline_config, description)
  File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/pipeline_runner.py", line 125, in _run_pipeline_impl
    accuracy_result = accuracy_pipeline(description)
  File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/accuracy_pipeline.py", line 122, in __call__
    param_result = self._run(description=description)
  File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/accuracy_pipeline.py", line 164, in _run
    output_list = self._infer_frames(description)
  File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/accuracy_pipeline.py", line 229, in _infer_frames
    output, info_dict = self._run_with_log(session.infer_frame, data, info_dict)
  File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/accuracy_pipeline.py", line 302, in _run_with_log
    return func(*args, **kwargs)
  File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/sessions/onnxrt_session.py", line 105, in infer_frame
    outputs = self.interpreter.run(output_keys, input_dict)
  File "/home/edgeai/.pyenv/versions/py36/lib/python3.6/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 188, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(uint8)) , expected: (tensor(float))
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(uint8)) , expected: (tensor(float))
TASKS | 100%|##########|| 2/2 [00:06<00:00, 3.10s/it]
,
Nancy Wang:
Followed up.
,
Nancy Wang:
Since you hit the error earlier, some leftover files from that run are probably causing the issue. Please delete the modelartifacts/TDA4VM/8bits folder and try again.
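The suggested cleanup can be sketched in Python as follows. This assumes the folder sits under the edgeai-benchmark work_dirs directory, as in the paths shown in the logs above; `remove_stale_artifacts` is a hypothetical helper and not an edgeai-benchmark function.

```python
import shutil
from pathlib import Path

def remove_stale_artifacts(work_dirs, target="TDA4VM", bits="8bits"):
    """Delete <work_dirs>/modelartifacts/<target>/<bits> if it exists.

    Returns True if a folder was removed and False if there was nothing to delete.
    """
    artifacts_dir = Path(work_dirs) / "modelartifacts" / target / bits
    if artifacts_dir.is_dir():
        shutil.rmtree(artifacts_dir)  # removes the folder and everything below it
        return True
    return False

# Example call, using the work_dirs path that appears in the log above:
# remove_stale_artifacts("/home/cambricon/work/ai/edgeai-benchmark/work_dirs")
```

Deleting this folder also removes any previously compiled artifacts, so the models have to be compiled again on the next run.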