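Hi everyone,
Problem description:
The two compiler feedback blocks below were generated before and after optimization. Before optimization, II = 33; after optimization, II = 10. As I understand it,
the optimized loop with II = 10 should be considerably more efficient than the one with II = 33, so why is the measured time shorter with II = 33
and longer with II = 10? Could the time be dominated by data storage rather than by computation? How can this be explained? Thanks!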
Before optimization:
;*----------------------------------------------------------------------------*
;* SOFTWARE PIPELINE INFORMATION
;*
;* Loop source line : 166
;* Loop opening brace source line : 167
;* Loop closing brace source line : 186
;* Known Minimum Trip Count : 1
;* Known Max Trip Count Factor : 1
;* Loop Carried Dependency Bound(^) : 32
;* Unpartitioned Resource Bound : 9
;* Partitioned Resource Bound(*) : 11
;* Resource Partition:
;* A-side B-side
;* .L units 3 5
;* .S units 5 7
;* .D units 3 5
;* .M units 4 6
;* .X cross paths 8 10
;* .T address paths 2 4
;* Long read paths 0 0
;* Long write paths 0 0
;* Logical ops (.LS) 0 0 (.L or .S unit)
;* Addition ops (.LSD) 7 16 (.L or .S or .D unit)
;* Bound(.L .S .LS) 4 6
;* Bound(.L .S .D .LS .LSD) 6 11*
;*
;* Searching for software pipeline schedule at …
;* ii = 32 Did not find schedule
;* ii = 33 Schedule found with 1 iterations in parallel
After optimization:
;*----------------------------------------------------------------------------*
;* SOFTWARE PIPELINE INFORMATION
;*
;* Loop source line : 166
;* Loop opening brace source line : 167
;* Loop closing brace source line : 186
;* Known Minimum Trip Count : 1
;* Known Max Trip Count Factor : 1
;* Loop Carried Dependency Bound(^) : 5
;* Unpartitioned Resource Bound : 9
;* Partitioned Resource Bound(*) : 10
;* Resource Partition:
;* A-side B-side
;* .L units 5 3
;* .S units 6 5
;* .D units 5 3
;* .M units 3 7
;* .X cross paths 7 10*
;* .T address paths 2 4
;* Long read paths 0 0
;* Long write paths 0 0
;* Logical ops (.LS) 0 0 (.L or .S unit)
;* Addition ops (.LSD) 10 13 (.L or .S or .D unit)
;* Bound(.L .S .LS) 6 4
;* Bound(.L .S .D .LS .LSD) 9 8
;*
;* Searching for software pipeline schedule at …
;* ii = 10 Schedule found with 5 iterations in parallel
;*
Best Regards!
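
A rough way to reason about the two feedback blocks is sketched below. It rests on two assumptions that the thread does not confirm. First, the scheduler starts its search at the larger of the Loop Carried Dependency Bound and the Partitioned Resource Bound, which is consistent with both search logs (32 before, 10 after). Second, a loop with S iterations in parallel needs roughly ii * (N + S - 1) cycles for N trips. The function names min_ii and est_cycles are hypothetical, and the model ignores memory stalls, loop setup, and any non-pipelined fallback loop the compiler may emit.

```c
#include <assert.h>

/* Assumed starting point of the ii search: the larger of the two bounds
   printed in the software pipeline feedback. */
int min_ii(int loop_carried_bound, int partitioned_resource_bound)
{
    return loop_carried_bound > partitioned_resource_bound
               ? loop_carried_bound
               : partitioned_resource_bound;
}

/* Simplified cycle estimate for a software-pipelined loop.
   With iters_in_parallel overlapped iterations, the first result needs
   ii * iters_in_parallel cycles and each further iteration adds ii cycles.
   Stalls, loop setup and cache effects are deliberately not modeled. */
long est_cycles(int ii, int iters_in_parallel, long trip_count)
{
    assert(ii > 0 && iters_in_parallel > 0 && trip_count > 0);
    return (long)ii * (trip_count + iters_in_parallel - 1);
}
```

With the numbers from the feedback, est_cycles(33, 1, 1) gives 33 and est_cycles(10, 5, 1) gives 50, while est_cycles(33, 1, 2) gives 66 and est_cycles(10, 5, 2) gives 60. Under this model the ii = 10 schedule only pays off once the loop runs at least two iterations per entry. Since the feedback reports a Known Minimum Trip Count of 1, a very small actual trip count would be one possible explanation for the measurement, alongside memory stalls.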
Louis:
This needs to be confirmed against the profile information.
Louis:
Reply to Armstrong:
Hello,
How did you compare the actual execution times of the two versions (ii=33 and ii=10)? By running on hardware, or from a profile?
Louis:
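Reply to Armstrong:
I suggest running both versions once in the simulator and comparing the profile results, which list the CPU stall time and the actual execution time in detail. Also, are the code and data for both versions placed entirely in L2?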
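
Once the profile is available, the comparison suggested above could be organized roughly as follows. This is only a sketch: the profile_sample struct and its field names are hypothetical and do not reflect the profiler's actual output format. The cycle counts have to be copied in by hand from the profile of each build.

```c
#include <assert.h>

/* Hypothetical container for two numbers read from a profile:
   total cycles for the loop, and the part of them spent stalled. */
typedef struct {
    unsigned long total_cycles;
    unsigned long stall_cycles;
} profile_sample;

/* Cycles in which the CPU was actually executing, not stalled. */
unsigned long execute_cycles(profile_sample s)
{
    assert(s.stall_cycles <= s.total_cycles);
    return s.total_cycles - s.stall_cycles;
}

/* Share of the total spent stalled, as an integer percentage. */
unsigned stall_percent(profile_sample s)
{
    assert(s.total_cycles > 0 && s.stall_cycles <= s.total_cycles);
    return (unsigned)(s.stall_cycles * 100UL / s.total_cycles);
}

/* Returns 1 when the low-ii build executes fewer core cycles but is still
   slower overall, i.e. when stalls rather than computation explain the gap. */
int stalls_explain_gap(profile_sample low_ii, profile_sample high_ii)
{
    return execute_cycles(low_ii) < execute_cycles(high_ii) &&
           low_ii.total_cycles > high_ii.total_cycles;
}
```

If stalls_explain_gap returns 1 for the ii = 10 and ii = 33 builds, the extra time is going into memory accesses rather than into the scheduled loop, which is where the question about L2 placement becomes relevant.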