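I'm using the 8168 dvr_rdk 4.03. After optimizing a piece of code in CCS, the cycle count from the profiler works out to about 12 ms, but when I integrate the same code into the RDK the measured time is 300 ms. That is a very large gap, and the code is fairly simple. Here it is: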
#define NUM_SAMPLES 16
static short c_xoff[16] = {-1,0,1,-1,1,-1,0,1,0,-1,0,1,-1,1,-1,0};
static short c_c_xoff[16] = {-2,0,2,-2,2,-2,0,2,0,-2,0,2,-2,2,-2,0};
static short c_yoff[16] = {-1,0,1,-1,1,-1,0,1,0,-1,0,1,-1,1,-1,0};
extern unsigned char m_samples_c[288][352][20];
extern unsigned char m_samples_c_c[144][352][20];
extern unsigned char m_mask_c[288][352];
extern unsigned char yuv_c[288][352];
extern unsigned char yuv_c_c[144][352];

for( i = y_s+1; i < y_e-1; i++)
{
    for( j = x_s+1; j < x_e-1; j++)
    {
        for( k = 0 ; k < NUM_SAMPLES; k+=2)
        {
            row = i + c_yoff[k];
            col = j + c_xoff[k];
            m_samples_c[i][j][k] = yuv_c_y[row][col];
            k++;
            row = i + c_yoff[k];
            col = j + c_xoff[k];
            m_samples_c[i][j][k] = yuv_c_y[row][col];
        }
        if(up)
        {
            y_c = i >>1;
            for( k = 0 ; k < NUM_SAMPLES; k+=2)
            {
                row = y_c + c_yoff[k];
                col = j + c_c_xoff[k];
                m_samples_c_c[y_c][j][k] = yuv_c_uv[row][col];
                k++;
                row = y_c + c_yoff[k];
                col = j + c_c_xoff[k];
                m_samples_c_c[y_c][j][k] = yuv_c_uv[row][col];
            }
        }
    }
}
What could be causing this?
Chris Meng:
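Hello,
1. In CCS, I recommend using the DSP core's own timer to measure the time.
2. Is the cache configuration in the integrated build the same as the one used in CCS?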
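As a rough illustration of point 1, the sketch below times a function with the C6000 time-stamp counter (the TSCL control register, declared in c6x.h by the TI compiler). The helper names start_ticks, read_ticks, measure_ticks and sample_work are hypothetical and not part of the RDK. When built with a compiler other than the TI C6000 one, the sketch falls back to clock() from the C standard library, so the result is in CPU cycles on the DSP and in clock() ticks on a host.

```c
#include <time.h>

#ifdef _TMS320C6X
#include <c6x.h>                     /* declares the TSCL control register */
/* Any write to TSCL starts the free-running counter. */
static void start_ticks(void) { TSCL = 0; }
static unsigned int read_ticks(void) { return TSCL; }
#else
/* Host fallback (assumption for illustration): clock() ticks, not cycles. */
static void start_ticks(void) { }
static unsigned int read_ticks(void) { return (unsigned int)clock(); }
#endif

typedef void (*work_fn)(void);

/* Returns the number of ticks spent inside fn(). */
static unsigned int measure_ticks(work_fn fn)
{
    unsigned int t0, t1;

    start_ticks();
    t0 = read_ticks();
    fn();
    t1 = read_ticks();
    return t1 - t0;                  /* unsigned subtraction tolerates one wraparound */
}

/* Stand-in for the code under test; the volatile sink keeps the loop alive. */
static volatile unsigned int g_sink;

static void sample_work(void)
{
    unsigned int i;
    for (i = 0; i < 100000u; i++)
        g_sink += i;
}
```

Calling measure_ticks(sample_work) both in the standalone CCS project and inside the RDK task would give two numbers taken with the same method, which makes the comparison less dependent on how the profiler counts cycles.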
map dog:
Reply to Chris Meng:
I am already using the DSP core's own timer to measure the time, and the cache settings in the RDK are the same as in CCS.
Chris Meng:
Reply to map dog:
Are the compiler options the same in both builds?