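Recently some of our company's products have been coming back for repair intermittently. Our analysis points to UBIFS filesystem errors. Could the experts here please help? The problem is described below:
[Product platform]
① CPU: AM3354
② Linux kernel version: Linux version 3.2.0
③ gcc version: gcc version 4.5.3 20110311 (prerelease)
④ Filesystem: ubifs
[Symptoms]
(The problem differs somewhat from unit to unit, but every case points to UBIFS filesystem errors. Two of the symptoms are listed below for analysis.)
Symptom 1: the system fails to boot normally. The boot log is as follows: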
U-Boot SPL 2011.09 (Sep 11 2013 - 08:39:49)
Texas Instruments Revision detection unimplemented
U-Boot 2011.09 (Sep 11 2013 - 08:39:49)
DRAM: 256 MiB
NAND: 256 MiB
MMC: OMAP SD/MMC: 0
*** Warning - bad CRC, using default environment
Net: cpsw
Hit any key to stop autoboot: 0Booting from nand …
HW ECC BCH8 Selected
NAND read: device 0 offset 0x280000, size 0x500000
5242880 bytes read: OK
## Booting kernel from Legacy Image at 80007fc0 …
Image Name: Linux-3.2.0
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 2960864 Bytes = 2.8 MiB
Load Address: 80008000
Entry Point: 80008000
Verifying Checksum … OK
XIP Kernel Image … OK
OK
Starting kernel …
Uncompressing Linux… done, booting the kernel.
[ 0.000000] Linux version 3.2.0 (rm@localhost.localdomain) (gcc version 4.5.3 20110311 (prerelease) (GCC) ) #113 Mon Nov 3 17:05:49 CST 2014
...
(The Linux kernel boots normally here with no abnormal messages; output omitted.)
...
[ 4.001856] rtc-s35390a 1-0030: setting system clock to 2000-01-02 21:31:24 UTC (946848684)
[ 4.078074] UBIFS: recovery needed
[ 4.235435] UBIFS: recovery completed
[ 4.239322] UBIFS: mounted UBI device 0, volume 0, name "rootfs"
[ 4.245663] UBIFS: file system size: 199225344 bytes (194556 KiB, 189 MiB, 1569 LEBs)
[ 4.254106] UBIFS: journal size: 9023488 bytes (8812 KiB, 8 MiB, 72 LEBs)
[ 4.261799] UBIFS: media format: w4/r0 (latest is w4/r0)
[ 4.267951] UBIFS: default compressor: lzo
[ 4.272260] UBIFS: reserved for root: 0 bytes (0 KiB)
[ 4.280961] VFS: Mounted root (ubifs filesystem) on device 0:13.
[ 4.288499] Freeing init memory: 608K
INIT: version 2.86 booting
/* UBIFS filesystem errors start being printed here... */
[ 4.754005] UBIFS error (pid 823): ubifs_check_node: bad CRC: calculated 0xbe587da8, read 0xe45bdd67
[ 4.763690] UBIFS error (pid 823): ubifs_check_node: bad node at LEB 211:113632
[ 4.771406] UBIFS error (pid 823): ubifs_read_node: expected node type 9
[ 4.778479] UBIFS error (pid 823): ubifs_iget: failed to read inode 189, error -117
[ 4.786577] UBIFS error (pid 823): ubifs_lookup: dead directory entry 'default', error -117
[ 4.795383] UBIFS warning (pid 823): ubifs_ro_mode: switched to read-only mode, error -117
[ 4.804094] Backtrace:
[ 4.806720] [<c0017978>] (dump_backtrace+0x0/0x110) from [<c03dc4ec>] (dump_stack+0x18/0x1c)
[ 4.815640] r6:cf37d000 r5:cf47a368 r4:60008400 r3:c05f6c48
[ 4.821655] [<c03dc4d4>] (dump_stack+0x0/0x1c) from [<c01736d0>] (ubifs_ro_mode+0x74/0x78)
[ 4.830387] [<c017365c>] (ubifs_ro_mode+0x0/0x78) from [<c016d5bc>] (ubifs_lookup+0x148/0x150)
[ 4.839461] r4:cf480298 r3:c05f6c48
[ 4.843253] [<c016d474>] (ubifs_lookup+0x0/0x150) from [<c00b0858>] (d_alloc_and_lookup+0x4c/0x6c)
[ 4.852694] r8:00000001 r7:00000000 r6:cfab1ed8 r5:cf480298 r4:cf47a368
[ 4.859812] [<c00b080c>] (d_alloc_and_lookup+0x0/0x6c) from [<c00b2810>] (do_lookup+0x254/0x34c)
[ 4.869070] r6:cfab1e44 r5:cfab1ed8 r4:cfab1e3c r3:00000000
[ 4.875080] [<c00b25bc>] (do_lookup+0x0/0x34c) from [<c00b2a3c>] (link_path_walk+0x134/0x7d0)
[ 4.884080] [<c00b2908>] (link_path_walk+0x0/0x7d0) from [<c00b44f8>] (path_openat+0xa4/0x398)
[ 4.893153] [<c00b4454>] (path_openat+0x0/0x398) from [<c00b48fc>] (do_filp_open+0x34/0x88)
[ 4.901972] [<c00b48c8>] (do_filp_open+0x0/0x88) from [<c00a6b00>] (do_sys_open+0xe8/0x180)
[ 4.910771] r7:00000001 r6:00000003 r5:00020000 r4:cf93c000
[ 4.916781] [<c00a6a18>] (do_sys_open+0x0/0x180) from [<c00a6bc0>] (sys_open+0x28/0x2c)
[ 4.925243] [<c00a6b98>] (sys_open+0x0/0x2c) from [<c0014280>] (ret_fast_syscall+0x0/0x30)
INIT: Entering runlevel: 51: can
/* UBIFS filesystem errors start being printed here... */
[ 5.045726] UBIFS error (pid 826): ubifs_check_node: bad CRC: calculated 0xbe587da8, read 0xe45bdd67
[ 5.055414] UBIFS error (pid 826): ubifs_check_node: bad node at LEB 211:113632
[ 5.063116] UBIFS error (pid 826): ubifs_read_node: expected node type 9
[ 5.070190] UBIFS error (pid 826): ubifs_iget: failed to read inode 189, error -117
[ 5.078290] UBIFS error (pid 826): ubifs_lookup: dead directory entry 'default', error -117
/etc/init.d/rc: .: line 18: can't open /etc/default/rcS
Cannot create Qt/Embedded data directory: /tmp/qtembedded-0
Cannot create Qt/Embedded data directory: /tmp/qtembedded-0
Cannot create Qt/Embedded data directory: /tmp/qtembedded-0
Cannot create Qt/Embedded data directory: /tmp/qtembedded-0
Cannot create Qt/Embedded data directory: /tmp/qtembedded-0
Cannot create Qt/Embedded data directory: /tmp/qtembedded-0
Cannot create Qt/Embedded data directory: /tmp/qtembedded-0
Cannot create Qt/Embedded data directory: /tmp/qtembedded-0
Cannot create Qt/Embedded data directory: /tmp/qtembedded-0
Cannot create Qt/Embedded data directory: /tmp/qtembedded-0
INIT: Id "tty2" respawning too fast: disabled for 5 minutes
[ 9.584319] UBIFS error (pid 858): make_reservation: cannot reserve 160 bytes in jhead 1, error -30
[ 9.593872] UBIFS error (pid 858): ubifs_write_inode: can't write inode 5005, error -30
/* At this point the Linux system hangs and goes no further... */
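Symptom 2: the system boots successfully, but files cannot be operated on. Any file operation crashes the filesystem, which then switches to read-only mode: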
U-Boot SPL 2011.09 (Sep 11 2013 - 08:39:49)
Texas Instruments Revision detection unimplemented
U-Boot 2011.09 (Sep 11 2013 - 08:39:49)
DRAM: 256 MiB
NAND: 256 MiB
MMC: OMAP SD/MMC: 0
*** Warning - bad CRC, using default environment
Net: cpsw
Hit any key to stop autoboot: 0Booting from nand …
HW ECC BCH8 Selected
NAND read: device 0 offset 0x280000, size 0x500000
5242880 bytes read: OK
## Booting kernel from Legacy Image at 80007fc0 …
Image Name: Linux-3.2.0
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 2960864 Bytes = 2.8 MiB
Load Address: 80008000
Entry Point: 80008000
Verifying Checksum … OK
XIP Kernel Image … OK
OK
Starting kernel …
Uncompressing Linux… done, booting the kernel.
[ 0.000000] Linux version 3.2.0 (rm@localhost.localdomain) (gcc version 4.5.3 20110311 (prerelease) (GCC) ) #113 Mon Nov 3 17:05:49 CST 2014
...
(The Linux kernel boots normally here with no abnormal messages; output omitted.)
...
[ 4.001908] rtc-s35390a 1-0030: setting system clock to 2000-01-04 01:28:44 UTC (946949324)
[ 4.078180] UBIFS: recovery needed
[ 4.180129] UBIFS: recovery completed
[ 4.184043] UBIFS: mounted UBI device 0, volume 0, name "rootfs"
[ 4.190372] UBIFS: file system size: 199225344 bytes (194556 KiB, 189 MiB, 1569 LEBs)
[ 4.198815] UBIFS: journal size: 9023488 bytes (8812 KiB, 8 MiB, 72 LEBs)
[ 4.206518] UBIFS: media format: w4/r0 (latest is w4/r0)
[ 4.212655] UBIFS: default compressor: lzo
[ 4.216996] UBIFS: reserved for root: 0 bytes (0 KiB)
[ 4.226220] VFS: Mounted root (ubifs filesystem) on device 0:13.
[ 4.233732] Freeing init memory: 608K
INIT: version 2.86 booting
Please wait: booting…
Starting udev
[ 5.093881] udevd (841): /proc/841/oom_adj is deprecated, please use /proc/841/oom_score_adj instead.
[ 9.827571] alignment: ignoring faults is unsafe on this CPU. Defaulting to fixup mode.
Root filesystem already rw, not remounting
Caching udev devnodes
ALSA: Restoring mixer settings…
Configuring network interfaces… /usr/sbin/alsactl: load_state:1625: No soundcards found…
[ 10.684082] net eth0: CPSW phy found : id is : 0x1cc915
done.
Setting up IP spoofing protection: rp_filter.
[ 10.784501] net eth1: CPSW phy found : id is : 0x1cc915
INIT: Entering runlevel: 5
Starting system message bus: dbus.
Starting telnet daemon.
Starting syslogd/klogd: done
Starting thttpd.
root@am335x:~# –1–fd=16—
[ 13.363970] net eth0: CPSW phy found : id is : 0x1cc915
[ 15.354352] PHY: 0:01 - Link is Up - 100/Full
/* At this point Linux has booted successfully */
/* Deleting files on the system below triggers the errors... */
root@am335x:~# cd /home/
root@am335x:/home# ls
app app.tar.gz app_U chargeData mtar root
root@am335x:/home# rm -rf app.tar.gz /* delete a file */
root@am335x:/home# cd /usr/
root@am335x:/usr# cd qt/lib/fonts/
root@am335x:/usr/qt/lib/fonts# rm wenquanyi_160_75.qpf /* delete a file */
root@am335x:/usr/qt/lib/fonts# rm wenquanyi_160_50.qpf /* delete a file */
/* After the file deletions above, UBIFS starts reporting errors and switches to read-only */
[ 67.372753] UBIFS error (pid 2021): ubifs_check_node: bad CRC: calculated 0x8819f8fc, read 0xd9c5d8fd
[ 67.382503] UBIFS error (pid 2021): ubifs_check_node: bad node at LEB 224:384
[ 67.390038] UBIFS error (pid 2021): ubifs_read_node: expected node type 9
[ 67.397198] UBIFS warning (pid 2021): ubifs_ro_mode: switched to read-only mode, error -117
[ 67.406001] Backtrace:
[ 67.408631] [<c0017978>] (dump_backtrace+0x0/0x110) from [<c03dc4ec>] (dump_stack+0x18/0x1c)
[ 67.417540] r6:ffffff8b r5:cf519048 r4:60008400 r3:c05f6c48
[ 67.423557] [<c03dc4d4>] (dump_stack+0x0/0x1c) from [<c01736d0>] (ubifs_ro_mode+0x74/0x78)
[ 67.432288] [<c017365c>] (ubifs_ro_mode+0x0/0x78) from [<c016a0b4>] (ubifs_jnl_delete_inode+0x94/0xb4)
[ 67.442097] r4:cf37d000 r3:00000000
[ 67.445899] [<c016a020>] (ubifs_jnl_delete_inode+0x0/0xb4) from [<c016efd8>] (ubifs_evict_inode+0xc0/0x100)
[ 67.456164] r9:cfb8a000 r8:c0014428 r7:0000000a r6:c03f2688 r5:cf37d000
[ 67.463184] r4:cf519048
[ 67.465985] [<c016ef18>] (ubifs_evict_inode+0x0/0x100) from [<c00bcebc>] (evict+0x7c/0x160)
[ 67.474791] r5:c03f2688 r4:cf519048
[ 67.478578] [<c00bce40>] (evict+0x0/0x160) from [<c00bd0b0>] (iput+0x110/0x1b0)
[ 67.486282] r5:cf376600 r4:cf519048
[ 67.490078] [<c00bcfa0>] (iput+0x0/0x1b0) from [<c00b3a2c>] (do_unlinkat+0x120/0x164)
[ 67.498329] r6:cf519048 r5:cf58d118 r4:00000000 r3:00000000
[ 67.504341] [<c00b390c>] (do_unlinkat+0x0/0x164) from [<c00b4d3c>] (sys_unlink+0x18/0x1c)
[ 67.512944] r6:00000000 r5:00000000 r4:bebd1da9
[ 67.517868] [<c00b4d24>] (sys_unlink+0x0/0x1c) from [<c0014280>] (ret_fast_syscall+0x0/0x30)
[ 67.526763] UBIFS error (pid 2021): ubifs_evict_inode: can't delete inode 8398, error -117
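Please help, this is urgent!
Jian Zhou:
May I ask roughly what the customer's application scenario is? Do unexpected power losses happen frequently?
jinlei zhang:
Reply to Jian Zhou:
Hello!
Thank you very much for the quick reply!
When we started analyzing this problem, we also suspected frequent power loss. To verify this, we built a test fixture and ran repeated power-cycle experiments:
Experiment 1: boot to the command line -> cut power immediately -> wait 5 s; ran continuously for 3 days and 3 nights (roughly 3000 cycles)
Experiment 2: cut power before boot completes (about halfway through kernel boot) -> wait 5 s; ran overnight (roughly 700 cycles)
However, neither experiment reproduced the symptoms seen on the units returned from the field.
Since we cannot reproduce the problem, we have not been able to determine its root cause.
Questions:
① What are the possible causes? Is it a kernel problem or a UBIFS filesystem problem?
② Is it related to the product's application? For example, which application operations could lead to this problem?
③ Have other TI customers run into this kind of problem? How did they solve it?
Looking forward to your reply, thanks!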
Jian Zhou:
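Reply to jinlei zhang:
If this is an industrial environment, I suggest switching to YAFFS, which is more robust,
or making the root filesystem read-only.
jinlei zhang:
Reply to Jian Zhou:
Hello!
Thank you very much for the suggestions!
Our product is used in an industrial environment.
Is there any reference documentation or material for the two approaches you suggested?
① How do we build the more robust YAFFS filesystem? Does TI provide anything we can use directly? (CPU: AM3354, Linux kernel version: Linux version 3.2.0)
② How do we make the root filesystem read-only?
In addition:
This involves recalling and reworking products already in the field, which is a major undertaking. To be sure the fix is effective, we would like to know:
Have other TI customers run into similar problems? How did they solve them, and did the fix eliminate the problem completely?
Looking forward to your reply, thanks!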
Yaoming Qin:
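Reply to jinlei zhang:
The filesystem is pure software built on top of the MTD driver, so it is independent of the specific platform. See http://processors.wiki.ti.com/index.php/Create_a_YAFFS_Target_Image?keyMatch=yaffs&tisearch=Search-EN or other material available on Baidu.
Yaoming Qin:
Reply to jinlei zhang:
To make the filesystem read-only, change rw to ro in the kernel command-line arguments set in U-Boot.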
leo chen:
Reply to Yaoming Qin:
I wonder whether ramfs would work. That way the system would start from its initial state on every reboot.
young young:
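Hello!
Fellow engineer, has your problem been solved? Many people have run into this problem, and TI has never given an answer that solves it.
I suspect an ECC correction problem. I have run into the same problem as you. It occurs with the NAND flash part TC58NVG1S3HTA00
when the ECC is 8-bit. On the failing unit, check whether a bad block on the NAND happens to lie in the filesystem partition. If it does, then after many power cycles,
any read or write to that bad block produces the error messages you posted, and the system fails to boot. We have not solved this yet.
How about your situation?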
jinlei zhang:
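Reply to young young:
Hello!
Our problem has not been solved yet; we are still looking for a solution...
young young:
Reply to jinlei zhang:
Hello!
I hope we can investigate this problem together. I suspect the ECC correction is at fault. Are you using BCH8?
Whenever there is a bad block in the filesystem partition, the filesystem fails to come up.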