记一次cocos2d-x游戏android崩溃排查

by Galoisplusplus - 六 21 11月 2015
Tags #cocos2d-x #CS #tech #android #Android #breakpad #debug #crash #崩溃 #游戏崩溃 #cocos #cocos2d #游戏开发 #手游开发 #game #mobile game #game devolopment

最近查google breakpad回传的crash log,发现了不少cocos2d::FileUtilsAndroid::getData引起的崩溃。崩溃日志的关键信息如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
Operating system: Android
                  0.0.0 Linux 3.4.0-3215177 #1 SMP PREEMPT Thu Nov 6 17:48:34 KST 2014 armv7l
CPU: arm
     ARMv7 Qualcomm Krait features: swp,half,thumb,fastmult,vfpv2,edsp,neon,vfpv3,tls,vfpv4,idiva,idivt
     4 CPUs

Crash reason:  SIGSEGV
Crash address: 0x0

Thread 11 (crashed)
 0  libc.so + 0x22284
     r0 = 0x00000000    r1 = 0x7a634048    r2 = 0x00001000    r3 = 0x5a007304
     r4 = 0x40090d9c    r5 = 0x01632924    r6 = 0x01632924    r7 = 0x40091334
     r8 = 0x01632924    r9 = 0x00001000   r10 = 0x7a634280   r12 = 0x00000035
     fp = 0x00001000    sp = 0x78a78390    lr = 0x00000280    pc = 0x4005b284
    Found by: given as instruction pointer in context
 1  libc.so + 0x25aaf
     sp = 0x78a783a0    pc = 0x4005eab1
    Found by: stack scanning
 2  libcocos2dlua.so!cocos2d::FileUtilsAndroid::getData [CCFileUtils-android.cpp : 273 + 0xb]
     sp = 0x78a783c8    pc = 0x77fe2415
    Found by: stack scanning
 3  libcocos2dlua.so!cocos2d::FileUtilsAndroid::getDataFromFile [CCFileUtils-android.cpp : 307 + 0x5]
     r4 = 0x78a78450    r5 = 0x7d307558    r6 = 0x78a78450    r7 = 0x7e851e54
     sp = 0x78a78438    pc = 0x77fe2523
    Found by: call frame info
 4  libcocos2dlua.so!cocos2d::FontFreeType::createFontObject [CCFontFreeType.cpp : 127 + 0x7]
     r4 = 0x78a785a4    r5 = 0x7d307558    r6 = 0x78a78450    r7 = 0x7e851e54
     sp = 0x78a78440    pc = 0x782c6565
    Found by: call frame info
 5  libcocos2dlua.so!cocos2d::FontFreeType::create [CCFontFreeType.cpp : 58 + 0x9]
     r4 = 0x7d307558    r5 = 0x00000000    r6 = 0x00000000    r7 = 0x00000000
     sp = 0x78a78470    pc = 0x782c665b
    Found by: call frame info
 6  libcocos2dlua.so!cocos2d::FontAtlasCache::getFontAtlasTTF [CCFontAtlasCache.cpp : 90 + 0x13]
     r0 = 0x78a785a4    r1 = 0x00000014    r2 = 0x00000000    r4 = 0x7877905c
     r5 = 0x78a784b4    r6 = 0x78a785a4    r7 = 0x78a784b4    sp = 0x78a78490
     pc = 0x782c22bd
    Found by: call frame info
 7  libcocos2dlua.so!cocos2d::Label::setTTFConfig [CCLabel.cpp : 438 + 0x3]
     r4 = 0x82d231a8    r5 = 0x7ceddca0    r6 = 0x78a785a4    r7 = 0x00000000
     sp = 0x78a78578    pc = 0x78249423
    Found by: call frame info
 8  libcocos2dlua.so!cocos2d::ui::Text::setFontName [UIText.cpp : 162 + 0x9]
     r3 = 0x78249415    r4 = 0x78a785a4    r5 = 0x7ceddca0    r6 = 0x82d234c0
     r7 = 0x00000398    sp = 0x78a78590    pc = 0x78203b41
    Found by: call frame info
 9  libcocos2dlua.so!cocostudio::TextReader::setPropsFromJsonDictionary [TextReader.cpp : 119 + 0x7]
     r4 = 0x7a0b1ec8    r5 = 0x7ceddca0    r6 = 0x78a78614    r7 = 0x78a78620
     sp = 0x78a785e0    pc = 0x781b42f1
    Found by: call frame info

很明显,这是由访问非法内存地址0x0所引起的segmentation fault,下面再来看call stack中所提示的CCFileUtils-android.cpp中第273行:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
        do
        {
            // read rrom other path than user set it
            //CCLOG("GETTING FILE ABSOLUTE DATA: %s", filename);
            const char* mode = nullptr;
            if (forString)
                mode = "rt";
            else
                mode = "rb";

            FILE *fp = fopen(fullPath.c_str(), mode);
            CC_BREAK_IF(!fp);

            long fileSize;
            fseek(fp,0,SEEK_END);
            fileSize = ftell(fp);
            fseek(fp,0,SEEK_SET);
            if (forString)
            {
                data = (unsigned char*) malloc(fileSize + 1);
                data[fileSize] = '\0';
            }
            else
            {
                data = (unsigned char*) malloc(fileSize);
            }
            fileSize = fread(data,sizeof(unsigned char), fileSize,fp);
            fclose(fp);

            size = fileSize;
        } while (0);
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
    do
    {
        // read rrom other path than user set it
        //CCLOG("GETTING FILE ABSOLUTE DATA: %s", filename);
        const char* mode = nullptr;
        if (forString)
            mode = "rt";
        else
            mode = "rb";

        FILE *fp = fopen(fullPath.c_str(), mode);
        CC_BREAK_IF(!fp);

        long fileSize;
        fseek(fp,0,SEEK_END);
        fileSize = ftell(fp);
        fseek(fp,0,SEEK_SET);
        if (forString)
        {
            data = (unsigned char*) malloc(fileSize + 1);
            data[fileSize] = '\0';
        }
        else
        {
            data = (unsigned char*) malloc(fileSize);
        }
        fileSize = fread(data,sizeof(unsigned char), fileSize,fp);
        fclose(fp);

        size = fileSize;
    } while (0);

第273行是调用fread函数,由于fp已经通过CC_BREAK_IF(!fp);做了检查,那么很可能便是malloc的buffer为NULL所致。

为了进一步验证,我们再来看call stack顶层:

1
2
3
4
5
6
 0  libc.so + 0x22284
     r0 = 0x00000000    r1 = 0x7a634048    r2 = 0x00001000    r3 = 0x5a007304
     r4 = 0x40090d9c    r5 = 0x01632924    r6 = 0x01632924    r7 = 0x40091334
     r8 = 0x01632924    r9 = 0x00001000   r10 = 0x7a634280   r12 = 0x00000035
     fp = 0x00001000    sp = 0x78a78390    lr = 0x00000280    pc = 0x4005b284
    Found by: given as instruction pointer in context

基本可以确定是访问r0(0x0)了,那么r0的庐山真面目究竟如何呢?既然crash log说崩溃的机型所用的cpu是arm架构,那么我们便可以查阅arm的文档AAPCS(Procedure Call Standard for the ARM Architecture)。其中Parameter Passing这一节便提到:

1
2
The base standard provides for passing arguments in core registers (r0-r3) and on the stack. For subroutines that
take a small number of parameters, only registers are used, greatly reducing the overhead of a call.

r0刚好对应了第一个参数,这在我们的例子中就是freaddata。当然,在arm中,r0不仅可以是参数寄存器(argument register),还可能是结果寄存器(result register)和临时寄存器(scratch register)。但是,在这个fread所引起的崩溃中,基本可以排除result register的情况,因为fread的结果不是需要被访问的内存地址;scratch register倒还是有可能的,如果能找到设备系统的libc.so来objdump看下相关的汇编代码可以更确定些。

基本确认原因后,改起来便很容易了。目前我已向cocos2d-x提交了一个patchPR 14458

参考资料

[1]AAPCS

[2]ARM to C calling convention, registers to save

检查malloc的返回值是否为NULL的必要性不必我多说,可以参考以下的SOF链接:

[3]Under what circumstances can malloc return NULL?

[4]Is there a need to check for NULL after allocating memory, when kernel uses overcommit memory

关于fread函数:

[5]fread reference

Comments