Kernel Crash Case-Studies

[KernelCrash] Abort at 0xecb29f00(defective device)

Analysis 

The kernel log shows that the program counter points to an invalid address (0xecb29f00), as below.

[  257.879321 / 01-01 00:04:20.499][1] Unable to handle kernel paging request at virtual address ecb29f00

[  257.879343 / 01-01 00:04:20.499][1] pgd = c4ebc000

[  257.879355 / 01-01 00:04:20.499][0] [ecb29f00] *pgd=6ca1141e(bad)

[  257.879372 / 01-01 00:04:20.499][1] Internal error: Oops: 8000000d [#1] PREEMPT SMP ARM

[  257.879384 / 01-01 00:04:20.499][0] Modules linked in: texfat(PO)

[  257.879403 / 01-01 00:04:20.499][1] CPU: 1 PID: 384 Comm: ueventd Tainted: P        W  O   3.18.31-perf-gd069b48-00001-g8a6d6e5 #1

[  257.879416 / 01-01 00:04:20.499][1] task: eccc4d00 ti: c4eaa000 task.ti: c4eaa000

[  257.879429 / 01-01 00:04:20.499][1] PC is at 0xecb29f00

[  257.879447 / 01-01 00:04:20.499][1] LR is at security_context_to_sid_core+0x184/0x1b0

[  257.879462 / 01-01 00:04:20.499][1] pc : [<ecb29f00>]    lr : [<c033ab00>]    psr: 80030013

[  257.879462 / 01-01 00:04:20.499][1] sp : c4eabed8  ip : e14870c0  fp : b0d70e50

[  257.879479 / 01-01 00:04:20.499][1] r10: 00000000  r9 : c4eaa000  r8 : 00000027

[  257.879492 / 01-01 00:04:20.499][1] r7 : c4e2bcc0  r6 : ebef5cc0  r5 : c4e2e1c0  r4 : 00000000

[  257.879504 / 01-01 00:04:20.499][1] r3 : c193e700  r2 : 00000000  r1 : c193e700  r0 : 00000000

[  257.879517 / 01-01 00:04:20.499][1] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
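
The error code in the "Internal error: Oops: 8000000d" line can be split into its parts. The following is a minimal sketch, assuming this kernel uses the ARM short-descriptor (non-LPAE) page table format, which the log does not state explicitly. The names `decode_fsr` and `FAULT_STATUS` are hypothetical helpers for illustration, and the table lists only a subset of the fault status encodings.

```python
# Sketch: split the Oops error code into the Linux-specific flag and the
# ARM fault status. Assumes the short-descriptor (non-LPAE) format.
FSR_LNX_PF = 1 << 31  # flag ARM Linux ORs in for prefetch (instruction) aborts
FSR_FS4 = 1 << 10     # FS[4] bit of the short-descriptor fault status register

# Subset of short-descriptor fault status encodings (illustrative only).
FAULT_STATUS = {
    0b00101: "section translation fault",
    0b00111: "page translation fault",
    0b01101: "section permission fault",
    0b01111: "page permission fault",
}


def decode_fsr(fsr):
    """Return (is_prefetch_abort, status, description) for an Oops code."""
    is_prefetch_abort = bool(fsr & FSR_LNX_PF)
    status = (fsr & 0xF) | ((fsr & FSR_FS4) >> 6)
    return is_prefetch_abort, status, FAULT_STATUS.get(status, "other")


oops_code = 0x8000000D  # from "Internal error: Oops: 8000000d"
print(decode_fsr(oops_code))
```

Under that assumption, the code reads as a prefetch abort with a section permission fault, which is consistent with the CPU trying to fetch an instruction from 0xecb29f00.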


Since the link register (R14) holds 0xc033ab00 (security_context_to_sid_core+0x184), the ueventd process was running with the following call stack.

-000|security_context_to_sid_core(scontext = 0xDEAD, scontext_len = 0, sid = 0xB

-001|security_context_to_sid(?, ?, ?, ?)

-002|selinux_inode_setsecurity(inode = 0x0, ?, value = 0x0, size = 0, flags = 0)

-003|selinux_inode_notifysecctx(?, ?, ?)

-004|security_inode_notifysecctx(?, ?, ?)

-005|kernfs_type(inline)

-005|kernfs_refresh_inode(kn = 0xFFFFFFFF, inode = 0x80030013)

-006|kernfs_iop_getattr(?, ?, stat = 0xC4EABF50)

-007|vfs_getattr_nosec(?, ?)

-008|vfs_fstatat(dfd = 0, filename = 0xC4E2BCC0, stat = 0xC4EABF50, ?)

-009|SYSC_fstatat64(inline)

-009|sys_fstatat64(?, ?, statbuf = -1328228292, ?)

-010|ret_fast_syscall(asm)


Once the push at 0xc033a97c has executed (that is, when the program counter is 0xc033a980), the stack pointer should be 0xC4EABE94.

After the instruction at 0xc033a984 executes, the stack pointer moves down to 0xC4EABE58 (0xC4EABE94 - 0x3c).

0xc033a97c <security_context_to_sid_core>:      push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}

0xc033a980 <security_context_to_sid_core+4>:    subs    r5, r1, #0

0xc033a984 <security_context_to_sid_core+8>:    sub     sp, sp, #60     ; 0x3c
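
The stack pointer arithmetic for this prologue can be checked with a few lines. This is only a sketch: the register list and the 0x3c adjustment come from the disassembly above, the value 0xC4EABE94 comes from the text, and the variable names are made up for illustration.

```python
# Sketch: stack pointer values around the prologue of
# security_context_to_sid_core, using the numbers quoted in the text.
WORD = 4  # bytes per register on 32-bit ARM

# Registers stored by "push {r4-r11, lr}"; lowest register at lowest address.
PUSHED = ["r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "lr"]

SP_AFTER_PUSH = 0xC4EABE94  # stack pointer once the push has executed
LOCALS = 0x3C               # "sub sp, sp, #60"

# Stack pointer on function entry, before the push.
sp_at_entry = SP_AFTER_PUSH + WORD * len(PUSHED)

# Stack pointer after the local frame is reserved.
sp_after_sub = SP_AFTER_PUSH - LOCALS

# Address of each saved register inside the frame.
slots = {reg: SP_AFTER_PUSH + WORD * i for i, reg in enumerate(PUSHED)}

print(f"sp at entry   : {sp_at_entry:#010x}")
print(f"sp after sub  : {sp_after_sub:#010x}")
print(f"saved lr slot : {slots['lr']:#010x}")
```

With these inputs, the saved LR of this function sits at 0xC4EABEB4 and the stack pointer on entry is 0xC4EABEB8.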


I suspect that a stack pop was not actually executed as the ARM calling convention requires inside one of the functions called below.

0xc033a97c <security_context_to_sid_core>:      push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}


0xc033a9c0 <security_context_to_sid_core+68>:   bl      0xc038542c <strcmp>

0xc033a9fc <security_context_to_sid_core+128>:  bl      0xc020b974 <__kmalloc>

0xc033aa10 <security_context_to_sid_core+148>:  bl      0xc037cb00 <memcpy>

0xc033aa28 <security_context_to_sid_core+172>:  bl      0xc01e806c <kstrdup>

0xc033aa44 <security_context_to_sid_core+200>:  bl      0xc0ee8d48 <_raw_read_lock>

0xc033aa5c <security_context_to_sid_core+224>:  bl      0xc033a060 <string_to_context_struct>

0xc033aaa0 <security_context_to_sid_core+292>:  bl      0xc03340a0 <sidtab_context_to_sid>

0xc033aac8 <security_context_to_sid_core+332>:  bl      0xc03333e8 <ebitmap_destroy>

0xc033aadc <security_context_to_sid_core+352>:  bl      0xc037d260 <__memzero>

0xc033aaf4 <security_context_to_sid_core+376>:  bl      0xc020b2f8 <kfree>


(For example)

-000|string_to_context_struct(pol = 0xDEAD, sidtabp = 0x0, scontext = 0xBEEF, sc

-001|security_context_to_sid_core(?, scontext_len = 0, sid = 0x0, ?, gfp_flags =

-002|security_context_to_sid(?, ?, ?, ?)

-003|selinux_inode_setsecurity(inode = 0x0, ?, value = 0x0, size = 0, flags = 0)

-004|selinux_inode_notifysecctx(?, ?, ?)

-005|security_inode_notifysecctx(?, ?, ?)

-006|kernfs_type(inline)

-006|kernfs_refresh_inode(kn = 0xFFFFFFFF, inode = 0x80030013)

-007|kernfs_iop_getattr(?, ?, stat = 0xC4EABF50)

-008|vfs_getattr_nosec(?, ?)

-009|vfs_fstatat(dfd = 0, filename = 0xC4E2BCC0, stat = 0xC4EABF50, ?)

-010|SYSC_fstatat64(inline)

-010|sys_fstatat64(?, ?, statbuf = -1328228292, ?)

-011|ret_fast_syscall(asm)


After any of the above functions returns, the stack pointer should be back at 0xC4EABE58, not at 0xc4eabed8 as the register dump shows.

[  257.879462 / 01-01 00:04:20.499][1] pc : [<ecb29f00>]    lr : [<c033ab00>]    psr: 80030013

[  257.879462 / 01-01 00:04:20.499][1] sp : c4eabed8  ip : e14870c0  fp : b0d70e50

[  257.879479 / 01-01 00:04:20.499][1] r10: 00000000  r9 : c4eaa000  r8 : 00000027
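
To put a number on the mismatch, the observed stack pointer can be compared with the expected one. This is a small sketch using only addresses quoted in this analysis; `sp_at_entry` is derived here as 0xC4EABE94 plus the nine pushed registers, and the variable names are illustrative.

```python
# Sketch: how far the stack pointer at the time of the crash is from the
# value expected inside security_context_to_sid_core.
sp_observed = 0xC4EABED8        # "sp : c4eabed8" from the register dump
sp_expected = 0xC4EABE58        # after push {r4-r11, lr} and sub sp, sp, #0x3c
sp_at_entry = 0xC4EABE94 + 9 * 4  # stack pointer before the push (derived)

gap_to_expected = sp_observed - sp_expected
gap_to_entry = sp_observed - sp_at_entry

print(f"observed - expected : {gap_to_expected:#x} bytes")
print(f"observed - entry    : {gap_to_entry:#x} bytes")
```

The observed value is 0x80 bytes above the expected one, and 0x20 bytes above the stack pointer at function entry, so it lies outside the frame of security_context_to_sid_core altogether.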


The stack dump from this ramdump (first listing) can be compared with the stack in case of normal operation (second listing, which restarts at NSD:C4EABE90).

NSD:C4EABE90| 00 00 00 00  0x0

NSD:C4EABE94| 00 E7 93 C1  0xC193E700         \\vmlinux\Global\__tracepoint_kfree

NSD:C4EABE98| 00 00 00 00  0x0

NSD:C4EABE9C| 00 E7 93 C1  0xC193E700         \\vmlinux\Global\__tracepoint_kfree

NSD:C4EABEA0| 00 00 00 00  0x0

NSD:C4EABEA4| C0 E1 E2 C4  0xC4E2E1C0

NSD:C4EABEA8| C0 5C EF EB  0xEBEF5CC0

NSD:C4EABEAC| C0 BC E2 C4  0xC4E2BCC0

NSD:C4EABEB0| 27 00 00 00  0x27               \\vmlinux\Global\cpu_v7_suspend_size+0x3

NSD:C4EABEB4| 00 A0 EA C4  0xC4EAA000 //<<-R14 should have been \\security_context_to_sid+0x14 

NSD:C4EABEB8| 00 00 00 00  0x0  // <<-- new SP

NSD:C4EABEBC| 50 0E D7 B0  0xB0D70E50

NSD:C4EABEC0| C0 70 48 E1  0xE14870C0

NSD:C4EABEC4| D8 BE EA C4  0xC4EABED8 //<<-- should have been R14  \\selinux_inode_setsecurity+0x48

NSD:C4EABEC8| 00 AB 33 C0  0xC033AB00         \\vmlinux\services\security_context_to_sid_core+0x184

NSD:C4EABECC| 00 9F B2 EC  0xECB29F00

NSD:C4EABED0| 13 00 03 80  0x80030013

NSD:C4EABED4| FF FF FF FF  0xFFFFFFFF

NSD:C4EABED8| 50 BF EA C4  0xC4EABF50  //<<-- SP address at the moment of kernel panic

NSD:C4EABEDC| 00 9F B2 EC  0xECB29F00

NSD:C4EABEE0| 50 BF EA C4  0xC4EABF50

NSD:C4EABEE4| 9C A3 32 C0  0xC032A39C

NSD:C4EABEE8| 00 00 00 00  0x0

NSD:C4EABEEC| C0 BC E2 C4  0xC4E2BCC0

NSD:C4EABE90| 00 00 00 00  0x0

NSD:C4EABE94| CC CC 00 00  0xCCCC //<-- R4 where new SP

NSD:C4EABE98| 00 00 00 00  0x0

NSD:C4EABE9C| 00 00 00 00  0x0

NSD:C4EABEA0| 00 00 00 00  0x0

NSD:C4EABEA4| 00 00 00 00  0x0

NSD:C4EABEA8| 00 00 00 00  0x0

NSD:C4EABEAC| 00 00 00 00  0x0

NSD:C4EABEB0| 00 00 00 00  0x0  //<-- R11

NSD:C4EABEB4| E4 C5 33 C0  0xC033C5E4     //<<-R14 \\security_context_to_sid+0x14

NSD:C4EABEB8| AD DE 00 00  0xDEAD  //<<--R0 where new SP

NSD:C4EABEBC| 00 00 00 00  0x0        //<<--R1

NSD:C4EABEC0| EF BE 00 00  0xBEEF    //<<--R2

NSD:C4EABEC4| D8 A2 32 C0  0xC032A2D8       //<<--R14  \\selinux_inode_setsecurity+0x48

NSD:C4EABEC8| 00 AB 33 C0  0xC033AB00   //<-- R0, where new SP      \\security_context_to_sid_core+0x184

NSD:C4EABECC| 00 9F B2 EC  0xECB29F00    //<-- R1

NSD:C4EABED0| 13 00 03 80  0x80030013    //<-- R4

NSD:C4EABED4| FF FF FF FF  0xFFFFFFFF      //<-- R5

NSD:C4EABED8| 50 BF EA C4  0xC4EABF50  //<-- R6

NSD:C4EABEDC| 00 9F B2 EC  0xECB29F00  //<-- R7

NSD:C4EABEE0| 50 BF EA C4  0xC4EABF50  //<-- R8

NSD:C4EABEE4| 9C A3 32 C0  0xC032A39C         \\vmlinux\hooks\selinux_inode_notifysecctx+0x20 //<-- R14

NSD:C4EABEE8| 00 00 00 00  0x0   //<<-- SP

NSD:C4EABEEC| C0 BC E2 C4  0xC4E2BCC0
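
The two listings can also be compared word by word. The sketch below simply transcribes the 24 words of each listing (0xC4EABE90 through 0xC4EABEEC) and prints the addresses where they differ; `diff_words` is a hypothetical helper, not part of any debugger.

```python
# Sketch: word-by-word comparison of the two stack listings above.
BASE = 0xC4EABE90  # address of the first word in both listings

# Words from this ramdump, lowest address first.
ramdump = [
    0x00000000, 0xC193E700, 0x00000000, 0xC193E700,
    0x00000000, 0xC4E2E1C0, 0xEBEF5CC0, 0xC4E2BCC0,
    0x00000027, 0xC4EAA000, 0x00000000, 0xB0D70E50,
    0xE14870C0, 0xC4EABED8, 0xC033AB00, 0xECB29F00,
    0x80030013, 0xFFFFFFFF, 0xC4EABF50, 0xECB29F00,
    0xC4EABF50, 0xC032A39C, 0x00000000, 0xC4E2BCC0,
]

# Words from the normal-operation listing, lowest address first.
normal = [
    0x00000000, 0x0000CCCC, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0xC033C5E4, 0x0000DEAD, 0x00000000,
    0x0000BEEF, 0xC032A2D8, 0xC033AB00, 0xECB29F00,
    0x80030013, 0xFFFFFFFF, 0xC4EABF50, 0xECB29F00,
    0xC4EABF50, 0xC032A39C, 0x00000000, 0xC4E2BCC0,
]


def diff_words(a, b, base=BASE):
    """Return (address, word_a, word_b) for every position that differs."""
    return [(base + 4 * i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = diff_words(ramdump, normal)
for addr, got, ref in diffs:
    print(f"{addr:08X}: ramdump={got:08X} normal={ref:08X}")
```

With the values transcribed here, every differing word lies at or below 0xC4EABEC4; from 0xC4EABEC8 upward the two listings match.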

One weird signature is that the PC should have been 0xFFFF_FFFF instead of 0xecb29f00.


Another scenario I can think of is that the stack was scribbled on inside one of the functions below (e.g. an out-of-bounds memcpy).

0xc033a97c <security_context_to_sid_core>:      push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}


0xc033a9c0 <security_context_to_sid_core+68>:   bl      0xc038542c <strcmp>

0xc033a9fc <security_context_to_sid_core+128>:  bl      0xc020b974 <__kmalloc>

0xc033aa10 <security_context_to_sid_core+148>:  bl      0xc037cb00 <memcpy>

0xc033aa28 <security_context_to_sid_core+172>:  bl      0xc01e806c <kstrdup>

0xc033aa44 <security_context_to_sid_core+200>:  bl      0xc0ee8d48 <_raw_read_lock>

0xc033aa5c <security_context_to_sid_core+224>:  bl      0xc033a060 <string_to_context_struct>

0xc033aaa0 <security_context_to_sid_core+292>:  bl      0xc03340a0 <sidtab_context_to_sid>

0xc033aac8 <security_context_to_sid_core+332>:  bl      0xc03333e8 <ebitmap_destroy>

0xc033aadc <security_context_to_sid_core+352>:  bl      0xc037d260 <__memzero>

0xc033aaf4 <security_context_to_sid_core+376>:  bl      0xc020b2f8 <kfree>


Since this specific device kept crashing randomly, we swapped out its chipset. After that, the crash no longer occurred.