Analysis
The kernel log shows that the program counter points to an invalid address (0xecb29f00), as shown below.
[ 257.879321 / 01-01 00:04:20.499][1] Unable to handle kernel paging request at virtual address ecb29f00
[ 257.879343 / 01-01 00:04:20.499][1] pgd = c4ebc000
[ 257.879355 / 01-01 00:04:20.499][0] [ecb29f00] *pgd=6ca1141e(bad)
[ 257.879372 / 01-01 00:04:20.499][1] Internal error: Oops: 8000000d [#1] PREEMPT SMP ARM
[ 257.879384 / 01-01 00:04:20.499][0] Modules linked in: texfat(PO)
[ 257.879403 / 01-01 00:04:20.499][1] CPU: 1 PID: 384 Comm: ueventd Tainted: P W O 3.18.31-perf-gd069b48-00001-g8a6d6e5 #1
[ 257.879416 / 01-01 00:04:20.499][1] task: eccc4d00 ti: c4eaa000 task.ti: c4eaa000
[ 257.879429 / 01-01 00:04:20.499][1] PC is at 0xecb29f00
[ 257.879447 / 01-01 00:04:20.499][1] LR is at security_context_to_sid_core+0x184/0x1b0
[ 257.879462 / 01-01 00:04:20.499][1] pc : [<ecb29f00>] lr : [<c033ab00>] psr: 80030013
[ 257.879462 / 01-01 00:04:20.499][1] sp : c4eabed8 ip : e14870c0 fp : b0d70e50
[ 257.879479 / 01-01 00:04:20.499][1] r10: 00000000 r9 : c4eaa000 r8 : 00000027
[ 257.879492 / 01-01 00:04:20.499][1] r7 : c4e2bcc0 r6 : ebef5cc0 r5 : c4e2e1c0 r4 : 00000000
[ 257.879504 / 01-01 00:04:20.499][1] r3 : c193e700 r2 : 00000000 r1 : c193e700 r0 : 00000000
[ 257.879517 / 01-01 00:04:20.499][1] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Since the link register (R14) holds 0xc033ab00 (security_context_to_sid_core+0x184), the ueventd process was running with the following call stack.
-000|security_context_to_sid_core(scontext = 0xDEAD, scontext_len = 0, sid = 0xB
-001|security_context_to_sid(?, ?, ?, ?)
-002|selinux_inode_setsecurity(inode = 0x0, ?, value = 0x0, size = 0, flags = 0)
-003|selinux_inode_notifysecctx(?, ?, ?)
-004|security_inode_notifysecctx(?, ?, ?)
-005|kernfs_type(inline)
-005|kernfs_refresh_inode(kn = 0xFFFFFFFF, inode = 0x80030013)
-006|kernfs_iop_getattr(?, ?, stat = 0xC4EABF50)
-007|vfs_getattr_nosec(?, ?)
-008|vfs_fstatat(dfd = 0, filename = 0xC4E2BCC0, stat = 0xC4EABF50, ?)
-009|SYSC_fstatat64(inline)
-009|sys_fstatat64(?, ?, statbuf = -1328228292, ?)
-010|ret_fast_syscall(asm)
When the program counter is 0xc033a980 (that is, right after the push at 0xc033a97c has executed), the stack pointer should hold 0xC4EABE94.
Once the sub instruction at 0xc033a984 has executed, the stack pointer is updated to 0xC4EABE58 (0xC4EABE94 - 0x3c).
0xc033a97c <security_context_to_sid_core>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0xc033a980 <security_context_to_sid_core+4>: subs r5, r1, #0
0xc033a984 <security_context_to_sid_core+8>: sub sp, sp, #60 ; 0x3c
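The stack-pointer arithmetic above can be checked with a short sketch. It assumes a full-descending ARM stack with 4-byte registers. The entry SP of 0xC4EABEB8 is not printed in the log; it is inferred here from the 0xC4EABE94 value plus the nine pushed registers, so treat it as an assumption.

```python
# Sketch of the prologue's effect on SP.
# Assumptions: full-descending stack, 4-byte registers, and an inferred
# entry SP (ENTRY_SP is not read from the log).
WORD = 4
PUSHED_REGS = ["r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "lr"]
LOCALS_SIZE = 0x3C  # from "sub sp, sp, #60"

ENTRY_SP = 0xC4EABEB8  # assumed SP on entry to security_context_to_sid_core

# push {r4-r11, lr} lowers SP by one word per register
sp_after_push = ENTRY_SP - WORD * len(PUSHED_REGS)
# sub sp, sp, #60 reserves space for locals
sp_after_sub = sp_after_push - LOCALS_SIZE

print(f"after push: {sp_after_push:#010x}")
print(f"after sub : {sp_after_sub:#010x}")
```

Under these assumptions the two printed values match the 0xC4EABE94 and 0xC4EABE58 addresses quoted in the text.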
I suspect that a stack pop operation required by the ARM calling convention was not actually executed inside one of the functions below.
0xc033a97c <security_context_to_sid_core>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0xc033a9c0 <security_context_to_sid_core+68>: bl 0xc038542c <strcmp>
0xc033a9fc <security_context_to_sid_core+128>: bl 0xc020b974 <__kmalloc>
0xc033aa10 <security_context_to_sid_core+148>: bl 0xc037cb00 <memcpy>
0xc033aa28 <security_context_to_sid_core+172>: bl 0xc01e806c <kstrdup>
0xc033aa44 <security_context_to_sid_core+200>: bl 0xc0ee8d48 <_raw_read_lock>
0xc033aa5c <security_context_to_sid_core+224>: bl 0xc033a060 <string_to_context_struct>
0xc033aaa0 <security_context_to_sid_core+292>: bl 0xc03340a0 <sidtab_context_to_sid>
0xc033aac8 <security_context_to_sid_core+332>: bl 0xc03333e8 <ebitmap_destroy>
0xc033aadc <security_context_to_sid_core+352>: bl 0xc037d260 <__memzero>
0xc033aaf4 <security_context_to_sid_core+376>: bl 0xc020b2f8 <kfree>
(For example)
-000|string_to_context_struct(pol = 0xDEAD, sidtabp = 0x0, scontext = 0xBEEF, sc
-001|security_context_to_sid_core(?, scontext_len = 0, sid = 0x0, ?, gfp_flags =
-002|security_context_to_sid(?, ?, ?, ?)
-003|selinux_inode_setsecurity(inode = 0x0, ?, value = 0x0, size = 0, flags = 0)
-004|selinux_inode_notifysecctx(?, ?, ?)
-005|security_inode_notifysecctx(?, ?, ?)
-006|kernfs_type(inline)
-006|kernfs_refresh_inode(kn = 0xFFFFFFFF, inode = 0x80030013)
-007|kernfs_iop_getattr(?, ?, stat = 0xC4EABF50)
-008|vfs_getattr_nosec(?, ?)
-009|vfs_fstatat(dfd = 0, filename = 0xC4E2BCC0, stat = 0xC4EABF50, ?)
-010|SYSC_fstatat64(inline)
-010|sys_fstatat64(?, ?, statbuf = -1328228292, ?)
-011|ret_fast_syscall(asm)
After the above functions have returned, the stack pointer should have been 0xC4EABE58 instead of 0xc4eabed8.
[ 257.879462 / 01-01 00:04:20.499][1] pc : [<ecb29f00>] lr : [<c033ab00>] psr: 80030013
[ 257.879462 / 01-01 00:04:20.499][1] sp : c4eabed8 ip : e14870c0 fp : b0d70e50
[ 257.879479 / 01-01 00:04:20.499][1] r10: 00000000 r9 : c4eaa000 r8 : 00000027
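To put a number on the mismatch, the sketch below subtracts the expected stack pointer from the one printed in the oops log. Both constants come from the text above; the variable names are only illustrative.

```python
# How far the logged SP is from the value expected inside the function body.
EXPECTED_SP = 0xC4EABE58  # SP after the prologue, per the analysis
ACTUAL_SP = 0xC4EABED8    # "sp" value printed in the oops log

delta = ACTUAL_SP - EXPECTED_SP
print(f"SP is {delta:#x} bytes ({delta // 4} words) above the expected value")
```

The logged SP sits 0x80 bytes (32 words) above the expected value.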
The stack dump from this ramdump can be compared with the stack during normal operation.
This ramdump:
NSD:C4EABE90| 00 00 00 00 0x0
NSD:C4EABE94| 00 E7 93 C1 0xC193E700 \\vmlinux\Global\__tracepoint_kfree
NSD:C4EABE98| 00 00 00 00 0x0
NSD:C4EABE9C| 00 E7 93 C1 0xC193E700 \\vmlinux\Global\__tracepoint_kfree
NSD:C4EABEA0| 00 00 00 00 0x0
NSD:C4EABEA4| C0 E1 E2 C4 0xC4E2E1C0
NSD:C4EABEA8| C0 5C EF EB 0xEBEF5CC0
NSD:C4EABEAC| C0 BC E2 C4 0xC4E2BCC0
NSD:C4EABEB0| 27 00 00 00 0x27 \\vmlinux\Global\cpu_v7_suspend_size+0x3
NSD:C4EABEB4| 00 A0 EA C4 0xC4EAA000 //<<-R14 should have been \\security_context_to_sid+0x14
NSD:C4EABEB8| 00 00 00 00 0x0 // <<-- new SP
NSD:C4EABEBC| 50 0E D7 B0 0xB0D70E50
NSD:C4EABEC0| C0 70 48 E1 0xE14870C0
NSD:C4EABEC4| D8 BE EA C4 0xC4EABED8 //<<-- should have been R14 \\selinux_inode_setsecurity+0x48
NSD:C4EABEC8| 00 AB 33 C0 0xC033AB00 \\vmlinux\services\security_context_to_sid_core+0x184
NSD:C4EABECC| 00 9F B2 EC 0xECB29F00
NSD:C4EABED0| 13 00 03 80 0x80030013
NSD:C4EABED4| FF FF FF FF 0xFFFFFFFF
NSD:C4EABED8| 50 BF EA C4 0xC4EABF50 //<<-- SP address at the moment of kernel panic
NSD:C4EABEDC| 00 9F B2 EC 0xECB29F00
NSD:C4EABEE0| 50 BF EA C4 0xC4EABF50
NSD:C4EABEE4| 9C A3 32 C0 0xC032A39C
NSD:C4EABEE8| 00 00 00 00 0x0
NSD:C4EABEEC| C0 BC E2 C4 0xC4E2BCC0

The stack in case of normal operation:
NSD:C4EABE90| 00 00 00 00 0x0
NSD:C4EABE94| CC CC 00 00 0xCCCC //<-- R4 where new SP
NSD:C4EABE98| 00 00 00 00 0x0
NSD:C4EABE9C| 00 00 00 00 0x0
NSD:C4EABEA0| 00 00 00 00 0x0
NSD:C4EABEA4| 00 00 00 00 0x0
NSD:C4EABEA8| 00 00 00 00 0x0
NSD:C4EABEAC| 00 00 00 00 0x0
NSD:C4EABEB0| 00 00 00 00 0x0 //<-- R11
NSD:C4EABEB4| E4 C5 33 C0 0xC033C5E4 //<<-R14 \\security_context_to_sid+0x14
NSD:C4EABEB8| AD DE 00 00 0xDEAD //<<--R0 where new SP
NSD:C4EABEBC| 00 00 00 00 0x0 //<<--R1
NSD:C4EABEC0| EF BE 00 00 0xBEEF //<<--R2
NSD:C4EABEC4| D8 A2 32 C0 0xC032A2D8 //<<--R14 \\selinux_inode_setsecurity+0x48
NSD:C4EABEC8| 00 AB 33 C0 0xC033AB00 //<-- R0, where new SP \\security_context_to_sid_core+0x184
NSD:C4EABECC| 00 9F B2 EC 0xECB29F00 //<-- R1
NSD:C4EABED0| 13 00 03 80 0x80030013 //<-- R4
NSD:C4EABED4| FF FF FF FF 0xFFFFFFFF //<-- R5
NSD:C4EABED8| 50 BF EA C4 0xC4EABF50 //<-- R6
NSD:C4EABEDC| 00 9F B2 EC 0xECB29F00 //<-- R7
NSD:C4EABEE0| 50 BF EA C4 0xC4EABF50 //<-- R8
NSD:C4EABEE4| 9C A3 32 C0 0xC032A39C \\vmlinux\hooks\selinux_inode_notifysecctx+0x20 //<-- R14
NSD:C4EABEE8| 00 00 00 00 0x0 //<<-- SP
NSD:C4EABEEC| C0 BC E2 C4 0xC4E2BCC0
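The comparison between the two dumps can also be done mechanically. The following is a minimal sketch that lists the addresses at which the two stacks differ; the word values are copied from the dumps above, and the helper name `diff_words` is hypothetical.

```python
# Word-by-word diff of the two stack dumps, 0xC4EABE90 .. 0xC4EABEEC.
BASE = 0xC4EABE90

crash = [  # this ramdump
    0x00000000, 0xC193E700, 0x00000000, 0xC193E700,
    0x00000000, 0xC4E2E1C0, 0xEBEF5CC0, 0xC4E2BCC0,
    0x00000027, 0xC4EAA000, 0x00000000, 0xB0D70E50,
    0xE14870C0, 0xC4EABED8, 0xC033AB00, 0xECB29F00,
    0x80030013, 0xFFFFFFFF, 0xC4EABF50, 0xECB29F00,
    0xC4EABF50, 0xC032A39C, 0x00000000, 0xC4E2BCC0,
]

normal = [  # the stack in case of normal operation
    0x00000000, 0x0000CCCC, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0xC033C5E4, 0x0000DEAD, 0x00000000,
    0x0000BEEF, 0xC032A2D8, 0xC033AB00, 0xECB29F00,
    0x80030013, 0xFFFFFFFF, 0xC4EABF50, 0xECB29F00,
    0xC4EABF50, 0xC032A39C, 0x00000000, 0xC4E2BCC0,
]


def diff_words(base, a, b, word=4):
    """Return (address, a_value, b_value) for every word that differs."""
    return [
        (base + i * word, x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


diffs = diff_words(BASE, crash, normal)
for addr, c, n in diffs:
    print(f"{addr:#010x}: ramdump={c:#010x} normal={n:#010x}")
```

With these inputs, every differing word lies between 0xC4EABE94 and 0xC4EABEC4; from 0xC4EABEC8 upward the two dumps are identical.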
One weird signature is that the PC should have been 0xFFFF_FFFF instead of 0xecb29f00.
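One possible reading of this remark, sketched under assumptions: if the function epilogue were a `pop {r4-r11, pc}` (the epilogue is not shown in the disassembly excerpt, so this is an assumption) and that pop had left SP at the value seen at the panic, the PC would have been loaded from the word just below that SP. The helper name `simulate_pop` is hypothetical; the stack words are taken from the ramdump.

```python
# Simulate a 9-register pop that ends with SP == SP_AT_PANIC.
# Assumption: the epilogue is "pop {r4-r11, pc}" (not shown in the source).
WORD = 4
SP_AT_PANIC = 0xC4EABED8
POPPED = ["r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "pc"]

# Words from the ramdump, 0xC4EABEB4 .. 0xC4EABED4
stack = {
    0xC4EABEB4: 0xC4EAA000,
    0xC4EABEB8: 0x00000000,
    0xC4EABEBC: 0xB0D70E50,
    0xC4EABEC0: 0xE14870C0,
    0xC4EABEC4: 0xC4EABED8,
    0xC4EABEC8: 0xC033AB00,
    0xC4EABECC: 0xECB29F00,
    0xC4EABED0: 0x80030013,
    0xC4EABED4: 0xFFFFFFFF,
}


def simulate_pop(stack, sp_after, regs):
    """Load regs from ascending addresses so that SP ends at sp_after."""
    sp = sp_after - WORD * len(regs)
    return {reg: stack[sp + i * WORD] for i, reg in enumerate(regs)}


regs = simulate_pop(stack, SP_AT_PANIC, POPPED)
print(f"pc would be {regs['pc']:#010x}")
```

Under this assumption the PC would come from 0xC4EABED4, which holds 0xFFFFFFFF, not 0xecb29f00.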
Another scenario I can think of is that the stack was scribbled on inside one of the functions below (e.g., an out-of-bounds memcpy).
0xc033a97c <security_context_to_sid_core>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0xc033a9c0 <security_context_to_sid_core+68>: bl 0xc038542c <strcmp>
0xc033a9fc <security_context_to_sid_core+128>: bl 0xc020b974 <__kmalloc>
0xc033aa10 <security_context_to_sid_core+148>: bl 0xc037cb00 <memcpy>
0xc033aa28 <security_context_to_sid_core+172>: bl 0xc01e806c <kstrdup>
0xc033aa44 <security_context_to_sid_core+200>: bl 0xc0ee8d48 <_raw_read_lock>
0xc033aa5c <security_context_to_sid_core+224>: bl 0xc033a060 <string_to_context_struct>
0xc033aaa0 <security_context_to_sid_core+292>: bl 0xc03340a0 <sidtab_context_to_sid>
0xc033aac8 <security_context_to_sid_core+332>: bl 0xc03333e8 <ebitmap_destroy>
0xc033aadc <security_context_to_sid_core+352>: bl 0xc037d260 <__memzero>
0xc033aaf4 <security_context_to_sid_core+376>: bl 0xc020b2f8 <kfree>
Since this specific target device kept crashing randomly, we swapped the chipset on that device. After that, the crash disappeared.