A kernel panic occurs on a defective device with the following call trace.
crash> bt -I C01002D8 -S E7AABC08 0xE1804200
PID: 2285   TASK: e1804200   CPU: 5   COMMAND: "python"
bt: WARNING: stack address:0xe7aabd80, program counter:0xc0ee5b60
 #0 [<c01002d8>] (do_DataAbort) from [<c010ad58>]
    pc : [<c01d7308>]   lr : [<c01d72ec>]   psr: 60020193
    sp : e7aabcf8  ip : c193e69c  fp : edf34bf4
    r10: 00000000  r9 : 0000001f  r8 : 00000002
    r7 : c1938280  r6 : c1938200  r5 : 00000010  r4 : ef4bddb4
    r3 : ef4bddb4  r2 : 00000100  r1 : 00000000  r0 : ef4bdda0
    Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
 #1 [<c010ad58>] (__dabt_svc) from [<c01d72ec>]
 #2 [<c01d7308>] (rmqueue_bulk.constprop.11) from [<c01d7540>]  //<<-- kernel panic
 #3 [<c01d7540>] (get_page_from_freelist) from [<c01d79c4>]
 #4 [<c01d79c4>] (__alloc_pages_nodemask) from [<c01f7bf4>]
 #5 [<c01f7bf4>] (handle_mm_fault) from [<c011525c>]
 #6 [<c011525c>] (do_page_fault) from [<c01002d8>]
 #7 [<c01002d8>] (do_DataAbort) from [<c010b03c>]
The data abort is raised because page.lru->next (R2) holds the invalid address 0x100.
0xc01d72f4 <rmqueue_bulk.constprop.11+0x58>: cmp   r10, #0
0xc01d72f8 <rmqueue_bulk.constprop.11+0x5c>: add   r3, r0, #20
0xc01d72fc <rmqueue_bulk.constprop.11+0x60>: ldreq r2, [r4]
0xc01d7300 <rmqueue_bulk.constprop.11+0x64>: ldrne r2, [r4, #4]
0xc01d7304 <rmqueue_bulk.constprop.11+0x68>: strne r3, [r4, #4]
0xc01d7308 <rmqueue_bulk.constprop.11+0x6c>: streq r3, [r2, #4]  //<<-- data abort
crash> struct page.lru 0xEF4BDDA0 -px
  lru = {
    next = 0x100,  //<<--
    prev = 0x200
  }
After a code review, I figured out that this page belongs to the pcp (per-CPU page frame cache sitting on top of the buddy system, which holds order-0 pages).
static int rmqueue_bulk(struct zone *zone, unsigned int order,
			unsigned long count, struct list_head *list,
			int migratetype, bool cold)
{
	int i;

	spin_lock(&zone->lock);
	for (i = 0; i < count; ++i) {
		struct page *page;
		//snip
		if (likely(!cold))
			list_add(&page->lru, list);  //<<--
		else
			list_add_tail(&page->lru, list);
To find the pcp address for CPU5, the following commands are used.
crash> p contig_page_data.node_zones[1].pageset
$5 = (struct per_cpu_pageset *) 0xc177ebdc

crash> struct per_cpu_pages EDF34BDC
struct per_cpu_pages {
  count = 0x1,
  high = 0xba,
  batch = 0x1f,
  lists = {{
      next = 0xef51fc74,  //<<-- MIGRATE_UNMOVABLE
      prev = 0xef51fc74
    }, {
      next = 0xedf34bf0,  //<<-- MIGRATE_RECLAIMABLE
      prev = 0xedf34bf0
    }, {
      next = 0xef4bdcd4,  //<<-- MIGRATE_MOVABLE
      prev = 0xef4bddf4
    }, {
      next = 0xedf34c00,  //<<-- MIGRATE_PCPTYPES
      prev = 0xedf34c00
    }}
}

(where) 0xEDF34BDC = 0xc177ebdc + 0x2c7b6000

crash> p __per_cpu_offset[5]
$7 = 0x2c7b6000
By the way, the MIGRATE_MOVABLE list starting at 0xef4bdcd4 turns out to be corrupted, as follows.
crash> list 0x0 0xef4bdcd4
ef4bdcd4
ef4bdcf4
ef4bdd14
ef4bdd34
ef4bdd54
ef4bdd74
ef4bddb4
100
(where)
 #0 [<c01002d8>] (do_DataAbort) from [<c010ad58>]
    pc : [<c01d7308>]   lr : [<c01d72ec>]   psr: 60020193
    sp : e7aabcf8  ip : c193e69c  fp : edf34bf4
    r10: 00000000  r9 : 0000001f  r8 : 00000002
    r7 : c1938280  r6 : c1938200  r5 : 00000010  r4 : ef4bddb4
    r3 : ef4bddb4  r2 : 00000100  r1 : 00000000  r0 : ef4bdda0
    Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
 #1 [<c010ad58>] (__dabt_svc) from [<c01d72ec>]
 #2 [<c01d7308>] (rmqueue_bulk.constprop.11) from [<c01d7540>]  //<<-- kernel panic
 #3 [<c01d7540>] (get_page_from_freelist) from [<c01d79c4>]
After the device is reassembled with another PMIC, the crash disappears.