A kernel panic occurs on a defective device with the following call trace.
crash> bt -I C01002D8 -S E7AABC08 0xE1804200
PID: 2285   TASK: e1804200   CPU: 5   COMMAND: "python"
bt: WARNING: stack address:0xe7aabd80, program counter:0xc0ee5b60
 #0 [<c01002d8>] (do_DataAbort) from [<c010ad58>]
    pc : [<c01d7308>]   lr : [<c01d72ec>]   psr: 60020193
    sp : e7aabcf8  ip : c193e69c  fp : edf34bf4
    r10: 00000000  r9 : 0000001f  r8 : 00000002
    r7 : c1938280  r6 : c1938200  r5 : 00000010  r4 : ef4bddb4
    r3 : ef4bddb4  r2 : 00000100  r1 : 00000000  r0 : ef4bdda0
    Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
 #1 [<c010ad58>] (__dabt_svc) from [<c01d72ec>]
 #2 [<c01d7308>] (rmqueue_bulk.constprop.11) from [<c01d7540>]  //<<-- kernel panic
 #3 [<c01d7540>] (get_page_from_freelist) from [<c01d79c4>]
 #4 [<c01d79c4>] (__alloc_pages_nodemask) from [<c01f7bf4>]
 #5 [<c01f7bf4>] (handle_mm_fault) from [<c011525c>]
 #6 [<c011525c>] (do_page_fault) from [<c01002d8>]
 #7 [<c01002d8>] (do_DataAbort) from [<c010b03c>]
The data abort is raised because page.lru->next (R2) holds the invalid address 0x100.
0xc01d72f4 <rmqueue_bulk.constprop.11+0x58>: cmp   r10, #0
0xc01d72f8 <rmqueue_bulk.constprop.11+0x5c>: add   r3, r0, #20
0xc01d72fc <rmqueue_bulk.constprop.11+0x60>: ldreq r2, [r4]
0xc01d7300 <rmqueue_bulk.constprop.11+0x64>: ldrne r2, [r4, #4]
0xc01d7304 <rmqueue_bulk.constprop.11+0x68>: strne r3, [r4, #4]
0xc01d7308 <rmqueue_bulk.constprop.11+0x6c>: streq r3, [r2, #4]  //<<-- data abort
crash> struct page.lru 0xEF4BDDA0 -px
  lru = {
    next = 0x100,  //<<--
    prev = 0x200
  }
After a code review, I figured out that this page belongs to the pcp (per-CPU page frame cache sitting on top of the buddy system, which holds order-0 pages).
static int rmqueue_bulk(struct zone *zone, unsigned int order,
			unsigned long count, struct list_head *list,
			int migratetype, bool cold)
{
	int i;

	spin_lock(&zone->lock);
	for (i = 0; i < count; ++i) {
		struct page *page;
		//snip
		if (likely(!cold))
			list_add(&page->lru, list);  //<<--
		else
			list_add_tail(&page->lru, list);
To find the pcp address for CPU5, the following commands are used.
crash> p contig_page_data.node_zones[1].pageset
$5 = (struct per_cpu_pageset *) 0xc177ebdc

crash> struct per_cpu_pages EDF34BDC
struct per_cpu_pages {
  count = 0x1,
  high = 0xba,
  batch = 0x1f,
  lists = {{
      next = 0xef51fc74,  //<<-- MIGRATE_UNMOVABLE
      prev = 0xef51fc74
    }, {
      next = 0xedf34bf0,  //<<-- MIGRATE_RECLAIMABLE
      prev = 0xedf34bf0
    }, {
      next = 0xef4bdcd4,  //<<-- MIGRATE_MOVABLE
      prev = 0xef4bddf4
    }, {
      next = 0xedf34c00,  //<<-- MIGRATE_PCPTYPES
      prev = 0xedf34c00
    }}
}

(where) 0xEDF34BDC = 0xc177ebdc + 0x2c7b6000

crash> p __per_cpu_offset[5]
$7 = 0x2c7b6000
By the way, the MIGRATE_MOVABLE list starting at 0xef4bdcd4 turns out to be corrupted, as follows.
crash> list 0x0 0xef4bdcd4
ef4bdcd4
ef4bdcf4
ef4bdd14
ef4bdd34
ef4bdd54
ef4bdd74
ef4bddb4
100
(where)
 #0 [<c01002d8>] (do_DataAbort) from [<c010ad58>]
    pc : [<c01d7308>]   lr : [<c01d72ec>]   psr: 60020193
    sp : e7aabcf8  ip : c193e69c  fp : edf34bf4
    r10: 00000000  r9 : 0000001f  r8 : 00000002
    r7 : c1938280  r6 : c1938200  r5 : 00000010  r4 : ef4bddb4
    r3 : ef4bddb4  r2 : 00000100  r1 : 00000000  r0 : ef4bdda0
    Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
 #1 [<c010ad58>] (__dabt_svc) from [<c01d72ec>]
 #2 [<c01d7308>] (rmqueue_bulk.constprop.11) from [<c01d7540>]  //<<-- kernel panic
 #3 [<c01d7540>] (get_page_from_freelist) from [<c01d79c4>]
After the device is reassembled with another PMIC, the crash disappears.