[Linux][Kernel] data abort@irq_affinity_notify

#커널 크래시 디버깅 및 TroubleShooting

아래 커널 로그와 함께 시스템이 커널 패닉으로 돌아가셨어요.

irq_affinity_notify+0x4c 란 함수에서 data abort가 난 것을 확인할 수 있어요.

[287229.435076] <c6>mmc0: mmc_start_bkops: raw_bkops_status=0x2, from_exception=1

[287230.328287] <c6>mmc0: mmc_start_bkops: Starting bkops

[287231.319886] <26>Unable to handle kernel NULL pointer dereference at virtual address 000000fc

[287231.319920] <26>pgd = c0004000

[287231.319936] <22>[000000fc] *pgd=00000000

[287231.319957] <6>Internal error: Oops: 5 [#1] PREEMPT SMP ARM

[287231.319974] <c2>Modules linked in: core_ctl(PO)

[287231.320000] <c6>CPU: 2 PID: 13 Comm: kworker/1:0 Tainted: P W O 3.10.49-gca5eb64 #1

[287231.320028] <c2>Workqueue: events irq_affinity_notify

[287231.320046] <c6>task: e0063fc0 ti: e01f2000 task.ti: e01f2000

[287231.320065] <c2>PC is at irq_affinity_notify+0x4c/0xf8

[287231.320083] <c2>LR is at pm_qos_update_target+0x20c/0x240

[287231.320102] <c2>pc : [<c006c194>] lr : [<c0068ce0>] psr: 00000013

코어 덤프를 올려서 왜 커널 패닉이 발생했는지 조금 더 살펴볼까요?

우선 crash-utility 프로그램으로 콜스택을 살펴보면 c006c194란 프로그램 카운터에서 data abort가 발생했음을 알 수 있어요.

crash> bt -I C0008400 -S E01F3DF0 0xE0063FC0

PID: 13 TASK: e0063fc0 CPU: 2 COMMAND: "kworker/1:0"

#0 [<c0008400>] (do_DataAbort) from [<c000d958>]

pc : [<c006c194>] lr : [<c0068ce0>] psr: 00000013

sp : e01f3ee0 ip : 00000004 fp : 00000000

r10: 00000000 r9 : c4826700 r8 : e0009a24

r7 : c0069124 r6 : e01f3ee4 r5 : 000000f8 r4 : ded06564

r3 : 00000100 r2 : e01f3e98 r1 : 60000013 r0 : 00000000

Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM

#1 [<c000d958>] (__dabt_svc) from [<c0068ce0>]

#2 [<c006c194>] (irq_affinity_notify) from [<c004248c>]

#3 [<c004248c>] (process_one_work) from [<c004320c>]

#4 [<c004320c>] (worker_thread) from [<c0047b10>]

#5 [<c0047b10>] (kthread) from [<c000de98>]

그럼 어떤 흐름으로 data abort가 일어났는지, 어셈블 코드 레벨로 디버깅을 좀 해보려고 해요.

이제 시작해볼까요?

1]: R7= *desc, (코드): struct irq_desc *desc = container_of(work, struct irq_desc, affinity_work);

[2]: 파라미터 struct work_struct *work를 r4에 저장

[3]: R8 = 0xE0009A24 = irq_desc.affinity_notify값, where: offsetof(struct irq_desc,affinity_notify) = 0x64

[4]: irq_desc.affinity_notify 주소를 가져오는데 where r5는 0xE0009A24이어야 하죠

[5]: r3 = 0x100, r8 = 0xE0009A24

[6]: list_for_each_entry() loop에 진입하면 안됨

[7]: "ldr r3, [r5, #4]" instruction의 base 주소 r5가 0xf8라서 Data Abort 발생

커널 로그(Unable to handle kernel NULL pointer dereference at virtual address 000000fc)에서 똑같은 정보를 확인할 수 있어요.

0xc006c148 <irq_affinity_notify>: push {r0, r1, r4, r5, r6, r7, r8, lr}

0xc006c14c <irq_affinity_notify+0x4>: subs r7, r0, #108 ; 0x6c //<<--[1]

0xc006c150 <irq_affinity_notify+0x8>: mov r4, r0 //<<--[2]

0xc006c154 <irq_affinity_notify+0xc>: beq 0xc006c234 <irq_affinity_notify+236>

0xc006c158 <irq_affinity_notify+0x10>: add r5, r7, #88 ; 0x58

0xc006c15c <irq_affinity_notify+0x14>: add r6, sp, #8

0xc006c160 <irq_affinity_notify+0x18>: add r8, r7, #100 ; 0x64 //<<--[3]

0xc006c164 <irq_affinity_notify+0x1c>: mov r0, r5

0xc006c168 <irq_affinity_notify+0x20>: bl 0xc0a34ca0 <_raw_spin_lock_irqsave>

0xc006c16c <irq_affinity_notify+0x24>: ldr r3, [r4, #-72] ; 0x48

0xc006c170 <irq_affinity_notify+0x28>: str r3, [r6, #-4]!

0xc006c174 <irq_affinity_notify+0x2c>: mov r1, r0

0xc006c178 <irq_affinity_notify+0x30>: mov r0, r5

0xc006c17c <irq_affinity_notify+0x34>: bl 0xc0a35028 <_raw_spin_unlock_irqrestore>

0xc006c180 <irq_affinity_notify+0x38>: ldr r5, [r4, #-8] //<<--[4]

0xc006c184 <irq_affinity_notify+0x3c>: sub r5, r5, #8

0xc006c188 <irq_affinity_notify+0x40>: add r3, r5, #8

0xc006c18c <irq_affinity_notify+0x44>: cmp r3, r8 //<<--[5]

0xc006c190 <irq_affinity_notify+0x48>: beq 0xc006c234 <irq_affinity_notify+236> //<<--[6]

0xc006c194 <irq_affinity_notify+0x4c>: ldr r3, [r5, #4] //<<--[7]

해당 C 코드는 아래와 같은데요. 참 신기하죠. &desc->affinity_notify가 NULL인데,

list_for_each_entry(notify, &desc->affinity_notify, list) 를 뚫고 들어가 실행을 한거란 말이죠.

static void irq_affinity_notify(struct work_struct *work)

{

struct irq_desc *desc =

container_of(work, struct irq_desc, affinity_work);

cpumask_var_t cpumask;

unsigned long flags;

struct irq_affinity_notify *notify;

// .. snip..

list_for_each_entry(notify, &desc->affinity_notify, list) { //<<--

if (!kref_get_unless_zero(&notify->kref))

continue;

notify->notify(notify, cpumask);

kref_put(&notify->kref, notify->release);

}

mutex_unlock(&desc->notify_lock);

free_cpumask_var(cpumask);

}

(To Be continue)

# Reference: For more information on 'Linux Kernel';

디버깅을 통해 배우는 리눅스 커널의 구조와 원리. 1

http://www.yes24.com/Product/Goods/90085976

https://wikibook.co.kr/linux-kernel-1/

디버깅을 통해 배우는 리눅스 커널의 구조와 원리. 2

http://www.yes24.com/Product/Goods/90087307

https://wikibook.co.kr/linux-kernel-2/

저작자표시

'Core BSP 분석 > 커널 트러블슈팅' 카테고리의 다른 글

[Kernel][Stability] tcp_v4_rcv -> __stack_chk_fail (0)	2023.05.07
[Linux][Kernel] data abort@cpuacct_charge - 리눅스 커널 (0)	2023.05.07
[Linux][Kernel] data abort @tty_wakeup - 리눅스 커널 (0)	2023.05.07
watchdog reset - race in ipv6_ifa_notify() - 리눅스 커널 (0)	2023.05.07
tombstone - __strlen_chk (0)	2023.05.07

RISC-V and Arm Linux Kernel Hacks

[Linux][Kernel] data abort@irq_affinity_notify - 리눅스 커널

'Core BSP 분석 > 커널 트러블슈팅' 카테고리의 다른 글

티스토리툴바

[Linux][Kernel] data abort@irq_affinity_notify - 리눅스 커널

'Core BSP 분석 > 커널 트러블슈팅' 카테고리의 다른 글

'Core BSP 분석/커널 트러블슈팅' Related Articles

티스토리툴바