
Kernel Crash Case-Studies

(14)
[syzbot] Tons of crash issue with vmlinux and kernel log ([riscv] kernel panic) Some of my friends asked me how to find kernel crash signatures in a kernel log, because they want to improve their troubleshooting skills as Linux system software engineers. If you are eager to learn the patterns of kernel crash signatures, you can visit the following links: 1) syzbot weblink: The fol..
BUG(): CONFIG_PANIC_ON_OOPS, CONFIG_PANIC_ON_OOPS_VALUE! Sometimes I noticed that the system does not crash when BUG() is called in a kernel driver. The stack trace shows up in the kernel log, but the target keeps running rather than entering crash mode. To make the target crash when BUG() is called, the following config options should be set:
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
Let's look at the kernel die() function wh..
[Kernel] memory leak - debug(CONFIG_DEBUG_KMEMLEAK) Every now and then a kernel memory leak issue shows up. Either the OOM Killer kicks in, with the kernel logging that it is out of memory and taking itself down, or the Low Memory Killer runs so often that the symptom is detected as a lockup. How should we debug when an issue like this comes up? Let me summarize. 1. Debug info: contig_page_data.node_zones[0--1].free_area First, we need to check which of the High/Low memory zones is short of pages. If memory is short in the Low memory zone, the focus should be on kernel behavior, and if, as in the case below, the High Memory Zone has almost no free pages at each order..
[Linux][Kernel][Stability] Kernel panic @0x0 from xfrm_local_error+0x4c #Kernel crash debugging and troubleshooting
Kernel panic in mmc_wait_data_done() due to a race
Kernel panic when entering "cat /d/shrinker"
Kernel panic in xfrm_local_error() due to an unset function pointer
Crash in ___might_sleep() due to a preempt condition
Stack canary: crash in __stack_chk_fail()
Stack canary: tcp_v4_rcv -> __stack_chk_fail crash
Lockup due to a mutex deadlock
Kernel crash due to a device driver signature problem
Kernel crash from faulty memory @find_vma_links()
Kernel crash from faulty memory @ttwu_do_activate()
Race in ipv6_ifa_notify()..
[KernelCrash] panic due to voltage droop in the specific device Kernel panic log
[ 2107.232713 / 01-01 11:11:03.809][7] init: cannot find '/system/bin/qrngp' (No such file or directory), disabling 'qrngp'
[ 2107.239317 / 01-01 11:11:03.809][5] Unable to handle kernel NULL pointer dereference at virtual address 00000028
[ 2107.239351 / 01-01 11:11:03.809][5] pgd = e37ec000
[ 2107.239366 / 01-01 11:11:03.809][0] [00000028] *pgd=00000000
[ 2107.239388 / 01-01 11:11:0..
[KernelCrash] Abort at do_raw_spin_lock() with "cat /d/shrinker" When I enter the command adb shell "cat /d/shrinker", the system crashes 100% of the time after dumping the following kernel messages.
[ 761.636711] Unable to handle kernel paging request at virtual address f38a9a84
[ 761.645048] pgd = e8074000
[ 761.649800] [f38a9a84] *pgd=a0721811, *pte=00000000, *ppte=00000000
[ 761.658106] Internal error: Oops: 7 [#1] PREEMPT SMP ARM
[ 761.665481] Modules linked..
[Linux][Kernel] Abort at __list_del_entry() inside process_one_work()
Debugging
A kernel panic occurs at line 68 inside __list_del_entry(), whose caller is process_one_work().
Code review at the moment of the kernel panic
49 void __list_del_entry(struct list_head *entry)
50 {
51 	struct list_head *prev, *next;
52
53 	prev = entry->prev;
54 	next = entry->next;
55
56 	if (WARN(next == LIST_POISON1,
57 		"list_del corruption, %p->next is LIST_POISON1 (%p)\n",
58 		entry, LIST_POISON1..
[KernelCrash] Abort at tty_wakeup() due to port_tty(null) I can restore the call stack using T32 as follows:
[] do_page_fault+0x338/0x3f8
[] do_DataAbort+0x38/0x98
[] __dabt_svc+0x38/0x60
[] tty_wakeup+0xc/0x64
[] gs_start_io+0x94/0xf4
[] gserial_connect+0xe0/0x180
[] acm_set_alt+0x88/0x1a8
[] composite_setup+0xd34/0x1520
[] android_setup+0x1f4/0x1fc
[] forward_to_driver+0x64/0x100
[] musb_g_ep0_irq+0x7d8/0x1c18
[] musb_interrupt+0x94/0xc78
[] generic_inte..