[KernelCrash] Abort at do_raw_spin_lock() with "cat /d/shrinker"

When I enter the command adb shell "cat /d/shrinker", the system crashes with 100% after dumping the following kernel message as below.

[ 761.636711] Unable to handle kernel paging request at virtual address f38a9a84

[ 761.645048] pgd = e8074000

[ 761.649800] [f38a9a84] *pgd=a0721811, *pte=00000000, *ppte=00000000

[ 761.658106] Internal error: Oops: 7 [#1] PREEMPT SMP ARM

[ 761.665481] Modules linked in:

[ 761.670605] CPU: 6 PID: 6978 Comm: sh Tainted: G W 3.18.31-g5b0528e-dirty #12

[ 761.680852] task: eb5b8000 ti: d228e000 task.ti: d228e000

[ 761.688315] PC is at do_raw_spin_lock+0x8/0x17c

[ 761.694912] LR is at list_lru_count_node+0x14/0x60

//--- skip ---

[ 761.885719] [<c016a650>] (do_raw_spin_lock) from [<c02057f4>] (list_lru_count_node+0x14/0x60)

[ 761.896307] [<c02057f4>] (list_lru_count_node) from [<c022dffc>] (super_cache_count+0x40/0x94)

[ 761.906982] [<c022dffc>] (super_cache_count) from [<c01f2708>] (debug_shrinker_show+0x44/0x7c)

[ 761.917658] [<c01f2708>] (debug_shrinker_show) from [<c024a56c>] (seq_read+0x1a4/0x3f0)

[ 761.927726] [<c024a56c>] (seq_read) from [<c022c87c>] (vfs_read+0x88/0x138)

[ 761.936755] [<c022c87c>] (vfs_read) from [<c022cda8>] (SyS_read+0x3c/0x7c)

[ 761.945695] [<c022cda8>] (SyS_read) from [<c0106840>] (ret_fast_syscall+0x0/0x38)

[ 761.955242] Code: e12fff1e deaf1eed e92d47f0 e1a08000 (e5902004)

After loading coredump using T32 program, we can find out that the data abort is triggered inside debug_spin_lock_before() function during sanity-check procedure of spinlock where list_lru_count_node() is attempting to acquire.

-000|do_DataAbort(addr = 170, fsr = 3247439880, regs = 0xEC4901C0)

-001|__dabt_svc(asm)

-->|exception

-002|debug_spin_lock_before(inline) //<<-- data abort

-002|do_raw_spin_lock(lock = 0xF38A9A80)

-003|list_lru_count_node(?, ?)

-004|super_cache_count(shrink = 0xEBC47DE0, sc = 0xD228FEF8)

-005|debug_shrinker_show(s = 0xE7CF63C0, ?)

-006|seq_read(file = 0xE805CA00, buf = 0xB3246000, size = 4096, ppos = 0xD228FF80)

-007|vfs_read(file = 0xFFFFFFFF, buf = 0x0, ?, pos = 0xD228FF80)

-008|SYSC_read(inline)

-008|sys_read(?, buf = -1289461760, count = 4096)

-009|ret_fast_syscall(asm)

Upon acquiring spinlock, all members of raw_spinlock_t(0xF38A9A80) have been scribbled.

It is necessary to examine the variable which contains spinlock as member.

(register raw_spinlock_t *) [R0] lock = 0xF38A9A80 -> (

(arch_spinlock_t) [NSD:0xF38A9A80] raw_lock = (

(u32) [NSD:0xF38A9A80] slock = 0x6F632F34,

(struct __raw_tickets) [NSD:0xF38A9A80] tickets = (

(u16) [NSD:0xF38A9A80] owner = 0x2F34, //<<--

(u16) [NSD:0xF38A9A82] next = 0x6F63)), //<<--

(unsigned int) [NSD:0xF38A9A84] magic = 0x6E6F6D6D //<<--

(unsigned int) [NSD:0xF38A9A88] owner_cpu = 0x6F6F675F,

(void *) [NSD:0xF38A9A8C] owner = 0x5F656C67))

The address of spin_lock is passed from &nlru->lock which is responsible for saving &lru->node[nid].

51unsigned long

52list_lru_count_node(struct list_lru *lru, int nid)

53{

54 unsigned long count = 0;

55 struct list_lru_node *nlru = &lru->node[nid]; <<--

57 spin_lock(&nlru->lock);

The list_lru_count_node() is called inside super_cache_count() [128 line.].

108static unsigned long super_cache_count(struct shrinker *shrink,

109 struct shrink_control *sc)

110{

111 struct super_block *sb;

112 long total_objects = 0;

113

114 sb = container_of(shrink, struct super_block, s_shrink);

//snip

124 if (sb->s_op && sb->s_op->nr_cached_objects)

125 total_objects = sb->s_op->nr_cached_objects(sb,

126 sc->nid);

127

128 total_objects += list_lru_count_node(&sb->s_dentry_lru, //<<--

129 sc->nid);

Tracing back to the caller function of super_cache_count() using T32 program, we can find out that debug_shrinker_show() calls super_cache_count().

Here, the second parameter in a call to super_cache_count() is &sc defined as local variable whose memory location belongs to stack area, which looks very *odd*. odd debug information.

205static int debug_shrinker_show(struct seq_file *s, void *unused)

206{

207 struct shrinker *shrinker;

208 struct shrink_control sc;

209

210 sc.gfp_mask = -1;

211 sc.nr_to_scan = 0;

212

213 down_read(&shrinker_rwsem);

214 list_for_each_entry(shrinker, &shrinker_list, list) {

215 int num_objs;

216

217 num_objs = shrinker->count_objects(shrinker, &sc); //<<-- shrinker->count_objects = super_cache_count

Here, let me wrap up the call trace flow as follows.

1. shrinker->count_objects(shrinker, &sc); // <<-- super_cache_count

2. list_lru_count_node(&sb->s_dentry_lru, sc->nid); // The second parameter sc->nid contains garbage value(since it is defined as local variable without initialization)

3. list_lru_count_node(struct list_lru *lru, int nid) // <<-- nid(sc->nid): garbage value

3.1 struct list_lru_node *nlru = &lru->node[nid]; // &lru->node[nid]; // access lru->node from nid(garbage value)

3.2 spin_lock(&nlru->lock); // &nlru->lock: garbage address

Solution

After visiting linux kernel community and google web-site, we have found that google already generated revert commit

https://chromium-review.googlesource.com/c/345895/

The commit message below seems to contain identical debugging signature what we analyzed(shrink_control.nid is used but not initialzed)

Commit log

ANDROID: Revert "mm: vmscan: Add a debug file for shrinkers"

Kernel panic when type "cat /sys/kernel/debug/shrinker"

Unable to handle kernel paging request at virtual address 0af37d40

pgd = d4dec000

[0af37d40] *pgd=00000000

Internal error: Oops: 5 [#1] PREEMPT SMP ARM

[<c0bb8f24>] (_raw_spin_lock) from [<c020aa08>] (list_lru_count_one+0x14/0x28)

[<c020aa08>] (list_lru_count_one) from [<c02309a8>] (super_cache_count+0x40/0xa0)

[<c02309a8>] (super_cache_count) from [<c01f6ab0>] (debug_shrinker_show+0x50/0x90)

[<c01f6ab0>] (debug_shrinker_show) from [<c024fa5c>] (seq_read+0x1ec/0x48c)

[<c024fa5c>] (seq_read) from [<c022e8f8>] (__vfs_read+0x20/0xd0)

[<c022e8f8>] (__vfs_read) from [<c022f0d0>] (vfs_read+0x7c/0x104)

[<c022f0d0>] (vfs_read) from [<c022f974>] (SyS_read+0x44/0x9c)

[<c022f974>] (SyS_read) from [<c0107580>] (ret_fast_syscall+0x0/0x3c)

Code: e1a04000 e3a00001 ebd66b39 f594f000 (e1943f9f)

---[ end trace 60c74014a63a9688 ]---

Kernel panic - not syncing: Fatal exception

shrink_control.nid is used but not initialzed, same for

shrink_control.memcg.

This reverts commit b0e7a582b2264cdf75874dcd8df915b6b4427755.

Reported-by: Xiaowen Liu <xiaowen.liu@freescale.com>

Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>

(cherry picked from https://android.googlesource.com/kernel/common android-4.4

commit ad95c12f66df9efae04b15d5c4d0d0ba56ab2620)

BUG=chromium:612862

TEST=build and boot on a few devices

저작자표시 (새창열림)

'Kernel Crash Case-Studies' 카테고리의 다른 글

[KernelCrash] panic due to voltage droop in the specific device (0)	2019.03.09
[Liunx][Kernel] Abort at __list_del_entry() inside process_one_work() (0)	2019.03.09
[KernelCrash] Abort at tty_wakeup() due to port_tty(null) (0)	2019.03.09
[KernelCrash] Abort at 0xecb29f00(defective device) (0)	2019.03.09
[KernelCrash] Abort at mmc_wait_data_done() due to race (0)	2019.03.09

RISC-V and Arm Linux Kernel Hacks

[KernelCrash] Abort at do_raw_spin_lock() with "cat /d/shrinker"

'Kernel Crash Case-Studies' 카테고리의 다른 글

티스토리툴바

[KernelCrash] Abort at do_raw_spin_lock() with "cat /d/shrinker"

'Kernel Crash Case-Studies' 카테고리의 다른 글

'Kernel Crash Case-Studies' Related Articles

티스토리툴바