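Misusing a kernel timer triggered a BUG_ON
Recently our AP terminal kept rebooting. The cause was the BUG_ON in the timer code's cascade() function. That BUG_ON checks every timer queued on the current base's wheel lists and fires if the base pointer stored in the timer does not point to this base. In normal operation this cannot happen. Cascading only moves timers from a higher wheel level down to a lower one inside the same per-CPU tvec_base, so a timer's base pointer stays the same while it cascades. So when was this base pointer changed?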
```c
static int cascade(struct tvec_base *base, struct tvec *tv, int index)
{
	/* cascade all the timers from tv up one level */
	struct timer_list *timer, *tmp;
	struct list_head tv_list;

	list_replace_init(tv->vec + index, &tv_list);

	/*
	 * We are removing _all_ timers from the list, so we
	 * don't have to detach them individually.
	 */
	list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
		BUG_ON(tbase_get_base(timer->base) != base);
		internal_add_timer(base, timer);
	}

	return index;
}
```
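To make the invariant concrete, here is a small userspace model. It is a sketch, not the kernel's implementation: the names toy_base, toy_timer, toy_queue_tv2 and toy_cascade are hypothetical, there are only two wheel levels, buckets are singly linked, and where the kernel calls BUG_ON the model just counts mismatches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not the kernel's tvec_base/timer_list. */
#define TOY_BUCKETS 4

struct toy_base;

struct toy_timer {
	struct toy_base *base;   /* base this timer thinks it is on (like timer->base) */
	struct toy_timer *next;  /* singly linked bucket list (the kernel uses list_head) */
};

struct toy_base {
	struct toy_timer *tv1[TOY_BUCKETS];  /* lowest wheel level */
	struct toy_timer *tv2[TOY_BUCKETS];  /* next level up */
};

/* Queue a timer on an upper-level bucket of base and record the base in it. */
static void toy_queue_tv2(struct toy_base *base, int index, struct toy_timer *t)
{
	assert(index >= 0 && index < TOY_BUCKETS);
	t->base = base;
	t->next = base->tv2[index];
	base->tv2[index] = t;
}

/*
 * Move every timer in tv2[index] down into tv1, checking each one the way
 * cascade() does with BUG_ON. Returns the number of timers that failed the
 * check instead of crashing. Reusing the same slot index is a simplification:
 * the kernel recomputes each timer's slot from its expires value.
 */
static int toy_cascade(struct toy_base *base, int index)
{
	struct toy_timer *t;
	int mismatches = 0;

	assert(index >= 0 && index < TOY_BUCKETS);
	t = base->tv2[index];
	base->tv2[index] = NULL;  /* like list_replace_init(): take the whole list at once */
	while (t != NULL) {
		struct toy_timer *next = t->next;

		if (t->base != base)  /* kernel: BUG_ON(tbase_get_base(timer->base) != base) */
			mismatches++;
		t->next = base->tv1[index];
		base->tv1[index] = t;
		t = next;
	}
	return mismatches;
}
```

Cascading a bucket whose timers all point back to the base returns 0; if one timer's base field has been overwritten to point at a different base, it returns 1, which is the situation that makes the real cascade() panic.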
Using kernel timers also involves four important interfaces:
add_timer, mod_timer, del_timer and init_timer.
The function names already describe what they do. add_timer simply calls mod_timer, and mod_timer in turn calls __mod_timer. So what does __mod_timer do? The kernel's comment on mod_timer says it all:
```c
/**
 * mod_timer - modify a timer's timeout
 * @timer: the timer to be modified
 * @expires: new timeout in jiffies
 *
 * mod_timer() is a more efficient way to update the expire field of an
 * active timer (if the timer is inactive it will be activated)
 *
 * mod_timer(timer, expires) is equivalent to:
 *
 *     del_timer(timer); timer->expires = expires; add_timer(timer);
 *
 * Note that if there are multiple unserialized concurrent users of the
 * same timer, then mod_timer() is the only safe way to modify the timeout,
 * since add_timer() cannot modify an already running timer.
 *
 * The function returns whether it has modified a pending timer or not.
 * (ie. mod_timer() of an inactive timer returns 0, mod_timer() of an
 * active timer returns 1.)
 */
```
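The equivalence and the return value described in that comment can be modelled in a few lines. This is a single-threaded sketch with hypothetical names (toy_timer, toy_del_timer, toy_add_timer, toy_mod_timer); it does not reproduce the locking that makes the real mod_timer safe for concurrent users, only its observable result.

```c
#include <assert.h>

/* Hypothetical toy timer: only the state the mod_timer() comment talks about. */
struct toy_timer {
	unsigned long expires;
	int pending;              /* 1 while queued on a timer wheel */
};

/* Like del_timer(): deactivate; return 1 if the timer was pending, else 0. */
static int toy_del_timer(struct toy_timer *t)
{
	int was_pending = t->pending;

	t->pending = 0;
	return was_pending;
}

/* Like add_timer(): only legal on an inactive timer (the kernel BUGs otherwise). */
static void toy_add_timer(struct toy_timer *t)
{
	assert(!t->pending);
	t->pending = 1;
}

/* mod_timer(timer, expires) == del_timer(); timer->expires = expires; add_timer(); */
static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	int was_pending = toy_del_timer(t);

	t->expires = expires;
	toy_add_timer(t);
	return was_pending;
}
```

Calling toy_mod_timer() on an inactive timer activates it and returns 0; calling it again while the timer is pending just moves the expiry and returns 1, matching the comment.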
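So once init_timer has been called (and the timer's function and data fields have been filled in), mod_timer can be called directly to start the timer. The timeout handler can also call mod_timer on its own timer to re-arm it, which turns it into a periodic timer. del_timer is used to delete the timer when the module exits or the business logic has finished.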
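The periodic pattern can be sketched with a fake clock. Everything here is a hypothetical userspace stand-in (toy_jiffies for jiffies, toy_run_until for the timer softirq); the point is only that the handler calling mod_timer on its own timer is what keeps it firing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy timer driven by a fake clock; not the kernel API. */
struct toy_timer {
	unsigned long expires;
	int pending;
	void (*function)(struct toy_timer *t);
	unsigned long period;     /* re-arm interval used by the handler */
	int fired;                /* how many times the handler has run */
};

static unsigned long toy_jiffies;   /* stand-in for jiffies */

static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	int was_pending = t->pending;

	t->expires = expires;
	t->pending = 1;
	return was_pending;
}

/* Timeout handler that re-arms itself: this is what makes the timer periodic. */
static void toy_periodic_fn(struct toy_timer *t)
{
	t->fired++;
	toy_mod_timer(t, toy_jiffies + t->period);
}

/* Advance the clock tick by tick and run the timer once it has expired. */
static void toy_run_until(struct toy_timer *t, unsigned long end)
{
	while (toy_jiffies < end) {
		toy_jiffies++;
		/* wrap-safe "now >= expires", in the spirit of time_after_eq() */
		if (t->pending && (long)(toy_jiffies - t->expires) >= 0) {
			assert(t->function != NULL);
			t->pending = 0;   /* detached before the handler runs, as in the kernel */
			t->function(t);
		}
	}
}
```

Starting the timer at tick 10 with a period of 10 and running the clock to tick 35 fires the handler at ticks 10, 20 and 30, leaving it pending for tick 40.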
init_timer reassigns timer->base; the work is done in __init_timer:
```c
static void __init_timer(struct timer_list *timer,
			 const char *name,
			 struct lock_class_key *key)
{
	timer->entry.next = NULL;
	timer->base = __raw_get_cpu_var(tvec_bases);
	timer->slack = -1;
#ifdef CONFIG_TIMER_STATS
	timer->start_site = NULL;
	timer->start_pid = -1;
	memset(timer->start_comm, 0, TASK_COMM_LEN);
#endif
	lockdep_init_map(&timer->lockdep_map, name, key, 0);
}
```
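Going through our code, we found that an application's ioctl calls could make the kernel initialize the same timer a second time, overwriting timer->base of a timer that was still pending. __raw_get_cpu_var(tvec_bases) returns the base of whichever CPU executes the second init, so if that differs from the base the timer is actually queued on, the next cascade() over its bucket finds a timer pointing at another base and hits the BUG_ON. The re-init also clears entry.next, so timer_pending() reports the timer as inactive even though it is still linked into a wheel list. In short, we were using the timer API incorrectly. The correct pattern is: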
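A userspace sketch of that failure, assuming two CPUs and a fixed-size array per base instead of the kernel's linked lists. The names toy_bases, toy_init_timer, toy_start_timer and toy_bad_timers are hypothetical; toy_init_timer mimics the two __init_timer lines that matter here.

```c
#include <assert.h>

#define TOY_NR_CPUS 2
#define TOY_SLOTS   8

struct toy_base;

struct toy_timer {
	struct toy_base *base;   /* like timer->base */
	int queued;              /* like timer_pending(): entry.next != NULL */
};

struct toy_base {
	struct toy_timer *slot[TOY_SLOTS];  /* timers queued on this CPU's wheel */
	int count;
};

static struct toy_base toy_bases[TOY_NR_CPUS];  /* stand-in for per-CPU tvec_bases */

/* Like __init_timer(): clears list state and binds to the calling CPU's base. */
static void toy_init_timer(struct toy_timer *t, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	t->queued = 0;                  /* kernel: timer->entry.next = NULL */
	t->base = &toy_bases[cpu];      /* kernel: timer->base = __raw_get_cpu_var(tvec_bases) */
}

/* Queue the timer on whatever base it currently points to. */
static void toy_start_timer(struct toy_timer *t)
{
	struct toy_base *b = t->base;

	assert(b->count < TOY_SLOTS);
	b->slot[b->count++] = t;
	t->queued = 1;
}

/* The invariant cascade() enforces: count queued timers that point elsewhere. */
static int toy_bad_timers(const struct toy_base *b)
{
	int bad = 0;

	for (int i = 0; i < b->count; i++)
		if (b->slot[i]->base != b)
			bad++;
	return bad;
}
```

Initializing and starting a timer on CPU 0 leaves base 0 consistent. Re-initializing it from CPU 1 while it is still queued makes toy_bad_timers(&toy_bases[0]) return 1 (the real cascade() would BUG) and also clears the queued flag, mirroring the cleared entry.next.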
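1. Call init_timer once, preferably in the module's initialization code.
2. Call mod_timer at the appropriate point to start the timer.
3. Delete the timer with del_timer when the module exits or the business flow has finished.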
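The three rules map onto a driver's lifecycle roughly as follows. This is a userspace model with hypothetical names (toy_dev, toy_module_init, toy_ioctl_start, toy_module_exit), not kernel code; the assert in the ioctl path encodes "never re-initialize a live timer".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; real code would use struct timer_list. */
struct toy_timer {
	unsigned long expires;
	bool pending;
	int init_count;          /* how many times "init_timer" ran on this timer */
};

struct toy_dev {
	struct toy_timer timer;
};

static void toy_init_timer(struct toy_timer *t)
{
	t->pending = false;
	t->init_count++;
}

static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	int was_pending = t->pending;

	t->expires = expires;
	t->pending = true;
	return was_pending;
}

static int toy_del_timer(struct toy_timer *t)
{
	int was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

/* 1. Module init: the only place the timer is initialized. */
static void toy_module_init(struct toy_dev *dev)
{
	toy_init_timer(&dev->timer);
}

/* 2. ioctl may run any number of times: it only (re)arms the timer. */
static void toy_ioctl_start(struct toy_dev *dev, unsigned long now, unsigned long delay)
{
	assert(dev->timer.init_count == 1);   /* never re-initialize a live timer */
	toy_mod_timer(&dev->timer, now + delay);
}

/* 3. Module exit (or end of the business flow): remove the timer. */
static void toy_module_exit(struct toy_dev *dev)
{
	toy_del_timer(&dev->timer);
}
```

Repeated ioctl calls only move the expiry; the timer is initialized exactly once. In real kernel code on SMP, del_timer_sync is generally preferred at module exit, because it also waits for a handler that is currently running on another CPU.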
Note also that a BUG_ON in the timer code can be caused by SMP hardware problems; for details, search (e.g. on Yahoo) for the keywords "bug_on panic in cascade in timer.c".