Misusing a timer in the kernel triggers BUG_ON

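    Recently our AP terminals kept rebooting. The cause was the BUG_ON statement in the timer code's cascade() being triggered. That BUG_ON checks every timer hanging on the current base's vector and fires when the base pointer stored in the timer structure does not point to the current base. In a normal cascade, as time passes and a timer is moved from one group to another, its internal pointer should be switched along with it. So when did this base pointer get changed?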

static int cascade(struct tvec_base *base, struct tvec *tv, int index)
{
	/* cascade all the timers from tv up one level */
	struct timer_list *timer, *tmp;
	struct list_head tv_list;

	list_replace_init(tv->vec + index, &tv_list);

	/*
	 * We are removing _all_ timers from the list, so we
	 * don't have to detach them individually.
	 */
	list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
		BUG_ON(tbase_get_base(timer->base) != base);
		internal_add_timer(base, timer);
	}

	return index;
}

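     Four important interfaces are involved when using kernel timers:

     add_timer, mod_timer, del_timer, init_timer.

     The function names explain their usage well. add_timer simply calls mod_timer, and mod_timer in turn calls __mod_timer internally. So what does __mod_timer do? The kernel's comment on mod_timer says it all: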

/**
 * mod_timer - modify a timer's timeout
 * @timer: the timer to be modified
 * @expires: new timeout in jiffies
 *
 * mod_timer() is a more efficient way to update the expire field of an
 * active timer (if the timer is inactive it will be activated)
 *
 * mod_timer(timer, expires) is equivalent to:
 *
 *     del_timer(timer); timer->expires = expires; add_timer(timer);
 *
 * Note that if there are multiple unserialized concurrent users of the
 * same timer, then mod_timer() is the only safe way to modify the timeout,
 * since add_timer() cannot modify an already running timer.
 *
 * The function returns whether it has modified a pending timer or not.
 * (ie. mod_timer() of an inactive timer returns 0, mod_timer() of an
 * active timer returns 1.)
 */

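     So after init_timer you can call mod_timer directly to start the timer. You can also call mod_timer from inside the timeout function to build a periodic timer. del_timer is used to delete the timer when the module exits or the business logic finishes.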
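
     The equivalence and return value described in the mod_timer comment can be sketched with a small userspace model. This is only an illustration: the toy_* names are made up for this sketch and are not kernel APIs, and locking, jiffies and the timer wheel are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: toy_* names are hypothetical, not kernel API. */
struct toy_timer {
	bool pending;          /* queued and waiting to fire */
	unsigned long expires; /* timeout, in ticks */
};

/* Models del_timer(): deactivate; returns 1 if the timer was pending. */
static int toy_del_timer(struct toy_timer *t)
{
	int was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

/* Models add_timer(): may only be used on an inactive timer.
 * The kernel enforces this with BUG_ON(timer_pending(timer)). */
static void toy_add_timer(struct toy_timer *t)
{
	assert(!t->pending);
	t->pending = true;
}

/* Models mod_timer(): del_timer + set expires + add_timer,
 * returning whether a pending timer was modified. */
static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	int was_pending = toy_del_timer(t);

	t->expires = expires;
	toy_add_timer(t);
	return was_pending;
}
```

     In this model, calling toy_mod_timer on a freshly set up timer returns 0 and activates it, while a second call returns 1 and only moves the timeout, which is the behaviour a periodic timer relies on when it re-arms itself from its own timeout function.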

      init_timer assigns base afresh every time it is called. The code is as follows:

static void __init_timer(struct timer_list *timer,
			 const char *name,
			 struct lock_class_key *key)
{
	timer->entry.next = NULL;
	timer->base = __raw_get_cpu_var(tvec_bases);
	timer->slack = -1;
#ifdef CONFIG_TIMER_STATS
	timer->start_site = NULL;
	timer->start_pid = -1;
	memset(timer->start_comm, 0, TASK_COMM_LEN);
#endif
	lockdep_init_map(&timer->lockdep_map, name, key, 0);
}

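     Going through the code, we found that when the application calls ioctl, the kernel may initialize the timer twice, so timer->base of a timer that is already running gets modified: the second init_timer overwrites it with the base of whichever CPU the call runs on, while the timer is still linked in the old base's list. To be precise, our way of using the timer was wrong.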
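
     The failure can be reproduced in a small userspace model. This is a sketch under assumptions, not kernel code: the model_* types and functions are invented for illustration, there are only two "CPUs", the list is singly linked, and the second initialization is assumed to run on a different CPU than the first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: model_* names are hypothetical stand-ins. */
struct model_timer;

struct model_base {
	int cpu;
	struct model_timer *head; /* timers queued on this base */
};

struct model_timer {
	struct model_timer *next;
	struct model_base *base;  /* base the timer claims to belong to */
};

static struct model_base model_bases[2] = { { 0, NULL }, { 1, NULL } };

/* Like __init_timer(): unconditionally resets the timer and points it
 * at the base of the CPU the call runs on. */
static void model_init_timer(struct model_timer *t, int cpu)
{
	t->next = NULL;
	t->base = &model_bases[cpu];
}

/* Like starting an inactive timer: link it onto t->base. */
static void model_start_timer(struct model_timer *t)
{
	t->next = t->base->head;
	t->base->head = t;
}

/* The check cascade() makes: every timer on this base's list must
 * point back at this base. Returns true where the kernel would BUG. */
static bool model_cascade_would_bug(const struct model_base *base)
{
	for (const struct model_timer *t = base->head; t; t = t->next)
		if (t->base != base)
			return true;
	return false;
}

/* Start a timer on CPU 0, optionally re-initialize it on CPU 1 while
 * it is still queued, then run the cascade check on CPU 0's base. */
static bool model_scenario(bool reinit_on_other_cpu)
{
	struct model_timer t;
	bool bug;

	model_bases[0].head = NULL;
	model_init_timer(&t, 0);
	model_start_timer(&t);
	if (reinit_on_other_cpu)
		model_init_timer(&t, 1); /* the second, erroneous init */
	bug = model_cascade_would_bug(&model_bases[0]);
	model_bases[0].head = NULL;      /* do not leave a dangling pointer */
	return bug;
}
```

     model_scenario(false) passes the check, while model_scenario(true) trips it: the timer is still on base 0's list but now claims to belong to base 1, which is the mismatch the BUG_ON in cascade() detects.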

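    1. Call init_timer in the module's initialization code whenever possible.

    2. Call mod_timer at the appropriate time to start the timer.

    3. Call del_timer to delete the timer when the module exits or the business flow has completed.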
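
    The three rules above can be sketched as a lifecycle in a userspace model. Everything here is an assumption made for illustration: the sk_* names are hypothetical, the "ioctl" path is reduced to a single function, and re-arming is modelled as always moving the timer to the caller's base, which simplifies what the real __mod_timer does under its lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: every sk_* name is a hypothetical stand-in. */
struct sk_base { int cpu; };

struct sk_timer {
	struct sk_base *base;      /* base the timer claims to belong to */
	struct sk_base *queued_on; /* base whose list holds it, or NULL */
};

static struct sk_base sk_bases[2] = { { 0 }, { 1 } };

/* Rule 1: initialize once, from the module's init path. */
static void sk_module_init(struct sk_timer *t, int cpu)
{
	t->base = &sk_bases[cpu];
	t->queued_on = NULL;
}

/* Rule 2: the ioctl path never re-initializes; it only (re)arms,
 * mod_timer() style, updating both fields together.
 * Returns 1 if the timer was already pending, 0 otherwise. */
static int sk_ioctl_start(struct sk_timer *t, int cpu)
{
	int was_pending = (t->queued_on != NULL);

	t->base = &sk_bases[cpu];
	t->queued_on = t->base;
	return was_pending;
}

/* Rule 3: delete the timer on module exit.
 * Returns 1 if a pending timer was removed, 0 otherwise. */
static int sk_module_exit(struct sk_timer *t)
{
	int was_pending = (t->queued_on != NULL);

	t->queued_on = NULL;
	return was_pending;
}

/* The invariant cascade() relies on: a queued timer points at the
 * base it is queued on. */
static bool sk_consistent(const struct sk_timer *t)
{
	return t->queued_on == NULL || t->queued_on == t->base;
}
```

    Because only sk_module_init ever writes base without also updating queued_on, and it runs before the timer is started, repeated sk_ioctl_start calls from any CPU keep sk_consistent true in this model.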

    Also, a BUG_ON in the timer code can be caused by an SMP hardware problem. For details, search Yahoo for the keywords "bug_on panic in cascade in timer.c".

Posted by imouse, October 27, 2012, 03:23