nfsv3 lockd和write的重传定时器对比

内核改造:

具体内核修改代码点此链接

nfsv3 NLM请求超时的问题测试结果:

[   74.865468] RPC:    56 setting alarm for 10000 ms
[   85.116887] RPC:    56 setting alarm for 20000 ms
[  105.599780] RPC:    56 setting alarm for 30000 ms
[  135.796650] RPC:    56 setting alarm for 60000 ms
[  135.938420] RPC:    56 setting alarm for 10000 ms
[  146.050487] RPC:    56 setting alarm for 20000 ms

nfsv3 wirte请求超时问题测试结果:

[   96.740830] RPC:    48 setting alarm for 60000 ms
[  156.800261] RPC:    48 setting alarm for 120000 ms
[  279.680731] RPC:    48 setting alarm for 180000 ms
[  459.904366] RPC:    48 setting alarm for 60000 ms
[  521.354910] RPC:    48 setting alarm for 120000 ms

可以看到lockd操作的重传时间是每次增加10000ms 而write则是每次都在基础时间加上timeout

修改一下内核代码,查看为什么会这样。

 diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index b1abf4848bbc..74cd024b0139 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -617,11 +617,20 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
        const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
        int status = 0;
 
+
        if (time_before(jiffies, req->rq_majortimeo)) {
                if (to->to_exponential)
+               {
+                       printk("PID: %d  xprt_adjust_timeout_expon timeout_before = %ld\n",current->pid,req->rq_timeout);
                        req->rq_timeout <<= 1;
+                       printk("PID: %d  xprt_adjust_timeout_expon timeout_after = %ld\n",current->pid,req->rq_timeout);
+               }
                else
+               {
+                       printk("PID: %d  xprt_adjust_timeout timeout_before = %ld increment = %ld\n",current->pid,req->rq_timeout,to->to_increment);
                        req->rq_timeout += to->to_increment;
+                       printk("PID: %d  xprt_adjust_timeout_noexpon timeout_after = %ld\n",current->pid,req->rq_timeout);
+               }
                if (to->to_maxval && req->rq_timeout >= to->to_maxval)
                        req->rq_timeout = to->to_maxval;
                req->rq_retries++;

先测试一下lockd打印一下值看一下:

[  173.820271] RPC:    45 setting alarm for 60000 ms
[  173.847847] RPC:    46 setting alarm for 60000 ms
[  174.045381] RPC:    47 setting alarm for 60000 ms
[  174.152906] RPC:    48 setting alarm for 60000 ms
[  174.189646] RPC:    49 setting alarm for 60000 ms
[  174.277126] RPC:    49 setting alarm for 60000 ms
[  174.321568] RPC:    48 setting alarm for 10000 ms
[  174.450915] RPC:    50 setting alarm for 60000 ms
[  174.492788] RPC:    51 setting alarm for 60000 ms
[  174.577074] RPC:    51 setting alarm for 60000 ms
[  174.624928] RPC:    50 setting alarm for 60000 ms
[  174.644539] RPC:    50 setting alarm for 60000 ms
[  174.826054] RPC:    52 setting alarm for 10000 ms
[  184.971505] RPC:    52 setting alarm for 20000 ms
[  205.446885] RPC:    52 setting alarm for 30000 ms
[  235.641735] RPC:    52 setting alarm for 60000 ms
[  235.697804] RPC:    53 setting alarm for 60000 ms
[  235.759510] RPC:    53 setting alarm for 60000 ms
[  235.833243] RPC:    52 setting alarm for 10000 ms
[  245.898263] RPC:    52 setting alarm for 20000 ms
[  266.375841] RPC:    52 setting alarm for 30000 ms
[  296.565926] RPC:    52 setting alarm for 60000 ms
[  296.618845] RPC:    54 setting alarm for 60000 ms
[  296.689948] RPC:    54 setting alarm for 60000 ms
[  296.768341] RPC:    52 setting alarm for 10000 ms

#以52为例子
[  174.826054] RPC:    52 setting alarm for 10000 ms   # 第一次
[  184.971505] RPC:    52 setting alarm for 20000 ms   # 第二次
[  205.446885] RPC:    52 setting alarm for 30000 ms   # 第三次
[  235.641735] RPC:    52 setting alarm for 60000 ms   # 第四次
[  235.833243] RPC:    52 setting alarm for 10000 ms   # 又回到初始
[  245.898263] RPC:    52 setting alarm for 20000 ms
[  266.375841] RPC:    52 setting alarm for 30000 ms
[  296.565926] RPC:    52 setting alarm for 60000 ms
[  296.768341] RPC:    52 setting alarm for 10000 ms

[  174.209316] PID: 7  xprt_adjust_timeout timeout_before = 60000 increment = 0  
[  174.209317] PID: 7  xprt_adjust_timeout_noexpon timeout_after = 60000
[  174.326249] PID: 543  xprt_adjust_timeout timeout_before = 10000 increment = 10000  #可以看到每次的情况都是初始赋值就已经决定每次重传送时间增加的时间了。
[  174.326250] PID: 543  xprt_adjust_timeout_noexpon timeout_after = 20000
[  174.511857] PID: 7  xprt_adjust_timeout timeout_before = 60000 increment = 0
[  174.511858] PID: 7  xprt_adjust_timeout_noexpon timeout_after = 60000
[  174.631486] PID: 543  xprt_adjust_timeout timeout_before = 60000 increment = 0
[  174.631486] PID: 543  xprt_adjust_timeout_noexpon timeout_after = 60000
[  184.940567] PID: 543  xprt_adjust_timeout timeout_before = 10000 increment = 10000
[  184.943163] PID: 543  xprt_adjust_timeout_noexpon timeout_after = 20000
[  205.416110] PID: 543  xprt_adjust_timeout timeout_before = 20000 increment = 10000
[  205.418692] PID: 543  xprt_adjust_timeout_noexpon timeout_after = 30000
[  235.717880] PID: 179  xprt_adjust_timeout timeout_before = 60000 increment = 0
[  235.717883] PID: 179  xprt_adjust_timeout_noexpon timeout_after = 60000
[  245.867901] PID: 543  xprt_adjust_timeout timeout_before = 10000 increment = 10000
[  245.870501] PID: 543  xprt_adjust_timeout_noexpon timeout_after = 20000
[  266.347213] PID: 543  xprt_adjust_timeout timeout_before = 20000 increment = 10000
[  266.349671] PID: 543  xprt_adjust_timeout_noexpon timeout_after = 30000
[  296.638350] PID: 485  xprt_adjust_timeout timeout_before = 60000 increment = 0
[  296.638351] PID: 485  xprt_adjust_timeout_noexpon timeout_after = 60000

再测试wite看一下

[   60.751322] PID: 542  xprt_adjust_timeout timeout_before = 60000 increment = 0
[   60.753885] PID: 542  xprt_adjust_timeout_noexpon timeout_after = 60000
[   60.763116] PID: 542  xprt_adjust_timeout timeout_before = 60000 increment = 60000
[   60.765747] PID: 542  xprt_adjust_timeout_noexpon timeout_after = 120000 #write这里是按照nfs挂载参数来初始化增量参数的

照这么来看,好像这两个函数调用的都是线性增长? 只不过是timout的初始值,和increment赋值不一样。

所以找一下lockd这两个值初始化是在那里设置的吧?

在 fs/lockd/clntproc.c 的 nlmclnt_init() 中初始化

nlmclnt_init
    nlmclnt_lookup_host
        nlm_bind_host
            rpc_create
                rpc_create_args
                    .timeout    = &timeparms,
                        .to_initval = increment,
                        .to_increment   = increment,
                        .to_maxval  = increment * 6UL,
                            unsigned long increment = nlmsvc_timeout;
                                nlmsvc_timeout = nlm_timeout * HZ;
                                    static unsigned long        nlm_timeout = LOCKD_DFLT_TIMEO;
                                        #define LOCKD_DFLT_TIMEO    10

这里的代码说明了,在其nlm请求的时候,nlm的timout初始值和增加的值默认是等于10的

另外一个问题,好像并不是所有有的setting alram for 都是按照线性增加的,有一些rpc任务是重传一两次就结束了。

可以去看一下这重传一两次的请求是不是lockd请求?

在内核代码中修改:

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 92d88aa62085..16de9a96997b 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -93,6 +93,13 @@ __rpc_add_timer(struct rpc_wait_queue *queue, struct rpc_task *task)
        dprintk("RPC: %5u setting alarm for %u ms\n",
                task->tk_pid, jiffies_to_msecs(task->tk_timeout));
 
+
+       //判断NLM请求类型 
+    const char *progname = (task->tk_client && task->tk_client->cl_program) ? task->tk_client->cl_program->name : "NULL";
+    const char *procname = (task->tk_msg.rpc_proc) ? task->tk_msg.rpc_proc->p_name : "NULL";
+    printk("RPC: %5u, client=%s, proc=%s, timeout=%u ms\n",
+        task->tk_pid,progname, procname,jiffies_to_msecs(task->tk_timeout));
+
        task->u.tk_wait.expires = jiffies + task->tk_timeout;
        if (list_empty(&queue->timer_list.list) || time_before(task->u.tk_wait.expires, queue->timer_list.expires))
                rpc_set_queue_timer(queue, task->u.tk_wait.expires);
[  371.270194] RPC:    45, client=nfs, proc=GETATTR, timeout=60000 ms
[  371.387614] RPC:    46, client=nfs, proc=GETATTR, timeout=60000 ms
[  371.479991] RPC:    47, client=nfs, proc=ACCESS, timeout=60000 ms
[  371.509836] RPC:    48, client=lockd, proc=TEST, timeout=60000 ms
[  371.510276] RPC:    48, client=lockd, proc=TEST, timeout=10000 ms
[  371.510349] RPC:    48, client=lockd, proc=TEST, timeout=20000 ms
[  371.509912] RPC:    49, client=rpcbind, proc=GETPORT, timeout=60000 ms
[  371.510045] RPC:    49, client=rpcbind, proc=GETPORT, timeout=60000 ms

[  371.270193] RPC:    45 setting alarm for 60000 ms
[  371.387613] RPC:    46 setting alarm for 60000 ms
[  371.479991] RPC:    47 setting alarm for 60000 ms
[  371.509835] RPC:    48 setting alarm for 60000 ms
[  371.510274] RPC:    48 setting alarm for 10000 ms
[  371.510349] RPC:    48 setting alarm for 20000 ms
[  371.509911] RPC:    49 setting alarm for 60000 ms
[  371.510044] RPC:    49 setting alarm for 60000 ms

可以看到当我的client=lockd则是NLM请求,proc名称则是TEST也与NLM请求相关,这样就揭示了为什么从日志信息来看会有不同从初始时间。

是否会造成5s这样的倍数? 这里有一个5的倍数代码:

get_lockd_grace_period return nlm_timeout * 5 * HZ;

在这个代码里面增加打印

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 346ed161756d..acb013fc39f3 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -88,11 +88,19 @@ static struct ctl_table_header * nlm_sysctl_table;
 
 static unsigned long get_lockd_grace_period(void)
 {
+       /* 打印调用栈 */
+    dump_stack();
        /* Note: nlm_timeout should always be nonzero */
        if (nlm_grace_period)
+       {
+               printk("get_lockd_grace_period = %ld",roundup(nlm_grace_period, nlm_timeout));
                return roundup(nlm_grace_period, nlm_timeout) * HZ;
+       }
        else
+       {
+               printk("get_lockd_grace_period nlm_timeout = %ld",nlm_timeout * 5 * HZ);
                return nlm_timeout * 5 * HZ;
+       }
 }

打印到日志查看一下这个值的内容: [ 266.981120] get_lockd_grace_period nlm_timeout = 50000

确实调用了,但只是调用了一次

这个函数是计算 NFS 锁管理服务的客户端恢复崩溃后重新声明锁的时间窗口 从日志信息来看而且这个函数是会发生在nlmsvc_lock之前

这个函数的调用栈:

vfs_write
    __vfs_write
        nfsd_svc
            nfsd_startup_net
                lockd_up.cold
                    set_grace_period
                        get_lockd_grace_period