{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "title: stata教程08-中介效应分析\n",
    "date: 2018-12-22 15:17:55\n",
    "tags: [stata]\n",
    "toc: true\n",
    "mathjax: true\n",
    "\n",
    "---\n",
    "\n",
    "\n",
    "<span></span>\n",
    "<!-- more -->"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 中介分析原理\n",
    "\n",
    "下面是我之前写过的关于中介效应的文章, 大家看后就知道原理了:\n",
    "\n",
    "<a href=\"/2015/05/23/SPSS实例：[16]中介效应的检验过程/\">\n",
    "    <span class=\"archive-date\">\n",
    "        May 2015\n",
    "    </span>\n",
    "    SPSS实例：[16]中介效应的检验过程\n",
    "</a>\n",
    "<a href=\"/2016/02/01/SPSS实例：[18]中介效应占总效应百分比/\">\n",
    "    <span class=\"archive-date\">\n",
    "        Feb 2016\n",
    "    </span>\n",
    "    SPSS实例：[18]中介效应占总效应百分比\n",
    "</a>\n",
    "\n",
    "<a href=\"/2016/01/11/SPSS实例：[20]检验中介效应的操作方法/\">\n",
    "    <span class=\"archive-date\">\n",
    "        Jan 2016\n",
    "    </span>\n",
    "    SPSS实例：[20]检验中介效应的操作方法\n",
    "</a>\n",
    "\n",
    "<a href=\"/2016/10/15/SPSS实例：[17]进行sobel检验(小白教程)/\">\n",
    "    <span class=\"archive-date\">\n",
    "        Oct 2016\n",
    "    </span>\n",
    "    SPSS实例：[17]进行sobel检验(小白教程)\n",
    "</a>\n",
    "\n",
    "<a href=\"https://mlln.cn/2018/09/26/%E5%9C%A8%E7%BA%BF%E7%BB%98%E5%88%B6%E4%B8%AD%E4%BB%8B%E6%95%88%E5%BA%94%E5%9B%BE/\">\n",
    "    <span class=\"archive-date\">\n",
    "        Oct 2016\n",
    "    </span>\n",
    "    在线绘制中介效应图\n",
    "</a>\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这里我重新声明一下具体的过程:\n",
    "\n",
    "下面的回归模型中都带有控制变量，只不过为了简洁，没有在下面描述。首先使用自变量ind预测因变量dep, 得到模型1(`dep=c1 * ind +e1`), 然后使用自变量ind预测中介变量med, 得到模型2(`med=a * ind +e2`), 最后使用自变量ind和中介变量med预测因变量dep, 得到模型3(`dep=b* med + c2 * ind + e3`)。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 本案例的数据介绍\n",
    "\n",
    "本案例使用的是自己编制的数据，自变量就是ind, 因变量就是dep, 中介变量就是med, 其他控制变量都以`control+数字`的格式命名。\n",
    "\n",
    "下面加载这个数据:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "use \"data/mediator-data.dta\", clear"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 过滤缺失值\n",
    "\n",
    "我们需要做三个回归分析, 但是因为回归分析涉及的变量不同, 如果变量存在缺失值, 那么很有可能造成三个回归方程使用的观测数据有差异(因为有不同的缺失值), 所以我们再做回归之前, 先要生成一个miss_num变量, 如果自变量/中介变量/因变量/控制变量都没有缺失, 那么miss_num=0, 否则miss_num>0。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "egen miss_num = rowmiss(dep med ind control1 control2 control3 control4 control5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "看一下缺失情况: 从下表可以看出, 没有缺失的有861个样本。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "   miss_num |      Freq.     Percent        Cum.\n",
      "------------+-----------------------------------\n",
      "          0 |        861       73.72       73.72\n",
      "          1 |        298       25.51       99.23\n",
      "          2 |          1        0.09       99.32\n",
      "          3 |          1        0.09       99.40\n",
      "          4 |          7        0.60      100.00\n",
      "------------+-----------------------------------\n",
      "      Total |      1,168      100.00\n"
     ]
    }
   ],
   "source": [
    "tab miss_num"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 回归1: 自变量预测因变量\n",
    "\n",
    "`dep=c1 * ind +e1`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "      Source |       SS           df       MS      Number of obs   =       861\n",
      "-------------+----------------------------------   F(6, 854)       =      2.76\n",
      "       Model |  .459843972         6  .076640662   Prob > F        =    0.0115\n",
      "    Residual |  23.6942548       854  .027745029   R-squared       =    0.0190\n",
      "-------------+----------------------------------   Adj R-squared   =    0.0121\n",
      "       Total |  24.1540988       860  .028086161   Root MSE        =    .16657\n",
      "\n",
      "------------------------------------------------------------------------------\n",
      "         dep |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\n",
      "-------------+----------------------------------------------------------------\n",
      "         ind |  -.0745096   .0352931    -2.11   0.035    -.1437809   -.0052382\n",
      "    control1 |  -.0003018   .0043662    -0.07   0.945    -.0088715     .008268\n",
      "    control2 |  -.0133247     .00578    -2.31   0.021    -.0246693     -.00198\n",
      "    control3 |  -.0044711   .0044726    -1.00   0.318    -.0132497    .0043075\n",
      "    control4 |   .1799002   .0983623     1.83   0.068      -.01316    .3729603\n",
      "    control5 |   .0340114   .0191444     1.78   0.076    -.0035642    .0715871\n",
      "       _cons |   .7167078   .2593365     2.76   0.006     .2076962     1.22572\n",
      "------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "reg dep ind control1 control2 control3 control4 control5 if miss_num==0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从上面的结果中可以看到, c1这个系数是显著的, c1 = -.0745096, sc1 = .0352931"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 回归2: 自变量预测中介变量\n",
    "\n",
    "`med=a * ind +e2`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "      Source |       SS           df       MS      Number of obs   =       861\n",
      "-------------+----------------------------------   F(6, 854)       =      2.90\n",
      "       Model |  3.83093084         6  .638488473   Prob > F        =    0.0083\n",
      "    Residual |  187.834574       854  .219946808   R-squared       =    0.0200\n",
      "-------------+----------------------------------   Adj R-squared   =    0.0131\n",
      "       Total |  191.665505       860  .222866867   Root MSE        =    .46898\n",
      "\n",
      "------------------------------------------------------------------------------\n",
      "         med |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\n",
      "-------------+----------------------------------------------------------------\n",
      "         ind |  -.1950457   .0993702    -1.96   0.050    -.3900841   -7.35e-06\n",
      "    control1 |   .0279925   .0122933     2.28   0.023     .0038638    .0521211\n",
      "    control2 |   .0083398    .016274     0.51   0.608    -.0236019    .0402814\n",
      "    control3 |   .0271718   .0125929     2.16   0.031     .0024551    .0518885\n",
      "    control4 |   .8851904   .2769459     3.20   0.001     .3416161    1.428765\n",
      "    control5 |   .0229235   .0539025     0.43   0.671    -.0828733    .1287204\n",
      "       _cons |   2.094221     .73018     2.87   0.004     .6610633    3.527379\n",
      "------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "reg med ind control1 control2 control3 control4 control5 if miss_num == 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从上面的结果中可以看到, 这个系数是显著的, a =-.1950457, sa = .0993702"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 回归3: 自变量和中介变量预测因变量\n",
    "\n",
    "`dep=b* med + c2 * ind + e3`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "      Source |       SS           df       MS      Number of obs   =       861\n",
      "-------------+----------------------------------   F(7, 853)       =      2.99\n",
      "       Model |  .578216371         7  .082602339   Prob > F        =    0.0042\n",
      "    Residual |  23.5758824       853  .027638784   R-squared       =    0.0239\n",
      "-------------+----------------------------------   Adj R-squared   =    0.0159\n",
      "       Total |  24.1540988       860  .028086161   Root MSE        =    .16625\n",
      "\n",
      "------------------------------------------------------------------------------\n",
      "         dep |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\n",
      "-------------+----------------------------------------------------------------\n",
      "         med |  -.0251037   .0121303    -2.07   0.039    -.0489124   -.0012949\n",
      "         ind |  -.0794059   .0353048    -2.25   0.025    -.1487004   -.0101114\n",
      "    control1 |    .000401    .004371     0.09   0.927    -.0081783    .0089802\n",
      "    control2 |  -.0131153   .0057698    -2.27   0.023      -.02444   -.0017906\n",
      "    control3 |   -.003789   .0044762    -0.85   0.398    -.0125746    .0049966\n",
      "    control4 |   .2021217   .0987592     2.05   0.041     .0082821    .3959613\n",
      "    control5 |   .0345869   .0191098     1.81   0.071    -.0029208    .0720946\n",
      "       _cons |   .7692805   .2600831     2.96   0.003     .2588026    1.279758\n",
      "------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "reg dep med ind control1 control2 control3 control4 control5 if miss_num == 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从上面的结果中可以看到, 这个b系数是显著的, b =--.0251037, sb =  .0121303\n",
    "\n",
    "c2系数也是显著的: c2 = -.0794059, sc2 = .0353048"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 结论\n",
    "\n",
    "因为所有系数都是显著的, 所以我们可以认为中介效应是存在的, 并且属于部分中介效应。中介的效应量可以这样计算:\n",
    "\n",
    "`m = a*b`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ".00489637\n"
     ]
    }
   ],
   "source": [
    "display -.1950457 * -.0251037"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "中介效应占总效应的百分比就是:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-6.5714602\n"
     ]
    }
   ],
   "source": [
    "display (-.1950457 * -.0251037) /  -.0745096 * 100"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 使用插件\n",
    "\n",
    "实际上我们可以把以上的代码都封装成一个命令, 恰好我在网上找到了一段代码, 可以做中介效应。你需要把以下代码保存到stata安装目录的这个路径下:\n",
    "\n",
    "`Stata15\\ado\\base\\s`, 在这个文件夹下创建一个文件名为`sgmediation.ado`, 把以下代码贴进去, 然后重启stata。\n",
    "\n",
    "```stata\n",
    "*! version 1.1.1 -- 5/17/06 -- pbe\n",
    "*! verion 1.0 -- 2/28/05 -- pbe\n",
    "program define sgmediation\n",
    "/* sobel-goodman mediation tests */\n",
    "version 8.0\n",
    "syntax varlist(max=1) [if/] [in], iv(varlist numeric max=1) ///\n",
    "   mv(varlist numeric max=1) [ cv(varlist numeric) BOOTstrap reps(integer 200) level(integer 95)]\n",
    "marksample touse\n",
    "markout `touse' `varlist' `mv' `iv' `cv'\n",
    "tempname coef emat\n",
    "\n",
    "display\n",
    "tabulate  `mv' if `touse'\n",
    "\n",
    "display\n",
    "display as text \"Model with dv regressed on iv\"\n",
    "regress `varlist' `iv' `cv' if `touse'\n",
    "local ccoef=_b[`iv']\n",
    "\n",
    "display\n",
    "display \"Model with mediator regressed on iv\"\n",
    "regress `mv' `iv' `cv' if `touse'\n",
    "\n",
    "local acoef=_b[`iv']\n",
    "local avar=_se[`iv']^2\n",
    "\n",
    "display\n",
    "display \"Model with dv regressed on mediator and iv\"\n",
    "regress `varlist' `mv' `iv' `cv' if `touse'\n",
    "\n",
    "local bcoef=_b[`mv']\n",
    "local bvar=_se[`mv']^2\n",
    "\n",
    "local sobel =(`acoef'*`bcoef')\n",
    "local serr=sqrt(`bcoef'^2*`avar' + `acoef'^2*`bvar')\n",
    "local stest=`sobel'/`serr'\n",
    "local g1err=sqrt(`bcoef'^2*`avar' + `acoef'^2*`bvar' + `avar'*`bvar')\n",
    "local good1=`sobel'/`g1err'\n",
    "local g2err=sqrt(`bcoef'^2*`avar' + `acoef'^2*`bvar' - `avar'*`bvar')\n",
    "local good2=`sobel'/`g2err'\n",
    "local toteff = `sobel'/((`acoef'*`bcoef')+(`ccoef'-(`acoef'*`bcoef')))\n",
    "local ratio = `sobel'/((`ccoef'-(`acoef'*`bcoef')))\n",
    "\n",
    "display\n",
    "display \"Sobel-Goodman Mediation Tests\"\n",
    "display\n",
    "display \"             Coef         Std Err     Z           P>|Z|\"\n",
    "display as txt \"Sobel       \" as res `sobel' _skip(4) `serr'  %8.4g ///\n",
    "`stest', _skip(5) 2*(1-norm(abs(`stest')))\n",
    "display as txt \"Goodman-1   \" as res `sobel' _skip(4) `g1err' %8.4g ///\n",
    "`good1', _skip(5) 2*(1-norm(abs(`good1')))\n",
    "display as txt \"Goodman-2   \" as res `sobel' _skip(4) `g2err' %8.4g ///\n",
    "`good2', _skip(5) 2*(1-norm(abs(`good2')))\n",
    "display\n",
    "display as txt \"Pecent of total effect that is mediated: \", as res ///\n",
    "%5.2f 100*`toteff',\"%\"\n",
    "display as txt \"Ratio of indirect to direct effect:     \", as res %8.4f `ratio'\n",
    "\n",
    "if \"`bootstrap'\"~=\"\" {\n",
    "  display \n",
    "  display as txt \"Percentile and Bias-corrected bootstrap results for Sobel: `reps' replications\"\n",
    "  display\n",
    "\n",
    "  quietly bootstrap coef=r(sobel), reps(`reps') level(`level'): sgboot `varlist' , mv(`mv') iv(`iv') cv(`cv' )\n",
    "  estat bootstrap, bc percentile noheader\n",
    "  }\n",
    "\n",
    "end\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后你在使用的时候, 就可以直接调用这个命令即可:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "        med |      Freq.     Percent        Cum.\n",
      "------------+-----------------------------------\n",
      "          0 |        573       66.55       66.55\n",
      "          1 |        288       33.45      100.00\n",
      "------------+-----------------------------------\n",
      "      Total |        861      100.00\n",
      "\n",
      "        med |      Freq.     Percent        Cum.\n",
      "------------+-----------------------------------\n",
      "          0 |        730       62.50       62.50\n",
      "          1 |        438       37.50      100.00\n",
      "------------+-----------------------------------\n",
      "      Total |      1,168      100.00\n",
      "\n",
      "Model with dv regressed on iv\n",
      "\n",
      "      Source |       SS           df       MS      Number of obs   =       861\n",
      "-------------+----------------------------------   F(6, 854)       =      2.76\n",
      "       Model |  .459843972         6  .076640662   Prob > F        =    0.0115\n",
      "    Residual |  23.6942548       854  .027745029   R-squared       =    0.0190\n",
      "-------------+----------------------------------   Adj R-squared   =    0.0121\n",
      "       Total |  24.1540988       860  .028086161   Root MSE        =    .16657\n",
      "\n",
      "------------------------------------------------------------------------------\n",
      "         dep |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\n",
      "-------------+----------------------------------------------------------------\n",
      "         ind |  -.0745096   .0352931    -2.11   0.035    -.1437809   -.0052382\n",
      "    control1 |  -.0003018   .0043662    -0.07   0.945    -.0088715     .008268\n",
      "    control2 |  -.0133247     .00578    -2.31   0.021    -.0246693     -.00198\n",
      "    control3 |  -.0044711   .0044726    -1.00   0.318    -.0132497    .0043075\n",
      "    control4 |   .1799002   .0983623     1.83   0.068      -.01316    .3729603\n",
      "    control5 |   .0340114   .0191444     1.78   0.076    -.0035642    .0715871\n",
      "       _cons |   .7167078   .2593365     2.76   0.006     .2076962     1.22572\n",
      "------------------------------------------------------------------------------\n",
      "\n",
      "Model with mediator regressed on iv\n",
      "\n",
      "      Source |       SS           df       MS      Number of obs   =       861\n",
      "-------------+----------------------------------   F(6, 854)       =      2.90\n",
      "       Model |  3.83093084         6  .638488473   Prob > F        =    0.0083\n",
      "    Residual |  187.834574       854  .219946808   R-squared       =    0.0200\n",
      "-------------+----------------------------------   Adj R-squared   =    0.0131\n",
      "       Total |  191.665505       860  .222866867   Root MSE        =    .46898\n",
      "\n",
      "------------------------------------------------------------------------------\n",
      "         med |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\n",
      "-------------+----------------------------------------------------------------\n",
      "         ind |  -.1950457   .0993702    -1.96   0.050    -.3900841   -7.35e-06\n",
      "    control1 |   .0279925   .0122933     2.28   0.023     .0038638    .0521211\n",
      "    control2 |   .0083398    .016274     0.51   0.608    -.0236019    .0402814\n",
      "    control3 |   .0271718   .0125929     2.16   0.031     .0024551    .0518885\n",
      "    control4 |   .8851904   .2769459     3.20   0.001     .3416161    1.428765\n",
      "    control5 |   .0229235   .0539025     0.43   0.671    -.0828733    .1287204\n",
      "       _cons |   2.094221     .73018     2.87   0.004     .6610633    3.527379\n",
      "------------------------------------------------------------------------------\n",
      "\n",
      "Model with dv regressed on mediator and iv\n",
      "\n",
      "      Source |       SS           df       MS      Number of obs   =       861\n",
      "-------------+----------------------------------   F(7, 853)       =      2.99\n",
      "       Model |  .578216371         7  .082602339   Prob > F        =    0.0042\n",
      "    Residual |  23.5758824       853  .027638784   R-squared       =    0.0239\n",
      "-------------+----------------------------------   Adj R-squared   =    0.0159\n",
      "       Total |  24.1540988       860  .028086161   Root MSE        =    .16625\n",
      "\n",
      "------------------------------------------------------------------------------\n",
      "         dep |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\n",
      "-------------+----------------------------------------------------------------\n",
      "         med |  -.0251037   .0121303    -2.07   0.039    -.0489124   -.0012949\n",
      "         ind |  -.0794059   .0353048    -2.25   0.025    -.1487004   -.0101114\n",
      "    control1 |    .000401    .004371     0.09   0.927    -.0081783    .0089802\n",
      "    control2 |  -.0131153   .0057698    -2.27   0.023      -.02444   -.0017906\n",
      "    control3 |   -.003789   .0044762    -0.85   0.398    -.0125746    .0049966\n",
      "    control4 |   .2021217   .0987592     2.05   0.041     .0082821    .3959613\n",
      "    control5 |   .0345869   .0191098     1.81   0.071    -.0029208    .0720946\n",
      "       _cons |   .7692805   .2600831     2.96   0.003     .2588026    1.279758\n",
      "------------------------------------------------------------------------------\n",
      "\n",
      "Sobel-Goodman Mediation Tests\n",
      "\n",
      "             Coef         Std Err     Z           P>|Z|\n",
      "Sobel       .00489637    .       .      .\n",
      "Goodman-1   .00489637    .       .      .\n",
      "Goodman-2   .00489637    .       .      .\n",
      "\n",
      "Pecent of total effect that is mediated:  -6.57 %\n",
      "Ratio of indirect to direct effect:       -0.0617\n"
     ]
    }
   ],
   "source": [
    "sgmediation dep, mv(med) iv(ind) cv(control1 control2 control3 control4 control5  )"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Stata",
   "language": "stata",
   "name": "stata"
  },
  "language_info": {
   "file_extension": ".do",
   "mimetype": "text/x-stata",
   "name": "stata"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}