{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"title: stata教程08-中介效应分析\n",
"date: 2018-12-22 15:17:55\n",
"tags: [stata]\n",
"toc: true\n",
"mathjax: true\n",
"\n",
"---\n",
"\n",
"\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 中介分析原理\n",
"\n",
"下面是我之前写过的关于中介效应的文章, 大家看后就知道原理了:\n",
"\n",
"\n",
" \n",
" May 2015\n",
" \n",
" SPSS实例:[16]中介效应的检验过程\n",
"\n",
"\n",
" \n",
" Feb 2016\n",
" \n",
" SPSS实例:[18]中介效应占总效应百分比\n",
"\n",
"\n",
"\n",
" \n",
" Jan 2016\n",
" \n",
" SPSS实例:[20]检验中介效应的操作方法\n",
"\n",
"\n",
"\n",
" \n",
" Oct 2016\n",
" \n",
" SPSS实例:[17]进行sobel检验(小白教程)\n",
"\n",
"\n",
"\n",
" \n",
" Oct 2016\n",
" \n",
" 在线绘制中介效应图\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"在这里我重新声明一下具体的过程:\n",
"\n",
"下面的回归模型中都带有控制变量,只不过为了简洁,没有在下面描述。首先使用自变量ind预测因变量dep, 得到模型1(`dep=c1 * ind +e1`), 然后使用自变量ind预测中介变量med, 得到模型2(`med=a * ind +e2`), 最后使用自变量ind和中介变量med预测因变量dep, 得到模型3(`dep=b* med + c2 * ind + e3`)。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 本案例的数据介绍\n",
"\n",
"本案例使用的是自己编制的数据,自变量就是ind, 因变量就是dep, 中介变量就是med, 其他控制变量都以`control+数字`的格式命名。\n",
"\n",
"下面加载这个数据:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"use \"data/mediator-data.dta\", clear"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 过滤缺失值\n",
"\n",
"我们需要做三个回归分析, 但是因为回归分析涉及的变量不同, 如果变量存在缺失值, 那么很有可能造成三个回归方程使用的观测数据有差异(因为有不同的缺失值), 所以我们再做回归之前, 先要生成一个miss_num变量, 如果自变量/中介变量/因变量/控制变量都没有缺失, 那么miss_num=0, 否则miss_num>0。"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"egen miss_num = rowmiss(dep med ind control1 control2 control3 control4 control5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"看一下缺失情况: 从下表可以看出, 没有缺失的有861个样本。"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" miss_num | Freq. Percent Cum.\n",
"------------+-----------------------------------\n",
" 0 | 861 73.72 73.72\n",
" 1 | 298 25.51 99.23\n",
" 2 | 1 0.09 99.32\n",
" 3 | 1 0.09 99.40\n",
" 4 | 7 0.60 100.00\n",
"------------+-----------------------------------\n",
" Total | 1,168 100.00\n"
]
}
],
"source": [
"tab miss_num"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 回归1: 自变量预测因变量\n",
"\n",
"`dep=c1 * ind +e1`"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" Source | SS df MS Number of obs = 861\n",
"-------------+---------------------------------- F(6, 854) = 2.76\n",
" Model | .459843972 6 .076640662 Prob > F = 0.0115\n",
" Residual | 23.6942548 854 .027745029 R-squared = 0.0190\n",
"-------------+---------------------------------- Adj R-squared = 0.0121\n",
" Total | 24.1540988 860 .028086161 Root MSE = .16657\n",
"\n",
"------------------------------------------------------------------------------\n",
" dep | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n",
"-------------+----------------------------------------------------------------\n",
" ind | -.0745096 .0352931 -2.11 0.035 -.1437809 -.0052382\n",
" control1 | -.0003018 .0043662 -0.07 0.945 -.0088715 .008268\n",
" control2 | -.0133247 .00578 -2.31 0.021 -.0246693 -.00198\n",
" control3 | -.0044711 .0044726 -1.00 0.318 -.0132497 .0043075\n",
" control4 | .1799002 .0983623 1.83 0.068 -.01316 .3729603\n",
" control5 | .0340114 .0191444 1.78 0.076 -.0035642 .0715871\n",
" _cons | .7167078 .2593365 2.76 0.006 .2076962 1.22572\n",
"------------------------------------------------------------------------------\n"
]
}
],
"source": [
"reg dep ind control1 control2 control3 control4 control5 if miss_num==0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"从上面的结果中可以看到, c1这个系数是显著的, c1 = -.0745096, sc1 = .0352931"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 回归2: 自变量预测中介变量\n",
"\n",
"`med=a * ind +e2`"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" Source | SS df MS Number of obs = 861\n",
"-------------+---------------------------------- F(6, 854) = 2.90\n",
" Model | 3.83093084 6 .638488473 Prob > F = 0.0083\n",
" Residual | 187.834574 854 .219946808 R-squared = 0.0200\n",
"-------------+---------------------------------- Adj R-squared = 0.0131\n",
" Total | 191.665505 860 .222866867 Root MSE = .46898\n",
"\n",
"------------------------------------------------------------------------------\n",
" med | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n",
"-------------+----------------------------------------------------------------\n",
" ind | -.1950457 .0993702 -1.96 0.050 -.3900841 -7.35e-06\n",
" control1 | .0279925 .0122933 2.28 0.023 .0038638 .0521211\n",
" control2 | .0083398 .016274 0.51 0.608 -.0236019 .0402814\n",
" control3 | .0271718 .0125929 2.16 0.031 .0024551 .0518885\n",
" control4 | .8851904 .2769459 3.20 0.001 .3416161 1.428765\n",
" control5 | .0229235 .0539025 0.43 0.671 -.0828733 .1287204\n",
" _cons | 2.094221 .73018 2.87 0.004 .6610633 3.527379\n",
"------------------------------------------------------------------------------\n"
]
}
],
"source": [
"reg med ind control1 control2 control3 control4 control5 if miss_num == 0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"从上面的结果中可以看到, 这个系数是显著的, a =-.1950457, sa = .0993702"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 回归3: 自变量和中介变量预测因变量\n",
"\n",
"`dep=b* med + c2 * ind + e3`"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" Source | SS df MS Number of obs = 861\n",
"-------------+---------------------------------- F(7, 853) = 2.99\n",
" Model | .578216371 7 .082602339 Prob > F = 0.0042\n",
" Residual | 23.5758824 853 .027638784 R-squared = 0.0239\n",
"-------------+---------------------------------- Adj R-squared = 0.0159\n",
" Total | 24.1540988 860 .028086161 Root MSE = .16625\n",
"\n",
"------------------------------------------------------------------------------\n",
" dep | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n",
"-------------+----------------------------------------------------------------\n",
" med | -.0251037 .0121303 -2.07 0.039 -.0489124 -.0012949\n",
" ind | -.0794059 .0353048 -2.25 0.025 -.1487004 -.0101114\n",
" control1 | .000401 .004371 0.09 0.927 -.0081783 .0089802\n",
" control2 | -.0131153 .0057698 -2.27 0.023 -.02444 -.0017906\n",
" control3 | -.003789 .0044762 -0.85 0.398 -.0125746 .0049966\n",
" control4 | .2021217 .0987592 2.05 0.041 .0082821 .3959613\n",
" control5 | .0345869 .0191098 1.81 0.071 -.0029208 .0720946\n",
" _cons | .7692805 .2600831 2.96 0.003 .2588026 1.279758\n",
"------------------------------------------------------------------------------\n"
]
}
],
"source": [
"reg dep med ind control1 control2 control3 control4 control5 if miss_num == 0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"从上面的结果中可以看到, 这个b系数是显著的, b =--.0251037, sb = .0121303\n",
"\n",
"c2系数也是显著的: c2 = -.0794059, sc2 = .0353048"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 结论\n",
"\n",
"因为所有系数都是显著的, 所以我们可以认为中介效应是存在的, 并且属于部分中介效应。中介的效应量可以这样计算:\n",
"\n",
"`m = a*b`"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
".00489637\n"
]
}
],
"source": [
"display -.1950457 * -.0251037"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"中介效应占总效应的百分比就是:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-6.5714602\n"
]
}
],
"source": [
"display (-.1950457 * -.0251037) / -.0745096 * 100"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 使用插件\n",
"\n",
"实际上我们可以把以上的代码都封装成一个命令, 恰好我在网上找到了一段代码, 可以做中介效应。你需要把以下代码保存到stata安装目录的这个路径下:\n",
"\n",
"`Stata15\\ado\\base\\s`, 在这个文件夹下创建一个文件名为`sgmediation.ado`, 把以下代码贴进去, 然后重启stata。\n",
"\n",
"```stata\n",
"*! version 1.1.1 -- 5/17/06 -- pbe\n",
"*! verion 1.0 -- 2/28/05 -- pbe\n",
"program define sgmediation\n",
"/* sobel-goodman mediation tests */\n",
"version 8.0\n",
"syntax varlist(max=1) [if/] [in], iv(varlist numeric max=1) ///\n",
" mv(varlist numeric max=1) [ cv(varlist numeric) BOOTstrap reps(integer 200) level(integer 95)]\n",
"marksample touse\n",
"markout `touse' `varlist' `mv' `iv' `cv'\n",
"tempname coef emat\n",
"\n",
"display\n",
"tabulate `mv' if `touse'\n",
"\n",
"display\n",
"display as text \"Model with dv regressed on iv\"\n",
"regress `varlist' `iv' `cv' if `touse'\n",
"local ccoef=_b[`iv']\n",
"\n",
"display\n",
"display \"Model with mediator regressed on iv\"\n",
"regress `mv' `iv' `cv' if `touse'\n",
"\n",
"local acoef=_b[`iv']\n",
"local avar=_se[`iv']^2\n",
"\n",
"display\n",
"display \"Model with dv regressed on mediator and iv\"\n",
"regress `varlist' `mv' `iv' `cv' if `touse'\n",
"\n",
"local bcoef=_b[`mv']\n",
"local bvar=_se[`mv']^2\n",
"\n",
"local sobel =(`acoef'*`bcoef')\n",
"local serr=sqrt(`bcoef'^2*`avar' + `acoef'^2*`bvar')\n",
"local stest=`sobel'/`serr'\n",
"local g1err=sqrt(`bcoef'^2*`avar' + `acoef'^2*`bvar' + `avar'*`bvar')\n",
"local good1=`sobel'/`g1err'\n",
"local g2err=sqrt(`bcoef'^2*`avar' + `acoef'^2*`bvar' - `avar'*`bvar')\n",
"local good2=`sobel'/`g2err'\n",
"local toteff = `sobel'/((`acoef'*`bcoef')+(`ccoef'-(`acoef'*`bcoef')))\n",
"local ratio = `sobel'/((`ccoef'-(`acoef'*`bcoef')))\n",
"\n",
"display\n",
"display \"Sobel-Goodman Mediation Tests\"\n",
"display\n",
"display \" Coef Std Err Z P>|Z|\"\n",
"display as txt \"Sobel \" as res `sobel' _skip(4) `serr' %8.4g ///\n",
"`stest', _skip(5) 2*(1-norm(abs(`stest')))\n",
"display as txt \"Goodman-1 \" as res `sobel' _skip(4) `g1err' %8.4g ///\n",
"`good1', _skip(5) 2*(1-norm(abs(`good1')))\n",
"display as txt \"Goodman-2 \" as res `sobel' _skip(4) `g2err' %8.4g ///\n",
"`good2', _skip(5) 2*(1-norm(abs(`good2')))\n",
"display\n",
"display as txt \"Pecent of total effect that is mediated: \", as res ///\n",
"%5.2f 100*`toteff',\"%\"\n",
"display as txt \"Ratio of indirect to direct effect: \", as res %8.4f `ratio'\n",
"\n",
"if \"`bootstrap'\"~=\"\" {\n",
" display \n",
" display as txt \"Percentile and Bias-corrected bootstrap results for Sobel: `reps' replications\"\n",
" display\n",
"\n",
" quietly bootstrap coef=r(sobel), reps(`reps') level(`level'): sgboot `varlist' , mv(`mv') iv(`iv') cv(`cv' )\n",
" estat bootstrap, bc percentile noheader\n",
" }\n",
"\n",
"end\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"最后你在使用的时候, 就可以直接调用这个命令即可:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
" med | Freq. Percent Cum.\n",
"------------+-----------------------------------\n",
" 0 | 573 66.55 66.55\n",
" 1 | 288 33.45 100.00\n",
"------------+-----------------------------------\n",
" Total | 861 100.00\n",
"\n",
" med | Freq. Percent Cum.\n",
"------------+-----------------------------------\n",
" 0 | 730 62.50 62.50\n",
" 1 | 438 37.50 100.00\n",
"------------+-----------------------------------\n",
" Total | 1,168 100.00\n",
"\n",
"Model with dv regressed on iv\n",
"\n",
" Source | SS df MS Number of obs = 861\n",
"-------------+---------------------------------- F(6, 854) = 2.76\n",
" Model | .459843972 6 .076640662 Prob > F = 0.0115\n",
" Residual | 23.6942548 854 .027745029 R-squared = 0.0190\n",
"-------------+---------------------------------- Adj R-squared = 0.0121\n",
" Total | 24.1540988 860 .028086161 Root MSE = .16657\n",
"\n",
"------------------------------------------------------------------------------\n",
" dep | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n",
"-------------+----------------------------------------------------------------\n",
" ind | -.0745096 .0352931 -2.11 0.035 -.1437809 -.0052382\n",
" control1 | -.0003018 .0043662 -0.07 0.945 -.0088715 .008268\n",
" control2 | -.0133247 .00578 -2.31 0.021 -.0246693 -.00198\n",
" control3 | -.0044711 .0044726 -1.00 0.318 -.0132497 .0043075\n",
" control4 | .1799002 .0983623 1.83 0.068 -.01316 .3729603\n",
" control5 | .0340114 .0191444 1.78 0.076 -.0035642 .0715871\n",
" _cons | .7167078 .2593365 2.76 0.006 .2076962 1.22572\n",
"------------------------------------------------------------------------------\n",
"\n",
"Model with mediator regressed on iv\n",
"\n",
" Source | SS df MS Number of obs = 861\n",
"-------------+---------------------------------- F(6, 854) = 2.90\n",
" Model | 3.83093084 6 .638488473 Prob > F = 0.0083\n",
" Residual | 187.834574 854 .219946808 R-squared = 0.0200\n",
"-------------+---------------------------------- Adj R-squared = 0.0131\n",
" Total | 191.665505 860 .222866867 Root MSE = .46898\n",
"\n",
"------------------------------------------------------------------------------\n",
" med | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n",
"-------------+----------------------------------------------------------------\n",
" ind | -.1950457 .0993702 -1.96 0.050 -.3900841 -7.35e-06\n",
" control1 | .0279925 .0122933 2.28 0.023 .0038638 .0521211\n",
" control2 | .0083398 .016274 0.51 0.608 -.0236019 .0402814\n",
" control3 | .0271718 .0125929 2.16 0.031 .0024551 .0518885\n",
" control4 | .8851904 .2769459 3.20 0.001 .3416161 1.428765\n",
" control5 | .0229235 .0539025 0.43 0.671 -.0828733 .1287204\n",
" _cons | 2.094221 .73018 2.87 0.004 .6610633 3.527379\n",
"------------------------------------------------------------------------------\n",
"\n",
"Model with dv regressed on mediator and iv\n",
"\n",
" Source | SS df MS Number of obs = 861\n",
"-------------+---------------------------------- F(7, 853) = 2.99\n",
" Model | .578216371 7 .082602339 Prob > F = 0.0042\n",
" Residual | 23.5758824 853 .027638784 R-squared = 0.0239\n",
"-------------+---------------------------------- Adj R-squared = 0.0159\n",
" Total | 24.1540988 860 .028086161 Root MSE = .16625\n",
"\n",
"------------------------------------------------------------------------------\n",
" dep | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n",
"-------------+----------------------------------------------------------------\n",
" med | -.0251037 .0121303 -2.07 0.039 -.0489124 -.0012949\n",
" ind | -.0794059 .0353048 -2.25 0.025 -.1487004 -.0101114\n",
" control1 | .000401 .004371 0.09 0.927 -.0081783 .0089802\n",
" control2 | -.0131153 .0057698 -2.27 0.023 -.02444 -.0017906\n",
" control3 | -.003789 .0044762 -0.85 0.398 -.0125746 .0049966\n",
" control4 | .2021217 .0987592 2.05 0.041 .0082821 .3959613\n",
" control5 | .0345869 .0191098 1.81 0.071 -.0029208 .0720946\n",
" _cons | .7692805 .2600831 2.96 0.003 .2588026 1.279758\n",
"------------------------------------------------------------------------------\n",
"\n",
"Sobel-Goodman Mediation Tests\n",
"\n",
" Coef Std Err Z P>|Z|\n",
"Sobel .00489637 . . .\n",
"Goodman-1 .00489637 . . .\n",
"Goodman-2 .00489637 . . .\n",
"\n",
"Pecent of total effect that is mediated: -6.57 %\n",
"Ratio of indirect to direct effect: -0.0617\n"
]
}
],
"source": [
"sgmediation dep, mv(med) iv(ind) cv(control1 control2 control3 control4 control5 )"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Stata",
"language": "stata",
"name": "stata"
},
"language_info": {
"file_extension": ".do",
"mimetype": "text/x-stata",
"name": "stata"
}
},
"nbformat": 4,
"nbformat_minor": 2
}