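{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "title: Stata Tutorial 08 - Mediation Analysis\n", "date: 2018-12-22 15:17:55\n", "tags: [stata]\n", "toc: true\n", "mathjax: true\n", "\n", "---\n", "\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Principles of mediation analysis\n", "\n", "I have written several earlier articles on mediation effects; they explain the underlying principles:\n", "\n", "- SPSS实例:[16]中介效应的检验过程 (SPSS Example [16]: Procedure for Testing Mediation Effects), May 2015\n", "- SPSS实例:[18]中介效应占总效应百分比 (SPSS Example [18]: Mediation Effect as a Percentage of the Total Effect), Feb 2016\n", "- SPSS实例:[20]检验中介效应的操作方法 (SPSS Example [20]: How to Test for Mediation Effects), Jan 2016\n", "- SPSS实例:[17]进行sobel检验(小白教程) (SPSS Example [17]: Running the Sobel Test, Beginner Tutorial), Oct 2016\n", "- 在线绘制中介效应图 (Drawing Mediation Diagrams Online), Oct 2016\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here I restate the procedure:\n", "\n", "All of the regression models below include control variables; they are left out of the equations for brevity. First, regress the dependent variable dep on the independent variable ind to obtain Model 1 (`dep=c1 * ind +e1`). Next, regress the mediator med on ind to obtain Model 2 (`med=a * ind +e2`). Finally, regress dep on both ind and med to obtain Model 3 (`dep=b* med + c2 * ind + e3`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data used in this example\n", "\n", "This example uses a dataset I constructed myself. The independent variable is ind, the dependent variable is dep, the mediator is med, and the control variables are named `control` followed by a number.\n", "\n", "Load the data:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "use \"data/mediator-data.dta\", clear" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Filtering missing values\n", "\n", "We need to run three regressions, and each one involves a different set of variables. If some variables have missing values, the three regressions may end up being estimated on different observations, because each model drops the observations that are missing on its own variables. So before running the regressions we generate a variable miss_num: if the independent variable, the mediator, the dependent variable, and all control variables are non-missing, miss_num=0; otherwise miss_num>0." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "egen miss_num = rowmiss(dep med ind control1 control2 control3 control4 control5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the missingness: the table below shows that 861 observations have no missing values." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " miss_num | Freq. 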
Percent Cum.\n", "------------+-----------------------------------\n", " 0 | 861 73.72 73.72\n", " 1 | 298 25.51 99.23\n", " 2 | 1 0.09 99.32\n", " 3 | 1 0.09 99.40\n", " 4 | 7 0.60 100.00\n", "------------+-----------------------------------\n", " Total | 1,168 100.00\n" ] } ], "source": [ "tab miss_num" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regression 1: independent variable predicting the dependent variable\n", "\n", "`dep=c1 * ind +e1`" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Source | SS df MS Number of obs = 861\n", "-------------+---------------------------------- F(6, 854) = 2.76\n", " Model | .459843972 6 .076640662 Prob > F = 0.0115\n", " Residual | 23.6942548 854 .027745029 R-squared = 0.0190\n", "-------------+---------------------------------- Adj R-squared = 0.0121\n", " Total | 24.1540988 860 .028086161 Root MSE = .16657\n", "\n", "------------------------------------------------------------------------------\n", " dep | Coef. Std. Err. t P>|t| [95% Conf. 
Interval]\n", "-------------+----------------------------------------------------------------\n", " ind | -.0745096 .0352931 -2.11 0.035 -.1437809 -.0052382\n", " control1 | -.0003018 .0043662 -0.07 0.945 -.0088715 .008268\n", " control2 | -.0133247 .00578 -2.31 0.021 -.0246693 -.00198\n", " control3 | -.0044711 .0044726 -1.00 0.318 -.0132497 .0043075\n", " control4 | .1799002 .0983623 1.83 0.068 -.01316 .3729603\n", " control5 | .0340114 .0191444 1.78 0.076 -.0035642 .0715871\n", " _cons | .7167078 .2593365 2.76 0.006 .2076962 1.22572\n", "------------------------------------------------------------------------------\n" ] } ], "source": [ "reg dep ind control1 control2 control3 control4 control5 if miss_num==0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output above shows that the coefficient c1 is significant (p = 0.035): c1 = -.0745096, with standard error sc1 = .0352931" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regression 2: independent variable predicting the mediator\n", "\n", "`med=a * ind +e2`" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Source | SS df MS Number of obs = 861\n", "-------------+---------------------------------- F(6, 854) = 2.90\n", " Model | 3.83093084 6 .638488473 Prob > F = 0.0083\n", " Residual | 187.834574 854 .219946808 R-squared = 0.0200\n", "-------------+---------------------------------- Adj R-squared = 0.0131\n", " Total | 191.665505 860 .222866867 Root MSE = .46898\n", "\n", "------------------------------------------------------------------------------\n", " med | Coef. Std. Err. t P>|t| [95% Conf. 
Interval]\n", "-------------+----------------------------------------------------------------\n", " ind | -.1950457 .0993702 -1.96 0.050 -.3900841 -7.35e-06\n", " control1 | .0279925 .0122933 2.28 0.023 .0038638 .0521211\n", " control2 | .0083398 .016274 0.51 0.608 -.0236019 .0402814\n", " control3 | .0271718 .0125929 2.16 0.031 .0024551 .0518885\n", " control4 | .8851904 .2769459 3.20 0.001 .3416161 1.428765\n", " control5 | .0229235 .0539025 0.43 0.671 -.0828733 .1287204\n", " _cons | 2.094221 .73018 2.87 0.004 .6610633 3.527379\n", "------------------------------------------------------------------------------\n" ] } ], "source": [ "reg med ind control1 control2 control3 control4 control5 if miss_num == 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output above shows that the coefficient a is significant, though only marginally (p = 0.050; the upper bound of the 95% confidence interval, -7.35e-06, lies just below zero): a = -.1950457, with standard error sa = .0993702" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regression 3: independent variable and mediator predicting the dependent variable\n", "\n", "`dep=b* med + c2 * ind + e3`" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Source | SS df MS Number of obs = 861\n", "-------------+---------------------------------- F(7, 853) = 2.99\n", " Model | .578216371 7 .082602339 Prob > F = 0.0042\n", " Residual | 23.5758824 853 .027638784 R-squared = 0.0239\n", "-------------+---------------------------------- Adj R-squared = 0.0159\n", " Total | 24.1540988 860 .028086161 Root MSE = .16625\n", "\n", "------------------------------------------------------------------------------\n", " dep | Coef. Std. Err. t P>|t| [95% Conf. 
Interval]\n", "-------------+----------------------------------------------------------------\n", " med | -.0251037 .0121303 -2.07 0.039 -.0489124 -.0012949\n", " ind | -.0794059 .0353048 -2.25 0.025 -.1487004 -.0101114\n", " control1 | .000401 .004371 0.09 0.927 -.0081783 .0089802\n", " control2 | -.0131153 .0057698 -2.27 0.023 -.02444 -.0017906\n", " control3 | -.003789 .0044762 -0.85 0.398 -.0125746 .0049966\n", " control4 | .2021217 .0987592 2.05 0.041 .0082821 .3959613\n", " control5 | .0345869 .0191098 1.81 0.071 -.0029208 .0720946\n", " _cons | .7692805 .2600831 2.96 0.003 .2588026 1.279758\n", "------------------------------------------------------------------------------\n" ] } ], "source": [ "reg dep med ind control1 control2 control3 control4 control5 if miss_num == 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output above shows that the coefficient b is significant (p = 0.039): b = -.0251037, with standard error sb = .0121303\n", "\n", "The coefficient c2 is also significant (p = 0.025): c2 = -.0794059, with standard error sc2 = .0353048" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion\n", "\n", "Because all of the coefficients are significant (a only marginally, at p = 0.050), we can conclude that a mediation effect exists and that it is a partial mediation. The size of the mediation (indirect) effect can be computed as:\n", "\n", "`m = a*b`" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".00489637\n" ] } ], "source": [ "display -.1950457 * -.0251037" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mediation effect as a percentage of the total effect (`a*b / c1 * 100`) is computed below. The result is negative because a*b is positive while the total effect c1 is negative, that is, the indirect effect and the total effect have opposite signs:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-6.5714602\n" ] } ], "source": [ "display (-.1950457 * -.0251037) / -.0745096 * 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using an add-on command\n", "\n", "All of the steps above can be wrapped in a single command, and I happened to find a piece of code online that performs the mediation analysis. Save the code below under the following path in the Stata installation directory:\n", "\n", "`Stata15\\ado\\base\\s`: in this folder, create a file named `sgmediation.ado`, paste the code below into it, and then restart Stata.\n", "\n", "```stata\n", "*! version 1.1.1 -- 5/17/06 -- pbe\n", "*! 
verion 1.0 -- 2/28/05 -- pbe\n", "program define sgmediation\n", "/* sobel-goodman mediation tests */\n", "version 8.0\n", "syntax varlist(max=1) [if/] [in], iv(varlist numeric max=1) ///\n", " mv(varlist numeric max=1) [ cv(varlist numeric) BOOTstrap reps(integer 200) level(integer 95)]\n", "marksample touse\n", "markout `touse' `varlist' `mv' `iv' `cv'\n", "tempname coef emat\n", "\n", "display\n", "tabulate `mv' if `touse'\n", "\n", "display\n", "display as text \"Model with dv regressed on iv\"\n", "regress `varlist' `iv' `cv' if `touse'\n", "local ccoef=_b[`iv']\n", "\n", "display\n", "display \"Model with mediator regressed on iv\"\n", "regress `mv' `iv' `cv' if `touse'\n", "\n", "local acoef=_b[`iv']\n", "local avar=_se[`iv']^2\n", "\n", "display\n", "display \"Model with dv regressed on mediator and iv\"\n", "regress `varlist' `mv' `iv' `cv' if `touse'\n", "\n", "local bcoef=_b[`mv']\n", "local bvar=_se[`mv']^2\n", "\n", "local sobel =(`acoef'*`bcoef')\n", "local serr=sqrt(`bcoef'^2*`avar' + `acoef'^2*`bvar')\n", "local stest=`sobel'/`serr'\n", "local g1err=sqrt(`bcoef'^2*`avar' + `acoef'^2*`bvar' + `avar'*`bvar')\n", "local good1=`sobel'/`g1err'\n", "local g2err=sqrt(`bcoef'^2*`avar' + `acoef'^2*`bvar' - `avar'*`bvar')\n", "local good2=`sobel'/`g2err'\n", "local toteff = `sobel'/((`acoef'*`bcoef')+(`ccoef'-(`acoef'*`bcoef')))\n", "local ratio = `sobel'/((`ccoef'-(`acoef'*`bcoef')))\n", "\n", "display\n", "display \"Sobel-Goodman Mediation Tests\"\n", "display\n", "display \" Coef Std Err Z P>|Z|\"\n", "display as txt \"Sobel \" as res `sobel' _skip(4) `serr' %8.4g ///\n", "`stest', _skip(5) 2*(1-norm(abs(`stest')))\n", "display as txt \"Goodman-1 \" as res `sobel' _skip(4) `g1err' %8.4g ///\n", "`good1', _skip(5) 2*(1-norm(abs(`good1')))\n", "display as txt \"Goodman-2 \" as res `sobel' _skip(4) `g2err' %8.4g ///\n", "`good2', _skip(5) 2*(1-norm(abs(`good2')))\n", "display\n", "display as txt \"Pecent of total effect that is mediated: \", as res ///\n", 
"%5.2f 100*`toteff',\"%\"\n", "display as txt \"Ratio of indirect to direct effect: \", as res %8.4f `ratio'\n", "\n", "if \"`bootstrap'\"~=\"\" {\n", " display \n", " display as txt \"Percentile and Bias-corrected bootstrap results for Sobel: `reps' replications\"\n", " display\n", "\n", " quietly bootstrap coef=r(sobel), reps(`reps') level(`level'): sgboot `varlist' , mv(`mv') iv(`iv') cv(`cv' )\n", " estat bootstrap, bc percentile noheader\n", " }\n", "\n", "end\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "最后你在使用的时候, 就可以直接调用这个命令即可:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", " med | Freq. Percent Cum.\n", "------------+-----------------------------------\n", " 0 | 573 66.55 66.55\n", " 1 | 288 33.45 100.00\n", "------------+-----------------------------------\n", " Total | 861 100.00\n", "\n", " med | Freq. Percent Cum.\n", "------------+-----------------------------------\n", " 0 | 730 62.50 62.50\n", " 1 | 438 37.50 100.00\n", "------------+-----------------------------------\n", " Total | 1,168 100.00\n", "\n", "Model with dv regressed on iv\n", "\n", " Source | SS df MS Number of obs = 861\n", "-------------+---------------------------------- F(6, 854) = 2.76\n", " Model | .459843972 6 .076640662 Prob > F = 0.0115\n", " Residual | 23.6942548 854 .027745029 R-squared = 0.0190\n", "-------------+---------------------------------- Adj R-squared = 0.0121\n", " Total | 24.1540988 860 .028086161 Root MSE = .16657\n", "\n", "------------------------------------------------------------------------------\n", " dep | Coef. Std. Err. t P>|t| [95% Conf. 
Interval]\n", "-------------+----------------------------------------------------------------\n", " ind | -.0745096 .0352931 -2.11 0.035 -.1437809 -.0052382\n", " control1 | -.0003018 .0043662 -0.07 0.945 -.0088715 .008268\n", " control2 | -.0133247 .00578 -2.31 0.021 -.0246693 -.00198\n", " control3 | -.0044711 .0044726 -1.00 0.318 -.0132497 .0043075\n", " control4 | .1799002 .0983623 1.83 0.068 -.01316 .3729603\n", " control5 | .0340114 .0191444 1.78 0.076 -.0035642 .0715871\n", " _cons | .7167078 .2593365 2.76 0.006 .2076962 1.22572\n", "------------------------------------------------------------------------------\n", "\n", "Model with mediator regressed on iv\n", "\n", " Source | SS df MS Number of obs = 861\n", "-------------+---------------------------------- F(6, 854) = 2.90\n", " Model | 3.83093084 6 .638488473 Prob > F = 0.0083\n", " Residual | 187.834574 854 .219946808 R-squared = 0.0200\n", "-------------+---------------------------------- Adj R-squared = 0.0131\n", " Total | 191.665505 860 .222866867 Root MSE = .46898\n", "\n", "------------------------------------------------------------------------------\n", " med | Coef. Std. Err. t P>|t| [95% Conf. 
Interval]\n", "-------------+----------------------------------------------------------------\n", " ind | -.1950457 .0993702 -1.96 0.050 -.3900841 -7.35e-06\n", " control1 | .0279925 .0122933 2.28 0.023 .0038638 .0521211\n", " control2 | .0083398 .016274 0.51 0.608 -.0236019 .0402814\n", " control3 | .0271718 .0125929 2.16 0.031 .0024551 .0518885\n", " control4 | .8851904 .2769459 3.20 0.001 .3416161 1.428765\n", " control5 | .0229235 .0539025 0.43 0.671 -.0828733 .1287204\n", " _cons | 2.094221 .73018 2.87 0.004 .6610633 3.527379\n", "------------------------------------------------------------------------------\n", "\n", "Model with dv regressed on mediator and iv\n", "\n", " Source | SS df MS Number of obs = 861\n", "-------------+---------------------------------- F(7, 853) = 2.99\n", " Model | .578216371 7 .082602339 Prob > F = 0.0042\n", " Residual | 23.5758824 853 .027638784 R-squared = 0.0239\n", "-------------+---------------------------------- Adj R-squared = 0.0159\n", " Total | 24.1540988 860 .028086161 Root MSE = .16625\n", "\n", "------------------------------------------------------------------------------\n", " dep | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n", "-------------+----------------------------------------------------------------\n", " med | -.0251037 .0121303 -2.07 0.039 -.0489124 -.0012949\n", " ind | -.0794059 .0353048 -2.25 0.025 -.1487004 -.0101114\n", " control1 | .000401 .004371 0.09 0.927 -.0081783 .0089802\n", " control2 | -.0131153 .0057698 -2.27 0.023 -.02444 -.0017906\n", " control3 | -.003789 .0044762 -0.85 0.398 -.0125746 .0049966\n", " control4 | .2021217 .0987592 2.05 0.041 .0082821 .3959613\n", " control5 | .0345869 .0191098 1.81 0.071 -.0029208 .0720946\n", " _cons | .7692805 .2600831 2.96 0.003 .2588026 1.279758\n", "------------------------------------------------------------------------------\n", "\n", "Sobel-Goodman Mediation Tests\n", "\n", " Coef Std Err Z P>|Z|\n", "Sobel .00489637 . . 
.\n", "Goodman-1 .00489637 . . .\n", "Goodman-2 .00489637 . . .\n", "\n", "Pecent of total effect that is mediated: -6.57 %\n", "Ratio of indirect to direct effect: -0.0617\n" ] } ], "source": [ "sgmediation dep, mv(med) iv(ind) cv(control1 control2 control3 control4 control5 )" ] } ], "metadata": { "kernelspec": { "display_name": "Stata", "language": "stata", "name": "stata" }, "language_info": { "file_extension": ".do", "mimetype": "text/x-stata", "name": "stata" } }, "nbformat": 4, "nbformat_minor": 2 }