
Stata Tutorial 12: Testing for Nonlinearity in Regression Models


Contents
  1. Theory
  2. Stata practice


Theory

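Many relationships between variables are nonlinear, so multiple linear regression can only be regarded as a first-order approximation to a nonlinear economic relationship. Second-order and even higher-order terms can matter as well, so after fitting a multiple linear regression we can test whether higher-order terms have been omitted. Specifically, we can run Ramsey's RESET test and the link test.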

Consider the following regression equation:

$$ y = x'\beta + \xi $$

The fitted values after estimation are:

$$ \hat y = x'b $$

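The RESET test builds the following augmented regression:

$$ y = x'\beta + \delta_2 \hat y^2 + \delta_3 \hat y^3 + \delta_4 \hat y^4 + \xi $$

and applies an F test to the null hypothesis $H_0: \delta_2=\delta_3=\delta_4=0$. Rejecting $H_0$ indicates that the model omits higher-order nonlinear terms.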
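A RESET test of this kind (adding the second to fourth powers of the fitted values and testing them jointly) can be sketched by hand in Stata. This is only an illustration, assuming the data/nerlove.dta file and the regression used in the practice section; the variable names yhat, yhat2, yhat3 and yhat4 are hypothetical and chosen for this example.

```stata
* Manual RESET sketch; yhat, yhat2, yhat3, yhat4 are illustrative names
use data/nerlove.dta, clear
quietly reg lntc lnq lnpl lnpk lnpf

* Fitted values and their second to fourth powers
predict double yhat, xb
gen double yhat2 = yhat^2
gen double yhat3 = yhat^3
gen double yhat4 = yhat^4

* Augmented regression with the powers of the fitted values
reg lntc lnq lnpl lnpk lnpf yhat2 yhat3 yhat4

* Joint F test that the coefficients on all three powers are zero
test yhat2 yhat3 yhat4
```

The F statistic from test should agree with the one reported by estat ovtest up to numerical precision, although the built-in command may scale the fitted values differently internally.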

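Another form of the RESET test uses powers of the explanatory variables, rather than of the fitted values, as the nonlinear terms.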
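As a hedged illustration of this second form, assume that the second to fourth powers of each of the $K$ non-constant regressors $x_1, \dots, x_K$ are added. The coefficients $\gamma_{jp}$ are notation introduced here only for this sketch:

$$ y = x'\beta + \sum_{j=1}^{K} \sum_{p=2}^{4} \gamma_{jp} x_j^p + \xi, \qquad H_0: \gamma_{jp} = 0 \ \text{for all } j, p $$

Under this assumption the test imposes $3K$ restrictions, so it uses more degrees of freedom than the version based on the fitted values.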

Another specification test is the link test, whose regression equation is:

$$ y = \delta_0 + \delta_1 \hat y + \delta_2 \hat y^2 + e $$
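The link-test regression can also be sketched by hand in Stata: regress the dependent variable on the fitted value and its square, then check whether the coefficient on the square ($\delta_2$) is significant. This is a minimal illustration, assuming the data/nerlove.dta file and the regression used in the practice section; the names yhat and yhat_sq are hypothetical.

```stata
* Manual link-test sketch; yhat and yhat_sq are illustrative names
use data/nerlove.dta, clear
quietly reg lntc lnq lnpl lnpk lnpf

* Fitted values and their square
predict double yhat, xb
gen double yhat_sq = yhat^2

* Regress the dependent variable on the fitted value and its square
reg lntc yhat yhat_sq

* Test that the coefficient on the squared term is zero
test yhat_sq
```

If the model is correctly specified, the squared fitted value should have no explanatory power once the fitted value itself is included, so a significant coefficient on yhat_sq points to misspecification.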

Stata practice

We use data/nerlove.dta as the example. First, load the data:

use data/nerlove.dta, clear
Output (stream):
(Nerlove 1963 paper)

Take a look at the basic structure of the data:

des
Output (stream):
Contains data from data/nerlove.dta
  obs:           145                          Nerlove 1963 paper
 vars:            10                          13 Aug 2012 10:00
 size:         5,220
------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------
tc              float   %9.0g                 total cost
q               int     %8.0g                 total output
pl              float   %9.0g                 price of labor
pf              float   %9.0g                 price of fuel
pk              int     %8.0g                 user cost of capital
lntc            float   %9.0g
lnq             float   %9.0g
lnpf            float   %9.0g
lnpk            float   %9.0g
lnpl            float   %9.0g
------------------------------------------------------------------
Sorted by:

First, run the multiple linear regression:

reg lntc lnq lnpl lnpk lnpf
Output (stream):
      Source |       SS           df       MS      Number of obs   =       145
-------------+----------------------------------   F(4, 140)       =    437.90
       Model |  269.524728         4  67.3811819   Prob > F        =    0.0000
    Residual |  21.5420958       140  .153872113   R-squared       =    0.9260
-------------+----------------------------------   Adj R-squared   =    0.9239
       Total |  291.066823       144  2.02129738   Root MSE        =    .39227
------------------------------------------------------------------------------
        lntc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lnq |   .7209135   .0174337    41.35   0.000     .6864462    .7553808
        lnpl |   .4559645    .299802     1.52   0.131    -.1367602    1.048689
        lnpk |  -.2151476   .3398295    -0.63   0.528    -.8870089    .4567136
        lnpf |   .4258137   .1003218     4.24   0.000     .2274721    .6241554
       _cons |  -3.566513   1.779383    -2.00   0.047    -7.084448   -.0485779
------------------------------------------------------------------------------

Run the link test:

linktest
Output (stream):
      Source |       SS           df       MS      Number of obs   =       145
-------------+----------------------------------   F(2, 142)       =   1460.70
       Model |  277.574775         2  138.787388   Prob > F        =    0.0000
    Residual |  13.4920481       142  .095014423   R-squared       =    0.9536
-------------+----------------------------------   Adj R-squared   =    0.9530
       Total |  291.066823       144  2.02129738   Root MSE        =    .30824
------------------------------------------------------------------------------
        lntc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |    .791953   .0293837    26.95   0.000      .733867    .8500389
      _hatsq |   .0941454   .0102281     9.20   0.000     .0739264    .1143643
       _cons |  -.0962174   .0425807    -2.26   0.025    -.1803914   -.0120434
------------------------------------------------------------------------------

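The coefficient on the squared term (_hatsq) is significant, so we reject the null hypothesis $\delta_2 = 0$. The model is therefore misspecified, and we should consider adding higher-order terms. Next, run the RESET test: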

estat ovtest
Output (stream):
Ramsey RESET test using powers of the fitted values of lntc
       Ho:  model has no omitted variables
                 F(3, 137) =     32.72
                  Prob > F =      0.0000

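The p-value of the F test is essentially zero, so the null hypothesis is rejected, which indicates specification error. Next, use powers of the explanatory variables instead: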

estat ovtest, rhs
Output (stream):
Ramsey RESET test using powers of the independent variables
       Ho:  model has no omitted variables
                F(12, 128) =      8.96
                  Prob > F =      0.0000

The result is again significant.
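As a quick check on the two RESET outputs, the denominator degrees of freedom follow from the sample size, the number of coefficients in the original regression (four slopes plus the constant), and the number of added power terms. The symbols $n$, $K$ and $m$ are introduced here only for this illustration:

$$ n - K - m: \qquad 145 - 5 - 3 = 137, \qquad 145 - 5 - 12 = 128 $$

The 12 restrictions in the second test are consistent with three powers for each of the four explanatory variables, assuming the second to fourth powers are used as with the fitted values.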

We therefore consider adding the square of the explanatory variable lnq:

gen lnq2 = lnq^2
reg lntc lnq lnpl lnpk lnpf lnq2
Output (stream):
      Source |       SS           df       MS      Number of obs   =       145
-------------+----------------------------------   F(5, 139)       =    622.86
       Model |  278.630831         5  55.7261661   Prob > F        =    0.0000
    Residual |  12.4359927       139  .089467573   R-squared       =    0.9573
-------------+----------------------------------   Adj R-squared   =    0.9557
       Total |  291.066823       144  2.02129738   Root MSE        =    .29911
------------------------------------------------------------------------------
        lntc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lnq |   .1166562   .0613522     1.90   0.059     -.004648    .2379605
        lnpl |   .0206146   .2326431     0.09   0.930    -.4393621    .4805913
        lnpk |   -.568725   .2614871    -2.17   0.031    -1.085732   -.0517185
        lnpf |   .4804816   .0766894     6.27   0.000     .3288531    .6321101
        lnq2 |   .0536124   .0053141    10.09   0.000     .0431055    .0641194
       _cons |  -.1627064   1.398139    -0.12   0.908    -2.927075    2.601662
------------------------------------------------------------------------------

The results above show that the coefficient on lnq2 is significant (t = 10.09), so this variable does affect the dependent variable.

Now run the link test again:

linktest
Output (stream):
      Source |       SS           df       MS      Number of obs   =       145
-------------+----------------------------------   F(2, 142)       =   1591.85
       Model |  278.638903         2  139.319451   Prob > F        =    0.0000
    Residual |  12.4279206       142  .087520568   R-squared       =    0.9573
-------------+----------------------------------   Adj R-squared   =    0.9567
       Total |  291.066823       144  2.02129738   Root MSE        =    .29584
------------------------------------------------------------------------------
        lntc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   1.009721   .0365875    27.60   0.000     .9373943    1.082047
      _hatsq |  -.0031437   .0103516    -0.30   0.762    -.0236068    .0173193
       _cons |  -.0013733   .0394759    -0.03   0.972    -.0794096    .0766631
------------------------------------------------------------------------------

The coefficient on the squared term is no longer significant. Run the RESET test again:

estat ovtest
Output (stream):
Ramsey RESET test using powers of the fitted values of lntc
       Ho:  model has no omitted variables
                 F(3, 136) =      1.19
                  Prob > F =      0.3165

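The RESET test no longer rejects the null hypothesis (p = 0.3165), which again suggests that the functional-form misspecification has largely been eliminated.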
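The comparison between the two specifications can also be made programmatically rather than by reading the output. The following is a minimal sketch, assuming that estat ovtest leaves its p-value in r(p); the scalar names p_linear and p_quadratic are hypothetical and chosen for this example.

```stata
* Sketch: compare RESET p-values for the two specifications
use data/nerlove.dta, clear
gen lnq2 = lnq^2

* Model that is linear in lnq
quietly reg lntc lnq lnpl lnpk lnpf
quietly estat ovtest
scalar p_linear = r(p)

* Model that adds the square of lnq
quietly reg lntc lnq lnpl lnpk lnpf lnq2
quietly estat ovtest
scalar p_quadratic = r(p)

display "RESET p-value, linear in lnq:    " %6.4f p_linear
display "RESET p-value, with lnq squared: " %6.4f p_quadratic
```

Storing the p-values in scalars makes it easy to apply the same check to other candidate specifications without rereading each output table.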

Note
This article was converted from a Jupyter notebook.