| Approximate Solutions to Linear Regression ProblemsHere we assume that we have a first order solution to a regression problem, in the form
 
Y = Σ  wi Ri, 
where Y is the response, wi are the regression coefficients, and Ri
are the independent variables. The number of variables is very high, and the independent variables are highly correlated.
We want to improve the model by considering a second order regression of the form
 
Y = Σ  wi Ri + Σ  wij cij mij Ri Rj , 
where 
 
In practice, some of the Ris are highly correlated and grouped into clusters. These clusters can be identified
by using a clustering algorithm on the cijs. For example, one could think of a model with  two clusters A and B such ascij = correlation between Ri and Rj
wij = |wiwj|0.5 x sgn(wiwj)
mij  are arbitrary constants
 
Y = Σ  wi Ri + mA ΣA wij cij  Ri Rj 
+ mB  ΣB wij cij  Ri Rj 
where 
 
An interesting case occurs when the cluster structure is
so strong thatΣA (resp. ΣB) are taken over all i < j belonging to A (resp. B)
mij = mA (constant) if i, j belong to cluster A
mij = mB (constant) if i, j belong to cluster B
 
This particular case results in|cij| = 1 if i and j belong to the same cluster (either A or B)
cij = 0 otherwise
 
mA = 4 / [1 + (1+8kA)0.5] mB =  4 / [1 + (1+8kB)0.5]
 
where 
kA= ΣA |cij| and kB= ΣB |cij|.
 
Question
 
If the cluster structure is moderately strong, with the correlations cij close to 1, -1 or 0, how accurate is the above formula 
involving kA and kB? Here we assume that the wis are known or approximated. Typically, wi is a constant or
 wi is a simple function of the correlation between Y and Ri.
 
Alternate Approach
 
Let us consider a simplified model involving one cluster, with mij = constant = m. For instance, the unique cluster could
consist of all variables i, j with |cij| > 0.70. The model can be written as 
 
Y = Σ  wi Ri + m Σ  wij cij Ri Rj. 
We want to find m that provides the best improvement over the first order model, in terms of residual error. The first order model corresponds to
m = 0.
Let us introduce the following notations:
 
Without loss of generality, let us consider the slightly modified (centered) modelW = Σ  wij cij Ri Rj, 
V = W - u, where u = average(W) (Thus V is the centered W, with mean 0),
S= Σ  wi Ri. (average(S) = average(Y) by construction) 
 
Y = S + m V. 
Then 
 
m = [ Transposed(V) x (Y-S) ]  / [ Transposed(V) x V ], 
where 
Further ImprovementsY, S, and V are vectors with n rows, 
n is the number of observations.
 
The alternate approach could be incorporated in an iterative algorithm, where at each step a new cluster is added. So at each step we would have the same computation for m, optimizing the residual error on
 
Y = S + m V. 
However this time, S would contain all the clusters detected during the previous step, and V would contain the new cluster
being added to the model.
 
 
 |