
ENH: maybe, S- and M-estimator for multivariate mean, scatter for large k_vars #9244

Open
josef-pkt opened this issue May 12, 2024 · 3 comments



josef-pkt commented May 12, 2024

(This is just an idea, I have not seen it in the literature, but I did not look at the high-dimension robust cov literature)

Problem
As k_vars increases, the efficiency of CovS increases, but so does the bias.
According to Rocke, this is because the distribution of the Mahalanobis distances concentrates around its mean, so we reject too few points.
(Or something like that; I did not go through the details.)
To avoid this problem with standard norms like the bisquare, he proposes a new norm that rejects on both sides (low and high values) and adjusts to changing k_vars.

Idea
We use the chi-square as the reference distribution for the Mahalanobis distances.
Instead of using the Mahalanobis distances directly, we use maha distances that are scaled and demeaned so that the usual norm rejects on both sides.
https://en.wikipedia.org/wiki/Chi-squared_distribution#Related_distributions

as $k \to \infty$: $(\chi^2_k - k)/\sqrt{2k} \xrightarrow{d} N(0,1)$
and
if $X \sim \chi^2_\nu$ and $c > 0$, then $cX \sim \Gamma(k = \nu/2, \theta = 2c)$ (gamma distribution)
see also https://modelassist.epixanalytics.com/space/EA/26575265/Normal+approximation+to+the+Chi+Squared+distribution

So we could define a new norm rho that adjusts to k_vars directly:

rho(d2) = rho_base(c * d2 - m)

Or, maybe better, use z-scoring based on the chi distribution for d = sqrt(d2) instead of the chi-square; the approximation should be better.
That is, expectations for breakdown point and efficiency are computed using a chi distribution with (loc, scale) != (0, 1).
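
To make the transformed-norm idea concrete, here is a minimal sketch, not statsmodels API; the class name ZScoredNorm and the chi-based centering constants are my assumptions. It wraps the existing TukeyBiweight norm and evaluates it at z-scored d = sqrt(d2):

```python
import numpy as np
from scipy import stats
from statsmodels.robust.norms import TukeyBiweight


class ZScoredNorm:
    """Evaluate a base norm at chi-z-scored distances d = sqrt(d2).

    For large k_vars the bulk of the distances sits far from zero,
    so z-scoring lets a standard norm reject on both sides.
    """

    def __init__(self, k_vars, base_norm=None):
        self.base_norm = base_norm if base_norm is not None else TukeyBiweight()
        chi = stats.chi(k_vars)
        # loc/scale of the chi reference distribution of d under normality
        self.loc, self.scale = chi.mean(), chi.std()

    def rho(self, d2):
        # rho(d2) = rho_base((sqrt(d2) - loc) / scale)
        z = (np.sqrt(d2) - self.loc) / self.scale
        return self.base_norm.rho(z)


# usage sketch: loss for squared maha distances with k_vars = 20
d2 = stats.chi2(20).rvs(size=1000, random_state=1234)
loss = ZScoredNorm(k_vars=20).rho(d2)
```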

The standard recommendation is CovMM for k_vars <= 15, or maybe <= 10.
Somewhere (the Maronna article comparing robust cov estimators?) it is recommended to use OGK for large k_vars and the Rocke norm for intermediate k_vars.

The above "zscored" maha might be an alternative to ogk for large k_vars for M and DetS.

Needs to be verified.

@josef-pkt commented

Thinking a bit more:

All the standard theory applies: we just use a different norm than the base norm, namely the base norm evaluated at a transformed random variable.

Aside (I haven't checked): does the Rocke norm, in his parameterization, satisfy the assumptions on an M-scale rho, e.g. rho(0) = 0?


josef-pkt commented May 15, 2024

This is close to what Rocke is doing in the limit as k -> inf; e.g. Rocke, top of p. 1339:

> Note that, for large p, the constants are approximately M = sqrt(p) and c = z_alpha / sqrt(2)

For large k, the mean of the chi distribution is approximately sqrt(k - 0.5) and its variance is approximately 0.5.

However, for smaller k, the center M shifts together with the width when using a predefined, fixed asymptotic rejection point M + c.
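
A quick numeric check of the chi approximation above with scipy (printed values are approximate):

```python
import numpy as np
from scipy import stats

# compare exact chi moments against the large-k approximation
for k in [5, 10, 50, 200]:
    chi = stats.chi(k)
    print(k, chi.mean(), np.sqrt(k - 0.5), chi.var())
# the mean matches sqrt(k - 0.5) closely already for moderate k,
# and the variance approaches 0.5 from below as k grows
```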

One problem: the mean would be the mean of the maha distances under the normal distribution. If the distribution is not normal, then the mean will differ, and the loss function is not concentrated around the mean/central observations.
Rocke mentions median scaling in the application/examples but does not provide details.

My guess is that if we rescale the maha distances so their median matches the median of the chi/chi2 distribution, then we recenter the maha distances at the appropriate part of the loss/weight function.
However, this adds an extra scaling term to the computation; we already have the M-scale to estimate the scale.

AFAIU, for the usual norms we get good results for elliptical distributions; only the scale is inconsistent when computing an estimate for the covariance or variances if we don't have a normal distribution. (Also, the breakdown point is calibrated for the normal distribution and will be different for non-normal distributions.)
AFAIU, the Rocke norm would also mess up the location and shape estimates if the norm is not correctly centered.

Possibly:
We could add a scaling option to the mahalanobis function directly that does the rescaling; see the sketch below.
The current function covariance._rescale rescales the covariance matrix, not the maha distances directly.
Rescaling in the maha function would be cheap.
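
For illustration, a sketch of what such a rescaling could look like; the helper name rescale_maha2 is hypothetical, and median matching is my reading of the scaling Rocke alludes to, not his exact recipe:

```python
import numpy as np
from scipy import stats


def rescale_maha2(maha2, k_vars):
    """Rescale squared maha distances so their sample median matches
    the median of the chi2(k_vars) reference distribution."""
    scale = stats.chi2(k_vars).median() / np.median(maha2)
    # under a correct normal reference, scale is approximately 1
    return scale * np.asarray(maha2)
```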

There is still the problem of the extra computation for the additional scaling.
Under a normal reference the scaling factor is just 1, but for standard errors and similar we need to take the extra transformation into account, e.g. derivatives of rho, psi, weights, ... with respect to parameters.

@josef-pkt commented

Trying out just shifting d in the TukeyBiweight norm by subtracting the median of the chi distribution:
this did not work when computing relative efficiency.
I get strange results, either eff too small or eff > 1.
I don't see why a linear transformation inside a standard norm like the biweight does not seem to work (or the efficiency formula has some hidden assumptions).

I guess just implementing Rocke's translated biweight (tbiweight) and biflat norms will be more straightforward,
i.e. we need 2 extra norms instead of generic transformed norms.
(A similar norm still missing is the smoothed threshold, which might be similar to tbiweight.)

Aside:
tbiweight converges to a winsorized norm (metric trimming) as c -> 0; this is LWS in Rocke, our TrimmedMean norm.
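
As a starting point, a sketch of the translated biweight weight function in the (M, c) parameterization; the rho/psi functions would follow by integration/differentiation. Treat the exact form below as an assumption to be checked against Rocke's paper:

```python
import numpy as np


def tbiweight_weights(d, m, c):
    """Translated biweight: full weight up to M, biweight taper on
    [M, M + c], zero beyond M + c. As c -> 0 this degenerates to hard
    metric trimming at M (the LWS / TrimmedMean limit mentioned above)."""
    d = np.asarray(d, dtype=float)
    w = np.zeros_like(d)
    w[d <= m] = 1.0
    taper = (d > m) & (d <= m + c)
    w[taper] = (1.0 - ((d[taper] - m) / c) ** 2) ** 2
    return w
```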

The main problem is that the biweight S- and MM-estimators that we have now are not good for k_vars greater than 10 or 15.
So we don't have anything for this case, while OGK is OK even for large k_vars.

(The results in Rocke are for a fixed breakdown point as k increases. We could also limit efficiency for larger k_vars.
The problem with the biweight compared to tbiweight is that the biweight cannot have a steep slope right after the bulk of the data.)
