
ENH: add CovDetMCD and det for regression #9227

Merged
merged 15 commits into statsmodels:main on May 25, 2024

Conversation

josef-pkt
Member

The plan is to add CovDetMCD and CovDetS for covariance, and RLMDetS and RLMDetMM for regression.

Code for the starting sets and the CovDetXxx classes is in robust.covariance.

CovDetMCD
no sixpack starting sets yet
The results (cov estimates) are in the same neighborhood as R robustbase/rrcov, but still differ. I don't see where the difference comes from; I'm not sure I understand what R is doing. Currently I'm only comparing "raw" MCD estimates.
The example dataset is hbk.

@pep8speaks

pep8speaks commented Apr 23, 2024

Hello @josef-pkt! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-05-25 01:25:23 UTC

@josef-pkt
Member Author

Trying out RLMDetS and MM with the hbk dataset.

DetS with the current starts excludes the good influential points 10:14, besides the outliers and bad influential points 0:10.
(The S-estimator in R robustbase also includes the good influential points 10:14. I'm currently not able to get those because I don't allow x-outliers/influential points in the starting set, and I'm not getting a slope that would catch the 10:14 points.)

MM gets the good influential points back.
Using RLM with a high-efficiency biweight mean norm and a high-breakdown M-scale norm, with the same S-estimator start_params, produces results very similar to MM:
res_ms = RLM.from_formula("Y ~ X1 + X2 + X3", dta_hbk, M=norm_m).fit(start_params=res.params, scale_est=mscale_biw)
instead of the fixed scale used in MM.
(I don't remember whether this is the "restricted M" estimator.)

resid scaled for DetS

[image]

resid scaled for MM using DetS as start_params

[image]

@josef-pkt
Member Author

R includes the first step of BACON in the r6pack starting sets.
This seems to use a Mahalanobis-type distance based on the median, and the Euclidean distance.
We apply it to robustly z-scored data.

This is essentially the same as the initial step in _cov_starting.

The BACON article has a version-1 base set that uses the standard Mahalanobis distance with the mean and Pearson covariance.
Billor, Nedret, Ali S. Hadi, and Paul F. Velleman. 2000. “BACON: Blocked Adaptive Computationally Efficient Outlier Nominators.” Computational Statistics & Data Analysis 34 (3): 279–98. https://doi.org/10.1016/S0167-9473(99)00101-2.
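As a rough sketch of the first-step distances on robustly z-scored data (my naming and simplifications, not the statsmodels or robustbase code):

```python
import numpy as np


def bacon_v2_distances(x):
    """Hypothetical helper: BACON version-2 style initial distances.

    Robustly z-scores each column with the median and normalized MAD,
    then returns the Euclidean distance of each z-scored row from the origin.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x, axis=0)
    mad = np.median(np.abs(x - med), axis=0) * 1.4826  # consistency at the normal
    z = (x - med) / mad
    return np.sqrt((z ** 2).sum(axis=1))
```

Observations with the smallest distances would form the initial basic subset.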

Aside:
r6pack includes some kind of reweighting step (that I'm not able to replicate).
I don't see a reference for it, and I have not seen the reweighting step in the original DetXxx articles.

In my current version, my 6-pack (or 4-pack right now) differs from the R version in around 4 to 7 indices, but r6pack also excludes all influential points (1:14), including the good influential points.

I'm giving up on matching the R Det starting sets exactly. I will also keep some of the extra starting sets in _cov_starting.

@josef-pkt
Member Author

more local search:

In the hbk example, using include_endog in the starting Mahalanobis distances improves the estimate; the M-scale is lower than with exog only.

mod_smm = RLMDetSMM(dta_hbk["Y"], exog_df, include_endog=True)
res_smm = mod_smm.fit(h=40)

By default we could use both types of starting sets, exog only and [endog, exog]. Then I need a three-way option for the start data: "exog", "endog", "both".
This will increase the chance of finding an (almost) global optimum and still be much cheaper than FastXxx.

@josef-pkt
Member Author

josef-pkt commented Apr 25, 2024

Problem: What if we have only one regressor besides the constant?

It breaks in my current (uncommitted) version, e.g. in the spearmanr starting set.
In general, the sixpack has several starting sets that are correlation only, i.e. the diagonal is 1. spearmanr in my scipy version raises an exception.
In other cases (I don't remember where) I had returned only the correlation coefficient if there are only two variables, instead of a 2-by-2 cov matrix.

Example: the Hertzsprung-Russell diagram in the notebook https://www.statsmodels.org/dev/examples/notebooks/generated/robust_models_1.html
which is also used in the R robustbase lmrob docstring.

This needs to be fixed for RLMDetS, and maybe special-cased, or an exception raised, in covariance.DetMCD and similar.

It looks like my current RLMDet models need at least 3 slope variables in the starting set computation.

A problem with scipy; I'm using '1.7.3':

spearmanr raises an exception, returns a single value, or returns a correlation matrix, depending on the number of columns:

stats.spearmanr(np.random.randn(50, 3))
SpearmanrResult(correlation=array([[ 1.        , -0.11308523,  0.29392557],
       [-0.11308523,  1.        ,  0.13939976],
       [ 0.29392557,  0.13939976,  1.        ]]), pvalue=array([[0.        , 0.43426076, 0.03828331],
       [0.43426076, 0.        , 0.33429506],
       [0.03828331, 0.33429506, 0.        ]]))

stats.spearmanr(np.random.randn(50, 2))
SpearmanrResult(correlation=-0.03442977190876351, pvalue=0.8123713972775561)

stats.spearmanr(np.random.randn(50, 1))
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_2228\3897072665.py in <module>
----> 1 stats.spearmanr(np.random.randn(50, 1))

~\Downloads\WPy64-39100\python-3.9.10.amd64\lib\site-packages\scipy\stats\stats.py in spearmanr(a, b, axis, nan_policy, alternative)
   4479 
   4480     if axisout == 0:
-> 4481         if (a[:, 0][0] == a[:, 0]).all() or (a[:, 1][0] == a[:, 1]).all():
   4482             # If an input is constant, the correlation coefficient
   4483             # is not defined.

IndexError: index 1 is out of bounds for axis 1 with size 1

I guess I can fix the case k = 2, e.g. in spearmanr,
and add special code for k = 1, e.g. the h-set closest to the median, smallest abs(x - median(x)); the scale does not matter.

Update: I replaced scipy's spearmanr with the new stats.covariance.corr_rank, which always returns a 2-dim matrix.
The case k = 2 then works.
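A minimal version of a rank correlation that always returns a 2-dim matrix could look like this (a sketch; the actual statsmodels function may differ):

```python
import numpy as np
from scipy import stats


def corr_rank(data):
    """Spearman correlation via column ranks; always returns a 2-dim matrix."""
    x = np.asarray(data, dtype=float)
    # rank each column (average ranks for ties, as in scipy.stats.spearmanr)
    ranks = np.apply_along_axis(stats.rankdata, 0, x)
    # np.corrcoef returns a 0-dim array for a single column; force 2-dim
    return np.atleast_2d(np.corrcoef(ranks, rowvar=False))
```

For k = 1 this returns [[1.]] instead of raising, and for k = 2 a 2-by-2 matrix instead of a scalar.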

The case k = 1 without special casing now fails in the cov_iter starting set computation.

For this case I could use two starting sets:

  • smallest abs(x - median(x)) (metric trimming)
  • equal-tail trimming: take h // 2 observations from both sides of the median, i.e. quantiles, adjusted for count = h.
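The two h-subsets described above could be selected like this (a sketch; the function name and details are mine):

```python
import numpy as np


def starting_sets_k1(x, h):
    """Hypothetical helper: two candidate h-subsets for a single variable."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # metric trimming: the h observations closest to the median
    idx_metric = np.argsort(np.abs(x - med))[:h]
    # equal-tail trimming: the central h observations in sorted order,
    # i.e. roughly h // 2 dropped from each tail
    order = np.argsort(x)
    lower = (len(x) - h) // 2
    idx_tails = order[lower:lower + h]
    return idx_metric, idx_tails
```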

Even in the k = 1 case, i.e. only one exog slope variable, we could still include endog to get to k = 2 in the dataset for finding starting sets.

Another case: k = 0, i.e. constant-only regression.
What do we do?
There are no exog outliers.
Any "outlier" ranking of exog would be arbitrary and not depend on the exog values.
But we still have multiple local optima with redescending estimators, e.g. 3 clusters of y where the middle cluster is small.
What's the S-estimator for the mean?
The standard S-estimator starting with the median (and possibly quantiles) and qn_scale (or mad, scale_tau, ...).
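For the constant-only case, a minimal location iteration could start from the median and MAD. This is a plain IRLS M-location step with a fixed scale, simpler than the full S-estimator (names and defaults are mine):

```python
import numpy as np


def m_location_start(y, c=1.548, maxiter=50, tol=1e-8):
    """Sketch: IRLS biweight location estimate starting from median and MAD.

    Not the full S-estimator: the scale is fixed at the normalized MAD
    instead of being re-estimated jointly as an M-scale.
    """
    y = np.asarray(y, dtype=float)
    mu = np.median(y)
    s = np.median(np.abs(y - mu)) * 1.4826  # normalized MAD
    for _ in range(maxiter):
        u = (y - mu) / s
        # Tukey biweight weights; observations beyond c get weight 0
        w = np.where(np.abs(u) <= c, (1 - (u / c) ** 2) ** 2, 0.0)
        mu_new = (w * y).sum() / w.sum()
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```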

Possible problems with 3 or more clusters:
What if the median is not in the main cluster?
E.g. two outside x and/or y clusters with the "outlier" cluster in the middle.

That's not really the use case for "robust" (including "resistant"); we assume that our model is for the central data.
Identifying and estimating several clusters is a different issue, e.g. mixture models, cluster analysis, ...,
e.g. if no cluster has 50% of the observations.

@josef-pkt
Member Author

josef-pkt commented Apr 25, 2024

(Faking MM by adding two random exog; now replaced.)

This includes MM results of the model with 1 slope parameter, but including endog in the start maha computation.

[image]

@josef-pkt
Member Author

Problem: citing RLMDetS and RLMDetMM.

I can't find a reference for them. All the printed articles that I used are for multivariate location and scatter, and I don't find an article directly for the regression version.
The DetR package also does not have any citation directly related to deterministic starts in regression.
There is a conference abstract that includes DetLTS, but I don't find an article for it. https://ww2.amstat.org/meetings/JSM/2010/onlineprogram/AbstractDetails.cfm?abstractid=307805

There is an article (*) for multivariate regression based on MCD scatter matrices in the linear moment conditions.
(The article is older than DetMCD.)

Maybe I saw DetXxx regression in comments in articles, but I need to go through all the articles again.

(*) Rousseeuw, Peter J., Stefan Van Aelst, Katrien Van Driessen, and Jose Agulló. 2004. “Robust Multivariate Regression.” Technometrics 46 (3): 293–305.

@josef-pkt
Member Author

josef-pkt commented May 1, 2024

Notebook draft for S- and MM-regression, based on current (uncommitted) code:
https://gist.github.com/josef-pkt/dd18a8ec5e968a3e4c426c98a8aa43d0

Strange: the "raw" cell with the results from R did not show up in the gist.
Fixed now by converting the raw cell to a markdown cell.

@josef-pkt
Member Author

I still have problems with norm class versus norm instance as an argument to functions and model __init__.
Continuation of #9186 (comment)

In-place modification of a user-provided norm instance to adjust the tuning parameter sounds much too fragile.
But if I don't have an instance as an argument, then I need the extra keyword parameters.

Current problem: what should the norm be in RLMDetS: string, class, or instance?

    def __init__(self, endog, exog, norm=None, breakdown_point=0.5,
                 col_indices=None, include_endog=False):

If it is a string or class, then I need kwds_norm.
If it is an instance, then I need to set the tuning parameter if breakdown_point is not None.

Related problem: RLMDetSMM needs two instances of the norm with different tuning parameters. So I need to be able to create a new instance with the same keyword arguments but different tuning.

    def __init__(self, endog, exog, norm=None, efficiency=0.95, 
                 breakdown_point=0.5, col_indices=None, include_endog=False):

One possibility is to add clone or copy to create new instances that can be modified.
This adds even more code just to avoid kwds_norm.
But there is also still the problem that tuning parameters have inconsistent naming across norms, i.e. we don't know generically the keyword name(s) for the norm parameters. :(

Solution for now:
I will add a clone option, inplace=False, to _set_tuning_param. This method has all the norm-specific information, so we don't need a generic copy/clone.
def _set_tuning_param(self, c, inplace=False)

I'm currently only working with TukeyBiweight for DetS and DetMM. Other redescending norms can be added once the design has mostly settled.
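The inplace pattern, illustrated on a toy norm class (illustration only, not the statsmodels implementation):

```python
class ToyBiweightNorm:
    """Toy norm class illustrating the inplace option of _set_tuning_param."""

    def __init__(self, c=4.685):
        self.c = c

    def _set_tuning_param(self, c, inplace=False):
        if inplace:
            # mutate this instance (fragile if the instance is user-provided)
            self.c = c
            return self
        # return a modified copy, leaving the original instance untouched;
        # the subclass knows its own keyword, so no generic clone is needed
        return type(self)(c=c)
```

With inplace=False, RLMDetSMM-style code can derive a second norm with a different tuning parameter without touching the user's instance.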

@josef-pkt
Member Author

Oops, unit tests for tools fail.

inplace=False as the default is backwards incompatible with robust.tools.
So for now I will use the default inplace=True (amending the last commit and force pushing).
However, this should change back to False once all norm classes have the keyword and can do inplace=False.

@josef-pkt
Member Author

Some notes on CovM, similar to the enhanced RLM (with M-scale); the S-estimator will delegate to it.

  • weight and scale options, similar to the M-estimators that I already have in functions; however, there are several versions of how mean and scatter norms or weights are linked:
    • S-estimator, common norm: weight function for mean and shape, and norm for scale (what I have in draft for use by CovDetS)
    • M-estimator with common norm, objective function, e.g. Kent and Tyler 1991; uniqueness for weakly redescending norms, t(df >= 1) has a unique scatter
    • M-estimator of shape/scatter: scale not estimated, e.g. fixed at 1
    • shape estimation with "ad-hoc" scaling, e.g. median maha
  • general M-estimation: the norms for mean and scatter/scale can differ.

I might need a method option, method="S", "M", "MM", maybe with an extra option for the scaling of the scatter matrix if it is not implied by the method.

Problem: I still don't know how to compute the efficiency in the multivariate case to get the tuning parameter for a desired efficiency.
Also, which efficiency: mean, shape, or even cov (which also depends on the scale estimator)?

Given this CovM, CovS only needs to handle starting sets, and CovMM is just a call to CovM with fixed scale:
the same pattern as for RLM, RLMDetS, and RLMDetMM.
Minimal code, but not tuned for computational efficiency and speed.

@@ -1,6 +1,6 @@
import numpy as np

# TODO: add plots to weighting functions for online docs.
from . import tools as rtools

[CodeQL notice: import of module statsmodels.robust.tools begins an import cycle.]

def tuning_s_cov(norm, k_vars, breakdown_point=0.5, limits=()):
    """Tuning parameter for multivariate S-estimator given breakdown point.
    """
    from .norms import TukeyBiweight  # avoid circular import

[CodeQL notice: import of module statsmodels.robust.norms begins an import cycle.]
@josef-pkt
Member Author

Aside: rrcov CovSest and CovMMest only allow the biweight and Rocke norms.
The CovMMest docstring does not mention which norm, but I guess it is implied by the Sest options.

Currently I'm only working with the biweight, but I intend to add norm options.

@josef-pkt
Member Author

Oops: Tyler's constrained M-estimation requires scale > scale_S (not what I initially thought and used, which was scale < scale_S).

rho is an increasing function:
increasing the scale lowers rho(u / s), so a lower rho value satisfies the inequality constraint rho(u / s) <= s_0.

However, the weights are a decreasing function w(|u| / s):
increasing the scale increases the weights and becomes more inclusive;
lowering the scale lowers the weights and becomes more restrictive.
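These monotonicity claims are easy to sanity-check numerically with a standalone biweight (my helper functions, not library code):

```python
import numpy as np


def rho_biweight(u, c=1.548):
    """Tukey biweight rho: increasing in |u|, bounded by c**2 / 6."""
    t = np.minimum(np.abs(u) / c, 1.0)
    return (c ** 2 / 6.0) * (1.0 - (1.0 - t ** 2) ** 3)


def w_biweight(u, c=1.548):
    """Biweight weight function: decreasing in |u|, zero beyond c."""
    t = np.abs(u) / c
    return np.where(t <= 1.0, (1.0 - t ** 2) ** 2, 0.0)


u = 1.0
# doubling the scale s halves the argument u / s
assert rho_biweight(u / 2.0) < rho_biweight(u)  # larger scale lowers rho
assert w_biweight(u / 2.0) > w_biweight(u)      # larger scale raises the weight
```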

However, CM has a different objective from my MM scale adjustment.
CM wants to increase efficiency, i.e. reduce weights and include more points.
MM with a lower scale wants to find a better local solution without giving up the constraint that we do not increase the scale beyond that of the MM with fixed scale_0, which would have guaranteed the breakdown point.

Maybe: to make my MM scaling theoretically clean, I could just re-estimate the MM-estimate with a new S-estimator start given by the MM parameter estimates. An extra detour step as a guarantee for the theoretical assertion that the breakdown point is maintained.

(All because I don't know what the breakdown point is if rho_mean and rho_scale have different breakdown points. Huber has an article, but not directly with this result, and/or I don't understand enough to figure it out.)

@josef-pkt
Member Author

Kudraszow, Nadia L., and Ricardo A. Maronna. 2011. “Estimates of MM Type for the Multivariate Linear Model.” Journal of Multivariate Analysis 102 (9): 1280–92. https://doi.org/10.1016/j.jmva.2011.04.011.

It has tuning parameters for the breakdown point in Table 1, and
tuning parameters for a given efficiency of the mean parameter estimates in Table 2.
The formula for the ARE of the mean is in eq. 6.6, which has 3 expectations, i.e. 3 integrals.

Table 1 is the same as what I have for bp=0.5 and k=3.
However, for 95% efficiency at k=3, KM have c=5.48 in Table 2, while rrcov CovMMest uses 5.49024917207033.

CovMMest has c = 6.09626 if 95% efficiency for shape is requested (the default).

(Maronna uses rho normalized so that max rho = 1. That does not affect the rho function, but it is in terms of u/c.)

@josef-pkt
Member Author

josef-pkt commented May 11, 2024

CovMCD is unfinished and not working correctly yet (does not replicate R); I got distracted by CovM, CovS, CovMM.
The CovMCD methods are just copy-pasted functions from my experimental code.

The methods take data; the data is not in __init__. We need that at least for the helper methods that compute the iteration for different starts and subsets of the data.
The fit method could use data provided by __init__.
This way CovMCD would be like a standard model and like CovM, and we hide the computation with multiple starts in private methods.
In contrast, RLMDetS just delegates the computation to RLM and creates a new RLM instance for every start. For MCD we don't have a different model class to delegate to.
CovDetS will be able to delegate to CovM.

Update:
stupid mistake, the call to the maha function in CovDetMCD is wrong. The mahalanobis function does not take the mean as an argument;
the call is d = mahalanobis(x - mean, cov) instead of
d = mahalanobis(x, mean, cov).

Better to change the signature of the maha function; I make the same mistake in interactive work too.
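A signature that takes the mean explicitly avoids that mistake; a sketch (the name mahalanobis2 and the squared-distance return are my choices, not the current function):

```python
import numpy as np


def mahalanobis2(x, mean, cov):
    """Squared Mahalanobis distances with an explicit mean argument (sketch)."""
    diff = np.atleast_2d(x) - mean
    cov_inv = np.linalg.inv(cov)
    # row-wise quadratic form diff_i' cov_inv diff_i
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
```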

My current uncommitted CovDetMCD now produces the same results as rrcov:
CovMcd(x = hbk, raw.only = TRUE, nsamp = "deterministic", use.correction=FALSE)

"""

x = self.data
nobs, k_vars = x.shape

Check warning

Code scanning / CodeQL

Variable defined multiple times Warning

This assignment to 'nobs' is unnecessary as it is
redefined
before this value is used.
"""

x = self.data
nobs, k_vars = x.shape

Check warning

Code scanning / CodeQL

Variable defined multiple times Warning

This assignment to 'k_vars' is unnecessary as it is
redefined
before this value is used.
@josef-pkt
Member Author

A bit strange:
the default maxiter in the refinement step for CovDetMCD was hardcoded at 100.
If I fix it and reduce maxiter_step to the recommended 2, then a unit test fails.
It looks like my current implementation requires a large maxiter for every starting set.

One possible reason: currently I only iterate the one best start to a large maxiter; the DetMCD article uses the 2 best starts.

@josef-pkt
Member Author

josef-pkt commented May 15, 2024

Problem with a merge conflict, cannot run CI anymore; rebasing on main and force pushing.

The last commit added the last estimator class that I wanted:
CovDetMM
The generic computation of the efficiency and the corresponding tuning parameter search is still missing.
Currently only values from Table 2 of Kudraszow and Maronna (JMVA 2011) are used.

Detail:
The last page of the appendix of Kent and Tyler (1996) has explicit formulas for several statistics for the biweight norm used in efficiency and similar calculations, mainly using explicit expressions for truncated moments of the chi-square distribution.
Kent, John T., and David E. Tyler. 1996. “Constrained M-Estimation for Multivariate Location and Scatter.” The Annals of Statistics 24 (3): 1346–70.

@josef-pkt
Member Author

josef-pkt commented May 15, 2024

Efficiency again.

The following expressions look like they are defined the way I compute it.
They should be close enough to reverse-engineer the asymptotic relative efficiency (ARE):

eq. (5.3) on p. 1671 for the cov of the mean estimate,
and AFAIR sigma_1 in eq. (5.5) for the shape estimate.

Lopuhaa, Hendrik P. 1989. “On the Relation between S-Estimators and M-Estimators of Multivariate Location and Covariance.” The Annals of Statistics 17 (4): 1662–83.

Proposition 7 and eq. 6.6 in KM 2011 are similar for the ARE of the mean estimate,
but Maronna uses a different definition/parameterization and annoying notation.
(It should be the same once we define things in terms of psi, or notation in terms of the basic functions.)

Kudraszow, Nadia L., and Ricardo A. Maronna. 2011. “Estimates of MM Type for the Multivariate Linear Model.” Journal of Multivariate Analysis 102 (9): 1280–92. https://doi.org/10.1016/j.jmva.2011.04.011.

The empirical versions (w.r.t. the sample distribution) will be relevant for computing standard errors, cov_params (under i.i.d. assumptions).

All this is under the assumption of elliptical (elliptically symmetric) distributions, so the mean and cov (shape, scale) estimates are orthogonal (in expectation and asymptotically).
As in OLS, cov_params for the mean is then a separate block from the block of cov_params for the scale/cov of residuals.

Update: got it.

Numerical integration and rootfinding for the Tukey biweight;
integration w.r.t. the chi distribution, stats.chi(k).expect(...).

Comparing to the table values:
Note: c at (4, 0.95) differs a bit from the table, but agrees with R rrcov to 7 decimals.
rrcov has c = 5.81031555752526,
I have c = 5.810315517691177.
We have possible precision loss from dividing 2 values that both come from numerical integration, with rootfinding on top.

for k, eff0 in table:
    c0 = table[(k, eff0)]
    c_i = tuning_s_cov_eff(norm, k, efficiency=eff0, limits=())
    print((k, eff0), np.round(c_i, 4), c0)
(1, 0.8) 3.1369 3.14
(2, 0.8) 3.5101 3.51
(3, 0.8) 3.8235 3.82
(4, 0.8) 4.0976 4.1
(5, 0.8) 4.3433 4.34
(10, 0.8) 5.3219 5.39
(1, 0.9) 3.8827 3.88
(2, 0.9) 4.2821 4.28
(3, 0.9) 4.6175 4.62
(4, 0.9) 4.9104 4.91
(5, 0.9) 5.1727 5.18
(10, 0.9) 6.2124 6.38
(1, 0.95) 4.6851 4.68
(2, 0.95) 5.123 5.12
(3, 0.95) 5.4902 5.48
(4, 0.95) 5.8103 5.76
(5, 0.95) 6.0963 6.1
(10, 0.95) 7.2235 7.67
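For the univariate location case (k = 1), the integrate-and-rootfind recipe is short enough to show in full. Using the standard ARE formula for a location M-estimator at the normal, (E[psi'])^2 / E[psi^2], it reproduces the (1, 0.95) entry c ≈ 4.685 (a sketch with quad and brentq; the function names are mine, not tuning_s_cov_eff):

```python
import numpy as np
from scipy import integrate, optimize, stats


def psi_biweight(u, c):
    # Tukey biweight psi (derivative of rho); zero beyond c
    return np.where(np.abs(u) <= c, u * (1 - (u / c) ** 2) ** 2, 0.0)


def psi_deriv_biweight(u, c):
    t = (u / c) ** 2
    return np.where(np.abs(u) <= c, (1 - t) * (1 - 5 * t), 0.0)


def efficiency_location(c):
    """ARE of the biweight location M-estimator at the standard normal."""
    e_psi2 = integrate.quad(
        lambda u: psi_biweight(u, c) ** 2 * stats.norm.pdf(u), -c, c)[0]
    e_dpsi = integrate.quad(
        lambda u: psi_deriv_biweight(u, c) * stats.norm.pdf(u), -c, c)[0]
    return e_dpsi ** 2 / e_psi2


# efficiency is increasing in c, so a sign-changing bracket works for brentq
c_95 = optimize.brentq(lambda c: efficiency_location(c) - 0.95, 2.0, 10.0)
# c_95 is approximately 4.685, the classical 95%-efficiency biweight tuning
```

The multivariate version replaces the normal density by integration w.r.t. the chi(k) distribution, as described above.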

@josef-pkt
Member Author

I'm still not sure how to add the tables for tuning parameters.

  • In tools or a separate module? The current tools is a relatively small file, so I added them there. If we add many tables, it creates a lot of noise (extra objects) in the tools module.
  • Which form?
    • Latest pattern for multivariate mean efficiency: numpy array table -> dict -> function that returns the dict value, or computes the value and adds it to the (module-global) dict. This adds 4 objects to the module for TukeyBiweight. The same is needed for shape efficiency.
      A class might make it easier to organize the different parts, but it should be a singleton with caching during a session.
    • For univariate and scale: a table dict hardcoded in the module; the access function is norms.TukeyBiweight.get_tuning, but currently only for tabulated values. (No fallback yet for KeyError.)

For multivariate efficiency the keys are (k_vars, eff).
For univariate and scale we only have a single key, either bp or eff.

I'm currently preparing this only for TukeyBiweight.

Aside: I guess modifying a module global could cause problems in parallel processing, but caching new values should be an exceptional use case for TukeyBiweight (or others with hardcoded tables).
Maybe not, after a brief search: it looks like multiprocessing creates separate variables, i.e. the cache dict is not shared between processes, whether global or class variable.

try:
    tp = table_dict[(k, eff)]
except KeyError:
    # compute and cache
    from .norms import TukeyBiweight  # avoid circular import

[CodeQL notice: import of module statsmodels.robust.norms begins an import cycle.]
@josef-pkt
Member Author

josef-pkt commented May 22, 2024

Something weird in the MCD reweighting in R.

It looks like, in the truncation correction, CovMcd/covMcd use the actual trimming fraction and not the fraction used in the chi-square quantile.
The non-trimmed fraction is around 0.81 in hbk, which gives a much larger correction factor than using the 0.975 from the chi-square threshold.

Maybe I did not read something correctly in the reweighting usage. However, I don't see why the number of outliers should affect the reweighted covariance. Adding more outliers while keeping the same base data would increase the reweighted cov estimate.
Something fishy.
The reference in CovMcd in rrcov for the reweighting step is Pison et al.

It looks like CovOGK has a similar problem,
and my comments in the unit tests also mention differences to R for OGK.

I need to read Pison in more detail,
but using the actual fraction for the metric trimming seems wrong to me.
The raw MCD trims a fixed fraction of observations, and there using that fraction is appropriate.
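For reference, a sketch of the reweighting step with the consistency factor based on the nominal chi-square probability rather than the realized trimming fraction (my naming and simplifications, not the R or statsmodels code):

```python
import numpy as np
from scipy import stats


def reweighted_cov(x, mean_raw, cov_raw, trim_prob=0.975):
    """Sketch: reweighted mean and cov using a fixed chi2 trim probability."""
    x = np.asarray(x, dtype=float)
    k = x.shape[1]
    diff = x - mean_raw
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov_raw), diff)
    # keep observations inside the fixed chi-square threshold
    mask = d2 <= stats.chi2.ppf(trim_prob, k)
    xw = x[mask]
    cov = np.cov(xw, rowvar=False)
    # consistency factor at the normal for trimming at the nominal
    # probability trim_prob, NOT at the realized fraction mask.mean()
    fac = trim_prob / stats.chi2.cdf(stats.chi2.ppf(trim_prob, k), k + 2)
    return xw.mean(axis=0), cov * fac
```

With this version, adding extra outliers that all fall outside the threshold leaves the "clean" subset, and hence the estimate, unchanged.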

Update:
with my default reweighting, the number of outliers does not affect the result;
the "clean" data given the chi2 threshold remains the same.

The same computation in R has a changing cov, because the actual trim fraction changes (the cov decreases as the actual trim fraction decreases).

rcov.CovDetMCD(x).fit(h=40).cov
array([[ 1.17921167,  0.06984352,  0.18960059,  0.10435   ],
       [ 0.06984352,  1.17914946,  0.16918226, -0.0038821 ],
       [ 0.18960059,  0.16918226,  1.11607161, -0.15249982],
       [ 0.10435   , -0.0038821 , -0.15249982,  0.32588457]])

rcov.CovDetMCD(x[5:]).fit(h=40).cov
array([[ 1.17921167,  0.06984352,  0.18960059,  0.10435   ],
       [ 0.06984352,  1.17914946,  0.16918226, -0.0038821 ],
       [ 0.18960059,  0.16918226,  1.11607161, -0.15249982],
       [ 0.10435   , -0.0038821 , -0.15249982,  0.32588457]])

rcov.CovDetMCD(x[10:]).fit(h=40).cov
array([[ 1.17921167,  0.06984352,  0.18960059,  0.10435   ],
       [ 0.06984352,  1.17914946,  0.16918226, -0.0038821 ],
       [ 0.18960059,  0.16918226,  1.11607161, -0.15249982],
       [ 0.10435   , -0.0038821 , -0.15249982,  0.32588457]])

I will stick with this and add it to the docs.

OGK changes sometimes, when an additional observation is identified as an outlier:

rcov.cov_ogk(x[:]).cov
array([[ 1.34572852,  0.11345852,  0.17108152,  0.09717998],
       [ 0.11345852,  1.35630001,  0.14182314, -0.02660783],
       [ 0.17108152,  0.14182314,  1.29784466, -0.14478753],
       [ 0.09717998, -0.02660783, -0.14478753,  0.37545695]])

rcov.cov_ogk(x[5:]).cov
array([[ 1.34572852,  0.11345852,  0.17108152,  0.09717998],
       [ 0.11345852,  1.35630001,  0.14182314, -0.02660783],
       [ 0.17108152,  0.14182314,  1.29784466, -0.14478753],
       [ 0.09717998, -0.02660783, -0.14478753,  0.37545695]])

rcov.cov_ogk(x[8:]).cov
array([[ 1.33775982,  0.07923416,  0.2150929 ,  0.11838014],
       [ 0.07923416,  1.33768924,  0.19192926, -0.00440405],
       [ 0.2150929 ,  0.19192926,  1.26613041, -0.17300383],
       [ 0.11838014, -0.00440405, -0.17300383,  0.36970062]])

rcov.cov_ogk(x[10:]).cov
array([[ 1.33775982,  0.07923416,  0.2150929 ,  0.11838014],
       [ 0.07923416,  1.33768924,  0.19192926, -0.00440405],
       [ 0.2150929 ,  0.19192926,  1.26613041, -0.17300383],
       [ 0.11838014, -0.00440405, -0.17300383,  0.36970062]])

@josef-pkt
Member Author

All green, merging.
I need a break from robust.

@josef-pkt josef-pkt merged commit c1774ca into statsmodels:main May 25, 2024
7 checks passed