Other changes:
- Removed references to the .A attribute and replaced them with .toarray().
- Added compatibility between formulaic and pandas 3.0.
Breaking changes:
- To unify the API, DenseMatrix no longer inherits from np.ndarray. To convert a DenseMatrix to an np.ndarray, use DenseMatrix.unpack.
- Similarly, SparseMatrix no longer inherits from sps.csc_matrix. To convert a SparseMatrix to an sps.csc_matrix, use SparseMatrix.unpack.
New features:
- Added column name and term name metadata to MatrixBase objects. These are populated automatically when a MatrixBase is initialized from a pandas.DataFrame. They can also be accessed and modified through the MatrixBase.column_names and MatrixBase.term_names properties.
- Added a formula interface for creating tabmat matrices from pandas data frames. See tabmat.from_formula for details.
- Added support for missing values in CategoricalMatrix. Missing values can either get their own category or be treated as all-zero rows.
- Added support for handling missing categorical values in pandas data frames.
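The two ways of handling missing categorical values described above can be illustrated on a plain one-hot encoding. This is a minimal NumPy sketch and not tabmat's implementation. It assumes missing values are coded as -1, which is the convention pandas.Categorical.codes uses. The helper name one_hot_with_missing and the strategy names "zero" and "convert" are hypothetical.

```python
import numpy as np


def one_hot_with_missing(codes, n_categories, strategy):
    """Hypothetical helper: one-hot encode integer codes, where -1 marks a missing value.

    strategy="zero":    rows with a missing value become all-zero rows.
    strategy="convert": missing values get their own extra category (the last column).
    """
    codes = np.asarray(codes)
    missing = codes == -1
    if strategy == "zero":
        out = np.zeros((codes.size, n_categories))
        rows = np.flatnonzero(~missing)
        out[rows, codes[~missing]] = 1.0
    elif strategy == "convert":
        out = np.zeros((codes.size, n_categories + 1))
        # Map the missing code onto the extra column at index n_categories.
        filled = np.where(missing, n_categories, codes)
        out[np.arange(codes.size), filled] = 1.0
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return out


print(one_hot_with_missing([0, -1, 1], 2, "zero"))
print(one_hot_with_missing([0, -1, 1], 2, "convert"))
```

The "zero" strategy keeps the number of columns unchanged but makes missing rows indistinguishable from a reference level with no effect. The "convert" strategy spends one extra column so that the model can estimate a separate effect for missingness.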
Bug fix:
- Added the Cython compiler directive legacy_implicit_noexcept = True to fix a performance regression with Cython 3.
Other changes:
- Refactored the pre-commit hooks to use ruff.
- Refactored CategoricalMatrix.transpose_matvec to be deterministic when using OpenMP.
- Adjusted the transformation to sparse format in tabmat.from_pandas to accommodate future changes in pandas.
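For a categorical matrix, transpose_matvec reduces to summing the entries of the vector within each category. The NumPy sketch below shows what is computed. It does not show tabmat's OpenMP code, and the helper name is hypothetical.

In a parallel version, threads may add into shared per-category totals in whatever order they finish. Because floating-point addition is not associative, results can then differ slightly from run to run. A fixed summation order removes that variation. np.bincount runs single-threaded, so repeated calls on the same input produce the same result.

```python
import numpy as np


def categorical_transpose_matvec(codes, v, n_categories):
    """Hypothetical helper: X.T @ v for the one-hot matrix X implied by `codes`.

    Entry j of the result is the sum of v over all rows whose category is j,
    so the dense one-hot matrix X never has to be built.
    """
    return np.bincount(np.asarray(codes), weights=np.asarray(v, dtype=float), minlength=n_categories)


codes = np.array([0, 2, 0, 1])
v = np.array([1.0, 2.0, 3.0, 4.0])
print(categorical_transpose_matvec(codes, v, 3))
```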
Other changes:
- PyPI releases are now published using a trusted publisher.
- Fixed the build and upload of x86_64 wheels on Linux.
Other changes:
- Fixed macOS arm64 wheels with proper linkage.
Other changes:
- Improved the performance of from_pandas for low-cardinality categorical variables.
- Require Python>=3.9, in line with NEP 29.
- Build and test with Python 3.12 in CI.
- Fixed macOS arm64 wheels with proper linkage.
Bug fixes:
- We fixed a bug in the dense sandwich product, which would previously segfault for very large matrices.
- Fixed the column order when initializing a SplitMatrix from a list containing other SplitMatrix objects.
- Fixed getcol not respecting the drop_first attribute of a CategoricalMatrix.
Other changes:
- Support building on architectures that are unsupported by xsimd.
Other changes:
- The C++ types have been refactored. Loop indices now use the Py_ssize_t type, and integers now have a templated type as well.
- The documentation for matvec and transpose_matvec has been updated to reflect their actual behavior.
- Checks for dimension mismatches in the arguments of matvec and transpose_matvec have been added.
- Removed the upper pin on xsimd.
Bug fix:
- We fixed a bug in the cross sandwich product, which would previously segfault for very large matrices.
Bug fix:
- We fixed a bug in the dense sandwich product, which would previously segfault for very large F-contiguous matrices.
Bug fix:
- We fixed a bug in the dense matrix-vector and sandwich products, which would previously segfault for very large matrices.
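The sandwich product mentioned in several of these fixes computes X.T @ diag(d) @ X, the weighted Gram matrix that appears, for example, in the Hessian of a generalized linear model. The NumPy version below is a minimal reference sketch and not tabmat's optimized kernel. The helper name dense_sandwich is hypothetical. It scales the rows of X by d instead of forming the n-by-n matrix diag(d).

```python
import numpy as np


def dense_sandwich(X, d):
    """Hypothetical reference implementation of X.T @ diag(d) @ X for a dense 2-D X."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    if X.ndim != 2 or d.shape != (X.shape[0],):
        raise ValueError("expected X of shape (n, k) and d of shape (n,)")
    # Row-scaling X by d is equivalent to diag(d) @ X but needs only O(n * k) memory.
    return X.T @ (d[:, None] * X)


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
d = np.array([1.0, 0.0, 2.0])
print(dense_sandwich(X, d))
```

The result is a symmetric k-by-k matrix. Only its size in the number of columns k, not the number of rows n, determines the output memory.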
Bug fix:
- Fixed the loading of jemalloc in Apple Silicon wheels.
Other changes:
- Build and upload wheels for Apple Silicon.
Other changes:
- Next attempt to build a wheel for PyPI without march=native.
Other changes:
- Add Python 3.10 support to CI (remove Python 3.6).
- Build a wheel for PyPI without march=native.
New features
- tabmat.CategoricalMatrix now accepts a drop_first argument. This lets the user drop the first column of a CategoricalMatrix to avoid multicollinearity problems in unregularized models.
- tabmat.StandardizedMatrix and tabmat.MatrixBase now support the multiply method.
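The multicollinearity that drop_first avoids can be seen in a small dense example. When a model has an intercept, the one-hot columns of a categorical variable sum to the intercept column, so the design matrix is rank-deficient. Dropping the first category column makes that category the reference level and restores full column rank. This NumPy sketch only illustrates the idea, and the helper one_hot is hypothetical.

```python
import numpy as np


def one_hot(codes, n_categories, drop_first=False):
    """Hypothetical helper: dense one-hot encoding of integer category codes."""
    X = np.eye(n_categories)[np.asarray(codes)]
    # Dropping the first column makes category 0 the reference level.
    return X[:, 1:] if drop_first else X


codes = [0, 1, 2, 1, 0]
intercept = np.ones((len(codes), 1))

# The one-hot columns sum to the intercept column: 4 columns but only rank 3.
full = np.hstack([intercept, one_hot(codes, 3)])
# With drop_first, the 3-column design has full column rank.
reduced = np.hstack([intercept, one_hot(codes, 3, drop_first=True)])
print(np.linalg.matrix_rank(full), np.linalg.matrix_rank(reduced))
```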
Bug fix
- Always use 64-bit integers for indexing in tabmat.ext.sparse.sparse_sandwich to avoid segmentation faults on very wide problems.
Bug fix
- Disable the use of static TLS in the Linux wheels to avoid issues with too small TLS on some distributions.
Bug fix
- We fixed a bug in tabmat.SplitMatrix.matvec, where incorrect matrix-vector products were computed when a SplitMatrix did not contain any dense components.
Other changes
- We now specify the run-time dependencies in setup.py, so that missing dependencies are installed automatically from PyPI when installing tabmat via pip.
Other changes
- tabmat is now available on PyPI and will be automatically updated when a new release is published.
Bug fix
- We now support xsimd>=8 and alternative jemalloc installations.
Bug fix
- Allow linking to an alternatively suffixed jemalloc installation to work around #113.
Bug fix
- The license was mistakenly left as proprietary. Corrected to BSD-3-Clause.
Other changes
- Added ReadTheDocs integration.
- Added CONTRIBUTING.md.
- Corrected pyproject.toml to work with PEP 517.
Breaking changes:
- The package has been renamed to tabmat. CELEBRATE!
- The one_over_var_inf_to_val function has been made private.
- The csc_to_split function has been renamed to tabmat.from_csc to match the tabmat.from_pandas function.
- The tabmat.MatrixBase.get_col_means and tabmat.MatrixBase.get_col_stds methods have been made private.
- The cross_sandwich method has also been made private.
Bug fix
- StandardizedMatrix.transpose_matvec gave the wrong answer when the out parameter was provided. This is now fixed.
- SplitMatrix.__repr__ now calls the __repr__ method of the component matrices instead of __str__.
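A standardized matrix can apply transpose_matvec without materializing the dense, centered matrix. This sketch assumes a standardization of the form Z[i, j] = (X[i, j] - mean[j]) * scale[j]. Under that assumption, Z.T @ v = scale * (X.T @ v - mean * sum(v)), so only X.T @ v on the original, possibly sparse, X is needed. The helper name and its out handling are hypothetical. The sketch overwrites out in place, and tabmat's exact out semantics are not described here.

```python
import numpy as np


def standardized_transpose_matvec(X, mean, scale, v, out=None):
    """Hypothetical helper: Z.T @ v for Z = (X - mean) * scale, without building Z.

    Uses Z.T @ v = scale * (X.T @ v - mean * sum(v)).
    If `out` is given, the result is written into it (an assumption of this
    sketch) and `out` itself is returned.
    """
    v = np.asarray(v, dtype=float)
    res = np.asarray(scale) * (X.T @ v - np.asarray(mean) * v.sum())
    if out is None:
        return res
    out[...] = res
    return out


X = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
mean = X.mean(axis=0)
scale = 1.0 / X.std(axis=0)
v = np.array([1.0, -2.0, 0.5])
print(standardized_transpose_matvec(X, mean, scale, v))
```

One way to check a result is to compare it against the dense computation ((X - mean) * scale).T @ v, both with and without an out buffer.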
Other changes
- Optimized tabmat.SparseMatrix.matvec and tabmat.SparseMatrix.transpose_matvec for the case where rows and cols are None.
- Implemented CategoricalMatrix.__rmul__.
- Reorganized the documentation and updated the text to match the current API.
- Enabled indexing the rows of a CategoricalMatrix. Previously, CategoricalMatrix.__getitem__ only supported column indexing.
- Allow creating a SplitMatrix from a list of any MatrixBase objects, including another SplitMatrix.
- Reduced memory usage in tabmat.SplitMatrix.matvec.
Bug fix
- In SplitMatrix.sandwich, when a column subset was specified, incorrect output was produced if the components of the indices array were not sorted. SplitMatrix.__init__ now checks for sorted indices and maintains sorted index lists when combining matrices.
Other changes
- SplitMatrix.__init__ now filters out any empty matrices.
- StandardizedMatrix.sandwich passes rows=None and cols=None on to the underlying matrix instead of replacing them with full arrays of indices. This should improve performance slightly.
- SplitMatrix.__repr__ now includes the type of the underlying matrix objects in the string output.
Bug fix
Sparse matrices now accept 64-bit indices on Windows.
Bug fix:
Split matrices now also work on Windows.
Breaking changes:
We renamed several public functions to make them private. These include functions in tabmat.benchmark that are unlikely to be used outside of this package, as well as:
- tabmat.dense_matrix._matvec_helper
- tabmat.sparse_matrix._matvec_helper
- tabmat.split_matrix._prepare_out_array
Other changes:
- We removed the dependency on sparse_dot_mkl. We now use scipy.sparse.csr_matvec instead of sparse_dot_mkl.dot_product_mkl on all platforms, because the latter suffered from poor performance, especially on narrow problems. As a result, we also removed the function tabmat.sparse_matrix._dot_product_maybe_mkl.
- We updated the pre-commit hooks and made sure the code is in line with the new hooks.
Other changes:
We are now also making releases for Windows.
Other changes:
Still trying.
Other changes:
We are trying to make releases for Windows.
Bug fixes:
- Added a check in SplitMatrix.__init__ that matrices are two-dimensional.
- Replaced np.int with np.int64 where appropriate, due to the NumPy deprecation of np.int.
Other changes:
- Added Python 3.9 support.
- Use the scipy.sparse dot product when MKL isn't available.
Bug fixes:
- Added handling for nulls when setting up a CategoricalMatrix.
- Fixed several functions so that they work with both row and column restrictions and the out parameter.
Other changes:
- Added various tests and documentation improvements
Breaking change:
- Rename dot to matvec. Our dot function supports matrix-vector multiplication for every subclass, but only supports matrix-matrix multiplication for some. We therefore rename it to matvec in line with other libraries.
Bug fix:
- Fix a bug in matvec for categorical components when the number of categories exceeds the number of rows.
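For a categorical component, matvec can be computed by indexing instead of by a dense product. Each row of the implied one-hot matrix has a single 1 in the column of its category, so entry i of X @ v is v[codes[i]]. The vector v therefore has one entry per category, not one per row. The case above, where there are more categories than rows, is where keeping those two sizes apart matters. This NumPy sketch is illustrative only and does not reproduce the original bug. The helper name is hypothetical.

```python
import numpy as np


def categorical_matvec(codes, v):
    """Hypothetical helper: X @ v for the one-hot matrix X implied by `codes`.

    len(v) is the number of categories, which may exceed the number of rows.
    """
    codes = np.asarray(codes)
    v = np.asarray(v)
    if codes.size and (codes.min() < 0 or codes.max() >= v.shape[0]):
        raise ValueError("every code must be a valid index into v")
    return v[codes]


# Two rows, five categories: more categories than rows.
codes = np.array([3, 0])
v = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
print(categorical_matvec(codes, v))
```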
See git history.