up follow livre

Tykayn 2025-08-30 18:14:14 +02:00 committed by tykayn
parent b4b4398bb0
commit 3a7a3849ae
12242 changed files with 2564461 additions and 6914 deletions


@ -0,0 +1,350 @@
"""
===================================
Sparse arrays (:mod:`scipy.sparse`)
===================================
.. currentmodule:: scipy.sparse
.. toctree::
:hidden:
sparse.csgraph
sparse.linalg
sparse.migration_to_sparray
SciPy 2-D sparse array package for numeric data.
.. note::
This package is switching to an array interface, compatible with
NumPy arrays, from the older matrix interface. We recommend that
you use the array objects (`bsr_array`, `coo_array`, etc.) for
all new work.
When using the array interface, please note that:
- ``x * y`` no longer performs matrix multiplication, but
element-wise multiplication (just like with NumPy arrays). To
make code work with both arrays and matrices, use ``x @ y`` for
matrix multiplication.
- Operations such as ``sum``, that used to produce dense matrices, now
produce arrays, whose multiplication behavior differs similarly.
- Sparse arrays use array-style *slicing* operations, returning scalars,
1D, or 2D sparse arrays. If you need 2D results, use an appropriate index,
e.g. ``A[:, i, None]`` or ``A[:, [i]]``.
- All index arrays for a given sparse array should have the same dtype.
For example, for CSR format, ``indices`` and ``indptr`` should have
the same dtype. For COO, each array in `coords` should have the same dtype.
The construction utilities (`eye`, `kron`, `random`, `diags`, etc.)
have appropriate replacements (see :ref:`sparse-construction-functions`).
For more information see
:ref:`Migration from spmatrix to sparray <migration_to_sparray>`.
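The points in the note above can be checked with a small sketch. The arrays ``A`` and ``B`` are made-up values chosen only for illustration:

```python
import numpy as np
from scipy.sparse import csr_array

A = csr_array(np.array([[1, 2], [3, 4]]))
B = csr_array(np.array([[0, 1], [1, 0]]))

# With the array interface, ``*`` multiplies element by element ...
elementwise = (A * B).toarray()
# ... while ``@`` performs matrix multiplication.
matmul = (A @ B).toarray()

# Indexing with a list keeps the selected column as a 2D sparse array.
col = A[:, [1]]
```

Here ``elementwise`` is ``[[0, 2], [3, 0]]``, ``matmul`` is ``[[2, 1], [4, 3]]``, and ``col`` has shape ``(2, 1)``.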
Submodules
==========
.. autosummary::
csgraph - Compressed sparse graph routines
linalg - Sparse linear algebra routines
Sparse array classes
====================
.. autosummary::
:toctree: generated/
bsr_array - Block Sparse Row array
coo_array - A sparse array in COOrdinate format
csc_array - Compressed Sparse Column array
csr_array - Compressed Sparse Row array
dia_array - Sparse array with DIAgonal storage
dok_array - Dictionary Of Keys based sparse array
lil_array - Row-based list of lists sparse array
sparray - Sparse array base class
.. _sparse-construction-functions:
Building sparse arrays
----------------------
.. autosummary::
:toctree: generated/
diags_array - Return a sparse array from diagonals
eye_array - Sparse MxN array whose k-th diagonal is all ones
random_array - Random values in a given shape array
block_array - Build a sparse array from sub-blocks
.. _combining-arrays:
Combining arrays
----------------
.. autosummary::
:toctree: generated/
kron - Kronecker product of two sparse arrays
kronsum - Kronecker sum of sparse arrays
block_diag - Build a block diagonal sparse array
tril - Lower triangular portion of a sparse array
triu - Upper triangular portion of a sparse array
hstack - Stack sparse arrays horizontally (column wise)
vstack - Stack sparse arrays vertically (row wise)
Sparse tools
------------
.. autosummary::
:toctree: generated/
save_npz - Save a sparse array to a file using ``.npz`` format.
load_npz - Load a sparse array from a file using ``.npz`` format.
find - Return the indices and values of the nonzero elements
get_index_dtype - Determine a good dtype for index arrays.
safely_cast_index_arrays - Cast index array dtype or raise if shape too big
Identifying sparse arrays
-------------------------
.. autosummary::
:toctree: generated/
issparse - Check if the argument is a sparse object (array or matrix).
Sparse matrix classes
=====================
.. autosummary::
:toctree: generated/
bsr_matrix - Block Sparse Row matrix
coo_matrix - A sparse matrix in COOrdinate format
csc_matrix - Compressed Sparse Column matrix
csr_matrix - Compressed Sparse Row matrix
dia_matrix - Sparse matrix with DIAgonal storage
dok_matrix - Dictionary Of Keys based sparse matrix
lil_matrix - Row-based list of lists sparse matrix
spmatrix - Sparse matrix base class
Building sparse matrices
------------------------
.. autosummary::
:toctree: generated/
eye - Sparse MxN matrix whose k-th diagonal is all ones
identity - Identity matrix in sparse matrix format
diags - Return a sparse matrix from diagonals
spdiags - Return a sparse matrix from diagonals
bmat - Build a sparse matrix from sparse sub-blocks
random - Random values in a given shape matrix
rand - Random values in a given shape matrix (old interface)
**Combining matrices uses the same functions as** :ref:`combining-arrays`.
Identifying sparse matrices
---------------------------
.. autosummary::
:toctree: generated/
issparse
isspmatrix
isspmatrix_csc
isspmatrix_csr
isspmatrix_bsr
isspmatrix_lil
isspmatrix_dok
isspmatrix_coo
isspmatrix_dia
Warnings
========
.. autosummary::
:toctree: generated/
SparseEfficiencyWarning
SparseWarning
Usage information
=================
There are seven available sparse array types:
1. csc_array: Compressed Sparse Column format
2. csr_array: Compressed Sparse Row format
3. bsr_array: Block Sparse Row format
4. lil_array: List of Lists format
5. dok_array: Dictionary of Keys format
6. coo_array: COOrdinate format (aka IJV, triplet format)
7. dia_array: DIAgonal format
To construct an array efficiently, use any of `coo_array`,
`dok_array` or `lil_array`. `dok_array` and `lil_array`
support basic slicing and fancy indexing with a similar syntax
to NumPy arrays. The COO format does not support indexing (yet)
but can also be used to efficiently construct arrays using coordinate
and value information.
Despite their similarity to NumPy arrays, it is **strongly discouraged**
to use NumPy functions directly on these arrays because NumPy typically
treats them as generic Python objects rather than arrays, leading to
unexpected (and incorrect) results. If you do want to apply a NumPy
function to these arrays, first check if SciPy has its own implementation
for the given sparse array class, or **convert the sparse array to
a NumPy array** (e.g., using the `toarray` method of the class)
before applying the function.
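As a minimal sketch of the two recommended options, the following takes an element-wise square root. The array ``A`` is an illustrative value, and ``sqrt`` is used as one example of a method that the compressed formats implement themselves:

```python
import numpy as np
from scipy.sparse import csr_array

A = csr_array(np.array([[1.0, 0.0], [0.0, 4.0]]))

# Option 1: use the sparse class's own method when one exists.
root_sparse = A.sqrt()

# Option 2: convert to a dense NumPy array first, then call NumPy.
root_dense = np.sqrt(A.toarray())
```

Option 1 keeps the result sparse; option 2 allocates the full dense array, so it is only sensible when the array fits comfortably in memory.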
All conversions among the CSR, CSC, and COO formats are efficient,
linear-time operations.
To perform manipulations such as multiplication or inversion, first
convert the array to either CSC or CSR format. The `lil_array`
format is row-based, so conversion to CSR is efficient, whereas
conversion to CSC is less so.
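The workflow described above (build in LIL, convert to CSR, then do arithmetic) might look like the following sketch. The shape and values are arbitrary and chosen only for illustration:

```python
from scipy.sparse import lil_array

# Build incrementally in LIL, which supports cheap item assignment.
L = lil_array((3, 3))
L[0, 1] = 2.0
L[2, 0] = 5.0
L.setdiag([1.0, 1.0, 1.0])

# Convert to CSR before doing arithmetic.
C = L.tocsr()
product = C @ C

# Conversions among CSR, CSC and COO preserve the stored values.
round_trip = C.tocsc().tocoo().tocsr()
```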
Matrix vector product
---------------------
To multiply a 2D sparse array by a vector, use the matmul operator
(i.e., ``@``), which performs a dot product (like the ``dot`` method):
>>> import numpy as np
>>> from scipy.sparse import csr_array
>>> A = csr_array([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
>>> v = np.array([1, 0, -1])
>>> A @ v
array([ 1, -3, -1], dtype=int64)
The CSR format is especially suitable for fast matrix vector products.
Example 1
---------
Construct a 1000x1000 `lil_array` and add some values to it:
>>> from scipy.sparse import lil_array
>>> from scipy.sparse.linalg import spsolve
>>> from numpy.linalg import solve, norm
>>> from numpy.random import rand
>>> A = lil_array((1000, 1000))
>>> A[0, :100] = rand(100)
>>> A.setdiag(rand(1000))
Now convert it to CSR format and solve A x = b for x:
>>> A = A.tocsr()
>>> b = rand(1000)
>>> x = spsolve(A, b)
Convert it to a dense array and solve, and check that the result
is the same:
>>> x_ = solve(A.toarray(), b)
Now we can compute the norm of the error with:
>>> err = norm(x-x_)
>>> err < 1e-9
True
It should be small :)
Example 2
---------
Construct an array in COO format:
>>> from scipy import sparse
>>> from numpy import array
>>> I = array([0,3,1,0])
>>> J = array([0,3,1,2])
>>> V = array([4,5,7,9])
>>> A = sparse.coo_array((V,(I,J)),shape=(4,4))
Notice that the indices do not need to be sorted.
Duplicate (i,j) entries are summed when converting to CSR or CSC.
>>> I = array([0,0,1,3,1,0,0])
>>> J = array([0,2,1,3,1,0,0])
>>> V = array([1,1,1,1,1,1,1])
>>> B = sparse.coo_array((V,(I,J)),shape=(4,4)).tocsr()
This is useful for constructing finite-element stiffness and mass matrices.
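To make the summation of duplicates visible, the same index and value arrays can be converted to a dense array. This sketch only repeats the construction above under different variable names and inspects the result:

```python
import numpy as np
from scipy.sparse import coo_array

rows = np.array([0, 0, 1, 3, 1, 0, 0])
cols = np.array([0, 2, 1, 3, 1, 0, 0])
vals = np.array([1, 1, 1, 1, 1, 1, 1])

B = coo_array((vals, (rows, cols)), shape=(4, 4)).tocsr()
dense = B.toarray()
```

Entry ``(0, 0)`` appears three times in the input and entry ``(1, 1)`` twice, so ``dense[0, 0]`` is 3 and ``dense[1, 1]`` is 2, and ``B`` stores four entries in total.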
Further details
---------------
CSR column indices are not necessarily sorted. Likewise for CSC row
indices. Use the ``.sorted_indices()`` and ``.sort_indices()`` methods when
sorted indices are required (e.g., when passing data to other libraries).
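A short sketch of the difference between the two methods, using a hand-built CSR array whose first row lists its columns out of order (the values are illustrative):

```python
import numpy as np
from scipy.sparse import csr_array

data = np.array([1, 2, 3])
indices = np.array([2, 0, 1])  # row 0 lists column 2 before column 0
indptr = np.array([0, 2, 3])
A = csr_array((data, indices, indptr), shape=(2, 3))

was_sorted = A.has_sorted_indices  # False for this input

S = A.sorted_indices()  # returns a sorted copy, A is left unchanged
A.sort_indices()        # sorts A itself in place
```

Both calls reorder ``data`` together with ``indices``, so the dense representation of the array stays the same.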
"""
# Original code by Travis Oliphant.
# Modified and extended by Ed Schofield, Robert Cimrman,
# Nathan Bell, and Jake Vanderplas.
import warnings as _warnings
import importlib as _importlib
from ._base import *
from ._csr import *
from ._csc import *
from ._lil import *
from ._dok import *
from ._coo import *
from ._dia import *
from ._bsr import *
from ._construct import *
from ._extract import *
from ._matrix import spmatrix
from ._matrix_io import *
from ._sputils import get_index_dtype, safely_cast_index_arrays
# Deprecated namespaces, to be removed in v2.0.0
from . import (
base, bsr, compressed, construct, coo, csc, csr, data, dia, dok, extract,
lil, sparsetools, sputils
)
_submodules = ["csgraph", "linalg"]
__all__ = [s for s in dir() if not s.startswith('_')] + _submodules
# Filter PendingDeprecationWarning for np.matrix introduced with numpy 1.15
msg = 'the matrix subclass is not the recommended way'
_warnings.filterwarnings('ignore', message=msg)
def __dir__():
    return __all__


def __getattr__(name):
    if name in _submodules:
        return _importlib.import_module(f'scipy.sparse.{name}')
    else:
        try:
            return globals()[name]
        except KeyError:
            raise AttributeError(
                f"Module 'scipy.sparse' has no attribute '{name}'"
            )
from scipy._lib._testutils import PytestTester
test = PytestTester(__name__)
del PytestTester


@ -0,0 +1,880 @@
"""Compressed Block Sparse Row format"""
__docformat__ = "restructuredtext en"
__all__ = ['bsr_array', 'bsr_matrix', 'isspmatrix_bsr']
from warnings import warn
import numpy as np
from scipy._lib._util import copy_if_needed
from ._matrix import spmatrix
from ._data import _data_matrix, _minmax_mixin
from ._compressed import _cs_matrix
from ._base import issparse, _formats, _spbase, sparray
from ._sputils import (isshape, getdtype, getdata, to_native, upcast,
check_shape)
from . import _sparsetools
from ._sparsetools import (bsr_matvec, bsr_matvecs, csr_matmat_maxnnz,
bsr_matmat, bsr_transpose, bsr_sort_indices,
bsr_tocsr)
class _bsr_base(_cs_matrix, _minmax_mixin):
_format = 'bsr'
def __init__(self, arg1, shape=None, dtype=None, copy=False,
blocksize=None, *, maxprint=None):
_data_matrix.__init__(self, arg1, maxprint=maxprint)
if issparse(arg1):
if arg1.format == self.format and copy:
arg1 = arg1.copy()
else:
arg1 = arg1.tobsr(blocksize=blocksize)
self.indptr, self.indices, self.data, self._shape = (
arg1.indptr, arg1.indices, arg1.data, arg1._shape
)
elif isinstance(arg1,tuple):
if isshape(arg1):
# it's a tuple of matrix dimensions (M,N)
self._shape = check_shape(arg1)
M,N = self.shape
# process blocksize
if blocksize is None:
blocksize = (1,1)
else:
if not isshape(blocksize):
raise ValueError(f'invalid blocksize={blocksize}')
blocksize = tuple(blocksize)
self.data = np.zeros((0,) + blocksize, getdtype(dtype, default=float))
R,C = blocksize
if (M % R) != 0 or (N % C) != 0:
raise ValueError('shape must be multiple of blocksize')
# Select index dtype large enough to pass array and
# scalar parameters to sparsetools
idx_dtype = self._get_index_dtype(maxval=max(M//R, N//C, R, C))
self.indices = np.zeros(0, dtype=idx_dtype)
self.indptr = np.zeros(M//R + 1, dtype=idx_dtype)
elif len(arg1) == 2:
# (data,(row,col)) format
coo = self._coo_container(arg1, dtype=dtype, shape=shape)
bsr = coo.tobsr(blocksize=blocksize)
self.indptr, self.indices, self.data, self._shape = (
bsr.indptr, bsr.indices, bsr.data, bsr._shape
)
elif len(arg1) == 3:
# (data,indices,indptr) format
(data, indices, indptr) = arg1
# Select index dtype large enough to pass array and
# scalar parameters to sparsetools
maxval = 1
if shape is not None:
maxval = max(shape)
if blocksize is not None:
maxval = max(maxval, max(blocksize))
idx_dtype = self._get_index_dtype((indices, indptr), maxval=maxval,
check_contents=True)
if not copy:
copy = copy_if_needed
self.indices = np.array(indices, copy=copy, dtype=idx_dtype)
self.indptr = np.array(indptr, copy=copy, dtype=idx_dtype)
self.data = getdata(data, copy=copy, dtype=dtype)
if self.data.ndim != 3:
raise ValueError(
f'BSR data must be 3-dimensional, got shape={self.data.shape}'
)
if blocksize is not None:
if not isshape(blocksize):
raise ValueError(f'invalid blocksize={blocksize}')
if tuple(blocksize) != self.data.shape[1:]:
raise ValueError(
f'mismatching blocksize={blocksize}'
f' vs {self.data.shape[1:]}'
)
else:
raise ValueError('unrecognized bsr_array constructor usage')
else:
# must be dense
try:
arg1 = np.asarray(arg1)
except Exception as e:
raise ValueError("unrecognized form for "
f"{self.format}_matrix constructor") from e
if isinstance(self, sparray) and arg1.ndim != 2:
raise ValueError(f"BSR arrays don't support {arg1.ndim}D input. Use 2D")
arg1 = self._coo_container(arg1, dtype=dtype).tobsr(blocksize=blocksize)
self.indptr, self.indices, self.data, self._shape = (
arg1.indptr, arg1.indices, arg1.data, arg1._shape
)
if shape is not None:
self._shape = check_shape(shape)
else:
if self.shape is None:
# shape not already set, try to infer dimensions
try:
M = len(self.indptr) - 1
N = self.indices.max() + 1
except Exception as e:
raise ValueError('unable to infer matrix dimensions') from e
else:
R,C = self.blocksize
self._shape = check_shape((M*R,N*C))
if self.shape is None:
if shape is None:
# TODO infer shape here
raise ValueError('need to infer shape')
else:
self._shape = check_shape(shape)
if dtype is not None:
self.data = self.data.astype(getdtype(dtype, self.data), copy=False)
self.check_format(full_check=False)
def check_format(self, full_check=True):
"""Check whether the array/matrix respects the BSR format.
Parameters
----------
full_check : bool, optional
If `True`, run rigorous checks, scanning arrays for valid values.
Note that activating these checks might copy arrays for casting,
modifying indices and index pointers in place.
If `False`, run basic checks on attributes. O(1) operations.
Default is `True`.
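As a rough illustration of the two modes, the sketch below deliberately corrupts a block column index. The array and the bad index value are made up for the example:

```python
import numpy as np
from scipy.sparse import bsr_array

A = bsr_array(np.eye(4), blocksize=(2, 2))

# Corrupt one block column index: only block columns 0 and 1 exist.
A.indices[0] = 7

# The basic checks do not scan index values, so this passes.
A.check_format(full_check=False)

# The full check scans the indices and rejects the out-of-range value.
try:
    A.check_format(full_check=True)
    full_check_error = None
except ValueError as exc:
    full_check_error = str(exc)
```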
"""
M,N = self.shape
R,C = self.blocksize
# index arrays should have integer data types
if self.indptr.dtype.kind != 'i':
warn(f"indptr array has non-integer dtype ({self.indptr.dtype.name})",
stacklevel=2)
if self.indices.dtype.kind != 'i':
warn(f"indices array has non-integer dtype ({self.indices.dtype.name})",
stacklevel=2)
# check array shapes
if self.indices.ndim != 1 or self.indptr.ndim != 1:
raise ValueError("indices, and indptr should be 1-D")
if self.data.ndim != 3:
raise ValueError("data should be 3-D")
# check index pointer
if (len(self.indptr) != M//R + 1):
raise ValueError(
f"index pointer size ({len(self.indptr)}) should be ({M//R + 1})"
)
if (self.indptr[0] != 0):
raise ValueError("index pointer should start with 0")
# check index and data arrays
if (len(self.indices) != len(self.data)):
raise ValueError("indices and data should have the same size")
if (self.indptr[-1] > len(self.indices)):
raise ValueError("Last value of index pointer should be less than or "
"equal to the size of index and data arrays")
self.prune()
if full_check:
# check format validity (more expensive)
if self.nnz > 0:
if self.indices.max() >= N//C:
raise ValueError(
f"column index values must be < {N//C}"
f" (now max {self.indices.max()})"
)
if self.indices.min() < 0:
raise ValueError("column index values must be >= 0")
if np.diff(self.indptr).min() < 0:
raise ValueError("index pointer values must form a "
"non-decreasing sequence")
idx_dtype = self._get_index_dtype((self.indices, self.indptr))
self.indptr = np.asarray(self.indptr, dtype=idx_dtype)
self.indices = np.asarray(self.indices, dtype=idx_dtype)
self.data = to_native(self.data)
# if not self.has_sorted_indices():
# warn('Indices were not in sorted order. Sorting indices.')
# self.sort_indices(check_first=False)
@property
def blocksize(self) -> tuple:
"""Block size of the matrix."""
return self.data.shape[1:]
def _getnnz(self, axis=None):
if axis is not None:
raise NotImplementedError("_getnnz over an axis is not implemented "
"for BSR format")
R, C = self.blocksize
return int(self.indptr[-1]) * R * C
_getnnz.__doc__ = _spbase._getnnz.__doc__
def count_nonzero(self, axis=None):
if axis is not None:
raise NotImplementedError(
"count_nonzero over axis is not implemented for BSR format."
)
return np.count_nonzero(self._deduped_data())
count_nonzero.__doc__ = _spbase.count_nonzero.__doc__
def __repr__(self):
_, fmt = _formats[self.format]
sparse_cls = 'array' if isinstance(self, sparray) else 'matrix'
b = 'x'.join(str(x) for x in self.blocksize)
return (
f"<{fmt} sparse {sparse_cls} of dtype '{self.dtype}'\n"
f"\twith {self.nnz} stored elements (blocksize={b}) and shape {self.shape}>"
)
def diagonal(self, k=0):
rows, cols = self.shape
if k <= -rows or k >= cols:
return np.empty(0, dtype=self.data.dtype)
R, C = self.blocksize
y = np.zeros(min(rows + min(k, 0), cols - max(k, 0)),
dtype=upcast(self.dtype))
_sparsetools.bsr_diagonal(k, rows // R, cols // C, R, C,
self.indptr, self.indices,
np.ravel(self.data), y)
return y
diagonal.__doc__ = _spbase.diagonal.__doc__
##########################
# NotImplemented methods #
##########################
def __getitem__(self,key):
raise NotImplementedError
def __setitem__(self,key,val):
raise NotImplementedError
######################
# Arithmetic methods #
######################
def _add_dense(self, other):
return self.tocoo(copy=False)._add_dense(other)
def _matmul_vector(self, other):
M,N = self.shape
R,C = self.blocksize
result = np.zeros(self.shape[0], dtype=upcast(self.dtype, other.dtype))
bsr_matvec(M//R, N//C, R, C,
self.indptr, self.indices, self.data.ravel(),
other, result)
return result
def _matmul_multivector(self,other):
R,C = self.blocksize
M,N = self.shape
n_vecs = other.shape[1] # number of column vectors
result = np.zeros((M,n_vecs), dtype=upcast(self.dtype,other.dtype))
bsr_matvecs(M//R, N//C, n_vecs, R, C,
self.indptr, self.indices, self.data.ravel(),
other.ravel(), result.ravel())
return result
def _matmul_sparse(self, other):
M, K1 = self.shape
K2, N = other.shape
R,n = self.blocksize
# convert to this format
if other.format == "bsr":
C = other.blocksize[1]
else:
C = 1
if other.format == "csr" and n == 1:
other = other.tobsr(blocksize=(n,C), copy=False) # lightweight conversion
else:
other = other.tobsr(blocksize=(n,C))
idx_dtype = self._get_index_dtype((self.indptr, self.indices,
other.indptr, other.indices))
bnnz = csr_matmat_maxnnz(M//R, N//C,
self.indptr.astype(idx_dtype),
self.indices.astype(idx_dtype),
other.indptr.astype(idx_dtype),
other.indices.astype(idx_dtype))
idx_dtype = self._get_index_dtype((self.indptr, self.indices,
other.indptr, other.indices),
maxval=bnnz)
indptr = np.empty(self.indptr.shape, dtype=idx_dtype)
indices = np.empty(bnnz, dtype=idx_dtype)
data = np.empty(R*C*bnnz, dtype=upcast(self.dtype,other.dtype))
bsr_matmat(bnnz, M//R, N//C, R, C, n,
self.indptr.astype(idx_dtype),
self.indices.astype(idx_dtype),
np.ravel(self.data),
other.indptr.astype(idx_dtype),
other.indices.astype(idx_dtype),
np.ravel(other.data),
indptr,
indices,
data)
data = data.reshape(-1,R,C)
# TODO eliminate zeros
return self._bsr_container(
(data, indices, indptr), shape=(M, N), blocksize=(R, C)
)
######################
# Conversion methods #
######################
def tobsr(self, blocksize=None, copy=False):
"""Convert this array/matrix into Block Sparse Row Format.
With copy=False, the data/indices may be shared between this
array/matrix and the resultant bsr_array/bsr_matrix.
If blocksize=(R, C) is provided, it will be used for determining
block size of the bsr_array/bsr_matrix.
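A small sketch of the three cases, using an illustrative block-diagonal array:

```python
import numpy as np
from scipy.sparse import bsr_array

dense = np.kron(np.eye(2), np.ones((2, 2)))  # 4x4 with two dense 2x2 blocks
A = bsr_array(dense, blocksize=(2, 2))

same = A.tobsr()                       # same blocksize, copy=False: A itself
copied = A.tobsr(copy=True)            # independent copy
reblocked = A.tobsr(blocksize=(1, 1))  # different blocksize: converts via CSR
```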
"""
if blocksize not in [None, self.blocksize]:
return self.tocsr().tobsr(blocksize=blocksize)
if copy:
return self.copy()
else:
return self
def tocsr(self, copy=False):
M, N = self.shape
R, C = self.blocksize
nnz = self.nnz
idx_dtype = self._get_index_dtype((self.indptr, self.indices),
maxval=max(nnz, N))
indptr = np.empty(M + 1, dtype=idx_dtype)
indices = np.empty(nnz, dtype=idx_dtype)
data = np.empty(nnz, dtype=upcast(self.dtype))
bsr_tocsr(M // R, # n_brow
N // C, # n_bcol
R, C,
self.indptr.astype(idx_dtype, copy=False),
self.indices.astype(idx_dtype, copy=False),
self.data,
indptr,
indices,
data)
return self._csr_container((data, indices, indptr), shape=self.shape)
tocsr.__doc__ = _spbase.tocsr.__doc__
def tocsc(self, copy=False):
return self.tocsr(copy=False).tocsc(copy=copy)
tocsc.__doc__ = _spbase.tocsc.__doc__
def tocoo(self, copy=True):
"""Convert this array/matrix to COOrdinate format.
When copy=False the data array will be shared between
this array/matrix and the resultant coo_array/coo_matrix.
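The sketch below shows what the conversion produces for a single 2x2 block; the input values are illustrative. Note that every entry of a stored block becomes a COO entry, including zeros inside the block:

```python
import numpy as np
from scipy.sparse import bsr_array

# One stored 2x2 block at block position (0, 0); it contains a zero.
A = bsr_array(np.array([[1, 0, 0, 0],
                        [3, 4, 0, 0]]), blocksize=(2, 2))

C = A.tocoo(copy=False)
rows, cols, vals = C.row, C.col, C.data
```

Here ``rows`` is ``[0, 0, 1, 1]``, ``cols`` is ``[0, 1, 0, 1]`` and ``vals`` is ``[1, 0, 3, 4]``.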
"""
M,N = self.shape
R,C = self.blocksize
indptr_diff = np.diff(self.indptr)
if indptr_diff.dtype.itemsize > np.dtype(np.intp).itemsize:
# Check for potential overflow
indptr_diff_limited = indptr_diff.astype(np.intp)
if np.any(indptr_diff_limited != indptr_diff):
raise ValueError("Matrix too big to convert")
indptr_diff = indptr_diff_limited
idx_dtype = self._get_index_dtype(maxval=max(M, N))
row = (R * np.arange(M//R, dtype=idx_dtype)).repeat(indptr_diff)
row = row.repeat(R*C).reshape(-1,R,C)
row += np.tile(np.arange(R, dtype=idx_dtype).reshape(-1,1), (1,C))
row = row.reshape(-1)
col = ((C * self.indices).astype(idx_dtype, copy=False)
.repeat(R*C).reshape(-1,R,C))
col += np.tile(np.arange(C, dtype=idx_dtype), (R,1))
col = col.reshape(-1)
data = self.data.reshape(-1)
if copy:
data = data.copy()
return self._coo_container(
(data, (row, col)), shape=self.shape
)
def toarray(self, order=None, out=None):
return self.tocoo(copy=False).toarray(order=order, out=out)
toarray.__doc__ = _spbase.toarray.__doc__
def transpose(self, axes=None, copy=False):
if axes is not None and axes != (1, 0):
raise ValueError("Sparse matrices do not support "
"an 'axes' parameter because swapping "
"dimensions is the only logical permutation.")
R, C = self.blocksize
M, N = self.shape
NBLK = self.nnz//(R*C)
if self.nnz == 0:
return self._bsr_container((N, M), blocksize=(C, R),
dtype=self.dtype, copy=copy)
indptr = np.empty(N//C + 1, dtype=self.indptr.dtype)
indices = np.empty(NBLK, dtype=self.indices.dtype)
data = np.empty((NBLK, C, R), dtype=self.data.dtype)
bsr_transpose(M//R, N//C, R, C,
self.indptr, self.indices, self.data.ravel(),
indptr, indices, data.ravel())
return self._bsr_container((data, indices, indptr),
shape=(N, M), copy=copy)
transpose.__doc__ = _spbase.transpose.__doc__
##############################################################
# methods that examine or modify the internal data structure #
##############################################################
def eliminate_zeros(self):
"""Remove zero elements in-place."""
if not self.nnz:
return # nothing to do
R,C = self.blocksize
M,N = self.shape
mask = (self.data != 0).reshape(-1,R*C).sum(axis=1) # nonzero blocks
nonzero_blocks = mask.nonzero()[0]
self.data[:len(nonzero_blocks)] = self.data[nonzero_blocks]
# modifies self.indptr and self.indices *in place*
_sparsetools.csr_eliminate_zeros(M//R, N//C, self.indptr,
self.indices, mask)
self.prune()
def sum_duplicates(self):
"""Eliminate duplicate array/matrix entries by adding them together.
This is an *in place* operation.
"""
if self.has_canonical_format:
return
self.sort_indices()
R, C = self.blocksize
M, N = self.shape
# port of _sparsetools.csr_sum_duplicates
n_row = M // R
nnz = 0
row_end = 0
for i in range(n_row):
jj = row_end
row_end = self.indptr[i+1]
while jj < row_end:
j = self.indices[jj]
x = self.data[jj]
jj += 1
while jj < row_end and self.indices[jj] == j:
x += self.data[jj]
jj += 1
self.indices[nnz] = j
self.data[nnz] = x
nnz += 1
self.indptr[i+1] = nnz
self.prune() # nnz may have changed
self.has_canonical_format = True
def sort_indices(self):
"""Sort the indices of this array/matrix *in place*
"""
if self.has_sorted_indices:
return
R,C = self.blocksize
M,N = self.shape
bsr_sort_indices(M//R, N//C, R, C, self.indptr, self.indices, self.data.ravel())
self.has_sorted_indices = True
def prune(self):
"""Remove empty space after all non-zero elements.
"""
R,C = self.blocksize
M,N = self.shape
if len(self.indptr) != M//R + 1:
raise ValueError("index pointer has invalid length")
bnnz = self.indptr[-1]
if len(self.indices) < bnnz:
raise ValueError("indices array has too few elements")
if len(self.data) < bnnz:
raise ValueError("data array has too few elements")
self.data = self.data[:bnnz]
self.indices = self.indices[:bnnz]
# utility functions
def _binopt(self, other, op, in_shape=None, out_shape=None):
"""Apply the binary operation fn to two sparse matrices."""
# Ideally we'd take the GCDs of the blocksize dimensions
# and explode self and other to match.
other = self.__class__(other, blocksize=self.blocksize)
# e.g. bsr_plus_bsr, etc.
fn = getattr(_sparsetools, self.format + op + self.format)
R,C = self.blocksize
max_bnnz = len(self.data) + len(other.data)
idx_dtype = self._get_index_dtype((self.indptr, self.indices,
other.indptr, other.indices),
maxval=max_bnnz)
indptr = np.empty(self.indptr.shape, dtype=idx_dtype)
indices = np.empty(max_bnnz, dtype=idx_dtype)
bool_ops = ['_ne_', '_lt_', '_gt_', '_le_', '_ge_']
if op in bool_ops:
data = np.empty(R*C*max_bnnz, dtype=np.bool_)
else:
data = np.empty(R*C*max_bnnz, dtype=upcast(self.dtype,other.dtype))
fn(self.shape[0]//R, self.shape[1]//C, R, C,
self.indptr.astype(idx_dtype),
self.indices.astype(idx_dtype),
self.data,
other.indptr.astype(idx_dtype),
other.indices.astype(idx_dtype),
np.ravel(other.data),
indptr,
indices,
data)
actual_bnnz = indptr[-1]
indices = indices[:actual_bnnz]
data = data[:R*C*actual_bnnz]
if actual_bnnz < max_bnnz/2:
indices = indices.copy()
data = data.copy()
data = data.reshape(-1,R,C)
return self.__class__((data, indices, indptr), shape=self.shape)
# needed by _data_matrix
def _with_data(self,data,copy=True):
"""Returns a matrix with the same sparsity structure as self,
but with different data. By default the structure arrays
(i.e. .indptr and .indices) are copied.
"""
if copy:
return self.__class__((data,self.indices.copy(),self.indptr.copy()),
shape=self.shape,dtype=data.dtype)
else:
return self.__class__((data,self.indices,self.indptr),
shape=self.shape,dtype=data.dtype)
# # these functions are used by the parent class
# # to remove redundancy between bsc_matrix and bsr_matrix
# def _swap(self,x):
# """swap the members of x if this is a column-oriented matrix
# """
# return (x[0],x[1])
def _broadcast_to(self, shape, copy=False):
return _spbase._broadcast_to(self, shape, copy)
def isspmatrix_bsr(x):
    """Is `x` of a bsr_matrix type?

    Parameters
    ----------
    x
        object to check for being a bsr matrix

    Returns
    -------
    bool
        True if `x` is a bsr matrix, False otherwise

    Examples
    --------
    >>> from scipy.sparse import bsr_array, bsr_matrix, csr_matrix, isspmatrix_bsr
    >>> isspmatrix_bsr(bsr_matrix([[5]]))
    True
    >>> isspmatrix_bsr(bsr_array([[5]]))
    False
    >>> isspmatrix_bsr(csr_matrix([[5]]))
    False
    """
    return isinstance(x, bsr_matrix)
# This namespace class separates array from matrix with isinstance
class bsr_array(_bsr_base, sparray):
"""
Block Sparse Row format sparse array.
This can be instantiated in several ways:
bsr_array(D, [blocksize=(R,C)])
where D is a 2-D ndarray.
bsr_array(S, [blocksize=(R,C)])
with another sparse array or matrix S (equivalent to S.tobsr())
bsr_array((M, N), [blocksize=(R,C), dtype])
to construct an empty sparse array with shape (M, N)
dtype is optional, defaulting to dtype='d'.
bsr_array((data, ij), [blocksize=(R,C), shape=(M, N)])
where ``data`` and ``ij`` satisfy ``a[ij[0, k], ij[1, k]] = data[k]``
bsr_array((data, indices, indptr), [shape=(M, N)])
is the standard BSR representation where the block column
indices for row i are stored in ``indices[indptr[i]:indptr[i+1]]``
and their corresponding block values are stored in
``data[ indptr[i]: indptr[i+1] ]``. If the shape parameter is not
supplied, the array dimensions are inferred from the index arrays.
Attributes
----------
dtype : dtype
Data type of the array
shape : 2-tuple
Shape of the array
ndim : int
Number of dimensions (this is always 2)
nnz
size
data
BSR format data array of the array
indices
BSR format index array of the array
indptr
BSR format index pointer array of the array
blocksize
Block size
has_sorted_indices : bool
Whether indices are sorted
has_canonical_format : bool
T
Notes
-----
Sparse arrays can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
**Summary of BSR format**
The Block Sparse Row (BSR) format is very similar to the Compressed
Sparse Row (CSR) format. BSR is appropriate for sparse matrices with dense
sub matrices like the last example below. Such sparse block matrices often
arise in vector-valued finite element discretizations. In such cases, BSR is
considerably more efficient than CSR and CSC for many sparse arithmetic
operations.
**Blocksize**
The blocksize (R,C) must evenly divide the shape of the sparse array (M,N).
That is, R and C must satisfy the relationship ``M % R = 0`` and
``N % C = 0``.
If no blocksize is specified, a simple heuristic is applied to determine
an appropriate blocksize.
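The divisibility rule can be seen with the empty-array constructor. This is a minimal sketch, and the shapes are arbitrary:

```python
from scipy.sparse import bsr_array

# 4 % 2 == 0 and 6 % 3 == 0, so this blocksize is accepted.
ok = bsr_array((4, 6), blocksize=(2, 3))

# 4 % 3 != 0, so the constructor raises ValueError.
try:
    bsr_array((4, 6), blocksize=(3, 3))
    error = None
except ValueError as exc:
    error = str(exc)
```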
**Canonical Format**
In canonical format, there are no duplicate blocks and indices are sorted
per row.
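As an illustration, the sketch below builds a BSR array in which two blocks claim the same block position, then brings it into canonical format with ``sum_duplicates``. The block values are made up:

```python
import numpy as np
from scipy.sparse import bsr_array

# Two 2x2 blocks that both sit in block column 0 of block row 0.
data = np.array([[[1, 2], [3, 4]],
                 [[10, 20], [30, 40]]])
indices = np.array([0, 0])
indptr = np.array([0, 2])
A = bsr_array((data, indices, indptr), shape=(2, 2))

before = A.has_canonical_format  # False: a duplicate block is present
A.sum_duplicates()               # in place: duplicate blocks are added
after = A.has_canonical_format
```

After the call a single block remains, holding the element-wise sum of the two input blocks.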
**Limitations**
Block Sparse Row format sparse arrays do not support slicing.
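One possible workaround, sketched here with an illustrative array, is to convert to CSR before indexing:

```python
import numpy as np
from scipy.sparse import bsr_array

A = bsr_array(np.arange(16).reshape(4, 4), blocksize=(2, 2))

# Indexing a BSR array directly is not implemented.
try:
    A[0, 0]
    indexable = True
except NotImplementedError:
    indexable = False

# CSR supports both item access and slicing.
C = A.tocsr()
first_two_rows = C[0:2, :].toarray()
entry = C[1, 2]
```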
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import bsr_array
>>> bsr_array((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int8)
>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> bsr_array((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6]).repeat(4).reshape(6, 2, 2)
>>> bsr_array((data,indices,indptr), shape=(6, 6)).toarray()
array([[1, 1, 0, 0, 2, 2],
[1, 1, 0, 0, 2, 2],
[0, 0, 0, 0, 3, 3],
[0, 0, 0, 0, 3, 3],
[4, 4, 5, 5, 6, 6],
[4, 4, 5, 5, 6, 6]])
"""
class bsr_matrix(spmatrix, _bsr_base):
"""
Block Sparse Row format sparse matrix.
This can be instantiated in several ways:
bsr_matrix(D, [blocksize=(R,C)])
where D is a 2-D ndarray.
bsr_matrix(S, [blocksize=(R,C)])
with another sparse array or matrix S (equivalent to S.tobsr())
bsr_matrix((M, N), [blocksize=(R,C), dtype])
to construct an empty sparse matrix with shape (M, N)
dtype is optional, defaulting to dtype='d'.
bsr_matrix((data, ij), [blocksize=(R,C), shape=(M, N)])
where ``data`` and ``ij`` satisfy ``a[ij[0, k], ij[1, k]] = data[k]``
bsr_matrix((data, indices, indptr), [shape=(M, N)])
is the standard BSR representation where the block column
indices for row i are stored in ``indices[indptr[i]:indptr[i+1]]``
and their corresponding block values are stored in
``data[ indptr[i]: indptr[i+1] ]``. If the shape parameter is not
supplied, the matrix dimensions are inferred from the index arrays.
Attributes
----------
dtype : dtype
Data type of the matrix
shape : 2-tuple
Shape of the matrix
ndim : int
Number of dimensions (this is always 2)
nnz
size
data
BSR format data array of the matrix
indices
BSR format index array of the matrix
indptr
BSR format index pointer array of the matrix
blocksize
Block size
has_sorted_indices : bool
Whether indices are sorted
has_canonical_format : bool
T
Notes
-----
Sparse matrices can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
**Summary of BSR format**
The Block Sparse Row (BSR) format is very similar to the Compressed
Sparse Row (CSR) format. BSR is appropriate for sparse matrices with dense
sub matrices like the last example below. Such sparse block matrices often
arise in vector-valued finite element discretizations. In such cases, BSR is
considerably more efficient than CSR and CSC for many sparse arithmetic
operations.
**Blocksize**
The blocksize (R, C) must evenly divide the shape of the sparse matrix (M, N).
That is, R and C must satisfy ``M % R == 0`` and
``N % C == 0``.
If no blocksize is specified, a simple heuristic is applied to determine
an appropriate blocksize.
**Canonical Format**
In canonical format, there are no duplicate blocks and indices are sorted
per row.
**Limitations**
Block Sparse Row format sparse matrices do not support slicing.
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import bsr_matrix
>>> bsr_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int8)
>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> bsr_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6]).repeat(4).reshape(6, 2, 2)
>>> bsr_matrix((data, indices, indptr), shape=(6, 6)).toarray()
array([[1, 1, 0, 0, 2, 2],
[1, 1, 0, 0, 2, 2],
[0, 0, 0, 0, 3, 3],
[0, 0, 0, 0, 3, 3],
[4, 4, 5, 5, 6, 6],
[4, 4, 5, 5, 6, 6]])
"""

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,367 @@
"""Compressed Sparse Column matrix format"""
__docformat__ = "restructuredtext en"
__all__ = ['csc_array', 'csc_matrix', 'isspmatrix_csc']
import numpy as np
from ._matrix import spmatrix
from ._base import _spbase, sparray
from ._sparsetools import csr_tocsc, expandptr
from ._sputils import upcast
from ._compressed import _cs_matrix
class _csc_base(_cs_matrix):
_format = 'csc'
def transpose(self, axes=None, copy=False):
if axes is not None and axes != (1, 0):
raise ValueError("Sparse arrays/matrices do not support "
"an 'axes' parameter because swapping "
"dimensions is the only logical permutation.")
M, N = self.shape
return self._csr_container((self.data, self.indices,
self.indptr), (N, M), copy=copy)
transpose.__doc__ = _spbase.transpose.__doc__
def __iter__(self):
yield from self.tocsr()
def tocsc(self, copy=False):
if copy:
return self.copy()
else:
return self
tocsc.__doc__ = _spbase.tocsc.__doc__
def tocsr(self, copy=False):
M,N = self.shape
idx_dtype = self._get_index_dtype((self.indptr, self.indices),
maxval=max(self.nnz, N))
indptr = np.empty(M + 1, dtype=idx_dtype)
indices = np.empty(self.nnz, dtype=idx_dtype)
data = np.empty(self.nnz, dtype=upcast(self.dtype))
csr_tocsc(N, M,
self.indptr.astype(idx_dtype),
self.indices.astype(idx_dtype),
self.data,
indptr,
indices,
data)
A = self._csr_container(
(data, indices, indptr),
shape=self.shape, copy=False
)
A.has_sorted_indices = True
return A
tocsr.__doc__ = _spbase.tocsr.__doc__
def nonzero(self):
# CSC can't use _cs_matrix's .nonzero method because it
# returns the indices sorted for self transposed.
# Get row and col indices, from _cs_matrix.tocoo
major_dim, minor_dim = self._swap(self.shape)
minor_indices = self.indices
major_indices = np.empty(len(minor_indices), dtype=self.indices.dtype)
expandptr(major_dim, self.indptr, major_indices)
row, col = self._swap((major_indices, minor_indices))
# Remove explicit zeros
nz_mask = self.data != 0
row = row[nz_mask]
col = col[nz_mask]
# Sort them to be in C-style order
ind = np.argsort(row, kind='mergesort')
row = row[ind]
col = col[ind]
return row, col
nonzero.__doc__ = _cs_matrix.nonzero.__doc__
def _getrow(self, i):
"""Returns a copy of row i of the matrix, as a (1 x n)
CSR matrix (row vector).
"""
M, N = self.shape
i = int(i)
if i < 0:
i += M
if i < 0 or i >= M:
raise IndexError(f'index ({i}) out of range')
return self._get_submatrix(minor=i).tocsr()
def _getcol(self, i):
"""Returns a copy of column i of the matrix, as a (m x 1)
CSC matrix (column vector).
"""
M, N = self.shape
i = int(i)
if i < 0:
i += N
if i < 0 or i >= N:
raise IndexError(f'index ({i}) out of range')
return self._get_submatrix(major=i, copy=True)
def _get_intXarray(self, row, col):
return self._major_index_fancy(col)._get_submatrix(minor=row)
def _get_intXslice(self, row, col):
if col.step in (1, None):
return self._get_submatrix(major=col, minor=row, copy=True)
return self._major_slice(col)._get_submatrix(minor=row)
def _get_sliceXint(self, row, col):
if row.step in (1, None):
return self._get_submatrix(major=col, minor=row, copy=True)
return self._get_submatrix(major=col)._minor_slice(row)
def _get_sliceXarray(self, row, col):
return self._major_index_fancy(col)._minor_slice(row)
def _get_arrayXint(self, row, col):
res = self._get_submatrix(major=col)._minor_index_fancy(row)
if row.ndim > 1:
return res.reshape(row.shape)
return res
def _get_arrayXslice(self, row, col):
return self._major_slice(col)._minor_index_fancy(row)
# These functions are used by the parent class (_cs_matrix)
# to remove redundancy between the CSC and CSR classes.
@staticmethod
def _swap(x):
"""swap the members of x if this is a column-oriented matrix
"""
return x[1], x[0]
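The `tocsr` method above calls `csr_tocsc(N, M, ...)` with the dimensions swapped, because re-compressing a CSC structure by rows is the same operation as re-compressing a CSR structure by columns. The sketch below shows that idea with a counting sort in pure Python. It is an assumption-laden illustration, not the compiled `_sparsetools` routine, and `convert_compressed` is a hypothetical name.

```python
def convert_compressed(n_major, n_minor, indptr, indices, data):
    """Re-compress along the other axis (CSR -> CSC or CSC -> CSR)."""
    # Count the stored entries that fall on each minor index.
    counts = [0] * n_minor
    for j in indices:
        counts[j] += 1
    # Prefix sums of the counts form the new index pointer.
    new_indptr = [0]
    for c in counts:
        new_indptr.append(new_indptr[-1] + c)
    # Scatter the entries. Visiting the major lines in increasing order
    # means each output line receives its indices already sorted.
    fill = new_indptr[:-1].copy()
    new_indices = [0] * len(indices)
    new_data = [0] * len(data)
    for major in range(n_major):
        for k in range(indptr[major], indptr[major + 1]):
            dest = fill[indices[k]]
            new_indices[dest] = major
            new_data[dest] = data[k]
            fill[indices[k]] += 1
    return new_indptr, new_indices, new_data


# CSC arrays of [[1, 0, 4], [0, 0, 5], [2, 3, 6]] (3 columns are the major axis).
csc = ([0, 2, 3, 6], [0, 2, 2, 0, 1, 2], [1, 2, 3, 4, 5, 6])
csr = convert_compressed(3, 3, *csc)
back = convert_compressed(3, 3, *csr)
```

Because the output lines come out sorted, a conversion of this kind can mark its result as having sorted indices, which matches the `A.has_sorted_indices = True` assignment in the method above.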
def isspmatrix_csc(x):
"""Is `x` of csc_matrix type?
Parameters
----------
x
object to check for being a csc matrix
Returns
-------
bool
True if `x` is a csc matrix, False otherwise
Examples
--------
>>> from scipy.sparse import csc_array, csc_matrix, coo_matrix, isspmatrix_csc
>>> isspmatrix_csc(csc_matrix([[5]]))
True
>>> isspmatrix_csc(csc_array([[5]]))
False
>>> isspmatrix_csc(coo_matrix([[5]]))
False
"""
return isinstance(x, csc_matrix)
# This namespace class separates array from matrix with isinstance
class csc_array(_csc_base, sparray):
"""
Compressed Sparse Column array.
This can be instantiated in several ways:
csc_array(D)
where D is a 2-D ndarray
csc_array(S)
with another sparse array or matrix S (equivalent to S.tocsc())
csc_array((M, N), [dtype])
to construct an empty array with shape (M, N)
dtype is optional, defaulting to dtype='d'.
csc_array((data, (row_ind, col_ind)), [shape=(M, N)])
where ``data``, ``row_ind`` and ``col_ind`` satisfy the
relationship ``a[row_ind[k], col_ind[k]] = data[k]``.
csc_array((data, indices, indptr), [shape=(M, N)])
is the standard CSC representation where the row indices for
column i are stored in ``indices[indptr[i]:indptr[i+1]]``
and their corresponding values are stored in
``data[indptr[i]:indptr[i+1]]``. If the shape parameter is
not supplied, the array dimensions are inferred from
the index arrays.
Attributes
----------
dtype : dtype
Data type of the array
shape : 2-tuple
Shape of the array
ndim : int
Number of dimensions (this is always 2)
nnz
size
data
CSC format data array of the array
indices
CSC format index array of the array
indptr
CSC format index pointer array of the array
has_sorted_indices
has_canonical_format
T
Notes
-----
Sparse arrays can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
Advantages of the CSC format
- efficient arithmetic operations CSC + CSC, CSC * CSC, etc.
- efficient column slicing
- fast matrix vector products (CSR, BSR may be faster)
Disadvantages of the CSC format
- slow row slicing operations (consider CSR)
- changes to the sparsity structure are expensive (consider LIL or DOK)
Canonical format
- Within each column, indices are sorted by row.
- There are no duplicate entries.
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import csc_array
>>> csc_array((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int8)
>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csc_array((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 4],
[0, 0, 5],
[2, 3, 6]])
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csc_array((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 4],
[0, 0, 5],
[2, 3, 6]])
"""
class csc_matrix(spmatrix, _csc_base):
"""
Compressed Sparse Column matrix.
This can be instantiated in several ways:
csc_matrix(D)
where D is a 2-D ndarray
csc_matrix(S)
with another sparse array or matrix S (equivalent to S.tocsc())
csc_matrix((M, N), [dtype])
to construct an empty matrix with shape (M, N)
dtype is optional, defaulting to dtype='d'.
csc_matrix((data, (row_ind, col_ind)), [shape=(M, N)])
where ``data``, ``row_ind`` and ``col_ind`` satisfy the
relationship ``a[row_ind[k], col_ind[k]] = data[k]``.
csc_matrix((data, indices, indptr), [shape=(M, N)])
is the standard CSC representation where the row indices for
column i are stored in ``indices[indptr[i]:indptr[i+1]]``
and their corresponding values are stored in
``data[indptr[i]:indptr[i+1]]``. If the shape parameter is
not supplied, the matrix dimensions are inferred from
the index arrays.
Attributes
----------
dtype : dtype
Data type of the matrix
shape : 2-tuple
Shape of the matrix
ndim : int
Number of dimensions (this is always 2)
nnz
size
data
CSC format data array of the matrix
indices
CSC format index array of the matrix
indptr
CSC format index pointer array of the matrix
has_sorted_indices
has_canonical_format
T
Notes
-----
Sparse matrices can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
Advantages of the CSC format
- efficient arithmetic operations CSC + CSC, CSC * CSC, etc.
- efficient column slicing
- fast matrix vector products (CSR, BSR may be faster)
Disadvantages of the CSC format
- slow row slicing operations (consider CSR)
- changes to the sparsity structure are expensive (consider LIL or DOK)
Canonical format
- Within each column, indices are sorted by row.
- There are no duplicate entries.
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import csc_matrix
>>> csc_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int8)
>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csc_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 4],
[0, 0, 5],
[2, 3, 6]])
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 4],
[0, 0, 5],
[2, 3, 6]])
"""

View file

@ -0,0 +1,558 @@
"""Compressed Sparse Row matrix format"""
__docformat__ = "restructuredtext en"
__all__ = ['csr_array', 'csr_matrix', 'isspmatrix_csr']
import numpy as np
from ._matrix import spmatrix
from ._base import _spbase, sparray
from ._sparsetools import (csr_tocsc, csr_tobsr, csr_count_blocks,
get_csr_submatrix, csr_sample_values)
from ._sputils import upcast
from ._compressed import _cs_matrix
class _csr_base(_cs_matrix):
_format = 'csr'
_allow_nd = (1, 2)
def transpose(self, axes=None, copy=False):
if axes is not None and axes != (1, 0):
raise ValueError("Sparse arrays/matrices do not support "
"an 'axes' parameter because swapping "
"dimensions is the only logical permutation.")
if self.ndim == 1:
return self.copy() if copy else self
M, N = self.shape
return self._csc_container((self.data, self.indices,
self.indptr), shape=(N, M), copy=copy)
transpose.__doc__ = _spbase.transpose.__doc__
def tolil(self, copy=False):
if self.ndim != 2:
raise ValueError("Cannot convert a 1d sparse array to lil format")
lil = self._lil_container(self.shape, dtype=self.dtype)
self.sum_duplicates()
ptr,ind,dat = self.indptr,self.indices,self.data
rows, data = lil.rows, lil.data
for n in range(self.shape[0]):
start = ptr[n]
end = ptr[n+1]
rows[n] = ind[start:end].tolist()
data[n] = dat[start:end].tolist()
return lil
tolil.__doc__ = _spbase.tolil.__doc__
def tocsr(self, copy=False):
if copy:
return self.copy()
else:
return self
tocsr.__doc__ = _spbase.tocsr.__doc__
def tocsc(self, copy=False):
if self.ndim != 2:
raise ValueError("Cannot convert a 1d sparse array to csc format")
M, N = self.shape
idx_dtype = self._get_index_dtype((self.indptr, self.indices),
maxval=max(self.nnz, M))
indptr = np.empty(N + 1, dtype=idx_dtype)
indices = np.empty(self.nnz, dtype=idx_dtype)
data = np.empty(self.nnz, dtype=upcast(self.dtype))
csr_tocsc(M, N,
self.indptr.astype(idx_dtype),
self.indices.astype(idx_dtype),
self.data,
indptr,
indices,
data)
A = self._csc_container((data, indices, indptr), shape=self.shape)
A.has_sorted_indices = True
return A
tocsc.__doc__ = _spbase.tocsc.__doc__
def tobsr(self, blocksize=None, copy=True):
if self.ndim != 2:
raise ValueError("Cannot convert a 1d sparse array to bsr format")
if blocksize is None:
from ._spfuncs import estimate_blocksize
return self.tobsr(blocksize=estimate_blocksize(self))
elif blocksize == (1,1):
arg1 = (self.data.reshape(-1,1,1),self.indices,self.indptr)
return self._bsr_container(arg1, shape=self.shape, copy=copy)
else:
R,C = blocksize
M,N = self.shape
if R < 1 or C < 1 or M % R != 0 or N % C != 0:
raise ValueError(f'invalid blocksize {blocksize}')
blks = csr_count_blocks(M,N,R,C,self.indptr,self.indices)
idx_dtype = self._get_index_dtype((self.indptr, self.indices),
maxval=max(N//C, blks))
indptr = np.empty(M//R+1, dtype=idx_dtype)
indices = np.empty(blks, dtype=idx_dtype)
data = np.zeros((blks,R,C), dtype=self.dtype)
csr_tobsr(M, N, R, C,
self.indptr.astype(idx_dtype),
self.indices.astype(idx_dtype),
self.data,
indptr, indices, data.ravel())
return self._bsr_container(
(data, indices, indptr), shape=self.shape
)
tobsr.__doc__ = _spbase.tobsr.__doc__
# These functions are used by the parent class (_cs_matrix)
# to remove redundancy between the CSC and CSR classes.
@staticmethod
def _swap(x):
"""swap the members of x if this is a column-oriented matrix
"""
return x
def __iter__(self):
if self.ndim == 1:
zero = self.dtype.type(0)
u = 0
for v, d in zip(self.indices, self.data):
for _ in range(v - u):
yield zero
yield d
u = v + 1
for _ in range(self.shape[0] - u):
yield zero
return
indptr = np.zeros(2, dtype=self.indptr.dtype)
# yield 1-D rows (sparray) or 2-D row vectors of shape (1, N) (spmatrix)
shape = self.shape[1:] if isinstance(self, sparray) else (1, self.shape[1])
i0 = 0
for i1 in self.indptr[1:]:
indptr[1] = i1 - i0
indices = self.indices[i0:i1]
data = self.data[i0:i1]
yield self.__class__((data, indices, indptr), shape=shape, copy=True)
i0 = i1
def _getrow(self, i):
"""Returns a copy of row i of the matrix, as a (1 x n)
CSR matrix (row vector).
"""
if self.ndim == 1:
if i not in (0, -1):
raise IndexError(f'index ({i}) out of range')
return self.reshape((1, self.shape[0]), copy=True)
M, N = self.shape
i = int(i)
if i < 0:
i += M
if i < 0 or i >= M:
raise IndexError(f'index ({i}) out of range')
indptr, indices, data = get_csr_submatrix(
M, N, self.indptr, self.indices, self.data, i, i + 1, 0, N)
return self.__class__((data, indices, indptr), shape=(1, N),
dtype=self.dtype, copy=False)
def _getcol(self, i):
"""Returns a copy of column i. A (m x 1) sparse array (column vector).
"""
if self.ndim == 1:
raise ValueError("getcol not provided for 1d arrays. Use indexing A[j]")
M, N = self.shape
i = int(i)
if i < 0:
i += N
if i < 0 or i >= N:
raise IndexError(f'index ({i}) out of range')
indptr, indices, data = get_csr_submatrix(
M, N, self.indptr, self.indices, self.data, 0, M, i, i + 1)
return self.__class__((data, indices, indptr), shape=(M, 1),
dtype=self.dtype, copy=False)
def _get_int(self, idx):
spot = np.flatnonzero(self.indices == idx)
if spot.size:
return self.data[spot[0]]
return self.data.dtype.type(0)
def _get_slice(self, idx):
if idx == slice(None):
return self.copy()
if idx.step in (1, None):
ret = self._get_submatrix(0, idx, copy=True)
return ret.reshape(ret.shape[-1])
return self._minor_slice(idx)
def _get_array(self, idx):
idx_dtype = self._get_index_dtype(self.indices)
idx = np.asarray(idx, dtype=idx_dtype)
if idx.size == 0:
return self.__class__([], dtype=self.dtype)
M, N = 1, self.shape[0]
row = np.zeros_like(idx, dtype=idx_dtype)
col = np.asarray(idx, dtype=idx_dtype)
val = np.empty(row.size, dtype=self.dtype)
csr_sample_values(M, N, self.indptr, self.indices, self.data,
row.size, row, col, val)
new_shape = col.shape if col.shape[0] > 1 else (col.shape[0],)
return self.__class__(val.reshape(new_shape))
def _get_intXarray(self, row, col):
return self._getrow(row)._minor_index_fancy(col)
def _get_intXslice(self, row, col):
if col.step in (1, None):
return self._get_submatrix(row, col, copy=True)
# TODO: uncomment this once it's faster:
# return self._getrow(row)._minor_slice(col)
M, N = self.shape
start, stop, stride = col.indices(N)
ii, jj = self.indptr[row:row+2]
row_indices = self.indices[ii:jj]
row_data = self.data[ii:jj]
if stride > 0:
ind = (row_indices >= start) & (row_indices < stop)
else:
ind = (row_indices <= start) & (row_indices > stop)
if abs(stride) > 1:
ind &= (row_indices - start) % stride == 0
row_indices = (row_indices[ind] - start) // stride
row_data = row_data[ind]
row_indptr = np.array([0, len(row_indices)])
if stride < 0:
row_data = row_data[::-1]
row_indices = abs(row_indices[::-1])
shape = (1, max(0, int(np.ceil(float(stop - start) / stride))))
return self.__class__((row_data, row_indices, row_indptr), shape=shape,
dtype=self.dtype, copy=False)
def _get_sliceXint(self, row, col):
if row.step in (1, None):
return self._get_submatrix(row, col, copy=True)
return self._major_slice(row)._get_submatrix(minor=col)
def _get_sliceXarray(self, row, col):
return self._major_slice(row)._minor_index_fancy(col)
def _get_arrayXint(self, row, col):
res = self._major_index_fancy(row)._get_submatrix(minor=col)
if row.ndim > 1:
return res.reshape(row.shape)
return res
def _get_arrayXslice(self, row, col):
if col.step not in (1, None):
col = np.arange(*col.indices(self.shape[1]))
return self._get_arrayXarray(row, col)
return self._major_index_fancy(row)._get_submatrix(minor=col)
def _set_int(self, idx, x):
self._set_many(0, idx, x)
def _set_array(self, idx, x):
x = np.broadcast_to(x, idx.shape)
self._set_many(np.zeros_like(idx), idx, x)
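The strided branch of `_get_intXslice` above filters the stored column indices of one row against a slice and then renumbers them. A pure-Python sketch of that idea follows. It is an illustration under simplifying assumptions (plain lists, a final sort instead of the in-place reversal), and `slice_sparse_row` and `to_dense` are hypothetical helper names.

```python
import math


def slice_sparse_row(col_indices, values, n_cols, col_slice):
    """Apply a (possibly strided or reversed) slice to one sparse row."""
    start, stop, stride = col_slice.indices(n_cols)
    kept = []
    for j, v in zip(col_indices, values):
        if stride > 0:
            inside = start <= j < stop
        else:
            inside = stop < j <= start
        # Keep only columns that the stride actually lands on.
        if inside and (j - start) % stride == 0:
            kept.append(((j - start) // stride, v))
    kept.sort()   # a negative stride reverses the order of the kept entries
    length = max(0, math.ceil((stop - start) / stride))
    return [j for j, _ in kept], [v for _, v in kept], length


def to_dense(col_indices, values, length):
    dense = [0] * length
    for j, v in zip(col_indices, values):
        dense[j] = v
    return dense


# Sparse form of the row [5, 0, 0, 7, 0, 9, 0, 0].
row_indices, row_values, n_cols = [0, 3, 5], [5, 7, 9], 8
sliced = slice_sparse_row(row_indices, row_values, n_cols, slice(1, 8, 2))
```

Comparing `to_dense(*sliced)` with ordinary list slicing of the dense row is a convenient way to check the index arithmetic for positive and negative strides.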
def isspmatrix_csr(x):
"""Is `x` of csr_matrix type?
Parameters
----------
x
object to check for being a csr matrix
Returns
-------
bool
True if `x` is a csr matrix, False otherwise
Examples
--------
>>> from scipy.sparse import csr_array, csr_matrix, coo_matrix, isspmatrix_csr
>>> isspmatrix_csr(csr_matrix([[5]]))
True
>>> isspmatrix_csr(csr_array([[5]]))
False
>>> isspmatrix_csr(coo_matrix([[5]]))
False
"""
return isinstance(x, csr_matrix)
# This namespace class separates array from matrix with isinstance
class csr_array(_csr_base, sparray):
"""
Compressed Sparse Row array.
This can be instantiated in several ways:
csr_array(D)
where D is a 2-D ndarray
csr_array(S)
with another sparse array or matrix S (equivalent to S.tocsr())
csr_array((M, N), [dtype])
to construct an empty array with shape (M, N)
dtype is optional, defaulting to dtype='d'.
csr_array((data, (row_ind, col_ind)), [shape=(M, N)])
where ``data``, ``row_ind`` and ``col_ind`` satisfy the
relationship ``a[row_ind[k], col_ind[k]] = data[k]``.
csr_array((data, indices, indptr), [shape=(M, N)])
is the standard CSR representation where the column indices for
row i are stored in ``indices[indptr[i]:indptr[i+1]]`` and their
corresponding values are stored in ``data[indptr[i]:indptr[i+1]]``.
If the shape parameter is not supplied, the array dimensions
are inferred from the index arrays.
Attributes
----------
dtype : dtype
Data type of the array
shape : tuple
Shape of the array (1-D or 2-D)
ndim : int
Number of dimensions (1 or 2)
nnz
size
data
CSR format data array of the array
indices
CSR format index array of the array
indptr
CSR format index pointer array of the array
has_sorted_indices
has_canonical_format
T
Notes
-----
Sparse arrays can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
Advantages of the CSR format
- efficient arithmetic operations CSR + CSR, CSR * CSR, etc.
- efficient row slicing
- fast matrix vector products
Disadvantages of the CSR format
- slow column slicing operations (consider CSC)
- changes to the sparsity structure are expensive (consider LIL or DOK)
Canonical Format
- Within each row, indices are sorted by column.
- There are no duplicate entries.
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import csr_array
>>> csr_array((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int8)
>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_array((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_array((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])
Duplicate entries are summed together:
>>> row = np.array([0, 1, 2, 0])
>>> col = np.array([0, 1, 1, 0])
>>> data = np.array([1, 2, 4, 8])
>>> csr_array((data, (row, col)), shape=(3, 3)).toarray()
array([[9, 0, 0],
[0, 2, 0],
[0, 4, 0]])
As an example of how to construct a CSR array incrementally,
the following snippet builds a term-document array from texts:
>>> docs = [["hello", "world", "hello"], ["goodbye", "cruel", "world"]]
>>> indptr = [0]
>>> indices = []
>>> data = []
>>> vocabulary = {}
>>> for d in docs:
...     for term in d:
...         index = vocabulary.setdefault(term, len(vocabulary))
...         indices.append(index)
...         data.append(1)
...     indptr.append(len(indices))
...
>>> csr_array((data, indices, indptr), dtype=int).toarray()
array([[2, 1, 0, 0],
[0, 1, 1, 1]])
"""
class csr_matrix(spmatrix, _csr_base):
"""
Compressed Sparse Row matrix.
This can be instantiated in several ways:
csr_matrix(D)
where D is a 2-D ndarray
csr_matrix(S)
with another sparse array or matrix S (equivalent to S.tocsr())
csr_matrix((M, N), [dtype])
to construct an empty matrix with shape (M, N)
dtype is optional, defaulting to dtype='d'.
csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])
where ``data``, ``row_ind`` and ``col_ind`` satisfy the
relationship ``a[row_ind[k], col_ind[k]] = data[k]``.
csr_matrix((data, indices, indptr), [shape=(M, N)])
is the standard CSR representation where the column indices for
row i are stored in ``indices[indptr[i]:indptr[i+1]]`` and their
corresponding values are stored in ``data[indptr[i]:indptr[i+1]]``.
If the shape parameter is not supplied, the matrix dimensions
are inferred from the index arrays.
Attributes
----------
dtype : dtype
Data type of the matrix
shape : 2-tuple
Shape of the matrix
ndim : int
Number of dimensions (this is always 2)
nnz
size
data
CSR format data array of the matrix
indices
CSR format index array of the matrix
indptr
CSR format index pointer array of the matrix
has_sorted_indices
has_canonical_format
T
Notes
-----
Sparse matrices can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
Advantages of the CSR format
- efficient arithmetic operations CSR + CSR, CSR * CSR, etc.
- efficient row slicing
- fast matrix vector products
Disadvantages of the CSR format
- slow column slicing operations (consider CSC)
- changes to the sparsity structure are expensive (consider LIL or DOK)
Canonical Format
- Within each row, indices are sorted by column.
- There are no duplicate entries.
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> csr_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int8)
>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])
Duplicate entries are summed together:
>>> row = np.array([0, 1, 2, 0])
>>> col = np.array([0, 1, 1, 0])
>>> data = np.array([1, 2, 4, 8])
>>> csr_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[9, 0, 0],
[0, 2, 0],
[0, 4, 0]])
As an example of how to construct a CSR matrix incrementally,
the following snippet builds a term-document matrix from texts:
>>> docs = [["hello", "world", "hello"], ["goodbye", "cruel", "world"]]
>>> indptr = [0]
>>> indices = []
>>> data = []
>>> vocabulary = {}
>>> for d in docs:
...     for term in d:
...         index = vocabulary.setdefault(term, len(vocabulary))
...         indices.append(index)
...         data.append(1)
...     indptr.append(len(indices))
...
>>> csr_matrix((data, indices, indptr), dtype=int).toarray()
array([[2, 1, 0, 0],
[0, 1, 1, 1]])
"""

View file

@ -0,0 +1,569 @@
"""Base class for sparse matrice with a .data attribute
subclasses must provide a _with_data() method that
creates a new matrix with the same sparsity pattern
as self but with a different data array
"""
import math
import numpy as np
from ._base import _spbase, sparray, _ufuncs_with_fixed_point_at_zero
from ._sputils import isscalarlike, validateaxis
__all__ = []
# TODO implement all relevant operations
# use .data.__methods__() instead of /=, *=, etc.
class _data_matrix(_spbase):
def __init__(self, arg1, *, maxprint=None):
_spbase.__init__(self, arg1, maxprint=maxprint)
@property
def dtype(self):
return self.data.dtype
@dtype.setter
def dtype(self, newtype):
self.data.dtype = newtype
def _deduped_data(self):
if hasattr(self, 'sum_duplicates'):
self.sum_duplicates()
return self.data
def __abs__(self):
return self._with_data(abs(self._deduped_data()))
def __round__(self, ndigits=0):
return self._with_data(np.around(self._deduped_data(), decimals=ndigits))
def _real(self):
return self._with_data(self.data.real)
def _imag(self):
return self._with_data(self.data.imag)
def __neg__(self):
if self.dtype.kind == 'b':
raise NotImplementedError('negating a boolean sparse array is not '
'supported')
return self._with_data(-self.data)
def __imul__(self, other): # self *= other
if isscalarlike(other):
self.data *= other
return self
return NotImplemented
def __itruediv__(self, other): # self /= other
if isscalarlike(other):
recip = 1.0 / other
self.data *= recip
return self
else:
return NotImplemented
def astype(self, dtype, casting='unsafe', copy=True):
dtype = np.dtype(dtype)
if self.dtype != dtype:
matrix = self._with_data(
self.data.astype(dtype, casting=casting, copy=True),
copy=True
)
return matrix._with_data(matrix._deduped_data(), copy=False)
elif copy:
return self.copy()
else:
return self
astype.__doc__ = _spbase.astype.__doc__
def conjugate(self, copy=True):
if np.issubdtype(self.dtype, np.complexfloating):
return self._with_data(self.data.conjugate(), copy=copy)
elif copy:
return self.copy()
else:
return self
conjugate.__doc__ = _spbase.conjugate.__doc__
def copy(self):
return self._with_data(self.data.copy(), copy=True)
copy.__doc__ = _spbase.copy.__doc__
def power(self, n, dtype=None):
"""
This function performs element-wise power.
Parameters
----------
n : scalar
A non-zero scalar exponent. A zero exponent is rejected because
the result would be a dense array of ones.
dtype : dtype, optional
If not specified, the current dtype is preserved.
Raises
------
NotImplementedError
If `n` is not a scalar, or if `n` is zero. For a zero power, use
``np.ones(A.shape, dtype=A.dtype)`` instead.
"""
if not isscalarlike(n):
raise NotImplementedError("input is not scalar")
if not n:
raise NotImplementedError(
"zero power is not supported as it would densify the matrix.\n"
"Use `np.ones(A.shape, dtype=A.dtype)` for this case."
)
data = self._deduped_data()
if dtype is not None:
data = data.astype(dtype, copy=False)
return self._with_data(data ** n)
###########################
# Multiplication handlers #
###########################
def _mul_scalar(self, other):
return self._with_data(self.data * other)
# Add the numpy unary ufuncs for which func(0) = 0 to _data_matrix.
for npfunc in _ufuncs_with_fixed_point_at_zero:
    name = npfunc.__name__
    def _create_method(op):
        def method(self):
            result = op(self._deduped_data())
            return self._with_data(result, copy=True)
        method.__doc__ = (f"Element-wise {name}.\n\n"
                          f"See `numpy.{name}` for more information.")
        method.__name__ = name
        return method
    setattr(_data_matrix, name, _create_method(npfunc))
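The loop above attaches one method per ufunc, and it only does so for functions with `func(0) == 0`, since applying such a function to the stored values alone gives the same result as applying it to the dense array. The sketch below imitates the pattern with the standard `math` module. `StoredValues` is a hypothetical stand-in class, not part of SciPy, and the chosen functions are an assumption made for the illustration.

```python
import math


class StoredValues:
    """Hypothetical sparse vector: maps index -> stored value, zeros implicit."""

    def __init__(self, length, entries):
        self.length = length
        self.entries = dict(entries)

    def _with_data(self, new_values):
        # Same sparsity pattern, different values.
        return StoredValues(self.length, zip(self.entries, new_values))

    def to_dense(self):
        return [self.entries.get(i, 0.0) for i in range(self.length)]


def _create_method(op, name):
    # A factory binds `op` per method, so each method keeps its own function.
    def method(self):
        return self._with_data([op(v) for v in self.entries.values()])
    method.__name__ = name
    return method


for func in (math.sin, math.tan, math.sqrt):   # each satisfies func(0) == 0
    setattr(StoredValues, func.__name__, _create_method(func, func.__name__))

x = StoredValues(4, {1: 4.0, 3: 9.0})
roots = x.sqrt().to_dense()
```

A function such as `math.cos` would not qualify, because `cos(0)` is 1 and every implicit zero would have to become a stored value.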
def _find_missing_index(ind, n):
    for k, a in enumerate(ind):
        if k != a:
            return k
    k += 1
    if k < n:
        return k
    else:
        return -1
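`_find_missing_index` above returns the first position at which a sorted index array departs from 0, 1, 2, ..., which is the first implicit zero. The sketch below shows how that position can decide an argmax when unstored entries count as zero. It is a simplified pure-Python illustration for a 1-D vector, not the library's implementation. `find_missing_index` and `sparse_argmax` are hypothetical names, and the empty-input handling is an assumption of the sketch.

```python
def find_missing_index(sorted_indices, n):
    """First index in range(n) absent from sorted_indices, or -1 if none."""
    for k, a in enumerate(sorted_indices):
        if k != a:
            return k
    k = len(sorted_indices)
    return k if k < n else -1


def sparse_argmax(sorted_indices, values, n):
    """argmax of a length-n vector whose unstored entries are zero."""
    if not values:
        return 0                      # all entries are implicit zeros
    best = max(range(len(values)), key=lambda p: values[p])  # first maximum
    # A positive maximum, or a fully stored vector, needs no zero handling.
    if values[best] > 0 or len(values) == n:
        return sorted_indices[best]
    first_zero = find_missing_index(sorted_indices, n)
    if values[best] == 0:
        # An explicitly stored zero ties with the implicit zeros.
        return min(first_zero, sorted_indices[best])
    return first_zero                 # every stored value is negative


# Dense equivalent: [0, -3, 0, -1]; the maximum 0 first occurs at index 0.
result = sparse_argmax([1, 3], [-3, -1], 4)
```

The same reasoning applies to argmin with the comparison reversed: an implicit zero wins whenever every stored value lies on the wrong side of zero.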
class _minmax_mixin:
"""Mixin for min and max methods.
These are not implemented for dia_matrix, hence the separate class.
"""
def _min_or_max_axis(self, axis, min_or_max, explicit):
# already checked that self.shape[axis] is not zero
N = self.shape[axis]
M = self.shape[1 - axis]
idx_dtype = self._get_index_dtype(maxval=M)
mat = self.tocsc() if axis == 0 else self.tocsr()
mat.sum_duplicates()
major_index, value = mat._minor_reduce(min_or_max)
if not explicit:
not_full = np.diff(mat.indptr)[major_index] < N
value[not_full] = min_or_max(value[not_full], 0)
mask = value != 0
major_index = np.compress(mask, major_index).astype(idx_dtype, copy=False)
value = np.compress(mask, value)
if isinstance(self, sparray):
coords = (major_index,)
shape = (M,)
return self._coo_container((value, coords), shape=shape, dtype=self.dtype)
if axis == 0:
return self._coo_container(
(value, (np.zeros(len(value), dtype=idx_dtype), major_index)),
dtype=self.dtype, shape=(1, M)
)
else:
return self._coo_container(
(value, (major_index, np.zeros(len(value), dtype=idx_dtype))),
dtype=self.dtype, shape=(M, 1)
)
def _min_or_max(self, axis, out, min_or_max, explicit):
if out is not None:
raise ValueError("Sparse min/max does not support an 'out' parameter.")
axis = validateaxis(axis, ndim=self.ndim)
if axis is None:
if 0 in self.shape:
raise ValueError("zero-size array to reduction operation")
zero = self.dtype.type(0)
if self.nnz == 0:
return zero
m = min_or_max.reduce(self._deduped_data().ravel())
if self.nnz != math.prod(self.shape) and not explicit:
m = min_or_max(zero, m)
return m
if any(self.shape[d] == 0 for d in axis):
raise ValueError("zero-size array to reduction operation")
if self.ndim == 2:
# note: 2D ensures that len(axis)==1 so we pass in the int axis[0]
return self._min_or_max_axis(axis[0], min_or_max, explicit)
return self._min_or_max_axis_nd(axis, min_or_max, explicit)
def _argminmax_axis(self, axis, argminmax, compare, explicit):
zero = self.dtype.type(0)
mat = self.tocsc() if axis == 0 else self.tocsr()
mat.sum_duplicates()
ret_size, line_size = mat._swap(mat.shape)
ret = np.zeros(ret_size, dtype=int)
nz_lines, = np.nonzero(np.diff(mat.indptr))
for i in nz_lines:
p, q = mat.indptr[i:i + 2]
data = mat.data[p:q]
indices = mat.indices[p:q]
extreme_index = argminmax(data)
extreme_value = data[extreme_index]
if explicit:
if q - p > 0:
ret[i] = indices[extreme_index]
else:
if compare(extreme_value, zero) or q - p == line_size:
ret[i] = indices[extreme_index]
else:
zero_ind = _find_missing_index(indices, line_size)
if extreme_value == zero:
ret[i] = min(extreme_index, zero_ind)
else:
ret[i] = zero_ind
if isinstance(self, sparray):
return ret
if axis == 1:
ret = ret.reshape(-1, 1)
return self._ascontainer(ret)
def _argminmax(self, axis, out, argminmax, compare, explicit):
if out is not None:
minmax = "argmin" if argminmax == np.argmin else "argmax"
raise ValueError(f"Sparse {minmax} does not support an 'out' parameter.")
axis = validateaxis(axis, ndim=self.ndim)
if axis is not None:
if any(self.shape[i] == 0 for i in axis):
minmax = "argmin" if argminmax == np.argmin else "argmax"
raise ValueError(f"Cannot apply {minmax} along a zero-sized dimension.")
if self.ndim == 2:
# note: 2D ensures that len(axis)==1 so we pass in the int axis[0]
return self._argminmax_axis(axis[0], argminmax, compare, explicit)
return self._argminmax_axis_nd(axis, argminmax, compare, explicit)
if 0 in self.shape:
minmax = "argmin" if argminmax == np.argmin else "argmax"
raise ValueError(f"Cannot apply {minmax} to an empty matrix.")
if self.nnz == 0:
if explicit:
minmax = "argmin" if argminmax == np.argmin else "argmax"
raise ValueError(f"Cannot apply {minmax} to zero matrix "
"when explicit=True.")
return 0
zero = self.dtype.type(0)
mat = self.tocoo()
# Convert to canonical form: no duplicates, sorted indices.
mat.sum_duplicates()
extreme_index = argminmax(mat.data)
if explicit:
return extreme_index
extreme_value = mat.data[extreme_index]
if mat.ndim > 2:
mat = mat.reshape(-1)
# If the min value is less than zero, or max is greater than zero,
# then we do not need to worry about implicit zeros.
# And we use a "cheap test" for the rare case of no implicit zeros.
maxnnz = math.prod(self.shape)
if compare(extreme_value, zero) or mat.nnz == maxnnz:
# cast to Python int to avoid overflow and RuntimeError
if mat.ndim == 1: # includes nD case that was reshaped above
return int(mat.col[extreme_index])
# ndim == 2
num_col = mat.shape[-1]
return int(mat.row[extreme_index]) * num_col + int(mat.col[extreme_index])
# At this stage, any implicit zero could be the min or max value.
# After sum_duplicates(), the `row` and `col` arrays are guaranteed to
# be sorted in C-order, which means the linearized indices are sorted.
if mat.ndim == 1: # includes nD case that was reshaped above
linear_indices = mat.coords[-1]
else: # ndim == 2
num_col = mat.shape[-1]
linear_indices = mat.row * num_col + mat.col
first_implicit_zero_index = _find_missing_index(linear_indices, maxnnz)
if extreme_value == zero:
return min(first_implicit_zero_index, extreme_index)
return first_implicit_zero_index
def max(self, axis=None, out=None, *, explicit=False):
"""Return the maximum of the array/matrix or maximum along an axis.
By default, all elements are taken into account, not just the non-zero ones.
But with `explicit` set, only the stored elements are considered.
Parameters
----------
axis : {-2, -1, 0, 1, None}, optional
Axis along which the maximum is computed. The default is to
compute the maximum over all elements, returning
a scalar (i.e., `axis` = `None`).
out : None, optional
This argument is in the signature *solely* for NumPy
compatibility reasons. Do not pass in anything except
for the default value, as this argument is not used.
explicit : {False, True} optional (default: False)
When set to True, only the stored elements will be considered.
If a row/column is empty, the sparse.coo_array returned
has no stored element (i.e. an implicit zero) for that row/column.
.. versionadded:: 1.15.0
Returns
-------
amax : coo_array or scalar
Maximum of `a`. If `axis` is None, the result is a scalar value.
If `axis` is given, the result is a sparse.coo_array of dimension
``a.ndim - 1``.
See Also
--------
min : The minimum value of a sparse array/matrix along a given axis.
numpy.max : NumPy's implementation of 'max'
"""
return self._min_or_max(axis, out, np.maximum, explicit)
def min(self, axis=None, out=None, *, explicit=False):
"""Return the minimum of the array/matrix or minimum along an axis.
By default, all elements are taken into account, not just the non-zero ones.
But with `explicit` set, only the stored elements are considered.
Parameters
----------
axis : {-2, -1, 0, 1, None}, optional
Axis along which the minimum is computed. The default is to
compute the minimum over all elements, returning
a scalar (i.e., `axis` = `None`).
out : None, optional
This argument is in the signature *solely* for NumPy
compatibility reasons. Do not pass in anything except for
the default value, as this argument is not used.
explicit : {False, True} optional (default: False)
When set to True, only the stored elements will be considered.
If a row/column is empty, the sparse.coo_array returned
has no stored element (i.e. an implicit zero) for that row/column.
.. versionadded:: 1.15.0
Returns
-------
amin : coo_array or scalar
Minimum of `a`. If `axis` is None, the result is a scalar value.
If `axis` is given, the result is a sparse.coo_array of dimension
``a.ndim - 1``.
See Also
--------
max : The maximum value of a sparse array/matrix along a given axis.
numpy.min : NumPy's implementation of 'min'
"""
return self._min_or_max(axis, out, np.minimum, explicit)
def nanmax(self, axis=None, out=None, *, explicit=False):
"""Return the maximum, ignoring any NaNs, along an axis.
Return the maximum, ignoring any NaNs, of the array/matrix along an axis.
By default this takes all elements into account, but with `explicit` set,
only stored elements are considered.
.. versionadded:: 1.11.0
Parameters
----------
axis : {-2, -1, 0, 1, None} optional
Axis along which the maximum is computed. The default is to
compute the maximum over all elements, returning
a scalar (i.e., `axis` = `None`).
out : None, optional
This argument is in the signature *solely* for NumPy
compatibility reasons. Do not pass in anything except
for the default value, as this argument is not used.
explicit : {False, True} optional (default: False)
When set to True, only the stored elements will be considered.
If a row/column is empty, the sparse.coo_array returned
has no stored element (i.e. an implicit zero) for that row/column.
.. versionadded:: 1.15.0
Returns
-------
amax : coo_array or scalar
Maximum of `a`. If `axis` is None, the result is a scalar value.
If `axis` is given, the result is a sparse.coo_array of dimension
``a.ndim - 1``.
See Also
--------
nanmin : The minimum value of a sparse array/matrix along a given axis,
ignoring NaNs.
max : The maximum value of a sparse array/matrix along a given axis,
propagating NaNs.
numpy.nanmax : NumPy's implementation of 'nanmax'.
"""
return self._min_or_max(axis, out, np.fmax, explicit)
def nanmin(self, axis=None, out=None, *, explicit=False):
"""Return the minimum, ignoring any NaNs, along an axis.
Return the minimum, ignoring any NaNs, of the array/matrix along an axis.
By default this takes all elements into account, but with `explicit` set,
only stored elements are considered.
.. versionadded:: 1.11.0
Parameters
----------
axis : {-2, -1, 0, 1, None} optional
Axis along which the minimum is computed. The default is to
compute the minimum over all elements, returning
a scalar (i.e., `axis` = `None`).
out : None, optional
This argument is in the signature *solely* for NumPy
compatibility reasons. Do not pass in anything except for
the default value, as this argument is not used.
explicit : {False, True} optional (default: False)
When set to True, only the stored elements will be considered.
If a row/column is empty, the sparse.coo_array returned
has no stored element (i.e. an implicit zero) for that row/column.
.. versionadded:: 1.15.0
Returns
-------
amin : coo_array or scalar
Minimum of `a`. If `axis` is None, the result is a scalar value.
If `axis` is given, the result is a sparse.coo_array of dimension
``a.ndim - 1``.
See Also
--------
nanmax : The maximum value of a sparse array/matrix along a given axis,
ignoring NaNs.
min : The minimum value of a sparse array/matrix along a given axis,
propagating NaNs.
numpy.nanmin : NumPy's implementation of 'nanmin'.
"""
return self._min_or_max(axis, out, np.fmin, explicit)
def argmax(self, axis=None, out=None, *, explicit=False):
"""Return indices of maximum elements along an axis.
By default, implicit zero elements are taken into account. If there are
several maximum values, the index of the first occurrence is returned.
If `explicit` is set, only explicitly stored elements will be considered.
Parameters
----------
axis : {-2, -1, 0, 1, None}, optional
Axis along which the argmax is computed. If None (default), index
of the maximum element in the flattened data is returned.
out : None, optional
This argument is in the signature *solely* for NumPy
compatibility reasons. Do not pass in anything except for
the default value, as this argument is not used.
explicit : {False, True} optional (default: False)
When set to True, only explicitly stored elements will be considered.
If axis is not None and an axis has no stored elements, argmax
is undefined, so the index ``0`` is returned for that row/column.
.. versionadded:: 1.15.0
Returns
-------
ind : numpy.matrix or int
Indices of maximum elements. If matrix, its size along `axis` is 1.
"""
return self._argminmax(axis, out, np.argmax, np.greater, explicit)
def argmin(self, axis=None, out=None, *, explicit=False):
"""Return indices of minimum elements along an axis.
By default, implicit zero elements are taken into account. If there are
several minimum values, the index of the first occurrence is returned.
If `explicit` is set, only explicitly stored elements will be considered.
Parameters
----------
axis : {-2, -1, 0, 1, None}, optional
Axis along which the argmin is computed. If None (default), index
of the minimum element in the flattened data is returned.
out : None, optional
This argument is in the signature *solely* for NumPy
compatibility reasons. Do not pass in anything except for
the default value, as this argument is not used.
explicit : {False, True} optional (default: False)
When set to True, only explicitly stored elements will be considered.
If axis is not None and an axis has no stored elements, argmin
is undefined, so the index ``0`` is returned for that row/column.
.. versionadded:: 1.15.0
Returns
-------
ind : numpy.matrix or int
Indices of minimum elements. If matrix, its size along `axis` is 1.
"""
return self._argminmax(axis, out, np.argmin, np.less, explicit)
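
The implicit-zero handling described in the ``argmax``/``argmin`` docstrings above can be illustrated with a minimal pure-Python sketch. It is not SciPy's implementation: ``find_missing_index`` and ``sparse_argmax`` are hypothetical names, and the input is assumed to be a 1-D vector given as sorted, duplicate-free stored indices plus their values.

```python
def find_missing_index(sorted_indices, n):
    """Return the first position in range(n) that is not stored, or -1."""
    for k, idx in enumerate(sorted_indices):
        if k != idx:
            return k
    k = len(sorted_indices)
    return k if k < n else -1


def sparse_argmax(sorted_indices, values, n):
    """argmax of a length-n vector whose unstored entries are implicit zeros."""
    if not values:
        return 0  # every entry is an implicit zero
    # Position (within the stored values) of the first maximum.
    pos = max(range(len(values)), key=values.__getitem__)
    best = values[pos]
    # A positive maximum, or a vector without implicit zeros, needs no extra care.
    if best > 0 or len(values) == n:
        return sorted_indices[pos]
    zero_index = find_missing_index(sorted_indices, n)
    if best == 0:
        # A stored zero ties with the implicit zeros: first occurrence wins.
        return min(zero_index, sorted_indices[pos])
    return zero_index  # all stored values are negative


# Dense equivalent: [-1, 0, 0, -2]; the first implicit zero is the maximum.
print(sparse_argmax([0, 3], [-1, -2], 4))
```

With ``explicit=True`` the methods above skip this step and consider stored values only.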


@ -0,0 +1,677 @@
"""Sparse DIAgonal format"""
__docformat__ = "restructuredtext en"
__all__ = ['dia_array', 'dia_matrix', 'isspmatrix_dia']
import numpy as np
from .._lib._util import copy_if_needed
from ._matrix import spmatrix
from ._base import issparse, _formats, _spbase, sparray
from ._data import _data_matrix
from ._sputils import (
isdense, isscalarlike, isshape, upcast_char, getdtype, get_sum_dtype,
validateaxis, check_shape
)
from ._sparsetools import dia_matmat, dia_matvec, dia_matvecs
class _dia_base(_data_matrix):
_format = 'dia'
def __init__(self, arg1, shape=None, dtype=None, copy=False, *, maxprint=None):
_data_matrix.__init__(self, arg1, maxprint=maxprint)
if issparse(arg1):
if arg1.format == "dia":
if copy:
arg1 = arg1.copy()
self.data = arg1.data
self.offsets = arg1.offsets
self._shape = check_shape(arg1.shape)
else:
if arg1.format == self.format and copy:
A = arg1.copy()
else:
A = arg1.todia()
self.data = A.data
self.offsets = A.offsets
self._shape = check_shape(A.shape)
elif isinstance(arg1, tuple):
if isshape(arg1):
# It's a tuple of matrix dimensions (M, N)
# create empty matrix
self._shape = check_shape(arg1)
self.data = np.zeros((0,0), getdtype(dtype, default=float))
idx_dtype = self._get_index_dtype(maxval=max(self.shape))
self.offsets = np.zeros((0), dtype=idx_dtype)
else:
try:
# Try interpreting it as (data, offsets)
data, offsets = arg1
except Exception as e:
message = 'unrecognized form for dia_array constructor'
raise ValueError(message) from e
else:
if shape is None:
raise ValueError('expected a shape argument')
if not copy:
copy = copy_if_needed
self.data = np.atleast_2d(np.array(arg1[0], dtype=dtype, copy=copy))
offsets = np.array(arg1[1],
dtype=self._get_index_dtype(maxval=max(shape)),
copy=copy)
self.offsets = np.atleast_1d(offsets)
self._shape = check_shape(shape)
else:
# must be dense, convert to COO first, then to DIA
try:
arg1 = np.asarray(arg1)
except Exception as e:
raise ValueError("unrecognized form for "
f"{self.format}_matrix constructor") from e
if isinstance(self, sparray) and arg1.ndim != 2:
raise ValueError(f"DIA arrays don't support {arg1.ndim}D input. Use 2D")
A = self._coo_container(arg1, dtype=dtype, shape=shape).todia()
self.data = A.data
self.offsets = A.offsets
self._shape = check_shape(A.shape)
if dtype is not None:
newdtype = getdtype(dtype)
self.data = self.data.astype(newdtype)
# check format
if self.offsets.ndim != 1:
raise ValueError('offsets array must have rank 1')
if self.data.ndim != 2:
raise ValueError('data array must have rank 2')
if self.data.shape[0] != len(self.offsets):
raise ValueError(
f'number of diagonals ({self.data.shape[0]}) does not match the number '
f'of offsets ({len(self.offsets)})'
)
if len(np.unique(self.offsets)) != len(self.offsets):
raise ValueError('offset array contains duplicate values')
def __repr__(self):
_, fmt = _formats[self.format]
sparse_cls = 'array' if isinstance(self, sparray) else 'matrix'
d = self.data.shape[0]
return (
f"<{fmt} sparse {sparse_cls} of dtype '{self.dtype}'\n"
f"\twith {self.nnz} stored elements ({d} diagonals) and shape {self.shape}>"
)
def _data_mask(self):
"""Returns a mask of the same shape as self.data, where
mask[i,j] is True when data[i,j] corresponds to a stored element."""
num_rows, num_cols = self.shape
offset_inds = np.arange(self.data.shape[1])
row = offset_inds - self.offsets[:,None]
mask = (row >= 0)
mask &= (row < num_rows)
mask &= (offset_inds < num_cols)
return mask
def count_nonzero(self, axis=None):
if axis is not None:
raise NotImplementedError(
"count_nonzero over an axis is not implemented for DIA format"
)
mask = self._data_mask()
return np.count_nonzero(self.data[mask])
count_nonzero.__doc__ = _spbase.count_nonzero.__doc__
def _getnnz(self, axis=None):
if axis is not None:
raise NotImplementedError("_getnnz over an axis is not implemented "
"for DIA format")
M,N = self.shape
nnz = 0
for k in self.offsets:
if k > 0:
nnz += min(M,N-k)
else:
nnz += min(M+k,N)
return int(nnz)
_getnnz.__doc__ = _spbase._getnnz.__doc__
def sum(self, axis=None, dtype=None, out=None):
axis = validateaxis(axis)
res_dtype = get_sum_dtype(self.dtype)
num_rows, num_cols = self.shape
ret = None
if axis == (0,):
mask = self._data_mask()
x = (self.data * mask).sum(axis=0)
if x.shape[0] == num_cols:
res = x
else:
res = np.zeros(num_cols, dtype=x.dtype)
res[:x.shape[0]] = x
ret = self._ascontainer(res, dtype=res_dtype)
else: # axis is None or (1,)
row_sums = np.zeros((num_rows, 1), dtype=res_dtype)
one = np.ones(num_cols, dtype=res_dtype)
dia_matvec(num_rows, num_cols, len(self.offsets),
self.data.shape[1], self.offsets, self.data, one, row_sums)
row_sums = self._ascontainer(row_sums)
if axis is None:
return row_sums.sum(dtype=dtype, out=out)
ret = self._ascontainer(row_sums.sum(axis=axis))
return ret.sum(axis=(), dtype=dtype, out=out)
sum.__doc__ = _spbase.sum.__doc__
def _add_sparse(self, other, sub=False):
# If other is not DIA format, let them handle us instead.
if not isinstance(other, _dia_base):
return other._add_sparse(self)
# Fast path for exact equality of the sparsity structure.
if np.array_equal(self.offsets, other.offsets):
return self._with_data(self.data - other.data if sub else
self.data + other.data)
# Find the union of the offsets (which will be sorted and unique).
new_offsets = np.union1d(self.offsets, other.offsets)
self_idx = np.searchsorted(new_offsets, self.offsets)
other_idx = np.searchsorted(new_offsets, other.offsets)
self_d = self.data.shape[1]
other_d = other.data.shape[1]
# Fast path for a sparsity structure where the final offsets are a
# permutation of the existing offsets and the diagonal lengths match.
if self_d == other_d and len(new_offsets) == len(self.offsets):
new_data = self.data[_invert_index(self_idx)]
if sub:
new_data[other_idx, :] -= other.data
else:
new_data[other_idx, :] += other.data
elif self_d == other_d and len(new_offsets) == len(other.offsets):
if sub:
new_data = -other.data[_invert_index(other_idx)]
else:
new_data = other.data[_invert_index(other_idx)]
new_data[self_idx, :] += self.data
else:
# Maximum diagonal length of the result.
d = min(self.shape[0] + new_offsets[-1], self.shape[1])
# Add all diagonals to a freshly allocated data array.
new_data = np.zeros(
(len(new_offsets), d),
dtype=np.result_type(self.data, other.data),
)
new_data[self_idx, :self_d] += self.data[:, :d]
if sub:
new_data[other_idx, :other_d] -= other.data[:, :d]
else:
new_data[other_idx, :other_d] += other.data[:, :d]
return self._dia_container((new_data, new_offsets), shape=self.shape)
def _sub_sparse(self, other):
# If other is not DIA format, use default handler.
if not isinstance(other, _dia_base):
return super()._sub_sparse(other)
return self._add_sparse(other, sub=True)
def _mul_scalar(self, other):
return self._with_data(self.data * other)
def multiply(self, other):
if isscalarlike(other):
return self._mul_scalar(other)
if isdense(other):
if other.ndim > 2:
return self.toarray() * other
# Use default handler for pathological cases.
if 0 in self.shape or 1 in self.shape or 0 in other.shape:
return super().multiply(other)
other = np.atleast_2d(other)
other_rows, other_cols = other.shape
rows, cols = self.shape
L = min(self.data.shape[1], cols)
data = self.data[:, :L].astype(np.result_type(self.data, other))
if other_rows == 1:
data *= other[0, :L]
elif other_rows != rows:
raise ValueError('inconsistent shapes')
else:
j = np.arange(L)
if L > rows:
i = (j - self.offsets[:, None]) % rows
else: # can use faster method
i = j - self.offsets[:, None] % rows
if other_cols == 1:
j = 0
elif other_cols != cols:
raise ValueError('inconsistent shapes')
data *= other[i, j]
return self._with_data(data)
# If other is not DIA format or needs broadcasting (unreasonable
# use case for DIA anyway), use default handler.
if not isinstance(other, _dia_base) or other.shape != self.shape:
return super().multiply(other)
# Find common offsets (unique diagonals don't contribute)
# and indices corresponding to them in multiplicand and multiplier.
offsets, self_idx, other_idx = \
np.intersect1d(self.offsets, other.offsets,
assume_unique=True, return_indices=True)
# Only overlapping length of diagonals can have non-zero products.
L = min(self.data.shape[1], other.data.shape[1])
data = self.data[self_idx, :L] * other.data[other_idx, :L]
return self._dia_container((data, offsets), shape=self.shape)
def _matmul_vector(self, other):
x = other
y = np.zeros(self.shape[0], dtype=upcast_char(self.dtype.char,
x.dtype.char))
L = self.data.shape[1]
M,N = self.shape
dia_matvec(M,N, len(self.offsets), L, self.offsets, self.data,
x.ravel(), y.ravel())
return y
def _matmul_multivector(self, other):
res = np.zeros((self.shape[0], other.shape[1]),
dtype=np.result_type(self.data, other))
dia_matvecs(*self.shape, *self.data.shape, self.offsets, self.data,
other.shape[1], other, res)
return res
def _matmul_sparse(self, other):
# If other is not DIA format, use default handler.
if not isinstance(other, _dia_base):
return super()._matmul_sparse(other)
# If any dimension is zero, return empty array immediately.
if 0 in self.shape or 0 in other.shape:
return self._dia_container((self.shape[0], other.shape[1]))
offsets, data = dia_matmat(*self.shape, *self.data.shape,
self.offsets, self.data,
other.shape[1], *other.data.shape,
other.offsets, other.data)
return self._dia_container((data.reshape(len(offsets), -1), offsets),
(self.shape[0], other.shape[1]))
def _setdiag(self, values, k=0):
M, N = self.shape
if values.ndim == 0:
# broadcast
values_n = np.inf
else:
values_n = len(values)
if k < 0:
n = min(M + k, N, values_n)
min_index = 0
max_index = n
else:
n = min(M, N - k, values_n)
min_index = k
max_index = k + n
if values.ndim != 0:
# allow also longer sequences
values = values[:n]
data_rows, data_cols = self.data.shape
if k in self.offsets:
if max_index > data_cols:
data = np.zeros((data_rows, max_index), dtype=self.data.dtype)
data[:, :data_cols] = self.data
self.data = data
self.data[self.offsets == k, min_index:max_index] = values
else:
self.offsets = np.append(self.offsets, self.offsets.dtype.type(k))
m = max(max_index, data_cols)
data = np.zeros((data_rows + 1, m), dtype=self.data.dtype)
data[:-1, :data_cols] = self.data
data[-1, min_index:max_index] = values
self.data = data
def todia(self, copy=False):
if copy:
return self.copy()
else:
return self
todia.__doc__ = _spbase.todia.__doc__
def transpose(self, axes=None, copy=False):
if axes is not None and axes != (1, 0):
raise ValueError("Sparse arrays/matrices do not support "
"an 'axes' parameter because swapping "
"dimensions is the only logical permutation.")
num_rows, num_cols = self.shape
max_dim = max(self.shape)
# flip diagonal offsets
offsets = -self.offsets
# re-align the data matrix
r = np.arange(len(offsets), dtype=np.intc)[:, None]
c = np.arange(num_rows, dtype=np.intc) - (offsets % max_dim)[:, None]
pad_amount = max(0, max_dim-self.data.shape[1])
data = np.hstack((self.data, np.zeros((self.data.shape[0], pad_amount),
dtype=self.data.dtype)))
data = data[r, c]
return self._dia_container((data, offsets), shape=(
num_cols, num_rows), copy=copy)
transpose.__doc__ = _spbase.transpose.__doc__
def diagonal(self, k=0):
rows, cols = self.shape
if k <= -rows or k >= cols:
return np.empty(0, dtype=self.data.dtype)
idx, = np.nonzero(self.offsets == k)
first_col = max(0, k)
last_col = min(rows + k, cols)
result_size = last_col - first_col
if idx.size == 0:
return np.zeros(result_size, dtype=self.data.dtype)
result = self.data[idx[0], first_col:last_col]
padding = result_size - len(result)
if padding > 0:
result = np.pad(result, (0, padding), mode='constant')
return result
diagonal.__doc__ = _spbase.diagonal.__doc__
def tocsc(self, copy=False):
if self.nnz == 0:
return self._csc_container(self.shape, dtype=self.dtype)
num_rows, num_cols = self.shape
num_offsets, offset_len = self.data.shape
offset_inds = np.arange(offset_len)
row = offset_inds - self.offsets[:,None]
mask = (row >= 0)
mask &= (row < num_rows)
mask &= (offset_inds < num_cols)
mask &= (self.data != 0)
idx_dtype = self._get_index_dtype(maxval=max(self.shape))
indptr = np.zeros(num_cols + 1, dtype=idx_dtype)
indptr[1:offset_len+1] = np.cumsum(mask.sum(axis=0)[:num_cols])
if offset_len < num_cols:
indptr[offset_len+1:] = indptr[offset_len]
indices = row.T[mask.T].astype(idx_dtype, copy=False)
data = self.data.T[mask.T]
return self._csc_container((data, indices, indptr), shape=self.shape,
dtype=self.dtype)
tocsc.__doc__ = _spbase.tocsc.__doc__
def tocoo(self, copy=False):
num_rows, num_cols = self.shape
num_offsets, offset_len = self.data.shape
offset_inds = np.arange(offset_len)
row = offset_inds - self.offsets[:,None]
mask = (row >= 0)
mask &= (row < num_rows)
mask &= (offset_inds < num_cols)
mask &= (self.data != 0)
row = row[mask]
col = np.tile(offset_inds, num_offsets)[mask.ravel()]
idx_dtype = self._get_index_dtype(
arrays=(self.offsets,), maxval=max(self.shape)
)
row = row.astype(idx_dtype, copy=False)
col = col.astype(idx_dtype, copy=False)
data = self.data[mask]
# Note: this cannot set has_canonical_format=True, because despite the
# lack of duplicates, we do not generate sorted indices.
return self._coo_container(
(data, (row, col)), shape=self.shape, dtype=self.dtype, copy=False
)
tocoo.__doc__ = _spbase.tocoo.__doc__
# needed by _data_matrix
def _with_data(self, data, copy=True):
"""Returns a matrix with the same sparsity structure as self,
but with different data. By default the structure arrays are copied.
"""
if copy:
return self._dia_container(
(data, self.offsets.copy()), shape=self.shape
)
else:
return self._dia_container(
(data, self.offsets), shape=self.shape
)
def resize(self, *shape):
shape = check_shape(shape)
M, N = shape
# we do not need to handle the case of expanding N
self.data = self.data[:, :N]
if (M > self.shape[0] and
np.any(self.offsets + self.shape[0] < self.data.shape[1])):
# explicitly clear values that were previously hidden
mask = (self.offsets[:, None] + self.shape[0] <=
np.arange(self.data.shape[1]))
self.data[mask] = 0
self._shape = shape
resize.__doc__ = _spbase.resize.__doc__
def _invert_index(idx):
"""Helper function to invert an index array."""
inv = np.zeros_like(idx)
inv[idx] = np.arange(len(idx))
return inv
def isspmatrix_dia(x):
"""Is `x` of dia_matrix type?
Parameters
----------
x
object to check for being a dia matrix
Returns
-------
bool
True if `x` is a dia matrix, False otherwise
Examples
--------
>>> from scipy.sparse import dia_array, dia_matrix, coo_matrix, isspmatrix_dia
>>> isspmatrix_dia(dia_matrix([[5]]))
True
>>> isspmatrix_dia(dia_array([[5]]))
False
>>> isspmatrix_dia(coo_matrix([[5]]))
False
"""
return isinstance(x, dia_matrix)
# This namespace class separates array from matrix with isinstance
class dia_array(_dia_base, sparray):
"""
Sparse array with DIAgonal storage.
This can be instantiated in several ways:
dia_array(D)
where D is a 2-D ndarray
dia_array(S)
with another sparse array or matrix S (equivalent to S.todia())
dia_array((M, N), [dtype])
to construct an empty array with shape (M, N),
dtype is optional, defaulting to dtype='d'.
dia_array((data, offsets), shape=(M, N))
where the ``data[k,:]`` stores the diagonal entries for
diagonal ``offsets[k]`` (See example below)
Attributes
----------
dtype : dtype
Data type of the array
shape : 2-tuple
Shape of the array
ndim : int
Number of dimensions (this is always 2)
nnz
size
data
DIA format data array of the array
offsets
DIA format offset array of the array
T
Notes
-----
Sparse arrays can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
Sparse arrays with DIAgonal storage do not support slicing.
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import dia_array
>>> dia_array((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int8)
>>> data = np.array([[1, 2, 3, 4]]).repeat(3, axis=0)
>>> offsets = np.array([0, -1, 2])
>>> dia_array((data, offsets), shape=(4, 4)).toarray()
array([[1, 0, 3, 0],
[1, 2, 0, 4],
[0, 2, 3, 0],
[0, 0, 3, 4]])
>>> from scipy.sparse import dia_array
>>> n = 10
>>> ex = np.ones(n)
>>> data = np.array([ex, 2 * ex, ex])
>>> offsets = np.array([-1, 0, 1])
>>> dia_array((data, offsets), shape=(n, n)).toarray()
array([[2., 1., 0., ..., 0., 0., 0.],
[1., 2., 1., ..., 0., 0., 0.],
[0., 1., 2., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 2., 1., 0.],
[0., 0., 0., ..., 1., 2., 1.],
[0., 0., 0., ..., 0., 1., 2.]])
"""
class dia_matrix(spmatrix, _dia_base):
"""
Sparse matrix with DIAgonal storage.
This can be instantiated in several ways:
dia_matrix(D)
where D is a 2-D ndarray
dia_matrix(S)
with another sparse array or matrix S (equivalent to S.todia())
dia_matrix((M, N), [dtype])
to construct an empty matrix with shape (M, N),
dtype is optional, defaulting to dtype='d'.
dia_matrix((data, offsets), shape=(M, N))
where the ``data[k,:]`` stores the diagonal entries for
diagonal ``offsets[k]`` (See example below)
Attributes
----------
dtype : dtype
Data type of the matrix
shape : 2-tuple
Shape of the matrix
ndim : int
Number of dimensions (this is always 2)
nnz
size
data
DIA format data array of the matrix
offsets
DIA format offset array of the matrix
T
Notes
-----
Sparse matrices can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
Sparse matrices with DIAgonal storage do not support slicing.
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import dia_matrix
>>> dia_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int8)
>>> data = np.array([[1, 2, 3, 4]]).repeat(3, axis=0)
>>> offsets = np.array([0, -1, 2])
>>> dia_matrix((data, offsets), shape=(4, 4)).toarray()
array([[1, 0, 3, 0],
[1, 2, 0, 4],
[0, 2, 3, 0],
[0, 0, 3, 4]])
>>> from scipy.sparse import dia_matrix
>>> n = 10
>>> ex = np.ones(n)
>>> data = np.array([ex, 2 * ex, ex])
>>> offsets = np.array([-1, 0, 1])
>>> dia_matrix((data, offsets), shape=(n, n)).toarray()
array([[2., 1., 0., ..., 0., 0., 0.],
[1., 2., 1., ..., 0., 0., 0.],
[0., 1., 2., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 2., 1., 0.],
[0., 0., 0., ..., 1., 2., 1.],
[0., 0., 0., ..., 0., 1., 2.]])
"""

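
The DIA layout used throughout this file stores, in ``data[k][j]``, the entry at column ``j`` and row ``j - offsets[k]``. The sketch below is a rough pure-Python illustration of that convention, not SciPy code: ``dia_to_dense`` and ``dia_stored_count`` are hypothetical helpers. The second one mirrors the per-diagonal count used by ``_getnnz``.

```python
def dia_to_dense(data, offsets, shape):
    """Expand (data, offsets) in DIA layout into a dense list of lists."""
    num_rows, num_cols = shape
    dense = [[0] * num_cols for _ in range(num_rows)]
    for diag, offset in zip(data, offsets):
        for col, value in enumerate(diag):
            row = col - offset  # data[k][col] lives at (col - offset, col)
            if 0 <= row < num_rows and col < num_cols:
                dense[row][col] = value
    return dense


def dia_stored_count(offsets, shape):
    """Count the in-bounds positions covered by the given diagonals."""
    num_rows, num_cols = shape
    total = 0
    for k in offsets:
        if k > 0:
            total += min(num_rows, num_cols - k)
        else:
            total += min(num_rows + k, num_cols)
    return total


data = [[1, 2, 3, 4]] * 3
offsets = [0, -1, 2]
for row in dia_to_dense(data, offsets, (4, 4)):
    print(row)
```

The inputs match the ``(data, offsets)`` docstring example above, so the printed rows can be compared with its ``toarray()`` output.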

@ -0,0 +1,669 @@
"""Dictionary Of Keys based matrix"""
__docformat__ = "restructuredtext en"
__all__ = ['dok_array', 'dok_matrix', 'isspmatrix_dok']
import itertools
import numpy as np
from ._matrix import spmatrix
from ._base import _spbase, sparray, issparse
from ._index import IndexMixin
from ._sputils import (isdense, getdtype, isshape, isintlike, isscalarlike,
upcast, upcast_scalar, check_shape)
class _dok_base(_spbase, IndexMixin, dict):
_format = 'dok'
_allow_nd = (1, 2)
def __init__(self, arg1, shape=None, dtype=None, copy=False, *, maxprint=None):
_spbase.__init__(self, arg1, maxprint=maxprint)
if isinstance(arg1, tuple) and isshape(arg1, allow_nd=self._allow_nd):
self._shape = check_shape(arg1, allow_nd=self._allow_nd)
self._dict = {}
self.dtype = getdtype(dtype, default=float)
elif issparse(arg1): # Sparse ctor
if arg1.format == self.format:
arg1 = arg1.copy() if copy else arg1
else:
arg1 = arg1.todok()
if dtype is not None:
arg1 = arg1.astype(dtype, copy=False)
self._dict = arg1._dict
self._shape = check_shape(arg1.shape, allow_nd=self._allow_nd)
self.dtype = getdtype(arg1.dtype)
else: # Dense ctor
try:
arg1 = np.asarray(arg1)
except Exception as e:
raise TypeError('Invalid input format.') from e
if arg1.ndim > 2:
raise ValueError(f"DOK arrays don't yet support {arg1.ndim}D input.")
if arg1.ndim == 1:
if dtype is not None:
arg1 = arg1.astype(dtype)
self._dict = {i: v for i, v in enumerate(arg1) if v != 0}
self.dtype = getdtype(arg1.dtype)
else:
d = self._coo_container(arg1, shape=shape, dtype=dtype).todok()
self._dict = d._dict
self.dtype = getdtype(d.dtype)
self._shape = check_shape(arg1.shape, allow_nd=self._allow_nd)
def update(self, val):
# Prevent direct usage of update
raise NotImplementedError("Direct update to DOK sparse format is not allowed.")
def _getnnz(self, axis=None):
if axis is not None:
raise NotImplementedError(
"_getnnz over an axis is not implemented for DOK format."
)
return len(self._dict)
def count_nonzero(self, axis=None):
if axis is not None:
raise NotImplementedError(
"count_nonzero over an axis is not implemented for DOK format."
)
return sum(x != 0 for x in self.values())
_getnnz.__doc__ = _spbase._getnnz.__doc__
count_nonzero.__doc__ = _spbase.count_nonzero.__doc__
def __len__(self):
return len(self._dict)
def __contains__(self, key):
return key in self._dict
def setdefault(self, key, default=None, /):
return self._dict.setdefault(key, default)
def __delitem__(self, key, /):
del self._dict[key]
def clear(self):
return self._dict.clear()
def pop(self, /, *args):
return self._dict.pop(*args)
def __reversed__(self):
raise TypeError("reversed is not defined for dok_array type")
def __or__(self, other):
type_names = f"{type(self).__name__} and {type(other).__name__}"
raise TypeError(f"unsupported operand type for |: {type_names}")
def __ror__(self, other):
type_names = f"{type(self).__name__} and {type(other).__name__}"
raise TypeError(f"unsupported operand type for |: {type_names}")
def __ior__(self, other):
type_names = f"{type(self).__name__} and {type(other).__name__}"
raise TypeError(f"unsupported operand type for |: {type_names}")
def popitem(self):
return self._dict.popitem()
def items(self):
return self._dict.items()
def keys(self):
return self._dict.keys()
def values(self):
return self._dict.values()
def get(self, key, default=0.0):
"""This provides dict.get method functionality with type checking"""
if key in self._dict:
return self._dict[key]
if isintlike(key) and self.ndim == 1:
key = (key,)
if self.ndim != len(key):
raise IndexError(f'Index {key} length needs to match self.shape')
try:
for i in key:
assert isintlike(i)
except (AssertionError, TypeError, ValueError) as e:
raise IndexError('Index must be or consist of integers.') from e
key = tuple(i + M if i < 0 else i for i, M in zip(key, self.shape))
if any(i < 0 or i >= M for i, M in zip(key, self.shape)):
raise IndexError('Index out of bounds.')
if self.ndim == 1:
key = key[0]
return self._dict.get(key, default)
# 1D get methods
def _get_int(self, idx):
return self._dict.get(idx, self.dtype.type(0))
def _get_slice(self, idx):
i_range = range(*idx.indices(self.shape[0]))
return self._get_array(list(i_range))
def _get_array(self, idx):
idx = np.asarray(idx)
if idx.ndim == 0:
val = self._dict.get(int(idx), self.dtype.type(0))
return np.array(val, dtype=self.dtype)
new_dok = self._dok_container(idx.shape, dtype=self.dtype)
dok_vals = [self._dict.get(i, 0) for i in idx.ravel()]
if dok_vals:
if len(idx.shape) == 1:
for i, v in enumerate(dok_vals):
if v:
new_dok._dict[i] = v
else:
new_idx = np.unravel_index(np.arange(len(dok_vals)), idx.shape)
new_idx = new_idx[0] if len(new_idx) == 1 else zip(*new_idx)
for i, v in zip(new_idx, dok_vals, strict=True):
if v:
new_dok._dict[i] = v
return new_dok
# 2D get methods
def _get_intXint(self, row, col):
return self._dict.get((row, col), self.dtype.type(0))
def _get_intXslice(self, row, col):
return self._get_sliceXslice(slice(row, row + 1), col)
def _get_sliceXint(self, row, col):
return self._get_sliceXslice(row, slice(col, col + 1))
def _get_sliceXslice(self, row, col):
row_start, row_stop, row_step = row.indices(self.shape[0])
col_start, col_stop, col_step = col.indices(self.shape[1])
row_range = range(row_start, row_stop, row_step)
col_range = range(col_start, col_stop, col_step)
shape = (len(row_range), len(col_range))
# Switch paths only when advantageous
# (count the iterations in the loops, adjust for complexity)
if len(self) >= 2 * shape[0] * shape[1]:
# O(nr*nc) path: loop over <row x col>
return self._get_columnXarray(row_range, col_range)
# O(nnz) path: loop over entries of self
newdok = self._dok_container(shape, dtype=self.dtype)
for key in self.keys():
i, ri = divmod(int(key[0]) - row_start, row_step)
if ri != 0 or i < 0 or i >= shape[0]:
continue
j, rj = divmod(int(key[1]) - col_start, col_step)
if rj != 0 or j < 0 or j >= shape[1]:
continue
newdok._dict[i, j] = self._dict[key]
return newdok
def _get_intXarray(self, row, col):
return self._get_columnXarray([row], col.ravel())
def _get_arrayXint(self, row, col):
res = self._get_columnXarray(row.ravel(), [col])
if row.ndim > 1:
return res.reshape(row.shape)
return res
def _get_sliceXarray(self, row, col):
row = list(range(*row.indices(self.shape[0])))
return self._get_columnXarray(row, col)
def _get_arrayXslice(self, row, col):
col = list(range(*col.indices(self.shape[1])))
return self._get_columnXarray(row, col)
def _get_columnXarray(self, row, col):
# outer indexing
newdok = self._dok_container((len(row), len(col)), dtype=self.dtype)
for i, r in enumerate(row):
for j, c in enumerate(col):
v = self._dict.get((r, c), 0)
if v:
newdok._dict[i, j] = v
return newdok
def _get_arrayXarray(self, row, col):
# inner indexing
i, j = map(np.atleast_2d, np.broadcast_arrays(row, col))
newdok = self._dok_container(i.shape, dtype=self.dtype)
for key in itertools.product(range(i.shape[0]), range(i.shape[1])):
v = self._dict.get((i[key], j[key]), 0)
if v:
newdok._dict[key] = v
return newdok
# 1D set methods
def _set_int(self, idx, x):
if x:
self._dict[idx] = x
elif idx in self._dict:
del self._dict[idx]
def _set_array(self, idx, x):
idx_set = idx.ravel()
x_set = x.ravel()
if len(idx_set) != len(x_set):
if len(x_set) == 1:
x_set = np.full(len(idx_set), x_set[0], dtype=self.dtype)
else:
raise ValueError("Need len(index)==len(data) or len(data)==1")
for i, v in zip(idx_set, x_set):
if v:
self._dict[i] = v
elif i in self._dict:
del self._dict[i]
# 2D set methods
def _set_intXint(self, row, col, x):
key = (row, col)
if x:
self._dict[key] = x
elif key in self._dict:
del self._dict[key]
def _set_arrayXarray(self, row, col, x):
row = list(map(int, row.ravel()))
col = list(map(int, col.ravel()))
x = x.ravel()
self._dict.update(zip(zip(row, col), x))
for i in np.nonzero(x == 0)[0]:
key = (row[i], col[i])
if self._dict[key] == 0:
# may have been superseded by later update
del self._dict[key]
def __add__(self, other):
if isscalarlike(other):
res_dtype = upcast_scalar(self.dtype, other)
new = self._dok_container(self.shape, dtype=res_dtype)
# Add this scalar to each element.
for key in itertools.product(*[range(d) for d in self.shape]):
aij = self._dict.get(key, 0) + other
if aij:
new[key] = aij
elif issparse(other):
if other.shape != self.shape:
raise ValueError("Matrix dimensions are not equal.")
res_dtype = upcast(self.dtype, other.dtype)
new = self._dok_container(self.shape, dtype=res_dtype)
new._dict = self._dict.copy()
if other.format == "dok":
o_items = other.items()
else:
other = other.tocoo()
if self.ndim == 1:
o_items = zip(other.coords[0], other.data)
else:
o_items = zip(zip(*other.coords), other.data)
with np.errstate(over='ignore'):
new._dict.update((k, new[k] + v) for k, v in o_items)
elif isdense(other):
new = self.todense() + other
else:
return NotImplemented
return new
def __radd__(self, other):
return self + other # addition is commutative
def __neg__(self):
if self.dtype.kind == 'b':
raise NotImplementedError(
'Negating a sparse boolean matrix is not supported.'
)
new = self._dok_container(self.shape, dtype=self.dtype)
new._dict.update((k, -v) for k, v in self.items())
return new
def _mul_scalar(self, other):
res_dtype = upcast_scalar(self.dtype, other)
# Multiply this scalar by every element.
new = self._dok_container(self.shape, dtype=res_dtype)
new._dict.update(((k, v * other) for k, v in self.items()))
return new
def _matmul_vector(self, other):
res_dtype = upcast(self.dtype, other.dtype)
# vector @ vector
if self.ndim == 1:
if issparse(other):
if other.format == "dok":
keys = self.keys() & other.keys()
else:
keys = self.keys() & other.tocoo().coords[0]
return res_dtype(sum(self._dict[k] * other._dict[k] for k in keys))
elif isdense(other):
return res_dtype(sum(other[k] * v for k, v in self.items()))
else:
return NotImplemented
# matrix @ vector
result = np.zeros(self.shape[0], dtype=res_dtype)
for (i, j), v in self.items():
result[i] += v * other[j]
return result
def _matmul_multivector(self, other):
result_dtype = upcast(self.dtype, other.dtype)
# vector @ multivector
if self.ndim == 1:
# works for other 1d or 2d
return sum(v * other[j] for j, v in self._dict.items())
# matrix @ multivector
M = self.shape[0]
new_shape = (M,) if other.ndim == 1 else (M, other.shape[1])
result = np.zeros(new_shape, dtype=result_dtype)
for (i, j), v in self.items():
result[i] += v * other[j]
return result
def __imul__(self, other):
if isscalarlike(other):
self._dict.update((k, v * other) for k, v in self.items())
return self
return NotImplemented
def __truediv__(self, other):
if isscalarlike(other):
res_dtype = upcast_scalar(self.dtype, other)
new = self._dok_container(self.shape, dtype=res_dtype)
new._dict.update(((k, v / other) for k, v in self.items()))
return new
return self.tocsr() / other
def __itruediv__(self, other):
if isscalarlike(other):
self._dict.update((k, v / other) for k, v in self.items())
return self
return NotImplemented
def __reduce__(self):
# this approach is necessary because __setstate__ is called after
# __setitem__ upon unpickling and since __init__ is not called there
# is no shape attribute hence it is not possible to unpickle it.
return dict.__reduce__(self)
def diagonal(self, k=0):
if self.ndim == 2:
return super().diagonal(k)
raise ValueError("diagonal requires two dimensions")
def transpose(self, axes=None, copy=False):
if self.ndim == 1:
return self.copy()
if axes is not None and axes != (1, 0):
raise ValueError(
"Sparse arrays/matrices do not support "
"an 'axes' parameter because swapping "
"dimensions is the only logical permutation."
)
M, N = self.shape
new = self._dok_container((N, M), dtype=self.dtype, copy=copy)
new._dict.update((((right, left), val) for (left, right), val in self.items()))
return new
transpose.__doc__ = _spbase.transpose.__doc__
def copy(self):
new = self._dok_container(self.shape, dtype=self.dtype)
new._dict.update(self._dict)
return new
copy.__doc__ = _spbase.copy.__doc__
@classmethod
def fromkeys(cls, iterable, value=1, /):
tmp = dict.fromkeys(iterable, value)
if isinstance(next(iter(tmp)), tuple):
shape = tuple(max(idx) + 1 for idx in zip(*tmp))
else:
shape = (max(tmp) + 1,)
result = cls(shape, dtype=type(value))
result._dict = tmp
return result
def tocoo(self, copy=False):
nnz = self.nnz
if nnz == 0:
return self._coo_container(self.shape, dtype=self.dtype)
idx_dtype = self._get_index_dtype(maxval=max(self.shape))
data = np.fromiter(self.values(), dtype=self.dtype, count=nnz)
# handle 1d keys specially b/c not a tuple
inds = zip(*self.keys()) if self.ndim > 1 else (self.keys(),)
coords = tuple(np.fromiter(ix, dtype=idx_dtype, count=nnz) for ix in inds)
A = self._coo_container((data, coords), shape=self.shape, dtype=self.dtype)
A.has_canonical_format = True
return A
tocoo.__doc__ = _spbase.tocoo.__doc__
def todok(self, copy=False):
if copy:
return self.copy()
return self
todok.__doc__ = _spbase.todok.__doc__
def tocsc(self, copy=False):
if self.ndim == 1:
raise NotImplementedError("tocsc() not valid for 1d sparse array")
return self.tocoo(copy=False).tocsc(copy=copy)
tocsc.__doc__ = _spbase.tocsc.__doc__
def resize(self, *shape):
shape = check_shape(shape, allow_nd=self._allow_nd)
if len(shape) != len(self.shape):
# TODO implement resize across dimensions
raise NotImplementedError
if self.ndim == 1:
newN = shape[-1]
for i in list(self._dict):
if i >= newN:
del self._dict[i]
self._shape = shape
return
newM, newN = shape
M, N = self.shape
if newM < M or newN < N:
# Remove all elements outside new dimensions
for i, j in list(self.keys()):
if i >= newM or j >= newN:
del self._dict[i, j]
self._shape = shape
resize.__doc__ = _spbase.resize.__doc__
# Added for 1d to avoid `tocsr` from _base.py
def astype(self, dtype, casting='unsafe', copy=True):
dtype = np.dtype(dtype)
if self.dtype != dtype:
result = self._dok_container(self.shape, dtype=dtype)
data = np.array(list(self._dict.values()), dtype=dtype)
result._dict = dict(zip(self._dict, data))
return result
elif copy:
return self.copy()
return self
def isspmatrix_dok(x):
"""Is `x` of dok_matrix type?
Parameters
----------
x
object to check for being a dok matrix
Returns
-------
bool
True if `x` is a dok matrix, False otherwise
Examples
--------
>>> from scipy.sparse import dok_array, dok_matrix, coo_matrix, isspmatrix_dok
>>> isspmatrix_dok(dok_matrix([[5]]))
True
>>> isspmatrix_dok(dok_array([[5]]))
False
>>> isspmatrix_dok(coo_matrix([[5]]))
False
"""
return isinstance(x, dok_matrix)
# This namespace class separates array from matrix with isinstance
class dok_array(_dok_base, sparray):
"""
Dictionary Of Keys based sparse array.
This is an efficient structure for constructing sparse
arrays incrementally.
This can be instantiated in several ways:
dok_array(D)
where D is a 2-D ndarray
dok_array(S)
with another sparse array or matrix S (equivalent to S.todok())
dok_array((M,N), [dtype])
create the array with initial shape (M,N)
dtype is optional, defaulting to dtype='d'
Attributes
----------
dtype : dtype
Data type of the array
shape : tuple
Shape of the array
ndim : int
Number of dimensions (1 or 2)
nnz
Number of nonzero elements
size
T
Notes
-----
Sparse arrays can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
- Allows for efficient O(1) access of individual elements.
- Duplicates are not allowed.
- Can be efficiently converted to a coo_array once constructed.
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import dok_array
>>> S = dok_array((5, 5), dtype=np.float32)
>>> for i in range(5):
... for j in range(5):
... S[i, j] = i + j # Update element
"""
class dok_matrix(spmatrix, _dok_base):
"""
Dictionary Of Keys based sparse matrix.
This is an efficient structure for constructing sparse
matrices incrementally.
This can be instantiated in several ways:
dok_matrix(D)
where D is a 2-D ndarray
dok_matrix(S)
with another sparse array or matrix S (equivalent to S.todok())
dok_matrix((M,N), [dtype])
create the matrix with initial shape (M,N)
dtype is optional, defaulting to dtype='d'
Attributes
----------
dtype : dtype
Data type of the matrix
shape : 2-tuple
Shape of the matrix
ndim : int
Number of dimensions (this is always 2)
nnz
Number of nonzero elements
size
T
Notes
-----
Sparse matrices can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
- Allows for efficient O(1) access of individual elements.
- Duplicates are not allowed.
- Can be efficiently converted to a coo_matrix once constructed.
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import dok_matrix
>>> S = dok_matrix((5, 5), dtype=np.float32)
>>> for i in range(5):
... for j in range(5):
... S[i, j] = i + j # Update element
"""
def set_shape(self, shape):
new_matrix = self.reshape(shape, copy=False).asformat(self.format)
self.__dict__ = new_matrix.__dict__
def get_shape(self):
"""Get shape of a sparse matrix."""
return self._shape
shape = property(fget=get_shape, fset=set_shape)
def __reversed__(self):
return self._dict.__reversed__()
def __or__(self, other):
if isinstance(other, _dok_base):
return self._dict | other._dict
return self._dict | other
def __ror__(self, other):
if isinstance(other, _dok_base):
return self._dict | other._dict
return self._dict | other
def __ior__(self, other):
if isinstance(other, _dok_base):
self._dict |= other._dict
else:
self._dict |= other
return self
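
The O(nnz) branch of `_get_sliceXslice` above can be illustrated outside SciPy. The following is a minimal pure-Python sketch, not SciPy's implementation: `slice_dok` and the plain `dict` it works on are hypothetical stand-ins for `_dok_base` and its `_dict`. It shows how `divmod` maps each stored key to its position in the sliced result and rejects keys that fall between steps or outside the requested range.

```python
def slice_dok(entries, shape, row, col):
    """Slice a dict-of-keys mapping {(i, j): value} with two slice objects.

    Hypothetical helper for illustration; returns (new_entries, new_shape).
    """
    row_start, row_stop, row_step = row.indices(shape[0])
    col_start, col_stop, col_step = col.indices(shape[1])
    n_rows = len(range(row_start, row_stop, row_step))
    n_cols = len(range(col_start, col_stop, col_step))

    out = {}
    # Loop over stored entries only, so the cost is O(nnz).
    for (r, c), value in entries.items():
        # Quotient is the output position; a nonzero remainder means the
        # key lies between two steps of the slice.
        i, ri = divmod(r - row_start, row_step)
        if ri != 0 or not 0 <= i < n_rows:
            continue
        j, rj = divmod(c - col_start, col_step)
        if rj != 0 or not 0 <= j < n_cols:
            continue
        out[i, j] = value
    return out, (n_rows, n_cols)


entries = {(0, 0): 1.0, (1, 2): 2.0, (3, 1): 3.0, (4, 4): 4.0}
# Rows 1 and 3, all columns.
sliced, new_shape = slice_dok(entries, (5, 5), slice(1, 5, 2), slice(None))
```

Looping over the stored entries costs O(nnz) regardless of the slice size. The method above switches to the row-by-column loop only when the number of stored entries is at least twice the number of cells in the requested slice.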

@ -0,0 +1,178 @@
"""Functions to extract parts of sparse matrices
"""
__docformat__ = "restructuredtext en"
__all__ = ['find', 'tril', 'triu']
from ._coo import coo_matrix, coo_array
from ._base import sparray
def find(A):
"""Return the indices and values of the nonzero elements of a matrix
Parameters
----------
A : dense or sparse array or matrix
Matrix whose nonzero elements are desired.
Returns
-------
(I, J, V) : tuple of arrays
I, J, and V contain the row indices, column indices, and values
of the nonzero entries.
Examples
--------
>>> from scipy.sparse import csr_array, find
>>> A = csr_array([[7.0, 8.0, 0],[0, 0, 9.0]])
>>> find(A)
(array([0, 0, 1], dtype=int32),
array([0, 1, 2], dtype=int32),
array([ 7., 8., 9.]))
"""
A = coo_array(A, copy=True)
A.sum_duplicates()
# remove explicit zeros
nz_mask = A.data != 0
return A.row[nz_mask], A.col[nz_mask], A.data[nz_mask]
def tril(A, k=0, format=None):
"""Return the lower triangular portion of a sparse array or matrix
Returns the elements on or below the k-th diagonal of A.
- k = 0 corresponds to the main diagonal
- k > 0 is above the main diagonal
- k < 0 is below the main diagonal
Parameters
----------
A : dense or sparse array or matrix
Matrix whose lower triangular portion is desired.
k : integer, optional
The top-most diagonal of the lower triangle.
format : string
Sparse format of the result, e.g. format="csr", etc.
Returns
-------
L : sparse array or matrix
Lower triangular portion of A in sparse format.
Sparse array if A is a sparse array, otherwise matrix.
See Also
--------
triu : upper triangle in sparse format
Examples
--------
>>> from scipy.sparse import csr_array, tril
>>> A = csr_array([[1, 2, 0, 0, 3], [4, 5, 0, 6, 7], [0, 0, 8, 9, 0]],
... dtype='int32')
>>> A.toarray()
array([[1, 2, 0, 0, 3],
[4, 5, 0, 6, 7],
[0, 0, 8, 9, 0]], dtype=int32)
>>> tril(A).toarray()
array([[1, 0, 0, 0, 0],
[4, 5, 0, 0, 0],
[0, 0, 8, 0, 0]], dtype=int32)
>>> tril(A).nnz
4
>>> tril(A, k=1).toarray()
array([[1, 2, 0, 0, 0],
[4, 5, 0, 0, 0],
[0, 0, 8, 9, 0]], dtype=int32)
>>> tril(A, k=-1).toarray()
array([[0, 0, 0, 0, 0],
[4, 0, 0, 0, 0],
[0, 0, 0, 0, 0]], dtype=int32)
>>> tril(A, format='csc')
<Compressed Sparse Column sparse array of dtype 'int32'
with 4 stored elements and shape (3, 5)>
"""
coo_sparse = coo_array if isinstance(A, sparray) else coo_matrix
# convert to COOrdinate format where things are easy
A = coo_sparse(A, copy=False)
mask = A.row + k >= A.col
row = A.row[mask]
col = A.col[mask]
data = A.data[mask]
new_coo = coo_sparse((data, (row, col)), shape=A.shape, dtype=A.dtype)
return new_coo.asformat(format)
def triu(A, k=0, format=None):
"""Return the upper triangular portion of a sparse array or matrix
Returns the elements on or above the k-th diagonal of A.
- k = 0 corresponds to the main diagonal
- k > 0 is above the main diagonal
- k < 0 is below the main diagonal
Parameters
----------
A : dense or sparse array or matrix
Matrix whose upper triangular portion is desired.
k : integer, optional
The bottom-most diagonal of the upper triangle.
format : string
Sparse format of the result, e.g. format="csr", etc.
Returns
-------
L : sparse array or matrix
Upper triangular portion of A in sparse format.
Sparse array if A is a sparse array, otherwise matrix.
See Also
--------
tril : lower triangle in sparse format
Examples
--------
>>> from scipy.sparse import csr_array, triu
>>> A = csr_array([[1, 2, 0, 0, 3], [4, 5, 0, 6, 7], [0, 0, 8, 9, 0]],
... dtype='int32')
>>> A.toarray()
array([[1, 2, 0, 0, 3],
[4, 5, 0, 6, 7],
[0, 0, 8, 9, 0]], dtype=int32)
>>> triu(A).toarray()
array([[1, 2, 0, 0, 3],
[0, 5, 0, 6, 7],
[0, 0, 8, 9, 0]], dtype=int32)
>>> triu(A).nnz
8
>>> triu(A, k=1).toarray()
array([[0, 2, 0, 0, 3],
[0, 0, 0, 6, 7],
[0, 0, 0, 9, 0]], dtype=int32)
>>> triu(A, k=-1).toarray()
array([[1, 2, 0, 0, 3],
[4, 5, 0, 6, 7],
[0, 0, 8, 9, 0]], dtype=int32)
>>> triu(A, format='csc')
<Compressed Sparse Column sparse array of dtype 'int32'
with 8 stored elements and shape (3, 5)>
"""
coo_sparse = coo_array if isinstance(A, sparray) else coo_matrix
# convert to COOrdinate format where things are easy
A = coo_sparse(A, copy=False)
mask = A.row + k <= A.col
row = A.row[mask]
col = A.col[mask]
data = A.data[mask]
new_coo = coo_sparse((data, (row, col)), shape=A.shape, dtype=A.dtype)
return new_coo.asformat(format)
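
The masks used by `tril` and `triu` above (`row + k >= col` and `row + k <= col`) can be checked by hand on plain coordinate triplets. The following is a small pure-Python sketch under the assumption that the matrix is given as parallel lists of row indices, column indices, and values; `triangle` is a hypothetical helper, not part of SciPy.

```python
def triangle(rows, cols, data, k=0, lower=True):
    """Filter COO triplets to the lower or upper triangle relative to diagonal k."""
    kept = [
        (r, c, v)
        for r, c, v in zip(rows, cols, data)
        # Same comparisons as the masks in tril (>=) and triu (<=).
        if (r + k >= c if lower else r + k <= c)
    ]
    if not kept:
        return [], [], []
    out_rows, out_cols, out_data = (list(t) for t in zip(*kept))
    return out_rows, out_cols, out_data


# Nonzero entries of the 3x5 matrix used in the docstring examples.
rows = [0, 0, 0, 1, 1, 1, 1, 2, 2]
cols = [0, 1, 4, 0, 1, 3, 4, 2, 3]
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

lower_part = triangle(rows, cols, data)
upper_part = triangle(rows, cols, data, lower=False)
```

With `k=0` the lower part keeps 4 entries and the upper part keeps 8, which agrees with the `nnz` values shown in the docstrings.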

@ -0,0 +1,444 @@
"""Indexing mixin for sparse array/matrix classes.
"""
import numpy as np
from ._sputils import isintlike
from ._base import sparray, issparse
INT_TYPES = (int, np.integer)
def _broadcast_arrays(a, b):
"""
Same as np.broadcast_arrays(a, b) but with the old writeability rules.
NumPy >= 1.17.0 transitions broadcast_arrays to return
read-only arrays. Set writeability explicitly to avoid warnings.
Retain the old writeability rules, as our Cython code assumes
the old behavior.
"""
x, y = np.broadcast_arrays(a, b)
x.flags.writeable = a.flags.writeable
y.flags.writeable = b.flags.writeable
return x, y
class IndexMixin:
"""
This class provides common dispatching and validation logic for indexing.
"""
def __getitem__(self, key):
index, new_shape = self._validate_indices(key)
# 1D array
if len(index) == 1:
idx = index[0]
if isinstance(idx, np.ndarray):
if idx.shape == ():
idx = idx.item()
if isinstance(idx, INT_TYPES):
res = self._get_int(idx)
elif isinstance(idx, slice):
res = self._get_slice(idx)
else: # assume array idx
res = self._get_array(idx)
# package the result and return
if not isinstance(self, sparray):
return res
# handle np.newaxis in idx when result would otherwise be a scalar
if res.shape == () and new_shape != ():
if len(new_shape) == 1:
return self.__class__([res], shape=new_shape, dtype=self.dtype)
if len(new_shape) == 2:
return self.__class__([[res]], shape=new_shape, dtype=self.dtype)
return res.reshape(new_shape)
# 2D array
row, col = index
# Dispatch to specialized methods.
if isinstance(row, INT_TYPES):
if isinstance(col, INT_TYPES):
res = self._get_intXint(row, col)
elif isinstance(col, slice):
res = self._get_intXslice(row, col)
elif col.ndim == 1:
res = self._get_intXarray(row, col)
elif col.ndim == 2:
res = self._get_intXarray(row, col)
else:
raise IndexError('index results in >2 dimensions')
elif isinstance(row, slice):
if isinstance(col, INT_TYPES):
res = self._get_sliceXint(row, col)
elif isinstance(col, slice):
if row == slice(None) and row == col:
res = self.copy()
else:
res = self._get_sliceXslice(row, col)
elif col.ndim == 1:
res = self._get_sliceXarray(row, col)
else:
raise IndexError('index results in >2 dimensions')
else:
if isinstance(col, INT_TYPES):
res = self._get_arrayXint(row, col)
elif isinstance(col, slice):
res = self._get_arrayXslice(row, col)
# arrayXarray preprocess
elif (row.ndim == 2 and row.shape[1] == 1
and (col.ndim == 1 or col.shape[0] == 1)):
# outer indexing
res = self._get_columnXarray(row[:, 0], col.ravel())
else:
# inner indexing
row, col = _broadcast_arrays(row, col)
if row.shape != col.shape:
raise IndexError('number of row and column indices differ')
if row.size == 0:
res = self.__class__(np.atleast_2d(row).shape, dtype=self.dtype)
else:
res = self._get_arrayXarray(row, col)
# handle spmatrix (must be 2d, don't let 1d new_shape start reshape)
if not isinstance(self, sparray):
if new_shape == () or (len(new_shape) == 1 and res.ndim != 0):
# res handles cases not inflated by None
return res
if len(new_shape) == 1:
# shape inflated to 1D by None in index. Make 2D
new_shape = (1,) + new_shape
# reshape if needed (when None changes shape, e.g. A[1,:,None])
return res if new_shape == res.shape else res.reshape(new_shape)
# package the result and return
if res.shape != new_shape:
# handle formats that support indexing but not 1D (lil for now)
if self.format == "lil" and len(new_shape) != 2:
if res.shape == ():
return self._coo_container([res], shape=new_shape)
return res.tocoo().reshape(new_shape)
return res.reshape(new_shape)
return res
def __setitem__(self, key, x):
index, _ = self._validate_indices(key)
# 1D array
if len(index) == 1:
idx = index[0]
if issparse(x):
x = x.toarray()
else:
x = np.asarray(x, dtype=self.dtype)
if isinstance(idx, INT_TYPES):
if x.size != 1:
raise ValueError('Trying to assign a sequence to an item')
self._set_int(idx, x.flat[0])
return
if isinstance(idx, slice):
# check for simple case of slice that gives 1 item
# Note: Python `range` does not use lots of memory
idx_range = range(*idx.indices(self.shape[0]))
N = len(idx_range)
if N == 1 and x.size == 1:
self._set_int(idx_range[0], x.flat[0])
return
idx = np.arange(*idx.indices(self.shape[0]))
idx_shape = idx.shape
else:
idx_shape = idx.squeeze().shape
# broadcast scalar to full 1d
if x.squeeze().shape != idx_shape:
x = np.broadcast_to(x, idx.shape)
if x.size != 0:
self._set_array(idx, x)
return
# 2D array
row, col = index
if isinstance(row, INT_TYPES) and isinstance(col, INT_TYPES):
x = np.asarray(x, dtype=self.dtype)
if x.size != 1:
raise ValueError('Trying to assign a sequence to an item')
self._set_intXint(row, col, x.flat[0])
return
if isinstance(row, slice):
row = np.arange(*row.indices(self.shape[0]))[:, None]
else:
row = np.atleast_1d(row)
if isinstance(col, slice):
col = np.arange(*col.indices(self.shape[1]))[None, :]
if row.ndim == 1:
row = row[:, None]
else:
col = np.atleast_1d(col)
i, j = _broadcast_arrays(row, col)
if i.shape != j.shape:
raise IndexError('number of row and column indices differ')
if issparse(x):
if 0 in x.shape:
return
if i.ndim == 1:
# Inner indexing, so treat them like row vectors.
i = i[None]
j = j[None]
x = x.tocoo(copy=False).reshape(x._shape_as_2d, copy=True)
broadcast_row = x.shape[0] == 1 and i.shape[0] != 1
broadcast_col = x.shape[1] == 1 and i.shape[1] != 1
if not ((broadcast_row or x.shape[0] == i.shape[0]) and
(broadcast_col or x.shape[1] == i.shape[1])):
raise ValueError('shape mismatch in assignment')
x.sum_duplicates()
self._set_arrayXarray_sparse(i, j, x)
else:
# Make x and i into the same shape
x = np.asarray(x, dtype=self.dtype)
if x.squeeze().shape != i.squeeze().shape:
x = np.broadcast_to(x, i.shape)
if x.size == 0:
return
x = x.reshape(i.shape)
self._set_arrayXarray(i, j, x)
def _validate_indices(self, key):
"""Returns two tuples: (index tuple, requested shape tuple)"""
# single ellipsis
if key is Ellipsis:
return (slice(None),) * self.ndim, self.shape
if not isinstance(key, tuple):
key = [key]
ellps_pos = None
index_1st = []
prelim_ndim = 0
for i, idx in enumerate(key):
if idx is Ellipsis:
if ellps_pos is not None:
raise IndexError('an index can only have a single ellipsis')
ellps_pos = i
elif idx is None:
index_1st.append(idx)
elif isinstance(idx, slice) or isintlike(idx):
index_1st.append(idx)
prelim_ndim += 1
elif (ix := _compatible_boolean_index(idx, self.ndim)) is not None:
index_1st.append(ix)
prelim_ndim += ix.ndim
elif issparse(idx):
# TODO: make sparse matrix indexing work for sparray
raise IndexError(
'Indexing with sparse matrices is not supported '
'except boolean indexing where matrix and index '
'are equal shapes.')
else: # dense array
index_1st.append(np.asarray(idx))
prelim_ndim += 1
ellip_slices = (self.ndim - prelim_ndim) * [slice(None)]
if ellip_slices:
if ellps_pos is None:
index_1st.extend(ellip_slices)
else:
index_1st = index_1st[:ellps_pos] + ellip_slices + index_1st[ellps_pos:]
# second pass (have processed ellipsis and preprocessed arrays)
idx_shape = []
index_ndim = 0
index = []
array_indices = []
for i, idx in enumerate(index_1st):
if idx is None:
idx_shape.append(1)
elif isinstance(idx, slice):
index.append(idx)
Ms = self._shape[index_ndim]
len_slice = len(range(*idx.indices(Ms)))
idx_shape.append(len_slice)
index_ndim += 1
elif isintlike(idx):
N = self._shape[index_ndim]
if not (-N <= idx < N):
raise IndexError(f'index ({idx}) out of range')
idx = int(idx + N if idx < 0 else idx)
index.append(idx)
index_ndim += 1
# bool array (checked in first pass)
elif idx.dtype.kind == 'b':
ix = idx
tmp_ndim = index_ndim + ix.ndim
mid_shape = self._shape[index_ndim:tmp_ndim]
if ix.shape != mid_shape:
raise IndexError(
f"bool index {i} has shape {ix.shape} instead of {mid_shape}"
)
index.extend(ix.nonzero())
array_indices.extend(range(index_ndim, tmp_ndim))
index_ndim = tmp_ndim
else: # dense array
N = self._shape[index_ndim]
idx = self._asindices(idx, N)
index.append(idx)
array_indices.append(index_ndim)
index_ndim += 1
if index_ndim > self.ndim:
raise IndexError(
f'invalid index ndim. Array is {self.ndim}D. Index needs {index_ndim}D'
)
if len(array_indices) > 1:
idx_arrays = _broadcast_arrays(*(index[i] for i in array_indices))
if any(idx_arrays[0].shape != ix.shape for ix in idx_arrays[1:]):
shapes = " ".join(str(ix.shape) for ix in idx_arrays)
msg = (f'shape mismatch: indexing arrays could not be broadcast '
f'together with shapes {shapes}')
raise IndexError(msg)
# TODO: handle this for nD (adjacent arrays stay, separated move to start)
idx_shape = list(idx_arrays[0].shape) + idx_shape
elif len(array_indices) == 1:
arr_index = array_indices[0]
arr_shape = list(index[arr_index].shape)
idx_shape = idx_shape[:arr_index] + arr_shape + idx_shape[arr_index:]
if (ndim := len(idx_shape)) > 2:
raise IndexError(f'Only 1D or 2D arrays allowed. Index makes {ndim}D')
return tuple(index), tuple(idx_shape)
def _asindices(self, idx, length):
"""Convert `idx` to a valid index for an axis with a given length.
Subclasses that need special validation can override this method.
"""
try:
x = np.asarray(idx)
except (ValueError, TypeError, MemoryError) as e:
raise IndexError('invalid index') from e
if x.ndim not in (1, 2):
raise IndexError('Index dimension must be 1 or 2')
if x.size == 0:
return x
# Check bounds
max_indx = x.max()
if max_indx >= length:
raise IndexError(f'index ({max_indx}) out of range')
min_indx = x.min()
if min_indx < 0:
if min_indx < -length:
raise IndexError(f'index ({min_indx}) out of range')
if x is idx or not x.flags.owndata:
x = x.copy()
x[x < 0] += length
return x
def _getrow(self, i):
"""Return a copy of row i of the matrix, as a (1 x n) row vector.
"""
M, N = self.shape
i = int(i)
if i < -M or i >= M:
raise IndexError(f'index ({i}) out of range')
if i < 0:
i += M
return self._get_intXslice(i, slice(None))
def _getcol(self, i):
"""Return a copy of column i of the matrix, as a (m x 1) column vector.
"""
M, N = self.shape
i = int(i)
if i < -N or i >= N:
raise IndexError(f'index ({i}) out of range')
if i < 0:
i += N
return self._get_sliceXint(slice(None), i)
def _get_int(self, idx):
raise NotImplementedError()
def _get_slice(self, idx):
raise NotImplementedError()
def _get_array(self, idx):
raise NotImplementedError()
def _get_intXint(self, row, col):
raise NotImplementedError()
def _get_intXarray(self, row, col):
raise NotImplementedError()
def _get_intXslice(self, row, col):
raise NotImplementedError()
def _get_sliceXint(self, row, col):
raise NotImplementedError()
def _get_sliceXslice(self, row, col):
raise NotImplementedError()
def _get_sliceXarray(self, row, col):
raise NotImplementedError()
def _get_arrayXint(self, row, col):
raise NotImplementedError()
def _get_arrayXslice(self, row, col):
raise NotImplementedError()
def _get_columnXarray(self, row, col):
raise NotImplementedError()
def _get_arrayXarray(self, row, col):
raise NotImplementedError()
def _set_int(self, idx, x):
raise NotImplementedError()
def _set_array(self, idx, x):
raise NotImplementedError()
def _set_intXint(self, row, col, x):
raise NotImplementedError()
def _set_arrayXarray(self, row, col, x):
raise NotImplementedError()
def _set_arrayXarray_sparse(self, row, col, x):
# Fall back to densifying x
x = np.asarray(x.toarray(), dtype=self.dtype)
x, _ = _broadcast_arrays(x, row)
self._set_arrayXarray(row, col, x)
def _compatible_boolean_index(idx, desired_ndim):
"""Check for boolean array or array-like. Peek before asarray for array-like."""
# use attribute ndim to indicate a compatible array and check dtype
# if not, look at 1st element as quick rejection of bool, else slower asanyarray
if not hasattr(idx, 'ndim'):
# is first element boolean?
try:
ix = next(iter(idx), None)
for _ in range(desired_ndim):
if isinstance(ix, bool):
break
ix = next(iter(ix), None)
else:
return None
except TypeError:
return None
# since first is boolean, construct array and check all elements
idx = np.asanyarray(idx)
if idx.dtype.kind == 'b':
return idx
return None
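
The integer and slice handling in `_validate_indices` above follows two small rules: a negative integer is shifted by the axis length after a bounds check, and a slice contributes `len(range(*s.indices(n)))` to the result shape. The following is a simplified pure-Python sketch of just those two rules. The helper names are hypothetical, and arrays, `None`, and `Ellipsis` are deliberately left out.

```python
def normalize_int(idx, length):
    """Map a possibly negative integer index to the range [0, length)."""
    if not (-length <= idx < length):
        raise IndexError(f"index ({idx}) out of range")
    return idx + length if idx < 0 else idx


def slice_length(s, length):
    """Number of elements a slice selects along an axis of the given length."""
    return len(range(*s.indices(length)))


def result_shape(key, shape):
    """Shape produced by a key made only of ints and slices, one per axis."""
    out = []
    for idx, length in zip(key, shape):
        if isinstance(idx, slice):
            out.append(slice_length(idx, length))
        else:
            # Integers are validated and then drop their axis.
            normalize_int(idx, length)
    return tuple(out)


shape_a = result_shape((slice(None), 1), (3, 5))
shape_b = result_shape((slice(1, None, 2), slice(None)), (3, 5))
```

An integer index removes its axis from the result, which is why sparse arrays return 1D results for a key such as `A[:, i]` while `A[:, [i]]` keeps two dimensions.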

@ -0,0 +1,632 @@
"""List of Lists sparse matrix class
"""
__docformat__ = "restructuredtext en"
__all__ = ['lil_array', 'lil_matrix', 'isspmatrix_lil']
from bisect import bisect_left
import numpy as np
from ._matrix import spmatrix
from ._base import _spbase, sparray, issparse
from ._index import IndexMixin, INT_TYPES, _broadcast_arrays
from ._sputils import (getdtype, isshape, isscalarlike, upcast_scalar,
check_shape, check_reshape_kwargs)
from . import _csparsetools
class _lil_base(_spbase, IndexMixin):
_format = 'lil'
def __init__(self, arg1, shape=None, dtype=None, copy=False, *, maxprint=None):
_spbase.__init__(self, arg1, maxprint=maxprint)
self.dtype = getdtype(dtype, arg1, default=float)
# First get the shape
if issparse(arg1):
if arg1.format == "lil" and copy:
A = arg1.copy()
else:
A = arg1.tolil()
if dtype is not None:
newdtype = getdtype(dtype)
A = A.astype(newdtype, copy=False)
self._shape = check_shape(A.shape)
self.dtype = A.dtype
self.rows = A.rows
self.data = A.data
elif isinstance(arg1,tuple):
if isshape(arg1):
if shape is not None:
raise ValueError('invalid use of shape parameter')
M, N = arg1
self._shape = check_shape((M, N))
self.rows = np.empty((M,), dtype=object)
self.data = np.empty((M,), dtype=object)
for i in range(M):
self.rows[i] = []
self.data[i] = []
else:
raise TypeError('unrecognized lil_array constructor usage')
else:
# assume A is dense
try:
A = self._ascontainer(arg1)
except TypeError as e:
raise TypeError('unsupported matrix type') from e
if isinstance(self, sparray) and A.ndim != 2:
raise ValueError(f"LIL arrays don't support {A.ndim}D input. Use 2D")
A = self._csr_container(A, dtype=dtype).tolil()
self._shape = check_shape(A.shape)
self.dtype = getdtype(A.dtype)
self.rows = A.rows
self.data = A.data
def __iadd__(self,other):
self[:,:] = self + other
return self
def __isub__(self,other):
self[:,:] = self - other
return self
def __imul__(self,other):
if isscalarlike(other):
self[:,:] = self * other
return self
else:
return NotImplemented
def __itruediv__(self,other):
if isscalarlike(other):
self[:,:] = self / other
return self
else:
return NotImplemented
# Whenever the dimensions change, empty lists should be created for each
# row
def _getnnz(self, axis=None):
if axis is None:
return sum([len(rowvals) for rowvals in self.data])
if axis < 0:
axis += 2
if axis == 0:
out = np.zeros(self.shape[1], dtype=np.intp)
for row in self.rows:
out[row] += 1
return out
elif axis == 1:
return np.array([len(rowvals) for rowvals in self.data], dtype=np.intp)
else:
raise ValueError('axis out of bounds')
_getnnz.__doc__ = _spbase._getnnz.__doc__
def count_nonzero(self, axis=None):
if axis is None:
return sum(np.count_nonzero(rowvals) for rowvals in self.data)
if axis < 0:
axis += 2
if axis == 0:
out = np.zeros(self.shape[1], dtype=np.intp)
for row, data in zip(self.rows, self.data):
mask = [c for c, d in zip(row, data) if d != 0]
out[mask] += 1
return out
elif axis == 1:
return np.array(
[np.count_nonzero(rowvals) for rowvals in self.data], dtype=np.intp,
)
else:
raise ValueError('axis out of bounds')
count_nonzero.__doc__ = _spbase.count_nonzero.__doc__
def getrowview(self, i):
"""Returns a view of the 'i'th row (without copying).
"""
new = self._lil_container((1, self.shape[1]), dtype=self.dtype)
new.rows[0] = self.rows[i]
new.data[0] = self.data[i]
return new
def getrow(self, i):
"""Returns a copy of the 'i'th row.
"""
M, N = self.shape
if i < 0:
i += M
if i < 0 or i >= M:
raise IndexError('row index out of bounds')
new = self._lil_container((1, N), dtype=self.dtype)
new.rows[0] = self.rows[i][:]
new.data[0] = self.data[i][:]
return new
def __getitem__(self, key):
# Fast path for simple (int, int) indexing.
if (isinstance(key, tuple) and len(key) == 2 and
isinstance(key[0], INT_TYPES) and
isinstance(key[1], INT_TYPES)):
# lil_get1 handles validation for us.
return self._get_intXint(*key)
# Everything else takes the normal path.
return IndexMixin.__getitem__(self, key)
def _asindices(self, idx, N):
# LIL routines handle bounds-checking for us, so don't do it here.
try:
x = np.asarray(idx)
except (ValueError, TypeError, MemoryError) as e:
raise IndexError('invalid index') from e
if x.ndim not in (1, 2):
raise IndexError('Index dimension must be <= 2')
return x
def _get_intXint(self, row, col):
v = _csparsetools.lil_get1(self.shape[0], self.shape[1], self.rows,
self.data, row, col)
return self.dtype.type(v)
def _get_sliceXint(self, row, col):
row = range(*row.indices(self.shape[0]))
return self._get_row_ranges(row, slice(col, col+1))
def _get_arrayXint(self, row, col):
res = self._get_row_ranges(row.ravel(), slice(col, col+1))
if row.ndim > 1:
return res.reshape(row.shape)
return res
def _get_intXslice(self, row, col):
return self._get_row_ranges((row,), col)
def _get_sliceXslice(self, row, col):
row = range(*row.indices(self.shape[0]))
return self._get_row_ranges(row, col)
def _get_arrayXslice(self, row, col):
return self._get_row_ranges(row, col)
def _get_intXarray(self, row, col):
row = np.array(row, dtype=col.dtype, ndmin=1)
return self._get_columnXarray(row, col)
def _get_sliceXarray(self, row, col):
row = np.arange(*row.indices(self.shape[0]))
return self._get_columnXarray(row, col)
def _get_columnXarray(self, row, col):
# outer indexing
row, col = _broadcast_arrays(row[:,None], col)
return self._get_arrayXarray(row, col)
def _get_arrayXarray(self, row, col):
# inner indexing
i, j = map(np.atleast_2d, _prepare_index_for_memoryview(row, col))
new = self._lil_container(i.shape, dtype=self.dtype)
_csparsetools.lil_fancy_get(self.shape[0], self.shape[1],
self.rows, self.data,
new.rows, new.data,
i, j)
return new
def _get_row_ranges(self, rows, col_slice):
"""
Fast path for indexing when the column index is a slice.
This is faster than brute-force element lookup because it skips
zeros efficiently by visiting the stored elements of each row in
column order.
Parameters
----------
rows : sequence or range
Rows indexed. If range, must be within valid bounds.
col_slice : slice
Columns indexed
"""
j_start, j_stop, j_stride = col_slice.indices(self.shape[1])
col_range = range(j_start, j_stop, j_stride)
nj = len(col_range)
new = self._lil_container((len(rows), nj), dtype=self.dtype)
_csparsetools.lil_get_row_ranges(self.shape[0], self.shape[1],
self.rows, self.data,
new.rows, new.data,
rows,
j_start, j_stop, j_stride, nj)
return new
def _set_intXint(self, row, col, x):
_csparsetools.lil_insert(self.shape[0], self.shape[1], self.rows,
self.data, row, col, x)
def _set_arrayXarray(self, row, col, x):
i, j, x = map(np.atleast_2d, _prepare_index_for_memoryview(row, col, x))
_csparsetools.lil_fancy_set(self.shape[0], self.shape[1],
self.rows, self.data,
i, j, x)
def _set_arrayXarray_sparse(self, row, col, x):
# Fall back to densifying x
x = np.asarray(x.toarray(), dtype=self.dtype)
x, _ = _broadcast_arrays(x, row)
self._set_arrayXarray(row, col, x)
def __setitem__(self, key, x):
if isinstance(key, tuple) and len(key) == 2:
row, col = key
# Fast path for simple (int, int) indexing.
if isinstance(row, INT_TYPES) and isinstance(col, INT_TYPES):
x = self.dtype.type(x)
if x.size > 1:
raise ValueError("Trying to assign a sequence to an item")
return self._set_intXint(row, col, x)
# Fast path for full-matrix sparse assignment.
if (isinstance(row, slice) and isinstance(col, slice) and
row == slice(None) and col == slice(None) and
issparse(x) and x.shape == self.shape):
x = self._lil_container(x, dtype=self.dtype)
self.rows = x.rows
self.data = x.data
return
# Everything else takes the normal path.
IndexMixin.__setitem__(self, key, x)
def _mul_scalar(self, other):
if other == 0:
# Multiply by zero: return the zero matrix
new = self._lil_container(self.shape, dtype=self.dtype)
else:
res_dtype = upcast_scalar(self.dtype, other)
new = self.copy()
new = new.astype(res_dtype)
# Multiply this scalar by every element.
for j, rowvals in enumerate(new.data):
new.data[j] = [val*other for val in rowvals]
return new
def __truediv__(self, other): # self / other
if isscalarlike(other):
new = self.copy()
new.dtype = np.result_type(self, other)
# Divide every element by this scalar
for j, rowvals in enumerate(new.data):
new.data[j] = [val/other for val in rowvals]
return new
else:
return self.tocsr() / other
def copy(self):
M, N = self.shape
new = self._lil_container(self.shape, dtype=self.dtype)
# This is ~14x faster than calling deepcopy() on rows and data.
_csparsetools.lil_get_row_ranges(M, N, self.rows, self.data,
new.rows, new.data, range(M),
0, N, 1, N)
return new
copy.__doc__ = _spbase.copy.__doc__
def reshape(self, *args, **kwargs):
shape = check_shape(args, self.shape)
order, copy = check_reshape_kwargs(kwargs)
# Return early if reshape is not required
if shape == self.shape:
if copy:
return self.copy()
else:
return self
new = self._lil_container(shape, dtype=self.dtype)
if order == 'C':
ncols = self.shape[1]
for i, row in enumerate(self.rows):
for col, j in enumerate(row):
new_r, new_c = np.unravel_index(i * ncols + j, shape)
new[new_r, new_c] = self[i, j]
elif order == 'F':
nrows = self.shape[0]
for i, row in enumerate(self.rows):
for col, j in enumerate(row):
new_r, new_c = np.unravel_index(i + j * nrows, shape, order)
new[new_r, new_c] = self[i, j]
else:
raise ValueError("'order' must be 'C' or 'F'")
return new
reshape.__doc__ = _spbase.reshape.__doc__
def resize(self, *shape):
shape = check_shape(shape)
new_M, new_N = shape
M, N = self.shape
if new_M < M:
self.rows = self.rows[:new_M]
self.data = self.data[:new_M]
elif new_M > M:
self.rows = np.resize(self.rows, new_M)
self.data = np.resize(self.data, new_M)
for i in range(M, new_M):
self.rows[i] = []
self.data[i] = []
if new_N < N:
for row, data in zip(self.rows, self.data):
trunc = bisect_left(row, new_N)
del row[trunc:]
del data[trunc:]
self._shape = shape
resize.__doc__ = _spbase.resize.__doc__
def toarray(self, order=None, out=None):
d = self._process_toarray_args(order, out)
for i, row in enumerate(self.rows):
for pos, j in enumerate(row):
d[i, j] = self.data[i][pos]
return d
toarray.__doc__ = _spbase.toarray.__doc__
def transpose(self, axes=None, copy=False):
return self.tocsr(copy=copy).transpose(axes=axes, copy=False).tolil(copy=False)
transpose.__doc__ = _spbase.transpose.__doc__
def tolil(self, copy=False):
if copy:
return self.copy()
else:
return self
tolil.__doc__ = _spbase.tolil.__doc__
def tocsr(self, copy=False):
M, N = self.shape
if M == 0 or N == 0:
return self._csr_container((M, N), dtype=self.dtype)
# construct indptr array
if M*N <= np.iinfo(np.int32).max:
# fast path: it is known that 64-bit indexing will not be needed.
idx_dtype = np.int32
indptr = np.empty(M + 1, dtype=idx_dtype)
indptr[0] = 0
_csparsetools.lil_get_lengths(self.rows, indptr[1:])
np.cumsum(indptr, out=indptr)
nnz = indptr[-1]
else:
idx_dtype = self._get_index_dtype(maxval=N)
lengths = np.empty(M, dtype=idx_dtype)
_csparsetools.lil_get_lengths(self.rows, lengths)
nnz = lengths.sum(dtype=np.int64)
idx_dtype = self._get_index_dtype(maxval=max(N, nnz))
indptr = np.empty(M + 1, dtype=idx_dtype)
indptr[0] = 0
np.cumsum(lengths, dtype=idx_dtype, out=indptr[1:])
indices = np.empty(nnz, dtype=idx_dtype)
data = np.empty(nnz, dtype=self.dtype)
_csparsetools.lil_flatten_to_array(self.rows, indices)
_csparsetools.lil_flatten_to_array(self.data, data)
# init csr matrix
return self._csr_container((data, indices, indptr), shape=self.shape)
tocsr.__doc__ = _spbase.tocsr.__doc__
def _prepare_index_for_memoryview(i, j, x=None):
"""
Convert index and data arrays to form suitable for passing to the
Cython fancy getset routines.
The conversions are necessary to (i) ensure the integer
index arrays are in one of the accepted types, and (ii) ensure
the arrays are writable so that Cython memoryview support doesn't
choke on them.
Parameters
----------
i, j
Index arrays
x : optional
Data arrays
Returns
-------
i, j, x
Re-formatted arrays (x is omitted, if input was None)
"""
if i.dtype > j.dtype:
j = j.astype(i.dtype)
elif i.dtype < j.dtype:
i = i.astype(j.dtype)
if not i.flags.writeable or i.dtype not in (np.int32, np.int64):
i = i.astype(np.intp)
if not j.flags.writeable or j.dtype not in (np.int32, np.int64):
j = j.astype(np.intp)
if x is not None:
if not x.flags.writeable:
x = x.copy()
return i, j, x
else:
return i, j
def isspmatrix_lil(x):
"""Is `x` of lil_matrix type?
Parameters
----------
x
object to check for being a lil matrix
Returns
-------
bool
True if `x` is a lil matrix, False otherwise
Examples
--------
>>> from scipy.sparse import lil_array, lil_matrix, coo_matrix, isspmatrix_lil
>>> isspmatrix_lil(lil_matrix([[5]]))
True
>>> isspmatrix_lil(lil_array([[5]]))
False
>>> isspmatrix_lil(coo_matrix([[5]]))
False
"""
return isinstance(x, lil_matrix)
# This namespace class separates array from matrix with isinstance
class lil_array(_lil_base, sparray):
"""
Row-based LIst of Lists sparse array.
This is a structure for constructing sparse arrays incrementally.
Note that inserting a single item can take linear time in the worst case;
to construct the array efficiently, make sure the items are pre-sorted by
index, per row.
This can be instantiated in several ways:
lil_array(D)
where D is a 2-D ndarray
lil_array(S)
with another sparse array or matrix S (equivalent to S.tolil())
lil_array((M, N), [dtype])
to construct an empty array with shape (M, N)
dtype is optional, defaulting to dtype='d'.
Attributes
----------
dtype : dtype
Data type of the array
shape : 2-tuple
Shape of the array
ndim : int
Number of dimensions (this is always 2)
nnz
size
data
LIL format data array of the array
rows
LIL format row index array of the array
T
Notes
-----
Sparse arrays can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
Advantages of the LIL format
- supports flexible slicing
- changes to the array sparsity structure are efficient
Disadvantages of the LIL format
- arithmetic operations LIL + LIL are slow (consider CSR or CSC)
- slow column slicing (consider CSC)
- slow matrix vector products (consider CSR or CSC)
Intended Usage
- LIL is a convenient format for constructing sparse arrays
- once an array has been constructed, convert to CSR or
CSC format for fast arithmetic and matrix vector operations
- consider using the COO format when constructing large arrays
Data Structure
- An array (``self.rows``) of rows, each of which is a sorted
list of column indices of non-zero elements.
- The corresponding nonzero values are stored in similar
fashion in ``self.data``.
"""
class lil_matrix(spmatrix, _lil_base):
"""
Row-based LIst of Lists sparse matrix.
This is a structure for constructing sparse matrices incrementally.
Note that inserting a single item can take linear time in the worst case;
to construct the matrix efficiently, make sure the items are pre-sorted by
index, per row.
This can be instantiated in several ways:
lil_matrix(D)
where D is a 2-D ndarray
lil_matrix(S)
with another sparse array or matrix S (equivalent to S.tolil())
lil_matrix((M, N), [dtype])
to construct an empty matrix with shape (M, N)
dtype is optional, defaulting to dtype='d'.
Attributes
----------
dtype : dtype
Data type of the matrix
shape : 2-tuple
Shape of the matrix
ndim : int
Number of dimensions (this is always 2)
nnz
size
data
LIL format data array of the matrix
rows
LIL format row index array of the matrix
T
Notes
-----
Sparse matrices can be used in arithmetic operations: they support
addition, subtraction, multiplication, division, and matrix power.
Advantages of the LIL format
- supports flexible slicing
- changes to the matrix sparsity structure are efficient
Disadvantages of the LIL format
- arithmetic operations LIL + LIL are slow (consider CSR or CSC)
- slow column slicing (consider CSC)
- slow matrix vector products (consider CSR or CSC)
Intended Usage
- LIL is a convenient format for constructing sparse matrices
- once a matrix has been constructed, convert to CSR or
CSC format for fast arithmetic and matrix vector operations
- consider using the COO format when constructing large matrices
Data Structure
- An array (``self.rows``) of rows, each of which is a sorted
list of column indices of non-zero elements.
- The corresponding nonzero values are stored in similar
fashion in ``self.data``.
"""

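The LIL layout described in the docstrings above (a per-row sorted list of column indices in ``rows`` and a parallel list of values in ``data``) can be illustrated with a small pure-Python sketch. ``TinyLIL`` and its methods are hypothetical names used only for illustration; SciPy's own implementation delegates to compiled ``_csparsetools`` routines and also handles dtypes, bounds checking and explicit zeros, which this sketch does not.

```python
from bisect import bisect_left


class TinyLIL:
    """Toy list-of-lists matrix (illustrative only, not SciPy's class)."""

    def __init__(self, shape):
        self.shape = shape
        n_rows = shape[0]
        self.rows = [[] for _ in range(n_rows)]  # sorted column indices per row
        self.data = [[] for _ in range(n_rows)]  # values matching self.rows

    def set(self, i, j, value):
        cols, vals = self.rows[i], self.data[i]
        pos = bisect_left(cols, j)  # binary search keeps the row sorted
        if pos < len(cols) and cols[pos] == j:
            vals[pos] = value  # overwrite an existing entry
        else:
            cols.insert(pos, j)  # list insertion: linear in the row length
            vals.insert(pos, value)

    def get(self, i, j):
        cols = self.rows[i]
        pos = bisect_left(cols, j)
        if pos < len(cols) and cols[pos] == j:
            return self.data[i][pos]
        return 0  # entries that are not stored are implicit zeros

    def toarray(self):
        dense = [[0] * self.shape[1] for _ in range(self.shape[0])]
        for i, cols in enumerate(self.rows):
            for pos, j in enumerate(cols):
                dense[i][j] = self.data[i][pos]
        return dense


m = TinyLIL((2, 4))
m.set(0, 3, 7)
m.set(0, 1, 5)  # inserted before column 3, so the row stays sorted
m.set(1, 0, 2)
```

The out-of-order insertion of column 1 shows why the docstrings recommend pre-sorting items by index per row: appending at the end of a row is cheap, while inserting in the middle shifts the rest of the list.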
@ -0,0 +1,169 @@
class spmatrix:
"""This class provides a base class for all sparse matrix classes.
It cannot be instantiated. Most of the work is provided by subclasses.
"""
_allow_nd = (2,)
@property
def _bsr_container(self):
from ._bsr import bsr_matrix
return bsr_matrix
@property
def _coo_container(self):
from ._coo import coo_matrix
return coo_matrix
@property
def _csc_container(self):
from ._csc import csc_matrix
return csc_matrix
@property
def _csr_container(self):
from ._csr import csr_matrix
return csr_matrix
@property
def _dia_container(self):
from ._dia import dia_matrix
return dia_matrix
@property
def _dok_container(self):
from ._dok import dok_matrix
return dok_matrix
@property
def _lil_container(self):
from ._lil import lil_matrix
return lil_matrix
# Restore matrix multiplication
def __mul__(self, other):
return self._matmul_dispatch(other)
def __rmul__(self, other):
return self._rmatmul_dispatch(other)
# Restore matrix power
def __pow__(self, power):
from .linalg import matrix_power
return matrix_power(self, power)
## Backward compatibility
def set_shape(self, shape):
"""Set the shape of the matrix in-place"""
# Make sure copy is False since this is in place
# Make sure format is unchanged because we are doing a __dict__ swap
new_self = self.reshape(shape, copy=False).asformat(self.format)
self.__dict__ = new_self.__dict__
def get_shape(self):
"""Get the shape of the matrix"""
return self._shape
shape = property(fget=get_shape, fset=set_shape,
doc="Shape of the matrix")
def asfptype(self):
"""Upcast matrix to a floating point format (if necessary)"""
return self._asfptype()
def getmaxprint(self):
"""Maximum number of elements to display when printed."""
return self._getmaxprint()
def getformat(self):
"""Matrix storage format"""
return self.format
def getnnz(self, axis=None):
"""Number of stored values, including explicit zeros.
Parameters
----------
axis : None, 0, or 1
Select between the number of values across the whole array, in
each column, or in each row.
"""
return self._getnnz(axis=axis)
def getH(self):
"""Return the Hermitian transpose of this matrix.
See Also
--------
numpy.matrix.getH : NumPy's implementation of `getH` for matrices
"""
return self.conjugate().transpose()
def getcol(self, j):
"""Returns a copy of column j of the matrix, as an (m x 1) sparse
matrix (column vector).
"""
return self._getcol(j)
def getrow(self, i):
"""Returns a copy of row i of the matrix, as a (1 x n) sparse
matrix (row vector).
"""
return self._getrow(i)
def todense(self, order=None, out=None):
"""
Return a dense representation of this sparse matrix.
Parameters
----------
order : {'C', 'F'}, optional
Whether to store multi-dimensional data in C (row-major)
or Fortran (column-major) order in memory. The default
is 'None', which provides no ordering guarantees.
Cannot be specified in conjunction with the `out`
argument.
out : ndarray, 2-D, optional
If specified, uses this array (or `numpy.matrix`) as the
output buffer instead of allocating a new array to
return. The provided array must have the same shape and
dtype as the sparse matrix on which you are calling the
method.
Returns
-------
arr : numpy.matrix, 2-D
A NumPy matrix object with the same shape and containing
the same data represented by the sparse matrix, with the
requested memory order. If `out` was passed and was an
array (rather than a `numpy.matrix`), it will be filled
with the appropriate values and returned wrapped in a
`numpy.matrix` object that shares the same memory.
"""
return super().todense(order, out)
@classmethod
def __class_getitem__(cls, arg, /):
"""
Return a parametrized wrapper around the `~scipy.sparse.spmatrix` type.
.. versionadded:: 1.16.0
Returns
-------
alias : types.GenericAlias
A parametrized `~scipy.sparse.spmatrix` type.
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> coo_matrix[np.int8]
scipy.sparse._coo.coo_matrix[numpy.int8]
"""
from types import GenericAlias
return GenericAlias(cls, arg)
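
A short illustration of what the ``__mul__`` and ``__pow__`` overrides above mean in practice, assuming a SciPy version that provides both ``csr_matrix`` and ``csr_array``. The variable names are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_array, csr_matrix

dense = np.array([[1, 2], [3, 4]])
as_matrix = csr_matrix(dense)
as_array = csr_array(dense)

# spmatrix: `*` is matrix multiplication and `**` is matrix power.
matrix_star = (as_matrix * as_matrix).toarray()
matrix_pow = (as_matrix ** 2).toarray()

# sparray: `*` is element-wise, like NumPy arrays; `@` is the matrix product.
array_star = (as_array * as_array).toarray()
array_matmul = (as_array @ as_array).toarray()
```

Code that has to work with both interfaces can use ``@`` throughout, since it means matrix multiplication for both classes.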

@ -0,0 +1,167 @@
import numpy as np
import scipy as sp
__all__ = ['save_npz', 'load_npz']
# Make loading safe vs. malicious input
PICKLE_KWARGS = dict(allow_pickle=False)
def save_npz(file, matrix, compressed=True):
""" Save a sparse matrix or array to a file using ``.npz`` format.
Parameters
----------
file : str or file-like object
Either the file name (string) or an open file (file-like object)
where the data will be saved. If file is a string, the ``.npz``
extension will be appended to the file name if it is not already
there.
matrix : spmatrix or sparray
The sparse matrix or array to save.
Supported formats: ``csc``, ``csr``, ``bsr``, ``dia`` or ``coo``.
compressed : bool, optional
Allow compressing the file. Default: True
See Also
--------
scipy.sparse.load_npz: Load a sparse matrix from a file using ``.npz`` format.
numpy.savez: Save several arrays into a ``.npz`` archive.
numpy.savez_compressed : Save several arrays into a compressed ``.npz`` archive.
Examples
--------
Store sparse matrix to disk, and load it again:
>>> import numpy as np
>>> import scipy as sp
>>> sparse_matrix = sp.sparse.csc_matrix([[0, 0, 3], [4, 0, 0]])
>>> sparse_matrix
<Compressed Sparse Column sparse matrix of dtype 'int64'
with 2 stored elements and shape (2, 3)>
>>> sparse_matrix.toarray()
array([[0, 0, 3],
[4, 0, 0]], dtype=int64)
>>> sp.sparse.save_npz('/tmp/sparse_matrix.npz', sparse_matrix)
>>> sparse_matrix = sp.sparse.load_npz('/tmp/sparse_matrix.npz')
>>> sparse_matrix
<Compressed Sparse Column sparse matrix of dtype 'int64'
with 2 stored elements and shape (2, 3)>
>>> sparse_matrix.toarray()
array([[0, 0, 3],
[4, 0, 0]], dtype=int64)
"""
arrays_dict = {}
if matrix.format in ('csc', 'csr', 'bsr'):
arrays_dict.update(indices=matrix.indices, indptr=matrix.indptr)
elif matrix.format == 'dia':
arrays_dict.update(offsets=matrix.offsets)
elif matrix.format == 'coo':
arrays_dict.update(row=matrix.row, col=matrix.col)
else:
msg = f'Save is not implemented for sparse matrix of format {matrix.format}.'
raise NotImplementedError(msg)
arrays_dict.update(
format=matrix.format.encode('ascii'),
shape=matrix.shape,
data=matrix.data
)
if isinstance(matrix, sp.sparse.sparray):
arrays_dict.update(_is_array=True)
if compressed:
np.savez_compressed(file, **arrays_dict)
else:
np.savez(file, **arrays_dict)
def load_npz(file):
""" Load a sparse array/matrix from a file using ``.npz`` format.
Parameters
----------
file : str or file-like object
Either the file name (string) or an open file (file-like object)
where the data will be loaded.
Returns
-------
result : csc_array, csr_array, bsr_array, dia_array or coo_array
A sparse array/matrix containing the loaded data.
Raises
------
OSError
If the input file does not exist or cannot be read.
See Also
--------
scipy.sparse.save_npz: Save a sparse array/matrix to a file using ``.npz`` format.
numpy.load: Load several arrays from a ``.npz`` archive.
Examples
--------
Store sparse array/matrix to disk, and load it again:
>>> import numpy as np
>>> import scipy as sp
>>> sparse_array = sp.sparse.csc_array([[0, 0, 3], [4, 0, 0]])
>>> sparse_array
<Compressed Sparse Column sparse array of dtype 'int64'
with 2 stored elements and shape (2, 3)>
>>> sparse_array.toarray()
array([[0, 0, 3],
[4, 0, 0]], dtype=int64)
>>> sp.sparse.save_npz('/tmp/sparse_array.npz', sparse_array)
>>> sparse_array = sp.sparse.load_npz('/tmp/sparse_array.npz')
>>> sparse_array
<Compressed Sparse Column sparse array of dtype 'int64'
with 2 stored elements and shape (2, 3)>
>>> sparse_array.toarray()
array([[0, 0, 3],
[4, 0, 0]], dtype=int64)
In this example we save a csc_matrix, load it back, and convert the result to a csr_array:
>>> sparse_matrix = sp.sparse.csc_matrix([[0, 0, 3], [4, 0, 0]])
>>> sp.sparse.save_npz('/tmp/sparse_matrix.npz', sparse_matrix)
>>> tmp = sp.sparse.load_npz('/tmp/sparse_matrix.npz')
>>> sparse_array = sp.sparse.csr_array(tmp)
"""
with np.load(file, **PICKLE_KWARGS) as loaded:
sparse_format = loaded.get('format')
if sparse_format is None:
raise ValueError(f'The file {file} does not contain '
f'a sparse array or matrix.')
sparse_format = sparse_format.item()
if not isinstance(sparse_format, str):
# Play safe with Python 2 vs 3 backward compatibility;
# files saved with SciPy < 1.0.0 may contain unicode or bytes.
sparse_format = sparse_format.decode('ascii')
if loaded.get('_is_array'):
sparse_type = sparse_format + '_array'
else:
sparse_type = sparse_format + '_matrix'
try:
cls = getattr(sp.sparse, f'{sparse_type}')
except AttributeError as e:
raise ValueError(f'Unknown format "{sparse_type}"') from e
if sparse_format in ('csc', 'csr', 'bsr'):
return cls((loaded['data'], loaded['indices'], loaded['indptr']),
shape=loaded['shape'])
elif sparse_format == 'dia':
return cls((loaded['data'], loaded['offsets']),
shape=loaded['shape'])
elif sparse_format == 'coo':
return cls((loaded['data'], (loaded['row'], loaded['col'])),
shape=loaded['shape'])
else:
raise NotImplementedError(f'Load is not implemented for '
f'sparse matrix of format {sparse_format}.')
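
A minimal round-trip sketch for ``save_npz`` and ``load_npz`` that writes to an in-memory buffer instead of ``/tmp``. It assumes a SciPy version that provides ``csr_array`` and ``lil_array``; the variable names are illustrative. It also shows that, per the format check in ``save_npz`` above, a LIL container has to be converted (for example with ``tocsr()``) before saving.

```python
import io

import numpy as np
from scipy import sparse

original = sparse.csr_array([[0, 0, 3], [4, 0, 0]])

# Any file-like object works; rewind it before loading.
buffer = io.BytesIO()
sparse.save_npz(buffer, original)
buffer.seek(0)
restored = sparse.load_npz(buffer)

# LIL is not one of the supported formats, so saving it directly fails.
lil = sparse.lil_array((2, 3))
lil[0, 2] = 3
try:
    sparse.save_npz(io.BytesIO(), lil)
    lil_saved_directly = True
except NotImplementedError:
    lil_saved_directly = False

# Converting first is enough.
lil_buffer = io.BytesIO()
sparse.save_npz(lil_buffer, lil.tocsr())
lil_buffer.seek(0)
lil_restored = sparse.load_npz(lil_buffer)
```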

@ -0,0 +1,76 @@
""" Functions that operate on sparse matrices
"""
__all__ = ['count_blocks','estimate_blocksize']
from ._base import issparse
from ._csr import csr_array
from ._sparsetools import csr_count_blocks
def estimate_blocksize(A,efficiency=0.7):
"""Attempt to determine the blocksize of a sparse matrix
Returns a blocksize=(r,c) such that
- A.nnz / A.tobsr( (r,c) ).nnz > efficiency
"""
if not (issparse(A) and A.format in ("csc", "csr")):
A = csr_array(A)
if A.nnz == 0:
return (1,1)
if not 0 < efficiency < 1.0:
raise ValueError('efficiency must satisfy 0.0 < efficiency < 1.0')
high_efficiency = (1.0 + efficiency) / 2.0
nnz = float(A.nnz)
M,N = A.shape
if M % 2 == 0 and N % 2 == 0:
e22 = nnz / (4 * count_blocks(A,(2,2)))
else:
e22 = 0.0
if M % 3 == 0 and N % 3 == 0:
e33 = nnz / (9 * count_blocks(A,(3,3)))
else:
e33 = 0.0
if e22 > high_efficiency and e33 > high_efficiency:
e66 = nnz / (36 * count_blocks(A,(6,6)))
if e66 > efficiency:
return (6,6)
else:
return (3,3)
else:
if M % 4 == 0 and N % 4 == 0:
e44 = nnz / (16 * count_blocks(A,(4,4)))
else:
e44 = 0.0
if e44 > efficiency:
return (4,4)
elif e33 > efficiency:
return (3,3)
elif e22 > efficiency:
return (2,2)
else:
return (1,1)
def count_blocks(A,blocksize):
"""For a given blocksize=(r,c) count the number of occupied
blocks in a sparse matrix A
"""
r,c = blocksize
if r < 1 or c < 1:
raise ValueError('r and c must be positive')
if issparse(A):
if A.format == "csr":
M,N = A.shape
return csr_count_blocks(M,N,r,c,A.indptr,A.indices)
elif A.format == "csc":
return count_blocks(A.T,(c,r))
return count_blocks(csr_array(A),blocksize)
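
The efficiency figures used by ``estimate_blocksize`` (for example ``e22 = nnz / (4 * count_blocks(A, (2, 2)))``) can be reproduced with a pure-Python sketch that works directly on CSR ``indptr`` and ``indices`` lists. The helper names ``count_occupied_blocks`` and ``block_efficiency`` are hypothetical; SciPy itself calls the compiled ``csr_count_blocks`` routine, whose internals are not shown here.

```python
def count_occupied_blocks(indptr, indices, blocksize):
    """Count the r-by-c blocks that contain at least one stored entry."""
    r, c = blocksize
    if r < 1 or c < 1:
        raise ValueError("r and c must be positive")
    occupied = set()
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            # Map entry (i, indices[k]) to the block that contains it.
            occupied.add((i // r, indices[k] // c))
    return len(occupied)


def block_efficiency(indptr, indices, blocksize):
    """Fraction of the slots in the occupied blocks that hold a stored entry."""
    r, c = blocksize
    nnz = indptr[-1]
    return nnz / (r * c * count_occupied_blocks(indptr, indices, blocksize))


# 4x4 matrix with two dense 2x2 blocks on the diagonal, in CSR form.
indptr = [0, 2, 4, 6, 8]
indices = [0, 1, 0, 1, 2, 3, 2, 3]

e22 = block_efficiency(indptr, indices, (2, 2))  # both blocks are full
e44 = block_efficiency(indptr, indices, (4, 4))  # one block, half empty
```

With the default ``efficiency=0.7``, a 2x2 blocking of this matrix passes the threshold while a 4x4 blocking does not, which matches the intent of the cascade of comparisons above.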

@ -0,0 +1,632 @@
""" Utility functions for sparse matrix module
"""
import sys
from typing import Any, Literal, Union
import operator
import numpy as np
from math import prod
import scipy.sparse as sp
from scipy._lib._util import np_long, np_ulong
__all__ = ['upcast', 'getdtype', 'getdata', 'isscalarlike', 'isintlike',
'isshape', 'issequence', 'isdense', 'ismatrix', 'get_sum_dtype',
'broadcast_shapes']
supported_dtypes = [np.bool_, np.byte, np.ubyte, np.short, np.ushort, np.intc,
np.uintc, np_long, np_ulong, np.longlong, np.ulonglong,
np.float32, np.float64, np.longdouble,
np.complex64, np.complex128, np.clongdouble]
_upcast_memo = {}
def upcast(*args):
"""Returns the nearest supported sparse dtype for the
combination of one or more types.
upcast(t0, t1, ..., tn) -> T where T is a supported dtype
Examples
--------
>>> from scipy.sparse._sputils import upcast
>>> upcast('int32')
<class 'numpy.int32'>
>>> upcast('bool')
<class 'numpy.bool'>
>>> upcast('int32','float32')
<class 'numpy.float64'>
>>> upcast('bool',complex,float)
<class 'numpy.complex128'>
"""
t = _upcast_memo.get(hash(args))
if t is not None:
return t
upcast = np.result_type(*args)
for t in supported_dtypes:
if np.can_cast(upcast, t):
_upcast_memo[hash(args)] = t
return t
raise TypeError(f'no supported conversion for types: {args!r}')
def upcast_char(*args):
"""Same as `upcast` but taking dtype.char as input (faster)."""
t = _upcast_memo.get(args)
if t is not None:
return t
t = upcast(*map(np.dtype, args))
_upcast_memo[args] = t
return t
def upcast_scalar(dtype, scalar):
"""Determine data type for binary operation between an array of
type `dtype` and a scalar.
"""
return (np.array([0], dtype=dtype) * scalar).dtype
def downcast_intp_index(arr):
"""
Down-cast index array to np.intp dtype if it is of a larger dtype.
Raise an error if the array contains a value that is too large for
intp.
"""
if arr.dtype.itemsize > np.dtype(np.intp).itemsize:
if arr.size == 0:
return arr.astype(np.intp)
maxval = arr.max()
minval = arr.min()
if maxval > np.iinfo(np.intp).max or minval < np.iinfo(np.intp).min:
raise ValueError("Cannot deal with arrays with indices larger "
"than the machine maximum address size "
"(e.g. 64-bit indices on 32-bit machine).")
return arr.astype(np.intp)
return arr
def to_native(A):
"""
Ensure that the data type of the NumPy array `A` has native byte order.
`A` must be a NumPy array. If the data type of `A` does not have native
byte order, a copy of `A` with a native byte order is returned. Otherwise
`A` is returned.
"""
dt = A.dtype
if dt.isnative:
# Don't call `asarray()` if A is already native, to avoid unnecessarily
# creating a view of the input array.
return A
return np.asarray(A, dtype=dt.newbyteorder('native'))
def getdtype(dtype, a=None, default=None):
"""Form a supported numpy dtype based on input arguments.
Returns a valid ``numpy.dtype`` from `dtype` if not None,
or else ``a.dtype`` if possible, or else the given `default`
if not None, or else raise a ``TypeError``.
The resulting ``dtype`` must be in ``supported_dtypes``:
bool_, int8, uint8, int16, uint16, int32, uint32,
int64, uint64, longlong, ulonglong, float32, float64,
longdouble, complex64, complex128, clongdouble
"""
if dtype is None:
try:
newdtype = a.dtype
except AttributeError as e:
if default is not None:
newdtype = np.dtype(default)
else:
raise TypeError("could not interpret data type") from e
else:
newdtype = np.dtype(dtype)
if newdtype not in supported_dtypes:
supported_dtypes_fmt = ", ".join(t.__name__ for t in supported_dtypes)
raise ValueError(f"scipy.sparse does not support dtype {newdtype}. "
f"The only supported types are: {supported_dtypes_fmt}.")
return newdtype
def getdata(obj, dtype=None, copy=False) -> np.ndarray:
"""
This is a wrapper of `np.array(obj, dtype=dtype, copy=copy)`
that raises a ``ValueError`` if the result has an unsupported dtype
(for example, an object array).
"""
data = np.array(obj, dtype=dtype, copy=copy)
# Defer to getdtype for checking that the dtype is OK.
# This is called for the validation only; we don't need the return value.
getdtype(data.dtype)
return data
def safely_cast_index_arrays(A, idx_dtype=np.int32, msg=""):
"""Safely cast sparse array indices to `idx_dtype`.
Check the shape of `A` to determine if it is safe to cast its index
arrays to dtype `idx_dtype`. If any dimension in shape is larger than
fits in the dtype, casting is unsafe so raise ``ValueError``.
If safe, cast the index arrays to `idx_dtype` and return the result
without changing the input `A`. The caller can assign results to `A`
attributes if desired or use the recast index arrays directly.
Unless downcasting is needed, the original index arrays are returned.
You can test e.g. ``A.indptr is new_indptr`` to see if downcasting occurred.
.. versionadded:: 1.15.0
Parameters
----------
A : sparse array or matrix
The array for which index arrays should be downcast.
idx_dtype : dtype
Desired dtype. Should be an integer dtype (default: ``np.int32``).
Most of scipy.sparse uses either int64 or int32.
msg : string, optional
A string to be added to the end of the ValueError message
if the array shape is too big to fit in `idx_dtype`.
The error message is ``f"<index> values too large for {msg}"``
It should indicate why the downcasting is needed, e.g. "SuperLU",
and defaults to f"dtype {idx_dtype}".
Returns
-------
idx_arrays : ndarray or tuple of ndarrays
Based on ``A.format``, index arrays are returned after casting to `idx_dtype`.
For CSC/CSR, returns ``(indices, indptr)``.
For COO, returns ``coords``.
For DIA, returns ``offsets``.
For BSR, returns ``(indices, indptr)``.
Raises
------
ValueError
If the array has shape that would not fit in the new dtype.
TypeError
If the sparse format does not use index arrays (DOK and LIL).
Examples
--------
>>> import numpy as np
>>> from scipy import sparse
>>> data = [3]
>>> coords = (np.array([3]), np.array([1])) # Note: int64 arrays
>>> A = sparse.coo_array((data, coords))
>>> A.coords[0].dtype
dtype('int64')
>>> # recast after construction, raising exception if shape too big
>>> coords = sparse.safely_cast_index_arrays(A, np.int32)
>>> A.coords[0] is coords[0] # False if casting is needed
False
>>> A.coords = coords # set the index dtype of A
>>> A.coords[0].dtype
dtype('int32')
"""
if not msg:
msg = f"dtype {idx_dtype}"
# check for safe downcasting
max_value = np.iinfo(idx_dtype).max
if A.format in ("csc", "csr"):
# indptr[-1] is max b/c indptr always sorted
if A.indptr[-1] > max_value:
raise ValueError(f"indptr values too large for {msg}")
# check shape vs dtype
if max(*A.shape) > max_value:
if (A.indices > max_value).any():
raise ValueError(f"indices values too large for {msg}")
indices = A.indices.astype(idx_dtype, copy=False)
indptr = A.indptr.astype(idx_dtype, copy=False)
return indices, indptr
elif A.format == "coo":
if max(*A.shape) > max_value:
if any((co > max_value).any() for co in A.coords):
raise ValueError(f"coords values too large for {msg}")
return tuple(co.astype(idx_dtype, copy=False) for co in A.coords)
elif A.format == "dia":
if max(*A.shape) > max_value:
if (A.offsets > max_value).any():
raise ValueError(f"offsets values too large for {msg}")
offsets = A.offsets.astype(idx_dtype, copy=False)
return offsets
elif A.format == 'bsr':
R, C = A.blocksize
if A.indptr[-1] * R > max_value:
raise ValueError(f"indptr values too large for {msg}")
if max(*A.shape) > max_value:
if (A.indices * C > max_value).any():
raise ValueError(f"indices values too large for {msg}")
indices = A.indices.astype(idx_dtype, copy=False)
indptr = A.indptr.astype(idx_dtype, copy=False)
return indices, indptr
else:
raise TypeError(f'Format {A.format} is not associated with index arrays. '
'DOK and LIL have dict and list, not array.')
def get_index_dtype(arrays=(), maxval=None, check_contents=False):
"""
Based on the input (integer) `arrays`, determine a suitable index data
type that can hold the data in the arrays.
Parameters
----------
arrays : tuple of array_like
Input arrays whose types/contents to check
maxval : float, optional
Maximum value needed
check_contents : bool, optional
Whether to check the values in the arrays and not just their types.
Default: False (check only the types)
Returns
-------
dtype : dtype
Suitable index data type (int32 or int64)
Examples
--------
>>> import numpy as np
>>> from scipy import sparse
>>> # select index dtype based on shape
>>> shape = (3, 3)
>>> idx_dtype = sparse.get_index_dtype(maxval=max(shape))
>>> data = [1.1, 3.0, 1.5]
>>> indices = np.array([0, 1, 0], dtype=idx_dtype)
>>> indptr = np.array([0, 2, 3, 3], dtype=idx_dtype)
>>> A = sparse.csr_array((data, indices, indptr), shape=shape)
>>> A.indptr.dtype
dtype('int32')
>>> # select based on larger of existing arrays and shape
>>> shape = (3, 3)
>>> idx_dtype = sparse.get_index_dtype(A.indptr, maxval=max(shape))
>>> idx_dtype
<class 'numpy.int32'>
"""
# not using intc directly due to misinteractions with pythran
if np.intc().itemsize != 4:
return np.int64
int32min = np.int32(np.iinfo(np.int32).min)
int32max = np.int32(np.iinfo(np.int32).max)
if maxval is not None:
maxval = np.int64(maxval)
if maxval > int32max:
return np.int64
if isinstance(arrays, np.ndarray):
arrays = (arrays,)
for arr in arrays:
arr = np.asarray(arr)
if not np.can_cast(arr.dtype, np.int32):
if check_contents:
if arr.size == 0:
# a bigger type not needed
continue
elif np.issubdtype(arr.dtype, np.integer):
maxval = arr.max()
minval = arr.min()
if minval >= int32min and maxval <= int32max:
# a bigger type not needed
continue
return np.int64
return np.int32
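# A short usage sketch of the three ways the result can be driven: by array
# dtype, by array contents, and by `maxval`. The fallback import is an
# assumption for older SciPy releases that do not export the function from
# `scipy.sparse`; the sketch also assumes a platform where `np.intc` is 4 bytes.

```python
import numpy as np

try:
    from scipy.sparse import get_index_dtype
except ImportError:
    # Assumption: older releases expose the helper only in the private module.
    from scipy.sparse._sputils import get_index_dtype

wide = np.array([0, 1, 2], dtype=np.int64)

# Only the dtype is inspected: int64 input cannot be cast safely to int32.
by_type = np.dtype(get_index_dtype((wide,)))

# The values are inspected: they all fit into int32.
by_contents = np.dtype(get_index_dtype((wide,), check_contents=True))

# A required maximum value above the int32 range forces int64.
by_maxval = np.dtype(get_index_dtype(maxval=2**31))
```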
def get_sum_dtype(dtype: np.dtype) -> np.dtype:
"""Mimic numpy's casting for np.sum"""
if dtype.kind == 'u' and np.can_cast(dtype, np.uint):
return np.uint
if np.can_cast(dtype, np.int_):
return np.int_
return dtype
def isscalarlike(x) -> bool:
"""Is x either a scalar, an array scalar, or a 0-dim array?"""
return np.isscalar(x) or (isdense(x) and x.ndim == 0)
def isintlike(x) -> bool:
"""Is x appropriate as an index into a sparse matrix? Returns True
if it can be cast safely to a machine int.
"""
# Fast-path check to eliminate non-scalar values. operator.index would
# catch this case too, but the exception catching is slow.
if np.ndim(x) != 0:
return False
try:
operator.index(x)
except (TypeError, ValueError):
try:
loose_int = bool(int(x) == x)
except (TypeError, ValueError):
return False
if loose_int:
msg = "Inexact indices into sparse matrices are not allowed"
raise ValueError(msg)
return loose_int
return True
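# The exact-index test above relies on `operator.index`, which accepts only
# objects implementing `__index__`. A minimal sketch of that first stage alone
# (`is_exact_index` is a hypothetical name; the loose-integer check that raises
# for values such as 3.0 is deliberately left out):

```python
import operator

import numpy as np


def is_exact_index(x):
    """Hypothetical helper: True only for 0-d values that implement __index__."""
    if np.ndim(x) != 0:
        # Lists and n-d arrays are rejected before any conversion is tried.
        return False
    try:
        operator.index(x)
    except TypeError:
        return False
    return True


python_int = is_exact_index(3)
numpy_int = is_exact_index(np.int64(3))
python_float = is_exact_index(3.0)
a_list = is_exact_index([1, 2])
```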
def isshape(x, nonneg=False, *, allow_nd=(2,)) -> bool:
"""Is x a valid tuple of dimensions?
If nonneg, also checks that the dimensions are non-negative.
Shapes of length in the tuple allow_nd are allowed.
"""
ndim = len(x)
if ndim not in allow_nd:
return False
for d in x:
if not isintlike(d):
return False
if nonneg and d < 0:
return False
return True
def issequence(t) -> bool:
return ((isinstance(t, list | tuple) and
(len(t) == 0 or np.isscalar(t[0]))) or
(isinstance(t, np.ndarray) and (t.ndim == 1)))
def ismatrix(t) -> bool:
return ((isinstance(t, list | tuple) and
len(t) > 0 and issequence(t[0])) or
(isinstance(t, np.ndarray) and t.ndim == 2))
def isdense(x) -> bool:
return isinstance(x, np.ndarray)
def validateaxis(axis, *, ndim=2) -> tuple[int, ...] | None:
if axis is None:
return None
if axis == ():
raise ValueError(
"sparse does not accept 0D axis (). Either use toarray (for dense) "
"or copy (for sparse)."
)
if not isinstance(axis, tuple):
# If not a tuple, check that the provided axis is actually
# an integer and raise a TypeError similar to NumPy's
if not np.issubdtype(np.dtype(type(axis)), np.integer):
raise TypeError(f'axis must be an integer/tuple of ints, not {type(axis)}')
axis = (axis,)
canon_axis = []
for ax in axis:
if not isintlike(ax):
raise TypeError(f"axis must be an integer. (given {ax})")
if ax < 0:
ax += ndim
if ax < 0 or ax >= ndim:
raise ValueError("axis out of range for ndim")
canon_axis.append(ax)
len_axis = len(canon_axis)
if len_axis != len(set(canon_axis)):
raise ValueError("duplicate value in axis")
elif len_axis > ndim:
raise ValueError("axis tuple has too many elements")
elif len_axis == ndim:
return None
else:
return tuple(canon_axis)
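# The canonicalization performed above (wrap negative axes, reject duplicates,
# collapse "all axes" to None) can be sketched in pure Python. `canonical_axes`
# is a hypothetical name, and the type checks of the real function are omitted.

```python
def canonical_axes(axis, ndim=2):
    """Hypothetical sketch: normalize `axis` to a tuple of non-negative ints."""
    if axis is None:
        return None
    axes = axis if isinstance(axis, tuple) else (axis,)
    result = []
    for ax in axes:
        if ax < 0:
            ax += ndim  # wrap negative axes, as NumPy does
        if not 0 <= ax < ndim:
            raise ValueError("axis out of range for ndim")
        result.append(ax)
    if len(result) != len(set(result)):
        raise ValueError("duplicate value in axis")
    # Reducing over every axis is the same as axis=None.
    return None if len(result) == ndim else tuple(result)


last_axis = canonical_axes(-1)
first_axis = canonical_axes((0,))
all_axes = canonical_axes((0, 1))
```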
def check_shape(args, current_shape=None, *, allow_nd=(2,)) -> tuple[int, ...]:
"""Imitate numpy.matrix handling of shape arguments
Parameters
----------
args : array_like
Data structures providing information about the shape of the sparse array.
current_shape : tuple, optional
The current shape of the sparse array or matrix.
If None (default), the current shape will be inferred from args.
allow_nd : tuple of ints, optional default: (2,)
If shape does not have a length in the tuple allow_nd an error is raised.
Returns
-------
new_shape: tuple
The new shape after validation.
"""
if len(args) == 0:
raise TypeError("function missing 1 required positional argument: 'shape'")
if len(args) == 1:
try:
shape_iter = iter(args[0])
except TypeError:
new_shape = (operator.index(args[0]), )
else:
new_shape = tuple(operator.index(arg) for arg in shape_iter)
else:
new_shape = tuple(operator.index(arg) for arg in args)
if current_shape is None:
if len(new_shape) not in allow_nd:
raise ValueError(f'shape must have length in {allow_nd}. Got {new_shape=}')
if any(d < 0 for d in new_shape):
raise ValueError("'shape' elements cannot be negative")
else:
# Check the current size only if needed
current_size = prod(current_shape)
# Check for negatives
negative_indexes = [i for i, x in enumerate(new_shape) if x < 0]
if not negative_indexes:
new_size = prod(new_shape)
if new_size != current_size:
raise ValueError(f'cannot reshape array of size {current_size}'
f' into shape {new_shape}')
elif len(negative_indexes) == 1:
skip = negative_indexes[0]
specified = prod(new_shape[:skip] + new_shape[skip+1:])
unspecified, remainder = divmod(current_size, specified)
if remainder != 0:
err_shape = tuple('newshape' if x < 0 else x for x in new_shape)
raise ValueError(f'cannot reshape array of size {current_size}'
f' into shape {err_shape}')
new_shape = new_shape[:skip] + (unspecified,) + new_shape[skip+1:]
else:
raise ValueError('can only specify one unknown dimension')
if len(new_shape) not in allow_nd:
raise ValueError(f'shape must have length in {allow_nd}. Got {new_shape=}')
return new_shape
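# The handling of a single negative ("unknown") dimension above boils down to
# one `divmod`. A minimal pure-Python sketch, with `infer_shape` as a
# hypothetical name and without the `allow_nd` validation:

```python
from math import prod


def infer_shape(new_shape, current_size):
    """Hypothetical sketch: fill in at most one negative dimension."""
    unknown = [i for i, d in enumerate(new_shape) if d < 0]
    if len(unknown) > 1:
        raise ValueError("can only specify one unknown dimension")
    if not unknown:
        if prod(new_shape) != current_size:
            raise ValueError("size mismatch")
        return tuple(new_shape)
    skip = unknown[0]
    # Product of the dimensions that were given explicitly.
    specified = prod(new_shape[:skip] + new_shape[skip + 1:])
    inferred, remainder = divmod(current_size, specified)
    if remainder != 0:
        raise ValueError("size mismatch")
    return new_shape[:skip] + (inferred,) + new_shape[skip + 1:]


two_d = infer_shape((3, -1), 12)
one_d = infer_shape((-1,), 12)
```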
def broadcast_shapes(*shapes):
"""Check if shapes can be broadcast and return resulting shape
This is similar to the NumPy ``broadcast_shapes`` function but
does not check memory consequences of the resulting dense matrix.
Parameters
----------
*shapes : tuple of shape tuples
The tuple of shapes to be considered for broadcasting.
Shapes should be tuples of non-negative integers.
Returns
-------
new_shape : tuple of integers
The shape that results from broadcasting the input shapes.
"""
if not shapes:
return ()
shapes = [shp if isinstance(shp, tuple | list) else (shp,) for shp in shapes]
big_shp = max(shapes, key=len)
out = list(big_shp)
for shp in shapes:
if shp is big_shp:
continue
for i, x in enumerate(shp, start=-len(shp)):
if x != 1 and x != out[i]:
if out[i] != 1:
raise ValueError("shapes cannot be broadcast to a single shape.")
out[i] = x
return (*out,)
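# For comparison, the broadcasting rule that the helper above is described as
# mirroring can be seen with NumPy's own `np.broadcast_shapes` (assumed
# available, i.e. NumPy 1.20 or later). Trailing dimensions are aligned, a
# dimension of 1 stretches, and any other mismatch is an error.

```python
import numpy as np

# A column against a row stretches both dimensions.
row_col = np.broadcast_shapes((3, 1), (1, 4))

# A shorter shape is aligned with the trailing dimensions of the longer one.
padded = np.broadcast_shapes((5,), (2, 5))

try:
    np.broadcast_shapes((3,), (4,))
    incompatible_raises = False
except ValueError:
    incompatible_raises = True
```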
def check_reshape_kwargs(kwargs):
"""Unpack keyword arguments for reshape function.
This is useful because keyword arguments after star arguments are not
allowed in Python 2, but star keyword arguments are. This function unpacks
'order' and 'copy' from the star keyword arguments (with defaults) and
throws an error for any remaining.
"""
order = kwargs.pop('order', 'C')
copy = kwargs.pop('copy', False)
if kwargs: # Some unused kwargs remain
raise TypeError("reshape() got unexpected keyword arguments: "
f"{', '.join(kwargs.keys())}")
return order, copy
def is_pydata_spmatrix(m) -> bool:
"""
Check whether the object is a pydata/sparse array, without importing the module.
"""
base_cls = getattr(sys.modules.get('sparse'), 'SparseArray', None)
return base_cls is not None and isinstance(m, base_cls)
def convert_pydata_sparse_to_scipy(
arg: Any,
target_format: None | Literal["csc", "csr"] = None,
accept_fv: Any = None,
) -> Union[Any, "sp.spmatrix"]:
"""
Convert a pydata/sparse array to scipy sparse matrix,
pass through anything else.
"""
if is_pydata_spmatrix(arg):
# The `accept_fv` keyword is new in PyData Sparse 0.15.4 (May 2024),
# remove the `except` once the minimum supported version is >=0.15.4
try:
arg = arg.to_scipy_sparse(accept_fv=accept_fv)
except TypeError:
arg = arg.to_scipy_sparse()
if target_format is not None:
arg = arg.asformat(target_format)
elif arg.format not in ("csc", "csr"):
arg = arg.tocsc()
return arg
###############################################################################
# Wrappers for NumPy types that are deprecated
# Numpy versions of these functions raise deprecation warnings, the
# ones below do not.
def matrix(*args, **kwargs):
return np.array(*args, **kwargs).view(np.matrix)
def asmatrix(data, dtype=None):
if isinstance(data, np.matrix) and (dtype is None or data.dtype == dtype):
return data
return np.asarray(data, dtype=dtype).view(np.matrix)
###############################################################################
def _todata(s) -> np.ndarray:
"""Access nonzero values, possibly after summing duplicates.
Parameters
----------
s : sparse array
Input sparse array.
Returns
-------
data: ndarray
Nonzero values of the array, with shape (s.nnz,)
"""
if isinstance(s, sp._data._data_matrix):
return s._deduped_data()
if isinstance(s, sp.dok_array):
return np.fromiter(s.values(), dtype=s.dtype, count=s.nnz)
if isinstance(s, sp.lil_array):
data = np.empty(s.nnz, dtype=s.dtype)
sp._csparsetools.lil_flatten_to_array(s.data, data)
return data
return s.tocoo()._deduped_data()

@ -0,0 +1,24 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.
from scipy._lib.deprecation import _sub_module_deprecation
__all__ = [ # noqa: F822
'SparseEfficiencyWarning',
'SparseWarning',
'issparse',
'isspmatrix',
'spmatrix',
]
def __dir__():
return __all__
def __getattr__(name):
return _sub_module_deprecation(sub_package="sparse", module="base",
private_modules=["_base"], all=__all__,
attribute=name)
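# The stub modules in this commit all follow the same pattern: a module-level
# `__getattr__` (PEP 562) that forwards a fixed list of names and warns on
# access. The sketch below illustrates the mechanism with the standard library
# only. It is not `_sub_module_deprecation`; `make_deprecated_alias` and
# `old_math` are hypothetical names.

```python
import math
import types
import warnings


def make_deprecated_alias(name, target, exported):
    """Hypothetical shim: forward `exported` names to `target` with a warning."""
    shim = types.ModuleType(name)

    def __getattr__(attr):
        if attr not in exported:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        warnings.warn(
            f"{name}.{attr} is deprecated; use {target.__name__}.{attr} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(target, attr)

    # PEP 562: a module-level __getattr__ is consulted for missing attributes.
    shim.__getattr__ = __getattr__
    shim.__dir__ = lambda: sorted(exported)
    return shim


old_math = make_deprecated_alias("old_math", math, {"sqrt"})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = old_math.sqrt(9.0)
```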

@ -0,0 +1,22 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.
from scipy._lib.deprecation import _sub_module_deprecation
__all__ = [ # noqa: F822
'bsr_matrix',
'isspmatrix_bsr',
'spmatrix',
]
def __dir__():
return __all__
def __getattr__(name):
return _sub_module_deprecation(sub_package="sparse", module="bsr",
private_modules=["_bsr"], all=__all__,
attribute=name)

@ -0,0 +1,20 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.
from scipy._lib.deprecation import _sub_module_deprecation
__all__ = [ # noqa: F822
'SparseEfficiencyWarning',
]
def __dir__():
return __all__
def __getattr__(name):
return _sub_module_deprecation(sub_package="sparse", module="compressed",
private_modules=["_compressed"], all=__all__,
attribute=name)

@ -0,0 +1,38 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.
from scipy._lib.deprecation import _sub_module_deprecation
__all__ = [ # noqa: F822
'block_diag',
'bmat',
'bsr_matrix',
'coo_matrix',
'csc_matrix',
'csr_matrix',
'dia_matrix',
'diags',
'eye',
'get_index_dtype',
'hstack',
'identity',
'issparse',
'kron',
'kronsum',
'rand',
'random',
'spdiags',
'vstack',
]
def __dir__():
return __all__
def __getattr__(name):
return _sub_module_deprecation(sub_package="sparse", module="construct",
private_modules=["_construct"], all=__all__,
attribute=name)

@ -0,0 +1,23 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.
from scipy._lib.deprecation import _sub_module_deprecation
__all__ = [ # noqa: F822
'SparseEfficiencyWarning',
'coo_matrix',
'isspmatrix_coo',
'spmatrix',
]
def __dir__():
return __all__
def __getattr__(name):
return _sub_module_deprecation(sub_package="sparse", module="coo",
private_modules=["_coo"], all=__all__,
attribute=name)

@ -0,0 +1,22 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.
from scipy._lib.deprecation import _sub_module_deprecation
__all__ = [ # noqa: F822
'csc_matrix',
'isspmatrix_csc',
'spmatrix',
]
def __dir__():
return __all__
def __getattr__(name):
return _sub_module_deprecation(sub_package="sparse", module="csc",
private_modules=["_csc"], all=__all__,
attribute=name)

@ -0,0 +1,210 @@
r"""
Compressed sparse graph routines (:mod:`scipy.sparse.csgraph`)
==============================================================
.. currentmodule:: scipy.sparse.csgraph
Fast graph algorithms based on sparse matrix representations.
Contents
--------
.. autosummary::
:toctree: generated/
connected_components -- determine connected components of a graph
laplacian -- compute the laplacian of a graph
shortest_path -- compute the shortest path between points on a positive graph
dijkstra -- use Dijkstra's algorithm for shortest path
floyd_warshall -- use the Floyd-Warshall algorithm for shortest path
bellman_ford -- use the Bellman-Ford algorithm for shortest path
johnson -- use Johnson's algorithm for shortest path
yen -- use Yen's algorithm for K-shortest paths between two nodes
breadth_first_order -- compute a breadth-first order of nodes
depth_first_order -- compute a depth-first order of nodes
breadth_first_tree -- construct the breadth-first tree from a given node
depth_first_tree -- construct a depth-first tree from a given node
minimum_spanning_tree -- construct the minimum spanning tree of a graph
reverse_cuthill_mckee -- compute permutation for reverse Cuthill-McKee ordering
maximum_flow -- solve the maximum flow problem for a graph
maximum_bipartite_matching -- compute a maximum matching of a bipartite graph
min_weight_full_bipartite_matching -- compute a minimum weight full matching of a bipartite graph
structural_rank -- compute the structural rank of a graph
NegativeCycleError
.. autosummary::
:toctree: generated/
construct_dist_matrix
csgraph_from_dense
csgraph_from_masked
csgraph_masked_from_dense
csgraph_to_dense
csgraph_to_masked
reconstruct_path
Graph Representations
---------------------
This module uses graphs which are stored in a matrix format. A
graph with N nodes can be represented by an (N x N) adjacency matrix G.
If there is a connection from node i to node j, then G[i, j] = w, where
w is the weight of the connection. For nodes i and j which are
not connected, the value depends on the representation:
- for dense array representations, non-edges are represented by
G[i, j] = 0, infinity, or NaN.
- for dense masked representations (of type np.ma.MaskedArray), non-edges
are represented by masked values. This can be useful when graphs with
zero-weight edges are desired.
- for sparse array representations, non-edges are represented by
non-entries in the matrix. This sort of sparse representation also
allows for edges with zero weights.
As a concrete example, imagine that you would like to represent the following
undirected graph::
G
(0)
/ \
1 2
/ \
(2) (1)
This graph has three nodes, where node 0 and 1 are connected by an edge of
weight 2, and nodes 0 and 2 are connected by an edge of weight 1.
We can construct the dense, masked, and sparse representations as follows,
keeping in mind that an undirected graph is represented by a symmetric matrix::
>>> import numpy as np
>>> G_dense = np.array([[0, 2, 1],
... [2, 0, 0],
... [1, 0, 0]])
>>> G_masked = np.ma.masked_values(G_dense, 0)
>>> from scipy.sparse import csr_array
>>> G_sparse = csr_array(G_dense)
This becomes more difficult when zero edges are significant. For example,
consider the situation when we slightly modify the above graph::
G2
(0)
/ \
0 2
/ \
(2) (1)
This is identical to the previous graph, except nodes 0 and 2 are connected
by an edge of zero weight. In this case, the dense representation above
leads to ambiguities: how can non-edges be represented if zero is a meaningful
value? In this case, either a masked or sparse representation must be used
to eliminate the ambiguity::
>>> import numpy as np
>>> G2_data = np.array([[np.inf, 2, 0 ],
... [2, np.inf, np.inf],
... [0, np.inf, np.inf]])
>>> G2_masked = np.ma.masked_invalid(G2_data)
>>> from scipy.sparse.csgraph import csgraph_from_dense
>>> # G2_sparse = csr_array(G2_data) would give the wrong result
>>> G2_sparse = csgraph_from_dense(G2_data, null_value=np.inf)
>>> G2_sparse.data
array([ 2., 0., 2., 0.])
Here we have used a utility routine from the csgraph submodule in order to
convert the dense representation to a sparse representation which can be
understood by the algorithms in the submodule. By viewing the data array, we
can see that the zero values are explicitly encoded in the graph.
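The difference can also be checked by counting stored entries. The sketch
below reuses ``G2_data`` from above and assumes ``csr_array`` is available in
the installed SciPy; ``G2_zero_filled`` and ``naive`` are names introduced
only for this illustration.

```python
import numpy as np
from scipy.sparse import csr_array
from scipy.sparse.csgraph import csgraph_from_dense

G2_data = np.array([[np.inf, 2, 0],
                    [2, np.inf, np.inf],
                    [0, np.inf, np.inf]])

# Non-edges are marked with infinity, so zero weights are kept as entries.
G2_sparse = csgraph_from_dense(G2_data, null_value=np.inf)

# Converting a zero-filled dense matrix directly drops the zero-weight edges.
G2_zero_filled = np.where(np.isinf(G2_data), 0, G2_data)
naive = csr_array(G2_zero_filled)

explicit_entries = G2_sparse.nnz
naive_entries = naive.nnz
```

Here ``explicit_entries`` counts all four directed edge entries, while
``naive_entries`` counts only the two entries of weight 2.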
Directed vs. undirected
^^^^^^^^^^^^^^^^^^^^^^^
Matrices may represent either directed or undirected graphs. This is
specified throughout the csgraph module by a boolean keyword. Graphs are
assumed to be directed by default. In a directed graph, traversal from node
i to node j can be accomplished over the edge G[i, j], but not the edge
G[j, i]. Consider the following dense graph::
>>> import numpy as np
>>> G_dense = np.array([[0, 1, 0],
... [2, 0, 3],
... [0, 4, 0]])
When ``directed=True`` we get the graph::
---1--> ---3-->
(0) (1) (2)
<--2--- <--4---
In a non-directed graph, traversal from node i to node j can be
accomplished over either G[i, j] or G[j, i]. If both edges are not null,
and the two have unequal weights, then the smaller of the two is used.
So for the same graph, when ``directed=False`` we get the graph::
(0)--1--(1)--3--(2)
Note that a symmetric matrix will represent an undirected graph, regardless
of whether the 'directed' keyword is set to True or False. In this case,
using ``directed=True`` generally leads to more efficient computation.
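The effect of the keyword can be seen on shortest-path distances. This is a
small sketch using ``shortest_path`` from this module on the dense graph
above; the variable names are introduced only for the illustration.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

G_dense = np.array([[0, 1, 0],
                    [2, 0, 3],
                    [0, 4, 0]])

dist_directed = shortest_path(G_dense, directed=True)
dist_undirected = shortest_path(G_dense, directed=False)

# Directed: node 1 reaches node 0 only over G[1, 0] = 2.
one_to_zero_directed = dist_directed[1, 0]

# Undirected: the smaller of G[0, 1] = 1 and G[1, 0] = 2 is used.
one_to_zero_undirected = dist_undirected[1, 0]
```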
The routines in this module accept as input either scipy.sparse representations
(csr, csc, or lil format), masked representations, or dense representations
with non-edges indicated by zeros, infinities, or NaN entries.
""" # noqa: E501
__docformat__ = "restructuredtext en"
__all__ = ['connected_components',
'laplacian',
'shortest_path',
'floyd_warshall',
'dijkstra',
'bellman_ford',
'johnson',
'yen',
'breadth_first_order',
'depth_first_order',
'breadth_first_tree',
'depth_first_tree',
'minimum_spanning_tree',
'reverse_cuthill_mckee',
'maximum_flow',
'maximum_bipartite_matching',
'min_weight_full_bipartite_matching',
'structural_rank',
'construct_dist_matrix',
'reconstruct_path',
'csgraph_masked_from_dense',
'csgraph_from_dense',
'csgraph_from_masked',
'csgraph_to_dense',
'csgraph_to_masked',
'NegativeCycleError']
from ._laplacian import laplacian
from ._shortest_path import (
shortest_path, floyd_warshall, dijkstra, bellman_ford, johnson, yen,
NegativeCycleError
)
from ._traversal import (
breadth_first_order, depth_first_order, breadth_first_tree,
depth_first_tree, connected_components
)
from ._min_spanning_tree import minimum_spanning_tree
from ._flow import maximum_flow
from ._matching import (
maximum_bipartite_matching, min_weight_full_bipartite_matching
)
from ._reordering import reverse_cuthill_mckee, structural_rank
from ._tools import (
construct_dist_matrix, reconstruct_path, csgraph_from_dense,
csgraph_to_dense, csgraph_masked_from_dense, csgraph_from_masked,
csgraph_to_masked
)
from scipy._lib._testutils import PytestTester
test = PytestTester(__name__)
del PytestTester

@ -0,0 +1,563 @@
"""
Laplacian of a compressed-sparse graph
"""
import numpy as np
from scipy.sparse import issparse
from scipy.sparse.linalg import LinearOperator
from scipy.sparse._sputils import convert_pydata_sparse_to_scipy, is_pydata_spmatrix
###############################################################################
# Graph laplacian
def laplacian(
csgraph,
normed=False,
return_diag=False,
use_out_degree=False,
*,
copy=True,
form="array",
dtype=None,
symmetrized=False,
):
"""
Return the Laplacian of a directed graph.
Parameters
----------
csgraph : array_like or sparse array or matrix, 2 dimensions
compressed-sparse graph, with shape (N, N).
normed : bool, optional
If True, then compute symmetrically normalized Laplacian.
Default: False.
return_diag : bool, optional
If True, then also return an array related to vertex degrees.
Default: False.
use_out_degree : bool, optional
If True, then use out-degree instead of in-degree.
This distinction matters only if the graph is asymmetric.
Default: False.
copy: bool, optional
If False, then change `csgraph` in place if possible,
avoiding doubling the memory use.
Default: True, for backward compatibility.
form: 'array', or 'function', or 'lo'
Determines the format of the output Laplacian:
* 'array' is a numpy array;
* 'function' is a callable that evaluates the Laplacian-vector
or Laplacian-matrix product;
* 'lo' results in the format of the `LinearOperator`.
Choosing 'function' or 'lo' always avoids doubling
the memory use, ignoring `copy` value.
Default: 'array', for backward compatibility.
dtype: None or one of numeric numpy dtypes, optional
The dtype of the output. If ``dtype=None``, the dtype of the
output matches the dtype of the input csgraph, except for
the case ``normed=True`` and integer-like csgraph, where
the output dtype is 'float' allowing accurate normalization,
but dramatically increasing the memory use.
Default: None, for backward compatibility.
symmetrized: bool, optional
If True, then the output Laplacian is symmetric/Hermitian.
The symmetrization is done by ``csgraph + csgraph.T.conj()``
without dividing by 2 to preserve integer dtypes if possible
prior to the construction of the Laplacian.
The symmetrization will increase the memory footprint of
sparse matrices unless the sparsity pattern is symmetric or
`form` is 'function' or 'lo'.
Default: False, for backward compatibility.
Returns
-------
lap : ndarray, or sparse array or matrix, or `LinearOperator`
The N x N Laplacian of csgraph. It will be a NumPy array (dense)
if the input was dense, or a sparse array otherwise, or
the format of a function or `LinearOperator` if
`form` equals 'function' or 'lo', respectively.
diag : ndarray, optional
The length-N main diagonal of the Laplacian matrix.
For the normalized Laplacian, this is the array of square roots
of vertex degrees or 1 if the degree is zero.
Notes
-----
The Laplacian matrix of a graph is sometimes referred to as the
"Kirchhoff matrix" or just the "Laplacian", and is useful in many
parts of spectral graph theory.
In particular, the eigen-decomposition of the Laplacian can give
insight into many properties of the graph, and it is commonly used
for spectral data embedding and clustering.
The constructed Laplacian doubles the memory use if ``copy=True`` and
``form="array"`` which is the default.
Choosing ``copy=False`` has no effect unless ``form="array"``
or the matrix is sparse in the ``coo`` format, or dense array, except
for the integer input with ``normed=True`` that forces the float output.
Sparse input is reformatted into ``coo`` if ``form="array"``,
which is the default.
If the input adjacency matrix is not symmetric, the Laplacian is
also non-symmetric unless ``symmetrized=True`` is used.
Diagonal entries of the input adjacency matrix are ignored and
replaced with zeros for the purpose of normalization where ``normed=True``.
The normalization uses the inverse square roots of row-sums of the input
adjacency matrix, and thus may fail if the row-sums contain
negative values or complex values with a non-zero imaginary part.
The normalization is symmetric, making the normalized Laplacian also
symmetric if the input csgraph was symmetric.
References
----------
.. [1] Laplacian matrix. https://en.wikipedia.org/wiki/Laplacian_matrix
Examples
--------
>>> import numpy as np
>>> from scipy.sparse import csgraph
Our first illustration is the symmetric graph
>>> G = np.arange(4) * np.arange(4)[:, np.newaxis]
>>> G
array([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6],
[0, 3, 6, 9]])
and its symmetric Laplacian matrix
>>> csgraph.laplacian(G)
array([[ 0, 0, 0, 0],
[ 0, 5, -2, -3],
[ 0, -2, 8, -6],
[ 0, -3, -6, 9]])
The non-symmetric graph
>>> G = np.arange(9).reshape(3, 3)
>>> G
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
has different row- and column sums, resulting in two varieties
of the Laplacian matrix, using an in-degree, which is the default
>>> L_in_degree = csgraph.laplacian(G)
>>> L_in_degree
array([[ 9, -1, -2],
[-3, 8, -5],
[-6, -7, 7]])
or alternatively an out-degree
>>> L_out_degree = csgraph.laplacian(G, use_out_degree=True)
>>> L_out_degree
array([[ 3, -1, -2],
[-3, 8, -5],
[-6, -7, 13]])
To construct a symmetric Laplacian matrix, one can add the two as
>>> L_in_degree + L_out_degree.T
array([[ 12, -4, -8],
[ -4, 16, -12],
[ -8, -12, 20]])
or use the ``symmetrized=True`` option
>>> csgraph.laplacian(G, symmetrized=True)
array([[ 12, -4, -8],
[ -4, 16, -12],
[ -8, -12, 20]])
that is equivalent to symmetrizing the original graph
>>> csgraph.laplacian(G + G.T)
array([[ 12, -4, -8],
[ -4, 16, -12],
[ -8, -12, 20]])
The goal of normalization is to make all non-zero diagonal entries
of the Laplacian matrix equal to one, scaling the off-diagonal
entries correspondingly. The normalization can be done manually, e.g.,
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> L, d = csgraph.laplacian(G, return_diag=True)
>>> L
array([[ 2, -1, -1],
[-1, 2, -1],
[-1, -1, 2]])
>>> d
array([2, 2, 2])
>>> scaling = np.sqrt(d)
>>> scaling
array([1.41421356, 1.41421356, 1.41421356])
>>> (1/scaling)*L*(1/scaling)
array([[ 1. , -0.5, -0.5],
[-0.5, 1. , -0.5],
[-0.5, -0.5, 1. ]])
Or use the ``normed=True`` option
>>> L, d = csgraph.laplacian(G, return_diag=True, normed=True)
>>> L
array([[ 1. , -0.5, -0.5],
[-0.5, 1. , -0.5],
[-0.5, -0.5, 1. ]])
which now instead of the diagonal returns the scaling coefficients
>>> d
array([1.41421356, 1.41421356, 1.41421356])
Zero scaling coefficients are substituted with 1s, where scaling
has thus no effect, e.g.,
>>> G = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])
>>> G
array([[0, 0, 0],
[0, 0, 1],
[0, 1, 0]])
>>> L, d = csgraph.laplacian(G, return_diag=True, normed=True)
>>> L
array([[ 0., -0., -0.],
[-0., 1., -1.],
[-0., -1., 1.]])
>>> d
array([1., 1., 1.])
Only the symmetric normalization is implemented, resulting
in a symmetric Laplacian matrix if and only if its graph is symmetric
and has all non-negative degrees, like in the examples above.
The output Laplacian matrix is by default a dense array or a sparse
array or matrix inferring its class, shape, format, and dtype from
the input graph matrix:
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]]).astype(np.float32)
>>> G
array([[0., 1., 1.],
[1., 0., 1.],
[1., 1., 0.]], dtype=float32)
>>> csgraph.laplacian(G)
array([[ 2., -1., -1.],
[-1., 2., -1.],
[-1., -1., 2.]], dtype=float32)
but can alternatively be generated matrix-free as a LinearOperator:
>>> L = csgraph.laplacian(G, form="lo")
>>> L
<3x3 _CustomLinearOperator with dtype=float32>
>>> L(np.eye(3))
array([[ 2., -1., -1.],
[-1., 2., -1.],
[-1., -1., 2.]])
or as a lambda-function:
>>> L = csgraph.laplacian(G, form="function")
>>> L
<function _laplace.<locals>.<lambda> at 0x0000012AE6F5A598>
>>> L(np.eye(3))
array([[ 2., -1., -1.],
[-1., 2., -1.],
[-1., -1., 2.]])
The Laplacian matrix is used for
spectral data clustering and embedding
as well as for spectral graph partitioning.
Our final example illustrates the latter
for a noisy directed linear graph.
>>> from scipy.sparse import diags_array, random_array
>>> from scipy.sparse.linalg import lobpcg
Create a directed linear graph with ``N=35`` vertices
using a sparse adjacency matrix ``G``:
>>> N = 35
>>> G = diags_array(np.ones(N - 1), offsets=1, format="csr")
Create a random generator ``rng`` and add random sparse noise to the graph ``G``:
>>> rng = np.random.default_rng()
>>> G += 1e-2 * random_array((N, N), density=0.1, rng=rng)
Set initial approximations for eigenvectors:
>>> X = rng.random((N, 2))
The constant vector of ones is always a trivial eigenvector
of the non-normalized Laplacian to be filtered out:
>>> Y = np.ones((N, 1))
Alternating (1) the sign of the graph weights allows determining
labels for spectral max- and min- cuts in a single loop.
Since the graph is directed but the eigenvalue solver needs a symmetric
Laplacian, the option ``symmetrized=True`` must be used in its construction.
The option ``normed=True`` cannot be used in (2) for the negative weights
here as the symmetric normalization evaluates square roots.
The option ``form="lo"`` in (2) is matrix-free, i.e., guarantees
a fixed memory footprint and read-only access to the graph.
Calling the eigenvalue solver ``lobpcg`` (3) computes the Fiedler vector
that determines the labels as the signs of its components in (5).
Since the sign in an eigenvector is not deterministic and can flip,
we fix the sign of the first component to be always +1 in (4).
>>> for cut in ["max", "min"]:
... G = -G # 1.
... L = csgraph.laplacian(G, symmetrized=True, form="lo") # 2.
... _, eves = lobpcg(L, X, Y=Y, largest=False, tol=1e-2) # 3.
... eves *= np.sign(eves[0, 0]) # 4.
... print(cut + "-cut labels:\\n", 1 * (eves[:, 0]>0)) # 5.
max-cut labels:
[1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1]
min-cut labels:
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
As anticipated for a (slightly noisy) linear graph,
the max-cut strips all the edges of the graph coloring all
odd vertices into one color and all even vertices into another one,
while the balanced min-cut partitions the graph
in the middle by deleting a single edge.
Both determined partitions are optimal.
"""
is_pydata_sparse = is_pydata_spmatrix(csgraph)
if is_pydata_sparse:
pydata_sparse_cls = csgraph.__class__
csgraph = convert_pydata_sparse_to_scipy(csgraph)
if csgraph.ndim != 2 or csgraph.shape[0] != csgraph.shape[1]:
raise ValueError('csgraph must be a square matrix or array')
if normed and (
np.issubdtype(csgraph.dtype, np.signedinteger)
or np.issubdtype(csgraph.dtype, np.unsignedinteger)
):
csgraph = csgraph.astype(np.float64)
if form == "array":
create_lap = (
_laplacian_sparse if issparse(csgraph) else _laplacian_dense
)
else:
create_lap = (
_laplacian_sparse_flo
if issparse(csgraph)
else _laplacian_dense_flo
)
degree_axis = 1 if use_out_degree else 0
lap, d = create_lap(
csgraph,
normed=normed,
axis=degree_axis,
copy=copy,
form=form,
dtype=dtype,
symmetrized=symmetrized,
)
if is_pydata_sparse:
lap = pydata_sparse_cls.from_scipy_sparse(lap)
if return_diag:
return lap, d
return lap
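# As a cross-check of the degree convention chosen by `degree_axis` above, the
# non-normalized Laplacian can be rebuilt by hand as "degree matrix minus
# adjacency" and compared with the function's output. A minimal sketch using
# the 3 x 3 graph from the docstring; `A`, `L_in_manual` and `L_out_manual` are
# names introduced only here.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian

G = np.arange(9).reshape(3, 3)

# Adjacency with the diagonal (self-loops) zeroed out.
A = G - np.diag(np.diag(G))

# In-degree uses column sums (axis=0), out-degree uses row sums (axis=1).
L_in_manual = np.diag(A.sum(axis=0)) - A
L_out_manual = np.diag(A.sum(axis=1)) - A

L_in = laplacian(G)
L_out = laplacian(G, use_out_degree=True)
```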
def _setdiag_dense(m, d):
step = len(d) + 1
m.flat[::step] = d
def _laplace(m, d):
return lambda v: v * d[:, np.newaxis] - m @ v
def _laplace_normed(m, d, nd):
laplace = _laplace(m, d)
return lambda v: nd[:, np.newaxis] * laplace(v * nd[:, np.newaxis])
def _laplace_sym(m, d):
return (
lambda v: v * d[:, np.newaxis]
- m @ v
- np.transpose(np.conjugate(np.transpose(np.conjugate(v)) @ m))
)
def _laplace_normed_sym(m, d, nd):
laplace_sym = _laplace_sym(m, d)
return lambda v: nd[:, np.newaxis] * laplace_sym(v * nd[:, np.newaxis])
def _linearoperator(mv, shape, dtype):
return LinearOperator(matvec=mv, matmat=mv, shape=shape, dtype=dtype)
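# The closures above never build the Laplacian. They apply it as
# "degree-scaled input minus graph times input". A NumPy-only sketch of the
# plain (non-normalized, non-symmetrized) case, checked against the explicit
# matrix; `laplace_apply` is a hypothetical copy of the `_laplace` closure and
# the input `v` is assumed to be 2-D.

```python
import numpy as np


def laplace_apply(m, d):
    """Hypothetical copy of the closure: v -> diag(d) @ v - m @ v."""
    return lambda v: v * d[:, np.newaxis] - m @ v


G = np.arange(9).reshape(3, 3)

# Column sums including the diagonal: the diagonal of G cancels in d - G.
col_sums = G.sum(axis=0)
apply_L = laplace_apply(G, col_sums)

# Applying the operator to the identity recovers the dense Laplacian.
L_from_closure = apply_L(np.eye(3))
L_dense = np.diag(col_sums - G.diagonal()) - (G - np.diag(G.diagonal()))
```

# Passing the sums with the diagonal included is what lets the closure skip an
# explicit "zero the diagonal" step on the graph itself.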
def _laplacian_sparse_flo(graph, normed, axis, copy, form, dtype, symmetrized):
# The keyword argument `copy` is unused and has no effect here.
del copy
if dtype is None:
dtype = graph.dtype
graph_sum = np.asarray(graph.sum(axis=axis)).ravel()
graph_diagonal = graph.diagonal()
diag = graph_sum - graph_diagonal
if symmetrized:
graph_sum += np.asarray(graph.sum(axis=1 - axis)).ravel()
diag = graph_sum - graph_diagonal - graph_diagonal
if normed:
isolated_node_mask = diag == 0
w = np.where(isolated_node_mask, 1, np.sqrt(diag))
if symmetrized:
md = _laplace_normed_sym(graph, graph_sum, 1.0 / w)
else:
md = _laplace_normed(graph, graph_sum, 1.0 / w)
if form == "function":
return md, w.astype(dtype, copy=False)
elif form == "lo":
m = _linearoperator(md, shape=graph.shape, dtype=dtype)
return m, w.astype(dtype, copy=False)
else:
raise ValueError(f"Invalid form: {form!r}")
else:
if symmetrized:
md = _laplace_sym(graph, graph_sum)
else:
md = _laplace(graph, graph_sum)
if form == "function":
return md, diag.astype(dtype, copy=False)
elif form == "lo":
m = _linearoperator(md, shape=graph.shape, dtype=dtype)
return m, diag.astype(dtype, copy=False)
else:
raise ValueError(f"Invalid form: {form!r}")
def _laplacian_sparse(graph, normed, axis, copy, form, dtype, symmetrized):
# The keyword argument `form` is unused and has no effect here.
del form
if dtype is None:
dtype = graph.dtype
needs_copy = False
if graph.format in ('lil', 'dok'):
m = graph.tocoo()
else:
m = graph
if copy:
needs_copy = True
if symmetrized:
m += m.T.conj()
w = np.asarray(m.sum(axis=axis)).ravel() - m.diagonal()
if normed:
m = m.tocoo(copy=needs_copy)
isolated_node_mask = (w == 0)
w = np.where(isolated_node_mask, 1, np.sqrt(w))
m.data /= w[m.row]
m.data /= w[m.col]
m.data *= -1
m.setdiag(1 - isolated_node_mask)
else:
if m.format == 'dia':
m = m.copy()
else:
m = m.tocoo(copy=needs_copy)
m.data *= -1
m.setdiag(w)
return m.astype(dtype, copy=False), w.astype(dtype)
def _laplacian_dense_flo(graph, normed, axis, copy, form, dtype, symmetrized):
if copy:
m = np.array(graph)
else:
m = np.asarray(graph)
if dtype is None:
dtype = m.dtype
graph_sum = m.sum(axis=axis)
graph_diagonal = m.diagonal()
diag = graph_sum - graph_diagonal
if symmetrized:
graph_sum += m.sum(axis=1 - axis)
diag = graph_sum - graph_diagonal - graph_diagonal
if normed:
isolated_node_mask = diag == 0
w = np.where(isolated_node_mask, 1, np.sqrt(diag))
if symmetrized:
md = _laplace_normed_sym(m, graph_sum, 1.0 / w)
else:
md = _laplace_normed(m, graph_sum, 1.0 / w)
if form == "function":
return md, w.astype(dtype, copy=False)
elif form == "lo":
m = _linearoperator(md, shape=graph.shape, dtype=dtype)
return m, w.astype(dtype, copy=False)
else:
raise ValueError(f"Invalid form: {form!r}")
else:
if symmetrized:
md = _laplace_sym(m, graph_sum)
else:
md = _laplace(m, graph_sum)
if form == "function":
return md, diag.astype(dtype, copy=False)
elif form == "lo":
m = _linearoperator(md, shape=graph.shape, dtype=dtype)
return m, diag.astype(dtype, copy=False)
else:
raise ValueError(f"Invalid form: {form!r}")
def _laplacian_dense(graph, normed, axis, copy, form, dtype, symmetrized):
    if form != "array":
        raise ValueError(f'{form!r} must be "array"')
    # Work on a copy unless the caller allows in-place modification.
    if copy:
        m = np.array(graph)
    else:
        m = np.asarray(graph)
    # Resolve the output dtype once, after conversion to ndarray.
    if dtype is None:
        dtype = m.dtype
    if symmetrized:
        m += m.T.conj()
    # Self-loops do not contribute to the Laplacian.
    np.fill_diagonal(m, 0)
    w = m.sum(axis=axis)
    if normed:
        # Isolated nodes get weight 1 to avoid division by zero.
        isolated_node_mask = (w == 0)
        w = np.where(isolated_node_mask, 1, np.sqrt(w))
        m /= w
        m /= w[:, np.newaxis]
        m *= -1
        _setdiag_dense(m, 1 - isolated_node_mask)
    else:
        m *= -1
        _setdiag_dense(m, w)
    return m.astype(dtype, copy=False), w.astype(dtype, copy=False)
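
The three values of `form` handled above can be compared with a small sketch. It assumes a SciPy version whose public `scipy.sparse.csgraph.laplacian` accepts the `form` keyword, and it reuses the 3x3 adjacency matrix from the test suite; the variable names are illustrative only.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian

# Small directed, weighted graph; the self-loop at (1, 1) is ignored.
A = np.array([[0, 1, 0],
              [4, 2, 0],
              [0, 0, 0]], dtype=float)

# Explicit array form (default in-degree convention).
L_arr, d_arr = laplacian(A, return_diag=True, form="array")

# Matrix-free forms: a plain callable and a LinearOperator.
L_fun, d_fun = laplacian(A, return_diag=True, form="function")
L_lo, d_lo = laplacian(A, return_diag=True, form="lo")

# Applying the matrix-free forms to the identity recovers the dense Laplacian.
eye = np.eye(3)
dense_from_fun = L_fun(eye)
dense_from_lo = L_lo.matmat(eye)

print(L_arr)
print(np.allclose(dense_from_fun, L_arr), np.allclose(dense_from_lo, L_arr))
```

The matrix-free forms avoid materializing the Laplacian, which can matter for large graphs where only products with blocks of vectors are needed.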


@ -0,0 +1,66 @@
import numpy as np
from scipy.sparse import issparse
from scipy.sparse._sputils import convert_pydata_sparse_to_scipy
from scipy.sparse.csgraph._tools import (
csgraph_to_dense, csgraph_from_dense,
csgraph_masked_from_dense, csgraph_from_masked
)
DTYPE = np.float64
def validate_graph(csgraph, directed, dtype=DTYPE,
csr_output=True, dense_output=True,
copy_if_dense=False, copy_if_sparse=False,
null_value_in=0, null_value_out=np.inf,
infinity_null=True, nan_null=True):
"""Routine for validation and conversion of csgraph inputs"""
if not (csr_output or dense_output):
raise ValueError("Internal: dense or csr output must be true")
accept_fv = [null_value_in]
if infinity_null:
accept_fv.append(np.inf)
if nan_null:
accept_fv.append(np.nan)
csgraph = convert_pydata_sparse_to_scipy(csgraph, accept_fv=accept_fv)
# if undirected and csc storage, then transposing in-place
# is quicker than later converting to csr.
if (not directed) and issparse(csgraph) and csgraph.format == "csc":
csgraph = csgraph.T
if issparse(csgraph):
if csr_output:
csgraph = csgraph.tocsr(copy=copy_if_sparse).astype(DTYPE, copy=False)
else:
csgraph = csgraph_to_dense(csgraph, null_value=null_value_out)
elif np.ma.isMaskedArray(csgraph):
if dense_output:
mask = csgraph.mask
csgraph = np.array(csgraph.data, dtype=DTYPE, copy=copy_if_dense)
csgraph[mask] = null_value_out
else:
csgraph = csgraph_from_masked(csgraph)
else:
if dense_output:
csgraph = csgraph_masked_from_dense(csgraph,
copy=copy_if_dense,
null_value=null_value_in,
nan_null=nan_null,
infinity_null=infinity_null)
mask = csgraph.mask
csgraph = np.asarray(csgraph.data, dtype=DTYPE)
csgraph[mask] = null_value_out
else:
csgraph = csgraph_from_dense(csgraph, null_value=null_value_in,
infinity_null=infinity_null,
nan_null=nan_null)
if csgraph.ndim != 2:
raise ValueError("compressed-sparse graph must be 2-D")
if csgraph.shape[0] != csgraph.shape[1]:
raise ValueError("compressed-sparse graph must be shape (N, N)")
return csgraph


@ -0,0 +1,119 @@
import numpy as np
from numpy.testing import assert_equal, assert_array_almost_equal
from scipy.sparse import csgraph, csr_array
def test_weak_connections():
Xde = np.array([[0, 1, 0],
[0, 0, 0],
[0, 0, 0]])
Xsp = csgraph.csgraph_from_dense(Xde, null_value=0)
for X in Xsp, Xde:
n_components, labels =\
csgraph.connected_components(X, directed=True,
connection='weak')
assert_equal(n_components, 2)
assert_array_almost_equal(labels, [0, 0, 1])
def test_strong_connections():
X1de = np.array([[0, 1, 0],
[0, 0, 0],
[0, 0, 0]])
X2de = X1de + X1de.T
X1sp = csgraph.csgraph_from_dense(X1de, null_value=0)
X2sp = csgraph.csgraph_from_dense(X2de, null_value=0)
for X in X1sp, X1de:
n_components, labels =\
csgraph.connected_components(X, directed=True,
connection='strong')
assert_equal(n_components, 3)
labels.sort()
assert_array_almost_equal(labels, [0, 1, 2])
for X in X2sp, X2de:
n_components, labels =\
csgraph.connected_components(X, directed=True,
connection='strong')
assert_equal(n_components, 2)
labels.sort()
assert_array_almost_equal(labels, [0, 0, 1])
def test_strong_connections2():
X = np.array([[0, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 1, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0]])
n_components, labels =\
csgraph.connected_components(X, directed=True,
connection='strong')
assert_equal(n_components, 5)
labels.sort()
assert_array_almost_equal(labels, [0, 1, 2, 2, 3, 4])
def test_weak_connections2():
X = np.array([[0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 1, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0]])
n_components, labels =\
csgraph.connected_components(X, directed=True,
connection='weak')
assert_equal(n_components, 2)
labels.sort()
assert_array_almost_equal(labels, [0, 0, 1, 1, 1, 1])
def test_ticket1876():
# Regression test: this failed in the original implementation
# There should be two strongly-connected components; previously gave one
g = np.array([[0, 1, 1, 0],
[1, 0, 0, 1],
[0, 0, 0, 1],
[0, 0, 1, 0]])
n_components, labels = csgraph.connected_components(g, connection='strong')
assert_equal(n_components, 2)
assert_equal(labels[0], labels[1])
assert_equal(labels[2], labels[3])
def test_fully_connected_graph():
# Fully connected dense matrices raised an exception.
# https://github.com/scipy/scipy/issues/3818
g = np.ones((4, 4))
n_components, labels = csgraph.connected_components(g)
assert_equal(n_components, 1)
def test_int64_indices_undirected():
# See https://github.com/scipy/scipy/issues/18716
g = csr_array(([1], np.array([[0], [1]], dtype=np.int64)), shape=(2, 2))
assert g.indices.dtype == np.int64
n, labels = csgraph.connected_components(g, directed=False)
assert n == 1
assert_array_almost_equal(labels, [0, 0])
def test_int64_indices_directed():
# See https://github.com/scipy/scipy/issues/18716
g = csr_array(([1], np.array([[0], [1]], dtype=np.int64)), shape=(2, 2))
assert g.indices.dtype == np.int64
n, labels = csgraph.connected_components(g, directed=True,
connection='strong')
assert n == 2
assert_array_almost_equal(labels, [1, 0])
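
The difference between weak and strong connectivity exercised by these tests can be seen on the smallest graph they use. This is a minimal sketch using the public `connected_components` function; the variable names are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

# One directed edge 0 -> 1; node 2 is isolated.
graph = np.array([[0, 1, 0],
                  [0, 0, 0],
                  [0, 0, 0]])

# Weak connectivity ignores edge direction, so 0 and 1 share a component.
n_weak, labels_weak = connected_components(
    graph, directed=True, connection='weak')

# Strong connectivity needs a path in both directions, so every node
# ends up in its own component here.
n_strong, labels_strong = connected_components(
    graph, directed=True, connection='strong')

print(n_weak, labels_weak)
print(n_strong, labels_strong)
```

Label values are arbitrary identifiers, which is why several tests above sort the labels before comparing them.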


@ -0,0 +1,61 @@
import numpy as np
from numpy.testing import assert_array_almost_equal
from scipy.sparse import csr_array
from scipy.sparse.csgraph import csgraph_from_dense, csgraph_to_dense
def test_csgraph_from_dense():
np.random.seed(1234)
G = np.random.random((10, 10))
some_nulls = (G < 0.4)
all_nulls = (G < 0.8)
for null_value in [0, np.nan, np.inf]:
G[all_nulls] = null_value
with np.errstate(invalid="ignore"):
G_csr = csgraph_from_dense(G, null_value=0)
G[all_nulls] = 0
assert_array_almost_equal(G, G_csr.toarray())
for null_value in [np.nan, np.inf]:
G[all_nulls] = 0
G[some_nulls] = null_value
with np.errstate(invalid="ignore"):
G_csr = csgraph_from_dense(G, null_value=0)
G[all_nulls] = 0
assert_array_almost_equal(G, G_csr.toarray())
def test_csgraph_to_dense():
np.random.seed(1234)
G = np.random.random((10, 10))
nulls = (G < 0.8)
G[nulls] = np.inf
G_csr = csgraph_from_dense(G)
for null_value in [0, 10, -np.inf, np.inf]:
G[nulls] = null_value
assert_array_almost_equal(G, csgraph_to_dense(G_csr, null_value))
def test_multiple_edges():
# create a random square matrix with an even number of elements
np.random.seed(1234)
X = np.random.random((10, 10))
Xcsr = csr_array(X)
# now double-up every other column
Xcsr.indices[::2] = Xcsr.indices[1::2]
# normal sparse toarray() will sum the duplicated edges
Xdense = Xcsr.toarray()
assert_array_almost_equal(Xdense[:, 1::2],
X[:, ::2] + X[:, 1::2])
# csgraph_to_dense chooses the minimum of each duplicated edge
Xdense = csgraph_to_dense(Xcsr)
assert_array_almost_equal(Xdense[:, 1::2],
np.minimum(X[:, ::2], X[:, 1::2]))
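
The null-value handling checked by these conversion tests can be sketched on a tiny graph. The example assumes the default `nan_null=True` and `infinity_null=True` behavior of `csgraph_from_dense`; the matrix and names are made up for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import csgraph_from_dense, csgraph_to_dense

# Zeros, NaN and inf all denote "no edge" on input.
G = np.array([[0.0, 2.0, np.inf],
              [np.nan, 0.0, 3.0],
              [0.0, 0.0, 0.0]])

with np.errstate(invalid="ignore"):
    G_csr = csgraph_from_dense(G, null_value=0)

# Only the two real edges are stored: (0, 1) = 2 and (1, 2) = 3.
print(G_csr.nnz)

# On the way back, non-edges can be marked with any chosen value.
dense = csgraph_to_dense(G_csr, null_value=np.inf)
print(dense)
```

Using `np.inf` as the outgoing null value is convenient for shortest-path style computations, where a missing edge should behave like an infinitely long one.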


@ -0,0 +1,209 @@
import numpy as np
from numpy.testing import assert_array_equal
import pytest
from scipy.sparse import csr_array, csc_array, csr_matrix
from scipy.sparse.csgraph import maximum_flow
from scipy.sparse.csgraph._flow import (
_add_reverse_edges, _make_edge_pointers, _make_tails
)
methods = ['edmonds_karp', 'dinic']
# Each method is checked in its own test case: a second call placed inside
# the same `pytest.raises` block never runs once the first call raises.
@pytest.mark.parametrize('method', methods)
def test_raises_on_dense_input(method):
    graph = np.array([[0, 1], [0, 0]])
    with pytest.raises(TypeError):
        maximum_flow(graph, 0, 1, method=method)

@pytest.mark.parametrize('method', methods)
def test_raises_on_csc_input(method):
    graph = csc_array([[0, 1], [0, 0]])
    with pytest.raises(TypeError):
        maximum_flow(graph, 0, 1, method=method)

@pytest.mark.parametrize('method', methods)
def test_raises_on_floating_point_input(method):
    graph = csr_array([[0, 1.5], [0, 0]], dtype=np.float64)
    with pytest.raises(ValueError):
        maximum_flow(graph, 0, 1, method=method)

def test_raises_on_non_square_input():
    graph = csr_array([[0, 1, 2], [2, 1, 0]])
    with pytest.raises(ValueError):
        maximum_flow(graph, 0, 1)

@pytest.mark.parametrize('method', methods)
def test_raises_when_source_is_sink(method):
    graph = csr_array([[0, 1], [0, 0]])
    with pytest.raises(ValueError):
        maximum_flow(graph, 0, 0, method=method)
@pytest.mark.parametrize('method', methods)
@pytest.mark.parametrize('source', [-1, 2, 3])
def test_raises_when_source_is_out_of_bounds(source, method):
with pytest.raises(ValueError):
graph = csr_array([[0, 1], [0, 0]])
maximum_flow(graph, source, 1, method=method)
@pytest.mark.parametrize('method', methods)
@pytest.mark.parametrize('sink', [-1, 2, 3])
def test_raises_when_sink_is_out_of_bounds(sink, method):
with pytest.raises(ValueError):
graph = csr_array([[0, 1], [0, 0]])
maximum_flow(graph, 0, sink, method=method)
@pytest.mark.parametrize('method', methods)
def test_simple_graph(method):
# This graph looks as follows:
# (0) --5--> (1)
graph = csr_array([[0, 5], [0, 0]])
res = maximum_flow(graph, 0, 1, method=method)
assert res.flow_value == 5
expected_flow = np.array([[0, 5], [-5, 0]])
assert_array_equal(res.flow.toarray(), expected_flow)
@pytest.mark.parametrize('method', methods)
def test_return_type(method):
graph = csr_array([[0, 5], [0, 0]])
assert isinstance(maximum_flow(graph, 0, 1, method=method).flow, csr_array)
graph = csr_matrix([[0, 5], [0, 0]])
assert isinstance(maximum_flow(graph, 0, 1, method=method).flow, csr_matrix)
@pytest.mark.parametrize('method', methods)
def test_bottle_neck_graph(method):
# This graph cannot use the full capacity between 0 and 1:
# (0) --5--> (1) --3--> (2)
graph = csr_array([[0, 5, 0], [0, 0, 3], [0, 0, 0]])
res = maximum_flow(graph, 0, 2, method=method)
assert res.flow_value == 3
expected_flow = np.array([[0, 3, 0], [-3, 0, 3], [0, -3, 0]])
assert_array_equal(res.flow.toarray(), expected_flow)
@pytest.mark.parametrize('method', methods)
def test_backwards_flow(method):
# This example causes backwards flow between vertices 3 and 4,
# and so this test ensures that we handle that accordingly. See
# https://stackoverflow.com/q/38843963/5085211
# for more information.
graph = csr_array([[0, 10, 0, 0, 10, 0, 0, 0],
[0, 0, 10, 0, 0, 0, 0, 0],
[0, 0, 0, 10, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 10],
[0, 0, 0, 10, 0, 10, 0, 0],
[0, 0, 0, 0, 0, 0, 10, 0],
[0, 0, 0, 0, 0, 0, 0, 10],
[0, 0, 0, 0, 0, 0, 0, 0]])
res = maximum_flow(graph, 0, 7, method=method)
assert res.flow_value == 20
expected_flow = np.array([[0, 10, 0, 0, 10, 0, 0, 0],
[-10, 0, 10, 0, 0, 0, 0, 0],
[0, -10, 0, 10, 0, 0, 0, 0],
[0, 0, -10, 0, 0, 0, 0, 10],
[-10, 0, 0, 0, 0, 10, 0, 0],
[0, 0, 0, 0, -10, 0, 10, 0],
[0, 0, 0, 0, 0, -10, 0, 10],
[0, 0, 0, -10, 0, 0, -10, 0]])
assert_array_equal(res.flow.toarray(), expected_flow)
@pytest.mark.parametrize('method', methods)
def test_example_from_clrs_chapter_26_1(method):
# See page 659 in CLRS second edition, but note that the maximum flow
# we find is slightly different than the one in CLRS; we push a flow of
# 12 to v_1 instead of v_2.
graph = csr_array([[0, 16, 13, 0, 0, 0],
[0, 0, 10, 12, 0, 0],
[0, 4, 0, 0, 14, 0],
[0, 0, 9, 0, 0, 20],
[0, 0, 0, 7, 0, 4],
[0, 0, 0, 0, 0, 0]])
res = maximum_flow(graph, 0, 5, method=method)
assert res.flow_value == 23
expected_flow = np.array([[0, 12, 11, 0, 0, 0],
[-12, 0, 0, 12, 0, 0],
[-11, 0, 0, 0, 11, 0],
[0, -12, 0, 0, -7, 19],
[0, 0, -11, 7, 0, 4],
[0, 0, 0, -19, -4, 0]])
assert_array_equal(res.flow.toarray(), expected_flow)
@pytest.mark.parametrize('method', methods)
def test_disconnected_graph(method):
# This tests the following disconnected graph:
# (0) --5--> (1) (2) --3--> (3)
graph = csr_array([[0, 5, 0, 0],
[0, 0, 0, 0],
[0, 0, 9, 3],
[0, 0, 0, 0]])
res = maximum_flow(graph, 0, 3, method=method)
assert res.flow_value == 0
expected_flow = np.zeros((4, 4), dtype=np.int32)
assert_array_equal(res.flow.toarray(), expected_flow)
@pytest.mark.parametrize('method', methods)
def test_add_reverse_edges_large_graph(method):
# Regression test for https://github.com/scipy/scipy/issues/14385
n = 100_000
indices = np.arange(1, n)
indptr = np.array(list(range(n)) + [n - 1])
data = np.ones(n - 1, dtype=np.int32)
graph = csr_array((data, indices, indptr), shape=(n, n))
res = maximum_flow(graph, 0, n - 1, method=method)
assert res.flow_value == 1
expected_flow = graph - graph.transpose()
assert_array_equal(res.flow.data, expected_flow.data)
assert_array_equal(res.flow.indices, expected_flow.indices)
assert_array_equal(res.flow.indptr, expected_flow.indptr)
@pytest.mark.parametrize("a,b_data_expected", [
([[]], []),
([[0], [0]], []),
([[1, 0, 2], [0, 0, 0], [0, 3, 0]], [1, 2, 0, 0, 3]),
([[9, 8, 7], [4, 5, 6], [0, 0, 0]], [9, 8, 7, 4, 5, 6, 0, 0])])
def test_add_reverse_edges(a, b_data_expected):
"""Test that the reversal of the edges of the input graph works
as expected.
"""
a = csr_array(a, dtype=np.int32, shape=(len(a), len(a)))
b = _add_reverse_edges(a)
assert_array_equal(b.data, b_data_expected)
@pytest.mark.parametrize("a,expected", [
([[]], []),
([[0]], []),
([[1]], [0]),
([[0, 1], [10, 0]], [1, 0]),
([[1, 0, 2], [0, 0, 3], [4, 5, 0]], [0, 3, 4, 1, 2])
])
def test_make_edge_pointers(a, expected):
a = csr_array(a, dtype=np.int32)
rev_edge_ptr = _make_edge_pointers(a)
assert_array_equal(rev_edge_ptr, expected)
@pytest.mark.parametrize("a,expected", [
([[]], []),
([[0]], []),
([[1]], [0]),
([[0, 1], [10, 0]], [0, 1]),
([[1, 0, 2], [0, 0, 3], [4, 5, 0]], [0, 0, 1, 2, 2])
])
def test_make_tails(a, expected):
a = csr_array(a, dtype=np.int32)
tails = _make_tails(a)
assert_array_equal(tails, expected)
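
The flow conventions these tests rely on (integer capacities in CSR format, a skew-symmetric `flow` result) can be illustrated with the bottleneck graph from above. This is a sketch assuming a SciPy version that provides `csr_array` and the `flow` attribute on the result; the explicit `int32` dtype is a conservative choice, not a documented requirement.

```python
import numpy as np
from scipy.sparse import csr_array
from scipy.sparse.csgraph import maximum_flow

# (0) --5--> (1) --3--> (2): the second edge is the bottleneck.
graph = csr_array([[0, 5, 0],
                   [0, 0, 3],
                   [0, 0, 0]], dtype=np.int32)

res = maximum_flow(graph, 0, 2)
flow = res.flow.toarray()

print(res.flow_value)
# Flow along u -> v appears as a positive entry at (u, v) and as the
# matching negative entry at (v, u).
print(flow)
```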


@ -0,0 +1,368 @@
import pytest
import numpy as np
from numpy.testing import assert_allclose
from pytest import raises as assert_raises
from scipy import sparse
from scipy.sparse import csgraph
from scipy._lib._util import np_long, np_ulong
def check_int_type(mat):
return np.issubdtype(mat.dtype, np.signedinteger) or np.issubdtype(
mat.dtype, np_ulong
)
def test_laplacian_value_error():
for t in int, float, complex:
for m in ([1, 1],
[[[1]]],
[[1, 2, 3], [4, 5, 6]],
[[1, 2], [3, 4], [5, 5]]):
A = np.array(m, dtype=t)
assert_raises(ValueError, csgraph.laplacian, A)
def _explicit_laplacian(x, normed=False):
if sparse.issparse(x):
x = x.toarray()
x = np.asarray(x)
y = -1.0 * x
for j in range(y.shape[0]):
y[j,j] = x[j,j+1:].sum() + x[j,:j].sum()
if normed:
d = np.diag(y).copy()
d[d == 0] = 1.0
y /= d[:,None]**.5
y /= d[None,:]**.5
return y
def _check_symmetric_graph_laplacian(mat, normed, copy=True):
if not hasattr(mat, 'shape'):
mat = eval(mat, dict(np=np, sparse=sparse))
if sparse.issparse(mat):
sp_mat = mat
mat = sp_mat.toarray()
else:
sp_mat = sparse.csr_array(mat)
mat_copy = np.copy(mat)
sp_mat_copy = sparse.csr_array(sp_mat, copy=True)
n_nodes = mat.shape[0]
explicit_laplacian = _explicit_laplacian(mat, normed=normed)
laplacian = csgraph.laplacian(mat, normed=normed, copy=copy)
sp_laplacian = csgraph.laplacian(sp_mat, normed=normed,
copy=copy)
if copy:
assert_allclose(mat, mat_copy)
_assert_allclose_sparse(sp_mat, sp_mat_copy)
else:
if not (normed and check_int_type(mat)):
assert_allclose(laplacian, mat)
if sp_mat.format == 'coo':
_assert_allclose_sparse(sp_laplacian, sp_mat)
assert_allclose(laplacian, sp_laplacian.toarray())
for tested in [laplacian, sp_laplacian.toarray()]:
if not normed:
assert_allclose(tested.sum(axis=0), np.zeros(n_nodes))
assert_allclose(tested.T, tested)
assert_allclose(tested, explicit_laplacian)
def test_symmetric_graph_laplacian():
symmetric_mats = (
'np.arange(10) * np.arange(10)[:, np.newaxis]',
'np.ones((7, 7))',
'np.eye(19)',
'sparse.diags([1, 1], [-1, 1], shape=(4, 4))',
'sparse.diags([1, 1], [-1, 1], shape=(4, 4)).toarray()',
'sparse.diags([1, 1], [-1, 1], shape=(4, 4)).todense()',
'np.vander(np.arange(4)) + np.vander(np.arange(4)).T'
)
for mat in symmetric_mats:
for normed in True, False:
for copy in True, False:
_check_symmetric_graph_laplacian(mat, normed, copy)
def _assert_allclose_sparse(a, b, **kwargs):
# helper function that can deal with sparse matrices
if sparse.issparse(a):
a = a.toarray()
if sparse.issparse(b):
b = b.toarray()
assert_allclose(a, b, **kwargs)
def _check_laplacian_dtype_none(
A, desired_L, desired_d, normed, use_out_degree, copy, dtype, arr_type
):
mat = arr_type(A, dtype=dtype)
L, d = csgraph.laplacian(
mat,
normed=normed,
return_diag=True,
use_out_degree=use_out_degree,
copy=copy,
dtype=None,
)
if normed and check_int_type(mat):
assert L.dtype == np.float64
assert d.dtype == np.float64
_assert_allclose_sparse(L, desired_L, atol=1e-12)
_assert_allclose_sparse(d, desired_d, atol=1e-12)
else:
assert L.dtype == dtype
assert d.dtype == dtype
desired_L = np.asarray(desired_L).astype(dtype)
desired_d = np.asarray(desired_d).astype(dtype)
_assert_allclose_sparse(L, desired_L, atol=1e-12)
_assert_allclose_sparse(d, desired_d, atol=1e-12)
if not copy:
if not (normed and check_int_type(mat)):
if type(mat) is np.ndarray:
assert_allclose(L, mat)
elif mat.format == "coo":
_assert_allclose_sparse(L, mat)
def _check_laplacian_dtype(
A, desired_L, desired_d, normed, use_out_degree, copy, dtype, arr_type
):
mat = arr_type(A, dtype=dtype)
L, d = csgraph.laplacian(
mat,
normed=normed,
return_diag=True,
use_out_degree=use_out_degree,
copy=copy,
dtype=dtype,
)
assert L.dtype == dtype
assert d.dtype == dtype
desired_L = np.asarray(desired_L).astype(dtype)
desired_d = np.asarray(desired_d).astype(dtype)
_assert_allclose_sparse(L, desired_L, atol=1e-12)
_assert_allclose_sparse(d, desired_d, atol=1e-12)
if not copy:
if not (normed and check_int_type(mat)):
if type(mat) is np.ndarray:
assert_allclose(L, mat)
elif mat.format == 'coo':
_assert_allclose_sparse(L, mat)
INT_DTYPES = (np.intc, np_long, np.longlong)
REAL_DTYPES = (np.float32, np.float64, np.longdouble)
COMPLEX_DTYPES = (np.complex64, np.complex128, np.clongdouble)
DTYPES = INT_DTYPES + REAL_DTYPES + COMPLEX_DTYPES
@pytest.mark.parametrize("dtype", DTYPES)
@pytest.mark.parametrize("arr_type", [np.array,
sparse.csr_matrix,
sparse.coo_matrix,
sparse.csr_array,
sparse.coo_array])
@pytest.mark.parametrize("copy", [True, False])
@pytest.mark.parametrize("normed", [True, False])
@pytest.mark.parametrize("use_out_degree", [True, False])
def test_asymmetric_laplacian(use_out_degree, normed,
copy, dtype, arr_type):
# adjacency matrix
A = [[0, 1, 0],
[4, 2, 0],
[0, 0, 0]]
A = arr_type(np.array(A), dtype=dtype)
A_copy = A.copy()
if not normed and use_out_degree:
# Laplacian matrix using out-degree
L = [[1, -1, 0],
[-4, 4, 0],
[0, 0, 0]]
d = [1, 4, 0]
if normed and use_out_degree:
# normalized Laplacian matrix using out-degree
L = [[1, -0.5, 0],
[-2, 1, 0],
[0, 0, 0]]
d = [1, 2, 1]
if not normed and not use_out_degree:
# Laplacian matrix using in-degree
L = [[4, -1, 0],
[-4, 1, 0],
[0, 0, 0]]
d = [4, 1, 0]
if normed and not use_out_degree:
# normalized Laplacian matrix using in-degree
L = [[1, -0.5, 0],
[-2, 1, 0],
[0, 0, 0]]
d = [2, 1, 1]
_check_laplacian_dtype_none(
A,
L,
d,
normed=normed,
use_out_degree=use_out_degree,
copy=copy,
dtype=dtype,
arr_type=arr_type,
)
_check_laplacian_dtype(
A_copy,
L,
d,
normed=normed,
use_out_degree=use_out_degree,
copy=copy,
dtype=dtype,
arr_type=arr_type,
)
@pytest.mark.parametrize("fmt", ['csr', 'csc', 'coo', 'lil',
'dok', 'dia', 'bsr'])
@pytest.mark.parametrize("normed", [True, False])
@pytest.mark.parametrize("copy", [True, False])
def test_sparse_formats(fmt, normed, copy):
mat = sparse.diags_array([1, 1], offsets=[-1, 1], shape=(4, 4), format=fmt)
_check_symmetric_graph_laplacian(mat, normed, copy)
@pytest.mark.parametrize(
"arr_type", [np.asarray,
sparse.csr_matrix,
sparse.coo_matrix,
sparse.csr_array,
sparse.coo_array]
)
@pytest.mark.parametrize("form", ["array", "function", "lo"])
def test_laplacian_symmetrized(arr_type, form):
# adjacency matrix
n = 3
mat = arr_type(np.arange(n * n).reshape(n, n))
L_in, d_in = csgraph.laplacian(
mat,
return_diag=True,
form=form,
)
L_out, d_out = csgraph.laplacian(
mat,
return_diag=True,
use_out_degree=True,
form=form,
)
Ls, ds = csgraph.laplacian(
mat,
return_diag=True,
symmetrized=True,
form=form,
)
Ls_normed, ds_normed = csgraph.laplacian(
mat,
return_diag=True,
symmetrized=True,
normed=True,
form=form,
)
mat += mat.T
Lss, dss = csgraph.laplacian(mat, return_diag=True, form=form)
Lss_normed, dss_normed = csgraph.laplacian(
mat,
return_diag=True,
normed=True,
form=form,
)
assert_allclose(ds, d_in + d_out)
assert_allclose(ds, dss)
assert_allclose(ds_normed, dss_normed)
d = {}
for L in ["L_in", "L_out", "Ls", "Ls_normed", "Lss", "Lss_normed"]:
if form == "array":
d[L] = eval(L)
else:
d[L] = eval(L)(np.eye(n, dtype=mat.dtype))
_assert_allclose_sparse(d["Ls"], d["L_in"] + d["L_out"].T)
_assert_allclose_sparse(d["Ls"], d["Lss"])
_assert_allclose_sparse(d["Ls_normed"], d["Lss_normed"])
@pytest.mark.parametrize(
"arr_type", [np.asarray,
sparse.csr_matrix,
sparse.coo_matrix,
sparse.csr_array,
sparse.coo_array]
)
@pytest.mark.parametrize("dtype", DTYPES)
@pytest.mark.parametrize("normed", [True, False])
@pytest.mark.parametrize("symmetrized", [True, False])
@pytest.mark.parametrize("use_out_degree", [True, False])
@pytest.mark.parametrize("form", ["function", "lo"])
def test_format(dtype, arr_type, normed, symmetrized, use_out_degree, form):
n = 3
mat = [[0, 1, 0], [4, 2, 0], [0, 0, 0]]
mat = arr_type(np.array(mat), dtype=dtype)
Lo, do = csgraph.laplacian(
mat,
return_diag=True,
normed=normed,
symmetrized=symmetrized,
use_out_degree=use_out_degree,
dtype=dtype,
)
La, da = csgraph.laplacian(
mat,
return_diag=True,
normed=normed,
symmetrized=symmetrized,
use_out_degree=use_out_degree,
dtype=dtype,
form="array",
)
assert_allclose(do, da)
_assert_allclose_sparse(Lo, La)
L, d = csgraph.laplacian(
mat,
return_diag=True,
normed=normed,
symmetrized=symmetrized,
use_out_degree=use_out_degree,
dtype=dtype,
form=form,
)
assert_allclose(d, do)
assert d.dtype == dtype
Lm = L(np.eye(n, dtype=mat.dtype)).astype(dtype)
_assert_allclose_sparse(Lm, Lo, rtol=2e-7, atol=2e-7)
x = np.arange(6).reshape(3, 2)
if not (normed and dtype in INT_DTYPES):
assert_allclose(L(x), Lo @ x)
else:
# Normalized Lo is cast to integer, but L() is not
pass
def test_format_error_message():
with pytest.raises(ValueError, match="Invalid form: 'toto'"):
_ = csgraph.laplacian(np.eye(1), form='toto')
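
The dtype behavior that `check_int_type` guards in these tests can be seen directly: with `normed=True` and no explicit `dtype`, an integer graph is promoted to `float64`. The sketch below reuses the asymmetric test matrix with the default in-degree convention; the names are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian

# Integer adjacency matrix; the self-loop at (1, 1) is ignored.
A = np.array([[0, 1, 0],
              [4, 2, 0],
              [0, 0, 0]])
A_before = A.copy()

# Normalization divides by square roots of degrees, so the integer input
# is promoted to float64 when no dtype is requested.
L, d = laplacian(A, normed=True, return_diag=True)

print(L.dtype)
print(L)
print(d)
```

Passing an integer `dtype` explicitly would cast the normalized result back to integers, which is why `test_format` skips the `Lo @ x` comparison in that case.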


@ -0,0 +1,307 @@
from itertools import product
import numpy as np
from numpy.testing import assert_array_equal, assert_equal
import pytest
from scipy.sparse import csr_array, coo_array, diags_array
from scipy.sparse.csgraph import (
maximum_bipartite_matching, min_weight_full_bipartite_matching
)
def test_maximum_bipartite_matching_raises_on_dense_input():
with pytest.raises(TypeError):
graph = np.array([[0, 1], [0, 0]])
maximum_bipartite_matching(graph)
def test_maximum_bipartite_matching_empty_graph():
graph = csr_array((0, 0))
x = maximum_bipartite_matching(graph, perm_type='row')
y = maximum_bipartite_matching(graph, perm_type='column')
expected_matching = np.array([])
assert_array_equal(expected_matching, x)
assert_array_equal(expected_matching, y)
def test_maximum_bipartite_matching_empty_left_partition():
graph = csr_array((2, 0))
x = maximum_bipartite_matching(graph, perm_type='row')
y = maximum_bipartite_matching(graph, perm_type='column')
assert_array_equal(np.array([]), x)
assert_array_equal(np.array([-1, -1]), y)
def test_maximum_bipartite_matching_empty_right_partition():
graph = csr_array((0, 3))
x = maximum_bipartite_matching(graph, perm_type='row')
y = maximum_bipartite_matching(graph, perm_type='column')
assert_array_equal(np.array([-1, -1, -1]), x)
assert_array_equal(np.array([]), y)
def test_maximum_bipartite_matching_graph_with_no_edges():
graph = csr_array((2, 2))
x = maximum_bipartite_matching(graph, perm_type='row')
y = maximum_bipartite_matching(graph, perm_type='column')
assert_array_equal(np.array([-1, -1]), x)
assert_array_equal(np.array([-1, -1]), y)
def test_maximum_bipartite_matching_graph_that_causes_augmentation():
# In this graph, column 1 is initially assigned to row 1, but it should be
# reassigned to make room for row 2.
graph = csr_array([[1, 1], [1, 0]])
x = maximum_bipartite_matching(graph, perm_type='column')
y = maximum_bipartite_matching(graph, perm_type='row')
expected_matching = np.array([1, 0])
assert_array_equal(expected_matching, x)
assert_array_equal(expected_matching, y)
def test_maximum_bipartite_matching_graph_with_more_rows_than_columns():
graph = csr_array([[1, 1], [1, 0], [0, 1]])
x = maximum_bipartite_matching(graph, perm_type='column')
y = maximum_bipartite_matching(graph, perm_type='row')
assert_array_equal(np.array([0, -1, 1]), x)
assert_array_equal(np.array([0, 2]), y)
def test_maximum_bipartite_matching_graph_with_more_columns_than_rows():
graph = csr_array([[1, 1, 0], [0, 0, 1]])
x = maximum_bipartite_matching(graph, perm_type='column')
y = maximum_bipartite_matching(graph, perm_type='row')
assert_array_equal(np.array([0, 2]), x)
assert_array_equal(np.array([0, -1, 1]), y)
def test_maximum_bipartite_matching_explicit_zeros_count_as_edges():
data = [0, 0]
indices = [1, 0]
indptr = [0, 1, 2]
graph = csr_array((data, indices, indptr), shape=(2, 2))
x = maximum_bipartite_matching(graph, perm_type='row')
y = maximum_bipartite_matching(graph, perm_type='column')
expected_matching = np.array([1, 0])
assert_array_equal(expected_matching, x)
assert_array_equal(expected_matching, y)
def test_maximum_bipartite_matching_feasibility_of_result():
# This is a regression test for GitHub issue #11458
data = np.ones(50, dtype=int)
indices = [11, 12, 19, 22, 23, 5, 22, 3, 8, 10, 5, 6, 11, 12, 13, 5, 13,
14, 20, 22, 3, 15, 3, 13, 14, 11, 12, 19, 22, 23, 5, 22, 3, 8,
10, 5, 6, 11, 12, 13, 5, 13, 14, 20, 22, 3, 15, 3, 13, 14]
indptr = [0, 5, 7, 10, 10, 15, 20, 22, 22, 23, 25, 30, 32, 35, 35, 40, 45,
47, 47, 48, 50]
graph = csr_array((data, indices, indptr), shape=(20, 25))
x = maximum_bipartite_matching(graph, perm_type='row')
y = maximum_bipartite_matching(graph, perm_type='column')
assert (x != -1).sum() == 13
assert (y != -1).sum() == 13
# Ensure that each element of the matching is in fact an edge in the graph.
for u, v in zip(range(graph.shape[0]), y):
if v != -1:
assert graph[u, v]
for u, v in zip(x, range(graph.shape[1])):
if u != -1:
assert graph[u, v]
def test_matching_large_random_graph_with_one_edge_incident_to_each_vertex():
np.random.seed(42)
A = diags_array(np.ones(25), offsets=0, format='csr')
rand_perm = np.random.permutation(25)
rand_perm2 = np.random.permutation(25)
Rrow = np.arange(25)
Rcol = rand_perm
Rdata = np.ones(25, dtype=int)
Rmat = csr_array((Rdata, (Rrow, Rcol)))
Crow = rand_perm2
Ccol = np.arange(25)
Cdata = np.ones(25, dtype=int)
Cmat = csr_array((Cdata, (Crow, Ccol)))
# Randomly permute identity matrix
B = Rmat @ A @ Cmat
# Row permute
perm = maximum_bipartite_matching(B, perm_type='row')
Rrow = np.arange(25)
Rcol = perm
Rdata = np.ones(25, dtype=int)
Rmat = csr_array((Rdata, (Rrow, Rcol)))
C1 = Rmat @ B
# Column permute
perm2 = maximum_bipartite_matching(B, perm_type='column')
Crow = perm2
Ccol = np.arange(25)
Cdata = np.ones(25, dtype=int)
Cmat = csr_array((Cdata, (Crow, Ccol)))
C2 = B @ Cmat
# Should get identity matrix back
assert_equal(any(C1.diagonal() == 0), False)
assert_equal(any(C2.diagonal() == 0), False)
@pytest.mark.parametrize('num_rows,num_cols', [(0, 0), (2, 0), (0, 3)])
def test_min_weight_full_matching_trivial_graph(num_rows, num_cols):
biadjacency = csr_array((num_cols, num_rows))
biadjacency1 = coo_array((num_cols, num_rows))
row_ind, col_ind = min_weight_full_bipartite_matching(biadjacency)
assert len(row_ind) == 0
assert len(col_ind) == 0
row_ind1, col_ind1 = min_weight_full_bipartite_matching(biadjacency1)
assert len(row_ind1) == 0
assert len(col_ind1) == 0
@pytest.mark.parametrize('biadjacency',
[
[[1, 1, 1], [1, 0, 0], [1, 0, 0]],
[[1, 1, 1], [0, 0, 1], [0, 0, 1]],
[[1, 0, 0, 1], [1, 1, 0, 1], [0, 0, 0, 0]],
[[1, 0, 0], [2, 0, 0]],
[[0, 1, 0], [0, 2, 0]],
[[1, 0], [2, 0], [5, 0]]
])
def test_min_weight_full_matching_infeasible_problems(biadjacency):
with pytest.raises(ValueError):
min_weight_full_bipartite_matching(csr_array(biadjacency))
with pytest.raises(ValueError):
min_weight_full_bipartite_matching(coo_array(biadjacency))
def test_min_weight_full_matching_large_infeasible():
# Regression test for GitHub issue #17269
a = np.asarray([
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001],
[0.0, 0.11687445, 0.0, 0.0, 0.01319788, 0.07509257, 0.0,
0.0, 0.0, 0.74228317, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.81087935, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.8408466, 0.0, 0.0, 0.0, 0.0, 0.01194389,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.82994211, 0.0, 0.0, 0.0, 0.11468516, 0.0, 0.0, 0.0,
0.11173505, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0],
[0.18796507, 0.0, 0.04002318, 0.0, 0.0, 0.0, 0.0, 0.0, 0.75883335,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.71545464, 0.0, 0.0, 0.0, 0.0, 0.0, 0.02748488,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.78470564, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.14829198,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.10870609, 0.0, 0.0, 0.0, 0.8918677, 0.0, 0.0, 0.0, 0.06306644,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.63844085, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7442354, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09850549, 0.0, 0.0, 0.18638258,
0.2769244, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.73182464, 0.0, 0.0, 0.46443561,
0.38589284, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.29510278, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09666032, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
])
with pytest.raises(ValueError, match='no full matching exists'):
min_weight_full_bipartite_matching(csr_array(a))
with pytest.raises(ValueError, match='no full matching exists'):
min_weight_full_bipartite_matching(coo_array(a))
@pytest.mark.thread_unsafe
def test_explicit_zero_causes_warning():
biadjacency = csr_array(((2, 0, 3), (0, 1, 1), (0, 2, 3)))
with pytest.warns(UserWarning):
min_weight_full_bipartite_matching(biadjacency)
with pytest.warns(UserWarning):
min_weight_full_bipartite_matching(biadjacency.tocoo())
# General test for linear sum assignment solvers to make it possible to rely
# on the same tests for scipy.optimize.linear_sum_assignment.
def linear_sum_assignment_assertions(
solver, array_type, sign, test_case
):
cost_matrix, expected_cost = test_case
maximize = sign == -1
cost_matrix = sign * array_type(cost_matrix)
expected_cost = sign * np.array(expected_cost)
row_ind, col_ind = solver(cost_matrix, maximize=maximize)
assert_array_equal(row_ind, np.sort(row_ind))
assert_array_equal(expected_cost,
np.array(cost_matrix[row_ind, col_ind]).flatten())
cost_matrix = cost_matrix.T
row_ind, col_ind = solver(cost_matrix, maximize=maximize)
assert_array_equal(row_ind, np.sort(row_ind))
assert_array_equal(np.sort(expected_cost),
np.sort(np.array(
cost_matrix[row_ind, col_ind])).flatten())
linear_sum_assignment_test_cases = product(
[-1, 1],
[
# Square
([[400, 150, 400],
[400, 450, 600],
[300, 225, 300]],
[150, 400, 300]),
# Rectangular variant
([[400, 150, 400, 1],
[400, 450, 600, 2],
[300, 225, 300, 3]],
[150, 2, 300]),
# Square
([[10, 10, 8],
[9, 8, 1],
[9, 7, 4]],
[10, 1, 7]),
# Rectangular variant
([[10, 10, 8, 11],
[9, 8, 1, 1],
[9, 7, 4, 10]],
[10, 1, 4]),
# Square, with infinite entries marking forbidden assignments
([[10, float("inf"), float("inf")],
[float("inf"), float("inf"), 1],
[float("inf"), 7, float("inf")]],
[10, 1, 7])
])
@pytest.mark.parametrize('sign,test_case', linear_sum_assignment_test_cases)
def test_min_weight_full_matching_small_inputs(sign, test_case):
linear_sum_assignment_assertions(
min_weight_full_bipartite_matching, csr_array, sign, test_case)
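The expected costs in `linear_sum_assignment_test_cases` can be cross-checked by exhaustive search, since the inputs are tiny. The sketch below is a plain-Python brute force and not the algorithm SciPy uses. `brute_force_assignment` is a hypothetical helper name, and it assumes the cost matrix has no more rows than columns.

```python
from itertools import permutations


def brute_force_assignment(cost):
    """Return (columns, costs) of a minimum-cost row -> column assignment.

    Hypothetical helper for illustration only: it tries every injective
    mapping of rows to columns, so it is usable only on tiny matrices
    with len(cost) <= len(cost[0]).
    """
    n_rows, n_cols = len(cost), len(cost[0])
    best_cols, best_total = None, float("inf")
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if total < best_total:
            best_cols, best_total = cols, total
    return list(best_cols), [cost[r][c] for r, c in enumerate(best_cols)]


# The first two cost matrices from the parametrized cases above.
square = [[400, 150, 400],
          [400, 450, 600],
          [300, 225, 300]]
rectangular = [[400, 150, 400, 1],
               [400, 450, 600, 2],
               [300, 225, 300, 3]]

square_cols, square_costs = brute_force_assignment(square)
rect_cols, rect_costs = brute_force_assignment(rectangular)
```

The per-row costs agree with the expected values listed in the test cases. When several assignments tie, the chosen columns may differ from what a real solver returns, which is why the tests compare costs rather than column indices.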

View file

@ -0,0 +1,197 @@
import pytest
import numpy as np
import scipy.sparse as sp
import scipy.sparse.csgraph as spgraph
from scipy._lib import _pep440
from numpy.testing import assert_equal
try:
import sparse
except Exception:
sparse = None
pytestmark = pytest.mark.skipif(sparse is None,
reason="pydata/sparse not installed")
msg = "pydata/sparse (0.15.1) does not implement necessary operations"
sparse_params = (pytest.param("COO"),
pytest.param("DOK", marks=[pytest.mark.xfail(reason=msg)]))
def check_sparse_version(min_ver):
if sparse is None:
return pytest.mark.skip(reason="sparse is not installed")
return pytest.mark.skipif(
_pep440.parse(sparse.__version__) < _pep440.Version(min_ver),
reason=f"sparse version >= {min_ver} required"
)
@pytest.fixture(params=sparse_params)
def sparse_cls(request):
return getattr(sparse, request.param)
@pytest.fixture
def graphs(sparse_cls):
graph = [
[0, 1, 1, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
]
A_dense = np.array(graph)
A_sparse = sparse_cls(A_dense)
return A_dense, A_sparse
@pytest.mark.parametrize(
"func",
[
spgraph.shortest_path,
spgraph.dijkstra,
spgraph.floyd_warshall,
spgraph.bellman_ford,
spgraph.johnson,
spgraph.reverse_cuthill_mckee,
spgraph.maximum_bipartite_matching,
spgraph.structural_rank,
]
)
def test_csgraph_equiv(func, graphs):
A_dense, A_sparse = graphs
actual = func(A_sparse)
desired = func(sp.csc_array(A_dense))
assert_equal(actual, desired)
def test_connected_components(graphs):
A_dense, A_sparse = graphs
func = spgraph.connected_components
actual_comp, actual_labels = func(A_sparse)
desired_comp, desired_labels, = func(sp.csc_array(A_dense))
assert actual_comp == desired_comp
assert_equal(actual_labels, desired_labels)
def test_laplacian(graphs):
A_dense, A_sparse = graphs
sparse_cls = type(A_sparse)
func = spgraph.laplacian
actual = func(A_sparse)
desired = func(sp.csc_array(A_dense))
assert isinstance(actual, sparse_cls)
assert_equal(actual.todense(), desired.todense())
@pytest.mark.parametrize(
"func", [spgraph.breadth_first_order, spgraph.depth_first_order]
)
def test_order_search(graphs, func):
A_dense, A_sparse = graphs
actual = func(A_sparse, 0)
desired = func(sp.csc_array(A_dense), 0)
assert_equal(actual, desired)
@pytest.mark.parametrize(
"func", [spgraph.breadth_first_tree, spgraph.depth_first_tree]
)
def test_tree_search(graphs, func):
A_dense, A_sparse = graphs
sparse_cls = type(A_sparse)
actual = func(A_sparse, 0)
desired = func(sp.csc_array(A_dense), 0)
assert isinstance(actual, sparse_cls)
assert_equal(actual.todense(), desired.todense())
def test_minimum_spanning_tree(graphs):
A_dense, A_sparse = graphs
sparse_cls = type(A_sparse)
func = spgraph.minimum_spanning_tree
actual = func(A_sparse)
desired = func(sp.csc_array(A_dense))
assert isinstance(actual, sparse_cls)
assert_equal(actual.todense(), desired.todense())
def test_maximum_flow(graphs):
A_dense, A_sparse = graphs
sparse_cls = type(A_sparse)
func = spgraph.maximum_flow
actual = func(A_sparse, 0, 2)
desired = func(sp.csr_array(A_dense), 0, 2)
assert actual.flow_value == desired.flow_value
assert isinstance(actual.flow, sparse_cls)
assert_equal(actual.flow.todense(), desired.flow.todense())
def test_min_weight_full_bipartite_matching(graphs):
A_dense, A_sparse = graphs
func = spgraph.min_weight_full_bipartite_matching
actual = func(A_sparse[0:2, 1:3])
A_csc = sp.csc_array(A_dense)
desired = func(A_csc[0:2, 1:3])
desired1 = func(A_csc[0:2, 1:3].tocoo())
assert_equal(actual, desired)
assert_equal(actual, desired1)
@check_sparse_version("0.15.4")
@pytest.mark.parametrize(
"func",
[
spgraph.shortest_path,
spgraph.dijkstra,
spgraph.floyd_warshall,
spgraph.bellman_ford,
spgraph.johnson,
spgraph.minimum_spanning_tree,
]
)
@pytest.mark.parametrize(
"fill_value, comp_func",
[(np.inf, np.isposinf), (np.nan, np.isnan)],
)
def test_nonzero_fill_value(graphs, func, fill_value, comp_func):
A_dense, A_sparse = graphs
A_sparse = A_sparse.astype(float)
A_sparse.fill_value = fill_value
sparse_cls = type(A_sparse)
actual = func(A_sparse)
desired = func(sp.csc_array(A_dense))
if func == spgraph.minimum_spanning_tree:
assert isinstance(actual, sparse_cls)
assert comp_func(actual.fill_value)
actual = actual.todense()
actual[comp_func(actual)] = 0.0
assert_equal(actual, desired.todense())
else:
assert_equal(actual, desired)

View file

@ -0,0 +1,70 @@
import numpy as np
from numpy.testing import assert_equal
from scipy.sparse.csgraph import reverse_cuthill_mckee, structural_rank
from scipy.sparse import csc_array, csr_array, coo_array
def test_graph_reverse_cuthill_mckee():
A = np.array([[1, 0, 0, 0, 1, 0, 0, 0],
[0, 1, 1, 0, 0, 1, 0, 1],
[0, 1, 1, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 0, 0, 1, 0],
[0, 1, 0, 0, 0, 1, 0, 1]], dtype=int)
graph = csr_array(A)
perm = reverse_cuthill_mckee(graph)
correct_perm = np.array([6, 3, 7, 5, 1, 2, 4, 0])
assert_equal(perm, correct_perm)
# Test int64 indices input
graph.indices = graph.indices.astype('int64')
graph.indptr = graph.indptr.astype('int64')
perm = reverse_cuthill_mckee(graph, True)
assert_equal(perm, correct_perm)
def test_graph_reverse_cuthill_mckee_ordering():
data = np.ones(63,dtype=int)
rows = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2,
2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5,
6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9,
9, 10, 10, 10, 10, 10, 11, 11, 11, 11,
12, 12, 12, 13, 13, 13, 13, 14, 14, 14,
14, 15, 15, 15, 15, 15])
cols = np.array([0, 2, 5, 8, 10, 1, 3, 9, 11, 0, 2,
7, 10, 1, 3, 11, 4, 6, 12, 14, 0, 7, 13,
15, 4, 6, 14, 2, 5, 7, 15, 0, 8, 10, 13,
1, 9, 11, 0, 2, 8, 10, 15, 1, 3, 9, 11,
4, 12, 14, 5, 8, 13, 15, 4, 6, 12, 14,
5, 7, 10, 13, 15])
graph = csr_array((data, (rows,cols)))
perm = reverse_cuthill_mckee(graph)
correct_perm = np.array([12, 14, 4, 6, 10, 8, 2, 15,
0, 13, 7, 5, 9, 11, 1, 3])
assert_equal(perm, correct_perm)
def test_graph_structural_rank():
    # Test square matrix #1
    A = csc_array([[1, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
    assert_equal(structural_rank(A), 3)

    # Test square matrix #2
    rows = np.array([0,0,0,0,0,1,1,2,2,3,3,3,3,3,3,4,4,5,5,6,6,7,7])
    cols = np.array([0,1,2,3,4,2,5,2,6,0,1,3,5,6,7,4,5,5,6,2,6,2,4])
    data = np.ones_like(rows)
    B = coo_array((data, (rows, cols)), shape=(8, 8))
    assert_equal(structural_rank(B), 6)

    # Test non-square matrix
    C = csc_array([[1, 0, 2, 0],
                   [2, 0, 4, 0]])
    assert_equal(structural_rank(C), 2)

    # Test tall matrix
    assert_equal(structural_rank(C.T), 2)
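The structural rank of a matrix equals the size of a maximum matching between its rows and columns, where a row may be matched to a column only if the entry there is nonzero. The sketch below recomputes the three expected ranks with a simple augmenting-path search. `structural_rank_reference` is a hypothetical name used for illustration, and the approach is a textbook method, not necessarily what SciPy does internally.

```python
def structural_rank_reference(pattern):
    """Size of a maximum row/column matching over the nonzero pattern.

    Hypothetical reference helper: `pattern` is a dense list of lists,
    and any nonzero entry counts as an allowed (row, column) pair.
    """
    n_rows = len(pattern)
    n_cols = len(pattern[0]) if n_rows else 0
    row_of_col = [-1] * n_cols  # row currently matched to each column

    def try_augment(row, seen):
        # Look for a free column, or one whose row can be re-matched.
        for col in range(n_cols):
            if pattern[row][col] != 0 and not seen[col]:
                seen[col] = True
                if row_of_col[col] == -1 or try_augment(row_of_col[col], seen):
                    row_of_col[col] = row
                    return True
        return False

    return sum(try_augment(row, [False] * n_cols) for row in range(n_rows))


A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

rows = [0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
cols = [0, 1, 2, 3, 4, 2, 5, 2, 6, 0, 1, 3, 5, 6, 7, 4, 5, 5, 6, 2, 6, 2, 4]
B = [[0] * 8 for _ in range(8)]
for r, c in zip(rows, cols):
    B[r][c] = 1

C = [[1, 0, 2, 0],
     [2, 0, 4, 0]]
C_T = [list(column) for column in zip(*C)]

rank_A = structural_rank_reference(A)
rank_B = structural_rank_reference(B)
rank_C = structural_rank_reference(C)
rank_C_T = structural_rank_reference(C_T)
```

In `B`, rows 1, 2, 4, 5, 6 and 7 only touch columns 2, 4, 5 and 6, so at most four of those six rows can be matched. Together with rows 0 and 3 this gives the rank of 6 that the test expects.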

View file

@ -0,0 +1,540 @@
from io import StringIO
import warnings
import numpy as np
from numpy.testing import assert_array_almost_equal, assert_array_equal, assert_allclose
from pytest import raises as assert_raises
from scipy.sparse.csgraph import (shortest_path, dijkstra, johnson,
bellman_ford, construct_dist_matrix, yen,
NegativeCycleError)
import scipy.sparse
from scipy.io import mmread
import pytest
directed_G = np.array([[0, 3, 3, 0, 0],
[0, 0, 0, 2, 4],
[0, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[2, 0, 0, 2, 0]], dtype=float)
# Undirected version of directed_G
undirected_G = np.array([[0, 3, 3, 1, 2],
[3, 0, 0, 2, 4],
[3, 0, 0, 0, 0],
[1, 2, 0, 0, 2],
[2, 4, 0, 2, 0]], dtype=float)
unweighted_G = (directed_G > 0).astype(float)
# Correct shortest path lengths for directed_G and undirected_G
directed_SP = [[0, 3, 3, 5, 7],
[3, 0, 6, 2, 4],
[np.inf, np.inf, 0, np.inf, np.inf],
[1, 4, 4, 0, 8],
[2, 5, 5, 2, 0]]
directed_2SP_0_to_3 = [[-9999, 0, -9999, 1, -9999],
[-9999, 0, -9999, 4, 1]]
undirected_SP = np.array([[0, 3, 3, 1, 2],
[3, 0, 6, 2, 4],
[3, 6, 0, 4, 5],
[1, 2, 4, 0, 2],
[2, 4, 5, 2, 0]], dtype=float)
undirected_SP_limit_2 = np.array([[0, np.inf, np.inf, 1, 2],
[np.inf, 0, np.inf, 2, np.inf],
[np.inf, np.inf, 0, np.inf, np.inf],
[1, 2, np.inf, 0, 2],
[2, np.inf, np.inf, 2, 0]], dtype=float)
undirected_SP_limit_0 = np.ones((5, 5), dtype=float) - np.eye(5)
undirected_SP_limit_0[undirected_SP_limit_0 > 0] = np.inf
# Correct predecessors for directed_G and undirected_G
directed_pred = np.array([[-9999, 0, 0, 1, 1],
[3, -9999, 0, 1, 1],
[-9999, -9999, -9999, -9999, -9999],
[3, 0, 0, -9999, 1],
[4, 0, 0, 4, -9999]], dtype=float)
undirected_pred = np.array([[-9999, 0, 0, 0, 0],
[1, -9999, 0, 1, 1],
[2, 0, -9999, 0, 0],
[3, 3, 0, -9999, 3],
[4, 4, 0, 4, -9999]], dtype=float)
# Other graphs
directed_sparse_zero_G = scipy.sparse.csr_array(
(
[0, 1, 2, 3, 1],
([0, 1, 2, 3, 4], [1, 2, 0, 4, 3]),
),
shape=(5, 5),
)
directed_sparse_zero_SP = [[0, 0, 1, np.inf, np.inf],
[3, 0, 1, np.inf, np.inf],
[2, 2, 0, np.inf, np.inf],
[np.inf, np.inf, np.inf, 0, 3],
[np.inf, np.inf, np.inf, 1, 0]]
undirected_sparse_zero_G = scipy.sparse.csr_array(
(
[0, 0, 1, 1, 2, 2, 1, 1],
([0, 1, 1, 2, 2, 0, 3, 4], [1, 0, 2, 1, 0, 2, 4, 3])
),
shape=(5, 5),
)
undirected_sparse_zero_SP = [[0, 0, 1, np.inf, np.inf],
[0, 0, 1, np.inf, np.inf],
[1, 1, 0, np.inf, np.inf],
[np.inf, np.inf, np.inf, 0, 1],
[np.inf, np.inf, np.inf, 1, 0]]
directed_negative_weighted_G = np.array([[0, 0, 0],
[-1, 0, 0],
[0, -1, 0]], dtype=float)
directed_negative_weighted_SP = np.array([[0, np.inf, np.inf],
[-1, 0, np.inf],
[-2, -1, 0]], dtype=float)
methods = ['auto', 'FW', 'D', 'BF', 'J']
def test_dijkstra_limit():
limits = [0, 2, np.inf]
results = [undirected_SP_limit_0,
undirected_SP_limit_2,
undirected_SP]
def check(limit, result):
SP = dijkstra(undirected_G, directed=False, limit=limit)
assert_array_almost_equal(SP, result)
for limit, result in zip(limits, results):
check(limit, result)
def test_directed():
def check(method):
SP = shortest_path(directed_G, method=method, directed=True,
overwrite=False)
assert_array_almost_equal(SP, directed_SP)
for method in methods:
check(method)
def test_undirected():
def check(method, directed_in):
if directed_in:
SP1 = shortest_path(directed_G, method=method, directed=False,
overwrite=False)
assert_array_almost_equal(SP1, undirected_SP)
else:
SP2 = shortest_path(undirected_G, method=method, directed=True,
overwrite=False)
assert_array_almost_equal(SP2, undirected_SP)
for method in methods:
for directed_in in (True, False):
check(method, directed_in)
def test_directed_sparse_zero():
# test directed sparse graph with zero-weight edge and two connected components
def check(method):
SP = shortest_path(directed_sparse_zero_G, method=method, directed=True,
overwrite=False)
assert_array_almost_equal(SP, directed_sparse_zero_SP)
for method in methods:
check(method)
def test_undirected_sparse_zero():
def check(method, directed_in):
if directed_in:
SP1 = shortest_path(directed_sparse_zero_G, method=method, directed=False,
overwrite=False)
assert_array_almost_equal(SP1, undirected_sparse_zero_SP)
else:
SP2 = shortest_path(undirected_sparse_zero_G, method=method, directed=True,
overwrite=False)
assert_array_almost_equal(SP2, undirected_sparse_zero_SP)
for method in methods:
for directed_in in (True, False):
check(method, directed_in)
@pytest.mark.parametrize('directed, SP_ans',
((True, directed_SP),
(False, undirected_SP)))
@pytest.mark.parametrize('indices', ([0, 2, 4], [0, 4], [3, 4], [0, 0]))
def test_dijkstra_indices_min_only(directed, SP_ans, indices):
SP_ans = np.array(SP_ans)
indices = np.array(indices, dtype=np.int64)
min_ind_ans = indices[np.argmin(SP_ans[indices, :], axis=0)]
min_d_ans = np.zeros(SP_ans.shape[0], SP_ans.dtype)
for k in range(SP_ans.shape[0]):
min_d_ans[k] = SP_ans[min_ind_ans[k], k]
min_ind_ans[np.isinf(min_d_ans)] = -9999
SP, pred, sources = dijkstra(directed_G,
directed=directed,
indices=indices,
min_only=True,
return_predecessors=True)
assert_array_almost_equal(SP, min_d_ans)
assert_array_equal(min_ind_ans, sources)
SP = dijkstra(directed_G,
directed=directed,
indices=indices,
min_only=True,
return_predecessors=False)
assert_array_almost_equal(SP, min_d_ans)
@pytest.mark.parametrize('n', (10, 100, 1000))
def test_dijkstra_min_only_random(n):
rng = np.random.default_rng(7345782358920239234)
data = scipy.sparse.random_array((n, n), density=0.5, format='lil',
rng=rng, dtype=np.float64)
data.setdiag(np.zeros(n, dtype=np.bool_))
# choose some random vertices
v = np.arange(n)
rng.shuffle(v)
indices = v[:int(n*.1)]
ds, pred, sources = dijkstra(data,
directed=True,
indices=indices,
min_only=True,
return_predecessors=True)
for k in range(n):
p = pred[k]
s = sources[k]
while p != -9999:
assert sources[p] == s
p = pred[p]
@pytest.mark.parametrize('n', (10, 100))
@pytest.mark.parametrize("method", ['FW', 'J', 'BF'])
@pytest.mark.parametrize('directed', (True, False))
def test_star_graph(n, method, directed):
# Build the star graph
star_arr = np.zeros((n, n), dtype=float)
star_center_idx = 0
star_arr[star_center_idx, :] = star_arr[:, star_center_idx] = range(n)
G = scipy.sparse.csr_matrix(star_arr, shape=(n, n))
# Build the distances matrix
SP_solution = np.zeros((n, n), dtype=float)
SP_solution[:] = star_arr[star_center_idx]
for idx in range(1, n):
SP_solution[idx] += star_arr[idx, star_center_idx]
np.fill_diagonal(SP_solution, 0)
SP = shortest_path(G, method=method, directed=directed)
assert_allclose(
SP_solution, SP
)
def test_dijkstra_random():
# reproduces the hang observed in gh-17782
n = 10
indices = [0, 4, 4, 5, 7, 9, 0, 6, 2, 3, 7, 9, 1, 2, 9, 2, 5, 6]
indptr = [0, 0, 2, 5, 6, 7, 8, 12, 15, 18, 18]
data = [0.33629, 0.40458, 0.47493, 0.42757, 0.11497, 0.91653, 0.69084,
0.64979, 0.62555, 0.743, 0.01724, 0.99945, 0.31095, 0.15557,
0.02439, 0.65814, 0.23478, 0.24072]
graph = scipy.sparse.csr_array((data, indices, indptr), shape=(n, n))
dijkstra(graph, directed=True, return_predecessors=True)
def test_gh_17782_segfault():
text = """%%MatrixMarket matrix coordinate real general
84 84 22
2 1 4.699999809265137e+00
6 14 1.199999973177910e-01
9 6 1.199999973177910e-01
10 16 2.012000083923340e+01
11 10 1.422000026702881e+01
12 1 9.645999908447266e+01
13 18 2.012000083923340e+01
14 13 4.679999828338623e+00
15 11 1.199999973177910e-01
16 12 1.199999973177910e-01
18 15 1.199999973177910e-01
32 2 2.299999952316284e+00
33 20 6.000000000000000e+00
33 32 5.000000000000000e+00
36 9 3.720000028610229e+00
36 37 3.720000028610229e+00
36 38 3.720000028610229e+00
37 44 8.159999847412109e+00
38 32 7.903999328613281e+01
43 20 2.400000000000000e+01
43 33 4.000000000000000e+00
44 43 6.028000259399414e+01
"""
data = mmread(StringIO(text), spmatrix=False)
dijkstra(data, directed=True, return_predecessors=True)
def test_shortest_path_indices():
indices = np.arange(4)
def check(func, indshape):
outshape = indshape + (5,)
SP = func(directed_G, directed=False,
indices=indices.reshape(indshape))
assert_array_almost_equal(SP, undirected_SP[indices].reshape(outshape))
for indshape in [(4,), (4, 1), (2, 2)]:
for func in (dijkstra, bellman_ford, johnson, shortest_path):
check(func, indshape)
assert_raises(ValueError, shortest_path, directed_G, method='FW',
indices=indices)
def test_predecessors():
SP_res = {True: directed_SP,
False: undirected_SP}
pred_res = {True: directed_pred,
False: undirected_pred}
def check(method, directed):
SP, pred = shortest_path(directed_G, method, directed=directed,
overwrite=False,
return_predecessors=True)
assert_array_almost_equal(SP, SP_res[directed])
assert_array_almost_equal(pred, pred_res[directed])
for method in methods:
for directed in (True, False):
check(method, directed)
def test_construct_shortest_path():
    def check(method, directed):
        # Pass `method` through so that every algorithm is exercised.
        SP1, pred = shortest_path(directed_G,
                                  method=method,
                                  directed=directed,
                                  overwrite=False,
                                  return_predecessors=True)
        SP2 = construct_dist_matrix(directed_G, pred, directed=directed)
        assert_array_almost_equal(SP1, SP2)

    for method in methods:
        for directed in (True, False):
            check(method, directed)
@pytest.mark.parametrize("directed", [True, False])
def test_construct_dist_matrix_predecessors_error(directed):
SP1, pred = shortest_path(directed_G,
directed=directed,
overwrite=False,
return_predecessors=True)
assert_raises(TypeError, construct_dist_matrix,
directed_G, pred.astype(np.int64), directed)
def test_unweighted_path():
    def check(method, directed):
        # Pass `method` through so that every algorithm is exercised.
        SP1 = shortest_path(directed_G,
                            method=method,
                            directed=directed,
                            overwrite=False,
                            unweighted=True)
        SP2 = shortest_path(unweighted_G,
                            method=method,
                            directed=directed,
                            overwrite=False,
                            unweighted=False)
        assert_array_almost_equal(SP1, SP2)

    for method in methods:
        for directed in (True, False):
            check(method, directed)
def test_negative_cycles():
# create a small graph with a negative cycle
graph = np.ones([5, 5])
graph.flat[::6] = 0
graph[1, 2] = -2
def check(method, directed):
assert_raises(NegativeCycleError, shortest_path, graph, method,
directed)
for directed in (True, False):
for method in ['FW', 'J', 'BF']:
check(method, directed)
assert_raises(NegativeCycleError, yen, graph, 0, 1, 1,
directed=directed)
@pytest.mark.parametrize("method", ['FW', 'J', 'BF'])
def test_negative_weights(method):
SP = shortest_path(directed_negative_weighted_G, method, directed=True)
assert_allclose(SP, directed_negative_weighted_SP, atol=1e-10)
def test_masked_input():
    # Mask the zero entries so that the masked array is what gets tested.
    G = np.ma.masked_equal(directed_G, 0)

    def check(method):
        SP = shortest_path(G, method=method, directed=True,
                           overwrite=False)
        assert_array_almost_equal(SP, directed_SP)

    for method in methods:
        check(method)
def test_overwrite():
G = np.array([[0, 3, 3, 1, 2],
[3, 0, 0, 2, 4],
[3, 0, 0, 0, 0],
[1, 2, 0, 0, 2],
[2, 4, 0, 2, 0]], dtype=float)
foo = G.copy()
shortest_path(foo, overwrite=False)
assert_array_equal(foo, G)
@pytest.mark.parametrize('method', methods)
def test_buffer(method):
# Smoke test that sparse matrices with read-only buffers (e.g., those from
# joblib workers) do not cause::
#
# ValueError: buffer source array is read-only
#
G = scipy.sparse.csr_array([[1.]])
G.data.flags['WRITEABLE'] = False
shortest_path(G, method=method)
def test_NaN_warnings():
with warnings.catch_warnings(record=True) as record:
shortest_path(np.array([[0, 1], [np.nan, 0]]))
for r in record:
assert r.category is not RuntimeWarning
def test_sparse_matrices():
# Test that LIL, CSR and CSC sparse inputs do not cause errors
G_dense = np.array([[0, 3, 0, 0, 0],
[0, 0, -1, 0, 0],
[0, 0, 0, 2, 0],
[0, 0, 0, 0, 4],
[0, 0, 0, 0, 0]], dtype=float)
SP = shortest_path(G_dense)
G_csr = scipy.sparse.csr_array(G_dense)
G_csc = scipy.sparse.csc_array(G_dense)
G_lil = scipy.sparse.lil_array(G_dense)
assert_array_almost_equal(SP, shortest_path(G_csr))
assert_array_almost_equal(SP, shortest_path(G_csc))
assert_array_almost_equal(SP, shortest_path(G_lil))
def test_yen_directed():
distances, predecessors = yen(
directed_G,
source=0,
sink=3,
K=2,
return_predecessors=True
)
assert_allclose(distances, [5., 9.])
assert_allclose(predecessors, directed_2SP_0_to_3)
def test_yen_dense():
dense_undirected_G = np.array([
[0, 3, 3, 1, 2],
[3, 0, 7, 6, 5],
[3, 7, 0, 4, 0],
[1, 6, 4, 0, 2],
[2, 5, 0, 2, 0]], dtype=float)
distances = yen(
dense_undirected_G,
source=0,
sink=4,
K=5,
directed=False,
)
assert_allclose(distances, [2., 3., 8., 9., 11.])
def test_yen_undirected():
distances = yen(
undirected_G,
source=0,
sink=3,
K=4,
directed=False,
)
assert_allclose(distances, [1., 4., 5., 8.])
def test_yen_unweighted():
# Ask for more paths than there are, verify only the available paths are returned
distances, predecessors = yen(
directed_G,
source=0,
sink=3,
K=4,
unweighted=True,
return_predecessors=True,
)
assert_allclose(distances, [2., 3.])
assert_allclose(predecessors, directed_2SP_0_to_3)
def test_yen_no_paths():
distances = yen(
directed_G,
source=2,
sink=3,
K=1,
)
assert distances.size == 0
def test_yen_negative_weights():
distances = yen(
directed_negative_weighted_G,
source=2,
sink=0,
K=1,
)
assert_allclose(distances, [-2.])
@pytest.mark.parametrize("min_only", (True, False))
@pytest.mark.parametrize("directed", (True, False))
@pytest.mark.parametrize("return_predecessors", (True, False))
@pytest.mark.parametrize("index_dtype", (np.int32, np.int64))
@pytest.mark.parametrize("indices", (None, [1]))
def test_20904(min_only, directed, return_predecessors, index_dtype, indices):
"""Test two failures from gh-20904: int32 and indices-as-None."""
adj_mat = scipy.sparse.eye_array(4, format="csr")
adj_mat = scipy.sparse.csr_array(
(
adj_mat.data,
adj_mat.indices.astype(index_dtype),
adj_mat.indptr.astype(index_dtype),
),
)
dijkstra(
adj_mat,
directed,
indices=indices,
min_only=min_only,
return_predecessors=return_predecessors,
)
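The `directed_SP` table used throughout this file can be reproduced with a textbook Dijkstra search. The sketch below is a plain-Python reference, not SciPy's implementation. `dijkstra_reference` is a hypothetical name, and it assumes non-negative weights and the dense convention that a zero entry means "no edge" (so explicit zero-weight edges, as in `directed_sparse_zero_G`, cannot be expressed this way).

```python
import heapq
from math import inf


def dijkstra_reference(dense_graph, source):
    """Shortest distances from `source` in a dense, non-negative graph.

    Hypothetical reference helper: zero entries are treated as missing
    edges, and unreachable nodes keep a distance of infinity.
    """
    n = len(dense_graph)
    dist = [inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in enumerate(dense_graph[u]):
            if w != 0 and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


# Same edge weights as `directed_G` above.
directed_G_list = [[0, 3, 3, 0, 0],
                   [0, 0, 0, 2, 4],
                   [0, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0],
                   [2, 0, 0, 2, 0]]

all_pairs = [dijkstra_reference(directed_G_list, s) for s in range(5)]
```

Row 2 of the result is infinite everywhere except on the diagonal because node 2 has no outgoing edges, which matches the third row of `directed_SP`.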

View file

@ -0,0 +1,66 @@
"""Test the minimum spanning tree function"""
import numpy as np
from numpy.testing import assert_
import numpy.testing as npt
from scipy.sparse import csr_array
from scipy.sparse.csgraph import minimum_spanning_tree
def test_minimum_spanning_tree():
# Create a graph with two connected components.
graph = [[0,1,0,0,0],
[1,0,0,0,0],
[0,0,0,8,5],
[0,0,8,0,1],
[0,0,5,1,0]]
graph = np.asarray(graph)
# Create the expected spanning tree.
expected = [[0,1,0,0,0],
[0,0,0,0,0],
[0,0,0,0,5],
[0,0,0,0,1],
[0,0,0,0,0]]
expected = np.asarray(expected)
# Ensure minimum spanning tree code gives this expected output.
csgraph = csr_array(graph)
mintree = minimum_spanning_tree(csgraph)
mintree_array = mintree.toarray()
npt.assert_array_equal(mintree_array, expected,
'Incorrect spanning tree found.')
# Ensure that the original graph was not modified.
npt.assert_array_equal(csgraph.toarray(), graph,
'Original graph was modified.')
# Now let the algorithm modify the csgraph in place.
mintree = minimum_spanning_tree(csgraph, overwrite=True)
npt.assert_array_equal(mintree.toarray(), expected,
'Graph was not properly modified to contain MST.')
np.random.seed(1234)
for N in (5, 10, 15, 20):
# Create a random graph.
graph = 3 + np.random.random((N, N))
csgraph = csr_array(graph)
# The spanning tree has at most N - 1 edges.
mintree = minimum_spanning_tree(csgraph)
assert_(mintree.nnz < N)
# Set the superdiagonal to 1 to create a known spanning tree.
idx = np.arange(N-1)
graph[idx,idx+1] = 1
csgraph = csr_array(graph)
mintree = minimum_spanning_tree(csgraph)
# We expect exactly this pattern in the spanning tree, with zeros
# everywhere else.
expected = np.zeros((N, N))
expected[idx, idx+1] = 1
npt.assert_array_equal(mintree.toarray(), expected,
'Incorrect spanning tree found.')
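The expected tree in the first part of this test can be checked by hand with Kruskal's algorithm. The sketch below is a plain-Python reference, not SciPy's implementation. `kruskal_reference` is a hypothetical name, and it assumes a symmetric dense graph in which zero means "no edge", so only the upper triangle is read.

```python
def kruskal_reference(dense_graph):
    """Return the (i, j, weight) edges of a minimum spanning forest.

    Hypothetical reference helper: reads the upper triangle of a
    symmetric dense graph and treats zero entries as missing edges.
    """
    n = len(dense_graph)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (dense_graph[i][j], i, j)
        for i in range(n) for j in range(i + 1, n)
        if dense_graph[i][j] != 0
    )
    tree = []
    for w, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two different components
            parent[root_i] = root_j
            tree.append((i, j, w))
    return tree


graph = [[0, 1, 0, 0, 0],
         [1, 0, 0, 0, 0],
         [0, 0, 0, 8, 5],
         [0, 0, 8, 0, 1],
         [0, 0, 5, 1, 0]]

mst_edges = kruskal_reference(graph)
```

The graph has five nodes in two connected components, so the spanning forest has three edges. They are the same three nonzero entries that appear in `expected` above.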

View file

@ -0,0 +1,148 @@
import numpy as np
import pytest
from numpy.testing import assert_array_almost_equal
from scipy.sparse import csr_array, csr_matrix, coo_array, coo_matrix
from scipy.sparse.csgraph import (breadth_first_tree, depth_first_tree,
csgraph_to_dense, csgraph_from_dense, csgraph_masked_from_dense)
def test_graph_breadth_first():
csgraph = np.array([[0, 1, 2, 0, 0],
[1, 0, 0, 0, 3],
[2, 0, 0, 7, 0],
[0, 0, 7, 0, 1],
[0, 3, 0, 1, 0]])
csgraph = csgraph_from_dense(csgraph, null_value=0)
bfirst = np.array([[0, 1, 2, 0, 0],
[0, 0, 0, 0, 3],
[0, 0, 0, 7, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
for directed in [True, False]:
bfirst_test = breadth_first_tree(csgraph, 0, directed)
assert_array_almost_equal(csgraph_to_dense(bfirst_test),
bfirst)
def test_graph_depth_first():
csgraph = np.array([[0, 1, 2, 0, 0],
[1, 0, 0, 0, 3],
[2, 0, 0, 7, 0],
[0, 0, 7, 0, 1],
[0, 3, 0, 1, 0]])
csgraph = csgraph_from_dense(csgraph, null_value=0)
dfirst = np.array([[0, 1, 0, 0, 0],
[0, 0, 0, 0, 3],
[0, 0, 0, 0, 0],
[0, 0, 7, 0, 0],
[0, 0, 0, 1, 0]])
for directed in [True, False]:
dfirst_test = depth_first_tree(csgraph, 0, directed)
assert_array_almost_equal(csgraph_to_dense(dfirst_test), dfirst)
def test_return_type():
from .._laplacian import laplacian
from .._min_spanning_tree import minimum_spanning_tree
np_csgraph = np.array([[0, 1, 2, 0, 0],
[1, 0, 0, 0, 3],
[2, 0, 0, 7, 0],
[0, 0, 7, 0, 1],
[0, 3, 0, 1, 0]])
csgraph = csr_array(np_csgraph)
assert isinstance(laplacian(csgraph), coo_array)
assert isinstance(minimum_spanning_tree(csgraph), csr_array)
for directed in [True, False]:
assert isinstance(depth_first_tree(csgraph, 0, directed), csr_array)
assert isinstance(breadth_first_tree(csgraph, 0, directed), csr_array)
csgraph = csgraph_from_dense(np_csgraph, null_value=0)
assert isinstance(csgraph, csr_array)
assert isinstance(laplacian(csgraph), coo_array)
assert isinstance(minimum_spanning_tree(csgraph), csr_array)
for directed in [True, False]:
assert isinstance(depth_first_tree(csgraph, 0, directed), csr_array)
assert isinstance(breadth_first_tree(csgraph, 0, directed), csr_array)
csgraph = csgraph_masked_from_dense(np_csgraph, null_value=0)
assert isinstance(csgraph, np.ma.MaskedArray)
assert csgraph._baseclass is np.ndarray
# laplacian doesn't work with masked arrays so not here
assert isinstance(minimum_spanning_tree(csgraph), csr_array)
for directed in [True, False]:
assert isinstance(depth_first_tree(csgraph, 0, directed), csr_array)
assert isinstance(breadth_first_tree(csgraph, 0, directed), csr_array)
# start of testing with matrix/spmatrix types
with np.testing.suppress_warnings() as sup:
sup.filter(DeprecationWarning, "the matrix subclass.*")
sup.filter(PendingDeprecationWarning, "the matrix subclass.*")
nm_csgraph = np.matrix([[0, 1, 2, 0, 0],
[1, 0, 0, 0, 3],
[2, 0, 0, 7, 0],
[0, 0, 7, 0, 1],
[0, 3, 0, 1, 0]])
csgraph = csr_matrix(nm_csgraph)
assert isinstance(laplacian(csgraph), coo_matrix)
assert isinstance(minimum_spanning_tree(csgraph), csr_matrix)
for directed in [True, False]:
assert isinstance(depth_first_tree(csgraph, 0, directed), csr_matrix)
assert isinstance(breadth_first_tree(csgraph, 0, directed), csr_matrix)
csgraph = csgraph_from_dense(nm_csgraph, null_value=0)
assert isinstance(csgraph, csr_matrix)
assert isinstance(laplacian(csgraph), coo_matrix)
assert isinstance(minimum_spanning_tree(csgraph), csr_matrix)
for directed in [True, False]:
assert isinstance(depth_first_tree(csgraph, 0, directed), csr_matrix)
assert isinstance(breadth_first_tree(csgraph, 0, directed), csr_matrix)
mm_csgraph = csgraph_masked_from_dense(nm_csgraph, null_value=0)
assert isinstance(mm_csgraph, np.ma.MaskedArray)
# laplacian doesn't work with masked arrays so not here
assert isinstance(minimum_spanning_tree(csgraph), csr_matrix)
for directed in [True, False]:
assert isinstance(depth_first_tree(csgraph, 0, directed), csr_matrix)
assert isinstance(breadth_first_tree(csgraph, 0, directed), csr_matrix)
# end of testing with matrix/spmatrix types
def test_graph_breadth_first_trivial_graph():
csgraph = np.array([[0]])
csgraph = csgraph_from_dense(csgraph, null_value=0)
bfirst = np.array([[0]])
for directed in [True, False]:
bfirst_test = breadth_first_tree(csgraph, 0, directed)
assert_array_almost_equal(csgraph_to_dense(bfirst_test), bfirst)
def test_graph_depth_first_trivial_graph():
    csgraph = np.array([[0]])
    csgraph = csgraph_from_dense(csgraph, null_value=0)

    dfirst = np.array([[0]])

    for directed in [True, False]:
        dfirst_test = depth_first_tree(csgraph, 0, directed)
        assert_array_almost_equal(csgraph_to_dense(dfirst_test),
                                  dfirst)
@pytest.mark.parametrize('directed', [True, False])
@pytest.mark.parametrize('tree_func', [breadth_first_tree, depth_first_tree])
def test_int64_indices(tree_func, directed):
# See https://github.com/scipy/scipy/issues/18716
g = csr_array(([1], np.array([[0], [1]], dtype=np.int64)), shape=(2, 2))
assert g.indices.dtype == np.int64
tree = tree_func(g, 0, directed=directed)
assert_array_almost_equal(csgraph_to_dense(tree), [[0, 1], [0, 0]])
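The `bfirst` and `dfirst` matrices in the tests above can be reproduced with a small reference traversal. The sketch below is plain Python, not SciPy's implementation. `bfs_tree_reference` and `dfs_tree_reference` are hypothetical names, and visiting neighbours in increasing index order is an assumption that happens to match the expected outputs for this graph.

```python
from collections import deque


def bfs_tree_reference(dense_graph, start):
    """Dense breadth-first tree; zero entries mean 'no edge'."""
    n = len(dense_graph)
    tree = [[0] * n for _ in range(n)]
    visited = [False] * n
    visited[start] = True
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if dense_graph[u][v] != 0 and not visited[v]:
                visited[v] = True
                tree[u][v] = dense_graph[u][v]  # keep the edge u -> v
                queue.append(v)
    return tree


def dfs_tree_reference(dense_graph, start):
    """Dense depth-first tree; zero entries mean 'no edge'."""
    n = len(dense_graph)
    tree = [[0] * n for _ in range(n)]
    visited = [False] * n

    def visit(u):
        visited[u] = True
        for v in range(n):
            if dense_graph[u][v] != 0 and not visited[v]:
                tree[u][v] = dense_graph[u][v]  # keep the edge u -> v
                visit(v)

    visit(start)
    return tree


graph = [[0, 1, 2, 0, 0],
         [1, 0, 0, 0, 3],
         [2, 0, 0, 7, 0],
         [0, 0, 7, 0, 1],
         [0, 3, 0, 1, 0]]

bfs_tree = bfs_tree_reference(graph, 0)
dfs_tree = dfs_tree_reference(graph, 0)
```

Breadth-first search reaches nodes 1 and 2 directly from node 0, while depth-first search follows the single chain 0, 1, 4, 3, 2, which is why the two trees differ.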

View file

@ -0,0 +1,22 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.

from scipy._lib.deprecation import _sub_module_deprecation


__all__ = [  # noqa: F822
    'csr_matrix',
    'isspmatrix_csr',
    'spmatrix',
]


def __dir__():
    return __all__


def __getattr__(name):
    return _sub_module_deprecation(sub_package="sparse", module="csr",
                                   private_modules=["_csr"], all=__all__,
                                   attribute=name)

View file

@ -0,0 +1,18 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.

from scipy._lib.deprecation import _sub_module_deprecation


__all__: list[str] = []


def __dir__():
    return __all__


def __getattr__(name):
    return _sub_module_deprecation(sub_package="sparse", module="data",
                                   private_modules=["_data"], all=__all__,
                                   attribute=name)

@ -0,0 +1,22 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.

from scipy._lib.deprecation import _sub_module_deprecation


__all__ = [  # noqa: F822
    'dia_matrix',
    'isspmatrix_dia',
    'spmatrix',
]


def __dir__():
    return __all__


def __getattr__(name):
    return _sub_module_deprecation(sub_package="sparse", module="dia",
                                   private_modules=["_dia"], all=__all__,
                                   attribute=name)

@ -0,0 +1,22 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.

from scipy._lib.deprecation import _sub_module_deprecation


__all__ = [  # noqa: F822
    'dok_matrix',
    'isspmatrix_dok',
    'spmatrix',
]


def __dir__():
    return __all__


def __getattr__(name):
    return _sub_module_deprecation(sub_package="sparse", module="dok",
                                   private_modules=["_dok"], all=__all__,
                                   attribute=name)

@ -0,0 +1,23 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.

from scipy._lib.deprecation import _sub_module_deprecation


__all__ = [  # noqa: F822
    'coo_matrix',
    'find',
    'tril',
    'triu',
]


def __dir__():
    return __all__


def __getattr__(name):
    return _sub_module_deprecation(sub_package="sparse", module="extract",
                                   private_modules=["_extract"], all=__all__,
                                   attribute=name)

@ -0,0 +1,22 @@
# This file is not meant for public use and will be removed in SciPy v2.0.0.
# Use the `scipy.sparse` namespace for importing the functions
# included below.

from scipy._lib.deprecation import _sub_module_deprecation


__all__ = [  # noqa: F822
    'isspmatrix_lil',
    'lil_array',
    'lil_matrix',
]


def __dir__():
    return __all__


def __getattr__(name):
    return _sub_module_deprecation(sub_package="sparse", module="lil",
                                   private_modules=["_lil"], all=__all__,
                                   attribute=name)

Some files were not shown because too many files have changed in this diff.