Let’s say we want to compress a sparse diagonal (or symmetric matrix). Here’s how scipy.sparse (and IBM) approach this, in standard DIAgonal format:

```
>>> from scipy.sparse import dia_matrix
>>> data = np.array([[1,0,0,5], [0,2,0,0], [8,0,3,0], [6,8,0,4]])
>>> data
array([[1, 0, 0, 5],
[0, 2, 0, 0],
[8, 0, 3, 0],
[6, 8, 0, 4]])
>>> m = dia_matrix(data)
>>> m
<4x4 sparse matrix of type '<class 'numpy.int64'>'
with 8 stored elements (4 diagonals) in DIAgonal format>
>>> m.data # ...usually labeled AD
array([[6, 0, 0, 0],
[8, 8, 0, 0],
[1, 2, 3, 4],
[0, 0, 0, 5]])
>>> m.offsets # ...usually labeled LA
array([-3, -2, 0, 3], dtype=int32)
```

The offsets in this case refer to the diagonal index of each band. In this case, we have:

```
array([[ 0, 1, 2, 3],
[-1, 0, 1, 2],
[-2, -1, 0, 1],
[-3, -2, -1, 0]])
```

Note however that this approach pads the bands with zeros. I’m not exactly sure why; perhaps to simplify indexing or provide a standard matrix representation. However, this requires additional space that is effectively wasted. What if, instead, we use a single dimension and provide offsets, i.e.,

```
>>> from my_scipy.sparse import dia_matrix_v2
>>> data = np.array([[1,0,0,5], [0,2,0,0], [8,0,3,0], [6,8,0,4]])
>>> data
array([[1, 0, 0, 5],
[0, 2, 0, 0],
[8, 0, 3, 0],
[6, 8, 0, 4]])
>>> m = dia_matrix_v2(data)
>>> m.data
array([6, 8, 8, 1, 2, 3, 4, 5])
>>> m.offsets
{-3: 0, -2: 1, 0: 3, 3: 7}
```

We’ve provided a mapping from its diagonal index to its offset in the data array. Moreover, we can determine the end of the slice implicitly from the diagonal index. Given an $N \times N$ matrix and a diagonal index $d_i$, the length is $N - \text{abs}(d_i)$. More generally, given a data array $AD$ and a offset mapping $LA$ for an $N \times N$ matrix, the diagonal at $d_i$ is bounded in $AD$ by:

$$ \Bigl[LA[d_i], LA[d_i] + N - \text{abs}(d_i)\Bigr) $$

For example, in the array above, the diagonal at diagonal index 0 is bounded by:

$$ \Bigl[LA[d_i], LA[d_i] + N - \text{abs}(d_i)\Bigr) = \Bigl[3, 3 + 4 - \text{abs}(0)\Bigr) = \Bigl[3, 7\Bigr) $$

Furthermore, this format is amenable to both gather/scatter and coiteration in sparse iteration theory. Given an index $(i, j)$ and a data array AD in row-major order, we can locate the respective value in compressed matrix $m$ using the formula:

$$ m(i,j) = \begin{cases} AD\Bigl( LA[j-i], i \Bigr) \text{ if } (j-i) \in LA \ 0 \text { otherwise } \ \end{cases} $$

The additional requirement for this format is a mapping from the diagonal index to the offset in the data array and an additional lookup to find the offset. Nevertheless, we know ahead of time the size of this mapping is bounded by $N * 2 - 1$ for an $N \times N$ diagonal, asymmetric matrix (and $N$ for a diagonal symmetric matrix). As a result, this format can avoid redundant padding: concretely, we’ve halved the size of the data array in the example above. More generally, the IBM/scipy.sparsepy format requires padding all diagonals to size N, which would require space

$$ 2N^2 - N $$

in the worst case. The intuition behind this is there are $2N - 1$ diagonals in a $N \times N$ matrix, and each requires $N$ space. Conversely, this new format only stores the immediate diagonal values, and requires space equivalent to

$$ N + 2 \sum_{i=1}^{N-1} i = N^2 $$

While only a constant factor improvement, this is still a non-trivial gain for extremely large diagonal matrices. It also provides the same data locality benefits, i.e., all values in a diagonal are stored in contiguous memory and will in theory lead to less cache misses since we don’t pad. Lastly, since a symmetric matrix is a special case of a diagonal matrix, the same optimization can be applied to symmetric matrices.