Todo

From Eigen
Jump to: navigation, search
Eigen Owl Todo.jpg

This is the general, long-term Todo list, with a special emphasis on open jobs for new contributors to tackle. There is also a separate Todo for 3.0 aimed specifically at preparing Eigen 3.0.

Notice to prospective contributors

Before you start spending time on any of these items, contact us about it!

  • Sometimes the Todo is outdated
  • Sometimes some items in the todo haven't been sufficiently discussed

So by contacting us you can avoid any bad surprise.

Parallelization

At some point we will definitely want that parallelization occurs inside Eigen itself. Details are there: Working notes - SMP support.

BLAS and LAPACK backends

Although we have very good performance for 1 CPU core, we don't have any parallelization so we don't take advantage of multicore CPUs; and we also don't take advantage of GPUs. An easy way to address that, is to implement BLAS and LAPACK backends for common heavy operations (matrix product could use BLAS, eigensolvers could use LAPACK...). In this way, users may use Eigen together with optimized BLAS/LAPACK libraries. These backends would be optional of course. An example to follow is how the Sparse module already has optional backends. This job is very accessible to someone who doesn't necessarily have much Eigen experience, and is very useful.

BLAS implementation on top of Eigen

This is the converse of the above job. We could offer a BLAS implemented using Eigen.

Update: This has been started and already well advanced by Gael, see blas/ and lapack/ directories. The eigen_blas library is complete. The eigen_lapack currently implements cholesky and lu decomposition. Contact us if you want to help.

Complex numbers

  • Support for complex numbers and mixing complex-real is already pretty good. However, some operations are not vectorized or fully optimized yet. E.g., complex-real scalar (or coeff wise) products. Complex normalization, etc.

Geometry module

The main part is there: Transform, rotations (Quaternion, Angle-Axis), cross product.

  • Available job: Add methods to create perspective/orthogonal projection matrix ?
  • Available job: Extend the current HyperPlane and ParametrizedLine classes with all features one could expect to see here, and complete this set of basic geometry primitives with at least AlignedBox and HyperSphere. Perhaps some other primitives could be added too.
  • Need a fixed-size Gram-Schmidt here, so it's not clear whether it belongs here or in SVD (or QR !). Is this a case for having it in a separate module?
  • Available job: Add quaternion fitting. This is an algorithm taking two sets of points and finding the rotation making the first set of points as close as possible to the second set of points. Of course this can be done by performing a Gram-Schmidt on the transition matrix, but there is a quaternion-based method that is reportedly faster, and in wide use in chemistry. This would be useful for OpenBabel. See this link: http://server.ccl.net/cca/software/SOURCES/C/quaternion-mol-fit/README.shtml and this paper: http://charles.karney.info/papers/jmgm07.pdf Update: see this thread on the mailing list: http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/04/msg00173.html (we still need someone to actually do make it happen)

Optimization module

Available job

This module should provide easy to use optimization routines for non-linear minimization, differential equations, etc. For instance, I'm thinking to algorithms such as Levenberg-Marquardt for least-squares problems, the multiple Runge-Kutta variants for ODE, etc. Those routines should provide a flexible interface. For instance, with respect to the derivatives, it should allow to either provide them manually, ask the routines to compute them using finite differences or using an automatic differentiation library (eg., adolc).

Note that this module is highly related to the Sparse module and other linear and dense solver modules (LU, Cholesky, etc.).

Status There is an experimental module in unsupported/NonLinear, which provide good implementation of Levenberg-Marquardt and non-linear system solving, based on the algorithms found in the well-known netlib package "MINPACK". It can use the unsupported/NumericalDiff' module.

Initial support for Automatic differentiation (one based on adol-c, and one purely in eigen) is already present in the devel branch under unsupported/Eigen/AutoDiff.

LU module

Available jobs:

  • Write fixed-size specializations. Not only of the LU decomposition but of all the methods like solve() and computeImage(), etc.
    • So far we only have specializations for inverse() and determinant(), that's not enough.
  • The benefit of vectorization is not nearly as high as it should be, there is room for improvement.
    • In PartialLU, this should be a matter of examining the asm carefully.
    • In (full pivoting) LU, the main limitation is that the maxCoeff() visitor is not vectorized. So vectorizing visitors would be most useful here. If not possible to do in such generality then consider adding a specialized function for that.
  • Make sure that everything goes as fast for row-major matrices as it does for column-major matrices. At least insofar as it doesn't require to write 2x more code everywhere. Also, the choice of code paths should be made at compile-time, i.e. compile only 1 path.

Cholesky module

    • Write a meta unroller for small fixed sizes (LLT) : already tried but this did not bring any significant benefits...

Sparse matrices/vectors

  • The goal is to provide a seamless integration of sparse matrix on top of Eigen with both basic linear algebra features and more advanced algorithms like linear and eigen value solvers. Those high level algorithms could be implemented using external libraries while providing a unified API.
  • This module is just started, see the specific SparseMatrix page for the details.

Unit-tests

  • Keep adding more!

Documentation

  • Keep updating
  • The online tutorial needs love. badly.
    • Especially the Geometry and Advances linear algebra pages
  • Write special pages on certain topics, illustrate them with examples:
    • compile-time switches (preprocessor symbols like EIGEN_NO_DEBUG etc)
    • Compiler support/options and Vectorization : already covered in the FAQ, but feel free to start a more detailed dox page
    • basic notions about Eigen expressions, MatrixBase, the standard typedefs such as PlainMatrixType and Scalar...
    • Thread safety
    • Fixed-size matrices/vectors and fixed-size variants of many methods like block()
      • in particular, usage with OpenGL
    • Dynamic-size matrices/vectors


Core module

This module is essentially complete and working, but needs improvement in these areas:

  • Expression evaluator

Working notes - Expression evaluator.

  • General API considerations
    • Move internal stuffs into dedicated namespace(s) ?
      • ---> It's just going to add a lot of "::" around, I don't see the benefit. Namespaces are justified when one is interested in doing "using" on part or all of the namespace. But here? I'd rather not say that the prefix is what makes something "internal", because a lot of internal things don't have a prefix already. For example ProductBase: if tomorrow we find a better design where ProductBase isn't useful anymore, we shouldn't be bound by compatibility requirements to keep it. So I consider ProductBase as internal. Conclusion: the prefixing isn't here to determine what's internal (that is rather done in the documentation). Instead, the prefixing is there to prevent conflicts in case the user does both "using namespace Eigen" and "using namespace std" (or other libs he might use). It's therefore hard to determine when to use it. We can't drop it altogether, e.g. we can't drop it on ei_sin, and more generally stuff like ei_aligned_malloc would be very polluting without the prefix. We could just say that _all_ lowercase_names start with ei_ so at least we'd be consistent. So ei_aligned_allocator. Actually, aligned_allocator is very polluting.
    • Provide good support for Matrix of Matrix, and more generally provide good support for general complex types. Here by "complex" type I mean large types made of, e.g., many scalar values. A good example is the AutoDiffScalar type which is mainly a pair of scalar and vector.


  • Vectorization
    • Vectorize diagonal product. The current diagonal product takes an actual matrix that just happens to be diagonal. We need instead it to take a DiagonalMatrix expression (that's just a wrapper around the diagonal vector). We can then remove the old path as it can be implemented (if needed, uncommon anyway) as m.diagonal().asDiagonal() * other. Since m.diagonal() doesn't have the PacketAccessBit, we must ensure that this propagates to asDiagonal() so Eigen knows that it can't vectorize this latter product.
    • vectorize fixed sizes that aren't a multiple of a packet but still are bigger than a packet. Example: Linear operations (like cwise expression assignment) on Matrix3f and Matrix3d.
      • Hm on second thought, doing that without runtime checks would require the data to be aligned. Which can't be the default as it can waste memory (though very little in the case of Matrix3d). But then if it's not the default it's hard to offer as an option because of binary compatibility...
    • Improve partial redux vectorization: according to the storage order of the matrix and the redux direction different strategies can be implemented...
    • vectorize more coeff wise operations (e.g., comparisons)
      • to vectorize comparisons we first need to change the comparison operators to return bitmasks of the same size as the scalar type with all bits equal to ones (or all zero). We can do that only for vectorized types. And good news: the compiler is clever enough to remove this extra stuff when it is not needed (even for double/int64 on a 32 bits system).
      • the second step is to vectorize select using bitmask operations, however make sure it is worth it because here both the "then" and "else" cases will be evaluated (use the expression cost)
      • must vectorize all() and any() as well, otherwise it is useless !
    • Explicitly vectorized loops are usually surrounded by two small scalar loops. We therefore have to take care the compiler does not try to vectorize them itself! Hopefully it is smart enough to find out the iteration counts is always smaller than a packet size. Otherwise, gcc-4.4 provides new function attributes allowing to tweak any -f options (as well as several -m options) on a per function basis.


  • Misc Features
    • Add stride to Map ? Note the current workaround is to use block, e.g.: Map<MatrixXf>(ptr,stride,cols).block(offset,0,rows,cols)
    • Shallow copy of dynamic matrix: we already have efficient swap for them, what is really needed is to add a .isTemporary()/.markAsTemporary() to MatrixStorage which could use the last bit if the size to store the state. Then the Matrix constructor and operator= that take the same Matrix type as input can simply check the temporary state to select between a simple swap or a deep copy (see the Sparse module). Note that since the temporary state is know at compile time we could also use a flag but then we create artificial new types.
      • could be done only for c++1x using Rvalue references/move semantic
    • Peel loops where appropriate.
    • (NOTE: isn't this outdated? We do use alloca currently) Consider the use of alloca (dynamic allocation on the stack for dynamic-sized temporary). After some benchmarks, it seems this really pay off for matrices smaller than 16x16, for larger matrices the gain is not so obvious. In practice alloca cannot be called in MatrixStorage, and it as to be called in the function creating the temporary. This might be implemented using Matrix::map().
    • Another fine tuning idea (very low priority): reuse a previously allocated temporary when possible. For instance, theoretically, the Cholesky decomposition could be done "in place", by working directly on the input matrix. Currently, Cholesky stores its own data. However, a typical use case is: (m.adjoint() * m).cholesky()... In that case, if m is large enough, the matrix product creates a temporary and Cholesky too while the latter could simply "pick" the data allocated by the product since we know this data cannot be reused by any other expression. I guess this concept could be extended to a more general mechanism. (this is only applicable for dynamic-size matrix and in the case we don't use alloca, cf. the previous point)

Array module

  • Misc features
    • add shift(int) and rotate(int) for vectors. The arg for both is an integer, guess negative would mean "on the left", and positive "on the right", thought that seems quite western-oriented.

Transcendental functions

It would be great if Eigen used Julien Pommier's fast, SSE optimized log/exp/sin/cos functions available from this page. Done for sin, cos, log, exp

Available job: do the same for other functions taking inspiration from [cephes] (tan, acos, asin, atan, etc.) note: have to discuss which functions are useful and which aren't.

Special matrices

(bjacob handling this at the moment) Goal; provide good support for: triangular, self-adjoint, diagonal, band, Toeplitz matrices. See this page: SpecialMatrix.

SVD module

The current implementation uses code from JAMA. It supports MxN rectangular matrices with M>=N of real scalar values (no complex yet). It also provides a robust linear solver. job (bjacob handling this at the moment):

  • rewrite the decomposition algorithm in a clean and understandable way,
  • with support for complex matrices,
  • Consider support for reduced versions : http://en.wikipedia.org/wiki/Singular_value_decomposition#Reduced_SVDs
  • and maybe with specializations for small fixed-size if needed (since it a high level algo, it should be automatic though).
    • I don't think we can expect GCC to adequately unroll loops in the case of fixed size. That alone may justify doing fixed-size specializations. Then quite often there also shortcuts available for small sizes.

QR module

at the moment the following seem mostly being taken care of by Andrea and bjacob

  • implement Givens QR : taken care of by Andrea (check this branch)
  • implement efficient QR decomposition for 2x2 and 3x3 matrix with optional computation of R (maybe using Gram-Schmitd instead of Householder transformations ? or Givens)
  • polish (partially rewrite) eigensolvers, probably split them away in a separate module (bjacob handling this, as this is very similar to SVD)
    • see http://www.acm.caltech.edu/~mlatini/research/qr_alg-feb04.pdf for a comprehensive introduction to shifted QR and a brief survey of accelerating techniques.
    • spectral radius, matrix exponential and maybe more operator calculus by applying a cwiseUnaryOp to the eigenvalues vector.
    • available job: generalized eigen problem using Cholesky decomposition (done for the Ax=lBx case, perhaps we want to provide API to handle the other cases: BAx=lx and ABx=lx ?)

Statistics module

This seems to be taken care of by Marton, see his branch at http://bitbucket.org/marton78/eigen/ , feel free to contact him to offer help...

This module should support at least mean, variance, standard deviation, sorting, median, etc. all in a matrix-, column- and row-wise flavors. See the respective ML thread for some design discussion (http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2008/09/msg00014.html). Feature list proposal:

  • mean, variance, standard deviation, absolute deviation, skew, kurtosis
  • covariance / principal component analysis (PCA), dimension reductions
    • then the regression module could also be merged here (e.g., fitHyperplane could simply call the PCA function).
  • support for weighted data
  • sorting
  • median, percentiles
  • histogram

FFT module

A new effort was started in May 2009 by Mark Borgerding to port/adapt kissfft to Eigen. More information about progress and opportunities to help out can be found at the EigenFFT page. The code is on the main development branch, under unsupported/Eigen/src/FFT

A previous offering was initiated by Tim Molteno. See this ML thread for the details. He offered a template metaprogramming example that, while clever, had two drawbacks. It was only capable of power-of-2 transforms and the size must be known at compile time.

Random ideas