Introduction to TDA

What is Topological Data Analysis?

Topological Data Analysis (TDA) is a framework in applied mathematics based on concepts from algebraic topology. Its objective is the analysis of geometric data, in particular the detection of robust features that remain invariant under continuous deformation. This is particularly useful for high-dimensional data where the underlying shape and structure are not immediately visible. For the 3D-Lenia lifeforms mentioned previously, we are interested in their interpretation and in how they resemble real biological structures in nature. TDA is a potentially useful tool for categorizing the various lifeforms and for tracking the changes during a lifeform's lifetime on a fundamental level.

When studying the topology of a dataset, we are usually interested in the basic building blocks that make up its fundamental shape. These building blocks can be thought of as the simplest shapes imaginable for a given dimension and are called simplices. Depending on its dimension, a simplex is, for example, a point (0-simplex), a line segment (1-simplex), a triangle (2-simplex) or a tetrahedron (3-simplex), with analogues in higher dimensions. A simplicial complex is a set of simplices that contains every face of each of its simplices and in which the intersection of any two simplices is either empty or a face of both. Since we are working with 3D volume data, we are primarily interested in 0-dimensional, 1-dimensional and 2-dimensional features. These structural features are measured by the homology groups H_k. The rank of H_k is the k-th Betti number β_k, which counts the independent k-dimensional features. Intuitively, the Betti numbers can be interpreted as follows:

β₀: Number of disconnected components (for example separate blobs in space)
β₁: Number of independent loops/tunnels
β₂: Number of enclosed voids (like the inside of a hollow sphere)
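
As an illustration of the definitions above, the following is a minimal sketch that computes Betti numbers of a small simplicial complex from the ranks of its boundary maps, using β_k = dim C_k − rank ∂_k − rank ∂_{k+1}. It assumes coefficients in Z/2 (a common simplification that avoids orientation signs), and the function names closure, rank_gf2 and betti_numbers are hypothetical helpers chosen for this example, not part of any library.

```python
from itertools import combinations


def closure(top_simplices):
    """Return all faces of the given simplices, grouped by dimension."""
    by_dim = {}
    for simplex in top_simplices:
        simplex = tuple(sorted(simplex))
        for size in range(1, len(simplex) + 1):
            for face in combinations(simplex, size):
                by_dim.setdefault(size - 1, set()).add(face)
    return {dim: sorted(faces) for dim, faces in by_dim.items()}


def rank_gf2(rows):
    """Rank over Z/2 of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def betti_numbers(top_simplices):
    """Betti numbers [b_0, b_1, ...] of the complex spanned by top_simplices."""
    cx = closure(top_simplices)
    max_dim = max(cx)
    ranks = {0: 0, max_dim + 1: 0}  # boundary maps outside the complex are zero
    for k in range(1, max_dim + 1):
        index = {face: i for i, face in enumerate(cx[k - 1])}
        rows = []
        for simplex in cx[k]:
            mask = 0
            # the (k-1)-dimensional faces of a k-simplex have k vertices each
            for face in combinations(simplex, k):
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)
    return [len(cx[k]) - ranks[k] - ranks[k + 1] for k in range(max_dim + 1)]


hollow_triangle = [(0, 1), (1, 2), (0, 2)]            # a loop
hollow_tetrahedron = list(combinations(range(4), 3))  # surface of a tetrahedron
print(betti_numbers(hollow_triangle))      # [1, 1]
print(betti_numbers(hollow_tetrahedron))   # [1, 0, 1]
```

The hollow triangle has one component and one loop, while the hollow tetrahedron has one component, no loop and one enclosed void, matching the interpretation of β₀, β₁ and β₂ given above.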

Persistent Homology

When working with real-world data, we often encounter a certain amount of noise that is not inherent to the geometry one wants to study. This is where persistent homology comes in, a method for computing topological features over varying spatial resolutions. The idea is that 'true' features persist over a wide range of spatial scales, while noise and other artifacts persist only over a comparably small range. Persistent homology introduces a filtration, which builds a nested sequence of complexes over increasing threshold values. As the threshold increases, topological features appear and possibly vanish. We describe this as the birth and death of a feature, and we call the (threshold) distance between a feature's birth and death its persistence. Since every feature can be described by a (birth, death) pair, we can visualize a dataset's features by plotting them on a persistence diagram, where one axis records the threshold at which a feature is born and the other the threshold at which it dies. The closer a feature lies to the diagram's diagonal, the shorter-lived it is and the more likely it is to be irrelevant for the underlying shape.
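
To make birth, death and persistence concrete, here is a minimal sketch of 0-dimensional persistence for a one-dimensional signal under a sublevel-set filtration. It is a deliberately reduced setting (a 1D list of values instead of a volume, connected components only), and the function name persistence_0d is a hypothetical helper. Components are merged with a union-find structure, and when two components meet, the one born later dies (the so-called elder rule).

```python
def persistence_0d(values):
    """(birth, death) pairs of connected components in the sublevel-set
    filtration of a 1D sequence. The oldest component never dies."""
    n = len(values)
    parent = [None] * n  # None means: sample not yet part of the filtration
    birth = {}           # component root -> threshold at which it was born

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    # Sweep the threshold upwards: samples enter in order of increasing value.
    for i in sorted(range(n), key=lambda j: values[j]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component that was born later dies here.
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    pairs.append((birth[find(0)], float("inf")))
    return sorted(pairs)


signal = [0.0, 2.0, 1.0, 3.0, 0.5]
for b, d in persistence_0d(signal):
    print(f"birth={b}, death={d}, persistence={d - b}")
```

For this toy signal the three local minima give three components: the one born at 1.0 dies at 2.0 (persistence 1.0), the one born at 0.5 dies at 3.0 (persistence 2.5), and the one born at the global minimum 0.0 never dies. In a persistence diagram, the first of these would lie closest to the diagonal.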

Cubical Complexes and Vietoris-Rips Complexes

Two of the most commonly used complexes are cubical complexes and Vietoris-Rips complexes. The former are best suited for image-like data, for example 2D data on a pixel grid or 3D data on a voxel grid. In the 3D case each voxel is treated as a cube, and neighbouring voxels share a face, an edge or a vertex. We can then form a filtration by sorting the voxels according to their scalar value (density). Sweeping a threshold through these values yields sublevel sets (voxels at or below the threshold) or superlevel sets (voxels at or above it), depending on the direction of the sweep. This is a fairly straightforward approach and benefits from the fact that adjacency is inherited directly from the cubical structure of voxel data.
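
The threshold sweep can be sketched as follows for the simplest feature, the number of connected components (β₀) of a sublevel set. This is only an illustrative sketch: the function name sublevel_components is hypothetical, the volume is a plain nested list indexed [z][y][x], and it assumes that voxels are closed cubes, so that voxels touching at a face, an edge or a vertex count as connected (the 26-neighbourhood). Superlevel sets can be handled the same way by negating the values and the threshold.

```python
from itertools import product


def sublevel_components(volume, threshold):
    """Count connected components of the voxels with value <= threshold."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    active = {
        (z, y, x)
        for z, y, x in product(range(nz), range(ny), range(nx))
        if volume[z][y][x] <= threshold
    }
    # All 26 neighbour offsets: cubes sharing a face, an edge or a vertex.
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    seen = set()
    components = 0
    for start in active:
        if start in seen:
            continue
        components += 1  # unvisited voxel: a new component starts here
        seen.add(start)
        stack = [start]
        while stack:     # flood fill over active neighbours
            z, y, x = stack.pop()
            for dz, dy, dx in offsets:
                nb = (z + dz, y + dy, x + dx)
                if nb in active and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return components


# A toy 1 x 1 x 5 "volume" with made-up density values.
volume = [[[0.1, 0.9, 0.2, 0.8, 0.3]]]
for t in (0.0, 0.3, 0.85, 1.0):
    print(t, sublevel_components(volume, t))
```

As the threshold rises from 0.0 to 1.0, the component count for this toy volume goes 0, 3, 2, 1: components are born as low-valued voxels enter and die as the voxels between them are filled in.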

Vietoris-Rips complexes, on the other hand, are primarily used when we do not have such a 'neat' grid to work with. Point clouds are a common use case for these complexes. Given a set of points, we pick a distance parameter that increases progressively over the filtration. When a pair of points is within this distance, we connect them with an edge. Similarly, triplets of mutually close points form triangles (and so on for higher dimensions). Since Lenia lifeforms are described by volumetric data, the natural choice was to use cubical complexes.
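
For comparison, the construction just described can be sketched in a few lines. The function name rips_complex is a hypothetical helper, and the sketch assumes the convention that points span a simplex when all their pairwise distances are at most r (some libraries parametrize by a radius instead). It enumerates all vertex subsets by brute force, so it is only meant for very small point clouds.

```python
from itertools import combinations
from math import dist


def rips_complex(points, r, max_dim=2):
    """Simplices (tuples of point indices) of the Vietoris-Rips complex at
    scale r, up to dimension max_dim."""
    n = len(points)
    # Pairs of points that are within distance r of each other.
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= r
    }
    simplices = [(i,) for i in range(n)]
    for size in range(2, max_dim + 2):  # size 2 = edges, size 3 = triangles
        for s in combinations(range(n), size):
            # A simplex is included when all of its vertices are pairwise close.
            if all(pair in close for pair in combinations(s, 2)):
                simplices.append(s)
    return simplices


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for r in (0.5, 1.0, 1.5):
    cx = rips_complex(square, r)
    edges = [s for s in cx if len(s) == 2]
    triangles = [s for s in cx if len(s) == 3]
    print(r, len(edges), len(triangles))
```

For the four corners of a unit square, no edges exist at r = 0.5, the four sides appear at r = 1.0 (forming a loop), and at r = 1.5 the diagonals and all four triangles are present, which fills the loop in again.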

In this practical I am particularly interested in Betti numbers, as these correspond directly to separate regions, tunnels/connections and cavities. Assuming one has found a filtration that leaves only highly persistent features, the Betti numbers are a particularly useful metric for describing how a lifeform evolves over time. This is why I plot the Betti numbers over time. When synchronized with a lifeform's animation, such a plot may hint at structural changes that are not immediately obvious.
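
One way such a Betti-number time series could be derived from per-frame persistence pairs is sketched below. This is not necessarily the procedure used in this practical: the function names betti_at and betti_over_time, the data layout (one dictionary of (birth, death) lists per frame) and the min_persistence cut-off are assumptions made for illustration, and the pairs are made-up toy values. The sketch also assumes a sublevel-style convention in which a feature is alive when birth <= threshold < death.

```python
def betti_at(pairs, threshold, min_persistence=0.0):
    """Number of features alive at `threshold` (birth <= threshold < death)
    whose persistence (death - birth) is at least `min_persistence`."""
    return sum(
        1
        for birth, death in pairs
        if birth <= threshold < death and death - birth >= min_persistence
    )


def betti_over_time(frames, threshold, min_persistence=0.0):
    """frames: one dict {dimension: [(birth, death), ...]} per time step.
    Returns {dimension: [Betti number per frame]}."""
    dims = sorted({dim for frame in frames for dim in frame})
    return {
        dim: [
            betti_at(frame.get(dim, []), threshold, min_persistence)
            for frame in frames
        ]
        for dim in dims
    }


inf = float("inf")
# Toy persistence pairs for three consecutive frames (made-up values).
frames = [
    {0: [(0.0, inf)], 1: []},
    {0: [(0.0, inf), (0.1, 0.15)], 1: [(0.2, 0.9)]},
    {0: [(0.0, inf), (0.1, 0.8)], 1: [(0.2, 0.9), (0.3, 0.35)]},
]
print(betti_over_time(frames, threshold=0.5, min_persistence=0.2))
# {0: [1, 1, 2], 1: [0, 1, 1]}
```

In this toy sequence, a loop appears in the second frame and a second component in the third, while the short-lived pairs are ignored. The resulting lists can be plotted against the frame index alongside the animation.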