Deep TDA. A new dimensionality reduction algorithm

Introduction

🚀 Dive into the world of Topological Data Analysis (TDA) paired with Self-supervised ML!

TDA, with its knack for understanding data's shape and structure, meets the power of models learning from unlabeled data.

🧠 The result? Deep TDA!

This new algorithm harnesses the best of both worlds, outpacing traditional tools like t-SNE & UMAP.

From computer vision and time series analysis to NLP, the applications are vast.

Ready to explore this game-changer in data analysis?

Let's deep dive! 💡

Why TDA?

Dimensionality reduction algorithms like t-SNE and UMAP have been around for a long time and are essential for analyzing complex data.

Specifically, t-SNE is one of the most popular algorithms I've seen used in the industry.

Hinton and van der Maaten developed it in 2008.

But now we can do even better!

Deep TDA is a technique created by @datarefiner.

It combines the power of self-supervised learning and Topological Data Analysis (TDA) to unlock new insights from complex datasets.

Advantages

The key advantages of Deep TDA are:

1️⃣ Robustness: Unlike t-SNE & UMAP, TDA's topology-based approach is a champ against noise & outliers.

2️⃣ Multiscale Analysis: It's not just about the details; TDA captures the grand picture AND the nitty-gritty.

3️⃣ Complex Patterns: Thanks to self-supervised deep learning, it's a pro at deciphering intricate data relationships.

4️⃣ Low Maintenance: Forget endless parameter tuning; TDA's got you covered.

5️⃣ Scalability: Big, complex datasets? No sweat. TDA's efficiency is unmatched.
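To make the "multiscale" idea in point 2 a bit more concrete, here is a minimal sketch of the simplest TDA tool, 0-dimensional persistent homology, in plain Python. It is not DataRefiner's Deep TDA (which is not open source), just an illustration of how topology tracks structure across scales. The function name `h0_persistence` and the toy points are made up for this example.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the scales at which connected components merge (die).

    Every point starts as its own component at scale 0. As the distance
    threshold grows, components merge; each merge "kills" one component.
    """
    parent = list(range(len(points)))

    def find(i):
        # Walk up to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)  # one component dies at this scale
    return deaths  # n - 1 merges; the final component never dies


# Toy data (hypothetical): two tight clusters plus one stray outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (2.5, 9.0)]
deaths = h0_persistence(points)
long_lived = [d for d in deaths if d > 1.0]
print(deaths)
print(len(long_lived), "components survive past scale 1.0")
```

On this toy data, four merges happen at a tiny scale (points joining their own cluster) and two happen much later (the outlier and the two clusters finally connecting). Short-lived features are the nitty-gritty, long-lived ones are the grand picture, and that separation is what people mean by multiscale analysis in TDA.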

Time series use case

In a time-series case study comparing all three techniques:

t-SNE captures too much structure, most of which does not exist.
UMAP does a better job, but the structure is somewhat blurry.
TDA does a much better job and keeps a lot of fine-grained structure.

Image use case

Unfortunately, the image data in this case study is too complex for t-SNE and UMAP to extract any meaningful structure.

In contrast, just providing images to DataRefiner TDA gives us good results.

Text use case

On the Amazon Fine Food review dataset, t-SNE and UMAP capture too much structure.

TDA does a much better job of capturing enough structure for humans to unpack.

Tools

1️⃣ scikit-tda: A Python library that provides algorithms for persistent homology, mapper, and other TDA techniques. It also includes visualization tools and datasets for testing.

2️⃣ giotto-tda: An open-source Python library that provides a complete TDA workflow, including preprocessing, feature extraction, and visualization. It supports a variety of TDA methods, including persistent homology and Mapper.

3️⃣ Mapper: A Python library for the Mapper algorithm for topological data analysis.

But remember, while open source is fantastic, some gems, like DataRefiner's Deep TDA, are not open source.
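Since the Mapper algorithm shows up in all three tools above, here is a rough sketch of what it does, written in plain Python so it runs without any of those libraries. It is not the API of scikit-tda or giotto-tda, and the names `mapper_graph` and `connected_components`, the x-coordinate filter, and the parameter values are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def connected_components(indices, points, eps):
    """Cluster `indices` by linking points closer than `eps` (single linkage)."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {k for k in remaining
                    if math.dist(points[current], points[k]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.25, eps=0.5):
    """Build a Mapper graph: nodes are clusters, edges mean shared points."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap  # each interval is stretched by this much per side

    nodes = []
    for k in range(n_intervals):
        start = lo + k * width - pad
        end = lo + (k + 1) * width + pad
        # Points whose filter value falls in this (overlapping) interval.
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(connected_components(members, points, eps))

    # Two clusters are connected when they share at least one point.
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


# Toy data (hypothetical): 24 points sampled evenly on a unit circle.
points = [(math.cos(2 * math.pi * k / 24), math.sin(2 * math.pi * k / 24))
          for k in range(24)]
nodes, edges = mapper_graph(points, filter_fn=lambda p: p[0])
print(len(nodes), "nodes,", len(edges), "edges")
```

With these settings the circle collapses into a small graph that is itself a loop: six nodes, six edges, every node with exactly two neighbours. That is the appeal of Mapper: the output is a compact graph that keeps the shape of the data rather than a scatter plot you have to squint at.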

Conclusion

In summary, Topological Data Analysis (TDA) combined with self-supervised deep learning is a powerful approach for extracting meaningful information from complex, high-dimensional data.

TDA trumps traditional methods like t-SNE & UMAP in robustness, multiscale insights, and more.
