Self-supervised Learning
<aside>
❓
What is Collapse?
Collapse occurs when the model fails to learn diverse, informative embeddings and instead maps everything to (nearly) the same representation. Common ways to prevent it:
- Contrastive learning (SimCLR, MoCo) → uses negative samples to ensure different images get different features.
- DINO → uses centering and sharpening to keep the output distribution diverse.
- BYOL → uses an asymmetric architecture with a predictor module and a momentum (EMA) target network.
</aside>
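The contrastive route above can be sketched as a minimal InfoNCE loss. This is a toy NumPy illustration (not any library's actual implementation): each embedding's positive sits on the diagonal of the similarity matrix, and the other batch rows act as negatives, which is what penalizes collapse.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE) loss over a batch of paired embeddings.

    z1, z2: (N, D) arrays of L2-normalized embeddings of two views.
    Row i of z1 has its positive at row i of z2; every other row in
    the batch acts as a negative, which is what prevents collapse.
    """
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
z = normalize(rng.normal(size=(8, 16)))
# Views agree perfectly: low loss.
aligned = info_nce_loss(z, z)
# Collapsed second branch (every image gets the same embedding): high loss.
collapsed = info_nce_loss(z, np.tile(normalize(rng.normal(size=(1, 16))), (8, 1)))
```

Note how a fully collapsed branch makes all logits in a row identical, so the loss saturates at log(N) and the objective pushes the model away from that solution.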
BYOL (Bootstrap Your Own Latent)
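BYOL's collapse defense is architectural asymmetry: only the online branch has a predictor, and the target branch is an exponential moving average (EMA) of the online weights with no gradients flowing through it. Below is a toy NumPy sketch with linear "networks" standing in for the real encoders; the matrices and the 0.1 "gradient step" are illustrative assumptions, not BYOL's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "networks": online encoder + predictor, plus an EMA target encoder.
# The asymmetry (predictor on one branch only, stop-gradient + EMA on the other)
# is what lets BYOL avoid collapse without any negative samples.
W_online = rng.normal(size=(16, 16))
W_pred = rng.normal(size=(16, 16))   # predictor exists only on the online branch
W_target = W_online.copy()           # target starts as a copy of the online net

def byol_loss(x1, x2):
    """BYOL-style loss on two views (toy linear encoders)."""
    def norm(v):
        return v / np.linalg.norm(v, axis=1, keepdims=True)
    p = norm((x1 @ W_online) @ W_pred)   # online prediction of the target
    t = norm(x2 @ W_target)              # target projection (no gradient in practice)
    return np.mean(np.sum((p - t) ** 2, axis=1))  # equals 2 - 2 * cosine similarity

def ema_update(tau=0.99):
    """Target weights slowly track the online weights via an EMA."""
    global W_target
    W_target = tau * W_target + (1.0 - tau) * W_online

x1, x2 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
loss = byol_loss(x1, x2)
W_online += 0.1      # stand-in for a gradient step changing the online net
ema_update()         # the target then drifts toward the updated online weights
```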

SSL: DINO
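DINO's two anti-collapse tools can be sketched in a few lines: centering subtracts a running mean from the teacher logits (so no single dimension dominates), and sharpening applies a low teacher temperature (so the output doesn't become uniform). This is a minimal NumPy sketch; the temperature of 0.04 and momentum of 0.9 are illustrative values, not the exact DINO defaults.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def teacher_targets(logits, center, temp=0.04):
    """DINO teacher output: centering (avoids one dimension dominating)
    plus sharpening via a low temperature (avoids uniform collapse)."""
    return softmax((logits - center) / temp)

def update_center(center, logits, m=0.9):
    """Running mean of teacher logits, updated with momentum m."""
    return m * center + (1 - m) * logits.mean(axis=0)

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 32))   # teacher logits for a batch of 8 crops
center = np.zeros(32)
center = update_center(center, logits)
targets = teacher_targets(logits, center)
```

Centering alone would push the output toward uniform, and sharpening alone toward one-hot; applying both keeps the teacher distribution in between, which is why DINO needs the pair.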


Multi-crop
Multi-crop generates several views of one image, a few large global crops plus many small local crops, which essentially makes the model learn the same concept from different viewpoints.
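A minimal NumPy sketch of the cropping scheme (sizes and crop counts here are toy assumptions; DINO itself uses 224px global and 96px local crops on augmented images):

```python
import numpy as np

def random_crop(img, size, rng):
    """Random spatial crop of an HxWxC image array."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def multi_crop(img, rng, n_global=2, n_local=6, global_size=160, local_size=64):
    """Multi-crop: a few large 'global' views plus many small 'local' views.

    The student processes all crops while the teacher sees only the global
    ones, so local views must be mapped onto the same global concept.
    """
    global_views = [random_crop(img, global_size, rng) for _ in range(n_global)]
    local_views = [random_crop(img, local_size, rng) for _ in range(n_local)]
    return global_views, local_views

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
global_views, local_views = multi_crop(img, rng)
```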

Global Embedding Methods Summary

Breaking Down the Components of the Joint/Global-Embedding Architecture
This diagram represents the core architecture behind self-supervised learning (SSL) methods like SimCLR, MoCo, BYOL, VICReg, and DINO. Let’s go through each part in depth.
1️⃣ Data Augmentation: Generating Different Views
- The image I undergoes two different augmentations (e.g., cropping, color jittering, rotation, blurring) to create two transformed versions:
    - I_{\hat{\tau}} (view 1)
    - I_{\tilde{\tau}} (view 2)
- These augmentations ensure that the model learns invariances (i.e., understands an object is the same even when it looks different due to transformations).
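The view-generation step above can be sketched as a toy NumPy pipeline. The specific operations (80% crop, flip probability, brightness range) are illustrative assumptions, not the exact recipe of any particular method:

```python
import numpy as np

def augment(img, rng):
    """One random 'view': crop + horizontal flip + brightness jitter.
    A toy stand-in for the crop / jitter / blur pipelines used by SSL methods."""
    h, w = img.shape[:2]
    size = int(0.8 * min(h, w))
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    view = img[top:top + size, left:left + size]
    if rng.random() < 0.5:
        view = view[:, ::-1]                                 # horizontal flip
    view = np.clip(view * rng.uniform(0.6, 1.4), 0.0, 1.0)   # brightness jitter
    return view

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
# Two independently augmented views of the same image, fed to the two branches.
view1, view2 = augment(img, rng), augment(img, rng)
```

Because both views come from the same image, any representation that is stable across them has, by construction, learned invariance to these transformations.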
2️⃣ Encoder: Extracting Features