Self-supervised Learning
<aside>
❓
What is Collapse?
Collapse occurs when the model fails to learn diverse, informative embeddings and instead maps everything to (nearly) the same representation. Common ways to prevent it:
- Contrastive learning (SimCLR, MoCo) → uses negative samples to ensure different images get different features.
- DINO → uses centering and sharpening to keep the output distribution diverse.
- BYOL → uses an asymmetric architecture with a predictor module and a momentum (EMA) target network.
</aside>
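The contrastive route above can be sketched as a minimal InfoNCE loss. This is a toy NumPy illustration (not any library's actual implementation): each embedding's positive sits on the diagonal of the similarity matrix, and the other batch rows act as negatives, which is what penalizes collapse.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE) loss over a batch of paired embeddings.

    z1, z2: (N, D) arrays of L2-normalized embeddings of two views.
    Row i of z1 has its positive at row i of z2; every other row in
    the batch acts as a negative, which is what prevents collapse.
    """
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
z = normalize(rng.normal(size=(8, 16)))
# Views agree perfectly: low loss.
aligned = info_nce_loss(z, z)
# Collapsed second branch (every image gets the same embedding): high loss.
collapsed = info_nce_loss(z, np.tile(normalize(rng.normal(size=(1, 16))), (8, 1)))
```

Note how a fully collapsed branch makes all logits in a row identical, so the loss saturates at log(N) and the objective pushes the model away from that solution.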
BYOL (Bootstrap Your Own Latent)
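BYOL's collapse defense is architectural asymmetry: only the online branch has a predictor, and the target branch is an exponential moving average (EMA) of the online weights with no gradients flowing through it. Below is a toy NumPy sketch with linear "networks" standing in for the real encoders; the matrices and the 0.1 "gradient step" are illustrative assumptions, not BYOL's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "networks": online encoder + predictor, plus an EMA target encoder.
# The asymmetry (predictor on one branch only, stop-gradient + EMA on the other)
# is what lets BYOL avoid collapse without any negative samples.
W_online = rng.normal(size=(16, 16))
W_pred = rng.normal(size=(16, 16))   # predictor exists only on the online branch
W_target = W_online.copy()           # target starts as a copy of the online net

def byol_loss(x1, x2):
    """BYOL-style loss on two views (toy linear encoders)."""
    def norm(v):
        return v / np.linalg.norm(v, axis=1, keepdims=True)
    p = norm((x1 @ W_online) @ W_pred)   # online prediction of the target
    t = norm(x2 @ W_target)              # target projection (no gradient in practice)
    return np.mean(np.sum((p - t) ** 2, axis=1))  # equals 2 - 2 * cosine similarity

def ema_update(tau=0.99):
    """Target weights slowly track the online weights via an EMA."""
    global W_target
    W_target = tau * W_target + (1.0 - tau) * W_online

x1, x2 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
loss = byol_loss(x1, x2)
W_online += 0.1      # stand-in for a gradient step changing the online net
ema_update()         # the target then drifts toward the updated online weights
```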

SSL: DINO
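DINO's two anti-collapse tools can be sketched in a few lines: centering subtracts a running mean from the teacher logits (so no single dimension dominates), and sharpening applies a low teacher temperature (so the output doesn't become uniform). This is a minimal NumPy sketch; the temperature of 0.04 and momentum of 0.9 are illustrative values, not the exact DINO defaults.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def teacher_targets(logits, center, temp=0.04):
    """DINO teacher output: centering (avoids one dimension dominating)
    plus sharpening via a low temperature (avoids uniform collapse)."""
    return softmax((logits - center) / temp)

def update_center(center, logits, m=0.9):
    """Running mean of teacher logits, updated with momentum m."""
    return m * center + (1 - m) * logits.mean(axis=0)

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 32))   # teacher logits for a batch of 8 crops
center = np.zeros(32)
center = update_center(center, logits)
targets = teacher_targets(logits, center)
```

Centering alone would push the output toward uniform, and sharpening alone toward one-hot; applying both keeps the teacher distribution in between, which is why DINO needs the pair.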


Multi-crop
Multi-crop generates several views of one image, a few large global crops plus many small local crops, which essentially makes the model learn the same concept from different viewpoints.
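A minimal NumPy sketch of the cropping scheme (sizes and crop counts here are toy assumptions; DINO itself uses 224px global and 96px local crops on augmented images):

```python
import numpy as np

def random_crop(img, size, rng):
    """Random spatial crop of an HxWxC image array."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def multi_crop(img, rng, n_global=2, n_local=6, global_size=160, local_size=64):
    """Multi-crop: a few large 'global' views plus many small 'local' views.

    The student processes all crops while the teacher sees only the global
    ones, so local views must be mapped onto the same global concept.
    """
    global_views = [random_crop(img, global_size, rng) for _ in range(n_global)]
    local_views = [random_crop(img, local_size, rng) for _ in range(n_local)]
    return global_views, local_views

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
global_views, local_views = multi_crop(img, rng)
```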

Global Embedding Methods Summary

Breaking Down the Components of the Joint/Global-Embedding Architecture
This diagram represents the core architecture behind self-supervised learning (SSL) methods like SimCLR, MoCo, BYOL, VICReg, and DINO. Let’s go through each part in depth.
1️⃣ Data Augmentation: Generating Different Views
- The image I undergoes two different augmentations (e.g., cropping, color jittering, rotation, blurring) to create two transformed versions:
    - I_{\hat{\tau}} (view 1)
    - I_{\tilde{\tau}} (view 2)
- These augmentations ensure that the model learns invariances (i.e., understands an object is the same even when it looks different due to transformations).
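The view-generation step above can be sketched as a toy NumPy pipeline. The specific operations (80% crop, flip probability, brightness range) are illustrative assumptions, not the exact recipe of any particular method:

```python
import numpy as np

def augment(img, rng):
    """One random 'view': crop + horizontal flip + brightness jitter.
    A toy stand-in for the crop / jitter / blur pipelines used by SSL methods."""
    h, w = img.shape[:2]
    size = int(0.8 * min(h, w))
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    view = img[top:top + size, left:left + size]
    if rng.random() < 0.5:
        view = view[:, ::-1]                                 # horizontal flip
    view = np.clip(view * rng.uniform(0.6, 1.4), 0.0, 1.0)   # brightness jitter
    return view

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
# Two independently augmented views of the same image, fed to the two branches.
view1, view2 = augment(img, rng), augment(img, rng)
```

Because both views come from the same image, any representation that is stable across them has, by construction, learned invariance to these transformations.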
2️⃣ Encoder: Extracting Features