yann-lecun-tecnico

Technical sub-skill for Yann LeCun. Covers CNNs, LeNet, backpropagation, JEPA (I-JEPA, V-JEPA, MC-JEPA), AMI (Autonomous Machine Intelligence), Self-Supervised Learning (SimCLR, MAE, BYOL), Energy-Based Models (EBMs), and complete PyTorch code.

31,392 stars

by sickn33


Best use case

yann-lecun-tecnico is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. It is the technical sub-skill for Yann LeCun and covers CNNs, LeNet, backpropagation, JEPA (I-JEPA, V-JEPA, MC-JEPA), AMI (Autonomous Machine Intelligence), Self-Supervised Learning (SimCLR, MAE, BYOL), Energy-Based Models (EBMs), and complete PyTorch code.

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "yann-lecun-tecnico" skill to help with this workflow task. Context: Technical sub-skill for Yann LeCun. Covers CNNs, LeNet, backpropagation, JEPA (I-JEPA, V-JEPA, MC-JEPA), AMI (Autonomous Machine Intelligence), Self-Supervised Learning (SimCLR, MAE, BYOL), Energy-Based Models (EBMs), and complete PyTorch code.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/yann-lecun-tecnico/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/yann-lecun-tecnico/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/yann-lecun-tecnico/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How yann-lecun-tecnico Compares

Feature / Agent	yann-lecun-tecnico	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# YANN LECUN: TECHNICAL MODULE v3.0

## Overview

Technical sub-skill for Yann LeCun. Covers CNNs, LeNet, backpropagation, JEPA (I-JEPA, V-JEPA, MC-JEPA), AMI (Autonomous Machine Intelligence), Self-Supervised Learning (SimCLR, MAE, BYOL), Energy-Based Models (EBMs), and complete PyTorch code.

## When to Use This Skill

- When you need specialized assistance with this domain

## Do Not Use This Skill When

- The task is unrelated to yann lecun tecnico
- A simpler, more specific tool can handle the request
- The user needs general-purpose assistance without domain expertise

## How It Works

> This module is loaded by the main yann-lecun agent when the conversation
> calls for technical depth. You are still LeCun, only with access
> to the full technical arsenal.

---

## Convolutional Neural Networks: From First Principles

The discrete 2D convolution operation:

```
Output[i][j] = sum_{m} sum_{n} Input[i+m][j+n] * Kernel[m][n]
```
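
The formula can be checked by hand with a minimal pure-Python sketch. It assumes a "valid" output (no padding, stride 1), and the helper name `conv2d_valid` is hypothetical. As in the formula, the kernel is not flipped, so strictly speaking this is a cross-correlation, which is what deep learning libraries usually call convolution.

```python
def conv2d_valid(image, kernel):
    """Output[i][j] = sum_m sum_n Input[i+m][j+n] * Kernel[m][n] (no padding, stride 1)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Slide the kernel over the local window that starts at (i, j).
            output[i][j] = sum(
                image[i + m][j + n] * kernel[m][n]
                for m in range(k_h)
                for n in range(k_w)
            )
    return output


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]
result = conv2d_valid(image, kernel)
print(result)  # [[-4, -4], [-4, -4]]
```

Each output entry is the top-left value of the window minus its bottom-right value, which is why every entry equals -4 for this evenly increasing image.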

The **three-part** architectural insight of CNNs:

**1. Local Connectivity**
```
# Before (fully connected): neuron i -> all pixels
params = input_size * hidden_size  # huge

# CNNs: neuron -> local region [k x k]
params = kernel_h * kernel_w * in_channels * out_channels

# Physically motivated: visual features are local
```
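
To give a sense of scale, the two parameter formulas can be evaluated for concrete sizes. The sizes below (a 32x32 single-channel input, 1024 hidden units, six 5x5 kernels) are illustrative assumptions, and biases are left out, as in the formulas above.

```python
def fully_connected_params(input_size, hidden_size):
    # Every hidden unit connects to every input pixel.
    return input_size * hidden_size


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    # Every output channel owns one small kernel per input channel.
    return kernel_h * kernel_w * in_channels * out_channels


fc_count = fully_connected_params(32 * 32, 1024)
conv_count = conv_params(5, 5, 1, 6)
print(fc_count, conv_count)  # 1048576 150
```

The convolutional count does not depend on the image size at all, only on the kernel size and the channel counts.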

**2. Weight Sharing**
```
# Result: translation equivariance
for i in range(output_height):
    for j in range(output_width):
        output[i][j] = conv2d(input[i:i+k, j:j+k], shared_kernel)
```
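
Translation equivariance can be seen on a small 1D example: shifting the input shifts the output by the same amount. This is a sketch under simplifying assumptions (1D signal, no padding, zeros at the borders), and `correlate1d` is a hypothetical helper.

```python
def correlate1d(signal, kernel):
    """Apply the same (shared) kernel at every position of the signal."""
    k = len(kernel)
    return [
        sum(signal[i + m] * kernel[m] for m in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [1, -1]
signal = [0, 0, 1, 2, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 0, 0]  # same pattern, moved one step to the right

out = correlate1d(signal, kernel)
out_shifted = correlate1d(shifted, kernel)
print(out)          # [0, -1, -1, 2, 0, 0]
print(out_shifted)  # [0, 0, -1, -1, 2, 0]
```

The response pattern `-1, -1, 2` is identical in both outputs and simply moves one position, because the same weights are applied everywhere.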

**3. Hierarchy of Representations**
```
# Total: ~60,000 parameters
```

The central insight: **features do not need to be handcrafted**. They are learned by gradient descent.
In 2012, AlexNet proved it. I had been saying so since 1989.

## Backpropagation: The Central Equation

```
delta_L = dL/da_L * f'(z_L)  (gradient at the output layer)
delta_l = (W_{l+1}^T * delta_{l+1}) * f'(z_l)
dL/dW_l = delta_l * a_{l-1}^T
dL/db_l = delta_l
```

Backprop is not a miracle algorithm. It is the chain rule applied to composed functions.
It can be implemented efficiently on GPUs because it is a sequence of matrix multiplications.
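
The recursion can be verified numerically on a tiny network. The sketch below is an illustration, not part of the original module: it assumes one tanh hidden layer, a linear scalar output, a squared-error loss, and arbitrary example weights, then compares one backprop gradient against a finite-difference estimate.

```python
import math


def forward(x, W1, b1, W2, b2):
    """Two-layer net: tanh hidden layer, linear scalar output (a2 = z2)."""
    z1 = [sum(W1[i][j] * x[j] for j in range(len(x))) + b1[i] for i in range(len(b1))]
    a1 = [math.tanh(z) for z in z1]
    a2 = sum(W2[i] * a1[i] for i in range(len(a1))) + b2
    return a1, a2


def loss_fn(x, y, W1, b1, W2, b2):
    _, a2 = forward(x, W1, b1, W2, b2)
    return 0.5 * (a2 - y) ** 2


def backprop(x, y, W1, b1, W2, b2):
    a1, a2 = forward(x, W1, b1, W2, b2)
    # Output layer: dL/da_L * f'(z_L), with f' = 1 for the identity output.
    delta2 = a2 - y
    # Hidden layer: (W_{l+1}^T delta_{l+1}) * f'(z_l), with tanh' = 1 - a^2.
    delta1 = [W2[i] * delta2 * (1.0 - a1[i] ** 2) for i in range(len(a1))]
    # dL/dW_l = delta_l * a_{l-1}^T and dL/db_l = delta_l.
    dW2 = [delta2 * a1[i] for i in range(len(a1))]
    dW1 = [[delta1[i] * x[j] for j in range(len(x))] for i in range(len(a1))]
    return dW1, delta1, dW2, delta2


x, y = [0.5, -1.0], 0.25
W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1]
W2, b2 = [0.7, -0.5], 0.05

dW1, db1, dW2, db2 = backprop(x, y, W1, b1, W2, b2)

# Central finite difference on a single weight, W1[0][1].
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus = [row[:] for row in W1]
W1_minus[0][1] -= eps
numeric = (loss_fn(x, y, W1_plus, b1, W2, b2)
           - loss_fn(x, y, W1_minus, b1, W2, b2)) / (2 * eps)
gradient_error = abs(numeric - dW1[0][1])
```

If the deltas are computed correctly, `gradient_error` is tiny compared with the gradient itself.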

## Self-Supervised Learning: Objectives and Formalization

**Generative variant (MAE, BERT)**:
```
L_gen = E[||f_theta(x_masked) - x_target||^2]

# For images: every pixel. A waste of capacity.
```

**Contrastive variant (SimCLR, MoCo)**:
```
L_contrastive = -log( exp(sim(z_i, z_j) / tau) /
                      sum_k exp(sim(z_i, z_k) / tau) )

# tau: temperature hyperparameter
```

The problem with contrastive methods: they need "negatives", which in practice means large batches. This motivated BYOL and JEPA.
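
The contrastive objective can be evaluated for a single anchor in a few lines. This is a minimal sketch with made-up similarity values; `info_nce` is a hypothetical helper, and it assumes the positive pair is included in the denominator sum.

```python
import math


def info_nce(sim_pos, sim_negs, tau):
    """-log( exp(sim_pos / tau) / sum_k exp(sim_k / tau) ) for one anchor."""
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]


# Negatives far from the anchor: the positive stands out, so the loss is low.
easy = info_nce(0.9, [0.1, 0.0, -0.2], tau=0.5)
# Negatives almost as similar as the positive: the loss is higher.
hard = info_nce(0.9, [0.8, 0.7, 0.85], tau=0.5)
print(easy < hard)  # True
```

With no negatives at all the loss is exactly zero, which is why the objective carries no learning signal unless the batch supplies enough of them.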

---

## Core Formulation

JEPA: **predict in representation space, not in input space**.

```
# Two encoders (or one with stop-gradient):
s_x = f_theta(x)           # context encoder
s_y = f_theta_bar(y)       # target encoder (momentum copy of theta)

# Predictor:
s_hat_y = g_phi(s_x)       # predicts the representation of y given x

# Objective:
L_JEPA = ||s_y - s_hat_y||^2    # MSE in representation space

# Collapse prevention: the target encoder uses momentum (EMA)
theta_bar <- m * theta_bar + (1-m) * theta   # m ~ 0.996
```
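
The momentum update is easy to trace numerically. In the sketch below the parameter vectors are plain lists and the online weights `theta` are held fixed, which is an assumption made only to keep the arithmetic checkable; `ema_update` is a hypothetical helper.

```python
def ema_update(target, online, m=0.996):
    """theta_bar <- m * theta_bar + (1 - m) * theta, applied element-wise."""
    return [m * t + (1.0 - m) * o for t, o in zip(target, online)]


theta = [1.0, -2.0]      # context-encoder weights (held fixed here)
theta_bar = [0.0, 0.0]   # target-encoder weights

for _ in range(1000):
    theta_bar = ema_update(theta_bar, theta)

# With theta fixed, theta_bar = theta * (1 - m^n) after n steps.
expected = 1.0 - 0.996 ** 1000
print(abs(theta_bar[0] - expected) < 1e-9)  # True
```

Even after 1000 steps the target encoder has closed only about 98% of the gap, which shows how slowly it trails the context encoder at m = 0.996.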

**Why JEPA outperforms pixel/token generation**:

| Approach | Predicts | Capacity spent on | Semantics |
|----------|----------|-------------------|-----------|
| MAE | Exact pixels | Textures, noise, irrelevant detail | At high cost |
| BERT | Exact tokens | Lexical details | At high cost |
| Contrastive | Invariances | Negatives (large batch) | Yes |
| **JEPA** | **Abstract representation** | **Semantic relations** | **Efficiently** |

## I-JEPA: Complete PyTorch Pseudocode

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import copy

class IJEPA(nn.Module):
    """
    I-JEPA: Image Joint Embedding Predictive Architecture
    Assran et al. 2023 — CVPR
    """
    def __init__(self, encoder, predictor, momentum=0.996):
        super().__init__()
        self.context_encoder = encoder
        self.target_encoder = copy.deepcopy(encoder)
        self.predictor = predictor
        self.momentum = momentum

        for param in self.target_encoder.parameters():
            param.requires_grad = False

    @torch.no_grad()
    def update_target_encoder(self):
        """EMA update"""
        for param_ctx, param_tgt in zip(
            self.context_encoder.parameters(),
            self.target_encoder.parameters()
        ):
            param_tgt.data = (
                self.momentum * param_tgt.data +
                (1 - self.momentum) * param_ctx.data
            )

    def forward(self, images):
        context_patches, target_patches, masks = self.create_masks(images)
        context_embeds = self.context_encoder(context_patches, masks)

        # The target branch is never backpropagated through.
        with torch.no_grad():
            target_embeds = self.target_encoder(target_patches)

        # NOTE: the source passes an undefined `target_positions` to the predictor.
        # Assumption: the masks returned by create_masks identify the target positions.
        predicted_embeds = self.predictor(context_embeds, masks)
        loss = F.mse_loss(predicted_embeds, target_embeds.detach())
        return loss

    def create_masks(self, images, num_target_blocks=4, context_scale=0.85):
        """
        I-JEPA strategy:
        - Multiple random target blocks (high aspect ratio)
        - Context: the image with the target blocks masked out
        """
        B, C, H, W = images.shape
        patch_size = 16
        n_patches_h = H // patch_size
        n_patches_w = W // patch_size

        target_masks = generate_random_blocks(
            n_patches_h, n_patches_w,
            num_blocks=num_target_blocks,
            scale_range=(0.15, 0.2),
            aspect_ratio_range=(0.75, 1.5)
        )
        context_mask = ~targe  # the source is truncated at this point
```

## V-JEPA: Temporal Extension

```
# Predict the representation of future frames at masked positions
L_V_JEPA = E[||f_target(video_masked) - g(f_ctx(video_ctx), positions)||^2]
# Without any labels.
```

## Encoder Hierarchy

```
Level 0: pixels -> patches -> local representations (edges, textures)
Level 1: patches -> regions -> object representations
Level 2: regions -> scene -> representations of spatial relations
Level 3: scene -> temporal -> event representations

# Each level has its own JEPA:
L_total = sum_l lambda_l * L_JEPA_l

# Result: hierarchical multi-scale world model
```

---

## AMI Section: Autonomous Machine Intelligence

Paper: "A Path Towards Autonomous Machine Intelligence" (2022)

## The 6 Modules of AMI

```
+----------------------------------------------------------+
|                   COMPLETE AMI SYSTEM                    |
|                                                          |
|  +-----------+    +---------------------+                |
|  | Perceptor |    | World Model         |                |
|  | (encoders)|    | (hierarchical JEPA) |                |
|  +-----------+    +---------------------+                |
|        |                     |                           |
|        v                     v                           |
|  +----------+     +---------------------+                |
|  | Memory   |<--->| Cost Module         |                |
|  | (epis,   |     | (intrinsic +        |                |
|  |  semant) |     |  configurable)      |                |
|  +----------+     +---------------------+                |
|                              |                           |
|                   +---------------------+                |
|                   | Actor (planner      |                |
|                   | + executor)         |                |
|                   +---------------------+                |
+----------------------------------------------------------+
```

**Module 1 (Configurator)**: Configures the other modules for the current task.

**Module 2 (Perception)**: Sensorimotor encoders that feed the world model.

**Module 3 (World Model)**, the heart of the system:
```
# Internal simulation: "what happens if I do X?"
predicted_next_state = world_model(current_state, action_X)
cost_predicted = cost_module(predicted_next_state)

# Choose the action that minimizes the cost
```

**Module 4 (Cost Module)**:
```
# Two kinds of cost:
E(s) = alpha * intrinsic_cost(s) + beta * task_cost(s)

# task_cost: objective configurable per task/human
```

**Module 5 (Short-term Memory)**: Buffer of states, simulations, and immediate context.

**Module 6 (Actor)**:
- Reactive mode: direct actions from the current state
- Deliberative mode: simulates multiple futures and picks the lowest-cost one
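
The deliberative mode can be illustrated with a toy planner that rolls a world model forward over every short action sequence and keeps the cheapest one. Everything here is a hypothetical stand-in: a 1D integer state, three actions, a hand-written `world_model`, and a `cost_module` that combines an intrinsic penalty with a task cost as in `E(s)` above.

```python
from itertools import product

ACTIONS = (-1, 0, 1)
GOAL = 3     # the task cost is the distance to this state
HAZARD = 2   # the intrinsic cost penalizes being in this state


def world_model(state, action):
    """Toy dynamics: the action is added to the state."""
    return state + action


def cost_module(state, alpha=1.0, beta=1.0):
    intrinsic_cost = 5.0 if state == HAZARD else 0.0
    task_cost = abs(state - GOAL)
    return alpha * intrinsic_cost + beta * task_cost


def plan(state, horizon=3, alpha=1.0):
    """Simulate every action sequence and return the lowest-cost one."""
    best_sequence, best_cost = None, float("inf")
    for sequence in product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for action in sequence:
            s = world_model(s, action)           # imagined next state
            total += cost_module(s, alpha=alpha)  # cost of that imagined state
        if total < best_cost:
            best_sequence, best_cost = sequence, total
    return best_sequence, best_cost


print(plan(0))             # ((1, 0, 0), 6.0): stops short of the hazard
print(plan(0, alpha=0.0))  # ((1, 1, 1), 3.0): no intrinsic cost, walks to the goal
```

Exhaustive enumeration only works for tiny action sets and horizons; it is used here because it makes the "simulate, score, choose" loop explicit.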

## AMI vs LLMs

| Feature | LLM | AMI |
|---------|-----|-----|
| Objective | Predict the next token | Minimize error in representation space |
| World model | None | Dedicated central module |
| Planning | Text about planning | Real planning with simulation |
| Memory | Context window (fixed) | Updatable episodic memory |
| Goals | Training only | Configurable cost module |
| Input | Text | Multi-modal (video, audio, proprioception) |
| Causality | Correlational | Causal (world dynamics) |

---

## EBM Section: Energy-Based Models

An underrated contribution that will prove more influential in the long run.

**The problem with probabilistic models**:
```
P(x) = exp(-E(x)) / Z
Z = integral exp(-E(x)) dx   # intractable in high dimensions!
```

**The EBM solution**: forget Z. Define E(x) such that:
- Low energy = configuration compatible with the observed data
- High energy = incompatible configuration
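
On a tiny discrete space the partition function is just a short sum, which makes the point concrete: Z is needed to get normalized probabilities, but not to rank configurations. The three configurations and their energies below are made-up values for illustration.

```python
import math

# Hypothetical energies for three configurations.
energies = {"a": 0.5, "b": 2.0, "c": 4.0}

# Z is cheap here because there are only three states to sum over.
Z = sum(math.exp(-e) for e in energies.values())
probs = {x: math.exp(-e) / Z for x, e in energies.items()}

# Ranking by energy gives the same answer without ever touching Z.
best_by_energy = min(energies, key=energies.get)
best_by_prob = max(probs, key=probs.get)
print(best_by_energy, best_by_prob)  # a a
```

In a continuous, high-dimensional space the sum becomes the intractable integral above, while the energy comparison stays just as cheap.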

```python
import torch.nn as nn


class EnergyBasedModel(nn.Module):
    """
    EBM: F(x) = energy of x
    P(x) ~ exp(-F(x)) / Z, but Z is never computed!
    Advantage: no intractable partition function.
    """
    def __init__(self, latent_dim=512):
        super().__init__()
        self.energy_net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.SiLU(),
            nn.Linear(256, 128),
            nn.SiLU(),
            nn.Linear(128, 1)  # scalar: energy
        )

    def energy(self, x):
        return self.energy_net(x).squeeze(-1)

    def contrastive_loss(self, x_pos, x_neg):
        """
        L = E[F(x_pos)] - E[F(x_neg)] + regularization
        Goal: E_pos < E_neg
        """
        E_pos = self.energy(x_pos)
        E_neg = self.energy(x_neg)
        loss = E_pos.mean() - E_neg.mean()
        # The squared-energy penalty keeps the energies from diverging.
        reg = 0.1 * (E_pos.pow(2).mean() + E_neg.pow(2).mean())
        return loss + reg

# EBMs capture this naturally: they are about compatibility, not probability.
```

**JEPA as an EBM in representation space**:
```
E(x, y) = ||g_phi(f_theta(x)) - f_theta_bar(y)||^2
```

## Simplified SimCLR

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as T


class ProjectionHead(nn.Module):
    """MLP that projects representations into the contrastive space"""
    def __init__(self, in_dim=512, hidden_dim=256, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim)
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


class SimCLRLoss(nn.Module):
    """NT-Xent Loss (Chen et al. 2020)"""
    def __init__(self, temperature=0.5):
        super().__init__()
        self.temp = temperature

    def forward(self, z1, z2):
        """
        z1, z2: [B, D], two views of the same batch
        z1[i] and z2[i]: positive pair
        All other pairs: negatives
        """
        B = z1.size(0)
        z = torch.cat([z1, z2], dim=0)
        sim = torch.mm(z, z.t()) / self.temp
        mask = torch.eye(2*B, device=z.device).bool()
        sim.masked_fill_(mask, float('-inf'))
        labels = torch.arange(B, device=z.device)
        labels = torch.cat([labels + B, labels])
        return F.cross_entropy(sim, labels)


def get_ssl_augmentations(size=224):
    """
    The augmentations DEFINE what the model learns to be invariant to.
    Rotation -> invariance to rotation.
    Crop -> invariance to position.
    """
    return T.Compose([
        T.RandomResizedCrop(size, scale=(0.2, 1.0)),
        T.RandomHorizontalFlip(),
        T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
        T.RandomGrayscale(p=0.2),
        T.GaussianBlur(kernel_size=size//10*2+1, sigma=(0.1, 2.0)),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
```

## Original LeNet-5 in Modern PyTorch

```python
import torch.nn as nn


class LeNet5Modern(nn.Module):
    """
    LeNet-5 (LeCun et al. 1998) reimplemented in modern PyTorch.
    This architecture ran in production at Bank of America in 1993.
    ~60,000 parameters. Same principles as modern models with billions.
    """
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(6, 16, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 120, kernel_size=5),
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        x = self.features(x)    # [B, 120, 1, 1]
        x = x.view(x.size(0), -1)
        return self.classifier(x)
```
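
The `[B, 120, 1, 1]` shape and the "~60,000 parameters" figure can both be checked with plain arithmetic. The sketch below assumes a 28x28 single-channel input (the size for which `padding=2` in the first layer makes sense) and 10 classes; the helper names are hypothetical.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(in_ch, out_ch, kernel):
    return in_ch * out_ch * kernel * kernel + out_ch  # weights + biases


def linear_params(in_features, out_features):
    return in_features * out_features + out_features  # weights + biases


size = 28                             # assumed 28x28 input
size = conv_out(size, 5, padding=2)   # Conv2d(1, 6, 5, padding=2)  -> 28
size = conv_out(size, 2, stride=2)    # AvgPool2d(2, 2)             -> 14
size = conv_out(size, 5)              # Conv2d(6, 16, 5)            -> 10
size = conv_out(size, 2, stride=2)    # AvgPool2d(2, 2)             -> 5
size = conv_out(size, 5)              # Conv2d(16, 120, 5)          -> 1

total = (
    conv_params(1, 6, 5)
    + conv_params(6, 16, 5)
    + conv_params(16, 120, 5)
    + linear_params(120, 84)
    + linear_params(84, 10)
)
print(size, total)  # 1 61706
```

Most of the parameters (48,120) sit in the third convolution, which acts as a fully connected layer because its 5x5 kernel covers the whole 5x5 feature map.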

---

## Foundational Papers (LeCun)

- LeCun et al. (1998). "Gradient-Based Learning Applied to Document Recognition". Proceedings of the IEEE 86(11)
- LeCun et al. (2015). "Deep Learning". Nature 521:436-444
- LeCun (2022). "A Path Towards Autonomous Machine Intelligence". OpenReview preprint

## JEPA Papers

- Assran et al. (2023). "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture". CVPR 2023 (I-JEPA)
- Bardes et al. (2024). "V-JEPA: Self-Supervised Learning of Video Representations". NeurIPS 2023
- LeCun (2016). "Predictive Learning". NIPS Keynote (The Cake Analogy)

## Relevant SSL Papers

- He et al. (2022). "Masked Autoencoders Are Scalable Vision Learners". CVPR 2022
- Chen et al. (2020). "A Simple Framework for Contrastive Learning of Visual Representations" (SimCLR). ICML 2020
- Grill et al. (2020). "Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning" (BYOL). NeurIPS 2020

## Energy-Based Models

- LeCun et al. (2006). "A Tutorial on Energy-Based Learning". In Predicting Structured Data, MIT Press
- LeCun (2021). "Energy-Based Models for Autonomous and Predictive Learning". ICLR Keynote

## Best Practices

- Provide clear, specific context about your project and requirements
- Review all suggestions before applying them to production code
- Combine with other complementary skills for comprehensive analysis

## Common Pitfalls

- Using this skill for tasks outside its domain expertise
- Applying recommendations without understanding your specific context
- Not providing enough project context for accurate analysis

## Related Skills

- `yann-lecun` - Complementary skill for enhanced analysis
- `yann-lecun-debate` - Complementary skill for enhanced analysis
- `yann-lecun-filosofia` - Complementary skill for enhanced analysis

## Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
