Using logs for fast multiplication
https://www.youtube.com/watch?v=GG9yOsPEGek
Tesla’s AI Patent: The “Multiplication to Addition” Breakthrough
Source: Dr. Know-it-all (crediting “Tesla Ming” for spotting the patent)
Core Thesis: The fundamental rule for AI breakthroughs is “Multiplication is bad; Addition is good.” Tesla’s new patent applies this rule to solve long-term memory issues in Full Self-Driving (FSD) and Optimus robots.
1. The Problem: The Tyranny of Multiplication
In Neural Networks (NNs), memory and continuity rely on math. However, the traditional method of calculating weights involves heavy multiplication.
- The Vanishing/Exploding Gradient Problem: When you multiply numbers over and over again, which happens in deep networks or over long timeframes (e.g., 30 seconds at 30 fps is 900 steps), the result eventually degenerates in one of two ways.
- Vanishing: The number becomes effectively zero.
- Exploding: The number shoots toward infinity.
- The Consequence: The AI loses “memory.” A car might forget a stop sign seen 30 seconds ago because the mathematical values representing that stop sign have vanished due to repeated multiplication.
- The Hardware Cost: To combat this, computers usually increase precision (32-bit, 64-bit, 128-bit). This requires massive memory bandwidth and compute power, which isn’t feasible in a car or humanoid robot due to power and thermal constraints.
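The arithmetic behind this is easy to see in a toy Python sketch (not anything from the patent): multiplying by a per-step weight even slightly off 1.0, once per frame, destroys the signal well within a 30-second window.

```python
# Toy demonstration of repeated multiplication, as in a simple
# recurrent update applied once per video frame.
def propagate(value: float, weight: float, steps: int) -> float:
    for _ in range(steps):
        value *= weight
    return value

FRAMES = 30 * 30  # 30 seconds at 30 fps = 900 multiplications

vanished = propagate(1.0, 0.99, FRAMES)  # weight slightly below 1
exploded = propagate(1.0, 1.01, FRAMES)  # weight slightly above 1

print(vanished)  # ~1.2e-4: the signal has effectively vanished
print(exploded)  # ~7.7e3: the signal blows up instead
```

Even a weight of 0.99, far less extreme than the 0.01 mentioned above, wipes out the signal after 900 steps.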
2. The Evolution of AI Memory
- RNNs & LSTMs: Earlier architectures used heavy multiplication and failed at long-context tasks.
- Transformers: Revolutionized AI by converting some multiplication into addition (via Attention mechanisms), allowing for larger context windows, though they still suffer from drift over long periods.
3. The Solution: Rotary Positional Encoding (RoPE)
Tesla utilizes a technique called RoPE. Instead of treating data as static numbers, it treats values as vectors (with a magnitude and a direction).
- Rotation: To encode position/time, the vector is rotated.
- Why it helps: Rotating a vector involves adding angles (degrees) rather than multiplying values. This moves the math from a multiplicative domain to an additive domain.
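Real RoPE rotates pairs of embedding dimensions at different frequencies, but the property the video relies on can be shown with a plain 2-D rotation: applying two rotations in sequence is the same as one rotation by the sum of the angles. A minimal Python sketch:

```python
import math

def rotate(vec, angle):
    """Rotate a 2-D vector by `angle` radians (a 2x2 rotation matrix, written out)."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

v = (1.0, 0.0)
a, b = 0.3, 0.5

step_by_step = rotate(rotate(v, a), b)  # rotate by a, then by b
all_at_once = rotate(v, a + b)          # rotate once by a + b

print(step_by_step)
print(all_at_once)  # matches step_by_step up to float rounding
```

Encoding position as a rotation angle means “advance one timestep” is an angle addition, and the vector’s magnitude is untouched, which is why the representation neither vanishes nor explodes.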
4. Tesla’s Patent: The “Cheat Code” (Logarithmic Addition)
Tesla engineers patented a way to run elite, high-precision models on cheap, 8-bit hardware by converting the remaining multiplication tasks into addition tasks.
How it works:
- The Logarithm Rule: They utilize the identity log(a × b) = log(a) + log(b), which lets the system add two logarithms rather than multiplying two numbers.
- Lookup Tables (LUTs): Calculating logarithms directly is computationally expensive, so Tesla bypasses it with a Lookup Table.
  - Instead of doing the math, the chip looks up the pre-calculated log value.
  - High-precision numbers are mapped into 256 bins (a perfect fit for 8-bit integers).
- Non-Uniform Bins: Since neural networks deal mostly with small numbers (near zero), the lookup table is “weighted”: it has high resolution for small numbers and lower resolution for large numbers.
- Reconstitution: At the end of the process, the log-space answer is converted back to a standard number (using techniques like Horner’s Method to evaluate Taylor-series approximations), again using mostly addition.
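The steps above can be sketched in a few lines of Python. Everything specific here is an illustrative assumption, not the patent’s actual scheme: the value range, the base-2 logs, and the final `2.0 ** x` reconstitution (which stands in for the cheap Horner-style polynomial evaluation described above).

```python
import math

# Hypothetical sketch of log-domain multiplication via a lookup table.
# Assumptions: positive values only, magnitudes in [1/256, 8], base-2 logs.
LO, HI = 1.0 / 256, 8.0
BINS = 256  # one bin per 8-bit code

# Precompute: bin index -> log2(value). Bins are spaced uniformly in
# log space, which is exactly "high resolution near zero" in linear space.
STEP = (math.log2(HI) - math.log2(LO)) / (BINS - 1)
LOG_LUT = [math.log2(LO) + i * STEP for i in range(BINS)]

def encode(x: float) -> int:
    """Quantize a positive value to its nearest 8-bit log-bin index."""
    i = round((math.log2(x) - math.log2(LO)) / STEP)
    return max(0, min(BINS - 1, i))

def log_multiply(x: float, y: float) -> float:
    """Multiply two positive values using only LUT reads and one addition."""
    log_product = LOG_LUT[encode(x)] + LOG_LUT[encode(y)]  # add, don't multiply
    return 2.0 ** log_product  # reconstitution back to linear space

print(log_multiply(3.0, 2.5))  # close to 7.5, within quantization error
```

Note that bins spaced uniformly in log space are automatically non-uniform in linear space, dense near zero and sparse for large magnitudes, which matches the “weighted” table described above.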
5. The Benefits
- Stability: Trades “vanishing gradients” for “linear drift”: errors accumulate linearly (slowly) rather than exponentially (almost immediately).
- Efficiency: Allows cheap 8-bit chips to perform like high-end 32-bit systems.
- Performance:
- No compute-heavy lifting (just fetching pre-calculated values from a table).
- Minimal memory bandwidth usage.
- No thermal throttling.
- Result: This allows Tesla vehicles and Optimus bots to maintain “long-term memory” (30+ seconds) without needing a supercomputer in the trunk.
Summary: Tesla has patented a method to perform high-fidelity AI calculations using low-fidelity hardware by converting complex multiplication problems into simple addition problems using logarithmic lookup tables.