§ 13 · Instrument · Filed 2026.05 · Reference SL · 26 · 013

Two inputs, two neurons, and a curved frontier.

A neural network with three layers and a handful of weights: about the smallest contraption that can solve XOR, the problem that famously embarrassed the perceptron for nearly two decades. Drag any edge to change its weight. The decision boundary on the right redraws itself as you pull. Press step to run one round of backprop on a batch of points and watch the gradients ripple backward, edge by edge.

§ Net 2 · 4 · 1

Drag an edge to change its weight · click a node to see its activation

§ Boundary

Heatmap = network output · dots = labelled points

Legend: Class A · Class B · Positive weight · Negative weight

Readouts: loss · acc · Hidden 4 · Params · Steps · Activation tanh

Controls: Activation · Hidden neurons · Learning rate (0.10) · Train

Why XOR mattered

§ 02 · Notes

A perceptron — one layer, no hidden units — can only carve the plane with a single straight line. XOR's two diagonal classes are unreachable: there is no line that separates them.
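A quick brute-force check makes the claim concrete. The sketch below assumes the four XOR points sit on the corners of the unit square (an assumption; the demo's own point cloud is not described here) and scans a coarse grid of candidate lines w1·x1 + w2·x2 + b = 0. The names `xorPoints`, `perceptronPredict` and `bestLinearScore` are illustrative. A grid scan is an illustration rather than a proof, but it agrees with the algebra.

```javascript
// Four XOR points on the unit square (assumed layout): label = x1 XOR x2.
const xorPoints = [
  { x: [0, 0], y: 0 },
  { x: [0, 1], y: 1 },
  { x: [1, 0], y: 1 },
  { x: [1, 1], y: 0 },
];

// A single-layer perceptron: predict 1 when w1*x1 + w2*x2 + b > 0.
function perceptronPredict(w1, w2, b, x) {
  return w1 * x[0] + w2 * x[1] + b > 0 ? 1 : 0;
}

// Scan a coarse grid of (w1, w2, b) and keep the best count of correct points.
function bestLinearScore(points, lo = -2, hi = 2, step = 0.25) {
  let best = 0;
  for (let w1 = lo; w1 <= hi; w1 += step) {
    for (let w2 = lo; w2 <= hi; w2 += step) {
      for (let b = lo; b <= hi; b += step) {
        let correct = 0;
        for (const p of points) {
          if (perceptronPredict(w1, w2, b, p.x) === p.y) correct++;
        }
        if (correct > best) best = correct;
      }
    }
  }
  return best;
}

const bestScore = bestLinearScore(xorPoints);
console.log(`best single-line score: ${bestScore} of ${xorPoints.length}`);
```

No line in the scan gets more than three of the four points right; the fourth always lands on the wrong side.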

Add even two hidden neurons with a non-linear activation and the boundary can bend. Each hidden unit draws one line; the output unit combines them. Two lines fence off a band, and a band is enough to hold one diagonal pair apart from the other.
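One way to see the two lines at work is to wire the weights by hand. The sketch below is an illustration with hand-picked values, not weights the demo learns; it assumes inputs on the corners of the unit square, tanh hidden units and a sigmoid output. The function name `xorNet` and the sharpness constant `k` are hypothetical.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Hand-picked weights (illustrative, not learned): both hidden units look at
// x1 + x2 but with different thresholds, so each one draws its own line.
const k = 10; // sharpness; larger values make tanh behave more like a step

function xorNet(x1, x2) {
  const h1 = Math.tanh(k * (x1 + x2 - 0.5)); // positive above the line x1 + x2 = 0.5
  const h2 = Math.tanh(k * (x1 + x2 - 1.5)); // positive above the line x1 + x2 = 1.5
  // The output is high only in the band between the lines: h1 on, h2 off.
  return sigmoid(k * (h1 - h2 - 1));
}

const corners = [[0, 0], [0, 1], [1, 0], [1, 1]];
const outputs = corners.map(([a, b]) => xorNet(a, b));
console.log(outputs.map((o) => o.toFixed(3)));
```

Rounded to the nearest integer, the four outputs follow the XOR truth table: low at (0, 0) and (1, 1), high at (0, 1) and (1, 0).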

Backpropagation is the chain rule applied to a graph: gradients flow from the loss at the output back toward the inputs, redistributing blame along edges in proportion to how much each one contributed.

Sigmoid squashes everything into (0, 1): friendly, but slow to learn at the extremes, where its slope flattens. Tanh is the same S-shape, stretched to (−1, 1) and recentred on zero. ReLU is a hinge that switches off below zero: easy to train, but it passes no gradient below the elbow.
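The three activations and their derivatives fit in a small lookup table. This is a minimal sketch; the object name `activations` and its `f`/`df` layout are assumptions for illustration, not the page's actual code.

```javascript
// Each entry pairs an activation f(z) with its derivative df(z).
const activations = {
  sigmoid: {
    f: (z) => 1 / (1 + Math.exp(-z)),
    df: (z) => {
      const s = 1 / (1 + Math.exp(-z));
      return s * (1 - s);
    },
  },
  tanh: {
    f: (z) => Math.tanh(z),
    df: (z) => 1 - Math.tanh(z) ** 2,
  },
  relu: {
    f: (z) => Math.max(0, z),
    df: (z) => (z > 0 ? 1 : 0), // slope at exactly 0 is set to 0 by convention
  },
};

// tanh is sigmoid stretched to (-1, 1): tanh(z) = 2 * sigmoid(2z) - 1.
const z = 0.7;
const viaSigmoid = 2 * activations.sigmoid.f(2 * z) - 1;
console.log(Math.abs(viaSigmoid - activations.tanh.f(z)) < 1e-12);

// Saturation: far from zero the sigmoid slope collapses, so gradients shrink.
console.log(activations.sigmoid.df(0), activations.sigmoid.df(6));
```

The sigmoid's slope peaks at 0.25 at the origin and is already below 0.01 by z = 6, which is the "slow at the extremes" behaviour in numbers.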

The animation on each step is faithful: each edge pulses with the magnitude of its gradient before it nudges. The brightest pulses mark the edges this iteration adjusted the most.

Drag a weight to a wild value and you'll see why initialisation matters. Tiny random numbers near zero usually train; symmetric weights or huge magnitudes stall and shudder.
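The symmetric case is easy to check numerically. The sketch below assumes a 2-H-1 network with tanh hidden units, a sigmoid output and binary cross-entropy loss; `hiddenGrads` is a hypothetical helper that returns the gradient for each hidden unit's incoming weights on a single example.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Gradient of the loss with respect to each hidden unit's incoming weights,
// for one example. Returns one [dW_x1, dW_x2] pair per hidden unit.
function hiddenGrads(W1, b1, W2, b2, x, y) {
  const z1 = W1.map((row, j) => row[0] * x[0] + row[1] * x[1] + b1[j]);
  const h = z1.map((z) => Math.tanh(z));
  const yHat = sigmoid(h.reduce((s, hj, j) => s + W2[j] * hj, b2));
  const d2 = yHat - y; // output error
  return z1.map((z, j) => {
    const d1 = W2[j] * d2 * (1 - Math.tanh(z) ** 2); // hidden error for unit j
    return [d1 * x[0], d1 * x[1]];
  });
}

// Two hidden units that start as exact copies of each other.
const twins = hiddenGrads(
  [[0.3, -0.2], [0.3, -0.2]], // W1: identical rows
  [0, 0],                     // b1
  [0.5, 0.5],                 // W2: identical outgoing weights
  0,                          // b2
  [1, 0],                     // input
  1                           // label
);
console.log(twins[0], twins[1]); // the two gradient pairs are identical
```

Because identical units receive identical gradients, every update keeps them identical, and the pair never learns to draw two different lines.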

Implementation notes — forward, backward, and the update rule

The network is a two-layer MLP (two weight layers; three layers if you count the inputs): h = φ(W₁ x + b₁) followed by ŷ = σ(W₂ h + b₂), where φ is the chosen hidden activation and σ is sigmoid at the output for binary classification. The loss is binary cross-entropy. All matrices and biases live in plain JavaScript arrays, with no library.
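The forward pass and loss can be sketched in a few lines of plain JavaScript. The object layout (`W1`, `b1`, `W2`, `b2`) and the function names `forward` and `bce` are assumptions chosen for illustration; the page's own data structures are not shown in the text.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Forward pass for one input x (length 2).
// W1: hidden x 2 matrix, b1: hidden biases, W2: one row of length hidden, b2: scalar.
function forward(net, x, phi = Math.tanh) {
  const z1 = net.W1.map((row, j) => row[0] * x[0] + row[1] * x[1] + net.b1[j]);
  const h = z1.map((z) => phi(z));
  const z2 = h.reduce((sum, hj, j) => sum + net.W2[j] * hj, net.b2);
  return { z1, h, yHat: sigmoid(z2) };
}

// Binary cross-entropy for one example, clamped to avoid log(0).
function bce(yHat, y, eps = 1e-12) {
  const p = Math.min(1 - eps, Math.max(eps, yHat));
  return -(y * Math.log(p) + (1 - y) * Math.log(1 - p));
}

// A 2-2-1 example with arbitrary weights.
const net = {
  W1: [[1, 1], [-1, -1]],
  b1: [0, 0],
  W2: [0.5, -0.5],
  b2: 0,
};
const { yHat } = forward(net, [0, 0]);
console.log(yHat, bce(yHat, 0));
```

At the origin with zero biases every pre-activation is zero, so the output sits at 0.5 and the loss is ln 2, the value of a coin-flip prediction.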

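Backprop falls out of a few lines of calculus. With z₁ = W₁ x + b₁, the output error is δ₂ = ŷ − y (what sigmoid plus cross-entropy simplifies to) and the hidden error is δ₁ = (W₂ᵀ δ₂) ⊙ φ′(z₁), where ⊙ is the elementwise product. Gradients are ∂L/∂W = δ aᵀ, with a the input to that layer (x for W₁, h for W₂), and ∂L/∂b = δ, each averaged over the batch. On every step the page animates the per-edge magnitude of those gradients along the line connecting the two neurons.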
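Those formulas translate into one training step as follows. This is a sketch under assumptions: tanh hidden units, plain gradient descent with learning rate `lr`, and a network stored as `{ W1, b1, W2, b2 }`. The names `trainStep` and `meanLoss` and the starting weights are illustrative, not taken from the page.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));
const tanhPrime = (z) => 1 - Math.tanh(z) ** 2;

// One gradient-descent step on a batch. Returns the batch-averaged gradients,
// which a caller could use to animate per-edge magnitudes.
function trainStep(net, batch, lr) {
  const H = net.b1.length;
  const g = {
    W1: net.W1.map(() => [0, 0]),
    b1: new Array(H).fill(0),
    W2: new Array(H).fill(0),
    b2: 0,
  };
  for (const { x, y } of batch) {
    // Forward pass.
    const z1 = net.W1.map((row, j) => row[0] * x[0] + row[1] * x[1] + net.b1[j]);
    const h = z1.map((z) => Math.tanh(z));
    const yHat = sigmoid(h.reduce((s, hj, j) => s + net.W2[j] * hj, net.b2));
    // Backward pass: d2 = yHat - y, d1 = (W2^T d2) * phi'(z1), elementwise.
    const d2 = yHat - y;
    g.b2 += d2;
    for (let j = 0; j < H; j++) {
      const d1 = net.W2[j] * d2 * tanhPrime(z1[j]);
      g.W2[j] += d2 * h[j];
      g.b1[j] += d1;
      g.W1[j][0] += d1 * x[0];
      g.W1[j][1] += d1 * x[1];
    }
  }
  // Average over the batch, then apply w <- w - lr * grad.
  const n = batch.length;
  g.b2 /= n;
  net.b2 -= lr * g.b2;
  for (let j = 0; j < H; j++) {
    g.W2[j] /= n; g.b1[j] /= n; g.W1[j][0] /= n; g.W1[j][1] /= n;
    net.W2[j] -= lr * g.W2[j];
    net.b1[j] -= lr * g.b1[j];
    net.W1[j][0] -= lr * g.W1[j][0];
    net.W1[j][1] -= lr * g.W1[j][1];
  }
  return g;
}

// Mean binary cross-entropy over a batch, used here only to check progress.
function meanLoss(net, batch) {
  let total = 0;
  for (const { x, y } of batch) {
    const h = net.W1.map((row, j) => Math.tanh(row[0] * x[0] + row[1] * x[1] + net.b1[j]));
    const yHat = sigmoid(h.reduce((s, hj, j) => s + net.W2[j] * hj, net.b2));
    total += -(y * Math.log(yHat) + (1 - y) * Math.log(1 - yHat));
  }
  return total / batch.length;
}

const xor = [
  { x: [0, 0], y: 0 }, { x: [0, 1], y: 1 },
  { x: [1, 0], y: 1 }, { x: [1, 1], y: 0 },
];
// Fixed, asymmetric starting weights (arbitrary values, chosen for reproducibility).
const net = { W1: [[0.5, -0.4], [-0.3, 0.6]], b1: [0.1, -0.1], W2: [0.7, -0.8], b2: 0 };
const lossBefore = meanLoss(net, xor);
for (let step = 0; step < 200; step++) trainStep(net, xor, 0.1);
const lossAfter = meanLoss(net, xor);
console.log(lossBefore.toFixed(4), "->", lossAfter.toFixed(4));
```

Calling `trainStep` with `lr = 0` returns the gradients without changing the weights, which is a convenient way to compare them against finite differences.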

The decision boundary is rendered by sampling the network on an 80 × 80 grid and drawing the result with nearest-neighbour zoom. At 80 × 80 the recompute is cheap enough to run every frame while you are dragging a weight or watching training.
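The sampling half of that step might look like the sketch below. The input range of [−1, 1] on both axes, the function name `sampleGrid`, and the stand-in model `demoPredict` are all assumptions; the page's actual coordinate range is not stated.

```javascript
// Sample a model on an n x n grid over [min, max]^2 and return a flat
// Float32Array in row-major order (row 0 is the top of the picture).
// `predict` is any function (x1, x2) -> value in [0, 1].
function sampleGrid(predict, n = 80, min = -1, max = 1) {
  const out = new Float32Array(n * n);
  const step = (max - min) / (n - 1);
  for (let row = 0; row < n; row++) {
    const x2 = max - row * step; // flip so larger x2 is drawn higher up
    for (let col = 0; col < n; col++) {
      const x1 = min + col * step;
      out[row * n + col] = predict(x1, x2);
    }
  }
  return out;
}

// Stand-in model for the example: a soft diagonal boundary, not the trained network.
const demoPredict = (x1, x2) => 1 / (1 + Math.exp(-4 * (x1 + x2)));
const heat = sampleGrid(demoPredict);
console.log(heat.length); // 6400 samples for the default 80 x 80 grid
```

In a browser, one common way to get the nearest-neighbour zoom is to write these values into an 80 × 80 canvas and draw that onto the visible canvas with `imageSmoothingEnabled` set to `false` on the 2D context. Whether the page does exactly this is not stated.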

Weights initialise with a He-style scale: zero-mean Gaussian draws with standard deviation √(2/n), that is 𝒩(0, 2/n) in variance notation, with n presumably the layer's fan-in. Drag any edge with the mouse to override it. Hidden width is clamped to between one and eight neurons; with one hidden unit XOR is provably unsolvable, which is itself a useful thing to demonstrate.
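A minimal sketch of that initialisation, assuming n is the fan-in and that Gaussian samples come from a Box-Muller transform over `Math.random` (the page's actual sampler is not described). The names `gaussian`, `heInit` and `clampHidden` are hypothetical.

```javascript
// Standard normal sample via the Box-Muller transform.
// `rand` must return values in [0, 1), as Math.random does.
function gaussian(rand = Math.random) {
  const u1 = 1 - rand(); // shift to (0, 1] so Math.log never sees 0
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// He-style matrix: entries drawn with mean 0 and std = sqrt(2 / fanIn).
function heInit(rows, fanIn, rand = Math.random) {
  const std = Math.sqrt(2 / fanIn);
  return Array.from({ length: rows }, () =>
    Array.from({ length: fanIn }, () => std * gaussian(rand))
  );
}

// Clamp the hidden width to the 1..8 range the page allows.
const clampHidden = (n) => Math.min(8, Math.max(1, Math.round(n)));

const hidden = clampHidden(4);
const W1 = heInit(hidden, 2);    // hidden x 2 matrix for the input layer
const W2 = heInit(1, hidden)[0]; // one output row of length `hidden`
console.log(W1.length, W1[0].length, W2.length);
```

With two inputs the first layer's standard deviation is exactly 1; the output layer's shrinks as the hidden width grows, which keeps the pre-activation scale roughly steady.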