Common Failure Modes & Patterns#

Experienced debuggers are faster not because they measure faster, but because they recognize patterns. Most circuit failures fall into a handful of categories, and learning to recognize the symptoms early saves enormous time.

Power Masquerading as Logic Faults#

The single most common debugging trap. A power problem rarely announces itself as a power problem — it shows up as something else entirely.

What happens: A voltage rail drops below the minimum operating voltage for a downstream IC. The IC doesn’t stop working cleanly — it works wrong. Outputs are at wrong levels, communication protocols produce garbage, state machines go to undefined states.

What it looks like: Firmware crash, I2C/SPI errors, corrupted data, random resets, “noisy” ADC readings, “intermittent” digital failures.

The fix is always the same: Check all power rails first. Every time. Measure at the IC supply pin, not at the regulator output — voltage drop across traces and vias matters.

See Is voltage correct under load?

Thermal Failures#

The circuit works when cold and fails when hot (or vice versa). Temperature-dependent faults point to a component operating at the edge of its margin.

Common culprits:

  • Solder joints with hairline cracks — expansion breaks contact, contraction restores it
  • Electrolytic capacitors with rising ESR — worse when hot
  • Transistors with gain that shifts with temperature — biasing goes wrong
  • Crystal oscillators that shift frequency outside the tolerance window

How to provoke:

  • Heat gun on individual components (not the whole board) to isolate which part is sensitive
  • Freeze spray on suspected components — if the circuit starts working when a specific component is frozen, that component is the suspect
  • Let the board warm up naturally and monitor for when the failure appears — note which components are hottest at that moment

Caution: Heat guns can damage components. Use low settings, keep moving, and don’t heat anything that can’t be replaced.

Ground Problems#

“Ground” is not a single point — it’s a network of copper with non-zero impedance. When return currents flow through that impedance, different “ground” points sit at different voltages.

Symptoms:

  • Voltage difference between two ground points (measure it — even millivolts matter for analog)
  • Signals that look noisy on the scope but measure fine with a DMM
  • Behavior that changes when the scope ground clip is moved to a different location
  • Audio hum, video bars, or repeating interference patterns

What to check:

  • Measure ground-to-ground voltage between key points
  • Check for high-current paths sharing ground returns with sensitive circuits
  • Look for ground loops in test setups (scope ground, bench supply ground, DUT ground)

See Ground Loop Detection

Hidden Oscillation#

A circuit oscillates at high frequency, and it goes unnoticed because nobody is looking for it.

What happens: A voltage regulator, op-amp, or transistor circuit breaks into oscillation — often at tens of MHz. The oscillation eats voltage headroom, generates heat, causes EMI, and makes downstream circuits behave unpredictably.

Clues that oscillation is present:

  • Output voltage is lower than expected (the oscillation is eating headroom)
  • Component runs hotter than expected (oscillation power is being dissipated)
  • Adding a load makes the circuit work better (the load dampens the oscillation)
  • Random, irreproducible digital errors downstream

How to find it: Put a scope probe on the suspect output, zoom out to a fast timebase (100 ns/div or faster), and look. Make sure bandwidth limiting is OFF — a 20 MHz bandwidth limit will hide a 50 MHz oscillation.

See Ripple & Noise on the Rail

Connector and Wiring Failures#

Connectors are mechanical components, and mechanical components wear out.

Position-sensitive intermittent: The circuit works when the cable is held at a certain angle, a connector is pressed, or the board is flexed. This is always a mechanical connection issue.

  • Wiggle test: gently flex cables, press on connectors, tap the board near suspects
  • Corroded pins: look for green/white deposits, especially in humid environments
  • Backed-out terminals: the wire is still in the connector housing but the terminal isn’t fully seated
  • Worn contact springs: the connector has been mated too many times and the spring force is gone
  • Board-to-board connectors: check for cracked solder joints at the mounting pads

See Intermittent Connection

ESD Damage#

Electrostatic discharge can damage components without leaving any visible trace. The result is often partial damage — the part still works, but not to spec.

Parametric ESD damage:

  • An op-amp with increased offset voltage or noise
  • A MOSFET with increased leakage (won’t fully turn off)
  • A logic gate with degraded thresholds (marginal at voltage extremes)
  • An ADC with increased noise floor or missing codes

Why it’s the worst: The part isn’t dead — it’s degraded. It passes basic tests but fails under real operating conditions or at temperature extremes. Parametric failures are notoriously hard to diagnose without reference measurements from a known-good board.

When to suspect ESD: The board was handled without ESD precautions, the failure is subtle and parametric rather than hard, and the symptoms don’t match any other pattern.

Pattern Recognition Table#

Given a symptom, start with the most common cause and the quickest measurement to confirm or rule it out:

SymptomFirst SuspectFirst Measurement
Board completely deadNo power, blown fuse, shorted railCheck input voltage, measure current draw
Works then dies after secondsThermal shutdown, overcurrent, shortFeel for hot components, measure current
Random resets / crashesPower rail dip under load, brownoutScope the supply rail during the fault
Communication errors (I2C/SPI)Power, missing pull-ups, signal integrityCheck power first, then check signal levels
Signal present but distortedWrong bias point, clipping, oscillationScope the signal path stage by stage
Intermittent, position-dependentBad connector, cracked solder jointWiggle test, visual inspection, continuity
Intermittent, temperature-dependentMarginal component, cracked jointHeat/freeze individual components
Works on bench, fails in enclosureThermal, grounding, EMITemperature monitoring, ground integrity
Audio hum / video barsGround loop, power supply rippleMeasure ground-to-ground voltage, scope supply
Consumes too much currentShorted component, latch-up, oscillationCurrent draw, thermal imaging / touch test

In Practice#

  • A board revision that fixes one problem but introduces another often indicates that the original diagnosis was at the wrong layer — the board change coincidentally improved the original condition while disturbing a different coupling path or timing relationship that was previously adequate.
  • A system that works for hours and then fails is frequently showing a coordination failure that accumulates — a small timing drift, a slow memory leak, a gradual thermal shift — that eventually crosses a threshold. The individual devices haven’t degraded; the coordination margin has eroded.
  • An intermittent fault that occurs more frequently under high system load often indicates a marginal condition — timing, supply, or thermal — that crosses its threshold only when the full system is active and drawing maximum current, generating maximum heat, and creating maximum electromagnetic interference simultaneously.
  • A subsystem IC that works correctly in evaluation board form but fails in the production design is frequently showing that the collapsed subsystem’s internal design assumed specific external conditions (layout, component placement, thermal environment) that the evaluation board provided but the production design does not.