Evidence-Driven Debugging for C and C++ Firmware and Tools

Many hard C and C++ debugging sessions do not begin with a line that is obviously wrong. They begin with a board that looked fine on the bench, then fails a few minutes into normal firmware execution after several UART, USB, CAN, or TCP messages have already moved through the system. The same pattern shows up when a serial parser rejects one frame from a long capture, when a Linux utility works until the input file changes shape, or when a C++ lifetime bug appears only with optimization enabled. In that moment, editing first usually makes the failure less clear, because the edit can change timing, memory layout, or the exact sequence that triggered the bug.

Capture the exact failure before editing code

When a failure is still vague, the first useful job is to describe it in a form another engineer could reproduce. For firmware, that means the board revision, firmware build, power condition, communication sequence, last reset reason, and the observed state. For a host-side tool, it may be the command line, input file, operating system, compiler flags, and the last few log entries. The point is not paperwork; it is keeping the symptom from turning into a moving target while the investigation is underway.

A captured symptom also gives the fix a real finish line. If the original failure was a bad CRC counter after a USB reconnect, then a cleaner parser or a longer timeout is not enough by itself. The same reconnect sequence has to run again, and the counters or trace should show that the path which failed before now behaves correctly. The diagram below is the workflow used throughout this article.

A useful investigation loop captures the symptom, reproduces the input, instruments one boundary, isolates the failing layer, and verifies against the original failure.
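
One way to keep that finish line honest is to write the captured symptom into a small record and compare it with a second capture taken after the fix. The sketch below is only an illustration: `FailureReport`, its fields, and `symptom_resolved()` are hypothetical names, and the bad-CRC counter stands in for whatever observed state defined the original failure.

```cpp
#include <cstdint>
#include <string>

// Hypothetical record of one captured failure; every field name is illustrative.
struct FailureReport {
    std::string board_revision;
    std::string firmware_build;
    std::string stimulus;         // the sequence that triggered the failure
    std::uint32_t reset_reason;   // raw value read from the target after the failure
    std::uint32_t bad_crc_count;  // observed state that defined the symptom
};

// The fix counts as verified only when the same stimulus was replayed
// and the state that was wrong before is now clean.
bool symptom_resolved(const FailureReport& before, const FailureReport& after)
{
    return before.stimulus == after.stimulus
        && before.bad_crc_count != 0u
        && after.bad_crc_count == 0u;
}
```

A clean counter from a different stimulus does not pass this check, which is the point: a result from another sequence says nothing about the failure that was originally reported.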

Freeze the input that triggers the bug

Stable input makes a debugger far more useful. When a parser fails on one packet, keep that packet. When a production tester fails after a button sequence, save the sequence. If a C++ container or string parser fails only for one input shape, turn that shape into a small replay case. This removes one common source of wasted time: trying to debug a failure while the stimulus keeps changing underneath you.

The next snippet stores a failing frame as named bytes. The start byte belongs to the protocol, and the command byte names the operation being tested. Those names matter, because a later reader can tell whether the test is about framing, command decoding, length handling, or checksum behavior without reconstructing the protocol from raw numbers.


#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::uint8_t FRAME_START_BYTE = 0xA5u;
constexpr std::uint8_t COMMAND_READ_STATUS = 0x12u;

static constexpr std::array<std::uint8_t, 5> failing_packet{
    FRAME_START_BYTE,
    COMMAND_READ_STATUS,
    0x02u,
    0x00u,
    0x91u
};

// Note: assumed declaration of the parser entry point; match it to the real receive path.
void parser_accept_bytes(const std::uint8_t* data, std::size_t length);

void replay_failing_packet(void)
{
    // Good: the failing input is stable and named.
    // Note: parser_accept_bytes() is the receive-path entry point under test.
    parser_accept_bytes(failing_packet.data(), failing_packet.size());
}

This capture does not prove where the defect lives, but it gives the team something stable to replay after parser changes, compiler changes, or a discussion about the bug. That small artifact is often what keeps a debugging session from drifting.

Build a timeline from cheap trace records

Some failures are order problems rather than value problems. You can stare at the final value for a long time and still miss the cause, because the value itself is not always wrong. The bug may be that a DMA callback fired before buffer ownership changed, a timeout expired just as the retry path started, or a worker thread posted a result after the receiving object was already gone. In those cases, the useful evidence is the order of events, not only the value left behind at the end.

A short trace should capture enough sequence information to answer which event happened first. In firmware, that trace has to be cheap enough to run near the failure without moving the failure somewhere else. On a host tool, the same idea may use richer logs or test harness timestamps, but the goal is still ordered evidence rather than a large unstructured log.


#include <stdint.h>

typedef enum {
    TraceEvent_RxStarted,
    TraceEvent_RxComplete,
    TraceEvent_ParseStarted,
    TraceEvent_ParseRejected
} TraceEvent;

typedef struct {
    uint32_t timestamp_ms;
    TraceEvent event;
} TraceEntry;

#define TRACE_ENTRY_COUNT 32u

static TraceEntry trace_entries[TRACE_ENTRY_COUNT];

// Note: volatile is for visibility only; this example assumes one writer.
static volatile uint8_t trace_write_index;

void trace_record(TraceEvent event)
{
    // Note: read the volatile index once, so the fast path does a single load
    // and the entry slot and the increment below stay consistent with each other.
    const uint8_t write_index = trace_write_index;
    TraceEntry *entry = &trace_entries[write_index];

    // Good: keep only ordered evidence, not formatted text in the fast path.
    // Note: HAL_GetTick() is the STM32 HAL millisecond tick; any monotonic
    // tick source works here.
    entry->timestamp_ms = HAL_GetTick();
    entry->event = event;

    trace_write_index = (uint8_t)((write_index + 1u) % TRACE_ENTRY_COUNT);
}

A small ring of trace entries has one obvious limitation: old evidence can be overwritten. That is fine when the failure happens close to the trigger, but long-running faults usually need trigger conditions, a larger buffer, or a way to freeze the trace when a fault flag is set.

There is also an important execution-context assumption in this example. volatile can keep the compiler from caching the write index, but it does not make the index update atomic. If both an ISR and a task can write trace entries, serialize the writer, mask interrupts for the index update, or use an atomic primitive that is supported and measured on the target. On Cortex-M systems with suitable debug hardware, ITM/SWO or ETM trace through a JTAG/SWD probe can sometimes give better timing evidence than adding more software trace records.
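
The freeze-on-fault idea and the multi-writer concern can be sketched together. This is a host-style C++ variation on the ring above, not the article's firmware code: it assumes `std::atomic<std::uint32_t>` is lock-free on the platform in use, it stores a sequence number instead of a millisecond tick, and `trace_freeze()` is a hypothetical hook that a fault handler would call.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

enum class TraceEvent : std::uint8_t { RxStarted, RxComplete, ParseStarted, ParseRejected };

struct TraceEntry {
    std::uint32_t sequence;  // order of arrival, not wall-clock time
    TraceEvent event;
};

constexpr std::size_t TRACE_ENTRY_COUNT = 32u;

static TraceEntry trace_entries[TRACE_ENTRY_COUNT];
static std::atomic<std::uint32_t> trace_sequence{0u};
static std::atomic<bool> trace_frozen{false};

// Hypothetical hook: call this when a fault flag is set.
void trace_freeze()
{
    trace_frozen.store(true);
}

bool trace_record(TraceEvent event)
{
    if (trace_frozen.load()) {
        return false;  // keep the entries that led up to the fault
    }

    // fetch_add gives each caller its own sequence number, so two writers
    // do not claim the same slot unless the ring wraps fully in between.
    const std::uint32_t sequence = trace_sequence.fetch_add(1u);
    TraceEntry& entry = trace_entries[sequence % TRACE_ENTRY_COUNT];
    entry.sequence = sequence;
    entry.event = event;
    return true;
}
```

The entry fields themselves are still written without protection, so the ring should be read only after the freeze, once the writers have stopped. A writer that passed the freeze check just before the flag was set can also land one late entry.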

Instrument the boundary where the state changes

The fault domain changes every time data crosses from one representation to another. Bytes become frames, frames become commands, commands change state, and state changes produce outputs. The same shape appears in Linux utilities and Qt tools, where files become records, records become model objects, and model objects drive UI or reports. Instrumenting those crossings is usually more useful than sprinkling prints across every function in the call path.

The diagram below shows the kind of boundary map that is worth drawing before adding instrumentation. The exact blocks change from project to project, but the useful counters usually sit where ownership, timing, format, or trust changes.

Useful counters usually live where data changes form or ownership, because that is where the next debugging decision becomes clearer.
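
As a rough illustration of that boundary map, the sketch below keeps one counter per crossing and reports the first crossing that received input but produced nothing. The structure, the enum, and the function name are hypothetical, and a real receive path would likely have different stages.

```cpp
#include <cstdint>

// Hypothetical counters, one per crossing in a receive path.
struct BoundaryCounters {
    std::uint32_t bytes_received = 0;    // transport -> byte stream
    std::uint32_t frames_assembled = 0;  // bytes -> frames
    std::uint32_t commands_decoded = 0;  // frames -> commands
    std::uint32_t state_changes = 0;     // commands -> state
};

enum class Boundary { Transport, Framing, Decoding, StateChange, NoneStalled };

// Walk the crossings in data-flow order and name the first one with no output.
Boundary first_stalled_boundary(const BoundaryCounters& counters)
{
    if (counters.bytes_received == 0u) {
        return Boundary::Transport;
    }
    if (counters.frames_assembled == 0u) {
        return Boundary::Framing;
    }
    if (counters.commands_decoded == 0u) {
        return Boundary::Decoding;
    }
    if (counters.state_changes == 0u) {
        return Boundary::StateChange;
    }
    return Boundary::NoneStalled;
}
```

If bytes and frames are counted but no command was ever decoded, the next thing to instrument is the decoder, not the UART driver.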

Change one condition at a time

When several experiments are combined, causality disappears quickly. If a timeout is increased, a delay is added, and a buffer size changes in the same test run, a disappearing symptom does not tell you which condition mattered. It only says that the combined edit changed enough of the system to hide or remove the failure.

A temporary compile-time switch can help when you want to test one theory without burying it inside other edits. It should stay obviously temporary, and after the experiment it should either be removed or turned into a justified configuration. This snippet keeps the experiment narrow: only the sensor timeout changes, and the production path remains visible.


#include <stdint.h>

#define DEBUG_TEST_LONGER_SENSOR_TIMEOUT 0u

static uint32_t sensor_ready_timeout_ms(void)
{
#if DEBUG_TEST_LONGER_SENSOR_TIMEOUT
    // Tradeoff: temporary experiment to test whether timing is the failure cause.
    return 100u;
#else
    // Good: production path keeps the original measured timeout.
    return 20u;
#endif
}

The tradeoff is patience. Testing one condition at a time feels slower when the team wants to try several ideas at once, but each result means something. That matters more than a quick pass that nobody can explain later.

Count failure classes, not just failures

A single error counter is rarely enough in firmware or protocol code. A short frame, a bad start byte, a bad CRC, and an application-layer rejection point to different causes, even if they all look like receive failures from the outside. Combining them into one number hides the layer where the first wrong thing occurred.

Here, 0xA5u is the protocol sync byte. Naming it as FRAME_START_BYTE keeps the check tied to the protocol concept and avoids leaving a bare magic number inside the failure path.


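#include <cstdint>

// Note: FRAME_START_BYTE is the constant defined with the replay packet earlier.
// Note: the minimum length is named so the check below does not hide a magic number.
constexpr unsigned FRAME_MIN_LENGTH = 4u;

struct DecodeStats {
    unsigned frames_seen = 0;
    unsigned frames_accepted = 0;
    unsigned frames_too_short = 0;
    unsigned frames_bad_start = 0;
    unsigned frames_null_input = 0;
};

bool decode_frame(const std::uint8_t* data, unsigned length, DecodeStats& stats)
{
    ++stats.frames_seen;

    if (data == nullptr) {
        // Good: a null pointer is a caller bug, not a short frame; count it separately.
        ++stats.frames_null_input;
        return false;
    }

    if (length < FRAME_MIN_LENGTH) {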
        // Good: short input is counted as its own failure class.
        ++stats.frames_too_short;
        return false;
    }

    if (data[0] != FRAME_START_BYTE) {
        // Good: wrong sync byte is separated from short input.
        ++stats.frames_bad_start;
        return false;
    }

    ++stats.frames_accepted;
    return true;
}

Counters can also grow until they become another interface to maintain. Keep the first set focused on boundaries that change the next debugging decision: transport, framing, validation, state transition, and application rejection.

Make invalid states report themselves

C and C++ systems often fail long after the first invalid state appears. A state machine may accept a transition it should reject, a stale owner may keep a buffer, or a driver may accept a command after shutdown has already started. If the code quietly accepts those states, the final symptom can look unrelated to the real cause.

The following state machine has only a few legal transitions. The helper records unexpected transitions so the state model does not jump across its own rules without leaving evidence.


#include <stdbool.h>

typedef enum {
    MotorState_Idle,
    MotorState_Starting,
    MotorState_Running,
    MotorState_Fault
} MotorState;

static volatile unsigned invalid_transition_count;

static void record_invalid_transition(void)
{
    // Tradeoff: use an atomic increment or critical section if several contexts call this path.
    ++invalid_transition_count;
}

bool motor_transition_allowed(MotorState from, MotorState to)
{
    if (from == MotorState_Idle && to == MotorState_Starting) {
        return true;
    }
    if (from == MotorState_Starting && to == MotorState_Running) {
        return true;
    }
    // Note: a real motor needs a normal stop path, not only a fault exit.
    if (from == MotorState_Running && to == MotorState_Idle) {
        return true;
    }
    // Note: recovery to Idle is allowed only after the fault cause is handled.
    if (from == MotorState_Fault && to == MotorState_Idle) {
        return true;
    }
    if (to == MotorState_Fault) {
        return true;
    }

    // Good: unexpected paths leave evidence during bring-up.
    record_invalid_transition();
    return false;
}

In safety-related or production firmware, an invalid transition may need stronger handling than a counter: a fault state, a diagnostic record, or a controlled reset. During bring-up, a counter is often enough to show whether the state model is being violated before the visible failure appears.

The transition counter is diagnostic, but it still has to follow the same concurrency rules as the rest of the system. If all transition requests are serialized through one control task, a normal counter is enough. If requests can come from an ISR, a task, worker threads, or multiple cores, the counter needs an atomic increment or a short critical section. Otherwise the diagnostic path can lose evidence while it is trying to report the bug.

Keep logging out of timing-critical paths

Adding more logging is not harmless in embedded systems. A log statement can change timing, increase stack usage, block on a UART, allocate memory, or quietly hide the race condition you are trying to catch. Desktop C++ has a different cost model, but logging can still move a concurrency bug by changing thread scheduling or object lifetime.

For timing-sensitive code, decide first which evidence must be captured in the fast path and which work can be deferred. Event counters, compact trace records, GPIO timing marks, and status snapshots are usually safer than formatted text inside an interrupt, control loop, or high-rate receive callback.

The counter example below uses project-specific critical-section wrappers. In a real codebase, those wrappers might save and restore interrupt state, enter an RTOS critical section, or map to a lock-free atomic operation. The important point is that volatile only addresses visibility. It does not make ++counter atomic when an ISR, task, or second core can touch the same counter.

Timing-sensitive code should record small facts first, then leave formatting, transport, and heavier analysis to a context that can afford it.


typedef enum {
    DebugEvent_RxByte = 0u,
    DebugEvent_FrameAccepted = 1u,
    DebugEvent_FrameRejected = 2u,
    DebugEvent_Count
} DebugEvent;

typedef unsigned DebugIrqState;

DebugIrqState debug_enter_critical(void);
void debug_exit_critical(DebugIrqState state);

static volatile unsigned debug_event_counts[DebugEvent_Count];

void debug_count_event(DebugEvent event)
{
    if ((unsigned)event >= (unsigned)DebugEvent_Count) {
        return;
    }

    // Good: protect the read-modify-write if several contexts can update it.
    DebugIrqState state = debug_enter_critical();
    ++debug_event_counts[(unsigned)event];
    debug_exit_critical(state);
}

Bounded instrumentation gives less context than a full log, and that is the tradeoff. Preserve the behavior first, then add richer context from a slower task, a host-side replay, or a test harness once the suspicious boundary is known.

On host-side C++, std::atomic<unsigned> is usually the clearer expression for shared debug counters. In C firmware, C11 atomics or compiler builtins such as __atomic_fetch_add may be appropriate, but they still need to be checked against the MCU, compiler, and runtime settings. On small MCUs, a short critical section is often more predictable than assuming the compiler emitted a lock-free operation.
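
For the host-side case, a minimal sketch with `std::atomic<unsigned>` might look like the following. It mirrors the event names from the critical-section example but is a separate illustration; `debug_event_count()` is a hypothetical read helper, and relaxed ordering is assumed to be enough because the counters are only read for diagnostics.

```cpp
#include <atomic>
#include <cstddef>

enum class DebugEvent : std::size_t { RxByte, FrameAccepted, FrameRejected, Count };

constexpr std::size_t DEBUG_EVENT_COUNT = static_cast<std::size_t>(DebugEvent::Count);

// Empty braces value-initialize every counter to zero.
static std::atomic<unsigned> debug_event_counts[DEBUG_EVENT_COUNT] = {};

void debug_count_event(DebugEvent event)
{
    const std::size_t index = static_cast<std::size_t>(event);
    if (index >= DEBUG_EVENT_COUNT) {
        return;  // ignore values outside the known event range
    }

    // fetch_add is a single atomic read-modify-write, so concurrent
    // callers cannot lose an increment the way ++ on a volatile can.
    debug_event_counts[index].fetch_add(1u, std::memory_order_relaxed);
}

// Hypothetical helper for a slower reporting context.
unsigned debug_event_count(DebugEvent event)
{
    const std::size_t index = static_cast<std::size_t>(event);
    if (index >= DEBUG_EVENT_COUNT) {
        return 0u;
    }
    return debug_event_counts[index].load(std::memory_order_relaxed);
}
```

Relaxed ordering keeps each counter exact but says nothing about the order between different counters. If the question is which event happened first, a trace record is the better tool.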

Check lifetime and ownership before rewriting logic

In C++, a value can be logically correct and still be invalid. Dangling views, references to temporary objects, invalidated iterators, callbacks that outlive their owner, and spans kept past the lifetime of their buffer can all produce symptoms far away from the faulty line. That is why lifetime deserves an early check when a failure changes with optimization level, logging, allocation pattern, or object layout.

The example below uses a status string. Returning a view to a local string is broken because the string is destroyed when the function returns. Returning a view to static storage is safe in this small case only because the storage lifetime is long enough.


#include <string>
#include <string_view>

std::string_view bad_status_text(void)
{
    std::string text = "fault";

    // Bad: the string dies before the returned view is used.
    return text;
}

std::string_view good_status_text(void)
{
    static constexpr char text[] = "fault";

    // Good: the returned view refers to static storage.
    return text;
}

The better fix depends on the interface. Static storage is acceptable for fixed text, but it is not a general solution for dynamic data. In other cases, the caller should own a std::string, the callee should write into a caller-provided buffer, or the API should make ownership explicit.

Host-side sanitizer builds are useful for this class of problem. AddressSanitizer, usually called ASan, can catch many use-after-free and stack lifetime errors. UndefinedBehaviorSanitizer, usually called UBSan, can expose several classes of undefined behavior before the issue reaches hardware. They do not replace target testing, and they may not fit bare-metal firmware, but they are valuable for parser logic, protocol code, libraries, and simulation builds that can run on a desktop.

Use assertions for broken assumptions

Some checks are design assumptions rather than recoverable runtime errors. Assertions are useful when a violation means the surrounding code is wrong, especially in debug builds, test utilities, simulation builds, and firmware bring-up. The important distinction is whether the system can recover. Bad input from outside the system should usually be handled; an impossible internal invariant should be made visible.

Here, the ring buffer capacity must be a power of two because the mask-based wrap depends on that shape. The static assertion makes the requirement part of the code instead of leaving it as a comment in a design note. The runtime assert covers the other kind of assumption, one that only exists while the code runs: an index that should already be wrapped when it reaches the store function.


#include <cassert>
#include <cstdint>

constexpr unsigned RX_RING_CAPACITY = 128u;
constexpr unsigned RX_RING_MASK = RX_RING_CAPACITY - 1u;

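// Note: zero also passes the mask test, so it is excluded explicitly.
static_assert(RX_RING_CAPACITY != 0u && (RX_RING_CAPACITY & RX_RING_MASK) == 0u,
              "RX ring capacity must be a nonzero power of two");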

static std::uint8_t rx_ring[RX_RING_CAPACITY];

unsigned wrap_rx_index(unsigned index)
{
    // Good: this wrap is valid because the static assertion proved the shape.
    return index & RX_RING_MASK;
}

void rx_ring_store(unsigned index, std::uint8_t value)
{
    // Good: the runtime assert catches an out-of-range index in debug and test builds.
    assert(index < RX_RING_CAPACITY);
    rx_ring[index] = value;
}

On small targets, decide which assertions remain in production and what they do when they fire. Some products can stop in a fault state, some need a diagnostic record and reset, and some must keep running in a degraded mode. The assertion policy should match the system risk, not just the developer’s preference.
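
One possible shape for such a policy is a project-owned assertion macro that routes failures to a single hook. Everything in this sketch is hypothetical, including `FW_ASSERT`, `on_assert_failure()`, and the record layout. It shows the "record and keep running" variant; a product that must stop would enter its fault state or request a reset inside the hook.

```cpp
#include <cstdint>

// Hypothetical diagnostic record of the most recent assertion failure.
struct AssertRecord {
    const char* file;
    int line;
    std::uint32_t count;
};

static AssertRecord last_assert_failure{nullptr, 0, 0u};

// Hypothetical policy hook: the one place that decides what a failure does.
void on_assert_failure(const char* file, int line)
{
    last_assert_failure.file = file;
    last_assert_failure.line = line;
    ++last_assert_failure.count;
    // A stricter product build might enter a fault state or reset here.
}

// Unlike assert(), this macro stays active regardless of NDEBUG.
#define FW_ASSERT(condition)                           \
    do {                                               \
        if (!(condition)) {                            \
            on_assert_failure(__FILE__, __LINE__);     \
        }                                              \
    } while (false)
```

The record update is not atomic, so the same concurrency rules discussed for the other counters apply if several contexts can fail an assertion.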

Reduce the failing path to one test

Once the failing input and boundary evidence point to a small area, turn that area into a test. The test does not need to model the whole product; it needs to preserve the bug’s essential shape so the fix can be checked again after the next edit.

The next test checks that a short frame is counted as short and not as a bad start byte. That distinction matters because each counter points to a different layer of the receive path.


#include <cassert>

void test_short_frame_is_not_bad_start(void)
{
    DecodeStats stats{};
    const std::uint8_t short_frame[] = { FRAME_START_BYTE, COMMAND_READ_STATUS };

    const bool ok = decode_frame(short_frame, 2u, stats);

    // Good: the test checks the exact failure classification.
    assert(ok == false);
    assert(stats.frames_too_short == 1u);
    assert(stats.frames_bad_start == 0u);
}

A focused test does not replace hardware validation. It removes one class of uncertainty before you return to the board, test fixture, or customer capture. For firmware, the usual pattern is to replay the parsing or state logic on the host, then run the original hardware sequence again after the code change.

Verify the fix against the original symptom

A fix is not complete just because the code is cleaner or a nearby test passes. It is complete when the original symptom is gone and the evidence shows that the intended path is now being used. Keep the original packet, test sequence, input file, or hardware setup until that verification is done.

The table summarizes the habits in this article. The useful part is not only the habit itself, but the kind of evidence it leaves behind. If a debugging action does not reduce uncertainty, it is probably activity rather than investigation.

Debugging habits that leave useful evidence behind.
Habit	Evidence it creates	Failure it prevents
Capture the exact symptom	Known start and end condition	Debugging a changing story
Freeze failing input	Repeatable stimulus	Chasing a moving packet, file, or state
Build a timeline	Ordered events	Confusing order bugs with value bugs
Instrument boundaries	Layer-specific evidence	Scattering logs without narrowing the fault
Change one condition	Experiment with causality	Lucky fixes with unknown mechanism
Count failure classes	Transport, format, or state distinction	One vague error counter
Use bounded instrumentation	Timing-safe clues	Measurement changing the failure
Check lifetime early	Ownership and validity evidence	Rewriting logic around a dangling object
Add a focused test	Regression protection	A fix that only works once

Practical takeaways

For C and C++ systems, the most useful debugging work usually happens before the first code change. Capture the failure, freeze the input, identify the boundary where correct state becomes incorrect, and choose instrumentation that does not change the timing or ownership behavior being measured. Once the cause is isolated, reduce it to a narrow test when possible, then verify the fix against the original symptom on the real target or the real input. This workflow can feel slower than guessing during the first ten minutes, but it holds up much better when the failure involves firmware timing, protocol boundaries, object lifetime, or production hardware that cannot be debugged by trial and error.

Saeid Yazdani
