Every developer knows that small pause after a bug appears: the code was fine five minutes ago, the board or app now behaves like it has made its own plan, and you catch yourself staring at the screen thinking, "what changed now?" A value looks wrong, a packet disappears, a GUI freezes, or firmware takes a path that should have been impossible. The tempting reaction is to click around, add a few random prints, and single-step until patience runs out.
That feeling is familiar because debugging is rarely only about the broken line of code. It is also about how quickly you can turn confusion into evidence. The techniques below are not tied to one IDE, debugger, compiler, or keyboard shortcut. They are habits that work in desktop C++, embedded C, test tools, firmware drivers, and small command-line utilities. The goal is simple: leave better clues, narrow the search faster, and avoid changing code before you understand the failure.
1. Put source location into trace messages
Plain messages like failed or timeout feel useful until the same word appears from three different places. The same driver may be called by a boot path, a test mode, a GUI command, and a background worker. If a trace message does not say where it came from, the log has only moved the search from the screen back into the source tree.
C and C++ both give you predefined macros for the file and line: __FILE__ and __LINE__. C++20 also has std::source_location, but the old macro approach is still useful in embedded projects and C code.
```c
#include <stdio.h>

#define TRACE_ERROR(message) \
    trace_error(__FILE__, __LINE__, (message))

static void trace_error(const char *file, int line, const char *message)
{
    printf("[error] %s:%d: %s\n", file, line, message);
}

int open_calibration_file(const char *path)
{
    if (path == NULL) {
        TRACE_ERROR("calibration path is null"); /* Good: source is included */
        return -1;
    }

    return 0;
}
```
This is not about printing more text everywhere. It is about printing the missing coordinates when something fails. In firmware, the same idea can write to UART, SWO, RTT, a ring buffer, or a diagnostic packet instead of printf.
2. Make debug output switchable at runtime
Compile-time debug switches are useful, but they can turn into a small rebuild festival. Runtime switches are often better during investigation. You might want protocol traces for one failing command, allocator traces for one test, or verbose driver logging only after the board enters a specific mode.
The pattern is simple: keep a bit mask of enabled trace classes and check it before doing any trace work.
```c
#include <stdint.h>
#include <stdio.h>

typedef enum {
    DBG_PROTOCOL = 1u << 0, /* Note: bit mask for packet traces */
    DBG_STORAGE  = 1u << 1, /* Note: bit mask for file or flash traces */
    DBG_TIMING   = 1u << 2, /* Note: bit mask for timing traces */
} DebugChannel;

static uint32_t debug_channels;

void debug_enable(DebugChannel channel)
{
    debug_channels |= (uint32_t)channel;
}

void debug_log(DebugChannel channel, const char *message)
{
    if ((debug_channels & (uint32_t)channel) == 0u) {
        return; /* OK: cheap filter before any output */
    }

    printf("%s\n", message);
}
```
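The important detail is that the filter runs before the expensive work. Here debug_log receives a finished string, so the check only skips the output. When the message itself has to be formatted, the check needs to run before the formatting too, which means at the call site. In a desktop tool this saves noise. In embedded firmware it can also save time, stack, and serial bandwidth.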
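One way to get the filter in front of the formatting is a macro that wraps the check around the call. This is a sketch under assumptions: DEBUG_LOGF, debug_enabled, and describe_packet are hypothetical names, and describe_packet only stands in for some expensive formatting step.

```cpp
#include <cstdint>
#include <cstdio>

enum DebugChannel : std::uint32_t {
    DBG_PROTOCOL = 1u << 0,
    DBG_STORAGE  = 1u << 1,
};

static std::uint32_t debug_channels = 0;  // all channels off by default
static unsigned describe_calls = 0;       // counts how often formatting ran

static void debug_enable(DebugChannel channel) { debug_channels |= channel; }

static bool debug_enabled(DebugChannel channel)
{
    return (debug_channels & channel) != 0u;
}

// Stand-in for an expensive formatting step (hypothetical).
static const char* describe_packet(unsigned length)
{
    static char text[32];
    ++describe_calls;
    std::snprintf(text, sizeof text, "packet len=%u", length);
    return text;
}

// After expansion the arguments sit inside the if-body, so they are
// evaluated only when the channel is enabled.
#define DEBUG_LOGF(channel, ...)          \
    do {                                  \
        if (debug_enabled(channel)) {     \
            std::printf(__VA_ARGS__);     \
        }                                 \
    } while (0)
```

A call such as DEBUG_LOGF(DBG_PROTOCOL, "%s\n", describe_packet(12u)) never runs describe_packet while DBG_PROTOCOL is off. The cost is the usual macro caveat: arguments with side effects behave differently depending on the channel state.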
3. Use conditional checks instead of stopping everywhere
Stopping at every loop iteration is the debugging version of checking every drawer in the room. Sometimes you have to do it, but usually the bug has a condition. The same applies to printing every packet, every object, or every sample. Write that condition down and make the code stop, log, or count only when the condition becomes interesting.
```cpp
#include <cstdint>
#include <vector>

/* debug_log() and DBG_PROTOCOL are the runtime-switch helpers from section 2. */

struct Sample {
    std::uint32_t sequence;
    std::int32_t millivolts;
};

static bool sample_is_suspicious(const Sample& sample)
{
    if (sample.millivolts < -5000 || sample.millivolts > 5000) {
        return true; /* Good: captures the impossible range */
    }

    if ((sample.sequence & 0x0fu) == 0u) {
        return false; /* Note: periodic records are expected here */
    }

    return false;
}

void inspect_samples(const std::vector<Sample>& samples)
{
    for (const Sample& sample : samples) {
        if (sample_is_suspicious(sample)) {
            debug_log(DBG_PROTOCOL, "suspicious sample detected");
        }
    }
}
```
If you are using a debugger, the same condition can often be used as a conditional breakpoint. If you are using logs, it becomes a filter. If you are testing embedded firmware, it can toggle a pin or store an event record. The point is the same: make the condition precise.
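Counting is the third option next to stopping and logging. As a rough sketch, the same out-of-range condition can feed a small report that records how often it fired and where it fired first. SuspicionReport, out_of_range, and scan_samples are hypothetical names used only for this illustration.

```cpp
#include <cstdint>
#include <vector>

struct Sample {
    std::uint32_t sequence;
    std::int32_t millivolts;
};

// Hypothetical record of what the condition caught.
struct SuspicionReport {
    unsigned hits = 0;                 // how many samples matched
    std::uint32_t first_sequence = 0;  // meaningful only when hits > 0
};

static bool out_of_range(const Sample& sample)
{
    return sample.millivolts < -5000 || sample.millivolts > 5000;
}

SuspicionReport scan_samples(const std::vector<Sample>& samples)
{
    SuspicionReport report;
    for (const Sample& sample : samples) {
        if (!out_of_range(sample)) {
            continue;  // uninteresting samples cost one comparison
        }
        if (report.hits == 0) {
            // First match: a natural place for a plain breakpoint.
            report.first_sequence = sample.sequence;
        }
        ++report.hits;
    }
    return report;
}
```

The first matching sequence number is often enough to decide where to look next, without a single line of per-sample output.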
4. Add debug-only helper state when it makes the failure visible
Production code should not carry unnecessary state just because yesterday's debugging session was annoying. But temporary debug-only state can be a good tool when it explains behavior that is otherwise hidden.
For example, a parser may return false, but that does not tell you which state rejected the packet. A debug-only state name or counter can reveal the path.
```cpp
enum class ParserState {
    WaitStart,
    ReadLength,
    ReadPayload,
    ReadCrc,
};

struct PacketParser {
    ParserState state = ParserState::WaitStart;
    unsigned bad_crc_count = 0;

#if defined(ENABLE_DEBUG_STATE)
    ParserState last_reject_state = ParserState::WaitStart;
#endif
};

static bool reject_packet(PacketParser& parser)
{
#if defined(ENABLE_DEBUG_STATE)
    parser.last_reject_state = parser.state; /* Good: records which state rejected the packet */
#endif

    parser.state = ParserState::WaitStart;
    return false;
}
```
This is a tradeoff. Debug-only state can become stale or misleading if it is not maintained with the real logic. Keep it small, name it clearly, and remove it when it stops being useful.
5. Avoid stepping through code that is already innocent
Single-stepping is useful when you are inside the suspicious area. It is a poor tool when you are still trying to find the suspicious area. After a while, stepping through constructors, accessors, generated code, or container internals starts to feel productive while quietly draining attention.
A better habit is to mark boundaries. Log when a subsystem starts and ends. Add counters around the suspected branch. Use assertions to catch impossible state at the boundary. Then step only after the boundary points to a small region.
```cpp
struct DecodeStats {
    unsigned frames_seen = 0;
    unsigned frames_accepted = 0;
    unsigned rejected_null = 0;
    unsigned rejected_short = 0;
    unsigned rejected_bad_start = 0;
};

bool decode_frame(const unsigned char* data, unsigned length, DecodeStats& stats)
{
    ++stats.frames_seen;

    if (data == nullptr) {
        ++stats.rejected_null; /* Good: boundary failure is counted */
        return false;
    }

    if (length < 4u) {
        ++stats.rejected_short; /* Good: short frames are counted separately */
        return false;
    }

    if (data[0] != 0xA5u) {
        ++stats.rejected_bad_start; /* Good: bad start byte is separated */
        return false;
    }

    ++stats.frames_accepted;
    return true;
}
```
Now the question changes from "where is the bug?" to a smaller question: are frames rejected because the pointer is wrong, the length is short, or the start byte is missing?
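Logging when a subsystem starts and ends can be automated in C++ with a small scope object, so early returns cannot skip the "leave" record. This is only a sketch for host-side code: ScopeTrace, boundary_log, and decode_stage are hypothetical, and an in-memory vector stands in for whatever trace sink the project uses.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory log; a real project would use its own sink.
static std::vector<std::string> boundary_log;

class ScopeTrace {
public:
    explicit ScopeTrace(const char* name) : name_(name)
    {
        boundary_log.push_back(std::string("enter ") + name_);
    }

    ~ScopeTrace()
    {
        boundary_log.push_back(std::string("leave ") + name_);
    }

    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

private:
    const char* name_;
};

static bool decode_stage(bool input_ok)
{
    ScopeTrace trace("decode");
    if (!input_ok) {
        return false;  // "leave decode" is still recorded on this path
    }
    return true;
}
```

An "enter" without a matching "leave" in the log then points at the subsystem where execution stopped, which is where single-stepping becomes worthwhile.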
6. Reproduce the bug outside the full application
A bug that only exists in the full application is expensive to debug because it brings all its friends: startup code, configuration, timing, UI state, background tasks, and old assumptions. A bug that can be reproduced with a short input file, one serial command, a unit test, or a small command-line tool is much easier to fix.
When the failure depends on data, capture the data. When it depends on a packet, save the packet. When it depends on a sequence, write down the sequence and automate it.
```cpp
#include <cassert>
#include <cstdint>
#include <vector>

/* Defined in the code under test; link this test against it. */
bool parse_packet(const std::vector<std::uint8_t>& packet);

void test_rejected_short_packet()
{
    const std::vector<std::uint8_t> packet = {
        0xA5, 0x02, 0x10 /* Note: intentionally incomplete example packet */
    };

    const bool accepted = parse_packet(packet);

    assert(!accepted); /* Good: the failure is now repeatable */
}
```
This does not replace debugger work. It makes debugger work cheaper. Once the failing input is small and repeatable, every investigation becomes faster.
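Capturing the packet is easier when the running system can print it in a form that pastes straight into a test. The helper below is a hypothetical sketch (to_initializer is not from any library): it turns raw bytes into the text of a C++ initializer list.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats bytes as "0xA5, 0x02, 0x10" so a captured packet can be pasted
// into a std::vector<std::uint8_t> initializer in a test.
std::string to_initializer(const std::uint8_t* data, std::size_t length)
{
    std::string text;
    char item[8];
    for (std::size_t i = 0; i < length; ++i) {
        std::snprintf(item, sizeof item, "0x%02X",
                      static_cast<unsigned>(data[i]));
        if (i != 0) {
            text += ", ";
        }
        text += item;
    }
    return text;
}
```

Logging this string at the point of failure gives the test its input without retyping bytes from a hex dump.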
7. Keep trace code from changing timing too much
Debugging output can change the bug, which is unfair but very real. A printf inside a tight loop may hide a race, fix a timing issue accidentally, or make an embedded system miss deadlines. This is one reason timing bugs feel random.
When timing matters, prefer lightweight event recording. Store a small event code and timestamp, then print it later.
```c
#include <stdint.h>

typedef struct {
    uint32_t tick;
    uint16_t event;
    uint16_t value;
} TraceEvent;

#define TRACE_CAPACITY 64u

static TraceEvent trace_buffer[TRACE_CAPACITY];
static uint32_t trace_write_index;

void trace_event(uint32_t tick, uint16_t event, uint16_t value)
{
    uint32_t index = trace_write_index++ % TRACE_CAPACITY;

    trace_buffer[index].tick = tick;
    trace_buffer[index].event = event;
    trace_buffer[index].value = value;
}
```
This still has cost, but it is bounded and predictable. For embedded systems, this pattern is usually safer than formatting text in an interrupt or a time-critical path.
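Printing the events later means reading the ring back in order, oldest first. The sketch below shows one way to do that. It repeats a reduced copy of the buffer with a capacity of 4 so the wrap is easy to follow, trace_snapshot is a hypothetical name, and it assumes the write index has not yet overflowed 32 bits and that nothing writes to the buffer during the copy.

```cpp
#include <cstdint>

struct TraceEvent {
    std::uint32_t tick;
    std::uint16_t event;
    std::uint16_t value;
};

constexpr std::uint32_t kTraceCapacity = 4u;  // small so the wrap is visible

static TraceEvent trace_buffer[kTraceCapacity];
static std::uint32_t trace_write_index = 0;   // total events ever written

void trace_event(std::uint32_t tick, std::uint16_t event, std::uint16_t value)
{
    TraceEvent& slot = trace_buffer[trace_write_index % kTraceCapacity];
    slot.tick = tick;
    slot.event = event;
    slot.value = value;
    ++trace_write_index;
}

// Copies the stored events into out, oldest first, and returns how many
// were copied. Older events that were overwritten are gone.
std::uint32_t trace_snapshot(TraceEvent* out, std::uint32_t out_capacity)
{
    const std::uint32_t stored = trace_write_index < kTraceCapacity
                                     ? trace_write_index
                                     : kTraceCapacity;
    const std::uint32_t first = trace_write_index - stored;
    std::uint32_t copied = 0;
    for (; copied < stored && copied < out_capacity; ++copied) {
        out[copied] = trace_buffer[(first + copied) % kTraceCapacity];
    }
    return copied;
}
```

After six events in a four-slot buffer, the snapshot holds events three to six. The formatting and transmission then happen outside the time-critical path.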
8. Know when to debug the release build
Debug builds are friendlier to inspect, but they can also be a little too friendly. Some bugs only appear in optimized builds. The optimizer can expose undefined behavior, timing differences, missing volatile, bad lifetimes, uninitialized data, and code that accidentally relied on debug-build memory patterns.
Do not assume "works in debug" means the code is correct.
```cpp
int read_first_value(const int* values, unsigned count)
{
    if (count == 0u) {
        return 0;
    }

    return values[0]; /* Risky: values may still be null */
}

int read_first_value_checked(const int* values, unsigned count)
{
    if (values == nullptr || count == 0u) {
        return 0; /* Good: handles both preconditions */
    }

    return values[0];
}
```
For C and C++, release-build debugging often means using symbols with optimization enabled, checking compiler warnings, enabling sanitizers on host builds, and reducing undefined behavior. In embedded projects, it may also mean watching timing, stack usage, and memory layout changes.
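The missing volatile case is a typical release-only failure in firmware. As an illustration, assume a flag that an interrupt handler sets and a polling loop reads; data_ready, on_data_interrupt, and wait_for_data are hypothetical names, and the handler is called like a normal function here only so the sketch can run on a host.

```cpp
// Set from an interrupt handler in real firmware (assumption for this sketch).
static volatile bool data_ready = false;

// Hypothetical interrupt handler.
void on_data_interrupt()
{
    data_ready = true;
}

// Polls the flag a bounded number of times. Without volatile, an optimizing
// compiler may read data_ready once and reuse the value, so a loop like this
// can work in a debug build and never see the interrupt in a release build.
bool wait_for_data(unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; ++i) {
        if (data_ready) {
            data_ready = false;  // consume the event
            return true;
        }
    }
    return false;
}
```

Note that volatile only forces the memory accesses to happen. For data shared between threads on a host system, std::atomic is the appropriate tool.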
9. Use assertions as tripwires, not as a complete error strategy
Assertions are excellent for catching impossible internal states while developing. They are not a complete production error strategy. A driver still needs defined behavior when a caller passes a bad argument or hardware does not respond.
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    DriverOk = 0,
    DriverBadArgument,
    DriverBusError,
} DriverStatus;

DriverStatus sensor_read(uint8_t reg, uint8_t *value)
{
    assert(value != NULL); /* Good: development tripwire */

    if (value == NULL) {
        return DriverBadArgument; /* Good: release behavior is still defined */
    }

    return DriverOk;
}
```
The split matters. During bring-up, the assertion stops the mistake close to its source. In a release build where assertions may be disabled, the function still returns a useful error.
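The hardware side of the same split can be sketched as follows. A device that does not answer is a runtime condition, so it gets a status code and never an assertion. BusReadFn and the two fake bus functions are hypothetical and only stand in for a real bus driver.

```cpp
#include <cassert>
#include <cstdint>

enum DriverStatus { DriverOk = 0, DriverBadArgument, DriverBusError };

// Hypothetical bus access function: returns false when the device is silent.
using BusReadFn = bool (*)(std::uint8_t reg, std::uint8_t* value);

DriverStatus sensor_read(BusReadFn bus_read, std::uint8_t reg, std::uint8_t* value)
{
    // Programming errors: trip early during development.
    assert(bus_read != nullptr && value != nullptr);

    if (bus_read == nullptr || value == nullptr) {
        return DriverBadArgument;  // still defined when NDEBUG removes the assert
    }

    if (!bus_read(reg, value)) {
        return DriverBusError;     // expected at runtime, so no assertion
    }

    return DriverOk;
}

// Fake buses for trying the two runtime paths on a host.
static bool fake_bus_ok(std::uint8_t, std::uint8_t* value)
{
    *value = 0x42;
    return true;
}

static bool fake_bus_dead(std::uint8_t, std::uint8_t*)
{
    return false;
}
```

Calling sensor_read with fake_bus_dead returns DriverBusError in both debug and release builds, which is the behavior the caller can actually handle.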
10. Make large object sets searchable
Debugging is harder when every object looks like every other object. If a system manages many messages, channels, tasks, sessions, or buffers, give each one a stable identifier and keep lightweight counters.
```cpp
#include <cstdint>

struct Transfer {
    std::uint32_t id;
    std::uint32_t bytes_expected;
    std::uint32_t bytes_received;
    bool complete;
};

bool transfer_is_stuck(const Transfer& transfer)
{
    if (transfer.complete) {
        return false;
    }

    /* Risky: may indicate a missing completion flag */
    return transfer.bytes_received == transfer.bytes_expected;
}
```
With identifiers, logs become searchable. With counters, you can compare what should have happened with what did happen. This is useful in GUI applications, communication stacks, queue-based firmware, and test automation.
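As a small illustration of the searchable part, the check can run over the whole set and return the identifiers worth grepping for. find_stuck_ids is a hypothetical helper, and the Transfer struct is repeated so the sketch stands alone.

```cpp
#include <cstdint>
#include <vector>

struct Transfer {
    std::uint32_t id;
    std::uint32_t bytes_expected;
    std::uint32_t bytes_received;
    bool complete;
};

// Returns the ids of transfers that received every expected byte but were
// never marked complete.
std::vector<std::uint32_t> find_stuck_ids(const std::vector<Transfer>& transfers)
{
    std::vector<std::uint32_t> ids;
    for (const Transfer& transfer : transfers) {
        if (!transfer.complete &&
            transfer.bytes_received == transfer.bytes_expected) {
            ids.push_back(transfer.id);
        }
    }
    return ids;
}
```

With the id in hand, one search through the log shows the full history of that transfer instead of every transfer that looked similar.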
11. Fix one thing, then verify the original symptom
Debugging sessions often end with several changes and a vague feeling that one of them helped. That is dangerous. You can accidentally hide a bug, introduce another one, or leave the original root cause unconfirmed.
Treat the investigation like a small experiment:
| Step | Question | Useful evidence |
|---|---|---|
| Reproduce | Can I trigger the failure again? | Test case, input file, command sequence, captured packet |
| Narrow | Which subsystem first shows wrong behavior? | Trace point, counter, breakpoint, scope capture |
| Explain | What exact assumption failed? | Bad argument, wrong state, timeout, invalid lifetime |
| Fix | What is the smallest correction? | One code change or one configuration change |
| Verify | Does the original symptom disappear? | Re-run the same reproduction path |
This is slower than guessing for the first five minutes. It is usually faster after the first hour.
Practical checklist
Use this checklist when a bug starts to spread across too many files:
- Can I reproduce the failure with the smallest possible input or sequence?
- Do my logs include source location, object identity, and enough state?
- Did I write the condition that makes the bug interesting?
- Am I stepping through innocent code instead of narrowing the boundary?
- Could my debug output be changing timing?
- Does the bug appear only in debug or only in release?
- Did I verify the original symptom after the fix?
Good debugging is not about using a specific tool perfectly. It is about creating evidence faster than the bug can create confusion.