Serial communication looks simple until the first real board starts dropping bytes at the worst possible moment. A command works from a terminal but fails from the production GUI. A packet survives at 9600 baud and breaks at 115200. A USB to serial adapter works on the bench, then the installed RS232 cable behaves differently in the cabinet.
Most of these problems come from one bad assumption: treating a serial byte stream like it already contains clean messages. UART gives you bytes. RS232 gives you an electrical interface. The protocol, buffering, recovery rules, and diagnostic evidence are your responsibility.
Design the frame before writing the parser
A receiver can only recover from noise, dropped bytes, or a reset in the middle of a transfer if the frame format gives it enough structure. A useful frame normally has a start condition, length or delimiter, message type, payload rule, integrity check, and timeout behavior.
The exact frame depends on the product. A bootloader, a sensor link, and a lab instrument command protocol do not need the same format. What they do need is a receiver that can find the next valid frame after something goes wrong.
The following constants make the protocol rules visible. 0x7E is the start byte, the payload length is capped at 64 bytes, and the 20 ms frame timeout is part of the protocol instead of an accidental delay in the main loop.
```c
#define FRAME_START_BYTE      0x7Eu
#define FRAME_MAX_PAYLOAD_LEN 64u
#define FRAME_TIMEOUT_MS      20u
#define CRC8_POLYNOMIAL       0x07u

typedef enum
{
    Parser_WaitStart,
    Parser_ReadLength,
    Parser_ReadType,
    Parser_ReadPayload,
    Parser_ReadCrc
} ParserState;
```
Named protocol facts make reviews easier. A raw 0x7E in the middle of a parser is just a number. FRAME_START_BYTE tells the next engineer why that value matters.
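To make the frame layout concrete, here is a minimal sender-side sketch. It assumes the layout is start byte, length, type, payload, CRC-8, with the CRC starting at zero and covering length, type, and payload. The names `frame_encode`, `frame_crc8_update`, and `FRAME_OVERHEAD_BYTES` are hypothetical, and the constants are repeated only so the block compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Repeated from the constants above so this sketch compiles on its own. */
#define FRAME_START_BYTE      0x7Eu
#define FRAME_MAX_PAYLOAD_LEN 64u
#define CRC8_POLYNOMIAL       0x07u

/* Assumed layout: start, length, type, payload[length], crc. */
#define FRAME_OVERHEAD_BYTES  4u

/* Bitwise CRC-8, MSB first. The initial value of 0 is an assumption. */
static uint8_t frame_crc8_update(uint8_t crc, uint8_t byte)
{
    crc ^= byte;
    for (uint8_t bit = 0u; bit < 8u; ++bit)
    {
        crc = (crc & 0x80u) ? (uint8_t)((crc << 1u) ^ CRC8_POLYNOMIAL)
                            : (uint8_t)(crc << 1u);
    }
    return crc;
}

/* Hypothetical helper: returns bytes written to out, or 0 if rejected. */
size_t frame_encode(uint8_t type, const uint8_t *payload, uint8_t length,
                    uint8_t *out, size_t out_size)
{
    if ((payload == NULL) || (out == NULL) ||
        (length == 0u) || (length > FRAME_MAX_PAYLOAD_LEN) ||
        (out_size < (size_t)length + FRAME_OVERHEAD_BYTES))
    {
        /* Reject before writing anything, so no partial frame is produced. */
        return 0u;
    }

    size_t n = 0u;
    uint8_t crc = 0u;

    out[n++] = FRAME_START_BYTE;      /* start byte is not part of the CRC */
    out[n++] = length;
    crc = frame_crc8_update(crc, length);
    out[n++] = type;
    crc = frame_crc8_update(crc, type);

    for (uint8_t i = 0u; i < length; ++i)
    {
        out[n++] = payload[i];
        crc = frame_crc8_update(crc, payload[i]);
    }

    out[n++] = crc;
    return n;
}
```

Under these assumptions, a type 0x01 frame carrying the single payload byte 0x02 encodes as 7E 01 01 02 70. Keeping the encoder and the parser on the same named constants means a change to the frame format shows up in both places during review.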
Keep the receive path short and measurable
The receive interrupt or byte callback should do the smallest useful amount of work. Store the byte, update a small amount of state, and leave parsing or command execution to a foreground task. If the callback waits for a response, formats a string, writes to flash, or updates a GUI, it will eventually lose bytes.
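The ring buffer below assumes one ISR writer and one foreground reader. It keeps one slot empty to tell full from empty, so it holds at most RX_BUFFER_SIZE - 1 bytes. The indices here are single bytes, so each index read or write is atomic even on 8-bit MCUs. If you enlarge the buffer and widen the indices, check whether index accesses are still atomic on your target. If they are not, protect the reader side with the same critical section style used in the rest of the firmware.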
```c
#include <stdbool.h>
#include <stdint.h>

#define RX_BUFFER_SIZE 128u

static volatile uint8_t rx_buffer[RX_BUFFER_SIZE];
static volatile uint8_t rx_head;
static volatile uint8_t rx_tail;
static volatile uint16_t rx_overflow_count;

void uart_rx_byte_isr(uint8_t byte)
{
    const uint8_t next_head = (uint8_t)((rx_head + 1u) % RX_BUFFER_SIZE);

    if (next_head == rx_tail)
    {
        /* Good: count the exact place where incoming bytes are dropped. */
        ++rx_overflow_count;
        return;
    }

    rx_buffer[rx_head] = byte;
    rx_head = next_head;
}

bool uart_try_read_byte(uint8_t *byte)
{
    if (rx_tail == rx_head)
    {
        return false;
    }

    *byte = rx_buffer[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) % RX_BUFFER_SIZE);
    return true;
}
```
That overflow counter is bring-up evidence. If it increments during a burst test, the parser is too slow, the baud rate is too high for the current task scheduling, or the sender is allowed to transmit faster than the receiver can drain.
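The overflow counter only tells you after bytes are lost. A high-water mark shows how close the buffer came to overflowing, which is one way to size it before that happens. The sketch below is a possible addition, not part of the original driver: `rx_depth`, `rx_note_depth`, and `rx_max_depth` are hypothetical names, and the buffer size is repeated so the block stands alone.

```c
#include <stdint.h>

#define RX_BUFFER_SIZE 128u

/* Highest number of queued bytes seen so far (hypothetical diagnostic). */
static volatile uint8_t rx_max_depth;

/* Bytes waiting between the reader index (tail) and writer index (head). */
static uint8_t rx_depth(uint8_t head, uint8_t tail)
{
    return (uint8_t)((head + RX_BUFFER_SIZE - tail) % RX_BUFFER_SIZE);
}

/* Intended to be called from the receive ISR after the head index advances. */
static void rx_note_depth(uint8_t head, uint8_t tail)
{
    const uint8_t depth = rx_depth(head, tail);

    if (depth > rx_max_depth)
    {
        rx_max_depth = depth;
    }
}
```

If the recorded maximum sits near RX_BUFFER_SIZE - 1 during a burst test, the link is close to dropping bytes even when the overflow counter still reads zero.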
Parse with explicit states and limits
A parser should not be a long list of assumptions hidden inside nested if statements. Give the parser a state, keep the length bounded, count bad frames, and reset back to a known state when the frame is impossible.
This example uses CRC-8 to keep the snippet compact. Many product protocols use CRC-16 or CRC-32 instead. The important habit is that every byte that belongs to the checked part of the frame is fed through one named CRC update path.
```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    ParserState state;
    uint8_t type;
    uint8_t length;
    uint8_t index;
    uint8_t payload[FRAME_MAX_PAYLOAD_LEN];
    uint8_t crc;
    uint32_t deadline_ms;
    uint32_t frames_ok;
    uint32_t crc_errors;
    uint32_t length_errors;
    uint32_t timeout_errors;
} SerialParser;

static uint8_t crc8_update(uint8_t crc, uint8_t byte)
{
    crc ^= byte;

    for (uint8_t bit = 0u; bit < 8u; ++bit)
    {
        crc = (crc & 0x80u) ? (uint8_t)((crc << 1u) ^ CRC8_POLYNOMIAL)
                            : (uint8_t)(crc << 1u);
    }

    return crc;
}
```
The parser state object is deliberately boring. Boring is good here. It means a debugger watch window can show where the frame is, how many bytes remain, and which error counter is growing.
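The next part shows the parser consuming bytes. The helper time_reached() uses wrap-safe unsigned time arithmetic, which is safer than comparing raw millisecond counters directly. Note that the timeout is only evaluated when a byte arrives, so a stalled partial frame is discarded on the next received byte rather than at the deadline itself.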
```c
static bool time_reached(uint32_t now_ms, uint32_t deadline_ms)
{
    /* The signed difference stays correct when the millisecond counter wraps. */
    return (int32_t)(now_ms - deadline_ms) >= 0;
}

static void parser_reset(SerialParser *parser)
{
    /* Error counters are left untouched so they accumulate across frames. */
    parser->state = Parser_WaitStart;
    parser->type = 0u;
    parser->length = 0u;
    parser->index = 0u;
    parser->crc = 0u;
    parser->deadline_ms = 0u;
}

void parser_accept_byte(SerialParser *parser, uint8_t byte, uint32_t now_ms)
{
    if ((parser->state != Parser_WaitStart) &&
        time_reached(now_ms, parser->deadline_ms))
    {
        /* Good: partial frames have a defined end instead of living forever. */
        ++parser->timeout_errors;
        parser_reset(parser);
    }

    switch (parser->state)
    {
        case Parser_WaitStart:
            if (byte == FRAME_START_BYTE)
            {
                parser->crc = 0u;
                parser->deadline_ms = now_ms + FRAME_TIMEOUT_MS;
                parser->state = Parser_ReadLength;
            }
            break;

        case Parser_ReadLength:
            if ((byte == 0u) || (byte > FRAME_MAX_PAYLOAD_LEN))
            {
                /* Good: reject impossible sizes before writing into the payload. */
                ++parser->length_errors;
                parser_reset(parser);
                break;
            }

            parser->length = byte;
            parser->crc = crc8_update(parser->crc, byte);
            parser->state = Parser_ReadType;
            break;

        case Parser_ReadType:
            parser->type = byte;
            parser->crc = crc8_update(parser->crc, byte);
            parser->index = 0u;
            parser->state = Parser_ReadPayload;
            break;

        case Parser_ReadPayload:
            parser->payload[parser->index++] = byte;
            parser->crc = crc8_update(parser->crc, byte);

            if (parser->index == parser->length)
            {
                parser->state = Parser_ReadCrc;
            }
            break;

        case Parser_ReadCrc:
            if (byte == parser->crc)
            {
                ++parser->frames_ok;
            }
            else
            {
                ++parser->crc_errors;
            }

            parser_reset(parser);
            break;

        default:
            /* A corrupted state value returns the parser to a known state. */
            parser_reset(parser);
            break;
    }
}
```
In a real project, the good-frame path would call a message dispatcher or put the frame into another queue. It should not execute a motor command, block on flash, or update a user interface from inside the byte receive path.
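One way to shape that dispatcher is a small routing table keyed by message type. The sketch below assumes it is called from the foreground task after a frame passes its CRC check, and that handlers only record a request for other code to act on. The message types, handler names, and `frame_dispatch` are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_TYPE_PING    0x01u   /* hypothetical message type */
#define MSG_TYPE_SET_LED 0x02u   /* hypothetical message type */

typedef void (*FrameHandler)(const uint8_t *payload, uint8_t length);

typedef struct
{
    uint8_t type;
    FrameHandler handler;
} FrameRoute;

static uint32_t ping_count;
static uint8_t led_level;
static uint32_t unknown_type_count;

static void handle_ping(const uint8_t *payload, uint8_t length)
{
    (void)payload;
    (void)length;
    ++ping_count;
}

static void handle_set_led(const uint8_t *payload, uint8_t length)
{
    /* Record the request only; the LED driver applies it elsewhere. */
    if (length == 1u)
    {
        led_level = payload[0];
    }
}

static const FrameRoute frame_routes[] =
{
    { MSG_TYPE_PING,    handle_ping },
    { MSG_TYPE_SET_LED, handle_set_led },
};

/* Returns true if a handler ran; unknown types are counted, not ignored. */
bool frame_dispatch(uint8_t type, const uint8_t *payload, uint8_t length)
{
    for (size_t i = 0u; i < sizeof frame_routes / sizeof frame_routes[0]; ++i)
    {
        if (frame_routes[i].type == type)
        {
            frame_routes[i].handler(payload, length);
            return true;
        }
    }

    ++unknown_type_count;
    return false;
}
```

Counting unknown types gives one more piece of evidence: a growing count with clean CRCs suggests a protocol version mismatch rather than line noise.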
Do not ignore UART hardware error flags
Most UART peripherals can tell you about overrun, framing, parity, and noise errors. Those flags are not decorative. If you only read the data register and never record the status, you lose the evidence that separates a parser bug from an electrical or baud-rate problem.
The names in this snippet are intentionally generic because every MCU vendor names the registers differently. The shape is the important part: read status, read the byte according to the reference manual requirements, count the errors, and only pass clean bytes to the receive buffer.
```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    bool overrun;
    bool framing;
    bool parity;
} UartStatus;

typedef struct
{
    uint32_t overrun_errors;
    uint32_t framing_errors;
    uint32_t parity_errors;
} SerialStats;

/* Generic placeholder hooks; implement them with the vendor's register names. */
UartStatus uart_read_status(void);
uint8_t uart_read_data_register(void);

/* Ring-buffer writer from the receive path section. */
void uart_rx_byte_isr(uint8_t byte);

static SerialStats serial_stats;

void uart_rx_isr_handler(void)
{
    /* Read status before data; on many UARTs the data read clears the flags. */
    const UartStatus status = uart_read_status();
    const uint8_t byte = uart_read_data_register();

    if (status.overrun)
    {
        ++serial_stats.overrun_errors;
    }

    if (status.framing)
    {
        ++serial_stats.framing_errors;
    }

    if (status.parity)
    {
        ++serial_stats.parity_errors;
    }

    if (status.overrun || status.framing || status.parity)
    {
        /* Good: bad hardware status stops the byte from entering the parser. */
        return;
    }

    uart_rx_byte_isr(byte);
}
```
A rising framing error count often points to the clock, baud rate, polarity, or electrical interface. A rising overrun count usually means the firmware is not servicing receive fast enough. Those are different fixes, and the counters tell you which direction to look.
Check the baud rate from the real clock
Serial links are sensitive to timing error. It is easy to set 115200 in code and assume the hardware produces exactly that rate. The real baud rate depends on the oscillator, PLL configuration, peripheral clock divider, UART oversampling mode, and any clock changes made during low power states.
Useful checks during bring-up:
| Check | What it catches |
|---|---|
| Verify the peripheral clock used by the UART | Wrong clock tree or stale board-support setting |
| Calculate baud divider error | Marginal links that pass at one temperature or cable length |
| Capture TX with a logic analyzer | Actual bit time, polarity, idle level, and frame format |
| Test at the final burst rate | Buffers that pass single-byte tests but fail real traffic |
| Test after low power transitions | UART clock not restored or receiver enabled too late |
When the link is near the timing limit, adding random delays can hide the problem on the bench and leave it in the product. Measure the bit time, compare it with the configured clock path, and fix the cause.
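The divider error check from the table can be done with a few lines of integer arithmetic. This sketch assumes a simple UART model where baud = clock / (oversampling × divider) with an integer divider. Many parts add fractional dividers or different oversampling modes, so treat the formula as a placeholder for the one in your reference manual. The function names are hypothetical.

```c
#include <stdint.h>

/* Nearest integer divider for the assumed model
   baud = uart_clock_hz / (oversampling * divider). Returns 0 on bad input. */
uint32_t baud_divider(uint32_t uart_clock_hz, uint32_t baud, uint32_t oversampling)
{
    const uint64_t step = (uint64_t)oversampling * baud;

    if (step == 0u)
    {
        return 0u;
    }

    return (uint32_t)((uart_clock_hz + (step / 2u)) / step);
}

/* Signed error of the produced baud rate relative to the requested one,
   in parts per thousand, truncated toward zero. Negative means the link
   runs slow. Returns INT32_MAX when the divider is unusable. */
int32_t baud_error_permille(uint32_t uart_clock_hz, uint32_t baud,
                            uint32_t oversampling, uint32_t divider)
{
    /* Clock frequency that would give exactly the requested baud rate. */
    const int64_t ideal = (int64_t)baud * oversampling * divider;

    if (ideal == 0)
    {
        return INT32_MAX;
    }

    return (int32_t)((((int64_t)uart_clock_hz - ideal) * 1000) / ideal);
}
```

Under this model, a 16 MHz UART clock at 115200 baud with 16x oversampling rounds to a divider of 9 and runs roughly 3.5% slow, while a 7.3728 MHz clock divides exactly. Whether a given error is acceptable depends on the error at the other end of the link as well, so compare the sum against the receiver's documented tolerance.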
Treat transmit as a queue with backpressure
Receive bugs are common, but transmit bugs can be just as damaging. A firmware module that prints too much debug text, a GUI that sends commands faster than the device can process, or a driver that blocks while waiting for TX space can make the whole system look unstable.
The transmit side needs an explicit policy. Either queue the frame, reject it cleanly, or block only in a context where blocking is allowed. Silent truncation is usually the worst choice because the receiver sees a valid start byte and then a damaged frame.
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_QUEUE_SIZE 256u

/* Generic placeholder hook; implement it with the vendor's register names. */
void uart_enable_tx_empty_interrupt(void);

static uint8_t tx_queue[TX_QUEUE_SIZE];

/* volatile: tx_tail is advanced by the TX interrupt while this code runs. */
static volatile uint16_t tx_head;
static volatile uint16_t tx_tail;
static uint32_t tx_queue_full_count;

static uint16_t tx_used(void)
{
    return (uint16_t)((tx_head + TX_QUEUE_SIZE - tx_tail) % TX_QUEUE_SIZE);
}

static uint16_t tx_free(void)
{
    /* One slot stays empty so that full and empty are distinguishable. */
    return (uint16_t)(TX_QUEUE_SIZE - 1u - tx_used());
}

bool serial_try_queue_bytes(const uint8_t *bytes, size_t length)
{
    if (length > tx_free())
    {
        /* Good: backpressure is visible to the caller and to diagnostics. */
        ++tx_queue_full_count;
        return false;
    }

    for (size_t i = 0u; i < length; ++i)
    {
        tx_queue[tx_head] = bytes[i];
        tx_head = (uint16_t)((tx_head + 1u) % TX_QUEUE_SIZE);
    }

    uart_enable_tx_empty_interrupt();
    return true;
}
```
This pattern gives the caller a decision point. A telemetry frame may be dropped with a counter. A firmware update frame may need retry logic. A user command may need a busy response. Those policies belong above the byte queue, not hidden inside the UART driver.
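The other half of the queue is the interrupt that drains it. The following is a rough sketch of that side, assuming a "TX register empty" interrupt that must be disabled once the queue runs dry. The register access functions are hypothetical and are replaced here by stand-in variables so the block builds off-target. The queue declarations are repeated for the same reason.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_QUEUE_SIZE 256u

static uint8_t tx_queue[TX_QUEUE_SIZE];
static volatile uint16_t tx_head;   /* advanced by the foreground producer */
static volatile uint16_t tx_tail;   /* advanced by the interrupt below */

/* Stand-ins for hardware: on a real MCU these are vendor registers. */
static uint8_t last_tx_byte;
static bool tx_interrupt_enabled;

static void uart_write_data_register(uint8_t byte)
{
    last_tx_byte = byte;
}

static void uart_disable_tx_empty_interrupt(void)
{
    tx_interrupt_enabled = false;
}

/* Assumed to be wired to the UART's "transmit register empty" interrupt. */
void uart_tx_empty_isr(void)
{
    if (tx_tail == tx_head)
    {
        /* Queue drained: stop the interrupt so it does not fire repeatedly. */
        uart_disable_tx_empty_interrupt();
        return;
    }

    uart_write_data_register(tx_queue[tx_tail]);
    tx_tail = (uint16_t)((tx_tail + 1u) % TX_QUEUE_SIZE);
}
```

The interrupt sends exactly one byte per invocation and never waits, so the only place a caller can be delayed is the explicit policy decision in the queueing function.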
Keep the PC or GUI side from blocking the device
Many embedded serial problems are half firmware and half host software. A desktop GUI that reads serial data on the UI thread can freeze while waiting for a response. A Python script that writes a command and immediately sleeps can accidentally create timing that never happens in the real system. A test tool that never drains receive data can make the device block when hardware flow control is enabled.
For host tools and GUIs, keep serial I/O away from the UI thread, set read timeouts deliberately, and log raw frames when debugging protocol issues. If a GUI sends commands, it should have the same kind of state machine as the firmware: request sent, waiting for response, timeout, retry, failed, done.
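The request states listed above can be sketched as a small polled state machine. It is written in C to match the firmware snippets, but the same states map onto any host language. The timeout, retry count, and all names are hypothetical, and the sketch assumes it is polled from a serial worker thread rather than the UI thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define REQUEST_TIMEOUT_MS  200u   /* hypothetical per-attempt timeout */
#define REQUEST_MAX_RETRIES 2u     /* hypothetical retry budget */

typedef enum
{
    Request_Idle,
    Request_Waiting,
    Request_Done,
    Request_Failed
} RequestState;

typedef struct
{
    RequestState state;
    uint32_t deadline_ms;
    uint8_t retries_left;
    uint32_t sends;          /* total transmissions, including retries */
} Request;

/* Call right after the command has been written to the serial port. */
void request_start(Request *req, uint32_t now_ms)
{
    req->state = Request_Waiting;
    req->deadline_ms = now_ms + REQUEST_TIMEOUT_MS;
    req->retries_left = REQUEST_MAX_RETRIES;
    req->sends = 1u;
}

/* Poll periodically. Returns true when the caller should resend the command. */
bool request_poll(Request *req, uint32_t now_ms, bool response_received)
{
    if (req->state != Request_Waiting)
    {
        return false;
    }

    if (response_received)
    {
        req->state = Request_Done;
        return false;
    }

    if ((int32_t)(now_ms - req->deadline_ms) < 0)
    {
        return false;                /* still inside the timeout window */
    }

    if (req->retries_left == 0u)
    {
        req->state = Request_Failed; /* report to the UI, do not hang */
        return false;
    }

    --req->retries_left;
    ++req->sends;
    req->deadline_ms = now_ms + REQUEST_TIMEOUT_MS;
    return true;
}
```

Because every path ends in Done or Failed within a bounded time, the GUI can always show a result instead of waiting on a read that never returns.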
Remember that RS232 is not just UART with a connector
UART is the logic-level serial peripheral inside the microcontroller. RS232 is an electrical interface with different voltage levels and inverted signaling. A microcontroller UART pin should not be connected directly to a real RS232 port.
Check these details before blaming the parser:
- Use a proper RS232 transceiver between MCU logic levels and the RS232 connector.
- Confirm TX, RX, and ground at the actual connector, not only on the schematic symbol.
- Check whether the cable is straight-through or null-modem.
- Confirm whether hardware flow control lines are required, ignored, or tied to a safe state.
- Measure idle polarity at both sides of the transceiver.
- Check the ground reference and cable length in the installed system, not only on the bench.
Electrical mistakes often appear as framing errors, random bytes, or a link that works with one adapter and fails with another. That is why hardware counters and a logic analyzer save time.
Add diagnostics before the link fails in production
Serial failures are much easier to fix when the firmware already records the reason. At minimum, keep counters for overflow, frame timeout, CRC error, length error, UART overrun, framing error, parity error, TX queue full, and valid frames received.
The counters do not need to be fancy. They need to be readable from a debug shell, status command, test pin trace, or diagnostic frame. During bring-up, the pattern matters more than the exact interface.
| Symptom | Evidence to collect | Likely direction |
|---|---|---|
| Dropped bytes during bursts | RX overflow count, max queue depth | Increase buffer, improve scheduling, reduce burst rate |
| Parser gets stuck | Frame timeout count, parser state trace | Reset partial frames and check sender gaps |
| Random invalid commands | CRC and length error counts | Improve framing, inspect noise, check resynchronization |
| Works at low baud only | Framing errors and measured bit time | Fix clock, divider, polarity, or transceiver |
| GUI freezes | TX queue depth and host thread logs | Move serial I/O off UI thread and add command timeouts |
| Works on bench, fails installed | RS232 levels, grounding, cable type | Check transceiver, pinout, ground, and flow control |
Good diagnostics change the conversation from “serial is flaky” to “we lose frames after a 70 byte burst when the UI requests history while telemetry is enabled.” That is a problem an engineer can reproduce and fix.
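As one possible shape for those counters, the sketch below gathers them into a single snapshot structure and formats a one-line status report. The structure, field names, and output format are hypothetical. A real project might copy the values inside a short critical section and send them as a diagnostic frame instead of text.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the counters discussed in this section. */
typedef struct
{
    uint32_t frames_ok;
    uint32_t rx_overflow;
    uint32_t frame_timeouts;
    uint32_t crc_errors;
    uint32_t length_errors;
    uint32_t uart_overrun;
    uint32_t uart_framing;
    uint32_t uart_parity;
    uint32_t tx_queue_full;
} SerialDiagnostics;

/* Writes one status line into out. Returns the snprintf result: the number
   of characters the full line needs, excluding the terminating NUL. */
int serial_diag_format(const SerialDiagnostics *d, char *out, size_t out_size)
{
    return snprintf(out, out_size,
                    "ok=%lu ovf=%lu tmo=%lu crc=%lu len=%lu "
                    "ore=%lu fe=%lu pe=%lu txfull=%lu",
                    (unsigned long)d->frames_ok,
                    (unsigned long)d->rx_overflow,
                    (unsigned long)d->frame_timeouts,
                    (unsigned long)d->crc_errors,
                    (unsigned long)d->length_errors,
                    (unsigned long)d->uart_overrun,
                    (unsigned long)d->uart_framing,
                    (unsigned long)d->uart_parity,
                    (unsigned long)d->tx_queue_full);
}
```

A fixed field order keeps the line easy to diff between test runs, which is often enough to spot which counter moved during a failing burst.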
Practical takeaways
Start with the frame format, not the parser. Keep receive work short, bounded, and measurable. Use explicit parser states, named protocol constants, length limits, CRC checks, and timeouts. Record UART hardware errors instead of throwing them away. Treat transmit as a queue with backpressure. Separate UART protocol debugging from RS232 electrical debugging.
The main habit is to make failure visible. Once the firmware can tell you whether it saw overflow, timeout, CRC failure, framing error, or TX backpressure, serial communication stops feeling random and starts behaving like an engineering problem.
