At the end of the previous tutorial I hinted that there was a flaw in the UART's design. Did you find it? The problem is the prescaler (or clock divider); it doesn't deliver an accurate clock. Today, we're going to fix that.

The Problem

The generated clocks aren't exactly the values that they should be. Let's do the calculations manually and see what we get:

  • TX_DIVISOR = CLOCK_FREQ / BAUD_RATE = 32,000,000 / 115,200 = 277
    • Resulting baud rate = CLOCK_FREQ / 277 = 32,000,000 / 277 = 115,523 Hz
    • Baud rate discrepancy = 115,523 - 115,200 = 323 Hz
  • RX_DIVISOR = CLOCK_FREQ / (OVERSAMPLING * BAUD_RATE) = 32,000,000 / (16 * 115,200) = 17
    • Resulting sampling clock = CLOCK_FREQ / 17 = 32,000,000 / 17 = 1,882,352 Hz
    • Resulting baud rate = resulting sampling clock / OVERSAMPLING = 1,882,352 / 16 = 117,647 Hz
    • Baud rate discrepancy = 117,647 - 115,200 = 2,447 Hz

NOTE: These calculations are using integer arithmetic, so there is no decimal point or fractional part.
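If you'd like to double-check the numbers above, here is a minimal Python sketch of the same arithmetic. It is not part of the Cx design; the variable names are simply borrowed from the calculation, and // is Python's integer division, which truncates in the same way.

```python
CLOCK_FREQ = 32_000_000  # master clock in Hz
BAUD_RATE = 115_200
OVERSAMPLING = 16

# Transmitter: one divider straight from the master clock
tx_divisor = CLOCK_FREQ // BAUD_RATE
tx_baud = CLOCK_FREQ // tx_divisor

# Receiver: divider for the 16x sampling clock, then 16 samples per bit
rx_divisor = CLOCK_FREQ // (OVERSAMPLING * BAUD_RATE)
rx_sample_clock = CLOCK_FREQ // rx_divisor
rx_baud = rx_sample_clock // OVERSAMPLING

print("TX:", tx_divisor, tx_baud, tx_baud - BAUD_RATE)
print("RX:", rx_divisor, rx_sample_clock, rx_baud, rx_baud - BAUD_RATE)
```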

As you can see, neither the transmitter nor the receiver runs at the desired 115,200 baud rate. In fact, they don't even run at the same speed as each other. The two baud rates are just over 2 kHz apart (117,647 - 115,523 = 2,124 Hz).

But, it Worked in Simulation

Yes, it did. However, there are a few things to bear in mind. Firstly, the design can tolerate some difference (a 2 kHz difference equates to just under 2% at 115.2 kHz). The receiver's 16x oversampling helps with this. Secondly, the transmitter and receiver both use the same master clock. This won't be true when it's connected to other devices.
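The "just under 2%" figure can be reproduced with a couple of lines of Python. This is only a sketch; the two baud rates are copied from the earlier calculation, and percent_of_target is a helper name made up for the illustration.

```python
TARGET_BAUD = 115_200
TX_BAUD = 115_523  # transmitter rate from the earlier calculation
RX_BAUD = 117_647  # receiver rate from the earlier calculation


def percent_of_target(delta_hz):
    """Express a frequency difference as a percentage of the target baud rate."""
    return 100.0 * delta_hz / TARGET_BAUD


tx_rx_gap = RX_BAUD - TX_BAUD
print(f"TX vs RX gap: {tx_rx_gap} Hz = {percent_of_target(tx_rx_gap):.2f}%")
```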

It's important to get the clock as close as possible to the target frequency when real hardware is involved. The current design might work, but data corruption and framing errors are also a distinct possibility.

The Solution

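A fractional prescaler is the solution to our woes. Fractional prescalers divide by slightly different values each time such that the average clock speed is the target frequency. For example, 32,000,000 / 115,200 is about 277.78, so the transmitter's prescaler would divide by 277 on some ticks and 278 on others. This does result in some jitter (small variations in the tick timing), but it'll be within acceptable limits.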

Here's the "perfect algorithm" for a fractional prescaler:

accum += fdes / fin
if(accum >= 1) {
	emit tick
	accum -= 1
}

Here, fin is the input frequency, fdes is the desired frequency, and accum is an accumulator. Every input clock cycle adds fdes / fin to accum until it reaches (or passes) 1. At that point one fdes clock period has elapsed and a clock tick is emitted. The accumulator is reset at the same time, but in a manner that preserves how far the timing overshot the ideal clock period (accum -= 1). Keeping track of the overshoot is the key to handling fractional clock divisions.

The algorithm above can't be used as-is because it requires fractional numbers to be stored with infinite precision. Multiplying everything by fin enables integers to be used:

accum += fdes;
if(accum >= fin) {
	emit tick
	accum -= fin
}
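To convince yourself that scaling by fin doesn't change the behaviour, the two versions can be compared in a quick software model. The sketch below is Python rather than Cx, uses Fraction to stand in for "infinite precision", and picks illustrative values for FIN and FDES (32 MHz and 115,200 baud, both divided by 800). The function names are made up for the example.

```python
from fractions import Fraction


def perfect_ticks(fin, fdes, cycles):
    """Fractional form: accumulate fdes/fin exactly and compare against 1."""
    accum = Fraction(0)
    step = Fraction(fdes, fin)
    ticks = []
    for _ in range(cycles):
        accum += step
        if accum >= 1:
            ticks.append(True)
            accum -= 1
        else:
            ticks.append(False)
    return ticks


def integer_ticks(fin, fdes, cycles):
    """Integer form: accumulate fdes and compare against fin."""
    accum = 0
    ticks = []
    for _ in range(cycles):
        accum += fdes
        if accum >= fin:
            ticks.append(True)
            accum -= fin
        else:
            ticks.append(False)
    return ticks


FIN, FDES = 40_000, 144  # illustrative values only
exact = perfect_ticks(FIN, FDES, FIN)
scaled = integer_ticks(FIN, FDES, FIN)
print("identical tick pattern:", exact == scaled)
print("ticks in FIN cycles:", sum(scaled))
```

Over FIN input cycles the integer version should emit exactly FDES ticks, which is what "the average clock speed is the target frequency" means in practice.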

We could implement it now, but the >= comparison against fin would consume a fair amount of logic. This is where it helps to understand digital hardware and the tricks you can pull. With signed integers the Most Significant Bit (MSB) indicates if the number is negative. So, if accum starts at -fin instead of 0, every value is offset by fin and the comparison becomes a comparison against 0, which reduces to checking one bit. You can't get simpler than that! Here's the new algorithm:

accum += fdes;
if(accum >= 0) {
	emit tick
	accum -= fin
}

See this article for details on how negative numbers are stored (two's complement).

Finally, let's rework the pseudo-code so that only one arithmetic operation is required per clock:

if(accum >= 0) {
	emit tick
	accum += fdes - fin;
} else {
	accum += fdes
}
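One side effect of testing before adding is worth knowing about: by my reading of the pseudo-code, each tick comes out one input clock cycle later than in the compare-against-fin version, while the spacing between ticks is unchanged. The Python sketch below (a software model, not Cx; the function names and the values 40,000 and 144 are assumptions for illustration) checks this, with the accumulator starting at -fin.

```python
def ticks_with_compare(fin, fdes, cycles):
    """Add first, then compare against fin (the earlier integer version)."""
    accum = 0
    out = []
    for _ in range(cycles):
        accum += fdes
        if accum >= fin:
            out.append(True)
            accum -= fin
        else:
            out.append(False)
    return out


def ticks_with_sign_test(fin, fdes, cycles):
    """Test the sign first, then do a single add (the final version)."""
    accum = -fin  # offset start so the comparison is against 0
    out = []
    for _ in range(cycles):
        if accum >= 0:
            out.append(True)
            accum += fdes - fin
        else:
            out.append(False)
            accum += fdes
    return out


FIN, FDES, CYCLES = 40_000, 144, 100_000  # illustrative values only
reference = ticks_with_compare(FIN, FDES, CYCLES)
final = ticks_with_sign_test(FIN, FDES, CYCLES + 1)

# Dropping the first cycle of the final version lines the two patterns up
print("same pattern, one cycle later:", final[1:] == reference)
```

A constant one-cycle delay has no effect on the generated frequency, so it is harmless for a baud rate generator.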

Implementation

Enough theory; let's translate the algorithm to actual code. Start with the previous tutorial's code, and replace the constants in UartConsts.cx with:

bundle UartConsts {
	typedef i17 ClkCounter_t;
	
	u32 CLOCK_FREQ = 32000000;    // clock in Hz
	u32 BAUD_RATE = 115200;
	u32 OVERSAMPLING = 16;
	u32 SAMPLING_FREQ = OVERSAMPLING * BAUD_RATE;
	
	u32 COMMON_SCALE = 800; // Reduce the number of bits needed (setting clock to nearest 800Hz)

	ClkCounter_t FREQ_IN = CLOCK_FREQ / COMMON_SCALE;
	ClkCounter_t FREQ_TX = BAUD_RATE / COMMON_SCALE;
	ClkCounter_t FREQ_RX_SAMP = SAMPLING_FREQ / COMMON_SCALE;
}

Notice how all clock values are divided by COMMON_SCALE. This allows us to use fewer bits in the counter without losing any accuracy (all clocks are divisible by 800). Also, look carefully at ClkCounter_t. It's been changed to a signed integer because that's what the new algorithm needs.
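Here is a rough Python check of those two claims: that every frequency divides evenly by COMMON_SCALE, and that 17 signed bits are enough afterwards. It assumes the accumulator stays between -FREQ_IN and one less than the largest output frequency, which is my own reading of the pseudo-code rather than something stated in the design. signed_bits_needed is a helper invented for this sketch.

```python
COMMON_SCALE = 800
CLOCK_FREQ = 32_000_000
BAUD_RATE = 115_200
SAMPLING_FREQ = 16 * BAUD_RATE


def signed_bits_needed(lo, hi):
    """Smallest two's complement width that can hold every value in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


# No accuracy is lost only if every frequency divides evenly
remainders = [f % COMMON_SCALE for f in (CLOCK_FREQ, BAUD_RATE, SAMPLING_FREQ)]

freq_in = CLOCK_FREQ // COMMON_SCALE
freq_tx = BAUD_RATE // COMMON_SCALE
freq_rx_samp = SAMPLING_FREQ // COMMON_SCALE

# Assumed accumulator range: [-freq_in, largest output frequency - 1]
scaled_bits = signed_bits_needed(-freq_in, freq_rx_samp - 1)
unscaled_bits = signed_bits_needed(-CLOCK_FREQ, SAMPLING_FREQ - 1)

print(remainders, freq_in, freq_tx, freq_rx_samp)
print("bits with scaling:", scaled_bits, "without:", unscaled_bits)
```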

The New Prescaler

Both the transmitter and receiver will use the new prescaler, so create a new task called Prescaler.cx:

task Prescaler {
	const int accumWidth = 32;
	const int freqIn = 32000000;
	const int freqOut = 115200;
	
	out bool tick;
	
	int<accumWidth> accum;
	
	void setup() {
		accum = -freqIn;
		tick.write(0);
	}
	
	void loop() {
		// Emit a tick at freqOut Hz
		bool accumIsNeg = accum[accumWidth - 1];
		if(accumIsNeg == false) {
			// Emit tick for one clock cycle
			accum += freqOut - freqIn;
			tick.write(1);
		} else {
			accum += freqOut;
			tick.write(0);
		}
	}
}

It may look like this prescaler is hard-coded with one output frequency. However, Cx allows constants to be overridden when tasks are instantiated (i.e., used inside a network). So both the transmitter and receiver can use the prescaler despite them requiring different frequencies.
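To get a feel for what the prescaler's output looks like, here is a small Python model of the same loop. It is a sketch, not generated from the Cx code: tick_cycles and intervals are made-up helper names, one loop iteration stands for one master clock cycle, and the frequencies are the scaled constants from UartConsts.

```python
def tick_cycles(freq_in, freq_out, cycles):
    """Return the clock cycle numbers on which the modelled prescaler ticks."""
    accum = -freq_in  # same starting value as setup()
    ticks = []
    for cycle in range(cycles):
        if accum >= 0:
            ticks.append(cycle)
            accum += freq_out - freq_in
        else:
            accum += freq_out
    return ticks


def intervals(ticks):
    """Spacing, in clock cycles, between consecutive ticks."""
    return [b - a for a, b in zip(ticks, ticks[1:])]


FREQ_IN, FREQ_TX, FREQ_RX_SAMP = 40_000, 144, 2_304

tx = tick_cycles(FREQ_IN, FREQ_TX, FREQ_IN + 1)
rx = tick_cycles(FREQ_IN, FREQ_RX_SAMP, FREQ_IN + 1)

print("TX tick spacings:", sorted(set(intervals(tx))))
print("RX tick spacings:", sorted(set(intervals(rx))))
```

The model should report spacings of 277 and 278 cycles for the transmitter, and 17 and 18 cycles for the receiver's sampling clock. That mix of two neighbouring divisors is the jitter mentioned earlier, and it is what brings the average to the target frequency.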

As said earlier, the >= 0 comparison can be implemented simply by checking the counter's MSB, which is done in this snippet:

bool accumIsNeg = accum[accumWidth - 1];
if(accumIsNeg == false) {
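If the MSB trick feels like magic, this Python sketch checks it exhaustively for a 17-bit signed value. WIDTH and msb are names chosen for the example; masking with & gives the two's complement bit pattern that the hardware would store.

```python
WIDTH = 17  # same width as the i17 ClkCounter_t


def msb(value, width=WIDTH):
    """Return the top bit of value's two's complement encoding."""
    encoded = value & ((1 << width) - 1)
    return (encoded >> (width - 1)) & 1


# Check every value a 17-bit signed integer can hold
lowest = -(1 << (WIDTH - 1))
highest = (1 << (WIDTH - 1)) - 1
matches = all((msb(v) == 0) == (v >= 0) for v in range(lowest, highest + 1))
print("MSB clear exactly when value >= 0:", matches)
```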

Updating UartRx

The first step is to import the new prescaler into UartRx.cx:

import com.keasigmadelta.simpleuart.Prescaler;

Now replace the original prescaler task with:

prescaler = new Prescaler({accumWidth: 17, freqIn: FREQ_IN, freqOut: FREQ_RX_SAMP});

Congratulations! UartRx now uses the new prescaler. Look carefully at the parameters passed to the prescaler in the line above. They override the constants with specific values so that we get the desired frequency.

Updating UartTx

Adding the prescaler to the transmitter is a bit more complicated. First, import the prescaler into UartTx.cx:

import com.keasigmadelta.simpleuart.Prescaler;

Next, convert UartTx from a task to a network and add the prescaler. The first part of the new network becomes:

network UartTx {
	/** The input port.
	 */
	in sync ready u8 din;
	
	/** The serial output
	 */
	out bool dout;
	
	/** The prescaler generates the transmitter's serial clock
	 */
	prescaler = new Prescaler({accumWidth: 17, freqIn: FREQ_IN, freqOut: FREQ_TX});

The original UartTx code has to be moved to an internal task within the new network. It must also be connected to the prescaler. This results in the following network:

network UartTx {
	/** The input port.
	 */
	in sync ready u8 din;
	
	/** The serial output
	 */
	out bool dout;
	
	/** The prescaler generates the transmitter's serial clock
	 */
	prescaler = new Prescaler({accumWidth: 17, freqIn: FREQ_IN, freqOut: FREQ_TX});
	
	
	transmitter = new task {
		u8 data;
		u4 i;
		
		void setup() {
			// UART signal is high when there's nothing
			dout.write(true);
		}
		
		
		void loop() {
			data = din.read();
			
			// Send the start bit
			StartBit:
			dout.write(0);
			waitUARTClock();
			
			// Send the bits
			SendData:
			for(i = 0; i < 8; i++) {
				SendBit:
				dout.write(data[0]);
				data >>= 1;
				waitUARTClock();
			}
			
			// Send the stop bit
			StopBit:
			dout.write(true);
			waitUARTClock();
		}
		
		/** Waits one UART clock cycle
		 */
		void waitUARTClock() {
			while(prescaler.tick == false) {
				// Keep waiting
			}
		}
	};
}

Notice how the internal task can access parts of the network it's in, e.g., prescaler.tick, din, dout. There's no need to create and connect up ports for these, which saves a few lines of code.
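For reference, the order in which the transmitter's loop puts bits on the wire can be sketched in a few lines of Python. frame_byte is a hypothetical helper for illustration only; in the real design each bit is held on dout until the next prescaler tick.

```python
def frame_byte(data):
    """Return the bits sent for one byte: start bit, 8 data bits (LSB first), stop bit."""
    bits = [0]  # start bit pulls the line low
    for _ in range(8):
        bits.append(data & 1)  # same as dout.write(data[0])
        data >>= 1
    bits.append(1)  # stop bit returns the line to idle (high)
    return bits


print(frame_byte(0x41))  # the letter 'A'
```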

Testing the New Design

Right-click on UartTest.cx and select Run As => Simulation. All going well, the simulation will complete with no errors, as before. It may look like nothing has changed, but that's because the print statements don't reveal the internal workings. To see the changes, you'd have to examine the simulated signals in detail using something like ModelSim or GTKWave.

Final Words

That's it for today. The improved design uses a fractional prescaler, which delivers a much more accurate baud rate.

We still haven't tried it on real hardware. I'm afraid that I ran out of time, so that'll have to wait. Plus, there's a fair bit to get through in this tutorial already.

NOTE: This excamera.com article was instrumental in helping me figure out the fractional prescaler, so a big thank you to James Bowman for writing it.