I'm going to show you 4 different ways to convert between big-endian and little-endian, starting with the slowest because it's also the easiest to understand. I'm also going to reveal a gotcha that can really trip you up. And, I'll show you the best practice for making endianness-handling code work on all CPUs. I'm assuming that you already know what endianness is; if not, see the previous video on this topic.

Method 1: Bit-Shifts

The most obvious way to do endianness conversions is to physically move the bytes around using bit-shift operations. For example, here's the code to convert a 32-bit number:

static inline uint32_t swapEndian32(uint32_t val) {
    return ((0xFF000000 & val) >> 24) |
        ((0x00FF0000 & val) >> 8) |
        ((0x0000FF00 & val) << 8) |
        ((0x000000FF & val) << 24);
}

You mask off each byte with an *and* operation, shift the byte to the correct location, and then *or* it with all the other shifted bytes. This code works on all systems. But it's slow, because it takes multiple instructions to do the conversion when most modern CPUs have built-in byte-swapping instructions. The compiler's optimizer could theoretically recognize that you're swapping bytes and optimize it. But, don't count on it.
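
To make that concrete, here's a quick sanity check you could run (a minimal sketch that uses the swapEndian32() function above; main() and the printed value are just for illustration):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    // Bytes 12 34 56 78 become 78 56 34 12.
    uint32_t swapped = swapEndian32(0x12345678);
    printf("%08X\n", (unsigned)swapped); // prints 78563412
    return 0;
}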

Method 2: Built-in conversion functions

We could use assembly code to access the CPU's byte-swapping instructions directly. However, then we'd have to write code specific to each CPU, and adding inline assembly to C code is non-standard and varies by compiler. Fortunately, C compilers come with built-in endian-swapping functions or macros. They're still compiler-specific, but at least they're CPU-independent, and they should use the best instructions for each CPU. For example, MSVC (Visual Studio) has the following:


#include <stdlib.h>   // declares the _byteswap_* intrinsics

#define swapEndian16(x) _byteswap_ushort(x)
#define swapEndian32(x) _byteswap_ulong(x)
#define swapEndian64(x) _byteswap_uint64(x)

and both GCC and Clang use:

#define swapEndian16(x) __builtin_bswap16(x)
#define swapEndian32(x) __builtin_bswap32(x)
#define swapEndian64(x) __builtin_bswap64(x)
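
If your codebase needs to build under both toolchains, one common pattern is to select the right intrinsics at compile time. A minimal sketch (the predefined macros _MSC_VER, __GNUC__ and __clang__ are the standard ways to detect these compilers):

#if defined(_MSC_VER)
    #include <stdlib.h>
    #define swapEndian16(x) _byteswap_ushort(x)
    #define swapEndian32(x) _byteswap_ulong(x)
    #define swapEndian64(x) _byteswap_uint64(x)
#elif defined(__GNUC__) || defined(__clang__)
    #define swapEndian16(x) __builtin_bswap16(x)
    #define swapEndian32(x) __builtin_bswap32(x)
    #define swapEndian64(x) __builtin_bswap64(x)
#else
    #error "No byte-swap intrinsics known for this compiler"
#endif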

Gotcha!

And here we get to our gotcha. Let's say that you need to endian-convert a floating-point number. So, you pass it to your endianness conversion function, and... uh oh. You get the wrong number.

What went wrong? Well, C/C++ implicitly converted your float to an integer prior to byte-swapping. That's a value conversion, not a bit-for-bit copy: the float gets truncated to a whole number, so the bytes being swapped aren't the float's actual bit pattern, and you end up with the wrong value.
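
Here's a minimal illustration (the hex values assume IEEE 754 single-precision floats, which virtually every current CPU uses; gotchaDemo() is just a hypothetical wrapper):

#include <stdint.h>

void gotchaDemo(void) {
    float f = 1.5f; // IEEE 754 bit pattern: 0x3FC00000

    // Implicit conversion truncates 1.5f to the integer 1,
    // so this swaps 0x00000001 and returns 0x01000000.
    uint32_t wrong = swapEndian32(f);

    // What we actually wanted was to swap the bit pattern
    // 0x3FC00000, which would give 0x0000C03F.
}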

If you're using C++20 or newer, then you can use std::bit_cast<> (link). Otherwise, you can use a union, as follows:

static inline uint32_t swapEndian32F(float val) {
    union {
        float f;
        uint32_t u;
    } temp;
    temp.f = val;
    return swapEndian32(temp.u);
}
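
And if you do have C++20, a sketch of the std::bit_cast<> version looks like this (same byte swap, but the standard library does the bit reinterpretation):

#include <bit>
#include <cstdint>

static inline uint32_t swapEndian32F(float val) {
    // std::bit_cast copies the float's bits into a uint32_t
    // without changing them, then we byte-swap as usual.
    return swapEndian32(std::bit_cast<uint32_t>(val));
}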

Method 3: Networking Stack Conversion Functions

The next method is to use the byte-swapping functions that come with the networking stack. The TCP/IP protocol suite that powers the internet uses big-endian byte order on the wire, which means that little-endian machines need to convert things like IP addresses and port numbers to big-endian format.

To assist with this, networking stacks provide a set of host to/from network byte-order conversion functions. For example, htons() converts a 16-bit integer from host CPU to network byte order, and ntohs() converts from network byte order back to host CPU endianness. Using them makes it easy to write code that works on any CPU; no need to think about the byte order.
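
For instance, converting a port number to and from the wire might look like this (a sketch assuming a POSIX system, where these functions live in <arpa/inet.h>; on Windows they come from <winsock2.h> instead):

#include <arpa/inet.h> // htons()/ntohs() on POSIX systems
#include <stdint.h>

uint16_t toWire(uint16_t hostPort) {
    // Host byte order -> network (big-endian) byte order.
    return htons(hostPort);
}

uint16_t fromWire(uint16_t wirePort) {
    // Network byte order -> host byte order.
    return ntohs(wirePort);
}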

There's just one problem with these functions: they only convert endianness on little-endian machines. Big-endian machines already have numbers in big-endian format, and so the host to/from network byte order functions do nothing.

That's not a mistake. They're doing exactly what they need to do.

Method 4: C++23's Byteswap

Thanks to C++23, we now finally have a fully platform-independent byte-swapping function, std::byteswap (link). However, it's C++-only, and you need a fairly recent C++ compiler (see here).
If you have that, then you can:


#include <bit>

std::byteswap(val16)

Since C++20, C++ also has a way to check the CPU's native endianness, std::endian (see here).
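
Combining the two, here's a sketch of a standard-C++-only "big-endian to native" helper (std::endian needs C++20, std::byteswap needs C++23; the name beToNative32 is mine):

#include <bit>
#include <cstdint>

// Convert a big-endian value to the CPU's native byte order.
constexpr uint32_t beToNative32(uint32_t val) {
    if constexpr (std::endian::native == std::endian::big) {
        return val;                 // already big-endian
    } else {
        return std::byteswap(val);  // little-endian: swap
    }
}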

Endian Conversion Best Practice

The example code I've shown so far isn't best practice for writing multi-platform software. That's because it always performs the endianness conversion. Real-world code that reads/writes big-endian data only needs to convert on little-endian CPUs, and code that reads/writes little-endian data only needs to convert on big-endian machines.

We can take our cue from the network byte-order functions. Remember how they convert between host format and network byte order, and how they do nothing on big-endian CPUs because no conversion is necessary? Well, do the same with your code: create functions that convert little-endian and big-endian to/from whatever the native CPU format is. Like this...


#ifdef IS_BIG_ENDIAN
#define cpuToBE16(val) (val)
#define beToCPU16(val) (val)
#define cpuToLE16(val) swapEndian16(val)
#define leToCPU16(val) swapEndian16(val)
#else
#define cpuToBE16(val) swapEndian16(val)
#define beToCPU16(val) swapEndian16(val)
#define cpuToLE16(val) (val)
#define leToCPU16(val) (val)
#endif

It might seem a bit redundant to have cpuToBE16() and then beToCPU16(). After all, they do the same thing. However, code-wise it's clearer what's going on. The function name tells you whether the code is converting to or from the native CPU's endian format, so at a glance you can see what the code's purpose is.
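
As for defining IS_BIG_ENDIAN itself, one option is the compiler's predefined byte-order macros. A minimal sketch (this assumes GCC/Clang's __BYTE_ORDER__ macros; readLength() and lengthField are hypothetical names for illustration):

#include <stdint.h>

#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    #define IS_BIG_ENDIAN
#endif

// Hypothetical usage: lengthField holds a 16-bit value read from
// a big-endian file. beToCPU16() swaps it on little-endian CPUs
// and leaves it untouched on big-endian CPUs.
uint16_t readLength(uint16_t lengthField) {
    return beToCPU16(lengthField);
}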

Download

All the code in the video is available in the Kea Campus for Creator & Elite tier members. Use the template code in your own projects. Click here to join the Kea Campus to get the code, and a whole lot more.