• Home
  • Blog
  • Contact
  • Projects
  • Content Mirror
  • Matrix Guide

  • Emulating 64-Bit Integers In C

    August 14, 2023

    As much as I love the C programming language and writing programs with it, the C89 standard sometimes requires clever workarounds to maintain portable code. I choose to write C89 because I can think of multiple compilers that do not fully support anything newer 1, and I want my code to run any compiler, not just the most popular ones. Writing standard C89 is a great way to ensure this is possible.

    Unfortunately, C89 comes with some frustrating limitations, although so far I have yet to find a limitation that can't be worked around in some fashion. One particular frustrating but workable limitation is that C89 defines no integer type that must be greater than 32 bits 2. The largest integer type is long, which is only gauranteed to be at least 32 bits. This is problematic if we require 64 bit integers—as is the case with my latest project, Telodendria. The Matrix specification—which Telodendria strives to implement as closely as possible—requires timestamps to be expressed as the number of milliseconds since the Unix epoch 3. This exacerbates the 2038 problem 4 significantly because even though we have until 2038 to deal with the number of seconds overflowing, the number of milliseconds since the Unix epoch has already exceeded what can fit in a 32-bit timestamp.

    The implication of this is that in order to implement a Matrix homeserver—or anything else for that matter—where we must express the time in milliseconds since the epoch, we require a 64 bit integer type. At first glance, it appears that there is no no standards-compliant way to implement a Matrix homeserver in C89, a most unfortunate tragedy for a project that has set out to do exactly that.

    As the situation stands, if I don't wish to write a bunch of additional code to manually implement 64-bit integers with the current selection of integer types, then I really only have the following options:

    Neither of these options is desirable; the former would exclude all 32 bit computers—the users of which Telodendria is the primary target because the other Matrix homeservers have made no effort to reach them—and the latter would similarly destroy the portability that Telodendria strives to achieve by locking users into using a particular compiler, or having to support all the different compilers with a bunch of preprocessor macros.

    Seeing no other good option, and because I am stubborn and refuse to switch my compiler target to C99, I decided to write—in pure C89—my own emulated 64 bit integers. With this solution, the only assumption I made is that the target platform supports a native integer type of exactly 32 bits. My implementation features:

    I achieved these goals by simply declaring a 64 bit integer—whether signed or unsigned—as struct containing an array of two unsigned 32 bit integers, the first integer representing the low bits, and the second representing the high bits. I chose this representation over two other representations I was considering; the first was representing the 64 bit integer as a struct containing two members, the high bits and the low bits, but I don't think this gaurantees that the struct will be the proper size due to padding and alignment tricks that the compiler may try to pull. The other representation I was considering was making it just an array of two integers. However, this would make it impossible to pass integers by value to a function, because passing arrays to functions is always by reference. For an integer, that would make no sense.

    By wrapping the array in a struct, I solved all of these problems. My new emulated 64 bit integers now look like this:

    typedef struct
        UInt32 i[2];
    } UInt64;
    typedef UInt64 Int64;

    As you can see, a signed integer uses the same bit representation as an unsigned integer, so I typedef-ed them to the same type. But now that I have my types, how do I perform operations on them? Integers are only useful to computers if they can be manipulated somehow. First, I defined the operations that I need to be able to perform. In my mind, there are two types of operations: “primitive” operations which can be reduced no further than the ALU circuitry required to complete them, and “composite” operations, which can be implemented at an abstract level by using only the primitive operations 5 or by interpreting the bits differently.

    The “primitive” operations are as follows:

    The “composite” operations—that is, operations that can be performed using only the above operations—are as follows:

    These operations may not be exactly in the right categories, but for my purposes, this division will do. While I was defining the operations, I took special note of the operators that are required to be aware of the “signed-ness” of the integer—that is, behave differently depending on whether or not the integer is signed or unsigned. These operations are as follows:

    You may find this list smaller than expected, but I was intentional in omitting addition and subtraction, because these rely on the two's compliment representation of negative numbers. Two's compliment allows these operations to work with negative numbers automatically.

    Once I made these lists, I sat down and got to work, carefully thinking through how an ALU would perform each of the operations. The bit shifting algorithms surprised me both at their complexity, yet also their elegance. While more difficult a problem to solve than I would have expected, the end result is one that runs in constant time and utilizes only one branch. To implement multiplication, I chose the Russian Peasant 6 method, which, although still dependent on the size of the integers being multiplied, completes in fewer iterations in general than multiplication by repeated addition of one of the operands. Division and the remainder operator were implemented using classic binary long division. Surprisingly enough, this algorithm runs in constant time, iterating over each of the 64 bits of the dividend and performing a number of bitwise operations on the quotient and remainder.

    Since I could not rely on the sprintf() family of functions to provide an easy way to convert 64 bit integers to strings or print them to files, I also provided two functions called Int64Str() and UInt64Str(), which convert a 64 bit integer to its string representation in any base between 2 (binary) and 16 (hexadecimal). These functions interpret the bits as signed or unsigned respectively.

    The result of all my efforts is a library that contains two headers, called Int64.h and UInt64.h, and two matching C files, which implement signed and unsigned integer operations, respectively. I bundled this code into Telodendria's general support library, called Cytoplasm—more on that in a future post, if time allows. I have also integrated support for using emulated 64 bit integers in Cytoplasm's JSON API, since it is quite common to send timestamps, IDs, and other large numbers via JSON.

    Now, the important takeaway here is not that you should be using emulated 64 bit integers. Cytoplasm takes care, whenever it can, to use native 64 bit integers when they are available. It is also intended to have compiler extensions patched in if a particular compiler or platform is being specifically targeted. The reason for this is that the emulated 64 bit integer operations are slow. I didn't do any benchmarks, but it doesn't take benchmarks to see; what an ALU can do in a single cycle and with a single instruction takes at most 5 instructions and a branch, in the case of bit shifting. Bitwise operators now take two instructions to complete instead of just one, effectively doubling the time it takes to execute them. Additionally, there is the overhead that comes with making a function call to perform these operations.

    At this point you might be asking “why even bother implementing emulated 64 bit integers?” That really is a great question, and for most people, the answer is that you shouldn't because your target platform or compiler most likely has C99 and int64_t support. But sometimes, you may not have such support. In Telodendria's case, 64 bit integers are not a gaurantee, but they are absolutely required to do its job properly, so having an implementation to fall back on is a very good thing because it makes Telodendria more portable.

    This project was also a great learning experience. While I have taken computer architecture classes that explain the concepts behind computer arithmetic, actually implementing those concepts myself further cements them in my mind, and allows me to have a reference for future use. If you want to check out the code, you can do so via anonymous CVS, or GitHub. If you have trouble finding the code, please feel free to contact me.

    I originally sat down to write this post so I could actually explain the algorithms I used, because there are few to no resources on the internet about how to math on integers larger than the native word size of your platform. While I really would love to explain my methodology and I fully intended to do so here, I have unfortunately run out of time to do that, so I will rely on the code to explain them for me. You should find the code to be succinct and elegant, hopefully more so than you'd expect for a C89 library that literally re-invents the wheel—or in this case, the ALU.

    1. tcc, pcc, though it claims it is targeting C99. 

    2. Another is that C doesn't mandate any exact-width types, which I know is so that it can run on computers with odd word sizes, but it is still annoying that the standard specifies the size of the integer types in terms of “at least” instead of “exactly”. C99, though it provides stdint.h, does nothing to solve this problem because the exact-width types can simply be left undefined according to the standard if they are not natively supported on a platform. 

    3. Timestamps (Matrix Specification). 

    4. On January 19, 2038, at 03:14:08 UTC, the date in seconds since the Unix epoch will no longer be able to fit in a 32-bit timestamp. It will overflow and wrap around to 1901. 

    5. Although for the sake of speed, these operations are often performed directly by the ALU in a more efficient manner than composing them of multiple instructions. 

    6. Ancient Egyptian Multiplication (Wikipedia). 

    © 2019-2024 Jordan Bancino.