As you may or may not already be aware, I am currently working on developing a new Matrix 1 homeserver implementation 2. In this post, I want to sketch out my implementation of the Matrix specificiation 3 that I plan on using, both to clarify my own thoughts, and because I am struggling to find resources explaining how Matrix homeservers are actually implemented. It's difficult to read the source code for the existing implementations because they are so large, so their code bases require such a deep understanding in order to even understand the concepts of how they work. So instead of writing all of the code first and just dumping it onto my CVS server, I feel like I can do a better service to the Matrix community by actually explaining more of the concepts than what the specification does so that the implementation becomes a lot easier to deduce for oneself. For me, writing code is the easy part once I understand a topic. So, without further ado, let's get into the technical details of the Matrix specification.
A Room in Matrix terminology is simply a directed acyclic graph (DAG) of Events. An Event is a—mostly—self-contained unit of data which contains a few pieces of information required by the homeserver, but otherwise has few restrictions on what can be contained in it. The meaning of an event is derived entirely by the client; the server uses a few pieces to maintain the relationships between and authentication chain of the events in a room, but otherwise does not care what an event contains.
Events are related to each other both chronologically, and by how they authorize each other to exist in a room. In a sense, a Room is actually two DAGs created out of the same set of events; one in which the edges of the DAG express a chronological relationship, where events are connected by an edge to the events that came before them, and the other where the edges express a state relationship, where events are connected by an edge to the events that authorize them to exist in the graph.
A Matrix homeserver traverses each of these DAGs primarily to perform the following two tasks:
State Resolution is the process by which the homeserver traverses a room's DAG to determine what the correct State of the room is at any given point in time, namely, at any given Event in the room. As we will see in a moment, this is actually a rather complicated process due to the distributed and decentralized nature of the protocol.
Linearization is the process by which the homeserver traverses a room's DAG and organizes events into a stream of events, one after another, for the client to process. While I have yet to find the details of this process in the Matrix specification, my current understanding is that this is just a topological sort, which is also used in state resolution.
The Matrix specification defines State as "a map of
(event_type, state_key) to
event_id." In other words, State is a collection of
key-value pairs where the key is the event type and a special state
key, and the value is an event ID. State is used to describe a room,
including giving it a name, topic, and aliases, as well as recording
what users are a part of the room. It is also used to authorize
events, because the State describes who has permissions to send
what types of events.
A room's State is updated when a State Event is sent to the room. Each state event must have a type and a state key, which is typically a blank string, indicating that it applies to the entire room, but in some specific cases is the ID of a user to indicate that the state only applies to that user, such as when a user joins or leaves a room. When the state event is sent, it updates the room state at that point in time to make its type and state key point to itself. The state event describes the changes in state to be applied. This is how Matrix stores all the information about a room.
State resolution algorithms are necessary because Matrix is a decentralized protocol where every homeserver that is participating in a room has its own copy of the room. The room does not belong to any one homeserver, rather all homeservers contribute to it equally. This can become problematic if the homeservers are unable to communicate with each other quickly enough, because it is likely that clients will still be sending messages to the server, and the server will still apply them to the room. This means that the room's state can potentially be updated indendently on each homeserver participating in the room, so when the homeservers are able to sync up, they need some way to arrive at a consistent state.
All of this information is easily derived from the Matrix specification itself. But what is less clear is how to go about actually implementing all this in code. Matrix is a very complex protocol with a lot of algorithms and data structures. These are explained in the specification with a practical degree of precision, but what is not explained is how to actually store the data and when to run the algorithms on it. Both of these are implementation details left to the developers of homeservers, as they should be.
What follows is by no means the only way to implement the specification, I'm sure, but it is the way I have gone about it in my mind, though I must admit that I have looked at very little code while thinking through this, and I have yet to write any of the code that implements this process. Nevertheless, I hope I can offer some clarification for those that want to understand how Matrix homeservers actually do the things that the specification says they need to do.
The most important thing to figure out is what a homeserver actually needs to store in the database apart from room state. As far as I have deduced, this is essentially just a list of events at the bottom—or top, depending on how you want to look at it—of each room's event DAG. I tend to want to call these the leaves, although the DAG is not exactly a tree so I am not sure if the term really applies, but either way, the homeserver needs to be able to keep track of all the tips of the forks in the room. Ideally there will only be one tip most of the time, but it's possible that there may be multiple when federation occurs. The homeserver records these events not only for traversal purposes, as required for state resolution and linearization, but also to know what the previous events are when a client sends an event. In order to be able to effectively attach client events to the DAG properly, we must have a list of the most recent events.
The algorithm for maintaining this list is simple; when an event arrives at the homeserver from another homeserver, it iterates over all of the events it specifies to be its previous events, and removes them from the list if they are present. It then adds the incoming event to the list. This works because it ensures that no "leaves" are lost, and no leaves that are no longer leaves remain in the list. Perhaps a visual representation is necessary to effectively demonstrate this concept.
Consider a Room with events
E3 that looks like this:
E1 | E2 | E3
Now consider that event
E4 arrives and specifies
E2 as its previous
event because the homeserver that
E4 originated on was not aware
E3 at the time. The DAG now looks like this:
E1 | E2---+ | | E3 E4
E2 is not in the list of most recent events, so
nothing is removed from it.
E4 is then added to the list, because
it is now a leaf. This covers the situation in which a fork is
created. Now, consider that event
E5 arrives at the server and
E3 as its previous event. The DAG now looks like this:
E1 | E2---+ | | E3 E4 | E5
E3 as its previous event,
E3 is removed from the
E5 is added.
E4 is still in the list. Now lets look at
what happens when an event
E6 arrives that resolves the fork by
E5 as its previous events. The DAG now
looks like this:
E1 | E2---+ | | E3 E4 | | E5 | | | E6---+
E5 were specified as previous events, they are
both removed from the room's stored list and then
E6 is added. The
fork has been resolved.
When a client sends an event to the homeserver, it specifies no previous events. The homeserver must determine what they are for itself. Since the homeserver keeps a list, it simply copies this list to the client event. If there is more than one previous event, the client's event resolves the forks as shown in the previous diagram.
Homeservers may also find it desirable to cache the state of the room at each event so that it does not have to continuously re-run the state resolution algorithm whenever the state is needed. As far as I know, all homeservers do this, but it is not strictly required by the specification, and it would be advantageous if the homeserver implementation could cope with previous states missing from the cache by simply recalculating them.
Other than the DAG's current leaves and the state at each event, a Matrix homeserver has to store nothing else for a room. Everything that clients and other servers will ask for can be derived from the resolution of the state at each of the current leaves, or the state at a particular event. This even includes the room version, which is derived simply by opening up the event at the m.room.create key in the room state and checking the version there.
Now that we know what we are storing, we can move on to how we actually ingress events. The Matrix specification is unclear as to what is supposed to happen when events are received at the API endpoints. There are two ways an event can arrive. It can arrive via the federation API from another homeserver, or it can arrive from a client. Both are handled slightly differently.
When a homeserver receives an event from another homeserver, the event already has a list of previous events, so the homeserver should perform the following tasks in order when an event arrives:
When a homeserver receives an event from a client, the event is not fully formed and ready to be added to the room, so the procedure is a little different:
As you can see, there are a few differences in the order that things must happen in depending on how the event arrives. Note that the size check is performed so late in the process because the size limit stated in the specification depends on all parts of the event being present, so we have to generate those parts first.
The process for ingressing events should now be clearer, although the state resolution algorithm is still a black box. It is important to note that the algorithm is not static; it depends entirely on what version of room we are dealing with. The Matrix specification defines many room versions each with their own nuances, and room version 2 introduces an entirely new state resolution algorithm. There are currently two state resolution algorithms that have to be implemented by a homeserver. It is not enough only to implement the latest algorithm because many rooms exist in the Matrix community that still use the old algorithm.
The state resolution algorithms are throughly documented in the Matrix specification for each room version, however these algorithms can be a little cryptic at first glance. I hope to explain the state resolution algorithms in terms of practical implementation in a a future blog post, so stay tuned for that. In that post, I will explain how to derive the inputs for the state resolution algorithm, what the output means, and then how to actually go about implementing the algorithms.
© 2019-2023 Jordan Bancino.