Compressing Synapse database


Anyone running a federating instance of the Matrix homeserver Synapse will likely have seen this: Synapse is database-hungry, and its database tends to take up a lot of disk space. In this post, I'm documenting how I shrank my homeserver database from 100GB to a little under 8GB during a long maintenance cleanup.

There are three main reasons the Synapse database tends to grow large over time:

  • Stuff that no longer needs to be kept around and should be deleted
  • Synapse is extremely cache-happy, and this takes a lot of space
  • Table bloat & Index bloat in PostgreSQL

Let's tackle all three of them, one after the other.

Stuff that no longer needs to be kept around

Forgotten rooms

The first and most obvious things that can be deleted are rooms that no longer contain any of your local users. These rooms take up space in your database while no longer serving any purpose. At some point Synapse may learn to garbage-collect them, but for now we need to do it by hand using the Synapse admin APIs.

First of all, we need to identify these rooms. We can retrieve the room list using the rooms API, like so:

curl --header "Authorization: Bearer <your access token>" \
    'https://matrix.my.home/_synapse/admin/v1/rooms?limit=300' > roomlist.json

You'll need to replace <your access token> with an access token of an admin account, and you can set limit=XXX to a value higher than your number of rooms to ensure you get the whole list. If you are unsure how many rooms your HS knows about, the returned JSON contains a "total_rooms" key with that count.
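
For a quick sanity check, the jq query below prints that total alongside the number of rooms actually present in roomlist.json (it relies only on the response fields already used in this post):

jq '.total_rooms, (.rooms | length)' < roomlist.json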

Then, we can extract from that list the rooms with no local users:

jq -r '.rooms[] | select(.joined_local_members == 0) | .room_id' < roomlist.json > to_purge.txt

(The -r flag makes jq output raw, unquoted room IDs, which is what the purge requests below expect.)

And then, we can purge these rooms one after the other using the purge API:

curl --header "Authorization: Bearer <your access token>" \
    -X POST -H "Content-Type: application/json" -d "{ \"room_id\": \"$room_id\" }" \
    'https://matrix.my.home/_synapse/admin/v1/purge_room'

Note that these last requests won't return until Synapse has actually purged the room, which may take a while. As such, if your Synapse is behind a reverse proxy, you might want to run them directly from the server hosting it, against the local socket, to avoid the reverse proxy timing out the request. You'll also want to wait for one purge to finish before starting the next, so as not to overload your Synapse and its database; the sketch below handles this.
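
Here is a minimal shell sketch automating this, assuming to_purge.txt contains one raw room ID per line (as produced by the jq command above) and that the hypothetical $TOKEN variable holds your admin access token:

while read -r room_id; do
    echo "Purging $room_id"
    curl --header "Authorization: Bearer $TOKEN" \
        -X POST -H "Content-Type: application/json" \
        -d "{ \"room_id\": \"$room_id\" }" \
        'https://matrix.my.home/_synapse/admin/v1/purge_room'
done < to_purge.txt

Since each request only returns once its purge is complete, the loop naturally processes rooms one at a time.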

Old history of large rooms

Very large rooms such as #matrix:matrix.org tend to accumulate quite a lot of history, and you may want to get rid of the older parts of it. You can do so using the Purge History API:

curl --header "Authorization: Bearer <your access token>" \
    -X POST -H "Content-Type: application/json" \
    -d '{ "delete_local_events": false, "purge_up_to_ts": 1577836800000 }' \
    "https://matrix.my.home/_synapse/admin/v1/purge_history/$room_id"

Here, delete_local_events controls whether events sent by your own HS's users are deleted as well (you are the original source for those, meaning no-one will be able to retrieve them from you any more), and purge_up_to_ts is the timestamp in milliseconds up to which the history should be purged.
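
The timestamp used above, 1577836800000, corresponds to 2020-01-01 00:00:00 UTC. If you don't want to compute such a value by hand, GNU date can generate it (this assumes GNU coreutils; BSD date uses different flags):

echo "$(date -d '2020-01-01 00:00:00 UTC' +%s)000"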

Optimizing Synapse's cache

Now that everything to be removed has been removed, we can start tidying up the rest and compressing Synapse's cache. This step needs to be done on the server, as we'll access the PostgreSQL database directly.

The main table responsible for database bloat is state_groups_state, which is managed rather inefficiently. Luckily, the Matrix team has developed a tool that can selectively compress the state of a room: synapse-compress-state. You'll need to clone and compile it (it's a Rust program: just cargo build --release it and copy the resulting binary to your server).
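
For reference, the build boils down to something like this (at the time of writing the tool lives in the matrix-org organization on GitHub; adjust the URL if it has moved since):

git clone https://github.com/matrix-org/rust-synapse-compress-state
cd rust-synapse-compress-state
cargo build --release
# the binary ends up in target/release/synapse-compress-state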

Now, this tool needs to be run on rooms individually, so let's first identify which rooms are in need of such a compression. They are generally very large rooms and rooms bridged to IRC, which accumulate a lot of state changes due to the numerous joins and leaves.

This SQL query will give you a list of your rooms and their number of state group rows:

SELECT room_id, count(*) AS count
    FROM state_groups_state
    GROUP BY room_id
    ORDER BY count DESC;

Overall, I considered rooms with a count larger than 100,000 to be in need of a cleanup; the variant below filters for them directly.
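
If you prefer, this variant of the query above returns only the rooms over that threshold:

SELECT room_id, count(*) AS count
    FROM state_groups_state
    GROUP BY room_id
    HAVING count(*) > 100000
    ORDER BY count DESC;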

To analyze a room for compression, run the previously linked tool like so:

synapse-compress-state -t -o state-compressor.sql \
    -p "host=localhost user=<synapse db user> password=<synapse db password> dbname=<synapse db>" \
    -r "$room_id"

This command will not modify your database; instead, it generates a file state-compressor.sql containing the changes it would apply. It also prints a summary of how much it compressed the state of the room (for some of my rooms, the state went down to 0.4% of its original size). If you are happy with the compression, you can then apply the changes:

psql -U '<synapse db user>' '<synapse db name>' < state-compressor.sql

Then repeat for every room that needs it; a small loop like the one below can automate the process. You can do all of this while your Synapse instance is running.
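
As an illustration, here is a rough sketch that compresses a list of rooms in one go. It assumes a hypothetical rooms_to_compress.txt file containing one raw room ID per line (built from the SQL query above), and placeholder credentials you'd replace with your own:

while read -r room_id; do
    synapse-compress-state -t -o "state-compressor-$room_id.sql" \
        -p "host=localhost user=synapse password=secret dbname=synapse" \
        -r "$room_id"
    # applies the changes immediately; inspect the generated file first if unsure
    psql -U synapse synapse < "state-compressor-$room_id.sql"
done < rooms_to_compress.txt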

Table bloat & Index bloat in PostgreSQL

NOTE: This last step requires stopping your Synapse instance.

After all those steps, your database size as reported by PostgreSQL (\l+ in psql) may not have changed much. This is because PostgreSQL does not automatically give the freed space back to the operating system, but instead keeps it around to reuse when needed. This is generally a good idea, but we are here precisely because that space has grown too large!
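
To get a precise number to compare before and after, you can ask PostgreSQL for the on-disk size of the current database (pg_database_size and pg_size_pretty are standard built-in functions):

SELECT pg_size_pretty(pg_database_size(current_database()));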

Index bloat

The first part to handle is index bloat. Basically, the B-Tree structures that PostgreSQL uses for its indexes can become very space-inefficient when a lot of entries are removed from a table. The remedy is simple: rebuild all the indexes.

To do so, run this in PostgreSQL as the postgres superuser:

REINDEX (VERBOSE) DATABASE <synapse db>;

This will lock large parts of the database while it runs, which is why Synapse needs to be shut down in the meantime.

On my database, this REINDEX freed about 40GB of space.

Table bloat

Similarly, we can tell PostgreSQL to return all the space it is no longer using to the operating system, with the VACUUM command, again to be run as the postgres superuser in the Synapse database:

VACUUM FULL VERBOSE;

PostgreSQL has several kinds of vacuum routines. VACUUM FULL locks each table and copies it into fresh storage, freeing the old one afterwards. This returns the space to the operating system and compacts the tables, solving fragmentation issues.
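
For completeness: if you cannot afford the exclusive locks, a plain VACUUM marks dead space as reusable without blocking Synapse, but it does not return that space to the operating system:

VACUUM VERBOSE;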

And voilà: with all this, your Synapse database should have shrunk significantly. If not, it means it was in much better shape than mine. ;)