Better control of database read replicas #2874

dabeeeenster · 2023-10-23T09:18:09Z

Add the ability to specify read database regions for failover or performance.
I want to be able to set the order in which Flagsmith tries replicas for reads. Right now the code picks a replica at random, which isn’t super helpful.

zachaysan · 2023-12-14T19:55:48Z

I reviewed common approaches online and here are some of my findings:

Cross-region database reads are rare. Most of the documentation focuses on local replicas.
There are two main approaches to failing over from one replica to another. The more prevalent one uses a heartbeat connection to the replica in question (checked either just before handing off to Django, or stored locally in a cache that's refreshed once a second). If the replica is still online, it is handed to the reader; otherwise the next available replica is used (or the default database if there is none). The second approach uses middleware to intercept database queries; it looked inferior to the first.
Falling back from the primary database to a promoted read replica is not widely covered by online sources. I think it should be possible, but we would be venturing into the unknown, with risks like the WAL not being replicated between the primary and secondary databases.
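
To make the heartbeat approach from the second finding concrete, here is a minimal sketch. It is not the Flagsmith implementation: `CachedHeartbeat`, `first_live_replica`, the `probe` callable and the alias names are all hypothetical, and a plain dict stands in for the cache. The idea is only that a probe result is reused for about a second, so the replica is not queried on every read.

```python
import time
from typing import Callable, Dict, Sequence, Tuple


class CachedHeartbeat:
    """Remember each replica's last probe result for `ttl` seconds."""

    def __init__(
        self,
        probe: Callable[[str], bool],
        ttl: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Hypothetical probe: in practice it might run "SELECT 1" on the alias.
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._results: Dict[str, Tuple[float, bool]] = {}

    def is_alive(self, alias: str) -> bool:
        now = self._clock()
        cached = self._results.get(alias)
        if cached is not None and now - cached[0] < self._ttl:
            # Still fresh: reuse the stored result instead of probing again.
            return cached[1]
        try:
            alive = bool(self._probe(alias))
        except Exception:
            # Any failure to reach the replica counts as "down".
            alive = False
        self._results[alias] = (now, alive)
        return alive


def first_live_replica(
    aliases: Sequence[str], heartbeat: CachedHeartbeat, default: str = "default"
) -> str:
    """Return the first replica that answers its heartbeat, else the default database."""
    for alias in aliases:
        if heartbeat.is_alive(alias):
            return alias
    return default
```

A caller would build one `CachedHeartbeat` per process and call `first_live_replica` each time a read connection is needed.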

Given what I've read online so far, I think we can pretty easily handle this ticket the following way:

Implement a heartbeat connection to our replicas using a Django cache to avoid querying every cycle.
Create a secondary set of CROSS_REGION_REPLICA_DATABASE_URLS in settings.py. These would not be used unless the local-region replicas have fallen over.
Keep the current REPLICA_DATABASE_URLS in settings.py which are the first line of querying for reads.
To support ordered reads instead of randomly distributed ones, I suggest a new settings.py variable called REPLICA_READ_STRATEGY, set to either DISTRIBUTED, which is the current approach of spreading reads across replicas, or SEQUENTIAL, which would try the REPLICA_DATABASE_URLS in order and then, once they're all exhausted, fall back to CROSS_REGION_REPLICA_DATABASE_URLS and try those in order.
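
The proposal above could be sketched roughly as follows. This is an illustration under assumptions, not the code that was eventually merged: `choose_read_database`, the `is_healthy` callable and the alias strings are hypothetical names, and the sketch assumes DISTRIBUTED also falls back to the cross-region pool once no local replica is healthy.

```python
import random
from typing import Callable, Optional, Sequence

# The two proposed values of REPLICA_READ_STRATEGY.
DISTRIBUTED = "DISTRIBUTED"
SEQUENTIAL = "SEQUENTIAL"


def choose_read_database(
    strategy: str,
    replicas: Sequence[str],
    cross_region_replicas: Sequence[str],
    is_healthy: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick a replica alias for a read, or None if every replica is down.

    `is_healthy` is a hypothetical callable, e.g. backed by a cached heartbeat.
    Returning None signals the caller to use the default database.
    """
    if strategy not in (DISTRIBUTED, SEQUENTIAL):
        raise ValueError(f"Unknown REPLICA_READ_STRATEGY: {strategy!r}")
    pick = rng.choice if rng is not None else random.choice
    # Local replicas are always preferred; cross-region replicas are only
    # considered once no local replica is healthy.
    for pool in (replicas, cross_region_replicas):
        healthy = [alias for alias in pool if is_healthy(alias)]
        if healthy:
            # SEQUENTIAL honours the configured order; DISTRIBUTED spreads load.
            return healthy[0] if strategy == SEQUENTIAL else pick(healthy)
    return None
```

Passing a seeded `random.Random` as `rng` makes the DISTRIBUTED choice reproducible in tests.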

I'm not sure if we should complicate it more than that, but one downside of this approach is that if the REPLICA_DATABASE_URLS pool has lost replicas to the point where only one remains, the load on it may be high even though the CROSS_REGION_REPLICA_DATABASE_URLS replicas are on standby. We could consider introducing another new variable that specifies a minimum distributed replica pool size, which could mix the two pools if necessary, but I doubt this strategy is really necessary.
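
For reference, the minimum-pool idea could look something like this. Both `build_distributed_pool` and `minimum_pool_size` are hypothetical names, and the sketch assumes the caller has already filtered each list down to healthy replicas.

```python
from typing import List, Sequence


def build_distributed_pool(
    healthy_local: Sequence[str],
    healthy_cross_region: Sequence[str],
    minimum_pool_size: int,
) -> List[str]:
    """Top up the local pool with cross-region replicas until it reaches the minimum size."""
    pool = list(healthy_local)
    shortfall = max(0, minimum_pool_size - len(pool))
    # Borrow only as many cross-region replicas as needed, in configured order.
    pool.extend(healthy_cross_region[:shortfall])
    return pool
```

With a minimum of 2 and a single surviving local replica, the pool would be that replica plus the first healthy cross-region one, and distributed reads would then be spread across both.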

zachaysan · 2024-02-14T14:27:40Z

Solved in #3300

dabeeeenster added the improvement and api labels Nov 3, 2023

matthewelwell assigned zachaysan Dec 13, 2023

zachaysan mentioned this issue Jan 16, 2024 in "feat: Add support for replicas and cross region replicas" #3300 (merged)

zachaysan closed this as completed Feb 14, 2024
