Sunday 26 June 2022

PostgreSQL Read Replicas

PostgreSQL streaming read replication


Amazon RDS users can create up to five read replicas from a snapshot of the source PostgreSQL DB instance.

Amazon RDS uses the native streaming replication of the PostgreSQL engine to asynchronously update the read replicas whenever there is a change to the source DB instance.


All changes written to the source DB instance, including data definition language (DDL), are replicated to the read replicas.

Replicas are read only and used to handle the read request workload.


The following diagram shows how Amazon RDS PostgreSQL performs replication between a source and a replica in the same AWS Region. 

In this example, Amazon RDS hosts the source and the read replica and streams replication directly between them. The user has archived Write-Ahead Logs (WALs) to Amazon Simple Storage Service (Amazon S3).

Then, if needed, the archived WALs can be restored from Amazon S3 to the read replica.





Tuning Amazon RDS replication


You can tune the Amazon RDS PostgreSQL replication process to control how the replication behaves.

 This tuning can affect how far behind a replica is allowed to get before canceling newly submitted queries. 

For more information about tuning, see the Amazon RDS User Guide.


The following parameters control Amazon RDS replication tuning.


wal_keep_segments

This specifies the number of WAL file segments to keep on the primary instance.

Amazon RDS PostgreSQL archives segments to Amazon S3 and must download segments needed to sync a replica that has fallen behind.


hot_standby_feedback

This parameter is set on each replica. It controls whether the replica sends feedback to the primary database about queries running on the replica. 

This feedback prevents VACUUM from removing tuples on the primary that are still needed to satisfy long-running queries on a replica.


max_standby_archive_delay

This parameter, together with max_standby_streaming_delay, configures how long a replica pauses WAL replay so that queries running on the replica can complete.

If a query causes the delay to exceed the maximum, the query is canceled so that WAL replay can resume.
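
On Amazon RDS, you change these settings through a DB parameter group rather than by editing postgresql.conf, but you can inspect the values currently in effect from psql. A minimal sketch:

  -- Inspect the current replication tuning values
  SHOW wal_keep_segments;           -- WAL segments retained on the primary
  SHOW max_standby_archive_delay;   -- max replay pause for archived WAL
  SHOW max_standby_streaming_delay; -- max replay pause for streamed WAL
  SHOW hot_standby_feedback;        -- run on a replica: on or off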


Aurora read replicas


Aurora further increases your horizontal scaling capability through the use of read replicas. 

Aurora uses a virtualized storage layer built on solid state drives (SSDs), which handles database workloads efficiently.


Aurora read replicas share the same underlying storage as the primary source instance. 

This helps reduce costs and eliminates the need to copy data to the read replicas.

 Horizontal scaling of Aurora read replicas is extremely fast, no matter how much you scale the size of the database. 

This is because there is no need to copy the data to the new read replica.


For more information about replication with Aurora, see Replication with Amazon Aurora.










Cross-Region Replicas

Amazon RDS PostgreSQL also supports cross-Region replicas. 
You can create a read-only copy of your database in a separate AWS Region for global applications. 
This is useful when you want to serve read queries locally to your users to help improve performance and reduce the local response time.

You can also fail over to a cross-Region replica in a separate Region for faster disaster recovery.

Cross-Region read replicas in Amazon RDS:

Improve your disaster recovery capabilities.
Scale read operations into an AWS Region closer to your users.
Help migrate from a data center in one AWS Region to a data center in another AWS Region.
Promote remote readers to a primary for faster recovery in the event of a disaster.
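
As a sketch, creating a cross-Region replica and later promoting it with the AWS CLI might look like the following (the instance identifiers, account number, and Regions are hypothetical):

  # Create a read replica in eu-west-1 from a source in us-east-1
  aws rds create-db-instance-read-replica \
      --db-instance-identifier mydb-replica-eu \
      --source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:mydb \
      --region eu-west-1

  # During disaster recovery, promote the replica to a standalone primary
  aws rds promote-read-replica \
      --db-instance-identifier mydb-replica-eu \
      --region eu-west-1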

Cross-Region replication process

The following diagram shows how Amazon RDS performs replication from the PostgreSQL source instance in AWS Region 1. 
Replication streams from the source through internet gateways to the read replica in AWS Region 2.

For more information about creating, configuring, and monitoring cross-Region replicas, see the RDS documentation and the AWS Database Blog for best practices with cross-Region replication.






Logical Replication

You can manually configure replication using PostgreSQL logical replication starting in PostgreSQL 10. For versions earlier than 10, 
you can use an external tool such as Bucardo or AWS Database Migration Service (AWS DMS).

Users build logical replication on top of streaming replication by applying a logical decoding process.
 This creates a set of logical changes from the binary changes recorded in the WAL.

PostgreSQL, like relational databases in general, supports two types of replication. With the first type, physical replication, you copy the exact binary data being written to disk on the primary server to the replica. Physical replication is accomplished with streaming replication, such as with Amazon RDS read replicas. With the second type, logical replication, you create a logical copy by converting source database changes into a stream of updates that can be applied on the target database. Logical replication works across different major versions of PostgreSQL, so you can use it to upgrade to a new version of PostgreSQL or to replicate from an Amazon RDS PostgreSQL instance to an Amazon Aurora instance.

Publish-and-subscribe model

Logical replication uses a publish-and-subscribe model, in which subscribers pull data from the publications they subscribe to. 
Logical replication starts by copying a snapshot of the initial existing data on the publisher database.

When the copy is complete, changes on the publisher (INSERT, UPDATE, and DELETE) are relayed to the subscriber in near real time. The subscriber applies changes in the same order in which they were committed on the publisher, which helps ensure transactional consistency.

The publish-and-subscribe model is built from two objects: publications and subscriptions.
Publication
A publication, specified on a publisher node, is a set of changes generated from one or more database tables.

Subscription
A subscription, specified on a subscriber node, helps you define the connection to another database and one or more publications to which it subscribes.


Logical replication is built on the idea of a publish-and-subscribe model. You have a publisher, which is the source database. 
You have subscribers, which are the target databases. To configure logical replication, you create a publication on the publisher 
and create subscriptions on the subscribers. Then you tie them together and configure the tables that are part of that subscription.
Only the tables that you configure will be replicated from the publisher to the subscribers, and 
you can even choose to replicate only certain statements. For example,
you might want to replicate INSERT and UPDATE statements but not replicate DELETE statements. 
You can make all those replication decisions with the publication settings.

Next, review the two steps to configure replication on the publisher and subscriber databases.

Creating publication and subscription
Follow these steps to configure replication on the publisher and subscriber databases.

Step 1: Create a publication
Define which tables you want to replicate by creating a publication on the source database.

Publications are created for:

  • All tables (FOR ALL TABLES)
  • Named tables (FOR TABLE x, y, z)
  • Certain statements only (WITH (publish = 'insert, update'))

To modify a publication, use ALTER PUBLICATION.

Command example: CREATE PUBLICATION transpub FOR TABLE pgbench_accounts, pgbench_tellers, pgbench_branches;
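
As a sketch, a publication limited to certain statements could be created and later modified like this (pgbench_history is a hypothetical addition):

  -- Publish only INSERT and UPDATE statements for the selected tables
  CREATE PUBLICATION transpub
      FOR TABLE pgbench_accounts, pgbench_tellers, pgbench_branches
      WITH (publish = 'insert, update');

  -- Later, change the table list with ALTER PUBLICATION
  ALTER PUBLICATION transpub ADD TABLE pgbench_history;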


Step 2: Create a subscription
Create a subscription on the target database to copy all existing data and begin the change replication process. 

  • By default, a subscription copies the existing data; to skip the initial data copy, use WITH (copy_data = false).
  • If a publication is changed with ALTER PUBLICATION on the source database, refresh each subscription on the target database: ALTER SUBSCRIPTION transsub REFRESH PUBLICATION;

Command example: CREATE SUBSCRIPTION transsub CONNECTION 'dbname=gnb port=5432' PUBLICATION transpub;
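
For example, a subscription that skips the initial data copy and is refreshed after the publication changes might look like this (the connection details are hypothetical):

  -- On the subscriber: connect to the publisher and skip the initial copy
  CREATE SUBSCRIPTION transsub
      CONNECTION 'host=source.example.com dbname=gnb port=5432 user=repuser'
      PUBLICATION transpub
      WITH (copy_data = false);

  -- After ALTER PUBLICATION on the publisher, pick up the new table list
  ALTER SUBSCRIPTION transsub REFRESH PUBLICATION;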







Logical and physical replication differences

Logical replication differs from physical replication in several ways:

  • Users can replicate changes from an earlier version of PostgreSQL to a later version.
  • Users can choose a subset of tables to replicate.
  • Data can flow in multiple directions.
  • DDL is not replicated.
  • Logical replication requires manual configuration.




Logical replication has more flexibility because it applies logical SQL statements that represent the changes, as opposed to the physical binary changes that are specific to a major version. You can also choose a subset of tables instead of replicating the entire database.

However, logical replication does not replicate DDL, so any schema changes must be applied manually on read replicas configured with logical replication. This differs from PostgreSQL streaming replication, in which DDL and schema changes are replicated. Logical replication also requires more manual configuration.

Enabling logical replication for Amazon RDS

Logical replication works for replicating changes from PostgreSQL instances running on premises or self-managed on Amazon Elastic Compute Cloud (Amazon EC2). It also works for upgrading between PostgreSQL versions.

Additional CPU overhead can affect performance. This additional overhead comes from converting changes made in the source database to a set of logical SQL commands run on a replica.

To enable logical replication for an Amazon RDS for PostgreSQL DB instance as a publisher, follow these steps.


Step 1: Required roles

The AWS user account requires the rds_superuser role to perform logical replication for the database on Amazon RDS. 

The user account also requires the rds_replication role to grant permissions to manage logical slots and stream data using logical slots.

Step 2: Set parameters

Set the rds.logical_replication static parameter to 1.

Step 3: Set connection security

Modify the inbound rules of the security group for the publisher instance (production) to allow the subscriber instance (replica) to connect. 

This is usually done by including the Internet Protocol (IP) address of the subscriber in the security group.


Step 4: Reboot the DB instance

Reboot the DB instance for the changes to the static parameter rds.logical_replication to take effect.
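
After the reboot, you can confirm from psql that the change took effect:

  SHOW rds.logical_replication;  -- expect 'on'
  SHOW wal_level;                -- expect 'logical'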



Logical replication with Aurora

Aurora supports logical replication to publish changes to other PostgreSQL databases or replication clients, starting with version 10.6. To use logical replication with Aurora:

  • Confirm that your AWS user account has the rds_superuser role.
  • Set the rds.logical_replication parameter in a cluster parameter group, and restart the primary (writer) instance.
  • Modify the inbound rules of the security group for the publisher to allow the subscriber to connect. Include the IP address of the subscriber in the security group.

For more information about configuring logical replication, see the Aurora documentation.

Configure subscriber

Set the following parameters on the subscriber to these minimum values or higher.

Parameter                          Minimum    Recommended
max_replication_slots              2          4
max_logical_replication_workers    2          4
max_worker_processes               2          12
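
These are static parameters, so on Amazon RDS and Aurora they are set through the parameter group; you can verify the values in effect from psql:

  SHOW max_replication_slots;
  SHOW max_logical_replication_workers;
  SHOW max_worker_processes;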




Monitoring logical replication

PostgreSQL provides system views to monitor the status of replication (pg_stat_replication) and replication slots (pg_replication_slots) on the publisher. 

To monitor logical replication on the subscriber, use the pg_stat_subscription view.
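
For example, the following queries (column lists trimmed for brevity) show replication progress on each side:

  -- On the publisher: streaming status and slot state
  SELECT pid, state, sent_lsn, replay_lsn FROM pg_stat_replication;
  SELECT slot_name, plugin, active FROM pg_replication_slots;

  -- On the subscriber: how far each subscription has applied changes
  SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;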


Managing Database Connections



Managing connections

The creation of new database connections in PostgreSQL is expensive and uses valuable resources. PostgreSQL has no built-in functionality for reusing connections, so pooling is handled outside the database.

PostgreSQL creates a new child process for every database session; these are known as client backend processes. To avoid overloading your database, manage these connections for application requests and processes effectively.
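
Because each session is a separate backend process, every open connection appears as a row in pg_stat_activity. A quick way to see them:

  -- One row per client session (backend process)
  SELECT pid, usename, state, backend_start
  FROM pg_stat_activity
  WHERE backend_type = 'client backend';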

Managing database connections is important because creating connections is expensive. If your application creates and closes connections continuously, your users will see poor performance and response times.

To improve the performance of your application, make sure the application uses a connection pool and persistent connections across application requests. Be careful to open only as many connections as you really need to support your application, because each connection uses resources. If you open too many connections on the database and keep them open in the connection pool, you could cause the database instance to restart if it runs out of memory.

Opening a connection

Every time you open a new database connection, the database must perform several steps to create it. New connection requests and reconnections of dropped connections add resource overhead, which can affect your database performance.





Connection pools

Creating a new connection takes time and resources, resulting in a slower response to user queries than reusing existing connections. 



Most application servers and programming language environments provide a client-side connection pool for the application to reuse a group of existing connections (a pool) rather than creating a new connection for each user request.



For example, Java developers use open-source connection pool libraries such as Apache Commons DBCP, HikariCP, or c3p0. Ruby on Rails and .NET Npgsql developers use a connection pool built into their respective database drivers.

Using these connection pools, each application server can access database resources and process user queries more quickly and efficiently.





As you scale your application, managing the total number of connections coming from multiple application servers becomes important.

Each application server might have a small pool of 10 database connections. However, as the application tier automatically scales from 10 to 100 to 1,000 application servers, the number of database connections grows right along with it: 10 connections per server across 1,000 servers is 10,000 connections. Pooling and reusing connections reduces the total number of database connections created.


Each open connection in PostgreSQL, whether idle or active, consumes memory (about 10 MB), which reduces memory available for the database engine or file system caching.

You can control the number of incoming connections allowed to your database with the PostgreSQL parameter max_connections.
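
A rough check of the configured ceiling against current usage from psql:

  SHOW max_connections;
  SELECT count(*) AS open_connections FROM pg_stat_activity;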



Ensure that the maximum number of connections configured on the database server, multiplied by the working memory configured for each connection, does not exceed the available free memory. In the worst case, the database restarts because the engine allowed more connections to be opened than the available memory could accommodate.

For more information about connection management in PostgreSQL, see the PostgreSQL Wiki.


Database proxies

Add a database proxy to your database architecture to increase speed and reduce connection overhead.

Proxies provide a separate layer of database connection pools. This separate layer can reuse existing or duplicate connections. This proxy can also direct writes to the primary while offloading read requests to the read replicas.

Proxies are useful when you have many application servers or use a serverless application architecture to create a connection pool shared by all your applications.


Configuring and managing your connection pool with a client-side library, such as a Java, Python, or Ruby connection library, is appropriate when your application is small. However, as your application grows, or if you move toward a serverless application architecture, you should introduce a database proxy that manages the pool of connections in a shared layer between your application and the database. Those connections can then be shared across multiple application servers and reused as your application gets larger and the number of application servers continues to grow.

Amazon RDS Proxy is a fully managed service that provides a connection pooling proxy layer with extensive functionality. In addition, there are two open-source PostgreSQL proxy servers, PgBouncer and Pgpool-II, that you can configure and manage on your own on Amazon EC2 to support similar connection pooling functionality.


Amazon RDS Proxy

Amazon RDS Proxy is a fully managed database proxy. It automatically connects to a new database instance while preserving application connections. When failovers occur, Amazon RDS Proxy routes requests directly to the new database instance. This reduces failover duration for Aurora and Amazon RDS databases by up to 66 percent.

Amazon RDS Proxy helps you manage elements such as database credentials, authentication, and access through integration with AWS Secrets Manager and AWS Identity and Access Management (IAM).


PostgreSQL proxy servers 

PgBouncer and Pgpool-II are open-source PostgreSQL proxy servers that you can use to manage connections and traffic for your database on Amazon EC2.

One way to help increase efficiency with a database proxy is to use it to direct write queries to the primary DB instance and read-only queries to one or more read replicas.


PgBouncer

PgBouncer is an open-source connection pooling utility used in PostgreSQL to maintain a cached pool of connections. Applications interact with PostgreSQL through these connections.

A few features of PgBouncer include:

  • Provides a lightweight connection pooler
  • Queues requests
  • Helps improve availability
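
As a sketch, a minimal pgbouncer.ini for transaction-level pooling might look like the following (the host, database name, and sizing values are hypothetical):

  [databases]
  appdb = host=mydb.example.com port=5432 dbname=appdb

  [pgbouncer]
  listen_addr = 0.0.0.0
  listen_port = 6432
  auth_type = md5
  auth_file = /etc/pgbouncer/userlist.txt
  pool_mode = transaction     ; return server connections after each transaction
  max_client_conn = 1000      ; client connections PgBouncer will accept
  default_pool_size = 20      ; server connections per database/user pair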


Pgpool-II

Pgpool-II is middleware that works between PostgreSQL servers and a PostgreSQL database client. For more information, see the Pgpool-II documentation.








