Monday, 11 May 2026

PostgreSQL Interview Questions and Answers 2026

 Question: What is table partitioning in PostgreSQL?


In PostgreSQL, table partitioning is a data management technique that splits a logically large table into smaller physical pieces called partitions. This improves query performance through "partition pruning", where the database scans only the relevant partitions, and simplifies maintenance by letting you drop entire partitions instead of running expensive DELETE operations.


Types of Partitioning
Modern PostgreSQL (v10+) primarily uses Declarative Partitioning, which supports three main strategies:
  • Range Partitioning: Divides data into ranges defined by a key column (e.g., partitioning by date or numeric ID). Bounds are typically inclusive at the lower end and exclusive at the upper end.
  • List Partitioning: Groups data by explicitly listing which key values appear in each partition (e.g., partitioning by country code or status).
  • Hash Partitioning: Distributes rows across a fixed number of partitions using a hash of the partition key, which is useful for balancing data evenly when no natural range or list exists.

    Basic Implementation Steps
    To set up a partitioned table, follow this standard workflow:

    1) Create the Parent Table: Define the structure and specify the strategy using PARTITION BY.
    CREATE TABLE orders (
        order_id   int,
        order_date date NOT NULL
    ) PARTITION BY RANGE (order_date);

    2) Define Child Partitions: Create individual tables that "belong" to the parent for specific value ranges.

    CREATE TABLE orders_2024 PARTITION OF orders FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

    3) Insert Data: You insert data directly into the parent table (orders), and PostgreSQL automatically routes the rows to the correct child partition based on the order_date.
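
    A quick illustration (using the orders table defined above; the EXPLAIN plan is how you would verify that pruning occurs):

    -- rows are routed to the matching partition automatically
    INSERT INTO orders VALUES (1, '2024-03-15');

    -- with a filter on the partition key, only orders_2024 should appear in the plan
    EXPLAIN SELECT * FROM orders WHERE order_date = '2024-03-15';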

    Key Benefits & Best Practices
    • Performance: Queries with filters on the partition key can skip unrelated partitions, drastically reducing I/O.
    • Data Retention: Instead of DELETE FROM table WHERE date < '2023-01-01', you can simply DROP TABLE old_partition, which is near-instant and avoids table bloat.
    • Management Tools: For automated partition creation and maintenance, the pg_partman extension is the industry-standard tool.
    • Rule of Thumb: Only partition when a table exceeds the physical memory (RAM) of the database server or grows beyond 10–20 GB.
    For More Details

    https://www.postgresql.org/docs/current/ddl-partitioning.html
    https://docs.aws.amazon.com/dms/latest/oracle-to-aurora-postgresql-migration-playbook/chap-oracle-aurora-pg.storage.partition.html

    Question: What are the types of indexes in PostgreSQL?

    PostgreSQL provides several built-in index types, each designed for specific data structures and query patterns. By default, PostgreSQL uses the B-tree index.

    Core Index Types
    • B-Tree: The most common and default index type. It is optimized for data that can be sorted and handles equality and range queries (<, <=, =, >=, >) efficiently.
    • Hash: Best for simple equality comparisons (=). Hash indexes are generally not as flexible as B-trees but can be faster for exact matches on high-cardinality columns.
    • GiST (Generalized Search Tree): An infrastructure for many different indexing strategies. It is commonly used for geometric data types, full-text search, and "nearest-neighbor" queries.
    • SP-GiST (Space-Partitioned GiST): Optimized for non-balanced data structures (like quadtrees or radix trees) and data with natural clustering, such as IP addresses or spatial data.
    • GIN (Generalized Inverted Index): Designed for composite values where a single row may contain multiple keys. It is the go-to choice for JSONB, arrays, and full-text search.
    • BRIN (Block Range Index): Stores summaries (min/max values) for blocks of data. It is extremely space-efficient for very large tables where the data is naturally ordered (e.g., a created_at timestamp).

    Specialized Indexing Techniques
    Beyond the basic types, you can apply these techniques to any compatible index method:
    • Expression Indexes: Built on the result of a function or expression rather than just a column value (e.g., lower(email)).
    • Partial Indexes: Cover only a subset of a table's rows based on a WHERE clause, saving space and improving speed for specific filtered queries.
    • Multicolumn Indexes: Defined on multiple columns. While B-tree and GiST support this, the effectiveness depends on the order of columns in the query.
    • Unique Indexes: Enforce that no two rows have the same value. Currently, only B-tree indexes can be declared unique in PostgreSQL.
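
    For illustration, a few hedged examples of creating these index types (table and column names are hypothetical):

    -- default B-tree index
    CREATE INDEX idx_users_email ON users (email);

    -- GIN index for a JSONB column
    CREATE INDEX idx_events_payload ON events USING GIN (payload);

    -- BRIN index for a naturally ordered timestamp column
    CREATE INDEX idx_events_created ON events USING BRIN (created_at);

    -- unique, expression-based, partial index combined
    CREATE UNIQUE INDEX idx_active_email ON users (lower(email)) WHERE active;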

    For More Details

    https://www.postgresql.org/docs/current/indexes-types.html
    https://www.postgresql.org/docs/current/indexes.html


    Question: Step-by-step migration from Oracle to PostgreSQL


    Migrating from Oracle to PostgreSQL involves a structured lifecycle that spans assessment, schema conversion, and data transfer. Most successful migrations use specialized tools like Ora2Pg or cloud-native services to handle the significant syntactic and functional differences between the two systems.

    1. Assessment and Planning
    Inventory your source environment to identify the scale and complexity of the migration.

  • Inventory Objects: List all schemas, tables, indexes, and PL/SQL packages.
  • Analyze Dependencies: Identify external application dependencies and third-party integrations.
  • Compatibility Check: Review data types and PL/SQL code that lack direct PostgreSQL counterparts.
  • Strategy Selection: Decide between a one-time bulk load (for smaller datasets) or incremental replication (CDC) to minimize downtime.

    2. Schema Conversion
    This step translates Oracle-specific structures into PostgreSQL-compatible syntax.
    • Use Automated Tools: Tools like Ora2Pg or the AWS Schema Conversion Tool (SCT) can automate the conversion of tables, views, and sequences.
    • Manual Refactoring: Hand-convert complex PL/SQL objects—such as triggers, stored procedures, and packages—into PL/pgSQL.
    • Formatting Rules: Note that Oracle defaults to uppercase object names, while PostgreSQL defaults to lowercase; apply transformation rules to ensure consistency.

    3. Data Migration
    Move the actual data rows from the source to the target destination.
    • Disable Constraints: Temporarily drop foreign keys and secondary indexes on the target PostgreSQL instance to speed up data loading.
    • Initial Load: Use tools like COPY (via Ora2Pg) or AWS DMS to perform the initial bulk data transfer.
    • Sync Deltas: For large databases, use Change Data Capture (CDC) to keep the target in sync with the live Oracle source until cutover.

    4. Application Refactoring and Testing
    Modify the application layer to interact with the new database.
    • Query Updates: Update SQL queries to account for syntax differences, such as removing the DUAL table or replacing ROWNUM with LIMIT (see the example after this list).
    • Functional Testing: Perform side-by-side validation to ensure schema integrity and identical query results.
    • Performance Tuning: Adjust PostgreSQL parameters like shared_buffers and rebuild indexes to match Oracle performance benchmarks.
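
    For instance, a typical rewrite pair (illustrative queries only):

    -- Oracle
    SELECT sysdate FROM dual;
    SELECT * FROM orders WHERE ROWNUM <= 10;

    -- PostgreSQL equivalents (no DUAL table; LIMIT instead of ROWNUM)
    SELECT now();
    SELECT * FROM orders LIMIT 10;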

    5. Cutover and Post-Migration
    The final transition to the live PostgreSQL environment.
    • Final Sync: Stop write operations on the Oracle source and migrate the last delta of changed data.
    • Switch Traffic: Update application connection strings to point to the PostgreSQL instance.
    • Monitoring: Monitor system performance and logs closely during the initial hours of production.
    • Cleanup: Re-enable triggers and recreate any dropped constraints or indexes.

    For More Details

    https://docs.aws.amazon.com/dms/latest/sbs/chap-rdsoracle2postgresql.steps.html


    Question: How do you perform a major upgrade in PostgreSQL?

    To perform a major upgrade in PostgreSQL, you must either migrate your data using a dump/restore process or use the pg_upgrade tool for an in-place migration. Major versions are not backward-compatible at the data directory level, so a simple binary swap is not sufficient.

    1. Using pg_upgrade (Recommended for Speed)
    This is the fastest method, especially for large databases, as it can use hard links to avoid copying data files:
  • Install New Version: Install the new PostgreSQL binaries alongside the old version.
  • Initialize New Cluster: Use initdb to create a new, empty data directory for the target version.
  • Stop Both Servers: Ensure neither the old nor the new PostgreSQL service is running.
  • Run Pre-Check: Execute pg_upgrade --check to identify potential incompatibilities, such as missing extensions or incompatible data types.
  • Perform Upgrade: Run the pg_upgrade command, specifying the old and new binary and data directories.
    • Pro Tip: Use the --link flag to complete the process in minutes by creating hard links rather than copying files.
  • Post-Upgrade Tasks: Start the new server and rebuild optimizer statistics for optimal performance (recent versions recommend vacuumdb --all --analyze-in-stages; older releases generated an analyze_new_cluster.sh script for this).
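
    A minimal sketch of the pg_upgrade invocation (all paths and version numbers are illustrative and depend on your distribution's layout):

    # dry run first to catch incompatibilities; rerun without --check to upgrade,
    # optionally adding --link to use hard links instead of copying data files
    /usr/lib/postgresql/17/bin/pg_upgrade \
      --old-bindir=/usr/lib/postgresql/16/bin \
      --new-bindir=/usr/lib/postgresql/17/bin \
      --old-datadir=/var/lib/postgresql/16/main \
      --new-datadir=/var/lib/postgresql/17/main \
      --check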

    2. Logical Migration (Dump and Restore)
    This method is more flexible (e.g., when moving to a different OS) but takes longer because it exports and imports all data.
    1. Backup: Create a full backup using pg_dumpall (ideally using the pg_dumpall version from the new PostgreSQL release).
    2. Shutdown: Stop the old PostgreSQL service.
    3. Restore: Initialize the new cluster, start it, and import your data using psql -f outputfile.
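
    A rough sketch of that flow (ports, paths, and version numbers are illustrative):

    # use the NEW release's pg_dumpall against the old server (port 5432)
    /usr/lib/postgresql/17/bin/pg_dumpall -p 5432 > outputfile

    # after running initdb for the new cluster and starting it (here on port 5433)
    /usr/lib/postgresql/17/bin/psql -d postgres -p 5433 -f outputfile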

    3. Managed Services (AWS RDS / Azure / Google Cloud)
    If you are using a cloud-managed service, the process is simplified through built-in automation:
    • AWS RDS: Use the "Modify" button in the console to select a new engine version. RDS will handle the pg_upgrade process and extension updates automatically.
    • Azure Database for PostgreSQL: Select "Upgrade" from the server overview. It performs an in-place upgrade using pg_upgrade and automatically takes a snapshot for safety.
    • Google Cloud SQL: Supports in-place upgrades but requires checking extension compatibility (e.g., upgrading pg_squeeze first).

    Essential Safety Checks
    • Backups: Always take a manual snapshot or full backup immediately before starting.
    • Extensions: Update extensions using ALTER EXTENSION ... UPDATE after the major upgrade is complete.
    • Disk Space: Ensure you have at least 10–20% free storage available for logs and temporary metadata.

    For More Details

    https://www.postgresql.org/docs/current/pgupgrade.html
    https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.PostgreSQL.html
    https://www.postgresql.org/docs/current/upgrading.html


    Question: Metrics configuration in monitoring tools for PostgreSQL


    Configuring metrics for PostgreSQL monitoring typically involves enabling internal extensions, adjusting server parameters, and integrating with external collectors.

    1. Essential In-Database Extensions
    To track detailed query and resource performance, you must enable specific PostgreSQL extensions:
    • pg_stat_statements: Essential for tracking execution statistics of all SQL statements (e.g., query runtime, frequency, and resource usage).
    • pg_buffercache: Used to examine the shared buffer cache in real-time, helping you monitor cache hit ratios.
    • auto_explain: Useful for logging execution plans of slow queries automatically.
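
    For example, once pg_stat_statements is preloaded (see the parameters below), it can be enabled and queried like this (the timing column is total_exec_time on PostgreSQL 13+, total_time on older releases):

    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- top 10 statements by cumulative execution time
    SELECT query, calls, total_exec_time
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;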

    2. Key Server Parameters for Monitoring
    Adjust these in your postgresql.conf file to ensure the database exposes the necessary data:
    • shared_preload_libraries: Add 'pg_stat_statements' to this list and restart the server.
    • track_activities: Set to on to monitor current commands being executed by any server process.
    • track_counts: Set to on to collect statistics on database activity.
    • log_duration: Set to on to record the duration of every completed statement.
    • track_io_timing: Enables monitoring of block I/O read and write times.
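
    Put together, a minimal postgresql.conf fragment for monitoring might look like this (values are illustrative; shared_preload_libraries requires a restart):

    shared_preload_libraries = 'pg_stat_statements'
    track_activities = on
    track_counts = on
    track_io_timing = on
    log_duration = on    # verbose; log_min_duration_statement is a lighter alternative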

    3. Core Metrics to Monitor
    Focus on these four categories for a comprehensive health check:
    • Throughput & Performance: Query latency, buffer cache hit ratio (aim for >95%), and transaction commits vs. rollbacks.
    • Resource Utilization: CPU usage, memory (cached vs. dirty), disk I/O, and storage capacity.
    • Connections: Total active, idle, and idle-in-transaction connections to prevent resource leaks.
    • Reliability: Replication lag, WAL (Write-Ahead Log) generation rates, and vacuum/analyze statistics.


    4. Common Monitoring Tools & Integration
    • Open Source (Prometheus + postgres_exporter): Uses a dedicated exporter to scrape metrics from pg_stat views and expose them to Prometheus.
    • Enterprise SaaS (Datadog, New Relic, Sematext): Requires installing an agent on the host that connects to Postgres via a dedicated monitoring user.
    • Cloud Managed (AWS RDS Insights, Azure Metrics): Pre-configured metrics provided by the cloud vendor; accessible via their native consoles.
    • Specialized (pgwatch2, pgBadger): pgwatch2 is a self-contained monitoring tool; pgBadger is a log analyzer for detailed performance reports.



    For More Details

    https://www.postgresql.org/docs/current/monitoring.html

    Question: User permissions in PostgreSQL


    PostgreSQL manages permissions through a unified system of roles, which act as both users and groups. By default, a newly created user has no access to data and must be explicitly granted privileges for every database object they need to interact with.


    1. The Core Concept: Roles
    In PostgreSQL, there is no technical difference between a "user" and a "group"; both are roles.
    • Users: Roles with the LOGIN attribute.
    • Groups: Roles without the LOGIN attribute, used to group permissions together.
    • Superusers: Roles with the SUPERUSER attribute bypass all permission checks.


    2. Managing Privileges (GRANT & REVOKE)
    Privileges are assigned to roles to control what they can do with specific objects like databases, schemas, and tables.
    • Common Table Privileges: SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, and TRIGGER.
    • Schema Privileges: USAGE (to access objects within) and CREATE (to create new objects within).
    • Database Privileges: CONNECT, CREATE, and TEMPORARY. 
    Common SQL Examples:

    -- Grant read-only access to a table
    GRANT SELECT ON table_name TO user_name;

    -- Grant full access to all tables in a schema
    GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO user_name;

    -- Revoke a privilege
    REVOKE UPDATE ON table_name FROM user_name;


    3. Role Inheritance
    Instead of granting permissions to every individual user, it is best practice to grant permissions to a "group" role and then make users members of that group.
    Users will automatically inherit the permissions of the group:
    GRANT group_role TO user_name;
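
    A minimal sketch of this pattern (role names and the password are illustrative):

    -- a login-less "group" role that holds the permissions
    CREATE ROLE readonly NOLOGIN;
    GRANT USAGE ON SCHEMA public TO readonly;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;

    -- a "user" role that inherits them through membership
    CREATE ROLE alice LOGIN PASSWORD 'changeme';
    GRANT readonly TO alice;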

    4. Default Privileges
    By default, permissions only apply to objects that already exist. To ensure a user gets permissions on future objects created in a schema, you must use ALTER DEFAULT PRIVILEGES:

    ALTER DEFAULT PRIVILEGES IN SCHEMA public 
    GRANT SELECT ON TABLES TO user_name;

    5. Essential Commands for Inspection
    You can use psql meta-commands to check current permissions:
    • \du: List all roles and their global attributes.
    • \z or \dp: List access privileges for tables, views, and sequences.
    For More Details

    https://www.postgresql.org/docs/current/ddl-priv.html
    https://www.postgresql.org/docs/current/user-manag.html
    https://www.postgresql.org/docs/current/sql-grant.html
    https://aws.amazon.com/blogs/database/managing-postgresql-users-and-roles/

    Question: What is the use of xCAT and MultiXact?
    There are two distinct topics here: xCAT (an infrastructure management tool) and MultiXact (a PostgreSQL internal mechanism).

    1. xCAT (Extreme Cluster Administration Toolkit)
    xCAT is not part of PostgreSQL itself, but rather an open-source tool used for managing high-performance computing (HPC) clusters.
    • Use in PostgreSQL: xCAT can use a PostgreSQL database as its backend to store cluster configuration data (nodes, networks, images) instead of the default SQLite.
    • Advantage: Using PostgreSQL enables advanced xCAT features like Service Nodes, allowing the management node to handle larger clusters more efficiently through centralized, scalable storage.
    • Setup: The pgsqlsetup command is used to configure this database.

    2. MultiXact (Multi-Transaction Identifier)
    MultiXact is a crucial component of PostgreSQL's concurrency control system, specifically for managing row-level locks.
    • Use Case: When multiple transactions need to lock the same row simultaneously (e.g., several users trying to update the same record at the same time, or heavy foreign key usage), a single xmin or xmax field in the row header cannot hold all transaction IDs.
    • How it works: PostgreSQL replaces the individual transaction ID in the row's xmax field with a MultiXact ID. This ID points to a separate, immutable data structure (stored in pg_multixact) that tracks all transaction IDs that are currently holding a lock on that row.

  • Key Functions:
    • Row-Level Sharing: It allows SELECT FOR SHARE to be used by multiple concurrent users.
    • Foreign Key Consistency: It enables high-volume inserts on child tables while maintaining foreign key locks on a parent table.
  • Performance Impact: High MultiXact activity can create performance issues (contention) if many transactions are waiting on the same MultiXact lock.
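
    One practical way to watch MultiXact consumption is a standard catalog query (the acceptable threshold depends on your workload):

    -- age of the oldest MultiXact ID per database; values approaching
    -- autovacuum_multixact_freeze_max_age trigger aggressive vacuums
    SELECT datname, mxid_age(datminmxid) AS multixact_age
    FROM pg_database
    ORDER BY multixact_age DESC;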

    Summary
    • xCAT: Cluster management software; with Postgres, it uses the database as the backend for managing large HPC clusters.
    • MultiXact: PostgreSQL internal mechanism for managing multiple concurrent locks on a single row (e.g., hot rows, foreign keys).

    Question: What is the impact of performing VACUUM FULL in PostgreSQL?

    VACUUM FULL in PostgreSQL provides maximum space reclamation by rewriting the table on disk and returning free space to the OS, but it acquires an ACCESS EXCLUSIVE lock, which blocks all concurrent reads (SELECT) and writes. It is slow, highly I/O intensive, and requires extra disk space (up to 2x the table size), because a complete new copy of the table and its indexes is written before the old one is removed.

    Key Impacts of VACUUM FULL:
    • Total Table Lock: The table is inaccessible for all applications for the duration of the command.
    • High Resource Usage: It causes heavy disk I/O, which can cripple performance for the rest of the database.
    • Table Size Reduction: Physically removes dead tuples and defragments data, shrinking the table to its smallest possible size.
    • Index Rebuilding: All indexes on the table are rebuilt, which is time-consuming.
    When to Use:
    Only for significant bloat, for example after deleting or updating a massive amount of data, and preferably within a maintenance window.
    Alternatives for Production:
    To avoid downtime, use pg_repack to reclaim space without locking the table, as sketched below.
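
    A hedged example (assumes the pg_repack extension and its command-line client are installed; database and table names are illustrative):

    -- one-time setup inside the target database
    CREATE EXTENSION pg_repack;

    # then, from the shell: rebuilds the table online, taking only brief locks
    pg_repack --dbname=mydb --table=public.orders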
    For More Details

    https://www.postgresql.org/docs/current/sql-vacuum.html
    https://www.postgresql.org/docs/current/routine-vacuuming.html
    https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Autovacuum_Monitoring.Resolving_Performance.html
    Question: How does PITR work in RDS for PostgreSQL?

    Point-in-Time Recovery (PITR) in Amazon RDS for PostgreSQL works by combining automated daily snapshots with continuous transaction log (WAL) backups to restore your database to any specific second within your retention period.

    How the Process Works
    1. Snapshot Baseline: RDS identifies the most appropriate daily automated snapshot taken before your requested recovery time.
    2. Instance Provisioning: A new DB instance is created using this snapshot. It does not overwrite your existing database.
    3. WAL Replay: RDS automatically retrieves the Write-Ahead Logs (WAL) stored in S3 and replays them sequentially onto the new instance until it reaches the exact second you specified.
    Key Technical Details
    • Log Frequency: Transaction logs are uploaded to Amazon S3 every 5 minutes.
    • Latest Restorable Time: Because of this upload frequency, the "Latest Restorable Time" is typically within the last 5 minutes of current time.
    • Retention Period: You can configure this window for up to 35 days.
    • Performance Impact: Replaying transaction logs can take time depending on the volume of changes between the daily snapshot and your recovery point.

    Prerequisites for PITR
    To use PITR, you must have Automated Backups enabled (set to a retention period greater than 0 days). You can initiate a restore via the Amazon RDS Management Console or the AWS CLI, as sketched below.
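
    A hedged AWS CLI sketch (instance identifiers and the timestamp are illustrative):

    aws rds restore-db-instance-to-point-in-time \
        --source-db-instance-identifier mydb-prod \
        --target-db-instance-identifier mydb-restored \
        --restore-time 2026-05-10T14:30:00Z
    # or pass --use-latest-restorable-time instead of --restore-time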


    For More Details

    https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html
    https://docs.aws.amazon.com/aws-backup/latest/devguide/point-in-time-recovery.html


    Question: How do you perform performance tuning in PostgreSQL?

    PostgreSQL performance tuning is a multi-layered process that involves optimizing database configuration, query design, and system maintenance. It typically rests on these core pillars:

    1. Database Configuration Tuning
    You can adjust system-level parameters in postgresql.conf to better utilize your hardware:

  • shared_buffers: Set this to 25%–40% of total system RAM. It determines how much memory is dedicated to caching frequently accessed data.
  • work_mem: Allocated per sort or hash operation. For OLTP workloads, start with 16MB–64MB; for analytical workloads, increase this to 128MB+ to prevent slow disk-based sorting.
  • effective_cache_size: Set this to 50%–75% of total RAM. It doesn't allocate memory but helps the query planner estimate how much data is likely cached by the OS.
  • random_page_cost: If using SSDs, lower this from the default 4.0 to 1.1 to reflect the faster random access speed compared to HDDs.
  • checkpoint_completion_target: Increase this to 0.9 to spread disk writes over a longer period and avoid "write storms."
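
    As a concrete sketch for a hypothetical 16 GB RAM server on SSD storage (starting points, not universal recommendations):

    shared_buffers = 4GB                 # ~25% of RAM
    work_mem = 32MB                      # per sort/hash operation
    effective_cache_size = 12GB          # ~75% of RAM
    random_page_cost = 1.1               # SSD random access is cheap
    checkpoint_completion_target = 0.9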

    2. Query and Index Optimization
    Slow queries are often the primary cause of performance issues.
    • EXPLAIN ANALYZE: Prepend this to any query to see its execution plan, identify bottlenecks (like sequential scans), and view actual run times (see the example after this list).
    • Index Strategy:
      • Create B-tree indexes for frequently joined or filtered columns.
      • Use specialized indexes like GIN for full-text search and JSONB, or GiST for spatial data.
      • Drop unused or duplicate indexes to reduce write overhead.
    • Query Refactoring: Replace subqueries with JOIN operations and avoid SELECT * to reduce unnecessary data retrieval.
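
    A small illustration of reading a plan (table and column are hypothetical):

    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE customer_id = 42;

    -- a "Seq Scan on orders" with high actual time suggests a missing index;
    -- if this column is filtered often, a B-tree index usually fixes it:
    CREATE INDEX idx_orders_customer ON orders (customer_id);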

    3. Maintenance and Monitoring
    • VACUUM and ANALYZE: Regularly run these (or rely on Autovacuum) to reclaim storage space from "dead tuples" and update table statistics used by the query planner.
    • pg_stat_statements: Enable this module to track execution statistics for all SQL statements executed on the server, allowing you to find the "top 10" most resource-intensive queries.
    • Connection Pooling: Use tools like PgBouncer if your application requires hundreds of concurrent connections, as native PostgreSQL connections have high overhead.
    For a quick starting point based on your specific hardware, tools like PGTune can generate an optimized configuration file automatically.

    For more details
    https://www.postgresql.org/docs/current/performance-tips.html

    Question: If a table is dropped in PostgreSQL RDS and affects production, how will you handle it?

    If a table is dropped in a production PostgreSQL RDS instance, the primary recovery method is to use Point-in-Time Recovery (PITR) to restore the data to a new temporary instance and then move the missing table back to production.

    Immediate Action Steps
    1. Identify the Drop Time: Determine the exact timestamp just before the DROP TABLE command was executed.
    2. Perform Point-in-Time Recovery (PITR):
      • Navigate to the Amazon RDS Console and select your original instance.
      • Under Actions, choose Restore to Point in Time.
      • Select a restorable time slightly before the table was dropped.
      • This process creates a new, separate DB instance containing the dropped table.
    3. Extract the Dropped Table:
      • Connect to the newly restored instance using standard tools such as psql or pg_dump.
      • Run a command to export only the specific table:
        pg_dump -h [restored-instance-endpoint] -t [table_name] [database_name] > table_restore.sql
    4. Restore to Production:
      • Import the .sql file back into your original production instance:
        psql -h [production-instance-endpoint] [database_name] < table_restore.sql
    5. Verify and Cleanup: Ensure all indexes, constraints, and permissions for the restored table are correct. Once verified, delete the temporary restored RDS instance to avoid unnecessary costs.

    Alternative Recovery Methods
    • Delayed Read Replica: If you have a delayed read replica configured, you can pause replication before the DROP statement reaches the replica and extract the table from there.
    • AWS Backup: If you use AWS Backup, you can restore from a specific recovery point directly through the AWS Backup console.




     
