/ Shayon Mukherjee / blog

Stop Relying on IF NOT EXISTS for Concurrent Index Creation in PostgreSQL

August 12, 2024
~3 mins

As a developer, you might have encountered situations where creating an index in PostgreSQL fails due to lock timeouts. In such scenarios, it’s tempting to use the IF NOT EXISTS as a quick fix and move on. However, this approach can lead to subtle and hard-to-debug issues in production environments.

Let’s understand how PostgreSQL handles concurrent index creation

When you initiate CREATE INDEX CONCURRENTLY, PostgreSQL first creates an entry for the index in the system catalogs (specifically in pg_index) and marks it as invalid. This happens before the actual index build begins. If a lock timeout occurs during the build (say, while waiting out a long-running transaction or a vacuum), the operation is aborted, but the entry in pg_index may already exist. PostgreSQL doesn’t automatically remove the partially created index when the operation is aborted; this is intentional, as it allows for inspection and manual intervention. When you then try to create the index again, PostgreSQL sees the existing entry in pg_index and throws an error saying the index already exists, even though it’s in an invalid state.
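The failure mode looks roughly like this (the index and table names here are illustrative, not from a real schema):

```sql
-- First attempt aborts partway through, e.g. due to lock_timeout:
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
-- ERROR:  canceling statement due to lock timeout

-- A second attempt then fails, because the invalid catalog entry is still there:
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
-- ERROR:  relation "idx_orders_status" already exists
```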

Enter IF NOT EXISTS

In such situations, it’s very tempting to wrap your index creations in IF NOT EXISTS. However, this is where the trouble begins. When you use IF NOT EXISTS and re-run your index creation, the statement can complete silently while leaving behind an invalid index. This happens because PostgreSQL skips the creation entirely if an index with that name already exists, even one marked as invalid.
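The skip is easy to miss because PostgreSQL only emits a notice, not an error (again, names here are illustrative):

```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_status ON orders (status);
-- NOTICE:  relation "idx_orders_status" already exists, skipping
-- The statement "succeeds", but idx_orders_status is still invalid.
```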

🚨 To make matters worse - if you’re not paying attention and use Ruby on Rails or a similar setup that runs schema migrations, those migrations can be silently marked as completed, leaving behind invalid indexes. You might only discover these later, when query performance degrades, because PostgreSQL won’t use an invalid index.

Quick demo

To reproduce this issue, you can use the following SQL script:

-- Create a test table
CREATE TABLE test_table (
    id serial PRIMARY KEY,
    data text
);

-- Insert some sample data
INSERT INTO test_table (data)
SELECT 'Data ' || generate_series(1, 1000000);

-- Function to simulate long-running query
CREATE OR REPLACE FUNCTION simulate_long_query() RETURNS void AS $$
BEGIN
    PERFORM pg_sleep(30);  -- Sleep for 30 seconds
END;
$$ LANGUAGE plpgsql;

-- In session 1:
BEGIN;
SELECT simulate_long_query();

-- In session 2 (run this while session 1 is still executing),
-- set a short lock timeout so the index build gives up quickly.
-- Note: lock_timeout is per-session, so it must be set here, in session 2:
SET lock_timeout = '5s';
CREATE INDEX CONCURRENTLY idx_test_data ON test_table (data);

-- This will likely time out waiting for session 1's open transaction to finish

-- After the timeout in session 2, try to create the index again with IF NOT EXISTS:
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_test_data ON test_table (data);

-- Check the state of the index
SELECT indexrelid::regclass, indisvalid
FROM pg_index
WHERE indisvalid = false;

A Better Approach: Drop and Recreate

Instead of relying on IF NOT EXISTS, a safer approach is to drop any existing index (valid or invalid) and then recreate it. Here’s an example helper method for a Ruby on Rails application, but the concept applies to any migration framework. (The migration must use disable_ddl_transaction! and pass algorithm: :concurrently, since concurrent index operations cannot run inside a transaction block.)

def safe_add_index_with_retry(table, columns, options = {})
  index_name = options[:name] || index_name(table, columns)
  add_index(table, columns, **options)
rescue ActiveRecord::LockWaitTimeout
  retries = 0
  begin
    # Drop any leftover (likely invalid) index before each attempt;
    # otherwise the retry fails because an index with this name exists.
    execute("DROP INDEX CONCURRENTLY IF EXISTS #{index_name}")
    add_index(table, columns, **options)
  rescue ActiveRecord::StatementInvalid => e
    Rails.logger.error("Failed to remove index: #{e.message}")
    raise
  rescue ActiveRecord::LockWaitTimeout => e
    raise e unless retries < 5
    retries += 1
    sleep(2**retries)
    retry
  end
end

This method:

  1. Attempts to create the index
  2. If that fails due to a lock timeout, drops any leftover (invalid) index
  3. Retries the creation with exponential backoff, up to five times
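If you hit this outside of a migration framework, the same drop-and-recreate fix can be applied directly in SQL (using the index and table names from the demo above):

```sql
-- Remove the invalid leftover; CONCURRENTLY avoids blocking writes.
-- Note: DROP INDEX CONCURRENTLY cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS idx_test_data;

-- Rebuild from scratch.
CREATE INDEX CONCURRENTLY idx_test_data ON test_table (data);
```

On PostgreSQL 12 and later, REINDEX INDEX CONCURRENTLY idx_test_data is another way to rebuild an invalid index in place.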

Happy coding 👋

last modified August 12, 2024