Trigram index for ILIKE patterns not working as expected

I've got a simple but slow query:

SELECT DISTINCT title  

FROM ja_jobs

WHERE title ILIKE '%RYAN WER%'

AND clientid = 31239

AND time_job > 1457826264

ORDER BY title

LIMIT 10;

Explain analyze:

Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)

  ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)

        ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)

              Sort Key: "title"

              Sort Method: quicksort  Memory: 25kB

              ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)

                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

                    Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")

                    Rows Removed by Filter: 791

                    ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)

                          Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

Total runtime: 2746.879 ms

Then, I created a trigram index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);

Explain analyze after adding the index: (Yes, I analyze)

Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)

  ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)

        ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)

              Sort Key: "title"

              Sort Method: quicksort  Memory: 25kB

              ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)

                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))

                    Rows Removed by Index Recheck: 4

                    ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)

                          ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)

                                Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

                          ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)

                                Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text")

Total runtime: 3720.653 ms

As you can see, the index did not work.

Table public.ja_jobs:

CREATE TABLE public.ja_jobs (

  id bigint NOT NULL DEFAULT "nextval"('"ja_jobs_id_seq"'::"regclass"),

  refnum character varying(100) NOT NULL DEFAULT ''::character varying,

  clientid bigint NOT NULL DEFAULT 0,

  customerid bigint,

  time_job bigint,

  priority smallint NOT NULL DEFAULT 0,

  status character varying(255) NOT NULL DEFAULT 'active'::"bpchar",

  title character varying(100) NOT NULL DEFAULT ''::character varying,



  -- some other irrelevant columns

)

Indexes on public.ja_jobs:

Indexes:

    "ja_jobs_pkey" PRIMARY KEY, "btree" ("id")

    "ix_bill_customer_jobs" "btree" ("customerid", "bill_customer")

    "ix_clientid_jobs" "btree" ("clientid")

    "ix_customerid_job" "btree" ("customerid")

    "ix_ja_jobs_clientid_modified_date_created_date" "btree" ("clientid", "modified_date", "created_date")

    "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text"))

    "ix_ja_jobs_trgm_gin" "gin" ("title" "gin_trgm_ops")

    "ix_job_customer_recent_jobs_lookaside_bill_customer" "btree" ("bill_customer", "modified_date")

    "ix_job_customer_recent_jobs_lookaside_clientid" "btree" ("clientid", "modified_date")

    "ix_job_customer_recent_jobs_lookaside_customer" "btree" ("customerid", "modified_date")

    "ix_jobs_charges_and_parts_sort" "btree" (("charges_count" + "parts_count"))

    "ix_jobs_client_times" "btree" ("clientid", "time_job", "time_arrival")

    "ix_jobs_fts_description_en" "gin" ("full_text_universal_cast"("description"))

    "ix_jobs_fts_full_address_en" "gin" ((((("full_text_universal_cast"("address"::"text") || "full_text_universal_cast"("suburb"::"text")) || "full_text_universal_cast"("city"::"text")) || "full_text_universal_cast"("stpr"::"text")) || "full_text_universal_cast"("postc

ode"::"text")))

    "ix_jobs_fts_job_number_en" "gin" ("full_text_universal_cast"("job_number"::"text"))

    "ix_jobs_fts_refnum_en" "gin" ("full_text_universal_cast"("refnum"::"text"))

    "ix_jobs_fts_title_en" "gin" ("full_text_universal_cast"("title"::"text"))

    "ix_jobs_full_address_street_first" "btree" (((((COALESCE("address"::character varying, ''::character varying)::"text" || COALESCE(' '::"text" || "suburb"::"text", ''::"text")) || COALESCE(' '::"text" || "city"::"text", ''::"text")) || COALESCE(' '::"text" || "postc

ode"::"text", ''::"text")) || COALESCE(' '::"text" || "stpr"::"text", ''::"text")))

    "ix_jobs_paying_customers" "btree" ((COALESCE("bill_customer", "customerid")))

    "ix_jobs_status_label_ids" "btree" ("status_label_id")

    "ix_jobs_top_by_client" "btree" ("id", "clientid")

    "ix_mobiuser_jobs" "btree" ("accepted_mobile_user")

    "ix_recurrenceid_jobs" "btree" ("recurrenceid")

    "ix_timejob_jobs" "btree" ("time_job")

    "ja_jobs_client_type" "btree" ("clientid", "jobtype")

    "ja_jobs_the_geom_idx" "gist" ("the_geom")

Question:

What can I do to improve the query? Why is the trigram index not working as expected?

UPDATE: Re-ran the explain analyze buffer

Limit  (cost=199669.37..199669.39 rows=10 width=20) (actual time=31523.690..31523.691 rows=1 loops=1)

  Buffers: shared hit=26947 read=101574 dirtied=438

  ->  Sort  (cost=199669.37..199669.40 rows=11 width=20) (actual time=31523.686..31523.686 rows=1 loops=1)

        Sort Key: "title"

        Sort Method: quicksort  Memory: 25kB

        Buffers: shared hit=26947 read=101574 dirtied=438

        ->  Bitmap Heap Scan on "ja_jobs"  (cost=4850.60..199669.18 rows=11 width=20) (actual time=11714.504..31523.640 rows=1 loops=1)

              Recheck Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))

              Filter: (("title")::"text" ~~* '%Hislop%'::"text")

              Rows Removed by Filter: 207654

              Buffers: shared hit=26942 read=101574 dirtied=438

              ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..4850.60 rows=155054 width=0) (actual time=11670.956..11670.956 rows=215142 loops=1)

                    Index Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))

                    Buffers: shared hit=121 read=5772

Total runtime: 31524.874 ms

After removing DISTINCT and the left %:

explain (analyze, buffers)

SELECT title  

FROM ja_jobs

WHERE title ILIKE 'Hislop 13035%'

AND clientid = 2565

AND time_job > 1382496599

ORDER BY title

LIMIT 10;





Limit  (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.479..3492.483 rows=1 loops=1)

  Buffers: shared hit=4940 read=448

  I/O Timings: read=83.285

  ->  Sort  (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.475..3492.477 rows=1 loops=1)

        Sort Key: "title"

        Sort Method: quicksort  Memory: 25kB

        Buffers: shared hit=4940 read=448

        I/O Timings: read=83.285

        ->  Bitmap Heap Scan on "ja_jobs"  (cost=391.62..2275.38 rows=9 width=20) (actual time=3492.460..3492.462 rows=1 loops=1)

              Recheck Cond: (("title")::"text" ~~* 'Hislop Street Clinic 2513035%'::"text")

              Filter: (("time_job" > 1382496599) AND ("clientid" = 2565))

              Buffers: shared hit=4940 read=448

              I/O Timings: read=83.285

              ->  Bitmap Index Scan on "ix_jobs_trgm_gin"  (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)

                    Index Cond: (("title")::"text" ~~* 'Hislop 13035%'::"text")

                    Buffers: shared hit=4939 read=448

                    I/O Timings: read=83.285

Total runtime: 3492.531 ms

As you can see, the query is hitting the new index but it's slower.

Then I removed ORDER BY but the query is still slow.

Also, I tried to use LIKE (with is much faster), but LIKE is case-sensitive, so for that reason I got no rows back. Can't use it.

edited 12 mins ago

Erwin Brandstetter

91k9170283

asked May 12 '16 at 2:07

user83914

Not enough information. Consider instructions in the tag info for [postgresql-performance]. Whatever else you do, consider upgrading to the current version Postgres 9.5. Multiple improvements have been added that might affect your query performance. And DISTINCT is a performance killer for LIMIT queries. I doubt you need DISTINCT at all.

– Erwin Brandstetter
May 12 '16 at 2:19

1

Sorry @ErwinBrandstetter - Question updated. We'll be migrating to Postgres 9.5.. but it'll take a while

– user83914
May 12 '16 at 2:30

add a comment |

I've got a simple but slow query:

SELECT DISTINCT title  

FROM ja_jobs

WHERE title ILIKE '%RYAN WER%'

AND clientid = 31239

AND time_job > 1457826264

ORDER BY title

LIMIT 10;

Explain analyze:

Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)

  ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)

        ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)

              Sort Key: "title"

              Sort Method: quicksort  Memory: 25kB

              ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)

                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

                    Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")

                    Rows Removed by Filter: 791

                    ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)

                          Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

Total runtime: 2746.879 ms

Then, I created a trigram index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);

Explain analyze after adding the index: (Yes, I analyze)

Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)

  ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)

        ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)

              Sort Key: "title"

              Sort Method: quicksort  Memory: 25kB

              ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)

                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))

                    Rows Removed by Index Recheck: 4

                    ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)

                          ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)

                                Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

                          ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)

                                Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text")

Total runtime: 3720.653 ms

As you can see, the index did not work.

Table public.ja_jobs:

CREATE TABLE public.ja_jobs (

  id bigint NOT NULL DEFAULT "nextval"('"ja_jobs_id_seq"'::"regclass"),

  refnum character varying(100) NOT NULL DEFAULT ''::character varying,

  clientid bigint NOT NULL DEFAULT 0,

  customerid bigint,

  time_job bigint,

  priority smallint NOT NULL DEFAULT 0,

  status character varying(255) NOT NULL DEFAULT 'active'::"bpchar",

  title character varying(100) NOT NULL DEFAULT ''::character varying,



  -- some other irrelevant columns

)

Indexes on public.ja_jobs:

Indexes:

    "ja_jobs_pkey" PRIMARY KEY, "btree" ("id")

    "ix_bill_customer_jobs" "btree" ("customerid", "bill_customer")

    "ix_clientid_jobs" "btree" ("clientid")

    "ix_customerid_job" "btree" ("customerid")

    "ix_ja_jobs_clientid_modified_date_created_date" "btree" ("clientid", "modified_date", "created_date")

    "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text"))

    "ix_ja_jobs_trgm_gin" "gin" ("title" "gin_trgm_ops")

    "ix_job_customer_recent_jobs_lookaside_bill_customer" "btree" ("bill_customer", "modified_date")

    "ix_job_customer_recent_jobs_lookaside_clientid" "btree" ("clientid", "modified_date")

    "ix_job_customer_recent_jobs_lookaside_customer" "btree" ("customerid", "modified_date")

    "ix_jobs_charges_and_parts_sort" "btree" (("charges_count" + "parts_count"))

    "ix_jobs_client_times" "btree" ("clientid", "time_job", "time_arrival")

    "ix_jobs_fts_description_en" "gin" ("full_text_universal_cast"("description"))

    "ix_jobs_fts_full_address_en" "gin" ((((("full_text_universal_cast"("address"::"text") || "full_text_universal_cast"("suburb"::"text")) || "full_text_universal_cast"("city"::"text")) || "full_text_universal_cast"("stpr"::"text")) || "full_text_universal_cast"("postc

ode"::"text")))

    "ix_jobs_fts_job_number_en" "gin" ("full_text_universal_cast"("job_number"::"text"))

    "ix_jobs_fts_refnum_en" "gin" ("full_text_universal_cast"("refnum"::"text"))

    "ix_jobs_fts_title_en" "gin" ("full_text_universal_cast"("title"::"text"))

    "ix_jobs_full_address_street_first" "btree" (((((COALESCE("address"::character varying, ''::character varying)::"text" || COALESCE(' '::"text" || "suburb"::"text", ''::"text")) || COALESCE(' '::"text" || "city"::"text", ''::"text")) || COALESCE(' '::"text" || "postc

ode"::"text", ''::"text")) || COALESCE(' '::"text" || "stpr"::"text", ''::"text")))

    "ix_jobs_paying_customers" "btree" ((COALESCE("bill_customer", "customerid")))

    "ix_jobs_status_label_ids" "btree" ("status_label_id")

    "ix_jobs_top_by_client" "btree" ("id", "clientid")

    "ix_mobiuser_jobs" "btree" ("accepted_mobile_user")

    "ix_recurrenceid_jobs" "btree" ("recurrenceid")

    "ix_timejob_jobs" "btree" ("time_job")

    "ja_jobs_client_type" "btree" ("clientid", "jobtype")

    "ja_jobs_the_geom_idx" "gist" ("the_geom")

Question:

What can I do to improve the query? Why is the trigram index not working as expected?

UPDATE: Re-ran the explain analyze buffer

Limit  (cost=199669.37..199669.39 rows=10 width=20) (actual time=31523.690..31523.691 rows=1 loops=1)

  Buffers: shared hit=26947 read=101574 dirtied=438

  ->  Sort  (cost=199669.37..199669.40 rows=11 width=20) (actual time=31523.686..31523.686 rows=1 loops=1)

        Sort Key: "title"

        Sort Method: quicksort  Memory: 25kB

        Buffers: shared hit=26947 read=101574 dirtied=438

        ->  Bitmap Heap Scan on "ja_jobs"  (cost=4850.60..199669.18 rows=11 width=20) (actual time=11714.504..31523.640 rows=1 loops=1)

              Recheck Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))

              Filter: (("title")::"text" ~~* '%Hislop%'::"text")

              Rows Removed by Filter: 207654

              Buffers: shared hit=26942 read=101574 dirtied=438

              ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..4850.60 rows=155054 width=0) (actual time=11670.956..11670.956 rows=215142 loops=1)

                    Index Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))

                    Buffers: shared hit=121 read=5772

Total runtime: 31524.874 ms

After removing DISTINCT and the left %:

explain (analyze, buffers)

SELECT title  

FROM ja_jobs

WHERE title ILIKE 'Hislop 13035%'

AND clientid = 2565

AND time_job > 1382496599

ORDER BY title

LIMIT 10;





Limit  (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.479..3492.483 rows=1 loops=1)

  Buffers: shared hit=4940 read=448

  I/O Timings: read=83.285

  ->  Sort  (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.475..3492.477 rows=1 loops=1)

        Sort Key: "title"

        Sort Method: quicksort  Memory: 25kB

        Buffers: shared hit=4940 read=448

        I/O Timings: read=83.285

        ->  Bitmap Heap Scan on "ja_jobs"  (cost=391.62..2275.38 rows=9 width=20) (actual time=3492.460..3492.462 rows=1 loops=1)

              Recheck Cond: (("title")::"text" ~~* 'Hislop Street Clinic 2513035%'::"text")

              Filter: (("time_job" > 1382496599) AND ("clientid" = 2565))

              Buffers: shared hit=4940 read=448

              I/O Timings: read=83.285

              ->  Bitmap Index Scan on "ix_jobs_trgm_gin"  (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)

                    Index Cond: (("title")::"text" ~~* 'Hislop 13035%'::"text")

                    Buffers: shared hit=4939 read=448

                    I/O Timings: read=83.285

Total runtime: 3492.531 ms

As you can see, the query is hitting the new index but it's slower.

Then I removed ORDER BY but the query is still slow.

Also, I tried to use LIKE (with is much faster), but LIKE is case-sensitive, so for that reason I got no rows back. Can't use it.

edited 12 mins ago

Erwin Brandstetter

91k9170283

asked May 12 '16 at 2:07

user83914

Not enough information. Consider instructions in the tag info for [postgresql-performance]. Whatever else you do, consider upgrading to the current version Postgres 9.5. Multiple improvements have been added that might affect your query performance. And DISTINCT is a performance killer for LIMIT queries. I doubt you need DISTINCT at all.

– Erwin Brandstetter
May 12 '16 at 2:19

1

Sorry @ErwinBrandstetter - Question updated. We'll be migrating to Postgres 9.5.. but it'll take a while

– user83914
May 12 '16 at 2:30

add a comment |

I've got a simple but slow query:

SELECT DISTINCT title  

FROM ja_jobs

WHERE title ILIKE '%RYAN WER%'

AND clientid = 31239

AND time_job > 1457826264

ORDER BY title

LIMIT 10;

Explain analyze:

Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)

  ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)

        ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)

              Sort Key: "title"

              Sort Method: quicksort  Memory: 25kB

              ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)

                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

                    Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")

                    Rows Removed by Filter: 791

                    ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)

                          Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

Total runtime: 2746.879 ms

Then, I created a trigram index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);

Explain analyze after adding the index: (Yes, I analyze)

Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)

  ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)

        ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)

              Sort Key: "title"

              Sort Method: quicksort  Memory: 25kB

              ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)

                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))

                    Rows Removed by Index Recheck: 4

                    ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)

                          ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)

                                Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

                          ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)

                                Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text")

Total runtime: 3720.653 ms

As you can see, the index did not work.

Table public.ja_jobs:

CREATE TABLE public.ja_jobs (

  id bigint NOT NULL DEFAULT "nextval"('"ja_jobs_id_seq"'::"regclass"),

  refnum character varying(100) NOT NULL DEFAULT ''::character varying,

  clientid bigint NOT NULL DEFAULT 0,

  customerid bigint,

  time_job bigint,

  priority smallint NOT NULL DEFAULT 0,

  status character varying(255) NOT NULL DEFAULT 'active'::"bpchar",

  title character varying(100) NOT NULL DEFAULT ''::character varying,



  -- some other irrelevant columns

)

Indexes on public.ja_jobs:

Indexes:

    "ja_jobs_pkey" PRIMARY KEY, "btree" ("id")

    "ix_bill_customer_jobs" "btree" ("customerid", "bill_customer")

    "ix_clientid_jobs" "btree" ("clientid")

    "ix_customerid_job" "btree" ("customerid")

    "ix_ja_jobs_clientid_modified_date_created_date" "btree" ("clientid", "modified_date", "created_date")

    "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text"))

    "ix_ja_jobs_trgm_gin" "gin" ("title" "gin_trgm_ops")

    "ix_job_customer_recent_jobs_lookaside_bill_customer" "btree" ("bill_customer", "modified_date")

    "ix_job_customer_recent_jobs_lookaside_clientid" "btree" ("clientid", "modified_date")

    "ix_job_customer_recent_jobs_lookaside_customer" "btree" ("customerid", "modified_date")

    "ix_jobs_charges_and_parts_sort" "btree" (("charges_count" + "parts_count"))

    "ix_jobs_client_times" "btree" ("clientid", "time_job", "time_arrival")

    "ix_jobs_fts_description_en" "gin" ("full_text_universal_cast"("description"))

    "ix_jobs_fts_full_address_en" "gin" ((((("full_text_universal_cast"("address"::"text") || "full_text_universal_cast"("suburb"::"text")) || "full_text_universal_cast"("city"::"text")) || "full_text_universal_cast"("stpr"::"text")) || "full_text_universal_cast"("postc

ode"::"text")))

    "ix_jobs_fts_job_number_en" "gin" ("full_text_universal_cast"("job_number"::"text"))

    "ix_jobs_fts_refnum_en" "gin" ("full_text_universal_cast"("refnum"::"text"))

    "ix_jobs_fts_title_en" "gin" ("full_text_universal_cast"("title"::"text"))

    "ix_jobs_full_address_street_first" "btree" (((((COALESCE("address"::character varying, ''::character varying)::"text" || COALESCE(' '::"text" || "suburb"::"text", ''::"text")) || COALESCE(' '::"text" || "city"::"text", ''::"text")) || COALESCE(' '::"text" || "postc

ode"::"text", ''::"text")) || COALESCE(' '::"text" || "stpr"::"text", ''::"text")))

    "ix_jobs_paying_customers" "btree" ((COALESCE("bill_customer", "customerid")))

    "ix_jobs_status_label_ids" "btree" ("status_label_id")

    "ix_jobs_top_by_client" "btree" ("id", "clientid")

    "ix_mobiuser_jobs" "btree" ("accepted_mobile_user")

    "ix_recurrenceid_jobs" "btree" ("recurrenceid")

    "ix_timejob_jobs" "btree" ("time_job")

    "ja_jobs_client_type" "btree" ("clientid", "jobtype")

    "ja_jobs_the_geom_idx" "gist" ("the_geom")

Question:

What can I do to improve the query? Why is the trigram index not working as expected?

UPDATE: Re-ran the explain analyze buffer

Limit  (cost=199669.37..199669.39 rows=10 width=20) (actual time=31523.690..31523.691 rows=1 loops=1)

  Buffers: shared hit=26947 read=101574 dirtied=438

  ->  Sort  (cost=199669.37..199669.40 rows=11 width=20) (actual time=31523.686..31523.686 rows=1 loops=1)

        Sort Key: "title"

        Sort Method: quicksort  Memory: 25kB

        Buffers: shared hit=26947 read=101574 dirtied=438

        ->  Bitmap Heap Scan on "ja_jobs"  (cost=4850.60..199669.18 rows=11 width=20) (actual time=11714.504..31523.640 rows=1 loops=1)

              Recheck Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))

              Filter: (("title")::"text" ~~* '%Hislop%'::"text")

              Rows Removed by Filter: 207654

              Buffers: shared hit=26942 read=101574 dirtied=438

              ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..4850.60 rows=155054 width=0) (actual time=11670.956..11670.956 rows=215142 loops=1)

                    Index Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))

                    Buffers: shared hit=121 read=5772

Total runtime: 31524.874 ms

After removing DISTINCT and the left %:

explain (analyze, buffers)

SELECT title  

FROM ja_jobs

WHERE title ILIKE 'Hislop 13035%'

AND clientid = 2565

AND time_job > 1382496599

ORDER BY title

LIMIT 10;





Limit  (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.479..3492.483 rows=1 loops=1)

  Buffers: shared hit=4940 read=448

  I/O Timings: read=83.285

  ->  Sort  (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.475..3492.477 rows=1 loops=1)

        Sort Key: "title"

        Sort Method: quicksort  Memory: 25kB

        Buffers: shared hit=4940 read=448

        I/O Timings: read=83.285

        ->  Bitmap Heap Scan on "ja_jobs"  (cost=391.62..2275.38 rows=9 width=20) (actual time=3492.460..3492.462 rows=1 loops=1)

              Recheck Cond: (("title")::"text" ~~* 'Hislop Street Clinic 2513035%'::"text")

              Filter: (("time_job" > 1382496599) AND ("clientid" = 2565))

              Buffers: shared hit=4940 read=448

              I/O Timings: read=83.285

              ->  Bitmap Index Scan on "ix_jobs_trgm_gin"  (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)

                    Index Cond: (("title")::"text" ~~* 'Hislop 13035%'::"text")

                    Buffers: shared hit=4939 read=448

                    I/O Timings: read=83.285

Total runtime: 3492.531 ms

As you can see, the query is hitting the new index but it's slower.

Then I removed ORDER BY but the query is still slow.

Also, I tried to use LIKE (with is much faster), but LIKE is case-sensitive, so for that reason I got no rows back. Can't use it.

edited 12 mins ago

Erwin Brandstetter

91k9170283

asked May 12 '16 at 2:07

user83914

I've got a simple but slow query:

SELECT DISTINCT title  

FROM ja_jobs

WHERE title ILIKE '%RYAN WER%'

AND clientid = 31239

AND time_job > 1457826264

ORDER BY title

LIMIT 10;

Explain analyze:

Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)

  ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)

        ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)

              Sort Key: "title"

              Sort Method: quicksort  Memory: 25kB

              ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)

                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

                    Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")

                    Rows Removed by Filter: 791

                    ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)

                          Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

Total runtime: 2746.879 ms

Then, I created a trigram index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);

Explain analyze after adding the index: (Yes, I analyze)

Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)

  ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)

        ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)

              Sort Key: "title"

              Sort Method: quicksort  Memory: 25kB

              ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)

                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))

                    Rows Removed by Index Recheck: 4

                    ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)

                          ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)

                                Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))

                          ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)

                                Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text")

Total runtime: 3720.653 ms

As you can see, the index did not work.

Table public.ja_jobs:

CREATE TABLE public.ja_jobs (

  id bigint NOT NULL DEFAULT "nextval"('"ja_jobs_id_seq"'::"regclass"),

  refnum character varying(100) NOT NULL DEFAULT ''::character varying,

  clientid bigint NOT NULL DEFAULT 0,

  customerid bigint,

  time_job bigint,

  priority smallint NOT NULL DEFAULT 0,

  status character varying(255) NOT NULL DEFAULT 'active'::"bpchar",

  title character varying(100) NOT NULL DEFAULT ''::character varying,



  -- some other irrelevant columns

)

Indexes on public.ja_jobs:

Indexes:

    "ja_jobs_pkey" PRIMARY KEY, "btree" ("id")

    "ix_bill_customer_jobs" "btree" ("customerid", "bill_customer")

    "ix_clientid_jobs" "btree" ("clientid")

    "ix_customerid_job" "btree" ("customerid")

    "ix_ja_jobs_clientid_modified_date_created_date" "btree" ("clientid", "modified_date", "created_date")

    "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text"))

    "ix_ja_jobs_trgm_gin" "gin" ("title" "gin_trgm_ops")

    "ix_job_customer_recent_jobs_lookaside_bill_customer" "btree" ("bill_customer", "modified_date")

    "ix_job_customer_recent_jobs_lookaside_clientid" "btree" ("clientid", "modified_date")

    "ix_job_customer_recent_jobs_lookaside_customer" "btree" ("customerid", "modified_date")

    "ix_jobs_charges_and_parts_sort" "btree" (("charges_count" + "parts_count"))

    "ix_jobs_client_times" "btree" ("clientid", "time_job", "time_arrival")

    "ix_jobs_fts_description_en" "gin" ("full_text_universal_cast"("description"))

    "ix_jobs_fts_full_address_en" "gin" ((((("full_text_universal_cast"("address"::"text") || "full_text_universal_cast"("suburb"::"text")) || "full_text_universal_cast"("city"::"text")) || "full_text_universal_cast"("stpr"::"text")) || "full_text_universal_cast"("postc

ode"::"text")))

    "ix_jobs_fts_job_number_en" "gin" ("full_text_universal_cast"("job_number"::"text"))

    "ix_jobs_fts_refnum_en" "gin" ("full_text_universal_cast"("refnum"::"text"))

    "ix_jobs_fts_title_en" "gin" ("full_text_universal_cast"("title"::"text"))

    "ix_jobs_full_address_street_first" "btree" (((((COALESCE("address"::character varying, ''::character varying)::"text" || COALESCE(' '::"text" || "suburb"::"text", ''::"text")) || COALESCE(' '::"text" || "city"::"text", ''::"text")) || COALESCE(' '::"text" || "postc

ode"::"text", ''::"text")) || COALESCE(' '::"text" || "stpr"::"text", ''::"text")))

    "ix_jobs_paying_customers" "btree" ((COALESCE("bill_customer", "customerid")))

    "ix_jobs_status_label_ids" "btree" ("status_label_id")

    "ix_jobs_top_by_client" "btree" ("id", "clientid")

    "ix_mobiuser_jobs" "btree" ("accepted_mobile_user")

    "ix_recurrenceid_jobs" "btree" ("recurrenceid")

    "ix_timejob_jobs" "btree" ("time_job")

    "ja_jobs_client_type" "btree" ("clientid", "jobtype")

    "ja_jobs_the_geom_idx" "gist" ("the_geom")

Question:

What can I do to improve the query? Why is the trigram index not working as expected?

UPDATE: Re-ran the explain analyze buffer

Limit  (cost=199669.37..199669.39 rows=10 width=20) (actual time=31523.690..31523.691 rows=1 loops=1)

  Buffers: shared hit=26947 read=101574 dirtied=438

  ->  Sort  (cost=199669.37..199669.40 rows=11 width=20) (actual time=31523.686..31523.686 rows=1 loops=1)

        Sort Key: "title"

        Sort Method: quicksort  Memory: 25kB

        Buffers: shared hit=26947 read=101574 dirtied=438

        ->  Bitmap Heap Scan on "ja_jobs"  (cost=4850.60..199669.18 rows=11 width=20) (actual time=11714.504..31523.640 rows=1 loops=1)

              Recheck Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))

              Filter: (("title")::"text" ~~* '%Hislop%'::"text")

              Rows Removed by Filter: 207654

              Buffers: shared hit=26942 read=101574 dirtied=438

              ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..4850.60 rows=155054 width=0) (actual time=11670.956..11670.956 rows=215142 loops=1)

                    Index Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))

                    Buffers: shared hit=121 read=5772

Total runtime: 31524.874 ms

After removing DISTINCT and the left %:

explain (analyze, buffers)

SELECT title  

FROM ja_jobs

WHERE title ILIKE 'Hislop 13035%'

AND clientid = 2565

AND time_job > 1382496599

ORDER BY title

LIMIT 10;





Limit  (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.479..3492.483 rows=1 loops=1)

  Buffers: shared hit=4940 read=448

  I/O Timings: read=83.285

  ->  Sort  (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.475..3492.477 rows=1 loops=1)

        Sort Key: "title"

        Sort Method: quicksort  Memory: 25kB

        Buffers: shared hit=4940 read=448

        I/O Timings: read=83.285

        ->  Bitmap Heap Scan on "ja_jobs"  (cost=391.62..2275.38 rows=9 width=20) (actual time=3492.460..3492.462 rows=1 loops=1)

              Recheck Cond: (("title")::"text" ~~* 'Hislop Street Clinic 2513035%'::"text")

              Filter: (("time_job" > 1382496599) AND ("clientid" = 2565))

              Buffers: shared hit=4940 read=448

              I/O Timings: read=83.285

              ->  Bitmap Index Scan on "ix_jobs_trgm_gin"  (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)

                    Index Cond: (("title")::"text" ~~* 'Hislop 13035%'::"text")

                    Buffers: shared hit=4939 read=448

                    I/O Timings: read=83.285

Total runtime: 3492.531 ms

As you can see, the query is hitting the new index but it's slower.

Then I removed ORDER BY but the query is still slow.

Also, I tried to use LIKE (with is much faster), but LIKE is case-sensitive, so for that reason I got no rows back. Can't use it.

postgresql index postgresql-9.2 postgresql-performance pattern-matching

edited 12 mins ago

Erwin Brandstetter

91k9170283

asked May 12 '16 at 2:07

user83914

edited 12 mins ago

Erwin Brandstetter

91k9170283

asked May 12 '16 at 2:07

user83914

edited 12 mins ago

Erwin Brandstetter

91k9170283

edited 12 mins ago

Erwin Brandstetter

91k9170283

edited 12 mins ago

Erwin Brandstetter

91k9170283

asked May 12 '16 at 2:07

user83914

asked May 12 '16 at 2:07

user83914

asked May 12 '16 at 2:07

user83914

Not enough information. Consider instructions in the tag info for [postgresql-performance]. Whatever else you do, consider upgrading to the current version Postgres 9.5. Multiple improvements have been added that might affect your query performance. And DISTINCT is a performance killer for LIMIT queries. I doubt you need DISTINCT at all.

– Erwin Brandstetter
May 12 '16 at 2:19

1

Sorry @ErwinBrandstetter - Question updated. We'll be migrating to Postgres 9.5.. but it'll take a while

– user83914
May 12 '16 at 2:30

add a comment |

Not enough information. Consider instructions in the tag info for [postgresql-performance]. Whatever else you do, consider upgrading to the current version Postgres 9.5. Multiple improvements have been added that might affect your query performance. And DISTINCT is a performance killer for LIMIT queries. I doubt you need DISTINCT at all.

– Erwin Brandstetter
May 12 '16 at 2:19

1

Sorry @ErwinBrandstetter - Question updated. We'll be migrating to Postgres 9.5.. but it'll take a while

– user83914
May 12 '16 at 2:30

Not enough information. Consider instructions in the tag info for [postgresql-performance]. Whatever else you do, consider upgrading to the current version Postgres 9.5. Multiple improvements have been added that might affect your query performance. And DISTINCT is a performance killer for LIMIT queries. I doubt you need DISTINCT at all.

– Erwin Brandstetter
May 12 '16 at 2:19

Sorry @ErwinBrandstetter - Question updated. We'll be migrating to Postgres 9.5.. but it'll take a while

– user83914
May 12 '16 at 2:30

add a comment |

2 Answers
2

active

oldest

votes

You have a lot of indexes. I doubt you need all of them. Check whether all of them are in use. Instructions in the manual, chapter Examining Index Usage.

If your system is configured to gather statistics, it will be particularly revealing to study:

SELECT * FROM pg_stat_user_indexes

These statistics are also displayed in pgAdmin.

Some indexes are particularly odd, like: "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text")) - why would anyone cast a bigint id to text for indexing?

Useless indexes are not going to impair read performance (much), but they are a burden on write performance and general maintenance.

The major difficulty for your query is to estimate the selectivity of your various predicates. That's comparatively simple for the bigint columns clientid and time_job, but hard for pattern matching (title ILIKE 'Hislop 13035%').

In your case (UPDATE 3), Postgres estimates to find 482 rows matching the pattern, but it turns out to be just a single row:

Bitmap Index Scan on "ix_jobs_trgm_gin" (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)

Tuning the query depends on the complete picture: Cardinalities, data distribution, hardware, load, concurrency, query frequencies, priorities ...

It may help to increase the statistics target for involved columns:

ALTER ja_jobs

   ALTER clientid SET STATISTICS 1000

 , ALTER time_job SET STATISTICS 1000

 , ALTER title    SET STATISTICS 1000;

Then: ANALYZE ja_jobs; But don't expect much. Details:

Check statistics targets in PostgreSQL

Estimating selectivity of free-floating LIKE patterns is hard. Left-anchored is easier - you are mixing these two rather different cases freely: ILIKE '%RYAN WER%' (with leading wildcard) is much more complex than ILIKE 'Hislop 13035%'. Overview:

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

There have been improvements for the latest version, but the biggest improvements come with Postgres 9.6 (currently beta) and its new version of the pg_trgm module. Consider the release notes here. Related:

Trigram search gets much slower as search string gets longer

There are various other ways to improve performance, depending on all the information that's not in your question. Maybe a multicolumn index, or a partial index or a CTE in your query.

All the general advice for performance tuning applies as well:

Configuring PostgreSQL for read performance

http://wiki.postgresql.org/wiki/Performance_Optimization

You already removed the expensive DISTINCT. If you don't need ORDER BY title, it may change the query plan completely to drop that, too: With both of these removed, Postgres is free to pick the first 10 matches and ignore the rest. Else, all matches have to be found and considered. can be much more expensive. Try:

SELECT title

FROM   ja_jobs

WHERE  title ILIKE 'Hislop 13035%'

AND    clientid = 2565

AND    time_job > 1382496599

LIMIT  10;  -- no ORDER BY

If you actually only deal with left-anchored LIKE pattern (trailing wildcards like 'Hislop 13035%', but not: '%RYAN WER%'), then you can use a very fast varchar_pattern_ops index. Detailed explanation:

Operator "~<~" uses varchar_pattern_ops index while normal ORDER BY clause doesn't?

So:

CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs

(clientid, title varchar_pattern_ops, time_job);

Index columns in this order. Equality first, range later. Explanation:

Multicolumn index and performance

You can extend this solution to cover ILIKE with a functional element

CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs

(clientid, lower(title) varchar_pattern_ops, time_job);

And adapt your query:

SELECT title

FROM   ja_jobs

WHERE  lower(title) LIKE lower('Hislop 13035%')

AND    clientid = 2565

AND    time_job > 1382496599

LIMIT  10;

Index on column with data type citext not used

edited May 23 '17 at 12:40

Community♦

answered May 14 '16 at 3:01

Erwin Brandstetter

91k9170283

Thanks for the answer. I've tried to use my query with GIN index + LIKE - It 's much faster, but I don't have the same data.

– user83914
May 14 '16 at 6:12

@JohnThomaz: Sorry, there's not enough information to understand.

– Erwin Brandstetter
May 14 '16 at 13:59

Could I use lower()?

– user83914
May 16 '16 at 22:37

@JohnThomaz: Sure you can use lower(). The question is: what for? Trigram indexes support LIKE and ILIKE. Also: CREATE INDEX ix_ja_jobs_trgm_gin, but the query plan in "UPDATE 3" shows: ix_jobs_trgm_gin. Same index?

– Erwin Brandstetter
May 17 '16 at 1:05

1

Yep.. that worked perfectly fine.. and fast.. thanks a lot!

– user83914
May 18 '16 at 20:36

|
show 2 more comments

The two analyses differ in the text comparison, the first shows ~~*, i.e. case insensitive (ILIKE) whilst the second shows only ~~, i.e. case sensitive (check if you didn't run LIKE by mistake the second time).

answered May 12 '16 at 9:02

Ziggy Crueltyfree Zeitgeister

4,1771819

not sure.. I've re-ran the query anyway. Please check my updated question and the UPDATE 2 field please.

– user83914
May 12 '16 at 9:10

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f138288%2ftrigram-index-for-ilike-patterns-not-working-as-expected%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You have a lot of indexes. I doubt you need all of them. Check whether all of them are in use. Instructions in the manual, chapter Examining Index Usage.

If your system is configured to gather statistics, it will be particularly revealing to study:

SELECT * FROM pg_stat_user_indexes

These statistics are also displayed in pgAdmin.

Some indexes are particularly odd, like: "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text")) - why would anyone cast a bigint id to text for indexing?

Useless indexes are not going to impair read performance (much), but they are a burden on write performance and general maintenance.

In your case (UPDATE 3), Postgres estimates to find 482 rows matching the pattern, but it turns out to be just a single row:

Bitmap Index Scan on "ix_jobs_trgm_gin" (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)

Tuning the query depends on the complete picture: Cardinalities, data distribution, hardware, load, concurrency, query frequencies, priorities ...

It may help to increase the statistics target for involved columns:

ALTER ja_jobs

   ALTER clientid SET STATISTICS 1000

 , ALTER time_job SET STATISTICS 1000

 , ALTER title    SET STATISTICS 1000;

Then: ANALYZE ja_jobs; But don't expect much. Details:

Check statistics targets in PostgreSQL

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

Trigram search gets much slower as search string gets longer

There are various other ways to improve performance, depending on all the information that's not in your question. Maybe a multicolumn index, or a partial index or a CTE in your query.

All the general advice for performance tuning applies as well:

Configuring PostgreSQL for read performance

http://wiki.postgresql.org/wiki/Performance_Optimization

SELECT title

FROM   ja_jobs

WHERE  title ILIKE 'Hislop 13035%'

AND    clientid = 2565

AND    time_job > 1382496599

LIMIT  10;  -- no ORDER BY

Operator "~<~" uses varchar_pattern_ops index while normal ORDER BY clause doesn't?

So:

CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs

(clientid, title varchar_pattern_ops, time_job);

Index columns in this order. Equality first, range later. Explanation:

Multicolumn index and performance

You can extend this solution to cover ILIKE with a functional element

CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs

(clientid, lower(title) varchar_pattern_ops, time_job);

And adapt your query:

SELECT title

FROM   ja_jobs

WHERE  lower(title) LIKE lower('Hislop 13035%')

AND    clientid = 2565

AND    time_job > 1382496599

LIMIT  10;

Index on column with data type citext not used

edited May 23 '17 at 12:40

Community♦

answered May 14 '16 at 3:01

Erwin Brandstetter

91k9170283

Thanks for the answer. I've tried to use my query with GIN index + LIKE - It 's much faster, but I don't have the same data.

– user83914
May 14 '16 at 6:12

@JohnThomaz: Sorry, there's not enough information to understand.

– Erwin Brandstetter
May 14 '16 at 13:59

Could I use lower()?

– user83914
May 16 '16 at 22:37

@JohnThomaz: Sure you can use lower(). The question is: what for? Trigram indexes support LIKE and ILIKE. Also: CREATE INDEX ix_ja_jobs_trgm_gin, but the query plan in "UPDATE 3" shows: ix_jobs_trgm_gin. Same index?

– Erwin Brandstetter
May 17 '16 at 1:05

1

Yep.. that worked perfectly fine.. and fast.. thanks a lot!

– user83914
May 18 '16 at 20:36

|
show 2 more comments

You have a lot of indexes. I doubt you need all of them. Check whether all of them are in use. Instructions in the manual, chapter Examining Index Usage.

If your system is configured to gather statistics, it will be particularly revealing to study:

SELECT * FROM pg_stat_user_indexes

These statistics are also displayed in pgAdmin.

Some indexes are particularly odd, like: "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text")) - why would anyone cast a bigint id to text for indexing?

Useless indexes are not going to impair read performance (much), but they are a burden on write performance and general maintenance.

In your case (UPDATE 3), Postgres estimates to find 482 rows matching the pattern, but it turns out to be just a single row:

Bitmap Index Scan on "ix_jobs_trgm_gin" (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)

Tuning the query depends on the complete picture: Cardinalities, data distribution, hardware, load, concurrency, query frequencies, priorities ...

It may help to increase the statistics target for involved columns:

ALTER ja_jobs

   ALTER clientid SET STATISTICS 1000

 , ALTER time_job SET STATISTICS 1000

 , ALTER title    SET STATISTICS 1000;

Then: ANALYZE ja_jobs; But don't expect much. Details:

Check statistics targets in PostgreSQL

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

Trigram search gets much slower as search string gets longer

There are various other ways to improve performance, depending on all the information that's not in your question. Maybe a multicolumn index, or a partial index or a CTE in your query.

All the general advice for performance tuning applies as well:

Configuring PostgreSQL for read performance

http://wiki.postgresql.org/wiki/Performance_Optimization

SELECT title

FROM   ja_jobs

WHERE  title ILIKE 'Hislop 13035%'

AND    clientid = 2565

AND    time_job > 1382496599

LIMIT  10;  -- no ORDER BY

Operator "~<~" uses varchar_pattern_ops index while normal ORDER BY clause doesn't?

So:

CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs

(clientid, title varchar_pattern_ops, time_job);

Index columns in this order. Equality first, range later. Explanation:

Multicolumn index and performance

You can extend this solution to cover ILIKE with a functional element

CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs

(clientid, lower(title) varchar_pattern_ops, time_job);

And adapt your query:

SELECT title

FROM   ja_jobs

WHERE  lower(title) LIKE lower('Hislop 13035%')

AND    clientid = 2565

AND    time_job > 1382496599

LIMIT  10;

Index on column with data type citext not used

edited May 23 '17 at 12:40

Community♦

answered May 14 '16 at 3:01

Erwin Brandstetter

91k9170283

Thanks for the answer. I've tried to use my query with GIN index + LIKE - It 's much faster, but I don't have the same data.

– user83914
May 14 '16 at 6:12

@JohnThomaz: Sorry, there's not enough information to understand.

– Erwin Brandstetter
May 14 '16 at 13:59

Could I use lower()?

– user83914
May 16 '16 at 22:37

@JohnThomaz: Sure you can use lower(). The question is: what for? Trigram indexes support LIKE and ILIKE. Also: CREATE INDEX ix_ja_jobs_trgm_gin, but the query plan in "UPDATE 3" shows: ix_jobs_trgm_gin. Same index?

– Erwin Brandstetter
May 17 '16 at 1:05

1

Yep.. that worked perfectly fine.. and fast.. thanks a lot!

– user83914
May 18 '16 at 20:36

|
show 2 more comments

You have a lot of indexes. I doubt you need all of them. Check whether all of them are in use. Instructions in the manual, chapter Examining Index Usage.

If your system is configured to gather statistics, it will be particularly revealing to study:

SELECT * FROM pg_stat_user_indexes

These statistics are also displayed in pgAdmin.

Some indexes are particularly odd, like: "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text")) - why would anyone cast a bigint id to text for indexing?

Useless indexes are not going to impair read performance (much), but they are a burden on write performance and general maintenance.

In your case (UPDATE 3), Postgres estimates to find 482 rows matching the pattern, but it turns out to be just a single row:

Bitmap Index Scan on "ix_jobs_trgm_gin" (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)

Tuning the query depends on the complete picture: Cardinalities, data distribution, hardware, load, concurrency, query frequencies, priorities ...

It may help to increase the statistics target for involved columns:

ALTER ja_jobs

   ALTER clientid SET STATISTICS 1000

 , ALTER time_job SET STATISTICS 1000

 , ALTER title    SET STATISTICS 1000;

Then: ANALYZE ja_jobs; But don't expect much. Details:

Check statistics targets in PostgreSQL

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

Trigram search gets much slower as search string gets longer

There are various other ways to improve performance, depending on all the information that's not in your question. Maybe a multicolumn index, or a partial index or a CTE in your query.

All the general advice for performance tuning applies as well:

Configuring PostgreSQL for read performance

http://wiki.postgresql.org/wiki/Performance_Optimization

SELECT title

FROM   ja_jobs

WHERE  title ILIKE 'Hislop 13035%'

AND    clientid = 2565

AND    time_job > 1382496599

LIMIT  10;  -- no ORDER BY

Operator "~<~" uses varchar_pattern_ops index while normal ORDER BY clause doesn't?

So:

CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs

(clientid, title varchar_pattern_ops, time_job);

Index columns in this order. Equality first, range later. Explanation:

Multicolumn index and performance

You can extend this solution to cover ILIKE with a functional element

CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs

(clientid, lower(title) varchar_pattern_ops, time_job);

And adapt your query:

SELECT title

FROM   ja_jobs

WHERE  lower(title) LIKE lower('Hislop 13035%')

AND    clientid = 2565

AND    time_job > 1382496599

LIMIT  10;

Index on column with data type citext not used

edited May 23 '17 at 12:40

Community♦

answered May 14 '16 at 3:01

Erwin Brandstetter

91k9170283

You have a lot of indexes. I doubt you need all of them. Check whether all of them are in use. Instructions in the manual, chapter Examining Index Usage.

If your system is configured to gather statistics, it will be particularly revealing to study:

SELECT * FROM pg_stat_user_indexes

These statistics are also displayed in pgAdmin.

Some indexes are particularly odd, like: "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text")) - why would anyone cast a bigint id to text for indexing?

Useless indexes are not going to impair read performance (much), but they are a burden on write performance and general maintenance.

In your case (UPDATE 3), Postgres estimates to find 482 rows matching the pattern, but it turns out to be just a single row:

Bitmap Index Scan on "ix_jobs_trgm_gin" (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)

Tuning the query depends on the complete picture: Cardinalities, data distribution, hardware, load, concurrency, query frequencies, priorities ...

It may help to increase the statistics target for involved columns:

ALTER ja_jobs

   ALTER clientid SET STATISTICS 1000

 , ALTER time_job SET STATISTICS 1000

 , ALTER title    SET STATISTICS 1000;

Then: ANALYZE ja_jobs; But don't expect much. Details:

Check statistics targets in PostgreSQL

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

Trigram search gets much slower as search string gets longer

There are various other ways to improve performance, depending on all the information that's not in your question. Maybe a multicolumn index, or a partial index or a CTE in your query.

All the general advice for performance tuning applies as well:

Configuring PostgreSQL for read performance

http://wiki.postgresql.org/wiki/Performance_Optimization

SELECT title

FROM   ja_jobs

WHERE  title ILIKE 'Hislop 13035%'

AND    clientid = 2565

AND    time_job > 1382496599

LIMIT  10;  -- no ORDER BY

Operator "~<~" uses varchar_pattern_ops index while normal ORDER BY clause doesn't?

So:

CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs

(clientid, title varchar_pattern_ops, time_job);

Index columns in this order. Equality first, range later. Explanation:

Multicolumn index and performance

You can extend this solution to cover ILIKE with a functional element

CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs

(clientid, lower(title) varchar_pattern_ops, time_job);

And adapt your query:

SELECT title

FROM   ja_jobs

WHERE  lower(title) LIKE lower('Hislop 13035%')

AND    clientid = 2565

AND    time_job > 1382496599

LIMIT  10;

Index on column with data type citext not used

edited May 23 '17 at 12:40

Community♦

answered May 14 '16 at 3:01

Erwin Brandstetter

91k9170283

edited May 23 '17 at 12:40

Community♦

edited May 23 '17 at 12:40

Community♦

edited May 23 '17 at 12:40

Community♦

answered May 14 '16 at 3:01

Erwin Brandstetter

91k9170283

answered May 14 '16 at 3:01

Erwin Brandstetter

91k9170283

answered May 14 '16 at 3:01

Erwin Brandstetter

91k9170283

Thanks for the answer. I've tried to use my query with GIN index + LIKE - It 's much faster, but I don't have the same data.

– user83914
May 14 '16 at 6:12

@JohnThomaz: Sorry, there's not enough information to understand.

– Erwin Brandstetter
May 14 '16 at 13:59

Could I use lower()?

– user83914
May 16 '16 at 22:37

@JohnThomaz: Sure you can use lower(). The question is: what for? Trigram indexes support LIKE and ILIKE. Also: CREATE INDEX ix_ja_jobs_trgm_gin, but the query plan in "UPDATE 3" shows: ix_jobs_trgm_gin. Same index?

– Erwin Brandstetter
May 17 '16 at 1:05

1

Yep.. that worked perfectly fine.. and fast.. thanks a lot!

– user83914
May 18 '16 at 20:36

|
show 2 more comments

Thanks for the answer. I've tried to use my query with GIN index + LIKE - It 's much faster, but I don't have the same data.

– user83914
May 14 '16 at 6:12

@JohnThomaz: Sorry, there's not enough information to understand.

– Erwin Brandstetter
May 14 '16 at 13:59

Could I use lower()?

– user83914
May 16 '16 at 22:37

@JohnThomaz: Sure you can use lower(). The question is: what for? Trigram indexes support LIKE and ILIKE. Also: CREATE INDEX ix_ja_jobs_trgm_gin, but the query plan in "UPDATE 3" shows: ix_jobs_trgm_gin. Same index?

– Erwin Brandstetter
May 17 '16 at 1:05

1

Yep.. that worked perfectly fine.. and fast.. thanks a lot!

– user83914
May 18 '16 at 20:36

Thanks for the answer. I've tried to use my query with GIN index + LIKE - It 's much faster, but I don't have the same data.

– user83914
May 14 '16 at 6:12

@JohnThomaz: Sorry, there's not enough information to understand.

– Erwin Brandstetter
May 14 '16 at 13:59

Could I use lower()?

– user83914
May 16 '16 at 22:37

@JohnThomaz: Sure you can use lower(). The question is: what for? Trigram indexes support LIKE and ILIKE. Also: CREATE INDEX ix_ja_jobs_trgm_gin, but the query plan in "UPDATE 3" shows: ix_jobs_trgm_gin. Same index?

– Erwin Brandstetter
May 17 '16 at 1:05

Yep.. that worked perfectly fine.. and fast.. thanks a lot!

– user83914
May 18 '16 at 20:36

|
show 2 more comments

answered May 12 '16 at 9:02

Ziggy Crueltyfree Zeitgeister

4,1771819

not sure.. I've re-ran the query anyway. Please check my updated question and the UPDATE 2 field please.

– user83914
May 12 '16 at 9:10

add a comment |

answered May 12 '16 at 9:02

Ziggy Crueltyfree Zeitgeister

4,1771819

not sure.. I've re-ran the query anyway. Please check my updated question and the UPDATE 2 field please.

– user83914
May 12 '16 at 9:10

add a comment |

answered May 12 '16 at 9:02

Ziggy Crueltyfree Zeitgeister

4,1771819

answered May 12 '16 at 9:02

Ziggy Crueltyfree Zeitgeister

4,1771819

answered May 12 '16 at 9:02

Ziggy Crueltyfree Zeitgeister

4,1771819

answered May 12 '16 at 9:02

Ziggy Crueltyfree Zeitgeister

4,1771819

answered May 12 '16 at 9:02

Ziggy Crueltyfree Zeitgeister

4,1771819

not sure.. I've re-ran the query anyway. Please check my updated question and the UPDATE 2 field please.

– user83914
May 12 '16 at 9:10

add a comment |

not sure.. I've re-ran the query anyway. Please check my updated question and the UPDATE 2 field please.

– user83914
May 12 '16 at 9:10

not sure.. I've re-ran the query anyway. Please check my updated question and the UPDATE 2 field please.

– user83914
May 12 '16 at 9:10

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Vryjdfkk