One query with multiple JOINs vs multiple queries

I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.

I'm considering two approaches: single query with multiple joins or two separate but simpler queries.

At a high level, and simplifying the structure, my schema is:

CREATE EXTENSION "uuid-ossp";

CREATE EXTENSION IF NOT EXISTS postgis;





CREATE TABLE addresses (

    id bigint NOT NULL,

    latitude double precision,

    longitude double precision,

    line1 character varying NOT NULL,

    "position" geography(Point,4326),

    CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))

);



CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");



CREATE TABLE locations (

    id bigint NOT NULL,

    uuid uuid DEFAULT uuid_generate_v4() NOT NULL,

    address_id bigint NOT NULL

);



CREATE TABLE shops (

    id bigint NOT NULL,

    name character varying NOT NULL,

    location_id bigint NOT NULL

);



CREATE TABLE inventories (

    id bigint NOT NULL,

    shop_id bigint NOT NULL,

    status character varying NOT NULL

);

The addresses table holds the geographical data. The position column is calculated from the lat-lng columns when the rows are inserted or updated.

Each address is associated to one location.

Each address may have many shops, and each shop will have one inventory.

I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.

The tables have a few hundreds of thousands of rows.

With that in place, my main use case can be satisfied by this single query, which searches for addresses within 1000 meters from a central geographical point (10.0, 10.0) and returns data from all the tables:

SELECT

    s.id AS shop_id,

    s.name AS shop_name,

    i.status AS inventory_status,

    l.uuid AS location_uuid,

    a.line1 AS addr_line,

    a.latitude AS lat,

    a.longitude AS lng

FROM addresses a

JOIN locations l ON l.address_id = a.id

JOIN shops s ON s.location_id = l.id

JOIN inventories i ON i.shop_id = s.id

WHERE ST_DWithin(

    a.position,                             -- the position of each address

    ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle

    1000,                                   -- radius distance in meters

    true

);

This query works, and EXPLAIN ANALYZE shows that it does correctly use the GIST index.

However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:

--- only search for the addresses

SELECT

    a.id as addr_id,

    a.line1 AS addr_line,

    a.latitude AS lat,

    a.longitude AS lng

FROM addresses a

WHERE ST_DWithin(

    a.position,                             -- the position of each address

    ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle

    1000,                                   -- radius distance in meters

    true

);



--- get the rest of the data

SELECT

    s.id AS shop_id,

    s.name AS shop_name,

    i.status AS inventory_status,

    l.id AS location_id,

    l.uuid AS location_uuid

FROM locations l

JOIN shops s ON s.location_id = l.id

JOIN inventories i ON i.shop_id = s.id

WHERE

    l.address_id IN (1, 2, 3, 4, 5)  -- potentially thousands of values

;

where the values in l.address_id IN (1, 2, 3, 4, 5) come from the first query.

The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.

I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.

What about memory usage? Or resource contention on the tables? (e.g. locks)

asked 4 mins ago

tompave

101

New contributor

add a comment |

I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.

I'm considering two approaches: single query with multiple joins or two separate but simpler queries.

At a high level, and simplifying the structure, my schema is:

CREATE EXTENSION "uuid-ossp";

CREATE EXTENSION IF NOT EXISTS postgis;





CREATE TABLE addresses (

    id bigint NOT NULL,

    latitude double precision,

    longitude double precision,

    line1 character varying NOT NULL,

    "position" geography(Point,4326),

    CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))

);



CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");



CREATE TABLE locations (

    id bigint NOT NULL,

    uuid uuid DEFAULT uuid_generate_v4() NOT NULL,

    address_id bigint NOT NULL

);



CREATE TABLE shops (

    id bigint NOT NULL,

    name character varying NOT NULL,

    location_id bigint NOT NULL

);



CREATE TABLE inventories (

    id bigint NOT NULL,

    shop_id bigint NOT NULL,

    status character varying NOT NULL

);

The addresses table holds the geographical data. The position column is calculated from the lat-lng columns when the rows are inserted or updated.

Each address is associated to one location.

Each address may have many shops, and each shop will have one inventory.

I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.

The tables have a few hundreds of thousands of rows.

SELECT

    s.id AS shop_id,

    s.name AS shop_name,

    i.status AS inventory_status,

    l.uuid AS location_uuid,

    a.line1 AS addr_line,

    a.latitude AS lat,

    a.longitude AS lng

FROM addresses a

JOIN locations l ON l.address_id = a.id

JOIN shops s ON s.location_id = l.id

JOIN inventories i ON i.shop_id = s.id

WHERE ST_DWithin(

    a.position,                             -- the position of each address

    ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle

    1000,                                   -- radius distance in meters

    true

);

This query works, and EXPLAIN ANALYZE shows that it does correctly use the GIST index.

However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:

--- only search for the addresses

SELECT

    a.id as addr_id,

    a.line1 AS addr_line,

    a.latitude AS lat,

    a.longitude AS lng

FROM addresses a

WHERE ST_DWithin(

    a.position,                             -- the position of each address

    ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle

    1000,                                   -- radius distance in meters

    true

);



--- get the rest of the data

SELECT

    s.id AS shop_id,

    s.name AS shop_name,

    i.status AS inventory_status,

    l.id AS location_id,

    l.uuid AS location_uuid

FROM locations l

JOIN shops s ON s.location_id = l.id

JOIN inventories i ON i.shop_id = s.id

WHERE

    l.address_id IN (1, 2, 3, 4, 5)  -- potentially thousands of values

;

where the values in l.address_id IN (1, 2, 3, 4, 5) come from the first query.

The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.

I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.

What about memory usage? Or resource contention on the tables? (e.g. locks)

asked 4 mins ago

tompave

101

New contributor

add a comment |

I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.

I'm considering two approaches: single query with multiple joins or two separate but simpler queries.

At a high level, and simplifying the structure, my schema is:

CREATE EXTENSION "uuid-ossp";

CREATE EXTENSION IF NOT EXISTS postgis;





CREATE TABLE addresses (

    id bigint NOT NULL,

    latitude double precision,

    longitude double precision,

    line1 character varying NOT NULL,

    "position" geography(Point,4326),

    CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))

);



CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");



CREATE TABLE locations (

    id bigint NOT NULL,

    uuid uuid DEFAULT uuid_generate_v4() NOT NULL,

    address_id bigint NOT NULL

);



CREATE TABLE shops (

    id bigint NOT NULL,

    name character varying NOT NULL,

    location_id bigint NOT NULL

);



CREATE TABLE inventories (

    id bigint NOT NULL,

    shop_id bigint NOT NULL,

    status character varying NOT NULL

);

The addresses table holds the geographical data. The position column is calculated from the lat-lng columns when the rows are inserted or updated.

Each address is associated to one location.

Each address may have many shops, and each shop will have one inventory.

I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.

The tables have a few hundreds of thousands of rows.

SELECT

    s.id AS shop_id,

    s.name AS shop_name,

    i.status AS inventory_status,

    l.uuid AS location_uuid,

    a.line1 AS addr_line,

    a.latitude AS lat,

    a.longitude AS lng

FROM addresses a

JOIN locations l ON l.address_id = a.id

JOIN shops s ON s.location_id = l.id

JOIN inventories i ON i.shop_id = s.id

WHERE ST_DWithin(

    a.position,                             -- the position of each address

    ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle

    1000,                                   -- radius distance in meters

    true

);

This query works, and EXPLAIN ANALYZE shows that it does correctly use the GIST index.

However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:

--- only search for the addresses

SELECT

    a.id as addr_id,

    a.line1 AS addr_line,

    a.latitude AS lat,

    a.longitude AS lng

FROM addresses a

WHERE ST_DWithin(

    a.position,                             -- the position of each address

    ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle

    1000,                                   -- radius distance in meters

    true

);



--- get the rest of the data

SELECT

    s.id AS shop_id,

    s.name AS shop_name,

    i.status AS inventory_status,

    l.id AS location_id,

    l.uuid AS location_uuid

FROM locations l

JOIN shops s ON s.location_id = l.id

JOIN inventories i ON i.shop_id = s.id

WHERE

    l.address_id IN (1, 2, 3, 4, 5)  -- potentially thousands of values

;

where the values in l.address_id IN (1, 2, 3, 4, 5) come from the first query.

The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.

I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.

What about memory usage? Or resource contention on the tables? (e.g. locks)

asked 4 mins ago

tompave

101

New contributor

I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.

I'm considering two approaches: single query with multiple joins or two separate but simpler queries.

At a high level, and simplifying the structure, my schema is:

CREATE EXTENSION "uuid-ossp";

CREATE EXTENSION IF NOT EXISTS postgis;





CREATE TABLE addresses (

    id bigint NOT NULL,

    latitude double precision,

    longitude double precision,

    line1 character varying NOT NULL,

    "position" geography(Point,4326),

    CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))

);



CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");



CREATE TABLE locations (

    id bigint NOT NULL,

    uuid uuid DEFAULT uuid_generate_v4() NOT NULL,

    address_id bigint NOT NULL

);



CREATE TABLE shops (

    id bigint NOT NULL,

    name character varying NOT NULL,

    location_id bigint NOT NULL

);



CREATE TABLE inventories (

    id bigint NOT NULL,

    shop_id bigint NOT NULL,

    status character varying NOT NULL

);

The addresses table holds the geographical data. The position column is calculated from the lat-lng columns when the rows are inserted or updated.

Each address is associated to one location.

Each address may have many shops, and each shop will have one inventory.

I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.

The tables have a few hundreds of thousands of rows.

SELECT

    s.id AS shop_id,

    s.name AS shop_name,

    i.status AS inventory_status,

    l.uuid AS location_uuid,

    a.line1 AS addr_line,

    a.latitude AS lat,

    a.longitude AS lng

FROM addresses a

JOIN locations l ON l.address_id = a.id

JOIN shops s ON s.location_id = l.id

JOIN inventories i ON i.shop_id = s.id

WHERE ST_DWithin(

    a.position,                             -- the position of each address

    ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle

    1000,                                   -- radius distance in meters

    true

);

This query works, and EXPLAIN ANALYZE shows that it does correctly use the GIST index.

However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:

--- only search for the addresses

SELECT

    a.id as addr_id,

    a.line1 AS addr_line,

    a.latitude AS lat,

    a.longitude AS lng

FROM addresses a

WHERE ST_DWithin(

    a.position,                             -- the position of each address

    ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle

    1000,                                   -- radius distance in meters

    true

);



--- get the rest of the data

SELECT

    s.id AS shop_id,

    s.name AS shop_name,

    i.status AS inventory_status,

    l.id AS location_id,

    l.uuid AS location_uuid

FROM locations l

JOIN shops s ON s.location_id = l.id

JOIN inventories i ON i.shop_id = s.id

WHERE

    l.address_id IN (1, 2, 3, 4, 5)  -- potentially thousands of values

;

where the values in l.address_id IN (1, 2, 3, 4, 5) come from the first query.

The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.

I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.

What about memory usage? Or resource contention on the tables? (e.g. locks)

postgresql join postgis

asked 4 mins ago

tompave

101

New contributor

asked 4 mins ago

tompave

101

New contributor

asked 4 mins ago

tompave

101

New contributor

asked 4 mins ago

tompave

101

asked 4 mins ago

tompave

101

New contributor

tompave is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

add a comment |

0

active

oldest

votes

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

tompave is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f232347%2fone-query-with-multiple-joins-vs-multiple-queries%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

0

active

oldest

votes

0

active

oldest

votes

tompave is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

tompave is a new contributor. Be nice, and check out our Code of Conduct.

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Vryjdfkk