PostgreSQL: how to partition 1000 tenants





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}







0















Setup



We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.



Problem



1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?



More details



Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.



Here are a few facts about the event table:




  • It contains about 300M rows + a few of composite/partial indexes.

  • The count of events by account_id is very uneven, 5% accounts have 50% of the data.

  • There is a timestamp field and a few others (JSONB, author_id, etc..)

  • Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.

  • Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.


Possible solutions



Partionining by account_id:



Pros:




  • Deletes will be fast because of DROP TABLE.

  • Queries should also be pretty decent since all queries contains WHERE account_id=123


Cons:




  • 1000 partitions too much for Postgres?

  • Uneven distribution of events per account creating a few super big partitions and a few small ones.


Partitioning by timestamp:



Pros:




  • Recent data is usually mostly accessed and will make queries with a timestamp faster.

  • More predictable/even distribution of events.


Cons:




  • Deletion of a single account might touch a lot of partitions - not a big concern.

  • Always needs to contain a filter by timestamp - which is not always possible.









share























  • Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

    – Patrick Mevzek
    1 min ago




















0















Setup



We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.



Problem



1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?



More details



Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.



Here are a few facts about the event table:




  • It contains about 300M rows + a few of composite/partial indexes.

  • The count of events by account_id is very uneven, 5% accounts have 50% of the data.

  • There is a timestamp field and a few others (JSONB, author_id, etc..)

  • Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.

  • Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.


Possible solutions



Partionining by account_id:



Pros:




  • Deletes will be fast because of DROP TABLE.

  • Queries should also be pretty decent since all queries contains WHERE account_id=123


Cons:




  • 1000 partitions too much for Postgres?

  • Uneven distribution of events per account creating a few super big partitions and a few small ones.


Partitioning by timestamp:



Pros:




  • Recent data is usually mostly accessed and will make queries with a timestamp faster.

  • More predictable/even distribution of events.


Cons:




  • Deletion of a single account might touch a lot of partitions - not a big concern.

  • Always needs to contain a filter by timestamp - which is not always possible.









share























  • Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

    – Patrick Mevzek
    1 min ago
















0












0








0








Setup



We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.



Problem



1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?



More details



Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.



Here are a few facts about the event table:




  • It contains about 300M rows + a few of composite/partial indexes.

  • The count of events by account_id is very uneven, 5% accounts have 50% of the data.

  • There is a timestamp field and a few others (JSONB, author_id, etc..)

  • Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.

  • Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.


Possible solutions



Partionining by account_id:



Pros:




  • Deletes will be fast because of DROP TABLE.

  • Queries should also be pretty decent since all queries contains WHERE account_id=123


Cons:




  • 1000 partitions too much for Postgres?

  • Uneven distribution of events per account creating a few super big partitions and a few small ones.


Partitioning by timestamp:



Pros:




  • Recent data is usually mostly accessed and will make queries with a timestamp faster.

  • More predictable/even distribution of events.


Cons:




  • Deletion of a single account might touch a lot of partitions - not a big concern.

  • Always needs to contain a filter by timestamp - which is not always possible.









share














Setup



We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.



Problem



1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?



More details



Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.



Here are a few facts about the event table:




  • It contains about 300M rows + a few of composite/partial indexes.

  • The count of events by account_id is very uneven, 5% accounts have 50% of the data.

  • There is a timestamp field and a few others (JSONB, author_id, etc..)

  • Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.

  • Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.


Possible solutions



Partionining by account_id:



Pros:




  • Deletes will be fast because of DROP TABLE.

  • Queries should also be pretty decent since all queries contains WHERE account_id=123


Cons:




  • 1000 partitions too much for Postgres?

  • Uneven distribution of events per account creating a few super big partitions and a few small ones.


Partitioning by timestamp:



Pros:




  • Recent data is usually mostly accessed and will make queries with a timestamp faster.

  • More predictable/even distribution of events.


Cons:




  • Deletion of a single account might touch a lot of partitions - not a big concern.

  • Always needs to contain a filter by timestamp - which is not always possible.







postgresql





share












share










share



share










asked 4 mins ago









Alex PlugaruAlex Plugaru

1162




1162













  • Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

    – Patrick Mevzek
    1 min ago





















  • Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

    – Patrick Mevzek
    1 min ago



















Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

– Patrick Mevzek
1 min ago







Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

– Patrick Mevzek
1 min ago












0






active

oldest

votes












Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f235108%2fpostgresql-how-to-partition-1000-tenants%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Database Administrators Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f235108%2fpostgresql-how-to-partition-1000-tenants%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Ronny Ackermann

Köttigit

MySQL 8.0.15 starts normally but any connection hangs