Use SHA vs. MD5 or HASH in Snowflake DB
Let me preface this by saying that I am not using this for storing passwords or any other sensitive information. I simply want a row-level hash that I can refer to later or use to check for unique records. My tables will be on the long side, in the range of 0.1 to 10 trillion rows.
I am using a Snowflake data warehouse, so my options are SHA1, SHA2, MD5 (each with a binary variant), and HASH.
I would like to minimize the chance of collisions (given the long tables) without burning my compute credits needlessly.
Which one is the best option given my use case?
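For a rough sense of scale on the collision side, the birthday approximation p ≈ 1 - exp(-n(n-1)/2^(b+1)) estimates the chance of at least one collision among n rows for an ideal b-bit hash. The sketch below is Python rather than Snowflake SQL, the function name is made up for illustration, and it assumes the usual output widths: 64 bits for HASH, 128 for MD5, 160 for SHA1, and 256 for SHA2 at its default digest size. It also assumes the hashes behave like uniformly random values, which is an idealization.

```python
import math


def collision_probability(n_rows: int, bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision).

    Assumes an ideal hash whose outputs are uniform over 2**bits values.
    """
    # Expected number of colliding pairs: n(n-1)/2 pairs, each colliding
    # with probability 1 / 2**bits.
    expected_pairs = n_rows * (n_rows - 1) / 2 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-expected_pairs)


# Assumed output widths in bits (see the note above).
widths = {"HASH": 64, "MD5": 128, "SHA1": 160, "SHA2-256": 256}

for n_rows in (10**11, 10**13):  # 0.1 and 10 trillion rows
    for name, bits in widths.items():
        p = collision_probability(n_rows, bits)
        print(f"{n_rows:.0e} rows, {name:8s}: p = {p:.3g}")
```

Under these assumptions, a 64-bit hash is practically certain to collide somewhere in a table of this size, while at 128 bits and 10 trillion rows the estimate is on the order of 1e-13, and wider digests push it lower still.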
hashing snowflake
I can't speak to HASH, but the speed of SHA1, SHA2 (SHA256/SHA512) and MD5 varies depending on implementation, hardware and architecture (64-bit vs. 32-bit). Can you run any simple experiments on the Snowflake platform to settle the performance part of your question?
– Antonius Bloch
Dec 19 '17 at 19:49
asked Dec 19 '17 at 17:06
Serban Tanasa
1 Answer
No matter what you pick ...
Snowflake supports defining and maintaining constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.
https://docs.snowflake.net/manuals/sql-reference/constraints-overview.html
So you have to... Given Snowflake's table structure and how micro-partitions are pruned and scanned, I suspect something like the query below could get very slow.
It will most likely scan the entire table for the rows inserted.
    INSERT INTO T
    SELECT * FROM (SELECT ... UNION SELECT ... UNION SELECT ... UNION ...) x
    WHERE x.hash NOT IN (SELECT hash FROM T)
Using clustering keys may speed up the uniqueness check, but at the cost of many more data writes.
With native clustering you will need to write something closer to:
    INSERT INTO T
    SELECT s.*
    FROM (SELECT ... UNION SELECT ... UNION SELECT ... UNION ...) s
    LEFT JOIN T t1
        -- f1, f2 ... are part of a natural unique key
        ON s.f1 = t1.f1
        AND s.f2 = t1.f2
        ...
        AND s.hash = t1.hash
    WHERE t1.hash IS NULL
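
The anti-join pattern above can be tried out locally. The following is only a minimal sketch under loud assumptions: it uses Python's built-in sqlite3 as a stand-in for Snowflake, a hypothetical staging table in place of the chain of unions, SHA-256 via hashlib for the row hash, and made-up helper names. It says nothing about how Snowflake would prune or scan micro-partitions for the same query.

```python
import hashlib
import sqlite3


def row_hash(*fields):
    """Hex SHA-256 of the fields joined with '|' (an illustrative choice)."""
    joined = "|".join(str(f) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def insert_new_rows(conn, rows):
    """Insert only rows not already present in T; return how many were added."""
    before = conn.execute("SELECT COUNT(*) FROM T").fetchone()[0]
    # Hypothetical staging table standing in for the incoming batch.
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS staging (f1 TEXT, f2 TEXT, hash TEXT)"
    )
    conn.execute("DELETE FROM staging")
    conn.executemany(
        "INSERT INTO staging VALUES (?, ?, ?)",
        [(f1, f2, row_hash(f1, f2)) for f1, f2 in rows],
    )
    # Left join against T and keep only staging rows with no match.
    # DISTINCT also removes duplicates inside the incoming batch itself.
    conn.execute(
        """
        INSERT INTO T (f1, f2, hash)
        SELECT DISTINCT s.f1, s.f2, s.hash
        FROM staging s
        LEFT JOIN T t1
          ON s.f1 = t1.f1
         AND s.f2 = t1.f2
         AND s.hash = t1.hash
        WHERE t1.hash IS NULL
        """
    )
    after = conn.execute("SELECT COUNT(*) FROM T").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (f1 TEXT, f2 TEXT, hash TEXT)")

first = insert_new_rows(conn, [("a", "1"), ("b", "2")])
second = insert_new_rows(conn, [("b", "2"), ("c", "3"), ("c", "3")])
total = conn.execute("SELECT COUNT(*) FROM T").fetchone()[0]
print(first, second, total)
```

Joining on the natural key columns as well as the hash means a hash collision between two different keys would not cause a genuinely new row to be dropped.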
Good Luck
edited Aug 29 '18 at 22:31
Anthony Genovese
answered Aug 29 '18 at 18:20
Brian S