[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [WEB SECURITY] Hashing and entropy




Oliver,

Clearly you've taken some time with this analysis, and I appreciate that. I am also not a cryptanalyst, but I do not think I agree with a couple of your conclusions about the cryptographic basis here. I will try to highlight these places. As Amit notes, (and linked helpfully), and as was supported by Glenn's comments, the actual case of the amount of real entropy in each CC # is woefully small. This makes it a very poor basis for a secure hash.

Adding a long salt to each CC# which is different for each CC and storing it together with the card number helps protect against rainbow tables. It does not mitigate the entropy issue if the salt is also recovered. Performing a large number of hash rounds may also help resist rainbow tables, as people are unlikely to have precomputed the hashes of e.g. 932 rounds of a hash function. Note that the moment this implementation becomes widespread (i.e. is implemented in some sort of widely-used library) some attackers will begin computing tables of 932 rounds of said hash function ;-).

Regarding symmetric encryption, I am not sure I follow your explanation as being valid. It is my understanding that for a symmetric encryption system:

ciphertext = encrypt(plaintext, encryption_key)

and

plaintext = decrypt(ciphertext, encryption_key)

Therefore:

plaintext = decrypt( encrypt(plaintext, encryption_key), encryption_key)

Put in shorter mathematical terms:
z = f(x,y)
x = g(z, y)
and
x = (g(f,y),y)

There are two ways that I believe this model differs from the one you detailed. Firstly, the fact that the encryption is termed 'symmetric' implies that the same key is used for the decryption sequence as for the encryption sequence, it does not imply that x = f( f(x,y), y) (in other words, that performing the forward encryption function a second time on the ciphertext with the same key yields the original plaintext). The joke which I chuckle at in Nico Golde's signature line (he is a frequent poster to oss-security, another security discussion list) about double rot-13 encryption is a humorous exception to this guidance. Secondly, your example seemed to imply that the encryption and decryption was performed without the use of any input aside from the plaintext/ciphertext. You pass only one variable to f() in your examples. If it was your intention that f(x) be effectively defined as g(x, y) where y is the encryption key, this can work, but was not apparent from your post.

I would take exception to the statement that the encryption key in reversible encryption (whether symmetric or asymmetric) need be stored in the same place as the ciphertext. In the case of an asymmetric key system in particular, the private key could reside within a business logic server while the (enciphered) data exists within a database on a (potentially architecturally different) machine.

When dealing with symmetric key cryptography which uses a single key to encipher all CC #'s:

1) The card numbers are readily indexed/compared
2) The entropy problem is fixed. The symmetric key can be created through secure sampling of 'random' sources for an entropy pool, and does not depend on data entropy.
3) It is possible to define a format which pads the CC# with random numbers, and provides an initial-byte index to the start of the real CC # to protect against partial known plaintext attacks
4) If you use a long, highly random key, and it is not compromised via other means, it becomes very computationally expensive to brute force this


However:
1) The key must be stored *somewhere* and may dramatically increase the chance of compromise
2) All cards are broken at once if the same key is used.


As I said, I am not a cryptanalyst, so I apologize if I've botched any of this.

-Nathanael

Lavery, Oliver wrote:

Phew, finally get a break to respond to the thread I started ... sorry.

Yes, as several people pointed out 16 digits is wrong. I actually checked this quickly with a reference while writing the earlier post and only managed to confuse myself :P

Cards are 14-19 digits, with 2-6 consumed by BIN and one for the lunh code. That handy wiki page Amit posted has a worst case scenario of a 14 digit card with a three digit BIN and a Lunh, giving us 10 base 10 digits. *shudder*

I'd like to figure out how big the total search space is for all BINs, but don't have the time this lunch break...


Why Hash? ---------

A few people asked this. I've seen it justified by business requirements around comparing stored card numbers, which is a pretty obvious requirement for a lot of applications if you think about it. For example:

- Application needs to be able to verify a new transaction against the card on file

- Application needs to be able to create a list of all transactions for a single card. (Afaik more than one person can have the exact same CC # in the case of a shared card with secondary cardholders)

Hashing isn't actually the problem, it's these requirements that lead to broken implementations. If you use a different random salt to store every card number, even if the salt is stored with the hash, the attack gets harder as each hash has to be broken independently. This could be better than encryption since it isn't reversible. Unfortunately you also loose the ability to compare card numbers, however.

Collisions aren't as much a problem; collisions happen when two inputs hash to the same value ... not something that's affected by the entropy in the input afaict, it's just a product of the algorithm and it's output length.


Who would be silly enough to do this?! --------------------------------------

From PCI 1.1:
3.4 Render PAN, at minimum, unreadable anywhere it is stored (including data on portable digital media, backup media, in logs, and data received from or stored by wireless networks) by using
any of the following approaches:
• Strong one-way hash functions (hashed indexes)
• Truncation
• Index tokens and pads (pads must be securely stored) [What?!]
• Strong cryptography with associated key management processes and procedures.



It's not silly, it's a best practice!


Encryption is better! ---------------------

I'm not sure this is really true. Correct me if I'm wrong, IANACryptographer, but my thinking is that for any cryptographic function f(), and CC# x:

y = f(x)
y' = f(x)

if y == y' then you're in trouble, provided the attacker can compute f().

Symmetric crypto is probably not something you want to use to protect CC#s in an online transaction processing system since the key has to be stored with the data. Never mind that lots of people do this. If an attacker can get the cyphertext, it's reasonable to assume he can get the key. (Yeah you can hide it somewhere in memory, so I root your box, dump the memory, and use a single known plaintext to find the key in there or somesuch)

Assymetric crypto is better, since you only need to keep the public key online. However, if the crypto is implemented such that comparing card numbers still works, then you have the exact same problem as with hashing. An attacker who has the CC #s will reasonably also have the public key, and can break the encryption through brute force by encrypting all possible plaintexts with the key. It's just slower.

Why do you need to keep the key online? If it's an OLTP application like a payment processor or an ecom site, than the alternative when new CC#s arrive is to store them in plaintext and encrypt them only when the key is available. Maybe there's some way around this?


Solution? ---------

Which gets us back to possible solutions. Thanks for the awesome blog link Bil.

Maybe the question is just how much computational complexity would be enough for CC#s, and whether or not it's feasible for a system to implement it.

Although a much better solution is to never, ever have to do comparisons online, I think. Hashing or Asymmetric crypto implemented such that the same card number will not always encrypt to the same ciphertext seems like the only really strong security to me...


Cheers, ~ol


-----Original Message----- From: Glenn.Everhart@xxxxxxxxx [mailto:Glenn.Everhart@xxxxxxxxx] Sent: Fri 20/06/2008 14:12 To: aksecurity@xxxxxxxxx; nhoyle@xxxxxxxxxxxxx Cc: websecurity@xxxxxxxxxxxxx Subject: RE: [WEB SECURITY] Hashing and entropy

I figure 30 to 38 bits if you know the bank. Why does anyone want to hash
card numbers???

Figuring out what number a hash corresponds to is comparable to doing the
same with SSN (also 9 digits) which would be the 30 bit case.

Heck's bells: you might as well hash someone's age in years and expect that to
obscure it! (That's a bit easier but illustrates the problem.)


If you encrypt or add a decently long salt, you can at least have enough entropy
to make 100 or so bits' worth, or more if you do it better. You have to keep the
extra entropy secret but at least it is harder than just hashing a few billion
trials to match hashes.


-----Original Message-----
From: Amit Klein [mailto:aksecurity@xxxxxxxxx]
Sent: Friday, June 20, 2008 3:34 PM
To: Nathanael Hoyle
Cc: websecurity@xxxxxxxxxxxxx
Subject: Re: [WEB SECURITY] Hashing and entropy


Nathanael Hoyle wrote: > Oliver Lavery wrote: > > <snip> >> The parameters of the problem as I see it are: >> >> 1) credit card numbers have roughly 10**16 distinct values, well >> below 2**128 (MD5) much less 2**160 (SHA1) >> > > I would point out that the practical range may be narrowed by an > attacker. IIRC, all Visa card numbers start with a 4, and all > MasterCard ones start with a 5. These two cases account for a huge > proportion of all CC numbers. In either of these cases, the range is > 10**15 * 2, which is notably smaller.

It's much worse - check out http://en.wikipedia.org/wiki/Credit_card_numbers

-Amit

----------------------------------------------------------------------------
Join us on IRC: irc.freenode.net #webappsec

Have a question? Search The Web Security Mailing List Archives:
http://www.webappsec.org/lists/websecurity/

Subscribe via RSS:
http://www.webappsec.org/rss/websecurity.rss [RSS Feed]

Join WASC on LinkedIn
http://www.linkedin.com/e/gis/83336/4B20E4374DBA



-----------------------------------------
This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and
any attachments are believed to be free of any virus or other
defect that might affect any computer system into which it is
received and opened, it is the responsibility of the recipient to
ensure that it is virus free and no responsibility is accepted by
JPMorgan Chase & Co., its subsidiaries and affiliates, as
applicable, for any loss or damage arising in any way from its use.
If you received this transmission in error, please immediately
contact the sender and destroy the material in its entirety,
whether in electronic or hard copy format. Thank you.

----------------------------------------------------------------------------
Join us on IRC: irc.freenode.net #webappsec

Have a question? Search The Web Security Mailing List Archives:
http://www.webappsec.org/lists/websecurity/

Subscribe via RSS:
http://www.webappsec.org/rss/websecurity.rss [RSS Feed]

Join WASC on LinkedIn
http://www.linkedin.com/e/gis/83336/4B20E4374DBA




----------------------------------------------------------------------------
Join us on IRC: irc.freenode.net #webappsec

Have a question? Search The Web Security Mailing List Archives: http://www.webappsec.org/lists/websecurity/

Subscribe via RSS: http://www.webappsec.org/rss/websecurity.rss [RSS Feed]

Join WASC on LinkedIn
http://www.linkedin.com/e/gis/83336/4B20E4374DBA



Brought to you by http://www.webappsec.org
Search this site