Given a hash value, an you determine what algorithm produced it? Or what algorithm probably produced it?
Obviously if a hash value is 128 bits long, then a 128-bit algorithm produced it. Such a hash value might have been produced by MD5, but not by SHA-1, because the former produces 128-bit hashes and the latter a 160-bit hash.
The more interesting question is this: given an n-bit hash, can you tell which n-bit hash algorithm it probably came from? You will find contradictory answers online. Some people confidently say no, that a hash value is for practical purposes a random selection from its range. Others say yes, and have point to software that reports which algorithm the hash probably came from.
Some hashing software has a custom output format, such as sticking a colon at a fixed position inside the hash value. Software such as hashid
uses regular expressions to spot these custom formats.
But assuming you have a hash value that is simply a number, you cannot know which hash algorithm produced it, other than narrowing it to a list of known algorithms that produce a number of that length. If you could know, this would indicate a flaw in the hashing algorithm.
So, for example, a 160-bit hash value could come from SHA-1, or it could come from RIPEMD-160, Haval-160, Tiger-160, or any other hash function that produces 160-bit output.
To say which algorithm probably produced the hash, you need context, some sort of modeling assumption. In general SHA-1 is the most popular 160-bit hash algorithm, so if you have nothing else to go on, your best guess for the origin of a 160-bit hash value would be SHA-1. But RIPEMD is part of the Bitcoin standard, and so if you find a 160-bit hash value in the context of Bitcoin, it’s more likely to be RIPEMD. There must be contexts in which Haval-160 and Tiger-160 are more likely, though I don’t know what those contexts would be.
Barring telltale formatting, software that tells you which hash functions most likely produced a hash value is simply reporting the most popular hash functions for the given length.
For example, I produced a 160-bit hash of “hello world” using RIMEMD-160
echo -n "hello world" | openssl dgst -rmd160
then asked hashid
where it came from.
hashid '98c615784ccb5fe5936fbc0cbe9dfdb408d92f0f' Analyzing '98c615784ccb5fe5936fbc0cbe9dfdb408d92f0f' [+] SHA-1 [+] Double SHA-1 [+] RIPEMD-160 [+] Haval-160 [+] Tiger-160 [+] HAS-160 [+] LinkedIn [+] Skein-256(160) [+] Skein-512(160)
I got exactly the same output when I hashed “Gran Torino” and “Henry V” because the output doesn’t depend on the hashes per se, only their length.
Related posts
The post Identifying hash algorithms first appeared on John D. Cook.