Improving Crypto Transparency With Zero-Knowledge Proof

TL;DR
A zero-knowledge proof allows one party (a verifier) to determine the validity of a statement given by another party (the prover) without any knowledge of the statement’s content. For example, Binance may want to prove it has backed its users’ funds fully in reserves without revealing all individual user balances.
A “Proof of Reserves” could be constructed with a Merkle tree that protects against falsification of its internal data, in this case, its total net customer balances, being liabilities of the exchange to its users. This can then be combined with a zk-SNARK (a zero-knowledge proof protocol) that ensures users can check their balance forms part of the total net user asset balance without knowing individual balances.
IntroductionIn light of market events, the security of crypto assets in custody has become a critical topic. Blockchain users highly value transparency and openness but also support privacy and confidentiality. This creates a dilemma when proving reserves of funds held by custodians. Often, there is a trade-off between transparency, trust, and data confidentiality.
However, this doesn’t have to be the case. By combining zero-knowledge proof protocols like zk-SNARKs with Merkle trees, we can find an effective solution for all parties.
What Is Zero-Knowledge Proof?A zero-knowledge proof allows one party (a verifier) to determine the validity of a statement given by another party (the prover) without any knowledge of the statement’s content. Let’s look at a simple example.
You have a locked safe that only you know the solution to. The safe, for the sake of the example, cannot be picked, forced, or opened in any other way than by knowing the combination. This fact is also established, verified, and known by your friend participating in the experiment.
You state you know the combination to your friend, but you don’t want to give it away or open the box in front of them. On top of the box is a hole that your friend can put a note through. To make this a zero-knowledge proof, your friend shouldn’t have any extra information about the process other than the given statement.
You can prove to your friend that you know the combination by opening the box, telling them what was written on the note, and closing it again. At no point have you, however, revealed the combination.
For a more advanced example, see our What Is Zero-knowledge Proof and How Does It Impact Blockchain? article.
Why Do We Use Zero Knowledge Proof?Zero-knowledge proofs are suitable for proving something without revealing sensitive information or details. This could be the case if you don’t want to hand over your financial or personal information that could be inappropriately used.
In crypto, you could prove you own a private key without revealing it or digitally signing something. A cryptocurrency exchange may also want to prove the status of its reserves without revealing confidential information about its users, including their individual account balances. 
For these examples (and many others), a zero-knowledge proof would use algorithms that take a data input and return “true” or “false” as an output. 
Defining Zero-Knowledge Proofs in Technical TermsA zero-knowledge proof, in technical terms, follows a specific structure with certain criteria. We’ve already covered the prover and verifier roles, but there are also three criteria a zero-knowledge proof should cover:
Completeness. If the statement is true, a verifier will be convinced by the provided proof, without the need for any other information or verification.
Soundness. If the statement is false, a verifier won’t be convinced of a statement’s truth by the provided proof.
Zero-knowledge. If the statement is true, the verifier doesn’t learn any information other than the statement being true.
What Is a zk-SNARK?A zk-SNARK (Zero-Knowledge Succinct Non-Interactive Argument of Knowledge) is a proof protocol that follows the zero-knowledge principles previously outlined. With a zk-SNARK, you could prove that you know the original hashed value (discussed further below) without revealing what that is. You could also prove the validity of a transaction without revealing any information about the specific amounts, values, or addresses involved.
zk-SNARKs are commonly used and discussed within the blockchain and cryptocurrency world. But you may wonder why someone would bother using a zk-SNARK when they could use a simple public and private key pair method to secure the information. However, we would not be able to implement the mathematical proof to ensure no negative balances are included and the sum of the Merkle tree. 
In the case of an exchange’s reserves, we want to prove 1:1 backing of customers' balances without the identifiers and balances of each account being made public. In addition, the zk-SNARK technology makes falsifying data even more unlikely.
What Is a Merkle Tree?Presenting the summed funds of Binance users’ accounts requires working with a large data set. One way to present this large amount of data cryptographically is to use a Merkle tree. A vast amount of information can be efficiently stored within it, and its cryptographic nature makes its integrity easily verifiable.
Hash functionsTo succinctly encode an input, a Merkle tree depends on the use of hash functions. In short, hashing is the process of generating a fixed-size output from an input of variable size. In other words, when an input of any length is hashed through an algorithm, it will produce an encrypted fixed-length output.
So long as the input remains the same, the output will too. This means we can take huge amounts of transactional data and hash it into a manageable output. The output will be radically different if any information is changed in the input.
For example, we could take the content of 100 books and input them into the SHA-256 hash function. It would then provide something like this as an output:
801a9be154c78caa032a37b4a4f0747f1e1addb397b64fa8581d749d704c12ea
If we then changed a single character of the input (those 100 books), the hash would be completely different, like so:
abc5d230121d93a93a25bf7cf54ab71e8617114ccb57385a87ff12872bfda410
That’s an important property of hash functions because it allows for easy verification of data accuracy. If anyone replicates the process of hashing those same 100 books using the SHA-256 algorithm, they will get the exact same hash as the output. If the output is different, we can affirm with certainty that the input was changed. This means there’s no need to individually or manually check for differences between the inputs, which can be labor-intensive.
Merkle trees in the cryptocurrency worldWhen storing transaction data on a blockchain, each new transaction is submitted through a hash function, which generates unique hash values. Imagine we have eight transactions (A to H) that we individually hash to get their hashed outputs. These are what we call the Merkle leaf nodes. In the image below, you can see the unique hash value of each letter: hA for A, hB for B, hC for C, etc.
We can then take pairs of hashed outputs, combine them, and receive a new hashed output. The hashes of hA and hB hashed together, for example, would give us a new hashed output of hAB known as a Merkle branch. Note that each time a new output is generated, it comes with a fixed length and size, according to the hash function used.
Now, we have the data of two transactions (e.g., A and B) combined in one hash (hAB). Note that if we change any information from A or B and repeat the process, our hashed output hAB would be completely different.
The process continues as we combine new pairs of hashes to hash them again (see the image below). We hash hAB with hCD to get a unique hash hABCD and do the same with hEF and hGH to get hEFGH. In the end, we receive a single hash representing the hashed outputs of all previous transactions’ hashes. In other words, the hashed output hABCDEFGH represents all the information that came before it.
The graph displayed above is called a Merkle tree, and the hashed output hABCDEFGH is the Merkle root. We use Merkle roots in block headers, as they cryptographically summarize all transaction data in a block in a succinct manner. We can also quickly verify if any data has been tampered with or changed within the block.
The Limitations of Merkle TreesLet’s return to our CEX reserves example. A CEX wants to prove the 1:1 backing of all its customers’ assets and builds a Merkle tree that hashes together its customer UIDs with their net asset holdings (netting off assets and liabilities) at a token level. Once released (and signed to prove ownership over the Merkle root provided), an individual user would have no way of checking if the Merkle tree is valid without accessing all its inputs.
An exchange may have missed including some inputs. It could also create fake accounts with negative balances to alter the total liability. For example, although customers’ assets may total $1,000,000, a fake account could be added with a balance of -$500,000. This would create a reserves target of only $500,000.
The case for proof of reserves is different from a block’s Merkle root, as users can see all the transactions a block contains on a blockchain explorer. A CEX, however, won’t want to disclose each account balance for security and data privacy reasons. Customers too would not be happy with their account balances being made public. In this case, the CEX cannot prove that user balances add up to the correct total without making other user balances visible.
One solution that exchanges may consider employing is using a trusted third-party auditor. The auditor can check the individual accounts and reserves before finally attesting to the validity of the Merkle root provided. However, for users, this method requires trust in the auditor and the data used for the audit. You don’t have to rely on a third party when you can trust the data.
Combining zk-SNARKs With Merkle TreesThe above issue is a perfect case for using zk-SNARKs. We want to prove that reserves fully cover user liabilities and aren’t falsified. However, for privacy and security reasons, we don’t want to show the verifier the exact makeup of user balances and reserves. 
By using a zk-SNARK, a crypto exchange can prove that all Merkle tree leaf nodes’ balance sets (i.e., user account balances) contribute to the exchange’s claimed total user asset balance. Each user can easily access their leaf node as having been included in the process. The zk-SNARK also ensures any Merkle tree generated doesn’t contain users with a negative total net asset balance (which would imply falsification of data, as all loans are over-collateralized). Also used is a calculation of Binance’s global state, i.e., a list of the total net balance of each asset each Binance customer holds.
Let’s take a look at how Binance approaches the situation. To begin, Binance defines the constraints of the computation it wishes to prove and defines them as a programmable circuit. Below is the set of three constraints Binance uses in its model. 
For each user’s balance set (Merkle tree leaf node), our circuit ensures that:
A user’s asset balances are included in the calculation of the sum of the total net user balances with Binance.
The total net balance of the user is greater than or equal to zero.
The change of Merkle tree root is valid (i.e., not using falsified information) after updating a user’s information to the leaf node hash.
Binance can then generate a zk-SNARK proof for the Merkle tree’s construction according to the circuit. This entails the exchange executing the heavy computation of hashing users’ IDs and balances while ensuring the proof passes the constraints.
A verifier will examine the proof (and its publicly released open-source code) to be convinced that the computation is executed with all constraints met. The verification computation takes an extremely short time compared to the proving time.
At each Proof of Reserves release, the exchange will publish:
1. The Merkle proof for each user.
2. The zk-SNARK proof and public input (a hash of the list of the total net balance of each asset and Merkle root) of the circuit for all users.
Interested parties can verify the Merkle proof, ensuring their individual balances contributed to the Merkle tree root. They can also verify the zk-SNARK proof to ensure the construction of the Merkle tree meets the constraints defined in the circuit. For a more detailed explanation of the zk-SNARK solution and its performance, refer to our How zk-SNARKs Improve Binance’s Proof-of-Reserves System blog.
Closing Thoughtszk-SNARKs provide the technology needed to ensure both data integrity and privacy at the same time. Its application for proving reserves and increasing CEX transparency should help build trust in the blockchain industry. For many, a development like this has been long awaited and comes at a pivotal time for CEXs.
This is the first version of our zk-SNARK, and we are looking forward to receiving community feedback so we can continue to improve the system.
Further Reading(Blog) How zk-SNARKs Improve Binance’s Proof-of-Reserves System
(Academy) Proof of Reserves (PoR)
(Academy) What Is Proof of Reserves and How it Works on Binance
(Announcement) Binance Releases Proof of Reserves System