By Mitchel Zilbershteyn
This article covers the intersection of blockchain and AI, specifically focusing on the potential benefits and technical applications of using blockchain technology in the field. It discusses how blockchain can address current issues within the AI space such as the centralization of machine learning, data quality, and content creation. It also explores the business case for blockchain in AI, including its potential to revolutionize industries like healthcare and finance by ensuring data privacy and enabling secure collaboration. Additionally, it highlights technical applications such as zero-knowledge proofs for machine learning, multi-party computation, and proof of training data, showcasing how blockchain can enhance scalability, security, and trust in AI systems.
Blockchains and the Future of AI
Decentralization
AI today is at risk of being centralized and controlled by a group of large corporations.
- as an alternative, blockchains such as Ethereum offer credibly neutral systems of data which fuel open-source innovation
- the combination of open-source AI and blockchain technology therefore becomes an area with significant potential for both technical experimentation and investment
Crypto x AI
Crypto middleware can drastically improve inputs across the supply side of AI by establishing efficient markets for compute and data, as well as tools for attestation or privacy.
- efficient markets - leverages crypto’s peer-to-peer exchange nature, which makes it ideal for facilitating transactions in these markets
- attestation - offers immutable verification of the data
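- privacy - data on public blockchains is transparent by default, but cryptographic tools such as zero-knowledge proofs and multi-party computation allow sensitive data to be used without being revealed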
A current issue with decentralized applications is poor UX design, which poses a barrier to entry for users who are less technically savvy.
- generative AI could be the missing connection for cryptocurrency, revolutionizing user interface and experience, and sparking a significant surge of new technical advancements
Quality over Quantity
The quality of the data used to build machine learning models can deteriorate over time as more and more data is added, producing models that are trained on lower-quality data and that can carry inherent biases.
- blockchain allows users to share, own, or monetize their data for training and tuning specific models
- independent data monetization tools for users will encourage them to share higher quality data with the ML model, creating an AI that is trained on verifiable and legitimate data sources
- such a system will lead to the creation of smaller and open source models that are more refined and can rival larger models in output accuracy
Capturing the AI content creation lifecycle
AI content creation is already a large portion of AI usage, and its share is expected to keep growing rapidly.
- blockchain offers a prime opportunity to capture the entire cycle of content creation through establishing ownership and immutable provenance of digital assets through the imprinting of those assets onto the network (think: NFTs)
- creation - a digital asset can be tokenized on the blockchain, meaning that a unique representation of that asset (typically a token that references a hash of the asset) is published on the blockchain
- storage - this proves ownership of the asset by the person that minted it on the chain, where the tokenization process basically says ‘here is a hash that represents this asset and here is the address that created it, therefore here is a verifiable record that this address owns the token for this asset’
- monetization - since users own full rights to their data, they can choose to monetize it through things like digital marketplaces, written content business models, social media, gaming, and financial infrastructure
- security - the blockchain creates an immutable record of the creation and ownership of the asset, so that any piece of content can be checked against it and copies or forgeries that lack a matching record are easy to detect
- this system will let users take AI content, including images, text, videos, code, and more, and upload it to a decentralized network that gives them not only the proof of complete and independent ownership but an easy way to monetize those assets
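The creation and storage steps above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `ProvenanceRegistry` standing in for an on-chain registry contract, and a made-up address string; a real NFT contract would also handle signatures, transfers, and metadata.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(asset_bytes: bytes) -> str:
    """Return the SHA-256 digest that stands in for the asset on the ledger."""
    return hashlib.sha256(asset_bytes).hexdigest()


@dataclass
class ProvenanceRegistry:
    """Hypothetical in-memory stand-in for an on-chain registry contract."""
    owners: dict = field(default_factory=dict)  # content hash -> owner address

    def mint(self, asset_bytes: bytes, owner_address: str) -> str:
        """Register the asset's hash under the minting address (first mint wins)."""
        digest = content_hash(asset_bytes)
        if digest in self.owners:
            raise ValueError("asset already registered")
        self.owners[digest] = owner_address
        return digest

    def owner_of(self, asset_bytes: bytes):
        """Look up which address, if any, registered this exact content."""
        return self.owners.get(content_hash(asset_bytes))


registry = ProvenanceRegistry()
token_id = registry.mint(b"generated image bytes", "0xCreatorAddress")
```

Anyone holding the asset can recompute its hash and call `owner_of` to see which address registered it. Content that differs by even one byte produces a different hash and has no record.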
Technical Applications
Zero Knowledge proofs and Machine Learning
Zero-knowledge (ZK) proofs can be used in blockchain technology to combine multiple transactions into one proof, an approach known as a ZK rollup, as opposed to placing each transaction on the chain individually. By using zero-knowledge proofs, it is also possible to confirm the validity of a group of transactions without disclosing the specific details of each transaction.
This ability enables efficient data processing and decreases the computational burden of verifying individual transactions. Instead of paying a set network fee to verify each individual transaction, that network fee covers all the transactions within the rollup, significantly decreasing the cost per transaction. By grouping several transactions into a single proof, blockchain systems can also handle larger data sets more efficiently, enhancing scalability and performance.
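The cost argument can be made concrete with a small calculation. The fee value and batch sizes below are hypothetical, and the sketch assumes, as the paragraph above does, that one fixed network fee covers the whole batch; in practice the fee for a batch usually grows somewhat with its size.

```python
def cost_per_transaction(batch_fee: float, batch_size: int) -> float:
    """Fee each transaction bears when one on-chain fee covers the whole batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return batch_fee / batch_size


# Hypothetical numbers: the same fee paid for one transaction or for a batch of 100.
single = cost_per_transaction(batch_fee=2.0, batch_size=1)
rolled_up = cost_per_transaction(batch_fee=2.0, batch_size=100)
```

Under these assumptions the per-transaction cost falls in direct proportion to the batch size.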
ZK proofs enable efficient and concise verification of compute-intensive processes, such as running a machine learning model off chain. This allows the end product, such as the model's inference, to be consumed on chain by smart contracts (programs on the chain) in the form of a ZK proof.
- applied to machine learning, the same batching lets many off-chain computations be verified with a single proof, which improves scalability and performance and significantly lowers on-chain storage costs, an important saving since these models tend to require a lot of training data
- this has the potential to create sustainable economies of scale in a way that is decentralized and more secure than traditional computing systems: instead of only huge companies being able to scale their operations, smaller companies can scale by offering people a share of revenue in exchange for joining their network
ZK cryptography can be used to verify that a specific model or pool of data was in fact used in generating inferences when called via an API. It can also conceal the specific weights or data consumed by a model in client-sensitive industries like healthcare or insurance.
- companies can even collaborate more effectively by exchanging data or IP, benefiting from shared learnings while still keeping their resources proprietary
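A full ZK proof of inference needs a dedicated proving system and does not fit in a short example. A simpler building block that captures the idea of binding to a specific model without publishing its weights is a salted hash commitment, sketched below. This is a commitment scheme, not a zero-knowledge proof: the weights stay hidden only until they are revealed to whoever performs the check (for example, an auditor). The function names and sample weights are illustrative assumptions.

```python
import hashlib
import json
import secrets
from typing import List


def commit_to_weights(weights: List[float], salt: bytes) -> str:
    """Salted SHA-256 commitment; the digest can be published without exposing the weights."""
    payload = json.dumps(weights).encode("utf-8")
    return hashlib.sha256(salt + payload).hexdigest()


def verify_commitment(commitment: str, weights: List[float], salt: bytes) -> bool:
    """Check that revealed weights and salt reproduce the published commitment."""
    return secrets.compare_digest(commitment, commit_to_weights(weights, salt))


# The model owner keeps the salt and weights private and publishes only the digest.
salt = secrets.token_bytes(32)
weights = [0.12, -0.5, 1.75]
published = commit_to_weights(weights, salt)
```

Later, `verify_commitment(published, weights, salt)` confirms that the model served through the API is the one that was committed to, and any change to the weights makes the check fail.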
Multi-Party Computation (MPC)
MPC is a cryptographic technique that lets multiple parties work together to calculate a function using their private information without sharing it with each other. When used with blockchain technology, MPC offers a safe and reliable way to perform calculations on private data.
- Example: two hospitals want to collaborate and analyze their patient data without sharing the raw data with each other, where hospital A has patients’ medical conditions, and hospital B has their genetic traits. Both hospitals want to identify correlations between medical conditions and genetic traits without violating patient privacy.
- Hospital A and Hospital B can jointly compute a function that analyzes the data without revealing any specific patient information to the other party. The computation is performed on encrypted data, and only the results are shared.
- They can then calculate the average age of patients with a specific medical condition, taking into account the genetic traits. The calculation is done collaboratively, with each hospital contributing their private data and the computation being performed in a privacy-preserving manner. The final result, the average age, is shared without exposing any individual patient's information.
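The average-age computation can be sketched with additive secret sharing, one common MPC building block. The version below is simplified: it assumes each hospital holds the ages of its own patients with the condition, the age lists are made up, and both parties follow the protocol honestly. Note that with only two parties the final total lets each hospital infer the other's subtotal; that is a property of the function being computed, not of the sharing scheme.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; all shares live in [0, PRIME)


def make_shares(value: int, n_parties: int) -> list:
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values: list) -> int:
    """Each party shares its value; only per-party partial sums are ever published."""
    n = len(private_values)
    # distributed[i][j] is the share that party i sends to party j
    distributed = [make_shares(v, n) for v in private_values]
    partial_sums = [sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME


# Hypothetical inputs: ages of patients with the condition at each hospital.
hospital_a_ages = [34, 51, 47]
hospital_b_ages = [62, 29]

total_age = secure_sum([sum(hospital_a_ages), sum(hospital_b_ages)])
total_count = secure_sum([len(hospital_a_ages), len(hospital_b_ages)])
average_age = total_age / total_count
```

Each individual share is a uniformly random number, so a share on its own reveals nothing about the value it came from; only the combined result is meaningful.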
How can MPC be implemented on the blockchain?
- Blockchain preserves the core property of MPC, the privacy of sensitive data, and does so in a way that is decentralized.
- in a centralized network, there is a single entity or authority that has access to and control over the private data and inputs
- this centralized control raises concerns about data privacy and potential misuse or unauthorized access to sensitive information
- in contrast, a decentralized network ensures that each participant's private data remains confidential and secure, as the computation is distributed among multiple parties without any single entity having complete access to the data
- By leveraging the decentralized properties of blockchain, such as immutability and transparency, MPC on blockchain provides additional security guarantees.
- it enables the computation of private data even in the presence of possible adversarial or malicious entities
- the decentralized nature of blockchains complements the privacy-preserving capabilities of MPC, yielding a robust and private system for computation
How does MPC work in the context of AI?
- MPC on blockchain technology can be utilized to carry out secure and private training of AI models.
- it enables the computation of AI models using sensitive data while ensuring that the data remains confidential and undisclosed to any of the parties involved
- this aspect holds significant importance, especially in applications that heavily rely on data privacy, such as healthcare systems or financial institutions
- Consider again the two hospitals from the MPC example above: hospital A holds patients’ medical conditions, hospital B holds their genetic traits, and both want to identify correlations between the two without violating patient privacy.
- Each hospital securely uploads its encrypted dataset onto the blockchain, ensuring that the data remains confidential and inaccessible to unauthorized parties. The MPC protocol implemented on the blockchain allows the hospitals to jointly compute the AI model's training process without revealing their individual datasets. The computation is performed on the encrypted data stored on the blockchain.
- Through the MPC protocol, the hospitals collaborate to train the AI model while keeping their data private. The AI model learns from the aggregated information without exposing any specific patient's data. The trained AI model's weights and parameters are securely stored on the blockchain, establishing a trustless and verifiable record of the model's training process.
- The AI model's outputs can then be consumed on the blockchain together with a ZK proof, which attests that they were produced by the trained model while keeping the underlying information private and secure.
- When the healthcare system needs to make predictions for a new patient, the AI model can utilize the trained weights from the blockchain and perform inference on the patient's data without revealing any sensitive information.
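One way to picture the collaborative training step is secure aggregation of model updates with a shared mask. This is a simplified stand-in for a full MPC training protocol, not the protocol itself. It assumes the two hospitals have already agreed on a random mask over a private channel, that updates are encoded as fixed-point integers, and that a hypothetical aggregator (for example, a smart contract) only ever sees the masked values. All names and numbers are illustrative.

```python
import secrets

MODULUS = 2**61 - 1
SCALE = 10**6  # fixed-point scale: six decimal places


def encode(x: float) -> int:
    """Map a float to a fixed-point integer modulo MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Map back to a float, treating the upper half of the range as negative."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def mask_update(update: list, mask: list, sign: int) -> list:
    """Hide a local model update by adding (+1) or subtracting (-1) the shared mask."""
    return [(encode(u) + sign * m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_a: list, masked_b: list) -> list:
    """Sum the masked updates; the masks cancel, leaving only the combined update."""
    return [decode((a + b) % MODULUS) for a, b in zip(masked_a, masked_b)]


# Hypothetical local gradient updates computed by each hospital on its own data.
update_a = [0.25, -0.5]
update_b = [0.75, 0.125]
shared_mask = [secrets.randbelow(MODULUS) for _ in update_a]

masked_a = mask_update(update_a, shared_mask, +1)
masked_b = mask_update(update_b, shared_mask, -1)
summed = aggregate(masked_a, masked_b)
```

The aggregator learns the combined update needed to advance the model but sees each hospital's individual contribution only in masked form.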
Proof of Training Data
Recording data on a blockchain and verifying it later involves the following steps:
- Record Creation: When information or data is added to the blockchain, it is recorded in a block along with other transactions or data entries. This can be done by creating a transaction that includes the relevant information or by utilizing smart contracts (code that runs on the blockchain) to store and manage the data.
- Hashing: The information or data added to the blockchain is typically hashed. A hash function takes the input data and produces a fixed-length string of characters that is, for practical purposes, unique to that input. This hash serves as a digital fingerprint of the data.
- Block Validation: The block containing the hashed data is added to the blockchain. The block is validated by network participants (nodes) through a consensus mechanism, such as proof-of-work or proof-of-stake. This ensures that the block is legitimate and agreed upon by the network.
- Immutability: Once the block is added to the blockchain and validated, it becomes part of an immutable chain of blocks. The blocks are linked together using cryptographic hashes, so that changing any earlier entry would break every later link, creating a continuous and tamper-evident record of transactions or data entries.
- Verification: To verify the existence of something on the blockchain, you can compare the hash of the data in question with the corresponding hash stored on the blockchain. By recalculating the hash of the data and comparing it with the recorded hash on the blockchain, you can determine if the data has been tampered with or if it matches the original entry.
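The hashing, linking, and verification steps above can be sketched in a few lines. This is a toy illustration under stated assumptions: blocks are plain dictionaries, the helper names are invented for the example, and the consensus step (proof-of-work or proof-of-stake) is left out entirely.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Digital fingerprint of the data."""
    return hashlib.sha256(data).hexdigest()


def block_body_hash(prev_hash: str, data_hashes: list) -> str:
    """Hash of a block's contents, including the link to the previous block."""
    body = {"prev_hash": prev_hash, "data_hashes": data_hashes}
    return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))


def make_block(prev_hash: str, data_hashes: list) -> dict:
    """Create a block that stores data fingerprints and points at its predecessor."""
    return {"prev_hash": prev_hash, "data_hashes": data_hashes,
            "hash": block_body_hash(prev_hash, data_hashes)}


def chain_is_valid(chain: list) -> bool:
    """Recompute every block hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_body_hash(block["prev_hash"], block["data_hashes"]):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


def data_is_recorded(chain: list, data: bytes) -> bool:
    """Verification step: rehash the data and look for the fingerprint on the chain."""
    digest = sha256_hex(data)
    return any(digest in block["data_hashes"] for block in chain)


genesis = make_block("0" * 64, [sha256_hex(b"dataset v1")])
second = make_block(genesis["hash"], [sha256_hex(b"dataset v2")])
chain = [genesis, second]
```

Calling `data_is_recorded(chain, b"dataset v1")` succeeds only for the exact bytes that were recorded, and `chain_is_valid` fails as soon as any stored fingerprint or link is altered.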
Proof-of-Training-Data is a protocol on the blockchain network that allows a model trainer to demonstrate to a Verifier which training data was used to generate a set of model weights, in order to prove the data's accuracy and source.
- It is able to effectively address the challenge of verifying the authenticity of the data used to train an AI model.
- since model training typically involves a large amount of data, some of which may be private or copyrighted, it is crucial to ensure the integrity and trustworthiness of the training data
- By using blockchain technology, Proof-of-Training-Data can be implemented in a decentralized and trustworthy way.
- the blockchain acts as a permanent record that keeps track of the entire training process, including the data used
- through cryptographic techniques like ZK proofs, the model trainer can provide evidence to the Verifier without revealing the actual data
- this enables the Verifier to check the reliability of the training data by comparing the trainer's evidence against the record on the blockchain and confirming that the data used to train the model matches what was committed to the network itself
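A simplified way to picture the commitment side of such a protocol is a Merkle tree over the training records: the trainer publishes only the root on chain and can later show that a given record was part of the committed dataset. The sketch below uses a plain Merkle inclusion proof as a stand-in. It is not a ZK proof, since the record being checked is revealed to the Verifier, although the rest of the dataset is not. The record contents and function names are assumptions made for illustration, and the tree omits hardening such as separate hashing rules for leaves and internal nodes.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: list) -> list:
    """Return all tree levels, from hashed leaves up to the single root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    levels = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate the last node on odd-sized levels
        levels.append(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    levels.append(level)  # final level holds only the root
    return levels


def inclusion_proof(levels: list, index: int) -> list:
    """Collect the sibling hash at each level, noting whether it sits on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: list, root: bytes) -> bool:
    """Rebuild the path from the leaf to the root and compare with the committed root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical training records; only `root` would be published on chain.
training_records = [b"record-0", b"record-1", b"record-2"]
levels = merkle_levels(training_records)
root = levels[-1][0]
proof = inclusion_proof(levels, 2)
```

The Verifier needs only the published root, the record in question, and the short proof; `verify_inclusion(b"record-2", proof, root)` returns `True`, while any altered record fails the check.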