Artificial Intelligence
Artificial intelligence is another rapidly evolving technology that has become a driving force in business. However, there is a looming threat to this industry: centralization.
AI today is primarily controlled by the likes of OpenAI, Meta, Google, and other massive corporations. While they have been effective at bringing this technology to market and making it readily available to the public, private ownership creates a veil of secrecy around how AI models are trained. When you ask ChatGPT a question and it responds, there is no way to verify exactly where the data behind that answer came from or how accurate it is.
This training secrecy can give rise to issues such as:
- Inherent social or political bias in AI responses (for example, when Google Gemini generated images depicting white historical figures as people of color)
- Misleading claims or misinformation on specific topics (for example, when Google's AI Overviews feature claimed it is healthy to eat rocks)
- Heavy censorship (for example, multiple AI models refuse to answer politically sensitive questions)
The quality of a model's output depends directly on the quality of the data it is trained on: garbage in, garbage out.
This is exactly where decentralization can benefit AI: through decentralized data markets.
Currently, acquiring the training data needed to build high-quality models is a complicated process that many people cannot undertake. Companies like OpenAI face dozens of lawsuits accusing them of not fairly compensating the people whose data was used to train ChatGPT. Meanwhile, some companies pay millions of dollars for user data to train their own models; for example, Google struck a deal with Reddit reportedly worth $60 million per year to use Reddit's user data to train its AI models.
Smaller companies and individuals who want to train their own AI models generally cannot access data at this scale, which gives a major advantage to large companies with the resources to acquire it.
Source: OECD AI Policy Observatory
Blockchain can create a fair and open market for buying and selling AI training data. Just as decentralized exchanges let users swap cryptocurrencies peer to peer, the same model can be adapted into an exchange for AI training data.
Platforms such as Kaggle already host community-contributed datasets on a wide range of topics, but blockchain can take this a step further by creating immutable proof of exactly where the data in a dataset came from. It does this by publishing a public record on-chain that tracks the origin of each dataset. Each on-chain dataset comes with metadata showing who published it, where it originally came from, and any other information needed to properly credit the source.
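To make the idea of an immutable provenance record concrete, here is a minimal sketch that models the on-chain registry as an append-only, hash-linked log. It assumes a hypothetical `ProvenanceLog` class and hypothetical record fields (`publisher`, `origin`, `content_hash`); a real blockchain also needs consensus across many independent nodes, which this single-process sketch does not attempt.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List

GENESIS = "0" * 64  # placeholder hash that the first record points back to


@dataclass
class DatasetRecord:
    """Hypothetical provenance entry for one published dataset."""
    dataset_id: str
    publisher: str
    origin: str
    content_hash: str  # SHA-256 of the dataset's raw bytes
    prev_hash: str     # hash of the previous record, linking the log


def record_hash(record: DatasetRecord) -> str:
    """Hash every field of a record in a stable (sorted-key) order."""
    payload = json.dumps(asdict(record), sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


class ProvenanceLog:
    """Append-only, hash-linked log standing in for an on-chain registry."""

    def __init__(self) -> None:
        self.records: List[DatasetRecord] = []
        self.head = GENESIS  # hash of the most recent record

    def publish(self, dataset_id: str, publisher: str, origin: str, data: bytes) -> DatasetRecord:
        record = DatasetRecord(
            dataset_id=dataset_id,
            publisher=publisher,
            origin=origin,
            content_hash=hashlib.sha256(data).hexdigest(),
            prev_hash=self.head,
        )
        self.records.append(record)
        self.head = record_hash(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain of hashes and check it still ends at `head`."""
        expected = GENESIS
        for record in self.records:
            if record.prev_hash != expected:
                return False
            expected = record_hash(record)
        return expected == self.head


log = ProvenanceLog()
log.publish("frogs-v1", "alice", "https://example.org/frog-survey", b"species,count\n")
log.publish("frogs-v2", "bob", "field notes", b"species,habitat\n")
print(log.verify())  # True
log.records[0].origin = "copied from a forum"  # tamper with history
print(log.verify())  # False
```

Because each record stores the hash of the one before it, and the log remembers the hash of its latest record, editing any stored record makes `verify()` fail. On a real chain, that latest hash is effectively fixed by the network, which is what makes the history hard to rewrite.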
This data tracking system has the potential to produce AI models that are not only more accurate but also better at crediting the people who created their training data.
For example, if you ask GPT-4 to write a paper on frogs, you have no idea whether the data behind its response came from a published scientific journal or from Wikipedia. With an AI model backed by blockchain, each response could include metadata showing the exact sources it drew on. This would create a much more effective referencing system: users could still use AI to generate responses, but they could also back those responses with the actual data they were generated from.
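Today's large language models do not record which training documents shaped a given answer, so the sketch below swaps in a different technique, retrieval with attribution: the system first looks up matching source records and then returns their metadata alongside the answer. Everything here (`Source`, `answer_with_sources`, the word-overlap scoring, and the sample records) is hypothetical and kept deliberately simple.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Source:
    """Hypothetical record pairing a text snippet with its provenance metadata."""
    record_id: str
    publisher: str
    origin: str
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase words with surrounding punctuation removed."""
    return {word.strip(".,?!").lower() for word in text.split()}


def answer_with_sources(question: str, corpus: List[Source], top_k: int = 2) -> Dict:
    """Pick the sources sharing the most words with the question and cite them."""
    q = tokenize(question)
    ranked = sorted(corpus, key=lambda s: len(q & tokenize(s.text)), reverse=True)
    picked = [s for s in ranked[:top_k] if q & tokenize(s.text)]  # drop zero-overlap hits
    return {
        "answer": " ".join(s.text for s in picked),
        "sources": [
            {"record_id": s.record_id, "publisher": s.publisher, "origin": s.origin}
            for s in picked
        ],
    }


corpus = [
    Source("rec-1", "herpetology-lab", "peer-reviewed journal", "Frogs absorb water through their skin."),
    Source("rec-2", "wiki-editor", "Wikipedia", "Frogs are amphibians that lay eggs in water."),
    Source("rec-3", "bob", "forum post", "Bitcoin mining uses a lot of energy."),
]
result = answer_with_sources("How do frogs absorb water?", corpus)
print([s["record_id"] for s in result["sources"]])  # ['rec-1', 'rec-2']
```

A production system would use far better ranking than word overlap, but the shape of the output is the point: every answer travels with a list of the records it came from.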
💡 Touch Grass
Grass Protocol is an L2 rollup that acts as a data scraper to generate AI training data. Here's how the process works:
- Grass nodes collect data from the Internet by scraping it and turning it into organized datasets
- Each new dataset gets metadata recording where the data came from, when it was collected, and which node collected it, so that all collected data can be correctly referenced when used
- Node operators are then paid in cryptocurrency for their work, which serves as the network's incentive mechanism
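The three steps above can be sketched as a small pipeline. This is not Grass's actual software: the `scrape` and `pay_nodes` functions, the crude tag-stripping cleanup, and the flat `REWARD_PER_RECORD` rate (in hypothetical integer token units) are all illustrative assumptions.

```python
import re
import time
from dataclasses import dataclass
from typing import Dict, List

REWARD_PER_RECORD = 10  # hypothetical payout, in integer token units, per record


@dataclass
class ScrapedRecord:
    content: str
    origin_url: str      # where the data came from
    collected_at: float  # Unix timestamp of collection
    node_id: str         # which node collected it


def scrape(node_id: str, pages: Dict[str, str]) -> List[ScrapedRecord]:
    """Steps 1 and 2: clean raw pages and tag each record with its metadata."""
    records = []
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, for illustration only
        text = " ".join(text.split())          # collapse whitespace
        if text:  # skip pages with no usable text
            records.append(ScrapedRecord(text, url, time.time(), node_id))
    return records


def pay_nodes(records: List[ScrapedRecord]) -> Dict[str, int]:
    """Step 3: credit each node for the records it contributed."""
    payouts: Dict[str, int] = {}
    for record in records:
        payouts[record.node_id] = payouts.get(record.node_id, 0) + REWARD_PER_RECORD
    return payouts


batch = scrape("node-a", {"https://example.com/a": "<p>Frogs  are amphibians</p>",
                          "https://example.com/b": "<div></div>"})
batch += scrape("node-b", {"https://example.com/c": "<h1>Tree frogs</h1>"})
print(pay_nodes(batch))  # {'node-a': 10, 'node-b': 10}
```

Because the metadata is attached at collection time, every downstream dataset can still be traced back to its URL, timestamp, and collecting node.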
Check out the Grass Protocol blog to learn more about how this process works.
While training data is a critical part of AI, the ability to actually train an AI model is equally important.
Training AI models is a compute-intensive process that requires expensive hardware and a lot of energy. Sam Altman has suggested that GPT-4 cost more than $100 million to train, which shows how much effort goes into training high-quality models. Fortunately, blockchain can help not only with sourcing training data but also with distributing the cost and effort of training an AI model.
Instead of relying on one huge data center filled with millions of dollars' worth of hardware, a blockchain-based approach uses its decentralized design to split the workload across an entire network. Anyone can help train the model by contributing their own computing power to the network and receiving cryptocurrency in exchange. To participate, users would simply install the designated software on their device and let it run, similar to how Bitcoin mining works.
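A common way to split training across machines is data parallelism: each node computes a gradient on its own shard of the data, and the results are averaged into one update. The sketch below applies that to a tiny linear model, `y = w*x + b`, and then splits a hypothetical reward pool in proportion to the samples each node processed. It assumes honest nodes and a single coordinator; real decentralized networks also have to verify work from untrusted participants, which is omitted here.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_gradient(w: float, b: float, shard: List[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradient for y ~ w*x + b on one node's shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw / len(shard), gb / len(shard)


def train_distributed(shards: Dict[str, List[Sample]], steps: int = 2000, lr: float = 0.1):
    """Synchronous data-parallel gradient descent; also counts samples per node."""
    w, b = 0.0, 0.0
    total = sum(len(shard) for shard in shards.values())
    work = {node: 0 for node in shards}
    for _ in range(steps):
        gw = gb = 0.0
        for node, shard in shards.items():  # in practice these run on separate machines
            g_w, g_b = local_gradient(w, b, shard)
            share = len(shard) / total       # weight by the node's share of the data
            gw += share * g_w
            gb += share * g_b
            work[node] += len(shard)
        w -= lr * gw
        b -= lr * gb
    return w, b, work


def split_rewards(work: Dict[str, int], pool: float) -> Dict[str, float]:
    """Divide a hypothetical reward pool in proportion to samples processed."""
    done = sum(work.values())
    return {node: pool * n / done for node, n in work.items()}


data = [(i / 10, 2 * (i / 10) + 1) for i in range(20)]  # points on y = 2x + 1
shards = {"node-a": data[:10], "node-b": data[10:16], "node-c": data[16:]}
w, b, work = train_distributed(shards)
print(round(w, 3), round(b, 3))    # 2.0 1.0
print(split_rewards(work, 100.0))  # {'node-a': 50.0, 'node-b': 30.0, 'node-c': 20.0}
```

Weighting each node's gradient by its share of the data makes the averaged update identical to a full-batch gradient step, so splitting the work changes who does it, not the trained result.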
💡 Bittensor
A popular project already following this approach is Bittensor, a network pioneering the concept of decentralized AI.
Bittensor takes a distinctive approach: it is divided into smaller networks, called subnets, each of which operates independently and handles a specific aspect of AI modeling. For example, Subnet 1 handles text prompting, Subnet 7 handles storage, and Subnet 27 handles compute. Subnet 0 is the main subnet and handles governance for the entire Bittensor network. Together, the subnets form a censorship-resistant, open-source, community-owned AI system that anyone can use.
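The subnet layout can be pictured as a registry that routes each request only to the subnet specializing in that task. This is a hypothetical sketch, not the Bittensor SDK: the `Subnet` class, `route` function, and echoing placeholder miners are invented for illustration, and the subnet numbers simply mirror the examples in the paragraph above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Miner = Callable[[str], str]


@dataclass
class Subnet:
    subnet_id: int
    task: str
    miners: List[Miner]  # placeholder workers that serve this subnet's task


def make_miner(name: str) -> Miner:
    """Hypothetical miner that just echoes the request it was given."""
    return lambda request: f"{name} handled: {request}"


# Subnet numbers follow the examples in the text. Subnet 0 (governance) is left out
# because it coordinates the network rather than serving user requests.
REGISTRY: Dict[str, Subnet] = {
    "text_prompting": Subnet(1, "text_prompting", [make_miner("miner-1a"), make_miner("miner-1b")]),
    "storage": Subnet(7, "storage", [make_miner("miner-7a")]),
    "compute": Subnet(27, "compute", [make_miner("miner-27a")]),
}


def route(task: str, request: str) -> List[str]:
    """Send a request only to the subnet that specializes in the given task."""
    subnet = REGISTRY.get(task)
    if subnet is None:
        raise ValueError(f"no subnet registered for task {task!r}")
    return [miner(request) for miner in subnet.miners]


print(route("text_prompting", "Summarize this article"))
# ['miner-1a handled: Summarize this article', 'miner-1b handled: Summarize this article']
```

Keeping each task on its own subnet means a new capability can be added by registering another subnet, without changing the existing ones.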
Bittensor has its own token, TAO, which is the network's main currency. Node operators are paid in TAO, and users can stake TAO directly on subnets to help secure the network and earn rewards.
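How TAO might flow to node operators and stakers can be approximated with two proportional splits: miners are paid according to validator scores weighted by each validator's stake, and a staking reward is shared pro rata among stakers. This is a simplified illustration, not Bittensor's actual Yuma Consensus, which adds further safeguards against dishonest validators; all function names, accounts, and amounts are hypothetical.

```python
from typing import Dict


def miner_payouts(scores: Dict[str, Dict[str, float]],
                  validator_stake: Dict[str, float],
                  emission: float) -> Dict[str, float]:
    """Split an emission among miners by stake-weighted validator scores.

    scores[validator][miner] is that validator's rating of the miner.
    """
    total_stake = sum(validator_stake.values())
    combined: Dict[str, float] = {}
    for validator, ratings in scores.items():
        weight = validator_stake[validator] / total_stake  # more stake, more influence
        for miner, score in ratings.items():
            combined[miner] = combined.get(miner, 0.0) + weight * score
    total = sum(combined.values())
    if total == 0:
        return {miner: 0.0 for miner in combined}
    return {miner: emission * c / total for miner, c in combined.items()}


def staker_rewards(stakes: Dict[str, float], reward: float) -> Dict[str, float]:
    """Share a subnet's staking reward pro rata among the accounts staked on it."""
    total = sum(stakes.values())
    return {account: reward * amount / total for account, amount in stakes.items()}


scores = {"validator-1": {"miner-a": 1.0, "miner-b": 0.0},
          "validator-2": {"miner-a": 0.0, "miner-b": 1.0}}
print(miner_payouts(scores, {"validator-1": 3.0, "validator-2": 1.0}, emission=100.0))
# {'miner-a': 75.0, 'miner-b': 25.0}
print(staker_rewards({"alice": 30.0, "bob": 10.0}, reward=8.0))
# {'alice': 6.0, 'bob': 2.0}
```

Weighting validators by stake means a validator with more TAO at risk has more say over which miners get paid, which is the intuition behind staking "contributing to the security of the network."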
Check out the Bittensor documentation for a more detailed breakdown of this system.
To dive deeper into the role of cryptocurrency and blockchain within AI, check out this report by MV Capital on Crypto x AI.
Source: Bittensor
Applied to AI, blockchain offers new solutions that could greatly improve our ability to build accurate, less biased, and censorship-resistant models. Its open, collaborative nature lets anyone participate, and more importantly earn money, by contributing computing power or liquidity to the network. It is critical for AI to follow this route rather than become a technology controlled solely by powerful companies.
📋 Practice Question