On IPLD
This was in my drafts from a few months back, right when I got really excited about IPFS, content addressing data, and the potential future applications that will be built on the protocol.
The InterPlanetary File System, or IPFS, is a protocol and network designed to create a permanent and decentralized method of storing and sharing files. It is similar to a single BitTorrent swarm exchanging data, with no central server involved. Data is addressed by its cryptographic hash, so it can be stored and served by anyone in the network. Because the address is derived from the content, data is self-authenticating: anyone who receives it can recompute the hash and verify its integrity. The network does not depend on any centralized server for storing or sharing files. Instead, it depends on the peers that are connected to it, which store and share the data. Not every node holds every file; each node keeps the content it has added, pinned, or recently fetched, so popular data ends up replicated across many machines, and the nodes work together to keep the network functioning.
Content addressing is a true computational advancement in the way that we think about adding and retrieving content on the web. We can take existing databases and use the various parts of the IPFS protocol to build clusters of nodes that serve content in the form of IPLD structures.
IPLD enables futureproof, immutable, secure graphs of data that are essentially giant Merkle trees. Converting a relational database or key-value database into a Merkle tree brings many advantages. The main difference between IPFS and IPLD is that IPFS is concerned with the storage of large data, or files, whereas IPLD is concerned with the storage of data structures.
The data and named links give the collection of IPFS objects the structure of a Merkle DAG: DAG meaning Directed Acyclic Graph, and Merkle signifying that this is a cryptographically authenticated data structure that uses cryptographic hashes to address content.
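To make the "data plus named links" idea concrete, here is a minimal sketch in Python. It is not the IPFS object format: the in-memory `store`, the JSON serialization, and the helper names `put_node` and `resolve` are all assumptions made for illustration. It only shows that each node's address is the hash of its content, including the addresses of the nodes it links to.

```python
import hashlib
import json

store = {}  # hypothetical in-memory block store: address -> serialized node


def put_node(data, links=None):
    """Store a node made of data plus named links; return its hash address."""
    node = {"data": data, "links": links or {}}
    # Canonical serialization so the same node always hashes to the same address.
    encoded = json.dumps(node, sort_keys=True, separators=(",", ":")).encode("utf-8")
    address = hashlib.sha256(encoded).hexdigest()
    store[address] = encoded
    return address


def resolve(address, path):
    """Follow named links from a root address, e.g. the path 'docs/readme'."""
    node = json.loads(store[address])
    for name in [part for part in path.split("/") if part]:
        address = node["links"][name]
        node = json.loads(store[address])
    return node["data"]


# Build the graph leaves-first: a parent can only link to a child whose hash
# already exists, which is also why the graph cannot contain cycles.
readme = put_node("hello from the leaf")
docs = put_node("docs directory", {"readme": readme})
root = put_node("root", {"docs": docs})
```

Because every parent embeds the hashes of its children, changing the leaf changes the address of `docs` and of `root` as well. Calling `resolve(root, "docs/readme")` walks the named links and returns the leaf's data.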
This protocol enables an entirely new way to search and retrieve data. It is no longer about where a file is located on a website. It is now about what exact piece of data you are looking for. If I send you an email with a link in it and 30 days later you click the link, how can you be certain that the data you are looking at is the same as what I originally sent you? You can’t.
With IPLD you can use content addressing to know for certain that a piece of content has not changed. You can traverse the IPLD object and seamlessly pick out pieces of the data. Once that data is locally cached, you can also use the application offline.
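The email link example can be sketched in a few lines of Python. This is an illustration under simple assumptions, not how IPFS encodes its addresses: the address here is a plain SHA-256 hex digest, and `address_of` and `verify` are made-up helper names.

```python
import hashlib


def address_of(content: bytes) -> str:
    """Derive the address from the content itself."""
    return hashlib.sha256(content).hexdigest()


def verify(content: bytes, expected_address: str) -> bool:
    """Recompute the hash of what was received and compare it to the link."""
    return address_of(content) == expected_address


original = b"quarterly report, version 1"
link = address_of(original)  # this is what goes in the email

tampered = b"quarterly report, version 2"
```

Thirty days later, `verify(original, link)` is true and `verify(tampered, link)` is false, no matter which server or peer the bytes came from.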
You can work on an application with other peers around you as well, using a shared state. But why is this important for businesses?
Content addressing for companies will ensure a number of open standards. We can now build fully encrypted private networks that are content addressed.
This is a future proof system. Hashes that are currently used in databases can be broken, but now we have the multihash format, which lets a system move to a new hash function.
We can build blockchains that use IPLD and libp2p.
The IPLD resolver uses two main pieces: bitswap and the blocks-service.
Bitswap transfers blocks, while the blocks-service determines what needs to be fetched from the network based on what is already in the local cache. This prevents duplication and increases efficiency in the system.
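The division of labor described above can be sketched roughly as follows. This is a toy model, not the real bitswap or blocks-service code: `BlockService`, `fetch_remote`, and the `peers` dictionary standing in for the network are all hypothetical.

```python
import hashlib


class BlockService:
    """Serve blocks from a local cache, fetching from peers only on a miss."""

    def __init__(self, fetch_remote):
        self.local = {}                   # address -> block bytes already held
        self.fetch_remote = fetch_remote  # stand-in for the block exchange
        self.remote_fetches = 0

    def get(self, address):
        if address in self.local:
            return self.local[address]    # cache hit: nothing to transfer
        block = self.fetch_remote(address)
        self.remote_fetches += 1
        # Content addressing lets us check the block no matter who sent it.
        if hashlib.sha256(block).hexdigest() != address:
            raise ValueError("block does not match its address")
        self.local[address] = block
        return block


block = b"some block of data"
addr = hashlib.sha256(block).hexdigest()
peers = {addr: block}                     # pretend network: address -> block
service = BlockService(peers.__getitem__)
```

Asking for the same address twice triggers only one remote fetch; the second request is answered from the local cache.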
We will be creating a resolver for the enterprise that enables them to take their existing database tables and convert them into giant merkle trees that are interoperable. IPLD is like a global adapter for cryptographic protocols.
Creating the Enterprise Forest
Here is where it gets interesting. The enterprise uses a few different types of database: relational SQL databases and distributed NoSQL databases.
Salesforce is another type of database that we can take and convert into a Merkle tree.
S3 is another type of data store.
I call this tables to trees.
We are going to take systems that are on-premise and siloed, or centralized with a cloud provider, and turn them into Merkle trees using IPLD content addressing.
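As a rough sketch of "tables to trees", the Python below hashes every row of a table into its own block and then builds a root block that links to each row by primary key. The schema, the `table_to_tree` helper, and the JSON plus SHA-256 encoding are assumptions for illustration; a real resolver would use an IPLD format, and large tables would need intermediate nodes rather than one flat root.

```python
import hashlib
import json
import sqlite3


def address_of(obj) -> str:
    """Hash a canonical JSON encoding of the object."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def table_to_tree(conn, table, key_column):
    """Return (root_address, blocks) for one table.

    Table and column names are interpolated directly, so they must be
    trusted values; this is a sketch, not production SQL handling.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}")
    columns = [col[0] for col in cursor.description]
    blocks, links = {}, {}
    for row in cursor:
        record = dict(zip(columns, row))
        addr = address_of(record)
        blocks[addr] = record                     # one block per row
        links[str(record[key_column])] = addr     # named link: key -> row hash
    root = {"table": table, "rows": links}
    root_addr = address_of(root)
    blocks[root_addr] = root
    return root_addr, blocks


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
                 [(1, "alice", 100), (2, "bob", 250)])
root_before, blocks_before = table_to_tree(conn, "accounts", "id")

conn.execute("UPDATE accounts SET balance = 90 WHERE id = 1")
root_after, blocks_after = table_to_tree(conn, "accounts", "id")
```

After the update, the root address changes and so does the link for row 1, while the link for row 2 is identical in both trees. That is what makes the mirror tamper evident and lets unchanged rows be shared between versions.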
The IPLD resolver is an internal DAG API module.
We can create a plug and play system of resolvers. This is where a company can take their existing relational database and keep it.
We will resolve the database and run a blockchain in parallel. This blockchain will be built using two protocols from the IPFS project: IPLD and libp2p.
The IPLD Resolver for the enterprise will consist of:
.put
.get
.remove
.support.add
.support.rm
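One way to picture how these five calls fit together is the Python sketch below. It borrows only the method names from the list above; the class layout, the signatures, and the `dag-json-sketch` format name are hypothetical and are not the actual IPLD resolver API.

```python
import hashlib
import json


class FormatSupport:
    """Registry of data formats the resolver understands (.support.add / .support.rm)."""

    def __init__(self):
        self.formats = {}  # format name -> (serialize, deserialize)

    def add(self, name, serialize, deserialize):
        self.formats[name] = (serialize, deserialize)

    def rm(self, name):
        self.formats.pop(name, None)


class Resolver:
    def __init__(self):
        self.support = FormatSupport()
        self.blocks = {}  # address -> (format name, encoded bytes)

    def put(self, node, fmt):
        if fmt not in self.support.formats:
            raise KeyError(f"no support registered for format {fmt!r}")
        serialize, _ = self.support.formats[fmt]
        encoded = serialize(node)
        address = hashlib.sha256(encoded).hexdigest()
        self.blocks[address] = (fmt, encoded)
        return address

    def get(self, address):
        fmt, encoded = self.blocks[address]
        _, deserialize = self.support.formats[fmt]
        return deserialize(encoded)

    def remove(self, address):
        self.blocks.pop(address, None)


resolver = Resolver()
resolver.support.add(
    "dag-json-sketch",  # made-up format name for this example
    lambda node: json.dumps(node, sort_keys=True).encode("utf-8"),
    lambda raw: json.loads(raw),
)
address = resolver.put({"id": 1, "name": "alice"}, "dag-json-sketch")
```

The plug and play part is the `support` registry: adding a new database type means registering one more format, while `put`, `get`, and `remove` stay the same.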
We will take any enterprise database and build out the content addressed merkle tree on it.
Content Addressing enterprise content on IPFS-Clusters
An enterprise can consist of 10,000 nodes, 50,000 nodes, or 100,000 nodes, and each IPLD object has to be under 1 MB.
All of those nodes will host the Merkle tree mirror of the relational database.
This can also enable offline operation for the network. Essentially they have their own protocol that is mirroring their on-premise system.
We will be starting with dag-mySQL, dag-noSQL, and dag-apex.
The MySQL hash function that exists on the on-premise system stays in place when this is implemented. If that hash is ever broken, there is no way to upgrade that system without completely migrating it.
Because IPLD hashes describe which function produced them, content addressing makes data migration a lot easier, or not even necessary, in the future. Once the data is content addressed and forms the Merkle tree, we can start traversing the data.
We will also build interfaces that can interact with the IPLD data.
IPLD is a format that can enable version control. The resolver will essentially take any database, any enterprise implementation, and convert it into a merkle tree.
We are essentially planting the seeds (product) and watering them (services). Once these trees are all in place, they can communicate because they are all using the same data format.
Future Proofing your Business
We are creating distributed, authenticated, hash-linked data structures. IPLD is a common hash-chain format for distributed data structures.
Each node in these Merkle trees will have a unique Content Identifier (CID), a format for these hash-links.
This is a database-agnostic path notation: any hash, any data format.
A CID builds on multihash (multiple cryptographic hashes), multicodec (multiple serialization formats), and multibase (multiple base encodings).
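To show what "self-describing" means for the hash part, here is a small Python sketch that builds a sha2-256 multihash by hand and prints it in base58btc. The helper names are made up. The result is the multihash of the raw bytes only; it is not necessarily the identifier IPFS would assign to the same data as a file, since IPFS may chunk and wrap content before hashing.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def sha256_multihash(content: bytes) -> bytes:
    """Prefix the digest with its hash function code and its length."""
    digest = hashlib.sha256(content).digest()
    # 0x12 is the multihash code for sha2-256; 0x20 is the digest length (32 bytes).
    return bytes([0x12, len(digest)]) + digest


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


multihash = sha256_multihash(b"hello")
text_form = base58btc(multihash)
```

The first two bytes tell a reader which function to verify with and how long the digest is. Those same two bytes are also why base58 encoded sha2-256 multihashes all begin with "Qm". If sha2-256 is ever broken, new links can carry a different function code while old links remain readable.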
Again, why is this important for businesses? The most important reasons are transparency and security. This is a tamper-proof, tamper-evident database that can be shared, traversed, replicated, distributed, cached, and encrypted, and you now know exactly WHAT you are linking to, not where. You know which hash function to verify with. You know what base it is in.
This new protocol enables cryptographic systems to interoperate over a p2p network that serves hash linked data.
IPFS 0.4.5 includes the dag command, which can be used to traverse IPLD objects.
Now to write the dag-sql resolver.
Take any existing relational database and you can now traverse it using content addressing.
Your database is content addressed onto a cluster of IPFS Cluster nodes on a private encrypted network.
A deterministic head of the cluster then writes new entries.
We use the Ethereum network to assign a key pair for your users to leverage the mobile interface. You can sign in via fingerprint or by facial recognition using the Microsoft Cognitive Toolkit. Your database will run in parallel: you will keep your on-premise system and have a content addressed mirror of it. Content addressing a filesystem or any type of database creates a Merkle DAG. With this Merkle DAG we can format your data in a way that is secure, immutable, tamper proof, futureproof, and able to communicate with other underlying network protocols and application stacks. We can effectively create a blockchain network out of your existing database that runs securely on a cluster of p2p nodes. I am planting business Merkle DAG seeds in the Merkle forest. Patches of this forest will be able to communicate with other protocols via any hash and in any format.
This is the way that the internet will work going into the future. A purely decentralized web of business trees.
Continued:
On Graph Data Types
DAG:
In a graph model, each vertex consists of:
A unique identifier
A set of outgoing edges
A set of incoming edges
A collection of properties (key-value pairs)
Each edge consists of:
A unique identifier
The vertex at which the edge starts (the tail vertex)
The vertex at which the edge ends (the head vertex)
A label to describe the kind of relationship between the two vertices
A collection of properties (key-value pairs)
Important aspects of the model:
Any vertex can have an edge connecting it with any other vertex. There is no schema that restricts which kinds of things can or cannot be associated.
Given a vertex, you can efficiently find both its incoming and its outgoing edges, and thus traverse the graph, i.e. follow a path through a chain of vertices, both forward and backward. This is why you can traverse a hashed blockchain with a resolver.
By using different labels for different kinds of relationships, you can store several different kinds of information in a single graph, while still maintaining a clean data model.
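The model above maps onto a small data structure. The Python below is a minimal sketch with made-up names (`Graph`, `successors`, `predecessors`); it keeps one index of outgoing edges and one of incoming edges per vertex so that traversal in both directions is a direct lookup.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.vertices = {}                 # vertex id -> properties
        self.edges = {}                    # edge id -> (tail, head, label, properties)
        self.outgoing = defaultdict(list)  # vertex id -> ids of edges that start here
        self.incoming = defaultdict(list)  # vertex id -> ids of edges that end here

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, edge_id, tail, head, label, **properties):
        self.edges[edge_id] = (tail, head, label, properties)
        self.outgoing[tail].append(edge_id)
        self.incoming[head].append(edge_id)

    def successors(self, vertex_id, label=None):
        """Follow outgoing edges forward, optionally filtered by label."""
        return [self.edges[e][1] for e in self.outgoing[vertex_id]
                if label is None or self.edges[e][2] == label]

    def predecessors(self, vertex_id, label=None):
        """Follow incoming edges backward, optionally filtered by label."""
        return [self.edges[e][0] for e in self.incoming[vertex_id]
                if label is None or self.edges[e][2] == label]


g = Graph()
for name in ("block1", "block2", "block3", "miner_a"):
    g.add_vertex(name)
g.add_edge("e1", "block2", "block1", "prev")
g.add_edge("e2", "block3", "block2", "prev")
g.add_edge("e3", "block3", "miner_a", "mined_by")  # a second kind of relationship
```

Here `g.successors("block3", "prev")` walks back toward the genesis block, `g.predecessors("block1", "prev")` walks forward from it, and the `mined_by` label stores a different kind of relationship in the same graph without any schema change.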