Hashchains for Immutable Data Storage using the IOTA Tangle

Sep 11, 2020

Introduction

In recent times the transaction fees on the major blockchains have exploded, making one of their core use cases, immutable data storage, extremely unattractive. This has caused many people, including myself, to restart the search for alternative solutions.
The IOTA Tangle, with its fee-less architecture, would seem to be the perfect candidate, were it not for the fact that the nodes in the network only store data for a certain amount of time (around 30 days) and then delete it to make space for new data. This allows the nodes to remain lightweight and compatible with the targeted IoT hardware.

There are a few initiatives to bring permanent data storage to IOTA, the two most notable are: the official IOTA permanode “Chronicle” which will permanently store all transactions sent to the network; and Olaf van Wijk’s “AION” which is a selective permanode that will be able to only store transactions selected by the user, together with a proof that they are valid.

Chronicle is the more mature of the two projects and will be a very useful solution for industry applications. It will also allow companies to provide “storage-as-a-service” to more casual users.
AION is still in research, but once it is production-ready it will be the ideal solution for permanent data storage, as it allows users to store only the data they care about. As such, it will most likely make other IOTA perma-storage solutions obsolete (the one presented here included).

The Hashchain mechanism was developed as a “quick fix” for the meantime. It does NOT solve the problem of permanently storing IOTA transactions, but it does enable verifiably immutable storage of selected data without fees.

The Hashchain mechanism

The Hashchain is a solution to create a chain of proofs about the integrity of data stored in a database. It is heavily inspired by the principles and mechanisms of traditional blockchains, as it uses a chain of data-blocks linked by hashes, but in a much simpler way.

To store data on the Hashchain, it is sent to the server over an API (public or private), where it is timestamped and collected in the “mempool”.
Every interval T (which can be seconds, minutes or days, depending on the use case) a “block” is created by hashing all of the data in the mempool together with metadata such as the current timestamp and the hash of the previous block. This new block is sent to the IOTA Tangle in a 0-value transaction. The returned transaction hash is appended to the block header and everything is stored in the local database, from where it can be queried through an API (public or private).
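
The block-creation step can be sketched in a few lines of Python. This is only an illustration, not the prototype's code: the field names, the JSON serialization, the all-zero "previous hash" for the genesis block and the choice of SHA-256 are all assumptions, and the step that sends the block to the Tangle is left out.

```python
import hashlib
import json
import time


def hash_block(data, timestamp, previous_hash):
    """Hash the block's data together with its metadata (SHA-256 is an assumption)."""
    payload = json.dumps(
        {"data": data, "timestamp": timestamp, "previous_hash": previous_hash},
        sort_keys=True,  # stable key order so the same block always hashes the same
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def create_block(mempool, previous_hash, timestamp=None):
    """Turn the current mempool content into a block and empty the mempool."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    data = list(mempool)
    block = {
        "data": data,
        "timestamp": timestamp,
        "previous_hash": previous_hash,
        "block_hash": hash_block(data, timestamp, previous_hash),
        "tx_hash": None,  # to be filled in once the Tangle returns the transaction hash
    }
    mempool.clear()
    return block


# One entry was collected during the interval T, then the block is created.
mempool = [{"received_at": 1599811200, "payload": "sensor reading 1"}]
genesis = create_block(mempool, previous_hash="0" * 64, timestamp=1599811260)
```

Calling `create_block` again after the next interval, with `genesis["block_hash"]` as `previous_hash`, is what links the blocks into a chain.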

This process creates a chain of data-blocks in the database that, initially, will also be fully stored on the Tangle, so the immutability of the data in the database can be verified by matching it with the data on the Tangle.
But after some time the data will be deleted by all nodes in the IOTA network and will no longer be available on the Tangle. It will therefore no longer be possible to verify the integrity of all the data directly. This is the point where the blockchain-like architecture of the Hashchain comes into play, as under the right circumstances it will still be possible to verify the integrity of the data in the database.

Verifying the Hashchain

In order to verify that some data in the database is valid (not altered since it was first added), we need to start at the genesis block and first hash the data and metadata of each block to verify that the block-hashes themselves are valid. If so, we can move on to verify that the “previous hash” stored in each block matches the hash of the block issued before it. If this is also the case for all blocks in the chain, we have a cryptographically “correct” Hashchain.
But given that it is stored on a single server, with no proof-of-work involved, it would still be possible for the database owner to create a correct chain at any point in time with any data they like.
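
A minimal sketch of this correctness check, again in Python and under the same assumptions as before (hypothetical field names, JSON serialization, SHA-256), could look as follows. It also shows why correctness alone is not enough: a chain that the owner rebuilds from scratch with different data passes the check just as well.

```python
import copy
import hashlib
import json


def hash_block(data, timestamp, previous_hash):
    """Hash the block's data together with its metadata (SHA-256 is an assumption)."""
    payload = json.dumps(
        {"data": data, "timestamp": timestamp, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries):
    """Build a chain from (data, timestamp) pairs, starting at the genesis block."""
    chain, previous_hash = [], "0" * 64
    for data, timestamp in entries:
        block_hash = hash_block(data, timestamp, previous_hash)
        chain.append({"data": data, "timestamp": timestamp,
                      "previous_hash": previous_hash, "block_hash": block_hash})
        previous_hash = block_hash
    return chain


def is_correct(chain):
    """Return True if every block-hash and every previous-hash link checks out."""
    for i, block in enumerate(chain):
        expected = hash_block(block["data"], block["timestamp"], block["previous_hash"])
        if block["block_hash"] != expected:
            return False  # the block content does not match its own hash
        if i > 0 and block["previous_hash"] != chain[i - 1]["block_hash"]:
            return False  # the link to the preceding block is broken
    return True


chain = build_chain([(["a"], 1), (["b"], 2), (["c"], 3)])

tampered = copy.deepcopy(chain)
tampered[0]["data"] = ["forged"]  # data edited in place, hashes left untouched

rebuilt = build_chain([(["forged"], 1), (["b"], 2), (["c"], 3)])  # everything re-hashed

print(is_correct(chain), is_correct(tampered), is_correct(rebuilt))  # True False True
```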

To verify that a Hashchain is not only correct but also “valid”, we need to find a block that contains the transaction hash of a transaction that is still stored “deep” in the Tangle and that contains the same data, metadata and block-hash as we found in the database. If we find this, we can be sure that the Hashchain is valid at least up to that block.
The reason for this is that if the data in any earlier block had been changed, then its block-hash would also change, and so would the “previous hash” in the subsequent block, which would in turn change the block-hash of that block. The change would therefore be “propagated” forward in the Hashchain until a block is reached that is still on the Tangle. The Tangle itself is secured by proof-of-work (and the coordinator, for now), so in order to change a transaction that is deep in the Tangle, i.e. one that has been referenced by many transactions that came afterwards, one would need to rebuild the entire Tangle, which is assumed not to be possible.

So if the Tangle is immutable, and the Tangle contains a valid cryptographic link, by means of hashes, to the data stored in the database, one can guarantee that said data has also stayed unaltered.
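
To make the argument concrete, here is a hedged Python sketch of the anchoring step. The Tangle is simulated by a plain dictionary that maps a transaction hash to the block content stored in that transaction, and only holds transactions assumed to be “deep” enough. The `TX0`, `TX1`, ... transaction hashes and all field names are made up for the example, and the chain is assumed to have already passed the correctness check.

```python
import hashlib
import json

FIELDS = ("data", "timestamp", "previous_hash", "block_hash")


def hash_block(data, timestamp, previous_hash):
    """Hash the block's data together with its metadata (SHA-256 is an assumption)."""
    payload = json.dumps(
        {"data": data, "timestamp": timestamp, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries):
    """Build a chain from (data, timestamp) pairs with placeholder transaction hashes."""
    chain, previous_hash = [], "0" * 64
    for i, (data, timestamp) in enumerate(entries):
        block_hash = hash_block(data, timestamp, previous_hash)
        chain.append({"data": data, "timestamp": timestamp,
                      "previous_hash": previous_hash, "block_hash": block_hash,
                      "tx_hash": f"TX{i}"})  # hypothetical transaction hash
        previous_hash = block_hash
    return chain


def last_anchored_index(chain, tangle):
    """Index of the newest block whose Tangle copy matches the database, else None."""
    for i in range(len(chain) - 1, -1, -1):
        block = chain[i]
        on_tangle = tangle.get(block["tx_hash"])
        if on_tangle is None:
            continue  # already deleted by the nodes, or never sent
        if all(on_tangle[field] == block[field] for field in FIELDS):
            return i  # the chain is valid at least up to this block
    return None


chain = build_chain([(["a"], 1), (["b"], 2), (["c"], 3)])

# Block 0 has already been deleted by the nodes; blocks 1 and 2 are still stored.
tangle = {block["tx_hash"]: dict(block) for block in chain[1:]}

# The owner changes block 0 and re-hashes the whole chain to keep it "correct".
forged = build_chain([(["forged"], 1), (["b"], 2), (["c"], 3)])

print(last_anchored_index(chain, tangle))   # 2
print(last_anchored_index(forged, tangle))  # None
```

Even though block 0 is gone from the simulated Tangle, changing it alters the hashes of blocks 1 and 2, which no longer match what is stored there.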

It is important to note that the transaction used to create the link needs to be deep enough in the Tangle (currently this means after the last milestone), because if it is not, it could still theoretically be changed.
If a very small T parameter is chosen, a few seconds for example, then one should not trust the most recent blocks.
If a larger T parameter is chosen, a few hours or days for example, then one should not trust the last block if it was issued only a few seconds ago.

Avoiding Forks

As with blockchains, it could be possible that the database owner decides to fork the Hashchain and, for some time, build multiple valid Hashchains containing different, maybe even conflicting, data.
The database owner could then show different users different valid chains, and therefore different data, and it would not be possible for them to know.
To avoid this, all the IOTA transactions should be sent to the same address, and this address should be provided together with the Hashchain as a sort of chain ID. This way, if the database owner tries to fork the chain, the verifier can easily detect that there is more than one chain being written to that address, and be suspicious of the database owner’s intent.
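
One simple way a verifier could look for such a fork is sketched below in Python. It assumes the verifier has already fetched and decoded all blocks currently stored under the address. The fetching itself is not shown, and the field names and short hash strings are made up for the example. A fork shows up as two different blocks that name the same “previous hash”.

```python
from collections import defaultdict


def find_forks(blocks_at_address):
    """Map each previous-hash to its competing block-hashes, if there is more than one."""
    children = defaultdict(set)
    for block in blocks_at_address:
        children[block["previous_hash"]].add(block["block_hash"])
    return {prev: sorted(hashes) for prev, hashes in children.items() if len(hashes) > 1}


# Hypothetical blocks read from one address: h2a and h2b both build on h1.
blocks = [
    {"previous_hash": "h0", "block_hash": "h1"},
    {"previous_hash": "h1", "block_hash": "h2a"},
    {"previous_hash": "h1", "block_hash": "h2b"},
    {"previous_hash": "h2a", "block_hash": "h3"},
]

print(find_forks(blocks))      # {'h1': ['h2a', 'h2b']}
print(find_forks(blocks[:2]))  # {}
```

As described in the next paragraph, this only covers forks whose transactions are still on the Tangle.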

The limitation of this is that if the database owner was building multiple Hashchains in the past and then dropped all except one, which he is currently building, and the transactions of the past fork(s) are no longer in the Tangle, it is impossible to know that they ever existed.

This means that the Hashchain mechanism can guarantee that the data in the chain has not been changed since it was added and that it is the only version of the data currently being built upon, i.e. there are currently no active forks. But it cannot guarantee that there have never been any forks in the past.

Prototype

I developed a Proof-of-Concept implementation of the Hashchain that can be found here: https://github.com/AleBuser/iota-hashchain.

The prototype is written in stable Rust. It uses the Diesel crate and a PostgreSQL database to store the Hashchain. The server receiving and answering requests is built with the Actix web framework. Interactions with the IOTA Tangle are made with the (old) iota-lib-rs crate.

Limitations of the Prototype

The throughput of the Hashchain prototype depends on the T parameter, which itself is limited by the number of IOTA transactions that can be issued.

In the current implementation the entire mempool content is sent in a single IOTA transaction. This means that if too much data is added to the mempool over a short period of time, it can become too big to be sent in a single transaction and will be sent in a bundle instead. The node will only store the transaction hash of the latest bundle, which won’t allow the built-in verification script to validate all of the data.

This could be fixed by using a more complex verifier that can recompose the original block from the multiple transactions in the bundle.
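
The recomposition idea can be illustrated with a small Python sketch. It is not how the prototype or the IOTA libraries handle bundles: the fragment layout (`index`, `last_index`, `fragment`) and the chunk size of 16 characters are assumptions chosen only to show the principle of splitting a serialized block and putting it back together before hashing.

```python
import json


def split_block(serialized, chunk_size):
    """Split a serialized block into ordered fragments, one per transaction."""
    chunks = [serialized[i:i + chunk_size] for i in range(0, len(serialized), chunk_size)]
    return [{"index": i, "last_index": len(chunks) - 1, "fragment": chunk}
            for i, chunk in enumerate(chunks)]


def recompose_block(fragments):
    """Rebuild the serialized block, refusing bundles with missing fragments."""
    if not fragments:
        raise ValueError("bundle is empty")
    ordered = sorted(fragments, key=lambda f: f["index"])
    expected = list(range(ordered[0]["last_index"] + 1))
    if [f["index"] for f in ordered] != expected:
        raise ValueError("bundle is incomplete")
    return "".join(f["fragment"] for f in ordered)


serialized = json.dumps({"data": ["a", "b", "c"], "timestamp": 3}, sort_keys=True)
fragments = split_block(serialized, chunk_size=16)

# The fragments may arrive in any order; the index restores the original block.
restored = recompose_block(list(reversed(fragments)))
print(restored == serialized)  # True
```

Once the block is restored, it can be hashed and compared against the database in the same way as a block that fitted into a single transaction.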