In this article, we look into why and how to import the Bitcoin blockchain, and we share some tips and tricks. Information on this topic is scattered across many sources, so this article is an attempt to bring it together and share our own experience.
Before diving into technical details, let’s start with the question: why import the blockchain at all? Here is a list of possible reasons:
– perform analytics: after the import, analytics will hopefully be more efficient than accessing the blockchain directly, because you can query your own database faster. For example, given an address, you can look up which transactions it was involved in.
– find malicious or fraudulent transactions: this speaks for itself, since several cases of stolen crypto assets have been reported.
– extract transaction fee data.
– have multiple wallets that connect to the blockchain directly.
– be independent of an external service that might not have all the data you need.
– instantly compute the balance of an address.
How to read the blockchain
There are two ways to read the Bitcoin blockchain: read it directly in binary format, or use bitcoin-core. Bitcoin-core is free and open-source software that runs a Bitcoin node (the nodes together form the Bitcoin network) and provides a Bitcoin wallet that fully verifies payments.
Directly in binary format
The first option is to read the Bitcoin blockchain directly in its binary format.
The big advantage of this method is that it is fast.
However, the big disadvantage is that it requires a lot of work:
- your program has to keep up with the binary format at all times,
- it has to read the data in the proper way, and
- it has to parse the data correctly, without mistakes.
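The following minimal Python sketch shows what reading the binary format involves. It assumes the classic layout of Bitcoin Core's blk*.dat files, in which each block is preceded by the 4-byte mainnet magic f9beb4d9 and a 4-byte little-endian length. It decodes only the fixed 80-byte block header, and the function names are our own. A full parser would also have to decode the variable-length transactions. It would also have to handle the fact that blocks are stored in the order they were received, not by height.

```python
import hashlib
import struct

# Network magic that precedes every block in mainnet blk*.dat files
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def iter_raw_blocks(data):
    """Yield raw blocks from blk*.dat-style bytes: magic (4) + size (4, LE) + block bytes."""
    offset = 0
    while offset + 8 <= len(data):
        magic = data[offset:offset + 4]
        if magic == b"\x00\x00\x00\x00":
            break  # zero-filled space preallocated at the end of the file
        if magic != MAINNET_MAGIC:
            raise ValueError(f"unexpected magic bytes at offset {offset}: {magic.hex()}")
        (size,) = struct.unpack_from("<I", data, offset + 4)
        yield data[offset + 8:offset + 8 + size]
        offset += 8 + size


def parse_header(raw):
    """Decode the fixed 80-byte header at the start of a raw block."""
    header = raw[:80]
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack("<i32s32sIII", header)
    # The block hash is the double SHA-256 of the header
    block_hash = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    # Hashes are conventionally displayed in reversed byte order
    return {
        "hash": block_hash[::-1].hex(),
        "version": version,
        "prev_hash": prev_hash[::-1].hex(),
        "merkle_root": merkle_root[::-1].hex(),
        "time": timestamp,
        "bits": bits,
        "nonce": nonce,
    }


def read_block_file(path):
    """Return the decoded headers of all blocks in one blk*.dat file."""
    with open(path, "rb") as f:
        data = f.read()
    return [parse_header(raw) for raw in iter_raw_blocks(data)]
```

Applied to the genesis block header, parse_header yields the well-known hash 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f. The rest of the format (transactions, scripts, witness data) needs much more code of this kind.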
Using bitcoin-core
The second option is to read the Bitcoin blockchain by using bitcoin-core.
Basically, your program makes an HTTP connection to the bitcoin-core daemon, bitcoin-d (the executable is named bitcoind; “d” stands for daemon). Bitcoin-d downloads the blockchain and handles RPC requests (RPC stands for Remote Procedure Call).
Then you need a client that connects to bitcoin-d. You could use bitcoin-cli (“cli” stands for command-line interface), which connects to bitcoin-d and lets you request data, create wallets, and so on. The data comes back in JSON format. Alternatively, you can write your own client. We did the latter, because bitcoin-cli has limited functionality and we needed more in-depth blockchain analysis.
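Writing your own client is less work than it may sound, because bitcoin-d speaks JSON-RPC over HTTP. The following Python sketch uses only the standard library. The URL, port 8332 (the mainnet default) and the credentials are placeholders that must match your node's configuration. The class and method names are hypothetical.

```python
import base64
import json
import urllib.error
import urllib.request


class RPCError(Exception):
    """Error reported by bitcoin-d, with its numeric code."""

    def __init__(self, code, message):
        super().__init__(f"RPC error {code}: {message}")
        self.code = code


class BitcoinRPC:
    """Minimal JSON-RPC client for bitcoin-d (URL and credentials are placeholders)."""

    def __init__(self, url="http://127.0.0.1:8332", user="importer", password="change-me", timeout=60.0):
        self.url = url
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        self.headers = {"Content-Type": "application/json", "Authorization": "Basic " + token}
        self.timeout = timeout  # seconds; slow requests need a generous value
        self.next_id = 0

    def build_payload(self, method, params):
        """Encode one JSON-RPC request; each request gets a fresh id."""
        self.next_id += 1
        request = {"jsonrpc": "1.0", "id": self.next_id, "method": method, "params": list(params)}
        return json.dumps(request).encode()

    def call(self, method, *params):
        """POST one request and return the 'result' field of the JSON reply."""
        request = urllib.request.Request(
            self.url, data=self.build_payload(method, params), headers=self.headers
        )
        try:
            with urllib.request.urlopen(request, timeout=self.timeout) as response:
                body = response.read()
        except urllib.error.HTTPError as err:
            body = err.read()  # RPC errors arrive with a non-200 status and a JSON body
            if not body:
                raise  # e.g. 401 Unauthorized when the credentials are wrong
        reply = json.loads(body)
        error = reply.get("error")
        if error:
            raise RPCError(error.get("code"), error.get("message"))
        return reply["result"]
```

For example, rpc.call("getblockhash", 0) returns the hash of the genesis block. rpc.call("getblock", block_hash, 2) returns a block with all of its transactions decoded as JSON objects.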
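For example, to compute a transaction fee, you need to query data that is scattered across the blockchain. The fee is the sum of a transaction's input values minus the sum of its output values, and the value of each input is stored in the earlier transaction that created it. As our database, we use MySQL.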
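The fee calculation just described can be sketched as follows. The sketch assumes transactions in the JSON layout returned by getrawtransaction with verbose output: vin entries carry txid and vout, and vout entries carry a value in BTC. get_tx is a hypothetical lookup function, for example backed by a cache or an RPC call. Amounts are converted to integer satoshis to avoid floating-point rounding errors.

```python
from decimal import Decimal

SATS_PER_BTC = 100_000_000


def to_sats(value):
    """Convert a BTC amount from the JSON reply to integer satoshis (via str to avoid float noise)."""
    return int(Decimal(str(value)) * SATS_PER_BTC)


def compute_fee(tx, get_tx):
    """Fee = sum of input values - sum of output values, in satoshis."""
    if any("coinbase" in vin for vin in tx["vin"]):
        return 0  # a coinbase transaction creates new coins and pays no fee
    total_in = 0
    for vin in tx["vin"]:
        prev_tx = get_tx(vin["txid"])            # earlier transaction, possibly far back in the chain
        prev_out = prev_tx["vout"][vin["vout"]]  # the output this input spends
        total_in += to_sats(prev_out["value"])
    total_out = sum(to_sats(out["value"]) for out in tx["vout"])
    return total_in - total_out
```

Every input means one lookup of an older transaction. This is why fee computation is expensive without a fast way to find those transactions.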
An advantage of this method is that you get a JSON string back, which is easy to parse.
However, a disadvantage is the dependency on bitcoin-core, which is not very fast when reading large amounts of data.
Algorithm for Bitcoin blockchain import
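Let’s look at how to import the Bitcoin blockchain using bitcoin-d. The import processes the blocks one after another, together with the transactions they contain.
While doing so, watch out for two problems: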
- First, to process a transaction you need its inputs, which come from other, earlier transactions connected to it. We measured a 60% chance that a processed transaction will be needed again later. The best way to exploit this is a cache: store the transactions you have already processed, so that you don’t have to request them from bitcoin-d again. We saw a speedup of roughly 9x, from 1.5 hours to 10 minutes per 1,000 blocks.
- Second, you don’t know in advance which transactions will be needed, and at some point the cache will be full. One solution is then to randomly remove 50% of the cached transactions.
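The cache with random eviction described in the two points above could look like the following Python sketch. The class, the fetch fallback (for example, a getrawtransaction call to bitcoin-d) and the handle_input callback are hypothetical names. Blocks are assumed to be in the layout returned by getblock with verbosity 2. Each transaction is put into the cache after it has been processed, because later transactions, even in the same block, may spend its outputs.

```python
import random


class RandomEvictionCache:
    """Cache of processed transactions keyed by txid; drops a random half when full."""

    def __init__(self, capacity, fetch, rng=None):
        self.capacity = capacity
        self.fetch = fetch  # hypothetical fallback, e.g. a getrawtransaction request
        self.rng = rng or random.Random()
        self.store = {}

    def put(self, txid, tx):
        if len(self.store) >= self.capacity:
            # Cache is full: remove a random 50% of the entries
            victims = self.rng.sample(list(self.store), max(1, len(self.store) // 2))
            for key in victims:
                del self.store[key]
        self.store[txid] = tx

    def get(self, txid):
        tx = self.store.get(txid)
        if tx is None:  # cache miss: ask bitcoin-d and remember the answer
            tx = self.fetch(txid)
            self.put(txid, tx)
        return tx


def import_block(block, cache, handle_input):
    """Process the transactions of one block, resolving each input via the cache."""
    for tx in block["tx"]:
        for vin in tx["vin"]:
            if "coinbase" in vin:
                continue  # newly created coins have no previous output
            prev_tx = cache.get(vin["txid"])
            handle_input(tx, prev_tx["vout"][vin["vout"]])
        cache.put(tx["txid"], tx)  # later transactions may spend this one
```

Random eviction keeps the bookkeeping trivial. The price is that it may throw away transactions that are about to be needed.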
Tips and tricks
In this section, we list the tips and tricks that we found useful when importing the Bitcoin blockchain ourselves.
Tip 1: Make sure that bitcoin-d is set up correctly to allow RPC connections.
Tip 2: Make sure you set the option “txindex=1”, so that bitcoin-d maintains a full transaction index and can retrieve any transaction by its ID.
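Tips 1 and 2 translate into a few lines of bitcoin.conf. This is a minimal sketch that assumes the node is queried only from the same machine. The user name and password are placeholders to replace with your own values.

```ini
# bitcoind enables its RPC server by default; bitcoin-qt needs this line
server=1
# Maintain a full transaction index so that any transaction can be retrieved by its ID
txindex=1
# Credentials that RPC clients must send (placeholders)
rpcuser=importer
rpcpassword=change-me
# Listen for RPC connections on localhost only and accept only local clients
rpcbind=127.0.0.1
rpcallowip=127.0.0.1
```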
Tip 3: A fast drive (SSD) is important, both for the cache and for reading the data quickly.
Tip 4: Mapping transactions to addresses generates a lot of data, and it is not worth waiting for the database to finish writing it. Instead, send this data to the database asynchronously while you keep reading new blocks.
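A minimal way to send this data asynchronously is a background writer thread fed by a bounded queue, as in the following Python sketch. write_batch is a hypothetical callback; with MySQL, it might execute one multi-row INSERT per batch. Because the queue is bounded, the block reader waits when the database falls behind, and memory use cannot grow without limit.

```python
import queue
import threading


class AsyncWriter:
    """Writes batches of rows in a background thread so that reading blocks is not held up by the database."""

    def __init__(self, write_batch, max_pending=100):
        self.write_batch = write_batch  # hypothetical callback, e.g. one multi-row INSERT
        # Bounded queue: submit() waits when the database falls behind
        self.pending = queue.Queue(maxsize=max_pending)
        self.error = None
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            batch = self.pending.get()
            if batch is None:  # sentinel sent by close()
                return
            if self.error is None:
                try:
                    self.write_batch(batch)
                except Exception as exc:  # remember the failure, keep draining so submit() never hangs
                    self.error = exc

    def submit(self, rows):
        """Queue rows (e.g. (txid, address) pairs) and return immediately."""
        self.pending.put(list(rows))

    def close(self):
        """Wait until all queued batches are processed; re-raise the first write error."""
        self.pending.put(None)
        self.worker.join()
        if self.error is not None:
            raise self.error
```

While bitcoin-d is busy returning the next block, the worker thread writes the transaction-to-address rows of the previous one.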
Tip 5: How should you manage the cache? We use FIFO (first in, first out) instead of random eviction. A technical implementation detail: it is better to keep the cache as a list of blocks (for example, a linked list) than as a vector. Removing the first element of a very large vector is slow, because all remaining elements have to be copied in memory.
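A FIFO cache that evicts whole blocks could be sketched as follows. In Python, collections.deque plays the role of the list of blocks: popleft removes the oldest block in constant time. By contrast, list.pop(0) on a plain Python list, which is a dynamic array like a vector, has to move all remaining elements. The class name and the max_txs limit are hypothetical.

```python
from collections import deque


class BlockFifoCache:
    """FIFO transaction cache that evicts whole blocks, oldest first."""

    def __init__(self, max_txs):
        self.max_txs = max_txs
        self.blocks = deque()  # (height, [txid, ...]) in insertion order
        self.txs = {}          # txid -> transaction

    def add_block(self, height, txs):
        """Cache the transactions of one block (txs maps txid -> transaction)."""
        self.blocks.append((height, list(txs)))
        self.txs.update(txs)
        # Evict the oldest blocks until we are under the limit, but keep the newest one
        while len(self.txs) > self.max_txs and len(self.blocks) > 1:
            _, old_txids = self.blocks.popleft()  # O(1), unlike list.pop(0)
            for txid in old_txids:
                self.txs.pop(txid, None)

    def get(self, txid):
        """Return the cached transaction, or None on a miss (the caller then asks bitcoin-d)."""
        return self.txs.get(txid)
```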
Tip 6: Be careful with timeouts, because sometimes a query can take a long time. At first it might look as if bitcoin-d is hanging, but it may just be busy, for example while it is still setting up. If the request takes longer than your client’s timeout, the client will abort it.
Tip 7: For some reason, importing block 256961 took longer than 30 seconds every time we tried. Then we found out that 30 seconds was the timeout of our program. After we increased the timeout to 1 minute, the import worked.
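One way to handle slow requests such as the one for block 256961 is to retry with successively longer timeouts, as in the following Python sketch. The call argument is a hypothetical function that performs one request with the given timeout in seconds. It must raise TimeoutError (or socket.timeout) when the timeout expires. The timeout values are examples, not recommendations.

```python
import socket


def call_with_growing_timeout(call, timeouts=(30.0, 60.0, 120.0)):
    """Run call(timeout), retrying with the next, longer timeout whenever it times out."""
    for attempt, timeout in enumerate(timeouts):
        try:
            return call(timeout)
        except (socket.timeout, TimeoutError):
            if attempt == len(timeouts) - 1:
                raise  # even the longest timeout was not enough
    raise ValueError("timeouts must not be empty")
```

A simpler alternative is to configure a single generous timeout from the start. Each retry wastes the time spent on the aborted attempt, and bitcoin-d may still be working on the first request when the second one arrives.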
In this article, we shared our experience with importing the Bitcoin blockchain. What worked for us was connecting to bitcoin-core and writing our own client, since we needed advanced analytics on the Bitcoin blockchain.
Once you have imported Bitcoin, importing other cryptocurrencies such as Litecoin or Bitcoin Cash should be a piece of cake. Let us know in the comments how your blockchain import went.