CIDv1 with C and more
Generating CIDv1 IPFS Content Identifiers in C: A Deep Dive
The InterPlanetary File System (IPFS) is a peer-to-peer hypermedia protocol that aims to make the web faster, safer, and more open. Each file or piece of data stored on IPFS is identified by a Content Identifier (CID), a string derived from a cryptographic hash of the content. In this article, we walk through the construction of a CID version 1 (CIDv1) using the provided C code.
The Structure of CIDv1
CIDv1 has a specific structure that can be represented as:
<cidv1> ::= <multibase-prefix><multicodec-cidv1><multicodec-content-type><multihash-content-address>
Where:
<multibase-prefix>
: Specifies the base encoding being used, such as base32, base58, etc. For our code, we’re using base32, which has a ‘b’ prefix.
<multicodec-cidv1>
: This indicates that the CID is version 1.
<multicodec-content-type>
: Specifies the type of content being addressed, such as raw data, IPFS block, etc. In our case, we’re addressing raw data.
<multihash-content-address>
: Contains the hash function used (e.g., SHA-256) followed by the length and the actual hash value of the content.
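To make the layout concrete, here is a minimal sketch that inspects the binary form of a CID (the bytes before the multibase prefix and encoding are applied) and checks that it is a CIDv1 for raw data with a SHA-256 multihash. The function and constant names are illustrative and not taken from librddl; the sketch also assumes each header field occupies a single byte, which holds for the values used in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names; the numeric values are the ones used in this article. */
enum {
    CID_VERSION_1 = 0x01, /* <multicodec-cidv1> */
    CODEC_RAW     = 0x55, /* <multicodec-content-type>: raw data */
    HASH_SHA2_256 = 0x12, /* multihash: hash function code */
    SHA256_LEN    = 0x20  /* multihash: digest length (32 bytes) */
};

/* Returns 1 if `cid` is a binary CIDv1 addressing raw data with a
 * SHA-256 multihash, 0 otherwise. */
int cid_is_raw_sha256(const uint8_t *cid, size_t len)
{
    assert(cid != NULL);
    if (len != 4 + SHA256_LEN)
        return 0; /* 4 header bytes + 32 digest bytes expected */
    return cid[0] == CID_VERSION_1 && cid[1] == CODEC_RAW &&
           cid[2] == HASH_SHA2_256 && cid[3] == SHA256_LEN;
}
```

Only the header is checked here; the 32 digest bytes that follow are opaque to this function.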
Code Walkthrough
Now, let’s go through the create_cid_v1_from_string function, which generates a CIDv1 from a given string:
char* create_cid_v1_from_string(const char* data) {
...
}
Step 1: Compute the SHA-256 hash
The first step is to compute the SHA-256 hash of the input string:
uint8_t hash[SHA256_DIGEST_LENGTH];
sha256(data, strlen(data), hash);
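The sha256 helper called above is not shown in the article. As a rough, self-contained stand-in with the same call shape (an assumption, not librddl's implementation), the sketch below follows the SHA-256 algorithm from FIPS 180-4. In real code a vetted cryptographic library is usually the safer choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA256_DIGEST_LENGTH 32

/* Round constants from FIPS 180-4. */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

static uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Compress one 64-byte block into the running state h. */
static void sha256_block(uint32_t h[8], const uint8_t *p)
{
    uint32_t w[64], a[8];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    memcpy(a, h, sizeof a);
    for (int i = 0; i < 64; i++) {
        uint32_t S1 = rotr(a[4], 6) ^ rotr(a[4], 11) ^ rotr(a[4], 25);
        uint32_t ch = (a[4] & a[5]) ^ (~a[4] & a[6]);
        uint32_t t1 = a[7] + S1 + ch + K[i] + w[i];
        uint32_t S0 = rotr(a[0], 2) ^ rotr(a[0], 13) ^ rotr(a[0], 22);
        uint32_t maj = (a[0] & a[1]) ^ (a[0] & a[2]) ^ (a[1] & a[2]);
        uint32_t t2 = S0 + maj;
        a[7] = a[6]; a[6] = a[5]; a[5] = a[4]; a[4] = a[3] + t1;
        a[3] = a[2]; a[2] = a[1]; a[1] = a[0]; a[0] = t1 + t2;
    }
    for (int i = 0; i < 8; i++)
        h[i] += a[i];
}

/* Hash `len` bytes at `data` and write the 32-byte digest to `out`. */
void sha256(const void *data, size_t len, uint8_t out[SHA256_DIGEST_LENGTH])
{
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    const uint8_t *p = data;
    uint8_t tail[128] = {0};
    size_t rem = len % 64, i;
    uint64_t bits = (uint64_t)len * 8;

    assert(data != NULL && out != NULL);
    for (i = 0; i + 64 <= len; i += 64) /* full blocks */
        sha256_block(h, p + i);

    /* Padding: leftover bytes, a 0x80 marker, zeros, 64-bit big-endian bit length. */
    memcpy(tail, p + i, rem);
    tail[rem] = 0x80;
    size_t tail_len = (rem < 56) ? 64 : 128;
    for (int j = 0; j < 8; j++)
        tail[tail_len - 1 - j] = (uint8_t)(bits >> (8 * j));
    sha256_block(h, tail);
    if (tail_len == 128)
        sha256_block(h, tail + 64);

    for (int j = 0; j < 8; j++) { /* serialize state as big-endian bytes */
        out[4 * j]     = (uint8_t)(h[j] >> 24);
        out[4 * j + 1] = (uint8_t)(h[j] >> 16);
        out[4 * j + 2] = (uint8_t)(h[j] >> 8);
        out[4 * j + 3] = (uint8_t)h[j];
    }
}
```

With this definition, the call sha256(data, strlen(data), hash) from the article compiles unchanged.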
Step 2: Construct the CID
Next, we construct a byte array to hold our CID:
uint8_t pre_encoded_cid[SHA256_DIGEST_LENGTH + 4];
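The extra 4 bytes hold the CID version, the content codec, the hash function code, and the digest length. Each of these fields is formally an unsigned varint, but all four values used here are below 0x80, so each fits in a single byte: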
pre_encoded_cid[0] = 0x01; // cidv1
pre_encoded_cid[1] = 0x55; // raw multicodec
pre_encoded_cid[2] = 0x12; // sha256
pre_encoded_cid[3] = 0x20; // sha256 digest length
Then, we copy the actual hash value:
memcpy(pre_encoded_cid + 4, hash, SHA256_DIGEST_LENGTH);
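Put together, Step 2 can be sketched as one small function that takes a finished digest and fills a 36-byte buffer. The function name and the two length macros beyond SHA256_DIGEST_LENGTH are hypothetical and chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA256_DIGEST_LENGTH 32
#define CID_PREFIX_LENGTH 4 /* version, codec, hash code, digest length */
#define CID_BINARY_LENGTH (CID_PREFIX_LENGTH + SHA256_DIGEST_LENGTH)

/* Writes the binary CIDv1 (raw codec, SHA-256 multihash) for `digest` into `out`. */
void build_raw_sha256_cid(const uint8_t digest[SHA256_DIGEST_LENGTH],
                          uint8_t out[CID_BINARY_LENGTH])
{
    static const uint8_t prefix[CID_PREFIX_LENGTH] = {
        0x01, /* cidv1 */
        0x55, /* raw multicodec */
        0x12, /* sha256 */
        0x20  /* sha256 digest length */
    };
    assert(digest != NULL && out != NULL);
    memcpy(out, prefix, sizeof prefix);
    memcpy(out + sizeof prefix, digest, SHA256_DIGEST_LENGTH);
}
```

Keeping the prefix in one constant array makes it harder for the header bytes and the offset passed to memcpy to drift apart.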
Step 3: Base32 Encode the CID
Before encoding, we calculate the required buffer size:
size_t input_length = SHA256_DIGEST_LENGTH + 4;
size_t buffer_size = (input_length + 4) / 5 * 8 + 1;
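As a worked check of that formula: base32 turns every 5 input bytes into 8 output characters, so 36 input bytes round up to 8 groups, giving 64 characters when padded, plus 1 byte for the terminating NUL, for a total of 65. The sketch below wraps the same arithmetic in two helper functions; both names are made up for this example, and the unpadded variant is included only for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for padded base32 output plus a terminating NUL
 * (the same expression as in the article). */
size_t base32_padded_buffer_size(size_t input_length)
{
    assert(input_length < SIZE_MAX / 8); /* conservative overflow guard */
    return (input_length + 4) / 5 * 8 + 1;
}

/* Number of base32 characters when no '=' padding is emitted:
 * ceil(input_length * 8 / 5). */
size_t base32_unpadded_length(size_t input_length)
{
    assert(input_length < SIZE_MAX / 8); /* conservative overflow guard */
    return (input_length * 8 + 4) / 5;
}
```

For the 36-byte CID these return 65 and 58 respectively.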
Then, we allocate memory for the encoded CID and add the ‘b’ prefix for base32:
char* base32_cid = (char*)malloc(buffer_size + 2);
base32_cid[0] = 'b';
Finally, we encode our CID and convert it to lowercase:
base32_encode(pre_encoded_cid, input_length, base32_cid + 1, buffer_size, BASE32_ALPHABET_RFC4648);
string_to_lowercase(base32_cid);
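The base32_encode and string_to_lowercase helpers are not shown in the article, so their exact behavior (for example, whether '=' padding is emitted) is unknown here. As an independent sketch of the same step, the hypothetical function below writes the 'b' prefix followed by lowercase RFC 4648 base32 without padding, which is the form the multibase 'b' prefix denotes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes 'b' + lowercase, unpadded base32 of `data` + NUL into `out`.
 * Returns the string length (excluding the NUL), or 0 if `out` is too small. */
size_t multibase_base32_encode(const uint8_t *data, size_t len,
                               char *out, size_t out_size)
{
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz234567";
    size_t needed = 1 + (len * 8 + 4) / 5 + 1; /* prefix + digits + NUL */
    uint32_t acc = 0; /* bit accumulator; only the low `bits` bits are meaningful */
    unsigned bits = 0;
    size_t o = 0;

    assert(data != NULL && out != NULL);
    if (out_size < needed)
        return 0;

    out[o++] = 'b';
    for (size_t i = 0; i < len; i++) {
        acc = (acc << 8) | data[i];
        bits += 8;
        while (bits >= 5) { /* emit every complete 5-bit group */
            bits -= 5;
            out[o++] = alphabet[(acc >> bits) & 0x1f];
        }
    }
    if (bits > 0) /* final partial group, zero-filled on the right */
        out[o++] = alphabet[(acc << (5 - bits)) & 0x1f];
    out[o] = '\0';
    return o;
}
```

For example, encoding the six bytes of "foobar" yields "bmzxw6ytboi", which is the RFC 4648 test vector in lowercase, without padding, and with the multibase prefix in front.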
The resulting base32_cid is our CIDv1, ready to be used!
Important Notes
While the provided code offers a concise way to generate a CIDv1, certain aspects are not covered in this snippet. Error handling is minimal: for instance, the snippet shown writes to the buffer returned by malloc without checking it for NULL. Memory management also falls to the caller, who is responsible for freeing the returned string with free() after use.
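One way to contain that responsibility is to copy the result into a caller-owned buffer and free the heap string right away. The sketch below shows this ownership pattern with a stand-in function, fake_create_cid, that only mimics the signature of create_cid_v1_from_string; it assumes the real function returns a malloc-allocated string, and that NULL signals failure, which the article does not state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in with the same signature as create_cid_v1_from_string:
 * returns a heap-allocated string the caller must free, or NULL on failure.
 * It does NOT compute a real CID. */
static char *fake_create_cid(const char *data)
{
    size_t n = strlen(data);
    char *s = malloc(n + 2);
    if (s == NULL)
        return NULL;
    s[0] = 'b';
    memcpy(s + 1, data, n + 1); /* copies the terminating NUL as well */
    return s;
}

typedef char *(*cid_factory)(const char *);

/* Calls `make_cid`, copies the result into `out`, and frees the heap string.
 * Returns 0 on success, -1 on allocation failure or if `out` is too small. */
int cid_into_buffer(cid_factory make_cid, const char *data,
                    char *out, size_t out_size)
{
    assert(make_cid != NULL && data != NULL && out != NULL);
    char *cid = make_cid(data);
    if (cid == NULL)
        return -1;

    int rc = -1;
    size_t n = strlen(cid);
    if (n < out_size) {
        memcpy(out, cid, n + 1);
        rc = 0;
    }
    free(cid); /* released on every path once it has been allocated */
    return rc;
}
```

In real use, create_cid_v1_from_string would be passed in place of fake_create_cid, provided the assumptions above hold for it.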
Source and References
The provided code is part of the rddl-network librddl library. For a more comprehensive understanding and to explore additional functionalities, feel free to check out the linked GitHub repository.
Conclusion
Understanding the intricacies of CIDv1 creation is essential for developers working with IPFS or other similar distributed systems. By breaking down the CIDv1 structure and walking through the code, we hope to provide a clearer picture of this crucial process. Always remember to handle memory correctly and responsibly when working in C.