Generating CIDv1 IPFS Content Identifiers in C: A Deep Dive

The InterPlanetary File System (IPFS) is a peer-to-peer hypermedia protocol that aims to make the web faster, safer, and more open. Each file or piece of data stored on IPFS is addressed by a Content Identifier (CID), a string derived from a cryptographic hash of the content. This article walks through the construction of a CID version 1 (CIDv1) using the provided C code.

The Structure of CIDv1

CIDv1 has a specific structure that can be represented as:

<cidv1> ::= <multibase-prefix><multicodec-cidv1><multicodec-content-type><multihash-content-address>

Where:

  • <multibase-prefix>: Identifies the base encoding of the text form, such as base32 or base58btc. Our code uses lowercase base32, whose prefix is ‘b’.
  • <multicodec-cidv1>: The CID version; the value 0x01 marks a CIDv1.
  • <multicodec-content-type>: The multicodec code for the type of content being addressed, such as raw bytes or a DAG-PB node. In our case, we’re addressing raw data (0x55).
  • <multihash-content-address>: A multihash: the code of the hash function (e.g., 0x12 for SHA-256), followed by the digest length and the digest of the content.
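
To make the layout concrete, here is a minimal sketch that splits a binary CIDv1 (the bytes that remain once the multibase prefix is stripped and the text is decoded) back into its fields. The names cid_fields, read_uvarint, and cid_v1_split are hypothetical and not part of the library discussed here. The sketch assumes each field is stored as an unsigned varint, as the multiformats specifications describe.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields of a binary CIDv1. */
typedef struct {
    uint64_t version;       /* <multicodec-cidv1>, 0x01 for CIDv1 */
    uint64_t content_type;  /* <multicodec-content-type>, e.g. 0x55 for raw */
    uint64_t hash_code;     /* multihash function code, e.g. 0x12 for SHA-256 */
    uint64_t digest_len;    /* multihash digest length in bytes */
    const uint8_t *digest;  /* points into the caller's buffer, not copied */
} cid_fields;

/* Read one unsigned varint: 7 data bits per byte, least significant group
 * first, high bit set when more bytes follow.
 * Returns the number of bytes consumed, or 0 on error. */
static size_t read_uvarint(const uint8_t *buf, size_t len, uint64_t *out) {
    uint64_t value = 0;
    for (size_t i = 0; i < len && i < 9; i++) {
        value |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
        if ((buf[i] & 0x80) == 0) {
            *out = value;
            return i + 1;
        }
    }
    return 0;
}

/* Split a binary CIDv1 into its fields.
 * Returns 0 on success, -1 if the input is malformed. */
int cid_v1_split(const uint8_t *cid, size_t len, cid_fields *out) {
    uint64_t *targets[4] = { &out->version, &out->content_type,
                             &out->hash_code, &out->digest_len };
    size_t pos = 0;
    for (int i = 0; i < 4; i++) {
        size_t used = read_uvarint(cid + pos, len - pos, targets[i]);
        if (used == 0) {
            return -1;
        }
        pos += used;
    }
    /* The digest must fill exactly the rest of the buffer. */
    if (out->version != 1 || out->digest_len != len - pos) {
        return -1;
    }
    out->digest = cid + pos;
    return 0;
}
```

For a raw SHA-256 CID, this should report version 1, content type 0x55, hash code 0x12, and a 32-byte digest starting at offset 4.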

Code Walkthrough

Now, let’s go through the create_cid_v1_from_string function, which generates a CIDv1 from a given string:

char* create_cid_v1_from_string(const char* data) {
    ...
}

Step 1: Compute the SHA-256 hash

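The first step is to compute the SHA-256 hash of the input string. The length is taken with strlen, so the input must be NUL-terminated text; binary data containing zero bytes would be cut short: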

uint8_t hash[SHA256_DIGEST_LENGTH];
sha256(data, strlen(data), hash);

Step 2: Construct the CID

Next, we construct a byte array to hold our CID:

uint8_t pre_encoded_cid[SHA256_DIGEST_LENGTH + 4];

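The extra 4 bytes hold the version, content codec, hash function code, and digest length. Each field is an unsigned varint, and all four values are below 0x80, so each one fits in a single byte: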

pre_encoded_cid[0] = 0x01;  // cidv1
pre_encoded_cid[1] = 0x55;  // raw multicodec
pre_encoded_cid[2] = 0x12;  // sha256
pre_encoded_cid[3] = 0x20;  // sha256 digest length

Then, we copy the actual hash value:

memcpy(pre_encoded_cid + 4, hash, SHA256_DIGEST_LENGTH);
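
The same construction can be sketched as a small standalone function with named constants and a bounds check. The function name cid_v1_raw_sha256_build and the enum names are assumptions made for illustration; the digest is passed in, so any SHA-256 implementation can supply it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    CID_VERSION_1           = 0x01, /* CID version 1 */
    MULTICODEC_RAW          = 0x55, /* raw binary content */
    MULTIHASH_SHA2_256      = 0x12, /* SHA-256 hash function code */
    SHA256_DIGEST_BYTES     = 32,   /* 0x20 */
    CID_V1_HEADER_BYTES     = 4,
    CID_V1_RAW_SHA256_BYTES = CID_V1_HEADER_BYTES + SHA256_DIGEST_BYTES
};

/* Write the binary CIDv1 for raw content with a SHA-256 digest into out.
 * Returns the number of bytes written (36), or 0 if out is too small. */
size_t cid_v1_raw_sha256_build(const uint8_t digest[SHA256_DIGEST_BYTES],
                               uint8_t *out, size_t out_size) {
    if (out_size < CID_V1_RAW_SHA256_BYTES) {
        return 0;
    }
    /* Every header value is below 0x80, so its varint form is one byte. */
    out[0] = CID_VERSION_1;
    out[1] = MULTICODEC_RAW;
    out[2] = MULTIHASH_SHA2_256;
    out[3] = SHA256_DIGEST_BYTES;
    memcpy(out + CID_V1_HEADER_BYTES, digest, SHA256_DIGEST_BYTES);
    return CID_V1_RAW_SHA256_BYTES;
}
```

A caller would pass the 32-byte digest from Step 1 and a buffer of at least 36 bytes. A return value of 0 signals that the buffer was too small.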

Step 3: Base32 Encode the CID

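Before encoding, we calculate the required buffer size. Base32 turns every 5 input bytes into 8 characters, so the formula rounds the input up to whole 5-byte groups and adds 1 byte for the terminating NUL. For our 36 input bytes this gives 65: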

size_t input_length = SHA256_DIGEST_LENGTH + 4;
size_t buffer_size = (input_length + 4) / 5 * 8 + 1;
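
As a worked check of the arithmetic, the sketch below computes both the padded length (what the formula above reserves, minus the NUL) and the unpadded length, which is what a base32 encoder produces if it emits no ‘=’ padding. The helper names are hypothetical. Whether the library's base32_encode pads its output is not shown in this article, so treat the unpadded figure as the expected length under that assumption.

```c
#include <stddef.h>

/* Characters produced by base32 with '=' padding: whole 8-char groups. */
size_t base32_padded_len(size_t input_bytes) {
    return (input_bytes + 4) / 5 * 8;
}

/* Characters produced by base32 without padding: ceil(bits / 5). */
size_t base32_unpadded_len(size_t input_bytes) {
    return (input_bytes * 8 + 4) / 5;
}
```

For the 36-byte binary CID, the padded length is 64 characters and the unpadded length is 58. With the one-character multibase prefix, an unpadded CID string is 59 characters long, so the allocation in the walkthrough leaves a comfortable margin.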

Then, we allocate memory for the encoded CID, check that the allocation succeeded, and add the ‘b’ prefix for base32:

char* base32_cid = (char*)malloc(buffer_size + 2);
if (base32_cid == NULL) return NULL;  // allocation failed
base32_cid[0] = 'b';

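Finally, we encode our CID and convert it to lowercase, because the multibase ‘b’ prefix denotes lowercase base32 while the RFC 4648 alphabet is uppercase: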

base32_encode(pre_encoded_cid, input_length, base32_cid + 1, buffer_size, BASE32_ALPHABET_RFC4648);
string_to_lowercase(base32_cid);

The resulting base32_cid is our CIDv1, ready to be used!
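
For readers without the library at hand, the encoding step can be approximated with a small self-contained encoder. This is a sketch rather than the library's base32_encode: the function multibase_base32_encode is hypothetical, it writes the lowercase alphabet directly so no separate lowercase pass is needed, and it emits no ‘=’ padding.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 4648 base32 alphabet in lowercase. */
static const char BASE32_LOWER[] = "abcdefghijklmnopqrstuvwxyz234567";

/* Encode in as lowercase, unpadded base32 with a leading 'b' prefix.
 * Returns the string length (excluding the NUL), or 0 if out is too small. */
size_t multibase_base32_encode(const uint8_t *in, size_t in_len,
                               char *out, size_t out_size) {
    /* prefix + ceil(bits / 5) characters + terminating NUL */
    size_t needed = 1 + (in_len * 8 + 4) / 5 + 1;
    if (out_size < needed) {
        return 0;
    }

    size_t pos = 0;
    uint32_t acc = 0; /* bit accumulator; only the low `bits` bits are live */
    int bits = 0;

    out[pos++] = 'b';
    for (size_t i = 0; i < in_len; i++) {
        acc = (acc << 8) | in[i];
        bits += 8;
        while (bits >= 5) {
            out[pos++] = BASE32_LOWER[(acc >> (bits - 5)) & 0x1f];
            bits -= 5;
        }
    }
    if (bits > 0) {
        /* Left-align the remaining bits and fill the rest with zeros. */
        out[pos++] = BASE32_LOWER[(acc << (5 - bits)) & 0x1f];
    }
    out[pos] = '\0';
    return pos;
}
```

Feeding the 36-byte binary CID from Step 2 into this function with a buffer of at least 60 bytes yields the prefixed text form. The RFC 4648 test vector "foobar" encodes to "mzxw6ytboi", so it comes out here as "bmzxw6ytboi".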

Important Notes

The provided code offers a concise way to generate a CIDv1, but it leaves some concerns to the caller. Error handling is minimal, so failures in allocation or encoding deserve explicit checks in production use. The returned string is allocated with malloc, and the caller is responsible for freeing it after use.

Source and References

The provided code is part of the rddl-network librddl library. For a more comprehensive understanding and to explore additional functionalities, feel free to check out the linked GitHub repository.

Conclusion

Understanding how a CIDv1 is built is useful for developers working with IPFS or similar content-addressed systems. By breaking down the CIDv1 structure and walking through the code, we hope to have given a clearer picture of the process. Always remember to handle memory correctly and responsibly when working in C.