Storage Consumption
We’re often asked “how many crates do I need to hold X messages?” This is a tricky question to answer because of how crates are measured, and answering it requires a deep dive into how computers store text.
Turbonerds: We’re aware this is an oversimplification. If you catch anything glaringly wrong, let us know. The goal is to make how crates are used (specifically in the context of message storage) more understandable. See the “Further Learning” section below for more qualified explanations, if you’re interested.
How many crates do I need to hold X messages?
TL;DR: There’s no way to know for sure, since a number of factors (going back to the design of text encoding on modern computers) can influence the amount of storage you need. Consider using the Message Cache instead if it fits your use case.
In order to answer this, we need to step back and explain how computers store text. Computers only understand numbers—specifically binary (0s and 1s). In order to get computers to understand text, we need to encode the text as numbers, which can then be converted to binary.
Imagine that you are making a computer for an English-speaking country. You assign the letter A to the number 1, and continue all the way to Z (represented as 26).
This is an oversimplification. Most modern computers do not represent A as 1. Real encodings also have to cover control characters, punctuation marks, uppercase and lowercase letters, and much more (in ASCII, for example, A is 65). Take a look at an ASCII table or list of Unicode characters (linked in the Further Learning section) for the actual encoding scheme.
We’re back to our imaginary world where A is represented by 1 and Z by 26. In binary, 1 is 00001 (padded to five digits) and 26 is 11010. This doesn’t use up too much space so far: just 5 bits to represent A through Z.
Bits in this context means the number of 0s and 1s needed to represent a number, regardless of what each bit is set to. For example, 1101 and 1001 are 4-bit numbers, while 10110 and 10011 are 5-bit numbers.
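To make the bit counting concrete, here is a minimal Python sketch of the imaginary A = 1 through Z = 26 scheme described above. The names `toy_encoding` and `bits_needed` are made up for this illustration, and the scheme itself is the article's thought experiment, not ASCII or Unicode.

```python
import string

# Toy encoding from the text: A -> 1, B -> 2, ..., Z -> 26.
# This is the imaginary scheme, not a real character encoding.
toy_encoding = {
    letter: index
    for index, letter in enumerate(string.ascii_uppercase, start=1)
}


def bits_needed(number: int) -> int:
    """Return the minimum number of binary digits needed to write `number`."""
    return max(1, number.bit_length())


for letter in ("A", "B", "Z"):
    value = toy_encoding[letter]
    # Columns: letter, assigned number, binary form, bits needed.
    print(letter, value, format(value, "b"), bits_needed(value))

# The widest letter decides how many bits a fixed-width scheme would need.
widest = max(bits_needed(value) for value in toy_encoding.values())
print("bits needed for every letter to fit:", widest)
```

Z is the largest value, so it sets the width: five bits are enough for the whole alphabet, even though A on its own would fit in one.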
Bytes are sets of 8 bits, and are the base unit of kilobytes, megabytes, gigabytes, terabytes, etc. Computer storage is measured by the number of 0s and 1s that can be stored.
Many decades later, you’ve expanded your English-speaking computer to include all the world’s languages. Let’s say you’re supporting 154,998 characters. In binary, the last character would be 100101110101110110, which is an 18-bit number. The trick is that not every character takes 18 bits. Your character set still includes letters like B, which would be represented by 2, a 2-bit number (10 in binary).
Computers can be programmed to dynamically use the right number of bits based on what is needed to store a given character. This means that not all characters use the same amount of storage. Some characters (such as English/Latin letters, which were added to specifications like Unicode before other characters) use less storage space than others.
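As a rough illustration of variable-size characters, the sketch below measures how many bytes a few characters take in UTF-8, a widely used variable-length Unicode encoding. Using UTF-8 here is an assumption for the example; this article does not say which encoding crate storage actually uses. The helper name `utf8_size` is made up.

```python
def utf8_size(text: str) -> int:
    """Return how many bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Escapes are used so the sample characters are unambiguous:
# Latin capital A, e with acute accent, a CJK ideograph, and an emoji.
samples = {
    "A": "A",
    "e-acute": "\u00e9",
    "CJK ideograph": "\u65e5",
    "emoji": "\U0001F600",
}

for name, character in samples.items():
    print(f"{name}: {utf8_size(character)} byte(s)")
```

In UTF-8 these four characters take 1, 2, 3, and 4 bytes respectively, so two messages with the same number of characters can need very different amounts of storage.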
Coming back to your Discord bot, each Inventor crate roughly corresponds to 100 bytes of storage. Depending on the length of the message itself, the languages used, and other factors (such as how efficiently you store metadata), the amount of storage used by a single message can vary significantly.
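If you still want a ballpark figure, a back-of-the-envelope estimate might look like the sketch below. It rests on several assumptions that are not guaranteed by this article: that text is measured as UTF-8 bytes, that a crate is exactly 100 bytes (the text only says "roughly"), and that partial crates round up. `estimate_crates` and `metadata_bytes` are hypothetical names for this example, not part of any real API.

```python
import math

# Approximate figure from the text; treated as exact here for simplicity.
BYTES_PER_CRATE = 100


def estimate_crates(message: str, metadata_bytes: int = 0) -> int:
    """Roughly estimate the crates needed to store one message.

    Assumes UTF-8 sizing and rounding up to whole crates. Both are
    assumptions for illustration, not documented behavior.
    """
    size_in_bytes = len(message.encode("utf-8")) + metadata_bytes
    return math.ceil(size_in_bytes / BYTES_PER_CRATE)


# 100 characters each, but very different estimated storage.
latin_message = "a" * 100
cjk_message = "\u65e5" * 100

print(estimate_crates(latin_message))
print(estimate_crates(cjk_message))
print(estimate_crates(latin_message, metadata_bytes=50))
```

Treat the result as an order-of-magnitude guide only. Real usage depends on how you structure the stored data.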
Message Cache
We would highly suggest using the Message Cache feature if it serves your use case (getting the contents of edited and deleted messages alongside the message edited/deleted events from Discord).
Message cache stores 30 messages per crate, regardless of the size of the message, and allows you to easily set a crate limit and retention timeout in days.
We designed it to use slower, infrequent-access storage, so you don’t have to worry about message length or size.
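Because the Message Cache counts messages instead of bytes, sizing it is simple arithmetic. The sketch below uses the 30 messages per crate figure stated above. Rounding a partially filled crate up to a whole crate is an assumption, and `cache_crates` is a made-up helper name.

```python
import math

# Figure stated in the text: 30 cached messages per crate.
MESSAGES_PER_CRATE = 30


def cache_crates(message_count: int) -> int:
    """Return the crates needed to cache `message_count` messages.

    Assumes a partially filled crate still counts as one whole crate.
    """
    if message_count < 0:
        raise ValueError("message_count must not be negative")
    return math.ceil(message_count / MESSAGES_PER_CRATE)


for count in (30, 31, 1000):
    print(count, "messages ->", cache_crates(count), "crate(s)")
```

Under that assumption, caching 1,000 messages would take 34 crates, no matter how long each message is.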
You can enable message cache here. Once enabled, you can use the previous message content output in the deleted/edited messages triggers.
Further Learning
If you’re interested in how text storage works on computers, here are some more resources you may find interesting:
- Characters, Symbols and the Unicode Miracle - Computerphile
- Unicode - Wikipedia
- List of Unicode Characters - Wikipedia
- ASCII - Wikipedia