📶 Numbers Everyone Should Know

Oliver Jack Dean

Partly inspired by Jeff Dean’s famous Stanford 2009-2010 keynote, “Software Engineering Advice from Building Large-Scale Distributed Systems”, which, putting aside my bias, is still one of my favourite keynotes.

In fact, estimating is still something I am not strong at. Well, it’s not a habit. Yet.

Slide 12

Jeff makes a good point here. Before diving deep into design, architecture, component libraries, dependencies, security and all of the layering of the system, think up front about performance, as performance often dictates future direction. Something too slow? Already in production? Well, now you have to refactor and take things apart again. And again. Then probably again.

Gradually, this keynote has become a cultural artefact for many an engineer, designer, builder and tinkerer of distributed systems.

Here is the famous “Numbers Everyone Should Know” slide:

Slide 13

In homage, I am going to try to capture numbers that I find interesting. I see this post as more of a personal/internal dump of numbers, something I hope to recall and reuse long term. Take from it what you will. I will add more over time.

Energy

Time Units:

Time Unit Equivalent in Seconds (s)
1 nanosecond (ns) \(10^{-9}\) s
1 microsecond (µs) \(10^{-6}\) s
1 millisecond (ms) \(10^{-3}\) s
1 kilosecond (ks) \(10^{3}\) s
1 megasecond (Ms) \(10^{6}\) s
1 gigasecond (Gs) \(10^{9}\) s
1 terasecond (Ts) \(10^{12}\) s
1 petasecond (Ps) \(10^{15}\) s
1 exasecond (Es) \(10^{18}\) s

SI Prefixes:

Factor Name Symbol
\(10^{9}\) giga G
\(10^{6}\) mega M
\(10^{3}\) kilo k
\(10^{-1}\) deci d
\(10^{-2}\) centi c
\(10^{-3}\) milli m
\(10^{-6}\) micro μ
\(10^{-9}\) nano n
\(10^{-12}\) pico p

Time Comparisons:

Time Unit Approximate Equivalent in Seconds (s)
1 minute \(0.6 \times 10^{2}\) s
1 hour \(0.36 \times 10^{4}\) s
1 day \(0.86 \times 10^{5}\) s
1 week \(0.60 \times 10^{6}\) s
1 year \(3.2 \times 10^{7}\) s
1 decade \(3.2 \times 10^{8}\) s
1 century \(3.2 \times 10^{9}\) s

Light Travel:

Event Time
Light travels 300 meters 1 microsecond
Light travels 2 miles 10 microseconds
Light travels around the Earth 134 milliseconds
Light travels from the Moon to Earth 1.28 seconds
Light travels from the Sun to Earth 8.3 minutes
Light travels across our galaxy 100,000 years
Light travels from near galaxies 2.5 million years
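
As a sanity check on the table above, here is a minimal Python sketch that simply divides distance by the speed of light. The distances (Earth's equatorial circumference, the mean Earth-Moon distance, one astronomical unit) are rounded reference values that I am supplying myself; they are not taken from the original slide.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def light_travel_seconds(distance_m: float) -> float:
    """Time in seconds for light to cover distance_m in a vacuum."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


# Rounded reference distances in metres (assumed values, see the note above).
DISTANCES_M = {
    "300 meters": 300,
    "around the Earth": 40_075_000,
    "Moon to Earth": 384_400_000,
    "Sun to Earth": 149_600_000_000,
}

for label, metres in DISTANCES_M.items():
    print(f"{label}: {light_travel_seconds(metres):.3g} s")
```

The Sun-to-Earth figure comes out in seconds here; dividing by 60 gives the minutes quoted in the table.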

Energy Units:

Energy Unit Equivalent in Joules (J)
1 electronvolt (eV) \(10^{-19}\) J
1 kiloelectronvolt (keV) \(10^{-16}\) J
1 megaelectronvolt (MeV) \(10^{-13}\) J
1 gigaelectronvolt (GeV) \(10^{-10}\) J
1 teraelectronvolt (TeV) \(10^{-7}\) J
1 Joule (J) \(10^{0}\) J
1 kilocalorie (dietary “calorie”) \(10^{4}\) J
1 kilowatt-hour (kW·h) \(10^{7}\) J
1 ton of TNT \(10^{10}\) J
1 megaton of TNT \(10^{16}\) J
1 kilogram of mass-energy (mc^2) \(10^{17}\) J

Energy Comparisons:

Event Approximate Energy in Joules (J)
The energy of a visible photon \(10^{-18}\) J
The energy to metabolize a glucose molecule \(10^{-17}\) J
The energy of an electron \(10^{-14}\) J
The energy to boil a nucleus \(10^{-9}\) J
The energy of a person descending 1 meter \(10^{3}\) J
The energy of a 40W bulb for 1 hour \(10^{5}\) J
The energy of a lightning bolt \(10^{10}\) J
The energy of the Krakatoa volcano explosion \(10^{18}\) J
The energy of the Lake Toba (Sumatra) volcano explosion \(10^{20}\) J
The Earth’s rotational energy \(10^{29}\) J
The Earth’s heat content \(10^{31}\) J
The Sun’s rotational energy \(10^{35}\) J
The Earth’s mass-energy (mc^2) \(10^{42}\) J

Energy in Kilojoules:

Description Calculation
Amount of energy consumed during the execution of the benchmark Energy (kJ) = Power (W) * Time (s) / 1000
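
A minimal sketch of that calculation in Python. The function name `energy_kj` is my own, and I reuse the 40 W bulb from the Energy Comparisons table as the example input.

```python
def energy_kj(power_watts: float, time_seconds: float) -> float:
    """Energy in kilojoules from average power (W) and duration (s)."""
    return power_watts * time_seconds / 1000


# A 40 W bulb running for one hour (3,600 s).
bulb_kj = energy_kj(40, 3600)
print(bulb_kj)  # 144.0 kJ, i.e. on the order of 10^5 J
```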

Power Consumption:

Description Measurement
Maximum power consumed during the execution of the system Measured in Watts (W)
Average power consumed during the execution of the system Measured in Watts (W)

Power Frequency Mean:

Description Range
Ensure power frequencies are within the mean range 50Hz - 60Hz ± 1%

Voltage Frequency Mean:

Description Range
Ensure the mean voltage is within range of the nominal value 100V, 110V, 120V, 208V, 220V, 230V or 400V ± 5%

Kaya Identity:

Variable Description
F Global CO2 emissions from human sources
P Global population
G World GDP
g Global per-capita GDP (G/P)
E Global primary energy consumption
e Energy intensity of world GDP (E/G)
f Carbon intensity of energy (F/E)

The Kaya Identity can be expressed as:

F = P * g * e * f
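
The identity holds by construction, because the intermediate terms cancel: P * (G/P) * (E/G) * (F/E) = F. Here is a small Python sketch to make that concrete. The input totals are made-up round numbers chosen only to show the cancellation; they are not real-world data, and the function name is mine.

```python
def kaya_emissions(population, gdp_per_capita, energy_intensity, carbon_intensity):
    """F = P * g * e * f, using the variable names from the table above."""
    return population * gdp_per_capita * energy_intensity * carbon_intensity


# Placeholder totals in arbitrary units (NOT real data).
P = 10.0   # population
G = 200.0  # GDP
E = 50.0   # primary energy consumption
F = 5.0    # CO2 emissions

g = G / P  # per-capita GDP
e = E / G  # energy intensity of GDP
f = F / E  # carbon intensity of energy

reconstructed = kaya_emissions(P, g, e, f)
print(abs(reconstructed - F) < 1e-9)  # the factors multiply back to F
```

The useful part in practice is the decomposition: a change in emissions can be attributed to a change in population, wealth, energy intensity or carbon intensity.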

Performance

File Sizes:

Power of 2 Approximate Value Full Name Short Name
10 1 Thousand 1 Kilobyte 1 KB
20 1 Million 1 Megabyte 1 MB
30 1 Billion 1 Gigabyte 1 GB
40 1 Trillion 1 Terabyte 1 TB
50 1 Quadrillion 1 Petabyte 1 PB

Latency Calculations:

Unit Equivalent in Seconds (s)
1 nanosecond (ns) \(10^{-9}\) s
1 microsecond (µs) \(10^{-6}\) s = 1,000 ns
1 millisecond (ms) \(10^{-3}\) s = 1,000 µs = 1,000,000 ns

Powers of Two Scaling vs Bytes:

Power of 2 Exact Value Approx Value Bytes
7 128    
8 256    
10 1024 1 thousand 1 KB
16 65,536   64 KB
20 1,048,576 1 million 1 MB
30 1,073,741,824 1 billion 1 GB
32 4,294,967,296   4 GB
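
The "approx value" column gets less accurate as the exponent grows. A quick Python sketch to see how far each power of two drifts from its round decimal neighbour (the helper name `overshoot` is my own):

```python
UNITS = {10: "KB", 20: "MB", 30: "GB", 40: "TB", 50: "PB"}


def overshoot(power: int) -> float:
    """Fraction by which 2**power exceeds the matching power of ten."""
    decimal = 10 ** (3 * power // 10)  # 2^10 ~ 10^3, 2^20 ~ 10^6, ...
    return 2 ** power / decimal - 1


for power, unit in UNITS.items():
    exact = 2 ** power
    print(f"2^{power} = {exact:,} bytes ~ 1 {unit} ({overshoot(power):.1%} over)")
```

For back-of-envelope work the drift rarely matters, but it is worth remembering that it compounds with every step up in unit.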

Common Data Type Sizes:

Data Type Size (Bytes)
int 4
float 8
boolean 4
UTF-8 character 1
UTF-8 in Chinese 3
UNIX timestamp 4

Latency for Sequential Data Fetch:

Formula Description
latency = latency_resource_1 + latency_resource_2 Latency for sequential data fetch

Latency for Parallel Data Fetch:

Formula Description
latency = max(latency_resource_1, latency_resource_2) Latency for parallel data fetch
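
The two latency formulas above, sketched in Python. The 100 ms and 150 ms figures are hypothetical inputs picked for illustration, and I have generalised from two resources to any number.

```python
def sequential_latency(*latencies_ms: float) -> float:
    """Fetches happen one after another, so the latencies add up."""
    return sum(latencies_ms)


def parallel_latency(*latencies_ms: float) -> float:
    """Fetches happen at the same time, so the slowest one dominates."""
    return max(latencies_ms)


# Hypothetical example: two resources taking 100 ms and 150 ms.
print(sequential_latency(100, 150))  # 250
print(parallel_latency(100, 150))    # 150
```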

Queries per Second (Resource Fetch):

Formula Description
QPS = number of CPU cores / average time for a request in seconds Queries per second (assuming peak traffic)
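
A minimal sketch of the QPS estimate in Python. The 8 cores and 250 ms per request are hypothetical inputs, and the formula implicitly assumes each request keeps one core busy for its whole duration.

```python
def estimated_qps(cpu_cores: int, avg_request_seconds: float) -> float:
    """Rough upper bound on queries per second for CPU-bound requests."""
    return cpu_cores / avg_request_seconds


# Hypothetical example: 8 cores, 0.25 s of CPU time per request.
print(estimated_qps(8, 0.25))  # 32.0
```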

Scale Calculations:

Transactions Per Day Transactions Per Second
1 million ~12
5 million ~60
30 million ~360
100 million ~1200

Note: If a user generates 50 API calls during their session, then we must support ~600 transactions per second for 1 million users per day.
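
The conversion behind the table is just a division by the 86,400 seconds in a day. A small Python sketch (function names are mine) shows the unrounded figures, which the table rounds up slightly:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_tps(transactions_per_day: float) -> float:
    """Average transactions per second, assuming traffic is spread evenly."""
    return transactions_per_day / SECONDS_PER_DAY


def average_tps_for_users(users_per_day: float, calls_per_user: float) -> float:
    """Same conversion, starting from daily users and API calls per session."""
    return average_tps(users_per_day * calls_per_user)


for per_day in (1_000_000, 5_000_000, 30_000_000, 100_000_000):
    print(f"{per_day:,} per day ~ {average_tps(per_day):.1f} per second")

# 1 million users at 50 calls each: about 579 per second, or ~600 when
# computed from the rounded ~12 per second.
print(round(average_tps_for_users(1_000_000, 50)))
```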

Peak Times Calculations:

Transactions Per Day Peak Transactions Per Second (10% Peak for 1 Hour) Peak Transactions Per Second (30% Peak for 1 Hour)
1 million ~30 ~90
10 million ~300 ~900
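
My reading of "10% peak for 1 hour" is that 10% of the day's transactions arrive within a single hour; that interpretation reproduces the table's numbers, but it is an assumption on my part. A Python sketch under that assumption:

```python
def peak_tps(transactions_per_day: float, peak_fraction: float,
             peak_window_seconds: int = 3600) -> float:
    """Transactions per second if peak_fraction of the daily load
    lands inside one window of peak_window_seconds."""
    return transactions_per_day * peak_fraction / peak_window_seconds


for per_day in (1_000_000, 10_000_000):
    low = peak_tps(per_day, 0.10)
    high = peak_tps(per_day, 0.30)
    print(f"{per_day:,} per day: {low:.0f} to {high:.0f} per second at peak")
```

The unrounded results (about 28 and 83 per second for 1 million per day) sit just under the ~30 and ~90 in the table.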

Data Type Sizes:

Data Type Size (Bytes)
Int32 4
Int64 8
Float 4 or 8
JavaScript boolean 4
UTF-8 Char in English 1
UTF-8 Char in other languages 1-3

Note: For languages like Chinese, consider UTF-8 characters as 3 bytes.
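
The UTF-8 sizes are easy to verify in Python by encoding a character and counting the bytes. The sample characters are my own choice; note that characters outside the Basic Multilingual Plane, such as most emoji, take 4 bytes.

```python
def utf8_size(text: str) -> int:
    """Number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


for sample in ("a", "é", "中", "😀"):
    print(sample, utf8_size(sample))  # 1, 2, 3 and 4 bytes respectively
```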

System Utilization and Queue Time Calculation:

Average Cycle Time System Utilization System Cycle Time Variability Arrival Rate Variability Queue Time (avg) Cycle Time (avg) Total Time Process Efficiency
5 (minutes/hours/days) 80% 0.5 0.5 5 5 10 50%

Impact of Variability on Utilization and Process Efficiency:

Variability Level Cycle Time Utilization Cycle Time Variability Arrival Rate Variability Queue Time Process Efficiency
Zero Variability 5 99% 0 0 0 100%
Moderate Variability 5 99% 0.2 0.2 20 20%
Higher Variability 5 99% 0.5 0.5 124 4%

Note: The process efficiency is calculated as the ratio of hands-on time to total time. In software development, it typically ranges from 5-15%, indicating a high impact of excessive utilization.
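
The tables above don't say which formula produced them. The numbers happen to match Kingman's approximation for a single-server queue, so that is what this Python sketch assumes; treating "cycle time variability" and "arrival rate variability" as coefficients of variation is likewise my reading, not something stated in the tables.

```python
def kingman_queue_time(cycle_time: float, utilization: float,
                       cv_arrival: float, cv_service: float) -> float:
    """Average wait in queue per Kingman's approximation:
    (u / (1 - u)) * ((ca^2 + cs^2) / 2) * cycle_time."""
    load_factor = utilization / (1 - utilization)
    variability_factor = (cv_arrival ** 2 + cv_service ** 2) / 2
    return load_factor * variability_factor * cycle_time


def process_efficiency(cycle_time: float, queue_time: float) -> float:
    """Hands-on time divided by total time (queue + cycle)."""
    return cycle_time / (cycle_time + queue_time)


# Rows from the tables: (utilization, variability), cycle time fixed at 5.
for utilization, cv in ((0.80, 0.5), (0.99, 0.0), (0.99, 0.2), (0.99, 0.5)):
    queue = kingman_queue_time(5, utilization, cv, cv)
    print(f"u={utilization:.0%} cv={cv}: queue={queue:.1f}, "
          f"efficiency={process_efficiency(5, queue):.0%}")
```

The load factor u / (1 - u) is what makes the last few percent of utilization so expensive: going from 80% to 99% multiplies it from 4 to 99.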

File sizes:

Item Size
Storing 1 GiB/month on the cloud $0.02 US
Web site of my Twitter profile (@lemire), HTML alone 296 KiB
Web site of my Twitter profile (@lemire), all data 296 KiB
Google result for ‘Canada’, HTML alone 3.9 MiB
Google result for ‘Canada’, all data 848 KiB
Node JS runtime 3.7 MiB
Size of the Java (19) runtime 164 MiB
LLVM/clang compiler+runtime 330 MiB
Boost (C++) library (source) 5.5 GiB
Go runtime 609 MiB

Potential Savings from Config Tuning:

If the cost to run a service per year is represented by \(X\) (in millions), then the potential savings \(S\) from config tuning can be calculated as:

\[S = X \times \text{Increase Factor} - X\]

where the Increase Factor is between 1.5 and 2 (representing the 1.5x to 2x increase in capacity).

Impact of CFS Period Tuning:

If the original CFS period is represented by \(O\) (default is 100ms) and the new CFS period is represented by \(P\), then the reduction in worst-case throttling time \(R\) can be calculated as:

\[R = \frac{O - P}{O} \times 100\%\]
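
A quick Python sketch of the formula above. The 100 ms default comes from the text; the 20 ms replacement period is a hypothetical value I picked for illustration, not a recommendation.

```python
DEFAULT_CFS_PERIOD_MS = 100  # default quoted above


def throttling_reduction_percent(new_period_ms: float,
                                 original_period_ms: float = DEFAULT_CFS_PERIOD_MS) -> float:
    """R = (O - P) / O * 100, as a percentage."""
    return (original_period_ms - new_period_ms) / original_period_ms * 100


# Hypothetical example: shrinking the period from 100 ms to 20 ms.
print(f"{throttling_reduction_percent(20):.0f}%")  # 80%
```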

Impact of CPU Pinning and Isolation:

If the number of threads an application currently has is represented by \(T\) and the number of CPUs requested by the application is represented by \(C\), then the reduction in parallel threads \(D\) can be calculated as:

\[D = T - C\]

CPU Time for Single Core Execution:

If \(T\) represents the total CPU time for executing 2414 tool executions on a single core, then:

\[T = 16 \text{ hours}\]

CPU Time for Multi-Core Execution on a Single CPU:

If \(T_n\) represents the total CPU time for executing 2414 tool executions on \(n\) cores of a single CPU, then:

\[T_n = T + 0.5(n - 1) \text{ hours}\]

This formula assumes a linear increase in CPU time as the number of cores increases, as stated in the text.

CPU Time for Multi-Core Execution on Two CPUs:

If \(T_{2n}\) represents the total CPU time for executing 2414 tool executions on \(n\) cores of each of the two CPUs (i.e., \(2n\) cores in total), then:

\[T_{2n} = T + 0.75(n - 1) \text{ hours}\]

This formula assumes that using \(n\) cores on each of the two CPUs is slower than using (n) cores on only one CPU, with the maximum difference occurring for eight cores per CPU.

Memory Bandwidth Advantage:

CPU_time = Memory_size / CPU_bandwidth

GPU_time = Memory_size / GPU_bandwidth

Latency Hiding with Thread Parallelism:

GPU_time = Max(Latency, Memory_size / (GPU_bandwidth * Number_of_threads))

Register Memory Advantage:

CPU_time = Register_memory_size / CPU_register_bandwidth

GPU_time = Register_memory_size / GPU_register_bandwidth

Matrix Multiplication Speed:

GPU_time = Matrix_size / GPU_bandwidth

NOTE: Actual CPU time will depend on various factors including the specific tool being executed, the workload, and the behavior of the CPUs/GPUs.
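
A rough Python sketch of the bandwidth and latency-hiding formulas above, implemented exactly as written. All input figures (1 GB of data, 50 GB/s and 750 GB/s of bandwidth, the latency and thread count) are hypothetical placeholders rather than measurements of any particular hardware.

```python
def transfer_time(memory_size: float, bandwidth: float) -> float:
    """time = Memory_size / bandwidth (units must match, e.g. bytes and bytes/s)."""
    return memory_size / bandwidth


def gpu_time_with_threads(latency: float, memory_size: float,
                          gpu_bandwidth: float, threads: int) -> float:
    """GPU_time = max(Latency, Memory_size / (GPU_bandwidth * threads))."""
    return max(latency, memory_size / (gpu_bandwidth * threads))


# Hypothetical figures, for illustration only.
data_bytes = 1e9
cpu_seconds = transfer_time(data_bytes, 50e9)
gpu_seconds = transfer_time(data_bytes, 750e9)
print(f"CPU: {cpu_seconds * 1e3:.2f} ms, GPU: {gpu_seconds * 1e3:.2f} ms")

# With enough threads the transfer term shrinks until latency is the floor.
print(gpu_time_with_threads(latency=1e-6, memory_size=data_bytes,
                            gpu_bandwidth=750e9, threads=1024))
```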

AI LLMs

TPU vs GPU Performance: A TPU is approximately 32% to 54% faster than a GPU for training BERT-like models. This can be represented as:

TPU_speed = GPU_speed * (1 + 0.32 to 0.54)

Impact of Data Type Size: Data type size has a significant impact on performance. For instance, TPUs would be approximately 5.3x faster when using 32-bit values compared to 16-bit values. This can be represented as:

TPU_speed_32_bit = TPU_speed_16_bit * 5.3

Potential Speedup with 8-bit Computing: If 8-bit computing can be made to work for general models, it could lead to significant speedups for transformers. For instance, GPUs could be 3.0x faster than TPUs with 8-bit computation. This can be represented as:

GPU_speed_8_bit = TPU_speed * 3.0

Potential Cost Savings:

Calculation Formula
Cost Savings with Concise Prompts Cost Savings = Original Cost - (Original Cost * Conciseness Factor)
Tokens to Words Ratio Number of Tokens = Number of Words * Tokens per Word Ratio
Cost Ratio of Different Models Cost of Operation = Cost of Model * Number of Operations
GPU Memory Requirements GPU Memory Requirement = 2 * Number of Parameters
GPU Memory Requirement for Output GPU Memory Requirement for Output = Number of Tokens * Memory per Token
Throughput Improvement from Batching Throughput = Number of Queries / Total Time

Where:

  • Conciseness Factor is between 0.4 and 0.9.
  • Tokens per Word Ratio is 1.3.
  • Memory per Token is 1MB.
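
A few of these rules of thumb sketched in Python. Reading "2 * Number of Parameters" as 2 bytes per parameter (i.e. 16-bit weights) is my assumption, as are the function names and the example inputs of 1,000 words, 7 billion parameters and 512 output tokens.

```python
TOKENS_PER_WORD = 1.3      # ratio quoted above
BYTES_PER_PARAMETER = 2    # assumption: 16-bit weights
MB_PER_OUTPUT_TOKEN = 1    # figure quoted above


def tokens_for_words(words: int) -> float:
    """Number of Tokens = Number of Words * Tokens per Word Ratio."""
    return words * TOKENS_PER_WORD


def model_memory_gb(parameters: float) -> float:
    """GPU memory to hold the weights, in GB, at 2 bytes per parameter."""
    return parameters * BYTES_PER_PARAMETER / 1e9


def output_memory_mb(tokens: int) -> float:
    """GPU memory for generated output, in MB."""
    return tokens * MB_PER_OUTPUT_TOKEN


def batched_throughput(queries: int, total_seconds: float) -> float:
    """Throughput = Number of Queries / Total Time."""
    return queries / total_seconds


# Hypothetical inputs, for illustration only.
print(round(tokens_for_words(1000)))  # about 1,300 tokens
print(model_memory_gb(7e9))           # 14.0 GB for a 7-billion-parameter model
print(output_memory_mb(512))          # 512 MB
```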

Sources:

Back of Envelope Calcs
LLM Numbers
Physical Constants
Mathematical Constants
Physics Notations