3 LLM Cost Model
(require llm/cost-base)  package: llm-lib
llm lang integrates several cost models of LLMs to report estimated training and inference costs after each query. After each prompt is sent, this feedback is reported to the llm-lang-logger at log level 'info with topic 'llm-lang. The intention is to give the developer feedback about the potential resource cost of each query, to encourage better prompt-engineering practice. The raw data is also available through current-carbon-use, current-power-use, and current-water-use.
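For example, this feedback can be observed directly by attaching a log receiver; a minimal sketch, assuming the llm-lang-logger propagates to (current-logger):

  (define cost-receiver
    (make-log-receiver (current-logger) 'info 'llm-lang))

  ; Print each cost message as it arrives; sync returns #(level message data topic).
  (void
   (thread
    (lambda ()
      (let loop ()
        (define msg (sync cost-receiver))
        (printf "~a: ~a~n" (vector-ref msg 3) (vector-ref msg 1))
        (loop)))))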
The training cost estimates are based on the work of [faiz2023]. A copy of the tooling, along with the additional data and estimates used to generate the training costs, is included in the repository for llm lang.
The inference models are based on the work of [wilkins2024] and [wilkins2024a]. These estimates are even rougher, but they are unfortunately the best I can do for non-local LLMs.
struct
(struct inference-cost-info (input-tokens
                             output-tokens
                             prompt-duration
                             response-duration))
  input-tokens : natural-number/c
  output-tokens : natural-number/c
  prompt-duration : natural-number/c
  response-duration : natural-number/c
syntax
gen:kwh-model
A generic interface for inference energy models, used by model->kwh; implemented by wilkins-inference-model and time-avg-inference-model.
procedure
(model->kwh model info) → natural-number/c
  model : gen:kwh-model
  info : inference-cost-info?
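model->kwh computes an estimated energy figure, in kilowatt-hours, for the inference described by info. New models can be added by implementing gen:kwh-model; a sketch, assuming gen:kwh-model is a racket/generic-style interface whose only method is model->kwh (the flat figure and the token and duration numbers below are made up):

  (require llm/cost-base)

  ; A model that charges a fixed, made-up energy cost per query.
  (struct constant-kwh-model (kwh-per-query)
    #:methods gen:kwh-model
    [(define (model->kwh model info)
       ; Ignore the token counts and durations entirely.
       (constant-kwh-model-kwh-per-query model))])

  (model->kwh (constant-kwh-model 1)
              (inference-cost-info 128 256 10 1500))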
struct
(struct wilkins-inference-model (alpha-k-s-0
                                 alpha-k-s-1
                                 alpha-k-s-2))
  alpha-k-s-0 : number?
  alpha-k-s-1 : number?
  alpha-k-s-2 : number?
struct
(struct time-avg-inference-model (kw))
kw : natural-number/c
struct
(struct model-cost-info (model-name
                         query-tco2/kwh
                         query-water-L/kWh
                         training-water-L/kWh
                         training-tco2
                         training-kwh
                         inference-model))
  model-name : (or/c string? symbol?)
  query-tco2/kwh : natural-number/c
  query-water-L/kWh : natural-number/c
  training-water-L/kWh : natural-number/c
  training-tco2 : natural-number/c
  training-kwh : natural-number/c
  inference-model : gen:kwh-model
model-name is used for debugging and reporting, and represents the model’s name to a developer.
query-tco2/kwh represents the metric tons of CO2 emitted per kWh of electricity used during an inference query. This value is specific to the power-generation mix supplying the language model during inference.
query-water-L/kWh represents the litres of water used per kWh during an inference query. This is the on-site scope-1 usage, which you can think of as the water consumption of the cooling apparatus in a data centre. It is probably 0 for a local LLM. See [li2023] for more information.
training-water-L/kWh represents the litres of water used per kWh during training of the LLM. This is meant to be the on-site scope-1 usage.
training-tco2 represents the metric tons of CO2-equivalent emitted during training. This captures both the power generation and the embodied carbon of the GPUs.
training-kwh represents the kilowatt-hours used to power computation while training the LLM.
inference-model represents the gen:kwh-model used for inferences to this LLM.
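Putting these together, a model-cost-info for a hypothetical model might be constructed as follows; a sketch with made-up figures (the real constants should come from the training and inference estimates discussed above):

  (require llm/cost-base)

  (define example-model-info
    (model-cost-info
     'example-7b                     ; model-name (hypothetical)
     0                               ; query-tco2/kwh (made up)
     0                               ; query-water-L/kWh; likely 0 for a local LLM
     0                               ; training-water-L/kWh (made up)
     10                              ; training-tco2 (made up)
     100000                          ; training-kwh (made up)
     (time-avg-inference-model 1)))  ; inference-model: ~1 kW average draw (made up)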
parameter
(current-model-cost-log) → (list/c cost-log-entry?)
(current-model-cost-log log) → void?
  log : (list/c cost-log-entry?)
= '()
struct
(struct cost-log-entry (model-cost-info inference-info))
  model-cost-info : model-cost-info?
  inference-info : inference-cost-info?
parameter
(current-model-cost-logger) → (-> (list/c cost-log-entry?) (list/c cost-log-entry?))
(current-model-cost-logger logger-f) → void?
  logger-f : (-> (list/c cost-log-entry?) (list/c cost-log-entry?))
= log-logger
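A replacement logger is a function from a cost log to a cost log, per the contract above. A sketch that prints a one-line summary per entry to current-cost-port and returns the log unchanged, assuming the default struct accessors generated for cost-log-entry and model-cost-info:

  (define (summary-logger log)
    (for ([entry (in-list log)])
      (define info (cost-log-entry-model-cost-info entry))
      (fprintf (current-cost-port)
               "query to ~a recorded~n"
               (model-cost-info-model-name info)))
    log)

  (current-model-cost-logger summary-logger)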
procedure
(log-logger log) → (list/c cost-log-entry?)
log : (list/c cost-log-entry?)
parameter
(current-cost-port) → output-port?
(current-cost-port port) → void?
  port : output-port?
= (current-error-port)
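Assuming the library writes its cost reports to this port, they can be redirected away from stderr; a sketch that appends them to a file for the dynamic extent of a parameterize (the file name is hypothetical):

  (call-with-output-file "llm-costs.log"
    #:exists 'append
    (lambda (out)
      (parameterize ([current-cost-port out])
        ; ... send prompts here; cost reports should land in llm-costs.log ...
        (void))))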
procedure
→ (list/c cost-log-entry?)
  log : (list/c cost-log-entry?)
procedure
(log->string log) → string?
log : (list/c cost-log-entry?)
procedure
(log-model-cost! entry) → void?
entry : cost-log-entry?
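For example, an entry can be recorded by hand and the accumulated log rendered; a sketch reusing example-model-info from the model-cost-info sketch above, with made-up token counts and durations:

  (log-model-cost!
   (cost-log-entry example-model-info
                   (inference-cost-info 128 256 10 1500)))

  (display (log->string (current-model-cost-log)))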
procedure
(current-carbon-use [mode]) → number?
mode : (or/c 'queries 'training) = 'queries
procedure
(current-power-use [mode]) → number?
mode : (or/c 'queries 'training) = 'queries
procedure
(current-water-use [mode]) → number?
mode : (or/c 'queries 'training) = 'queries
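For example, after a few queries the accumulated figures can be read back; a sketch, with units following the model-cost-info fields above (metric tons of CO2, kilowatt-hours, and litres):

  (printf "carbon from queries: ~a tCO2~n" (current-carbon-use))
  (printf "energy from queries: ~a kWh~n" (current-power-use))
  (printf "water from queries: ~a L~n" (current-water-use))
  (printf "carbon attributed to training: ~a tCO2~n" (current-carbon-use 'training))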
Bibliography
[faiz2023] Faiz, Ahmad and Kaneda, Sotaro and Wang, Ruhan and Osi, Rita and Sharma, Prateek and Chen, Fan and Jiang, Lei, "LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models," ICLR, 2023. https://doi.org/10.48550/ARXIV.2309.14393
[wilkins2024a] Wilkins, Grant and Keshav, Srinivasan and Mortier, Richard, "Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems," 2024. https://doi.org/10.48550/ARXIV.2407.04014
[wilkins2024] Wilkins, Grant, "Online Workload Allocation and Energy Optimization in Large Language Model Inference Systems," MSc thesis, 2024. https://grantwilkins.github.io/gfw27_project.pdf
[li2023] Li, Pengfei and Yang, Jianyi and Islam, Mohammad A. and Ren, Shaolei, "Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models," 2023. https://doi.org/10.48550/ARXIV.2304.03271