3 LLM Cost Model
(require llm/cost-base)  package: llm-lib
llm lang integrates several cost models of LLMs to report estimated training and inference costs after each query. After each prompt is sent, this feedback is reported to the llm-lang-logger at log level 'info with topic 'llm-lang. The intention is to give the developer feedback about the potential resource cost of each query, to encourage better prompt-engineering practice. The raw data is also available through current-carbon-use, current-power-use, and current-water-use.
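For example, this feedback can be observed directly by attaching a log receiver; a minimal sketch, assuming the llm-lang-logger propagates to (current-logger):

  (define cost-receiver
    (make-log-receiver (current-logger) 'info 'llm-lang))

  ; Print each cost message as it arrives; sync returns #(level message data topic).
  (void
   (thread
    (lambda ()
      (let loop ()
        (define msg (sync cost-receiver))
        (printf "~a: ~a~n" (vector-ref msg 3) (vector-ref msg 1))
        (loop)))))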
The training cost estimates are based on the work of [faiz2023]. A copy of the tooling, along with the additional data and estimates used to generate the training costs, is included in the repository for llm lang.
The inference models are based on the work of [wilkins2024] and [wilkins2024a]. These estimates are even rougher, but they are unfortunately the best I can do for non-local LLMs.
struct
(struct inference-cost-info (input-tokens
                             output-tokens
                             prompt-duration
                             response-duration))
  input-tokens : natural-number/c
  output-tokens : natural-number/c
  prompt-duration : natural-number/c
  response-duration : natural-number/c
syntax
gen:kwh-model
A generic interface for inference energy models, used by model->kwh; implemented by wilkins-inference-model and time-avg-inference-model.
procedure
(model->kwh model info) → natural-number/c
  model : gen:kwh-model
  info : inference-cost-info?
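model->kwh computes an estimated energy figure, in kilowatt-hours, for the inference described by info. New models can be added by implementing gen:kwh-model; a sketch, assuming gen:kwh-model is a racket/generic-style interface whose only method is model->kwh (the flat figure and the token and duration numbers below are made up):

  (require llm/cost-base)

  ; A model that charges a fixed, made-up energy cost per query.
  (struct constant-kwh-model (kwh-per-query)
    #:methods gen:kwh-model
    [(define (model->kwh model info)
       ; Ignore the token counts and durations entirely.
       (constant-kwh-model-kwh-per-query model))])

  (model->kwh (constant-kwh-model 1)
              (inference-cost-info 128 256 10 1500))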
struct
(struct wilkins-inference-model (alpha-k-s-0
                                 alpha-k-s-1
                                 alpha-k-s-2))
  alpha-k-s-0 : number?
  alpha-k-s-1 : number?
  alpha-k-s-2 : number?
struct
(struct time-avg-inference-model (kw))
kw : natural-number/c
struct
(struct model-cost-info (model-name
                         query-tco2/kwh
                         query-water-L/kWh
                         training-water-L/kWh
                         training-tco2
                         training-kwh
                         inference-model))
  model-name : (or/c string? symbol?)
  query-tco2/kwh : natural-number/c
  query-water-L/kWh : natural-number/c
  training-water-L/kWh : natural-number/c
  training-tco2 : natural-number/c
  training-kwh : natural-number/c
  inference-model : gen:kwh-model
model-name is used for debugging and reporting, and represents the model’s name to a developer.
query-tco2/kwh represents the metric tons of CO2 emitted per kWh of electricity used during an inference query. This value is specific to the power-generation mix supplying the language model during inference.
query-water-L/kWh represents the litres of water used per kWh during an inference query. This is the on-site scope-1 usage, which you can think of as the water consumption of the cooling apparatus in a data centre. It is probably 0 for a local LLM. See [li2023] for more information.
training-water-L/kWh represents the litres of water used per kWh during training of the LLM. This is meant to be the on-site scope-1 usage.
training-tco2 represents the metric tons of CO2-equivalent emitted during training. This captures both the power generation and the embodied carbon of the GPUs.
training-kwh represents the kilowatt-hours used to power computation while training the LLM.
inference-model represents the gen:kwh-model used for inferences to this LLM.
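Putting these together, a model-cost-info for a hypothetical model might be constructed as follows; a sketch with made-up figures (the real constants should come from the training and inference estimates discussed above):

  (require llm/cost-base)

  (define example-model-info
    (model-cost-info
     'example-7b                     ; model-name (hypothetical)
     0                               ; query-tco2/kwh (made up)
     0                               ; query-water-L/kWh; likely 0 for a local LLM
     0                               ; training-water-L/kWh (made up)
     10                              ; training-tco2 (made up)
     100000                          ; training-kwh (made up)
     (time-avg-inference-model 1)))  ; inference-model: ~1 kW average draw (made up)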
parameter
(current-model-cost-log) → (list/c cost-log-entry?)
(current-model-cost-log log) → void?
  log : (list/c cost-log-entry?)
= '()
struct
(struct cost-log-entry (model-cost-info inference-info))
  model-cost-info : model-cost-info?
  inference-info : inference-cost-info?
parameter
(current-model-cost-logger) → (-> (list/c cost-log-entry?) (list/c cost-log-entry?))
(current-model-cost-logger logger-f) → void?
  logger-f : (-> (list/c cost-log-entry?) (list/c cost-log-entry?))
= log-logger
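A replacement logger is a function from a cost log to a cost log, per the contract above. A sketch that prints a one-line summary per entry to current-cost-port and returns the log unchanged, assuming the default struct accessors generated for cost-log-entry and model-cost-info:

  (define (summary-logger log)
    (for ([entry (in-list log)])
      (define info (cost-log-entry-model-cost-info entry))
      (fprintf (current-cost-port)
               "query to ~a recorded~n"
               (model-cost-info-model-name info)))
    log)

  (current-model-cost-logger summary-logger)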
procedure
(log-logger log) → (list/c cost-log-entry?)
log : (list/c cost-log-entry?)
parameter
(current-cost-port) → output-port?
(current-cost-port port) → void?
  port : output-port?
= (current-error-port)
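Assuming the library writes its cost reports to this port, they can be redirected away from stderr; a sketch that appends them to a file for the dynamic extent of a parameterize (the file name is hypothetical):

  (call-with-output-file "llm-costs.log"
    #:exists 'append
    (lambda (out)
      (parameterize ([current-cost-port out])
        ; ... send prompts here; cost reports should land in llm-costs.log ...
        (void))))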
procedure
→ (list/c cost-log-entry?)
  log : (list/c cost-log-entry?)
procedure
(log->string log) → string?
log : (list/c cost-log-entry?)
procedure
(log-model-cost! entry) → void?
entry : cost-log-entry?
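For example, an entry can be recorded by hand and the accumulated log rendered; a sketch reusing example-model-info from the model-cost-info sketch above, with made-up token counts and durations:

  (log-model-cost!
   (cost-log-entry example-model-info
                   (inference-cost-info 128 256 10 1500)))

  (display (log->string (current-model-cost-log)))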
procedure
(current-carbon-use [mode]) → number?
mode : (or/c 'queries 'training) = 'queries
procedure
(current-power-use [mode]) → number?
mode : (or/c 'queries 'training) = 'queries
procedure
(current-water-use [mode]) → number?
mode : (or/c 'queries 'training) = 'queries
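For example, after a few queries the accumulated figures can be read back; a sketch, with units following the model-cost-info fields above (metric tons of CO2, kilowatt-hours, and litres):

  (printf "carbon from queries: ~a tCO2~n" (current-carbon-use))
  (printf "energy from queries: ~a kWh~n" (current-power-use))
  (printf "water from queries: ~a L~n" (current-water-use))
  (printf "carbon attributed to training: ~a tCO2~n" (current-carbon-use 'training))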
Bibliography
[faiz2023] Faiz, Ahmad and Kaneda, Sotaro and Wang, Ruhan and Osi, Rita and Sharma, Prateek and Chen, Fan and Jiang, Lei, "LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models," ICLR, 2023. https://doi.org/10.48550/ARXIV.2309.14393
[wilkins2024a] Wilkins, Grant and Keshav, Srinivasan and Mortier, Richard, "Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems," 2024. https://doi.org/10.48550/ARXIV.2407.04014
[wilkins2024] Wilkins, Grant, "Online Workload Allocation and Energy Optimization in Large Language Model Inference Systems," MSc thesis, 2024. https://grantwilkins.github.io/gfw27_project.pdf
[li2023] Li, Pengfei and Yang, Jianyi and Islam, Mohammad A. and Ren, Shaolei, "Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models," 2023. https://doi.org/10.48550/ARXIV.2304.03271