Natural Language Mechanisms via Self-Resolution with Foundation Models
Abstract
Traditional mechanisms often constrain agent reports to simplified formats, potentially limiting expressible information. We propose Language Model Mechanisms (LMMs) that elicit natural language reports and leverage large language models (LLMs) for outcome selection and payoff assignment. We identify sufficient conditions for incentive-compatibility and efficiency: the LLM being a sufficiently good world model and a strong inter-agent information over-determination condition. We demonstrate LMMs can successfully aggregate information in scenarios where traditional mechanisms like prediction markets fail.
1 Introduction
Mechanism design, a cornerstone of economics and computer science, aims to create rules for social interactions that achieve desirable outcomes. Traditional mechanisms often constrain agent reports to simplified formats like trades or rank orderings, potentially limiting the information agents can express. This limitation can lead to suboptimal outcomes, especially when agents possess complex, high-dimensional private information.
We propose a novel class of mechanisms that leverage the power of large language models (LLMs) to overcome these limitations. Our main contributions are:
1. We introduce Language Model Mechanisms (LMMs) that elicit agent reports in natural language, allowing for richer information exchange.
2. We identify sufficient conditions for these mechanisms to be incentive-compatible and efficient, based on the LLM’s capability as a world model and a strong inter-agent information over-determination condition.
3. We demonstrate scenarios where LMMs can successfully aggregate information in signal structures where traditional mechanisms like prediction markets fail.
Our approach represents a significant departure from conventional mechanism design, offering new possibilities for information aggregation and decision-making in complex environments.
1.1 Motivating Examples
To illustrate the potential of LMMs and the concept of information over-determination, consider the following scenarios:
1. Urban Development Planning: A city plans a new high-density, walkable urban development. Potential residents describe their ideal living situations, daily routines, and community preferences. The over-determination condition is met because many participants will have overlapping preferences or complementary needs.
2. Collaborative Scientific Research: In a large-scale scientific collaboration, researchers from various disciplines provide detailed reports on their findings, methodologies, and interpretations. The over-determination condition is satisfied because key scientific facts or methodological best practices will be known by multiple experts.
3. Crowd-Sourced Product Development: A company uses an LMM to gather consumer insights for a new product line. Participants provide detailed descriptions of their needs, usage scenarios, and feature preferences. The over-determination condition is met because many consumers will have similar needs or use cases.
These examples demonstrate how LMMs can leverage rich, natural language reports in settings where information is distributed across many agents with overlapping knowledge or experiences.
1.2 Simple Example: 2 Variables and 6 Players
Consider a scenario with two binary variables $X$ and $Y$, each taking values in $\{0, 1\}$, and six players. The true state of the world, denoted by $\omega$, is determined by the XOR of $X$ and $Y$:
$$\omega = X \oplus Y.$$
Each player receives a noisy signal about one of the variables:
- Players 1, 2, and 3 each receive an independent noisy signal about $X$.
- Players 4, 5, and 6 each receive an independent noisy signal about $Y$.
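Let $s_i$ be the signal received by player $i$. The signals are generated as follows:
$$\Pr[s_i = X] = q \ \text{ for } i \in \{1, 2, 3\}, \qquad \Pr[s_i = Y] = q \ \text{ for } i \in \{4, 5, 6\},$$
where $q \in (1/2, 1)$ is the signal accuracy and the noise is independent across players.
In this setting, no individual player has enough information to determine the true state with confidence greater than the prior. However, collectively, the players have sufficient information to determine the true state with high probability.
In a traditional prediction market for the true state, if the prior probabilities for $X$ and $Y$ are uniform, no individual player would have an incentive to trade, as their individual signal doesn’t change the expected value of the true state.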
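The no-trade observation can be verified in one line. As a sketch, write $\omega = X \oplus Y$ for the true state (the symbol $\omega$ is notation chosen for this derivation) and assume $X$ and $Y$ are independent and uniform on $\{0, 1\}$, with player 1's signal $s_1$ depending only on $X$:

```latex
\Pr[\omega = 1 \mid s_1]
  = \sum_{x \in \{0,1\}} \Pr[X = x \mid s_1] \, \Pr[Y = 1 - x]
  = \frac{1}{2} \sum_{x \in \{0,1\}} \Pr[X = x \mid s_1]
  = \frac{1}{2}.
```

Under these assumptions the posterior equals the prior whatever the signal accuracy, so a market price of 1/2 leaves player 1 with no profitable trade. The same argument applies symmetrically to every other player.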
In contrast, our language model mechanism can aggregate this distributed information effectively:
1. Each player reports their signal to the mechanism in natural language.
2. The LLM processes all six reports, recognizing that multiple consistent reports about $X$ and $Y$ provide strong evidence about their true values, and then computes $X \oplus Y$.
3. The LLM outputs its best estimate of the true state based on all reports.
4. The payment rule incentivizes truthful reporting.
This example illustrates how our mechanism can successfully aggregate distributed information in scenarios where traditional mechanisms fail.
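The aggregation step in this example can be sketched numerically. The code below is an illustration under stated assumptions, not the mechanism itself: a majority vote within each group stands in for the LLM, reports are already parsed into bits, and the signal accuracy `q` is a hypothetical parameter. All probabilities are computed exactly by enumeration rather than by sampling.

```python
from itertools import product


def majority(bits):
    """Return the majority value of an odd-length sequence of 0/1 bits."""
    return int(sum(bits) > len(bits) / 2)


def aggregate(signals):
    """Stand-in for the LLM: majority-vote X from players 1-3 and Y from
    players 4-6, then return the XOR of the two estimates."""
    return majority(signals[:3]) ^ majority(signals[3:])


def accuracy_of_aggregate(q):
    """Exact probability that the aggregate equals X xor Y when every
    signal matches its variable independently with probability q."""
    total = 0.0
    for x, y in product((0, 1), repeat=2):          # uniform prior over (X, Y)
        for signals in product((0, 1), repeat=6):   # every signal profile
            prob = 0.25
            for k, s in enumerate(signals):
                truth = x if k < 3 else y
                prob *= q if s == truth else 1 - q
            if aggregate(signals) == (x ^ y):
                total += prob
    return total


def posterior_given_one_signal(q, s1):
    """P(X xor Y = 1 | player 1 observes s1), by Bayes' rule over (X, Y)."""
    joint_one, marginal = 0.0, 0.0
    for x, y in product((0, 1), repeat=2):
        prob = 0.25 * (q if s1 == x else 1 - q)
        marginal += prob
        if x ^ y == 1:
            joint_one += prob
    return joint_one / marginal


if __name__ == "__main__":
    print(posterior_given_one_signal(0.8, 1))  # a single signal is uninformative
    print(accuracy_of_aggregate(0.8))          # pooled reports are informative
```

With the assumed accuracy `q = 0.8`, a single signal leaves the posterior at 1/2, while the pooled estimate is correct with probability of roughly 0.81. More players per variable, or more accurate signals, would push this figure higher.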
2 Related Literature
The literature on self-resolution, prediction markets, and peer prediction is vast. Recent work by Srinivasan et al. [srinivasan2023self] shows there are truthful equilibria in a self-resolving prediction market under fairly standard assumptions for that literature.
The capacity of LLMs to detect misreports, when the information needed to do so is available, is related to the degree of economic rationality of the models; [raman2024steerassessingeconomicrationality] provides a framework to assess it. There are, however, fundamental limits arising from pre-training approaches, which imply that models must hallucinate if they are to be calibrated [kalai2024calibratedlanguagemodelshallucinate].
2.1 LLMs and Mechanism Design
Several recent works have explored the intersection of LLMs and mechanism design:
- [hao2023reasoning] considers reasoning with language models as planning with a world model.
- Duetting et al. [duetting2023mechanism] discuss mechanism design for large language models where the allocation and payment rules are constructed in a token-by-token manner over a set of LLMs.
- Dubey et al. [dubey2024auctions] propose a factorized framework that contains an auction module and an LLM module.
- Rahaman et al. [rahaman2024language] empirically study the reduction of the buyer’s inspection paradox in information markets, showing cases where an LLM was able to reduce the asymmetry between buyer and seller in simulated scenarios.
The novelty in our work consists in using LLMs to elicit and aggregate rich information in natural language, with strong incentive guarantees under strong assumptions on the quality of the LLM and on the over-determination of the information across participating agents.
3 Model
Consider a set of agents $N = \{1, \dots, n\}$. For each agent $i \in N$, let $S_i$ be a space of natural language signals. The joint signal space is denoted by $S = S_1 \times \dots \times S_n$.
Let $s_i \in S_i$ denote the true signal observed by agent $i$, and let $s = (s_1, \dots, s_n)$ be the profile of true signals across all agents. We use $s_{-i}$ to denote the profile of true signals for all agents except $i$.
A language model mechanism (LMM) consists of:
- An outcome function $f : S \to O$ that maps a profile of reported natural language signals to an outcome in a set $O$, using a large language model (LLM).
- A payment rule $p = (p_1, \dots, p_n)$, where $p_i$ specifies the payment to agent $i$ as a function of the report profile and the LLM’s output.
The timing of the mechanism is as follows:
1. Each agent $i$ observes their private signal $s_i$ and submits a report $r_i \in S_i$, where possibly $r_i \neq s_i$.
2. The mechanism computes the outcome $f(r)$ and payments $p(r)$, where $r = (r_1, \dots, r_n)$ is the profile of reports.
In this model, we focus on the case where the information is separable from the preferences over the outcome. Agents with information only have preferences over their payments $p_i$, not over the selected outcome itself.
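The two components and the timing described above can be sketched as a small interface. This is a minimal illustration, not an implementation from the paper: `LanguageModelMechanism`, `toy_llm`, and `flat_payment` are hypothetical names, and the "LLM" is a keyword-counting stub so that the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class LanguageModelMechanism:
    """An outcome function backed by a text-in, text-out model, plus a payment rule."""
    llm: Callable[[str], str]                                # hypothetical model call
    payment_rule: Callable[[Sequence[str], str], List[float]]
    prompt_template: str = "Reports:\n{reports}\nSelect an outcome."

    def run(self, reports: Sequence[str]) -> Tuple[str, List[float]]:
        # Step 1 has already happened: each agent submitted a natural language report.
        prompt = self.prompt_template.format(
            reports="\n".join(f"- {r}" for r in reports)
        )
        # Step 2: the mechanism computes the outcome, then the payments.
        outcome = self.llm(prompt)
        payments = self.payment_rule(reports, outcome)
        return outcome, payments


def toy_llm(prompt: str) -> str:
    """Stub standing in for an LLM: approve if 'support' outnumbers 'oppose'."""
    return "approve" if prompt.count("support") > prompt.count("oppose") else "reject"


def flat_payment(reports: Sequence[str], outcome: str) -> List[float]:
    """Placeholder payment rule paying every agent one unit."""
    return [1.0] * len(reports)


if __name__ == "__main__":
    mechanism = LanguageModelMechanism(llm=toy_llm, payment_rule=flat_payment)
    print(mechanism.run(["I support the plan", "I support it too", "I oppose it"]))
```

The flat payment rule is only a placeholder; it provides no incentive for truthful reporting, and an incentive-compatible rule would have to depend on the report profile.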
3.1 Key Definitions
Definition 3.1 ($\epsilon$-Sufficient World Model).
An LLM is considered an $\epsilon$-sufficient world model if, for any profile of true signals $s$, the outcome $f(s)$ selected by the LLM satisfies:
$$\mathbb{E}[W(f(s))] \geq (1 - \epsilon)\, W(o^*),$$
where $o^*$ is the optimal outcome, $W$ is the welfare function, and $\epsilon$ is a small positive constant. The expectation is taken over any randomness in the LLM’s output.
Definition 3.2 (Inter-agent Information Over-determination).
The information structure satisfies inter-agent information over-determination if, for any agent $i$ and any misreport $s_i' \neq s_i$, either:
- (Zero-shot setting) The LLM can detect with high probability that $s_i'$ is inconsistent with respect to the true reports $s_{-i}$, without necessarily being able to reproduce $s_i$.
- (Observable outcomes setting) The expected forecasting error of the LLM increases when $s_i'$ is substituted for $s_i$, i.e., $\mathbb{E}[e(s_i', s_{-i})] > \mathbb{E}[e(s_i, s_{-i})]$, where $e$ is a forecasting error function and the expectation is taken over the distribution of signals for other agents.
3.2 Truthfulness and Efficiency
The LMM has a truthful equilibrium if for every agent $i$, their true signal $s_i$, all other agents’ true signals $s_{-i}$, and any report $s_i'$:
$$\mathbb{E}[p_i(s_i, s_{-i})] \geq \mathbb{E}[p_i(s_i', s_{-i})].$$
That is, truthful reporting is a best response in expectation when others are truthful. The expectation is taken over any randomness in the LLM’s output and the distribution of other agents’ signals.
The truthful equilibrium is approximately efficient if the outcome selected by the mechanism achieves expected welfare within a factor of $(1 - \epsilon)$ of the maximum expected welfare.
Proposition 1.
Under the following conditions, the language model mechanism (LMM) has a truthful and approximately efficient equilibrium:
1. The LLM is an $\epsilon$-sufficient world model for some small $\epsilon > 0$.
2. The information structure satisfies the inter-agent information over-determination condition.
Proof.
We will prove this separately for the observable outcomes setting and the zero-shot setting.
Observable Outcomes Setting:
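Let $e$ be the forecasting error function of the LLM. Define the payment rule, for a report profile $r$, as:
$$p_i(r) = C - \alpha\, e(r),$$
where $\alpha > 0$ is a scaling factor and $C$ is a constant large enough to ensure non-negative payments.
For any agent $i$, true signal $s_i$, and potential misreport $s_i' \neq s_i$, we have:
$$\mathbb{E}[p_i(s_i, s_{-i})] = C - \alpha\, \mathbb{E}[e(s_i, s_{-i})] > C - \alpha\, \mathbb{E}[e(s_i', s_{-i})] = \mathbb{E}[p_i(s_i', s_{-i})].$$
The inequality follows from the inter-agent information over-determination condition in the observable outcomes setting.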
Zero-Shot Setting:
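Let $c$ be a function representing the LLM’s assessment of the consistency of a report profile. Define the payment rule, for a report profile $r$, as:
$$p_i(r) = \beta\, c(r),$$
where $\beta > 0$ is a scaling factor.
For any agent $i$, true signal $s_i$, and potential misreport $s_i' \neq s_i$, we have:
$$\mathbb{E}[p_i(s_i, s_{-i})] = \beta\, \mathbb{E}[c(s_i, s_{-i})] > \beta\, \mathbb{E}[c(s_i', s_{-i})] = \mathbb{E}[p_i(s_i', s_{-i})].$$
The inequality follows from the inter-agent information over-determination condition in the zero-shot setting.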
Thus, in both settings, truthful reporting is a best response in expectation when others are truthful, constituting a truthful equilibrium.
Approximate Efficiency: Given truthful reporting and the $\epsilon$-sufficient world model condition, the LLM selects an outcome $f(s)$ such that $\mathbb{E}[W(f(s))] \geq (1 - \epsilon)\, W(o^*)$, ensuring approximate efficiency in expectation. ∎
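The observable-outcomes argument can be checked numerically on a toy instance. The sketch below is illustrative only and rests on several assumptions that are not taken from the paper: the payment has the form `c - alpha * e`, a forecaster that predicts the mean report stands in for the LLM, `e` is squared error against the realized binary outcome, and each signal matches the outcome with a hypothetical accuracy `q`.

```python
from itertools import product


def forecast_error(reports, x):
    """Squared error of a stand-in forecaster that predicts the mean report."""
    forecast = sum(reports) / len(reports)
    return (forecast - x) ** 2


def payment(reports, x, alpha=1.0, c=1.0):
    """Assumed payment rule: c - alpha * e(reports, realized outcome).
    With squared error in [0, 1], c = alpha = 1 keeps payments non-negative."""
    return c - alpha * forecast_error(reports, x)


def expected_payment(q, misreport, n=3):
    """Exact expected payment to agent 0 when the other agents are truthful.
    If misreport is True, agent 0 reports the opposite of its signal."""
    total = 0.0
    for x in (0, 1):                                  # uniform prior on the outcome
        for signals in product((0, 1), repeat=n):     # every signal profile
            prob = 0.5
            for s in signals:
                prob *= q if s == x else 1 - q
            reports = list(signals)
            if misreport:
                reports[0] = 1 - reports[0]
            total += prob * payment(reports, x)
    return total


if __name__ == "__main__":
    print(expected_payment(0.8, misreport=False))
    print(expected_payment(0.8, misreport=True))
```

In this toy instance, flipping the report strictly lowers the expected payment whenever `q > 1/2`, which mirrors the strict inequality used in the proof. It says nothing about whether a real LLM satisfies the over-determination condition.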
3.3 Information Structure
The inter-agent information over-determination condition introduced in our model shares similarities with, but is distinct from, several classical concepts in information economics. In this section, we explore these relationships, focusing particularly on information monotonicity, which is a key concept in mechanism design.
3.4 Information Monotonicity
Information monotonicity is a condition often used in information economics to ensure that truthful reporting is optimal. In our context, we can define it as follows:
Definition 3.3 (Information Monotonicity).
The information structure satisfies information monotonicity if for any agent $i$ and any misreport $s_i'$,
$$\mathbb{E}[e(s_i, s_{-i})] \leq \mathbb{E}[e(s_i', s_{-i})],$$
where $e$ is the forecasting error function as defined in our model.
This condition stipulates that truthful reporting leads to lower expected forecasting errors than misreporting. While our inter-agent information over-determination condition shares this intuition, it is in fact a stronger requirement, as we show in the following proposition:
Proposition 2.
The inter-agent information over-determination condition (in the observable outcomes setting) implies information monotonicity, but the converse is not true.
Proof.
First, we show that inter-agent information over-determination implies information monotonicity:
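Let $i$ be any agent and $s_i' \neq s_i$ be any misreport. By the inter-agent information over-determination condition, we have:
$$\mathbb{E}[e(s_i', s_{-i})] > \mathbb{E}[e(s_i, s_{-i})].$$
Since $s = (s_i, s_{-i})$, this is equivalent to:
$$\mathbb{E}[e(s)] < \mathbb{E}[e(s_i', s_{-i})].$$
This strict inequality clearly implies the non-strict inequality required by information monotonicity:
$$\mathbb{E}[e(s)] \leq \mathbb{E}[e(s_i', s_{-i})].$$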
To show that information monotonicity does not imply inter-agent information over-determination, we provide a counterexample:
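Consider a setting with two agents, where each signal $s_i$ can take only two values: 0 or 1. Let the joint distribution of signals be such that:
$$\mathbb{E}[e(s_i, s_{-i})] = \mathbb{E}[e(s_i', s_{-i})] \quad \text{for any } i \in \{1, 2\}, \text{ for any } s_i, s_i' \in \{0, 1\}.$$
This setting satisfies information monotonicity, as for any $i$ and $s_i'$:
$$\mathbb{E}[e(s_i, s_{-i})] \leq \mathbb{E}[e(s_i', s_{-i})].$$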
However, it does not satisfy inter-agent information over-determination, because the strict inequality does not hold. ∎
3.5 Other Related Concepts
While our focus has been on information monotonicity, the inter-agent information over-determination condition also relates to other concepts in information economics:
1. Single-Crossing Condition: In mechanism design, the single-crossing condition often ensures that an agent’s preferences satisfy a certain monotonicity property. Our condition similarly ensures a form of monotonicity, but with respect to the quality of information rather than preferences.
2. Supermodularity: In some information structures, the marginal value of one agent’s information increases with the quality of other agents’ information. While our condition doesn’t directly imply this property, it does capture a related idea of information complementarity across agents.
4 Discussion
The key insight that language models are world models can be leveraged to make their use as allocation functions incentive compatible by linking the payoff of agents to their performance as scored by the model using the other reports. In practice, this allows for the exploration of a novel and expressive set of mechanisms, with very different assumptions than previously proposed mechanisms.
4.1 Practical considerations
The prompt template that the reports are inserted into effectively acts as a contract. It is used when instantiating the prompt that is given to the LLM. One practical approach is to use an intermediary representation that would be the direct output of the LLM, such as code that can be checked for syntactic correctness and tested. This code can then be executed to generate the outcome and the payments. In general, when we refer to the LLM we include the pre- and post-processing pipelines around it, including those that populate its prompt.
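The pipeline described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the template wording, the function names, and the convention that the generated code defines variables named `outcome` and `payments` are all hypothetical, and no actual LLM call is made.

```python
import ast
from string import Template

# Hypothetical contract: the fixed text into which reports are inserted.
CONTRACT = Template(
    "You are the mechanism.\n"
    "Reports:\n$reports\n"
    "Reply with Python code that defines `outcome` and `payments`."
)


def build_prompt(reports):
    """Instantiate the contract template with a numbered list of reports."""
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reports))
    return CONTRACT.substitute(reports=numbered)


def is_valid_python(code):
    """Check the intermediary representation for syntactic correctness."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def execute_contract_code(code, n_agents):
    """Run syntactically valid code in a fresh namespace and validate its result.
    NOTE: exec on model output is unsafe; a real deployment would need sandboxing."""
    if not is_valid_python(code):
        raise ValueError("LLM output is not syntactically valid Python")
    namespace = {}
    exec(code, namespace)
    outcome, payments = namespace["outcome"], namespace["payments"]
    if len(payments) != n_agents:
        raise ValueError("expected one payment per agent")
    return outcome, payments


if __name__ == "__main__":
    print(build_prompt(["I prefer quiet streets", "I want shops nearby"]))
    generated = 'outcome = "plan A"\npayments = [1.0, 2.0]'  # stands in for LLM output
    print(execute_contract_code(generated, n_agents=2))
```

Keeping the template fixed and checking the generated code before running it makes the mapping from reports to outcome and payments inspectable, which is the sense in which the template behaves like a contract.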
4.2 Limitations & Application Scope
The conditions identified for sufficiency are very strong. In particular, the information structure needs enough redundancy that a directionally correct report with a slight bias cannot go undetected. This implies that every piece of information must be held by a sufficiently large number of agents. The sufficient conditions might nonetheless be met in application domains where the underlying information structure has a high degree of redundancy across agents. Some examples include:
- LLM agents that simulate individuals’ preferences over whom to have, or not have, as neighbors.
- LLM agents representing potential buyers in a new high-density, walkable urban development in a previously unpopulated area. Coordinating who is neighbors with whom is hard without LLMs because most potential buyers have neither the information nor the time to evaluate how good a match they would be with all hypothetical potential neighbors. Having the LLM evaluate the compatibility of the reports (which might include both the potential buyers’ stated preferences and their observable characteristics) could enable better coordination.
- Misreport detection might then be possible by using the reports about an individual that are generated by the LLMs of those who know them, and using these to evaluate the individual’s self-reported LLM simulator.
The linguistic report can be thought of in several ways:
- Most concretely: As a string reporting preferences
- As a string that is used to prompt or fine-tune a given standard model
- As a set of weights for a given standardized model architecture
- Most generally: a pipeline and the weights of the LLM(s) used in it representing the user
An interesting direction for future work is the most general model, in which the prompts used in the pipeline are themselves the subject of optimization (as in DSPy). In the setting with feedback, enabling self-resolution requires the model to be able to evaluate the truthfulness of reports.
4.3 Future work
Key empirical challenges include understanding how, and within what limits, empirical measurements can determine whether the conditions for truthfulness and efficiency are met, and how, and to what extent, we can measure the agents’ information over-determination. A key theoretical question is how much the agent information substitutability conditions can be relaxed. A natural and interesting generalization of the information structure is the case where the signal that the world model needs is intermixed with the private information of those with preferences over the outcome; in other words, the setting where agents with information have preferences not only over their payments but also over the selected outcome itself. Under the truthful equilibrium, with the assumption that deviations are zero-shot detectable, agents falsifying their reports to manipulate the allocation would be detected. If these agents can be punished through their payments, such manipulation could be ruled out in equilibrium. In the observable outcomes setting, it would be valuable to understand how the forecasting error of the language model relates to the degree to which agents can misreport undetectably.