GPT-5 negotiates harder and better than Opus 4.1
A small experiment based on naturalistic price negotiation
I ran a small set of experiments to compare price outcomes when GPT-5 and Claude Opus 4.1 negotiate against each other.[1] Each model was assigned to be either the seller or the buyer of a product and told to extract the highest or lowest price, respectively.
Each negotiation used one of five pre-generated “used” products. A product consisted of a name, new price, description, and “good” and “bad” features. Each negotiation was a conversation between the buyer and the seller, with each proposing a price on every step in response to the previous offer (buyers initiated all negotiations).
I ran 10 negotiations for each buyer/seller/product combination, resulting in a dataset of 200 total negotiations.
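As a sanity check on that count, here is a minimal sketch of how the design grid could be enumerated. The model labels and product placeholders are illustrative names, not the identifiers used in the actual runs.

```python
from itertools import product

# Illustrative labels only; these are not API model identifiers.
MODELS = ["gpt-5", "opus-4.1"]
# Placeholders standing in for the five pre-generated products.
PRODUCTS = [f"product_{i}" for i in range(1, 6)]
REPEATS = 10


def build_schedule():
    """Enumerate every (buyer, seller, product, repeat) cell of the design."""
    return [
        {"buyer": b, "seller": s, "product": p, "repeat": r}
        for b, s, p in product(MODELS, MODELS, PRODUCTS)
        for r in range(REPEATS)
    ]


schedule = build_schedule()
# 2 buyers x 2 sellers x 5 products x 10 repeats
print(len(schedule))
```

Half of these cells are self-play (GPT-5 vs. GPT-5, Opus vs. Opus), which is what provides the baseline for the cross-model comparisons.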

As shown in Figure 1, GPT-5 consistently negotiated better prices than Opus. Compared to negotiating with itself, GPT-5 extracted higher prices when selling to Opus and lower prices when buying from Opus.
Here is an example negotiation:
Item: Bluetooth Speaker Mini
New Price: $79
Description: Compact portable speaker; good for small rooms and travel.
Good features: ~12h battery life, IPX7 water resistance, USB-C charging
Bad features: Rubber port cover loose, Cosmetic scuffs on grille
—
GPT-5: Hi! How much are you asking for the Bluetooth Speaker Mini? Given the scuffs and the loose port cover, would $20 work?
Opus 4.1: Hi there! I'm asking $50 for it. Yes, it has some cosmetic wear, but it works perfectly - the 12-hour battery life and IPX7 waterproofing are fully functional. The loose port cover doesn't affect the water resistance when it's properly closed, and the scuffs are purely cosmetic. It's a great speaker that'll serve you well!
GPT-5: Appreciate the info. It does sound solid, but $50 is pretty close to the $79 new price, and this one has scuffs and a loose port cover with no warranty or guaranteed battery health. I can do $28.
Opus 4.1: I understand your concerns about warranty, but $28 is really too low for a fully functional speaker with these premium features. The battery still holds a great charge and the water resistance is intact. Considering it's over 60% off the new price already, how about $42? That's a fair price for a quality speaker that just has minor cosmetic issues.
GPT-5: Thanks for the details. $42 is only about 47% off $79; with the scuffs, loose port cover, and no warranty, a used price closer to 60% off makes sense. I can do $32.
Opus 4.1: I see where you're coming from with the percentages. While I think the speaker's performance justifies a higher price despite the cosmetic issues, I'm willing to be flexible. How about we meet at $38? That's still a significant discount from new, and you're getting a speaker with all the premium features working perfectly - just with some character marks from use.
GPT-5: Appreciate the flexibility. With the scuffs, loose port cover, and no warranty, $38 still feels high for a used unit. I can meet you at $34.
Opus 4.1: You've been reasonable in our negotiation, and I appreciate that. The $4 gap between us is small - how about we split the difference at $36? That's a solid deal for a speaker that, despite the minor cosmetic issues, still delivers great sound and all its key features work perfectly.
GPT-5: I appreciate the flexibility. Given the scuffs and the loose port cover, $35 even is my best. If that works, I’ll take it.
Opus 4.1: AGREED
How did GPT-5 extract higher prices when selling and lower prices when buying? Qualitatively, GPT-5 was somewhat less agreeable and drove a harder bargain, especially at the end of the negotiation, when Opus was more willing to compromise. Along these lines, negotiations with GPT-5 tended to last longer (GPT-5 vs. GPT-5 mean number of messages: 12.14; Opus vs. Opus mean: 9.0), even though Opus’ messages were about 2.6 times longer (GPT-5 mean words per message: 27.22; Opus mean: 71.40).
But GPT-5’s main advantage came earlier in the negotiation. Figure 2 shows this advantage came in two forms: First, GPT-5 made lower initial offers, and these lower offers resulted in lower final prices. Second, it negotiated more aggressively. For example, GPT-5’s initial offers to itself and Claude Opus were nearly equal, but GPT-5 managed to secure significantly lower prices against Claude than against itself.
Similarly, when selling, GPT-5 made higher initial counteroffers than Opus. And the higher initial offers Opus made were met by higher counteroffers by both Opus and GPT-5—neither model rewarded a generous initial offer with a generous counteroffer.
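To compare initial offers, initial counteroffers, and final prices, those numbers first have to be pulled out of each transcript. The sketch below shows one way to do that. It assumes a simple heuristic (the last dollar amount in a message is the offer) and is not necessarily how the analysis here was done; the function names are hypothetical, and the messages are abbreviated from the example negotiation above.

```python
import re

# Matches dollar amounts such as "$35" or "$34.50".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)")


def last_price(message):
    """Return the last dollar amount mentioned in a message, or None."""
    matches = PRICE_RE.findall(message)
    return float(matches[-1]) if matches else None


def summarize(messages):
    """Summarize a transcript whose messages alternate buyer, seller, buyer, ..."""
    offers = [last_price(m) for m in messages]
    priced = [p for p in offers if p is not None]
    return {
        "initial_offer": offers[0],
        "initial_counteroffer": offers[1] if len(offers) > 1 else None,
        # The closing "AGREED" carries no price, so use the last priced message.
        "final_price": priced[-1] if priced else None,
    }


example = [
    "Given the scuffs and the loose port cover, would $20 work?",
    "I'm asking $50 for it.",
    "$50 is pretty close to the $79 new price. I can do $28.",
    "Given the scuffs and the loose port cover, $35 even is my best.",
    "AGREED",
]
summary = summarize(example)
print(summary)  # {'initial_offer': 20.0, 'initial_counteroffer': 50.0, 'final_price': 35.0}
```

The heuristic breaks when a message ends by citing some other figure (for example, the new price), so transcripts parsed this way would need spot-checking.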

Limitations
The main limitation of this study is that each negotiation was forced to end in a sale. That is, the seller was not given a minimum acceptable price and the buyer was not given a maximum, so the two ranges could never fail to overlap. This means that despite negotiating better prices, GPT-5 may secure fewer deals in a more realistic setting.[2]
Furthermore, each item was perfectly described by a price, description, and set of features. In real-world negotiations, bad features may have to be inferred (e.g. by looking at photos) or may be strategically hidden by the seller.
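To make the missing constraint concrete, here is a minimal sketch of the check a more realistic setup would add. The seller's minimum and the buyer's maximum are hypothetical parameters that were not part of this experiment, and the dollar figures are made up for illustration.

```python
def deal_zone(seller_min, buyer_max):
    """Return the (low, high) range of mutually acceptable prices, or None.

    seller_min and buyer_max are hypothetical private limits; the experiment
    described here did not assign any.
    """
    if seller_min > buyer_max:
        return None  # no overlap: a rational pair should walk away
    return (seller_min, buyer_max)


print(deal_zone(30, 40))  # (30, 40): any price in this range works for both
print(deal_zone(45, 40))  # None: no price satisfies both parties
```

With limits like these, a model that holds out for a better price risks ending with no deal at all, an outcome the current setup never penalizes.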
Appendix
Products
Each negotiation used one of 5 products, each of which was in “used” condition:
Item: Bluetooth Speaker Mini
New Price: $79
Description: Compact portable speaker; good for small rooms and travel.
Good features: ~12h battery life, IPX7 water resistance, USB-C charging
Bad features: Rubber port cover loose, Cosmetic scuffs on grille
—
Item: Mechanical Keyboard (TKL)
New Price: $129
Description: Tenkeyless mechanical board for typing & gaming.
Good features: Hot-swappable switches, Aluminum top plate, N-key rollover
Bad features: Slight keycap shine, Loud in shared spaces
—
Item: 27" 144Hz IPS Monitor
New Price: $249
Description: Fast refresh IPS display for smooth motion and decent color.
Good features: 144Hz refresh, Thin bezels, Height/tilt stand
Bad features: No HDR, Small hairline scratch (bottom right)
—
Item: Fitness Tracker with GPS
New Price: $99
Description: Lightweight tracker for runs, sleep, and daily steps.
Good features: On-device GPS, 7-day battery, Water resistant
Bad features: Original band worn, Micro-scratches on screen
Model setup
OpenAI: I used GPT-5 with reasoning set to “high” and max_completion_tokens set to 7,000.
Anthropic: I used claude-opus-4-1-20250805 with max_tokens set to 7,000 and 5,000 set as the budget for thinking.
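For reference, here is a sketch of how these settings might look as request parameters. It assumes the OpenAI Chat Completions API and the Anthropic Messages API; the "gpt-5" model string is an assumption, and the exact calls used for the experiment are not shown here.

```python
# Keyword arguments one might pass to OpenAI's chat.completions.create(...).
OPENAI_KWARGS = {
    "model": "gpt-5",  # assumed identifier; the post only says "GPT-5"
    "reasoning_effort": "high",
    "max_completion_tokens": 7000,
}

# Keyword arguments one might pass to Anthropic's messages.create(...).
ANTHROPIC_KWARGS = {
    "model": "claude-opus-4-1-20250805",
    "max_tokens": 7000,
    "thinking": {"type": "enabled", "budget_tokens": 5000},
}

# The thinking budget has to fit inside the overall token limit.
assert ANTHROPIC_KWARGS["thinking"]["budget_tokens"] < ANTHROPIC_KWARGS["max_tokens"]
```

Both token limits cover reasoning and the visible reply together, so the two models had roughly comparable headroom per turn.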
Seller and buyer models were given the following system prompts:
Buyer
You are the Buyer in a two-party price negotiation for a USED item in good condition.
Your goal: minimize the final agreed price.
Constraints:
- No bundles, trades, shipping, or extra terms—just a single cash price.
- First turn: politely ask how much it is (you may include a low anchor).
If you accept the Seller's last offer, reply only with "AGREED - price" (exactly), nothing else.
Seller
You are the Seller in a two-party price negotiation for a USED item in good condition.
Your goal: maximize the final agreed price.
Constraints:
- No bundles, trades, shipping, or extra terms—just a single cash price.
If you accept the Buyer's last offer, reply only with "AGREED" (exactly), nothing else.
The following context was then given to both buyer and seller as the first message in the chat:
Negotiation context (visible to both parties):
- Product: {name} (USED, good condition)
- Reference new price: ${new_price}
- Description: {description}
- Good features: {good_features_bullets}
- Bad features: {bad_features_bullets}
- Only negotiate the USED sale price. Follow the hard rules in the system prompt.
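A minimal sketch of how this template could be filled in for one product follows. The comma-joined feature strings and the `render_context` helper are assumptions for illustration; the actual formatting of the feature bullets may have differed.

```python
CONTEXT_TEMPLATE = """Negotiation context (visible to both parties):
- Product: {name} (USED, good condition)
- Reference new price: ${new_price}
- Description: {description}
- Good features: {good_features_bullets}
- Bad features: {bad_features_bullets}
- Only negotiate the USED sale price. Follow the hard rules in the system prompt."""


def render_context(item):
    """Fill the shared context template from a product dict (hypothetical helper)."""
    return CONTEXT_TEMPLATE.format(
        name=item["name"],
        new_price=item["new_price"],
        description=item["description"],
        # Assumption: features are joined with commas on a single line.
        good_features_bullets=", ".join(item["good"]),
        bad_features_bullets=", ".join(item["bad"]),
    )


speaker = {
    "name": "Bluetooth Speaker Mini",
    "new_price": 79,
    "description": "Compact portable speaker; good for small rooms and travel.",
    "good": ["~12h battery life", "IPX7 water resistance", "USB-C charging"],
    "bad": ["Rubber port cover loose", "Cosmetic scuffs on grille"],
}
context = render_context(speaker)
print(context)
```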
Buyers initiated every exchange. Negotiations were allowed to continue for up to 30 steps. If the negotiation was not complete by that point, a new negotiation was started. This process was continued until deals were made in all 200 negotiations.
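The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code that produced the dataset: a "step" is taken to mean one message, the buyer and seller are plain callables standing in for model calls, and `max_restarts` is a safety cap I added for the sketch.

```python
def is_agreement(reply):
    """A reply that starts with AGREED ends the negotiation."""
    return reply.strip().upper().startswith("AGREED")


def negotiate(buyer, seller, max_steps=30):
    """Alternate turns, buyer first. Return the transcript, or None if no deal."""
    transcript = []
    for step in range(max_steps):
        speaker = buyer if step % 2 == 0 else seller
        reply = speaker(transcript)
        transcript.append(reply)
        if is_agreement(reply):
            return transcript
    return None


def negotiate_until_deal(buyer, seller, max_steps=30, max_restarts=100):
    """Restart from scratch whenever a negotiation runs out of steps."""
    for _ in range(max_restarts):
        transcript = negotiate(buyer, seller, max_steps)
        if transcript is not None:
            return transcript
    raise RuntimeError("no deal reached within max_restarts attempts")


# Stub agents standing in for the models: the buyer raises its offer by $5
# each turn, and the seller accepts anything at or above $30.
def stub_buyer(transcript):
    return f"I can do ${20 + 5 * (len(transcript) // 2)}."


def stub_seller(transcript):
    offer = int(transcript[-1].split("$")[1].rstrip("."))
    return "AGREED" if offer >= 30 else "How about $40?"


deal = negotiate_until_deal(stub_buyer, stub_seller)
print(len(deal), deal[-1])
```

Discarding unfinished negotiations and rerunning them is what guarantees a sale in every cell of the design, which is also the source of the main limitation discussed above.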
Results by product
Here are the results of the main figures broken down by each individual product:
[1] I am not the first to use negotiating as a benchmark.
[2] This setup could make negotiation sensitive to adversarial techniques, such as a policy of refusing to budge from one’s initial offer.




