State-of-the-art LLMs for roleplay and storywriting, benchmarks and subjective experience
I'm always asking myself if there are newer and better models out there. And we get new fine-tunes and merges every day.
I'd like to open a new thread to discuss state-of-the-art models and share subjective experience.
What's your experience? Which models do you currently like?
Since we focus on (lewd) roleplay and storywriting here and not coding abilities, I'd like to propose the following categories to subjectively rate the abilities of the models. Use a scale from 1 to 5 stars, where 1 is a complete fail and 5 is outstanding. Feel free to extend it if necessary, or just write your thoughts:
| Model name | Tested use-case | Language | Pacing | Bias | Logic | Creativity | Sex scenes | Additional comments |
Model name: The name of the model, exact version if appropriate
Use-case: What did you test? roleplay dialogue? freeform storywriting?
Language: Is the language adequate for the use-case? Do you like reading it? Does it read like a good writer, with good narration and realistic dialogue? Does it include variety?
Pacing: Does the story have good pacing? Or does it omit things, rush to a resolution and skip details?
Bias: Can it do varying things? Handle conflict? Or does it always push towards a happy ending? Does it follow your instructions?
Logic: Is the story consistent? Does it make sense and is it headed in the direction you outlined? Does it get confused and do random stuff? You can factor in intelligence/smartness here.
Creativity: Is the story dull or predictable? Does it come up with creative details?
Sex scenes: Is it graphic? Does it give a vivid, detailed description of the act, including body parts and how it makes the characters feel and react? Does it know anatomy?
Additional comments: Is there something exceptional about this model? Feel free to include your summarized verdict.
A rating like this is highly subjective and also depends on the exact prompt, so our results will probably not be comparable in the first place. It'll help if you've seen and tried some models so your score reflects what is possible as of today. And the scores will get outdated as new models raise the bar. I'd just like this to be a rough idea about what people think. You don't need to be overly scientific with it.
[Edit: Don't use this as advice. I've re-tested some of the models and I'm not happy with the results. They're inconsistent and don't hold up. Also, some of my "good" models perform badly at roleplay.]
| Model name | Tested use-case | Language | Pacing | Bias | Logic | Creativity | Sex scenes | Comment |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Velara-11B-v2 Q4_K_M.gguf | porn storywriting | 4 | 4.5 | 3 | 4 | 4.5 | 4 | generally knows what to detail, good atmosphere ⭐⭐⭐⭐ |
| EstopianMaid-13B Q4_K_M.gguf | porn storywriting | 4 | 4 | 4 | 3 | 3 | 5 | good at sex ⭐⭐⭐⭐ |
| MythoMax-l2-13B Q4_K_M.gguf | porn storywriting | 4 | 5 | 4 | 4 | 4 | 3.5 | good pacing, still a solid general-purpose model ⭐⭐⭐⭐ |
| FlatDolphinMaid-8x7B Q4_K_M.gguf | porn storywriting | 4.5 | 4 | 3 | 4 | 4.5 | 3.5 | intelligent but isn't consistent in picking up and fleshing out interesting parts, build atmosphere and go somewhere ⭐⭐⭐⭐ |
| opus-v1.2-7b-Q4_K_M-imatrix.gguf | porn storywriting | 3 | 5 | 3 | 3 | 5 | 3.5 | very mixed results, not consistent in quality ⭐⭐⭐ |
| Silicon-Maid-7B Q4_K_M.gguf | porn storywriting | 4.5 | 3.5 | 3 | 4 | 3 | 3 | has a bias towards being overly positive ⭐⭐⭐ |
| Lumosia-MoE-4x10.7 Q4_K_M.gguf | porn storywriting | 4 | 3.5 | 4 | 3 | 4 | 3 | mediocre ⭐⭐ |
| ColdMeds-11B-beta-fix4 gguf | porn storywriting | 3.5 | 3 | 4 | 4 | 3.5 | 3.5 | mediocre ⭐⭐ |
| Noromaid-13B-0.4-DPO q4_k_m.gguf | porn storywriting | 4 | 4.5 | 4 | 2 | 4 | 3 | very descriptive, issues w intelligence and repetition ⭐⭐ |
| OrcaMaid-v3-13B-32k Q4_K_M.gguf | porn storywriting | 2 | 4 | 4 | 2 | 4 | 3.5 | not very elaborate language, sometimes gets a bit off ⭐⭐ |
| Kunoichi-DPO-v2-7B Q4_K_M.gguf | porn storywriting | 4 | 1 | 4 | 4 | 4 | 3.5 | rushes things, consistently too fast for storytelling ⭐⭐ |
| LLaMA2-13B-Psyfighter2 Q4_K_M.gguf | porn storywriting | 4.5 | 3.5 | 3 | 3 | 3 | 3.5 | good language, doesn't know what to narrate in detail ⭐⭐ |
| go-bruins-v2.1.1 Q8_0.gguf | porn storywriting | 3 | 4 | 4 | 4 | 3 | 2 | sometimes a bit dull, not good sex scenes ⭐⭐ |
| Neural-Chat-7B-v3-16k q8_0.gguf | porn storywriting | 4 | 4 | 3 | 2 | 4 | 2 | sometimes tries too hard with elaborate language ⭐⭐ |
| NeuralTrix-7B-DPO-Laser q4_k_m.gguf | porn storywriting | 3.5 | 3.5 | 4 | 4 | 3.5 | 2 | misses interesting parts ⭐⭐ |
| LLaMA2-13B-Tiefighter Q4_K_M.gguf | porn storywriting | 4 | 3 | 3 | 2 | 3.5 | 3.5 | often introduces things out of thin air ⭐⭐ |
| mistraltrix-v1 Q4_K_M.gguf | porn storywriting | 4 | 4 | 3 | 3 | 3.5 | 2 | complicated sentences, no good description of sex ⭐⭐ |
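If you want a quick way to compare rows, the six category scores can be condensed into a single number. The sketch below is only an illustration: it takes an unweighted mean, which is an assumption on my part (the star verdicts in the comment column are overall impressions and were not computed this way), and it uses three rows copied from the table. The names `CATEGORIES`, `ratings` and `summarize` are made up for this example.

```python
from statistics import mean

# Column order of the six scored categories in the table.
CATEGORIES = ("language", "pacing", "bias", "logic", "creativity", "sex_scenes")

# Three rows copied from the table; scores follow the order in CATEGORIES.
ratings = {
    "Velara-11B-v2": (4, 4.5, 3, 4, 4.5, 4),
    "MythoMax-l2-13B": (4, 5, 4, 4, 4, 3.5),
    "Kunoichi-DPO-v2-7B": (4, 1, 4, 4, 4, 3.5),
}


def summarize(scores):
    """Return (unweighted mean rounded to 2 places, weakest category) for one model."""
    by_category = dict(zip(CATEGORIES, scores))
    weakest = min(by_category, key=by_category.get)
    return round(mean(scores), 2), weakest


# Sort model names by their mean score, best first.
ranked = sorted(ratings, key=lambda name: summarize(ratings[name])[0], reverse=True)

for name in ranked:
    avg, weakest = summarize(ratings[name])
    print(f"{name}: mean {avg}, weakest category: {weakest}")
```

A plain mean hides exactly the kind of problem the comments point out: Kunoichi still averages above 3 although its pacing score of 1 makes it a poor fit for storytelling. That's why the sketch also reports the weakest category.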
What I've done: I instructed each LLM to be a writer of erotic stories who sells bestsellers and likes to push limits and explore taboos. I included a near-future scenario with questionable ethics and plenty of room to build atmosphere, explore the world, introduce characters or get smutty after a few paragraphs. I told it several times to be vivid and detailed, to describe scenes, reactions and emotions, and to immerse the reader. I included a few things about one female character and described the situation she's brought into. That pretty much sets up the first two chapters. Then I fed the prompt to each model twice, let each run write about 2500 tokens, read all of those stories and rated how I liked them.
I've taken care to use the correct, model-specific prompt formats. But I can't tune all the parameters like temperature etc. for each one of them, so I've just used a Min-P setting that usually works well for me. That's not ideal. If you have a model that scores too low in your opinion, please comment and I'll re-test it with better sampler parameters.
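For anyone unfamiliar with Min-P: as it is commonly described, the sampler drops every candidate token whose probability is below `min_p` times the probability of the most likely token, then samples from what is left. Here is a minimal sketch of that idea in plain Python. The toy distribution and the value `min_p=0.1` are assumptions for illustration only, not the settings used for the ratings above, and real backends apply this to the full vocabulary together with temperature and other samplers.

```python
import random


def min_p_filter(probs, min_p=0.1):
    """Keep tokens with probability >= min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample(probs, rng=random):
    """Draw one token according to the given probabilities."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Made-up next-token distribution, for illustration only.
next_token = {"the": 0.50, "a": 0.30, "her": 0.15, "zebra": 0.04, "qux": 0.01}

# Threshold is 0.1 * 0.50 = 0.05, so "zebra" and "qux" are removed.
filtered = min_p_filter(next_token, min_p=0.1)
print(sorted(filtered))
print(sample(filtered))
```

Because the cutoff scales with the top token, the filter is strict when the model is confident and permissive when many continuations are plausible. That is one reason a single Min-P value can work tolerably across different models, though it is still no substitute for per-model tuning.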
Also feel free to comment or make suggestions in general.
[I invite you to share and reuse my content. This text is licensed CC-BY 4.0]