Open the Model tab and set the loader to ExLlama or ExLlama_HF.
Set max_seq_len to a value greater than 2048. The maximum length you will be able to reach depends on the model size and your available GPU memory.
Set compress_pos_emb to max_seq_len / 2048. For instance, use 2 for max_seq_len = 4096, or 4 for max_seq_len = 8192.
Select the model that you want to load.
Set truncation_length accordingly in the Parameters tab. You can set a higher default for this parameter by copying settings-template.yaml to settings.yaml in your text-generation-webui folder, and editing the values in settings.yaml.
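As an illustration, a settings.yaml raising the default could contain the following (the 4096 value is only an example, chosen to match max_seq_len = 4096 above):

```yaml
# settings.yaml (copied from settings-template.yaml)
# Example value matching max_seq_len = 4096; adjust to your setup.
truncation_length: 4096
```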
These two parameters can also be set from the command line. For instance: python server.py --max_seq_len 4096 --compress_pos_emb 2.
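Conceptually, compress_pos_emb applies linear positional interpolation: token positions are divided by the compression factor so that a longer sequence still falls inside the position range the model was trained on. A minimal NumPy sketch of that idea (the function name scaled_positions is illustrative, not part of the webui's API):

```python
import numpy as np

def scaled_positions(seq_len: int, compress_pos_emb: int) -> np.ndarray:
    """Sketch of linear position interpolation: divide each token
    position by the compression factor so positions up to seq_len
    map back into the original 0..2047 trained range."""
    return np.arange(seq_len) / compress_pos_emb

# With max_seq_len = 4096 and compress_pos_emb = 2, the last token's
# position 4095 is squeezed down to 2047.5, inside the trained range.
pos = scaled_positions(4096, 2)
```

This is why the guide recommends compress_pos_emb = max_seq_len / 2048: the factor is chosen so the scaled positions never exceed the base context length.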