
GGUF model stops generation around 900-1000 tokens

#2 opened by High-U

The GGUF version of rnj-1-instruct consistently stops generation at around 900-1000 tokens when asked to generate longer content (e.g., complete HTML applications).
The model appears to complete generation early rather than continuing to the requested length. Is this expected behavior, or is there a way to generate longer outputs with the GGUF version?

Hi, can you give an example of a prompt that behaves this way?

Generate a complete HTML and JS application that looks like a screensaver of wandering through a maze. It should be large and intricate. Write extensive unit tests for it as well.

I'm sometimes able to trigger it with this prompt, but other times it continues without an issue. If you're using llama-server, there's an option in the settings to enable a "Continue" button that sometimes works. Will look into this more.

After enabling the "Continue" button setting in llama-server's web UI (General → Enable "Continue" button), generation behavior changed in that UI. Generation now often completes fully (1500+ tokens), though it still sometimes stops around 1000 tokens.
The environment that consistently stopped at around 1000 tokens was not the llama-server web UI, but curl (or my own UI) calling the llama-server endpoint.
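
One thing worth ruling out on the endpoint side is the per-request token limit. Here is a minimal curl sketch against llama-server's OpenAI-compatible endpoint with an explicit `max_tokens`; the host, port, model name, and limit are assumptions, not values taken from this thread:

```bash
# Sketch only: set max_tokens explicitly so the request is not capped by a
# client-side or UI default. Host, port, and model name are placeholders.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "rnj-1-instruct",
        "messages": [
          {"role": "user", "content": "Generate a complete HTML and JS application that looks like a screensaver of wandering through a maze."}
        ],
        "max_tokens": 4096
      }'
```

The `finish_reason` field in the response should indicate whether generation hit a token limit (`"length"`) or the model emitted an end-of-sequence token (`"stop"`), which helps separate a request-side cap from the model genuinely stopping early.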
