Code
cookbook/models/vllm/basic_stream.py
You are viewing v1 docs. For the latest documentation, visit docs.agno.com
from agno.agent import Agent
from agno.models.vllm import vLLM

agent = Agent(
    model=vLLM(id="Qwen/Qwen2.5-7B-Instruct", top_k=20, enable_thinking=False),
    markdown=True,
)
agent.print_response("Share a 2 sentence horror story", stream=True)
Create a virtual environment
Open the Terminal and create a Python virtual environment.
python3 -m venv .venv
source .venv/bin/activate
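To confirm that the activation step took effect, a quick check from Python can help. The sketch below relies only on the standard library; the function name `in_virtualenv` is a hypothetical helper, not part of Agno or vLLM.

```python
import sys


def in_virtualenv() -> bool:
    """Report whether the running interpreter belongs to a virtual environment.

    Inside an environment created by `python3 -m venv`, sys.prefix points at
    the environment directory while sys.base_prefix still points at the base
    Python installation, so the two differ.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    # Hypothetical usage: run this with the interpreter you plan to use.
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If this reports that no environment is active, re-run the `source .venv/bin/activate` command in the same shell before continuing.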
Start vLLM server
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --dtype float16 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9
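Before running the agent, it can be useful to check that the server has finished loading the model. The sketch below queries the model-listing endpoint of vLLM's OpenAI-compatible API using only the standard library. It assumes the server is on its default address, `http://localhost:8000/v1`, and was started without an API key. The names `BASE_URL`, `models_url`, `parse_model_ids`, and `list_served_models` are hypothetical helpers, not part of Agno or vLLM.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

# Assumption: vLLM is listening on its default host and port. Adjust this
# if `vllm serve` was started with --host or --port.
BASE_URL = "http://localhost:8000/v1"


def models_url(base_url: str) -> str:
    """Build the model-listing endpoint URL from the API base URL."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> List[str]:
    """Extract the model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(
    base_url: str = BASE_URL, timeout: float = 5.0
) -> Optional[List[str]]:
    """Return the ids the server reports, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, HTTP error, timeout, or a non-JSON body.
        return None
```

Once the server is up, `list_served_models()` would be expected to return a list that includes the id passed to `vllm serve`. A return value of `None` suggests the server is still starting, is listening elsewhere, or requires an API key.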