Streaming sends live output from a model over a server-sent event (SSE) stream.
It works with any Python object that implements the iterator or generator protocol.
The generator or iterator must yield data, which is streamed to the client with the text/event-stream Content-Type. Payloads can be JSON-encoded and decoded on the client side.
A minimal example:
import time

def run(upper_range: int):
    for i in range(upper_range):
        yield f"Number {i} "
        time.sleep(1)
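As noted above, payloads can also be JSON-encoded so the client can parse structured data from each event. A minimal sketch of that variant (the field name "number" is illustrative, not part of the API):

```python
import json
import time

def run(upper_range: int):
    # Yield one JSON-encoded event per iteration; the client
    # can json.loads each data payload it receives.
    for i in range(upper_range):
        yield json.dumps({"number": i})
        time.sleep(1)
```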
Deploy this snippet and call the endpoint. SSE events appear progressively, one per second:
curl -X POST https://api.aws.us-east-1.cerebrium.ai/v4/<YOUR-PROJECT-ID>/2-streaming-endpoint/run \
  -H 'Content-Type: application/json' \
  -H 'Accept: text/event-stream' \
  -H 'Authorization: Bearer <YOUR-JWT-TOKEN>' \
  --data '{"upper_range": 3}'
This should output:
HTTP/1.1 200 OK
cache-control: no-cache
content-encoding: gzip
content-type: text/event-stream; charset=utf-8
date: Tue, 28 May 2024 21:12:46 GMT
server: envoy
transfer-encoding: chunked
vary: Accept-Encoding
x-envoy-upstream-service-time: 198995
x-request-id: e6b55132-32af-96d7-a064-8915c4a42452
data: Number 0
...
The remaining events stream in, one per second:
...
data: Number 1
data: Number 2
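On the client side, each event in the text/event-stream body arrives as a `data:` line. A minimal parsing sketch, which handles only single-line `data:` fields and ignores the other SSE field types (event, id, retry):

```python
def parse_sse(raw: str) -> list[str]:
    # Extract the payload of every `data:` line from a raw
    # text/event-stream body. Not a full SSE parser: multi-line
    # data events and non-data fields are not handled.
    payloads = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            payloads.append(line[len("data:"):].strip())
    return payloads
```

For a real client, an SSE library (or an HTTP client that exposes the response as a line iterator) would apply the same logic incrementally as chunks arrive.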
Postman also supports SSE streams natively.
For a Falcon-7B streaming example, see the streaming endpoint example.