DSPy is a framework for algorithmically optimizing LM prompts and weights.
When combined with Autoblocks, it becomes even more powerful.
Throughout this guide, we will show you how to set up a DSPy application and integrate Autoblocks tooling to further tune your AI application.
At the end of this guide, you’ll have:
- Set up a functioning DSPy application: Learn how to configure, optimize, and run a DSPy program end to end.
- Configured an Autoblocks Test Suite: Track experiments and evaluate your application using our easy-to-use CLI and SDK.
- Enabled Autoblocks Tracing: Gain insights into the underlying LLM calls to better understand what is happening under the hood.
- Created an Autoblocks Config: Remotely and collaboratively manage and update your application’s settings to adapt to new requirements seamlessly.
Note: You can view the full working example here: https://github.com/autoblocksai/autoblocks-examples/tree/main/Python/dspy
Note: We will use Poetry throughout this guide, but you can use the dependency management solution of your choice.
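For example, if you prefer a plain virtual environment and pip instead of Poetry, an equivalent setup looks roughly like this (the package names match the ones installed with Poetry later in this guide):
python -m venv .venv
source .venv/bin/activate
pip install dspy-ai autoblocksai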
Step 1: Set Up the DSPy App
We will use the DSPy minimal example as a base.
First, create a new poetry project:
poetry new autoblocks-dspy
Install DSPy:
poetry add dspy-ai
Create a file named run.py inside the autoblocks_dspy directory. This file sets up the language model, loads the GSM8K dataset, defines a Chain of Thought (CoT) program, and evaluates the program using DSPy.
# autoblocks_dspy/run.py
import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate

# Set up the LM
turbo = dspy.OpenAI(model='gpt-4-turbo', max_tokens=250)
dspy.settings.configure(lm=turbo)

# Load math questions from the GSM8K dataset
gsm8k = GSM8K()
gsm8k_trainset, gsm8k_devset = gsm8k.train[:10], gsm8k.dev[:10]


class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question: str):
        return self.prog(question=question)


# Set up the optimizer: we want to "bootstrap" (i.e., self-generate)
# 4-shot examples of our CoT program.
config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)

# Optimize! Use the `gsm8k_metric` here.
# In general, the metric is going to tell the optimizer how well it's doing.
teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset)

# Set up the evaluator, which can be used multiple times.
evaluate = Evaluate(
    devset=gsm8k_devset,
    metric=gsm8k_metric,
    num_threads=4,
    display_progress=True,
    display_table=0,
)


def run():
    # Evaluate our `optimized_cot` program.
    evaluate(optimized_cot)
Now update the pyproject.toml file to include a start script. This makes it easy to run your DSPy application with a single command:
[tool.poetry.scripts]
start = "autoblocks_dspy.run:run"
Run poetry install again to register the script:
poetry install
Set your OpenAI API Key as an environment variable:
export OPENAI_API_KEY=...
Now you can run the application using Poetry:
poetry run start
You should see output in your terminal, indicating that you have successfully run your first DSPy application!
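Before moving on, it can be useful to peek at the prompt DSPy actually constructed. DSPy language models keep a history of recent calls that you can print, for example by adding a line like the following at the end of run() (or running it in a REPL after an evaluation):
# Print the most recent prompt/completion pair that DSPy sent to the LM
turbo.inspect_history(n=1)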
Step 2: Add Autoblocks Testing
Now let's add Autoblocks Testing to our DSPy application so we can declaratively define tests to track experiments and optimizations.
First, install the Autoblocks SDK:
poetry add autoblocksai
Update the run function in autoblocks_dspy/run.py to take a question as input and return the result. This prepares your function for testing:
# autoblocks_dspy/run.py
def run(question: str) -> dspy.Prediction:
    return optimized_cot(question=question)
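If you want to sanity-check the updated function before wiring up the test suite, you could temporarily append a small manual invocation to the bottom of autoblocks_dspy/run.py; the question below is made up purely for illustration:
# Hypothetical manual check; remove once the Autoblocks test suite is in place.
if __name__ == "__main__":
    prediction = run(question="A baker made 24 cookies and sold 9. How many are left?")
    print("Answer:", prediction.answer)
    print("Rationale:", prediction.rationale)
You can then run it with poetry run python -m autoblocks_dspy.run.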
Now we'll add our Autoblocks test suite and test cases. Create a new file inside the autoblocks_dspy folder named evaluate.py. This script defines the test cases, evaluators, and test suite.
# autoblocks_dspy/evaluate.py
from dataclasses import dataclass

from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from dspy.primitives import Prediction

from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.models import BaseTestEvaluator
from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold
from autoblocks.testing.run import run_test_suite
from autoblocks.testing.util import md5

from autoblocks_dspy.run import run

# Load math questions from the GSM8K dataset
gsm8k = GSM8K()
gsm8k_devset = gsm8k.dev[:10]


@dataclass
class TestCase(BaseTestCase):
    question: str
    answer: str

    def hash(self) -> str:
        """
        This hash serves as a unique identifier for a test case throughout its lifetime.
        """
        return md5(self.question)


@dataclass
class Output:
    """
    Represents the output of the test_fn.
    """
    answer: str
    rationale: str


class Correctness(BaseTestEvaluator):
    id = "correctness"
    threshold = Threshold(gte=1)

    async def evaluate_test_case(
        self,
        test_case: TestCase,
        output: Output,
    ) -> Evaluation:
        metric = gsm8k_metric(gold=test_case, pred=Prediction(answer=output.answer))
        return Evaluation(
            score=1 if metric else 0,
            threshold=self.threshold,
        )


def test_fn(test_case: TestCase) -> Output:
    prediction = run(question=test_case.question)
    return Output(
        answer=prediction.answer,
        rationale=prediction.rationale,
    )


def run_test():
    run_test_suite(
        id="dspy",
        test_cases=[
            TestCase(
                question=item["question"],
                answer=item["answer"],
            )
            for item in gsm8k_devset
        ],
        evaluators=[
            Correctness(),
        ],
        fn=test_fn,
    )
Update the start script in pyproject.toml to point to your new test entry point:
[tool.poetry.scripts]
start = "autoblocks_dspy.evaluate:run_test"
Install the Autoblocks CLI following the guide here: https://docs.autoblocks.ai/cli/setup
Retrieve your local testing API key from the settings page (https://app.autoblocks.ai/settings/api-keys) and set it as an environment variable:
export AUTOBLOCKS_API_KEY=...
Run the test suite:
npx autoblocks testing exec -m "My first run" -- poetry run start
In your terminal, you should see a summary of the test run. In the Autoblocks Testing UI (https://app.autoblocks.ai/testing/local), you should see your test suite along with its results.
🎉 Congratulations on running your first Autoblocks test on your DSPy application!
Step 3: Log OpenAI Calls
To gain insights into the LLM calls DSPy is making under the hood, we can add the Autoblocks Tracer.
Update the autoblocks_dspy/run.py file to override the OpenAI client from DSPy. This script logs requests and responses to Autoblocks, helping you trace the flow of data through your AI application:
# autoblocks_dspy/run.py
import os
import uuid

import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from dspy.teleprompt import BootstrapFewShot

from autoblocks.tracer import AutoblocksTracer

tracer = AutoblocksTracer(
    os.environ["AUTOBLOCKS_INGESTION_KEY"],
)


class LoggingOpenAI(dspy.OpenAI):
    """Extend the OpenAI class from DSPy to log requests and responses to Autoblocks."""

    def basic_request(self, prompt: str, **kwargs):
        trace_id = str(uuid.uuid4())
        tracer.send_event(
            "ai.request",
            trace_id=trace_id,
            properties={
                "prompt": prompt,
                **self.kwargs,
                **kwargs,
            },
        )
        response = super().basic_request(prompt=prompt, **kwargs)
        tracer.send_event(
            "ai.response",
            trace_id=trace_id,
            properties=response,
        )
        return response


# Set up the LM
turbo = LoggingOpenAI(model='gpt-3.5-turbo', max_tokens=250)
dspy.settings.configure(lm=turbo)

# Load math questions from the GSM8K dataset
gsm8k = GSM8K()
gsm8k_trainset = gsm8k.train[:10]


class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)


# Set up the optimizer: we want to "bootstrap" (i.e., self-generate) 4-shot examples of our CoT program.
config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)

# Optimize! Use the `gsm8k_metric` here. In general, the metric is going to tell the optimizer how well it's doing.
teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset)


def run(question: str) -> dspy.Prediction:
    return optimized_cot(question=question)
Retrieve your Ingestion key from the settings page and set it as an environment variable:
export AUTOBLOCKS_INGESTION_KEY=...
Run your tests again:
npx autoblocks testing exec -m "My second run" -- poetry run start
Inside the Autoblocks UI, you can now open events for a test case to view the underlying LLM calls that DSPy made.
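Beyond the automatic ai.request and ai.response events, you can emit your own events with the same tracer when you want extra context attached to a run. As a hypothetical example (the dspy.question event name and its properties are illustrative, not part of the guide's code), you could log each incoming question from run():
# Hypothetical addition to autoblocks_dspy/run.py
def run(question: str) -> dspy.Prediction:
    # "dspy.question" is an arbitrary event name chosen for illustration
    tracer.send_event(
        "dspy.question",
        properties={"question": question},
    )
    return optimized_cot(question=question)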
Step 4: Add Autoblocks Config
Next, you can add an Autoblocks Config to enable collaborative editing and versioning of the different configuration parameters for your application.
First, create a new config in Autoblocks (https://app.autoblocks.ai/configs/create). Name your config dspy and set up the following parameters:
Parameter Name | Type | Default | Values
--- | --- | --- | ---
model | enum | gpt-3.5-turbo | gpt-3.5-turbo, gpt-4, gpt-4-turbo
max_bootstrapped_demos | number | 4 |
max_labeled_demos | number | 4 |
max_rounds | number | 1 |
max_errors | number | 5 |
Create a file named config.py inside the autoblocks_dspy directory. This script defines the configuration model in code using Pydantic and loads the remote config from Autoblocks.
# autoblocks_dspy/config.py
import pydantic

from autoblocks.configs.config import AutoblocksConfig
from autoblocks.configs.models import RemoteConfig


class ConfigValue(pydantic.BaseModel):
    model: str
    max_bootstrapped_demos: int
    max_labeled_demos: int
    max_rounds: int
    max_errors: int


class Config(AutoblocksConfig[ConfigValue]):
    pass


config = Config(
    value=ConfigValue(
        model="gpt-4-turbo",
        max_bootstrapped_demos=4,
        max_labeled_demos=4,
        max_rounds=10,
        max_errors=5,
    ),
)

config.activate_from_remote(
    config=RemoteConfig(
        id="dspy",
        # Note: we are using `dangerously_use_undeployed_revision` here.
        # Once you deploy your config, you can update this
        # to use a deployed version.
        dangerously_use_undeployed_revision="latest",
    ),
    parser=ConfigValue.model_validate,
)
Update autoblocks_dspy/run.py to use the newly created config. This ensures your application uses the configuration parameters defined in Autoblocks:
# autoblocks_dspy/run.py
import os
import uuid

import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from dspy.teleprompt import BootstrapFewShot

from autoblocks.tracer import AutoblocksTracer

from autoblocks_dspy.config import config

tracer = AutoblocksTracer(
    os.environ["AUTOBLOCKS_INGESTION_KEY"],
)


class LoggingOpenAI(dspy.OpenAI):
    """Extend the OpenAI class from DSPy to log requests and responses to Autoblocks."""

    def basic_request(self, prompt: str, **kwargs):
        trace_id = str(uuid.uuid4())
        tracer.send_event(
            "ai.request",
            trace_id=trace_id,
            properties={
                "prompt": prompt,
                **self.kwargs,
                **kwargs,
            },
        )
        response = super().basic_request(prompt=prompt, **kwargs)
        tracer.send_event(
            "ai.response",
            trace_id=trace_id,
            properties=response,
        )
        return response


# Set up the LM
turbo = LoggingOpenAI(model=config.value.model, max_tokens=250)
dspy.settings.configure(lm=turbo)

# Load math questions from the GSM8K dataset
gsm8k = GSM8K()
gsm8k_trainset = gsm8k.train[:10]


class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)


# Optimize! Use the `gsm8k_metric` here. In general, the metric is going to tell the optimizer how well it's doing.
teleprompter = BootstrapFewShot(
    metric=gsm8k_metric,
    max_bootstrapped_demos=config.value.max_bootstrapped_demos,
    max_labeled_demos=config.value.max_labeled_demos,
    max_rounds=config.value.max_rounds,
    max_errors=config.value.max_errors,
)
optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset)


def run(question: str) -> dspy.Prediction:
    return optimized_cot(question=question)
Run your tests!
npx autoblocks testing exec -m "My third run" -- poetry run start
You can now edit your config in the Autoblocks UI and maintain a history of all changes you’ve made.
What’s Next?
Congratulations on integrating DSPy and Autoblocks into your workflow!
With this robust setup, you can track and refine your AI applications effectively.
To further enhance your development process and foster collaboration across your entire team, consider integrating your application with a Continuous Integration (CI) system.
With CI in place, any team member can tweak configurations and test your application within Autoblocks without needing to set up a development environment. This not only simplifies the process but also makes your AI development more accessible and inclusive.
About Autoblocks
Autoblocks enables teams to continuously improve their scaling AI-powered products with speed and confidence.
Product teams at companies of all sizes—from seed-stage startups to multi-billion dollar enterprises—use Autoblocks to guide their AI development efforts.
The Autoblocks platform revolves around outcomes-oriented development. It marries product analytics with testing to help teams laser-focus their resources on making product changes that improve KPIs that matter.
About DSPy
DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline.