tl;dr: Today, we’re launching AI Product Playground, a tool for collaboratively evaluating and improving the quality of your AI products.
By letting teams inject domain expertise where it’s needed most, including prompt engineering, evaluations, and test case curation, the AI Product Playground helps them move faster and with more confidence.
Playgrounds make AI development more accessible
Subject-matter experts (SMEs) play a critical role in AI product development.
These experts – often non-technical – usually have the best understanding of what “good” looks like, which makes them the right people to judge whether LLM response quality meets the mark.
They can also make more incisive improvements to prompts, saving teams valuable time and money.
With this in mind, it’s no surprise that prompt playgrounds have exploded in popularity. They offer SMEs a quick and easy way to test different prompts against LLMs.
They’re easy to use, provide instant feedback, and don’t require technical skills.
But prompt playgrounds have a big problem…
AI product teams don’t trust their outputs!
Traditional playgrounds are too naive
Fancy sliders and toggles don’t change the hard truth:
Even the most sophisticated prompt playgrounds boil down to a simple prompt-to-model interaction.
Over the past year, we've spoken to over 200 product teams and learned that the vast majority of them either didn't use prompt playgrounds at all or considered them inconsequential to their development.
The number one reason why? Responses from prompt playgrounds don’t accurately represent outputs from their AI products.
This is because AI product pipelines comprise several parts that shape the model input and response. These parts include: pre-processing, prompt chaining, and retrieval systems (like RAG).
Modifications to any part of this pipeline can drastically change the resulting output.
Prompt playgrounds only capture a sliver of an overall AI product pipeline, so their outputs become increasingly unreliable as a pipeline matures.
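To make that concrete, here’s a minimal sketch, in TypeScript with hypothetical names and a stubbed model call, of the kind of pipeline we see in practice. A prompt playground only exercises the final prompt-to-model call; everything above it is invisible to the playground.

```typescript
// A simplified, hypothetical AI product pipeline. Names and structure are
// illustrative only; real pipelines vary widely.

type Doc = { id: string; text: string };

// Pre-processing: normalize and redact the raw user input.
function preprocess(userInput: string): string {
  return userInput.trim().replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[REDACTED]");
}

// Retrieval (RAG): fetch context documents relevant to the query.
async function retrieve(query: string): Promise<Doc[]> {
  // Placeholder for a vector-store lookup.
  return [{ id: "doc-1", text: `Context relevant to: ${query}` }];
}

// Prompt chaining: the final prompt depends on every earlier step.
function buildPrompt(query: string, docs: Doc[]): string {
  const context = docs.map((d) => d.text).join("\n");
  return `Use the context below to answer.\n\nContext:\n${context}\n\nQuestion: ${query}`;
}

// Stubbed model call; in production this would hit your LLM provider.
async function callModel(prompt: string): Promise<string> {
  return `LLM response for prompt of length ${prompt.length}`;
}

// The full pipeline. A prompt playground only sees the last step (callModel),
// so changing preprocess, retrieve, or buildPrompt silently invalidates
// whatever was tested in the playground.
export async function answer(userInput: string): Promise<string> {
  const cleaned = preprocess(userInput);
  const docs = await retrieve(cleaned);
  const prompt = buildPrompt(cleaned, docs);
  return callModel(prompt);
}
```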
The unfortunate side effect of this reduced reliability is that collaboration takes a big hit.
SMEs are sidelined and engineering is forced to develop in a silo, or teams create hacky workarounds that leave everyone on edge, hoping the code doesn’t break.
Product velocity slows down, and quality degrades.
The dilemma: robustness or collaboration
We hear it all the time: AI product teams have to choose between keeping their AI product pipeline robust and enabling collaborative workflows.
Developers are frustrated because:
They’re unable to involve SMEs, who are better qualified to engineer prompts and evaluate output quality
They have to manage clunky prompt testing workflows that make them the bottleneck
They’ve exposed their application to vulnerabilities by hacking together collaborative surfaces for SMEs
Non-technical stakeholders are frustrated because:
Prompt playgrounds aren’t giving them reliable outputs
They have to go through Engineering to test prompt changes through their pipeline
Learnings from experiments are falling through the cracks because of fragmented tooling
This ends up being a costly compromise. Collaboration isn't just a nice-to-have…
Collaboration is the most important factor in determining the velocity and reliability of AI product evaluation.
Most teams understand this cost, but settle for the status quo due to a lack of better alternatives.
AI Product Playground: Collaboratively evaluate your entire AI product pipeline
We’re excited to share that AI product teams no longer have to make the false choice between robustness and collaboration.
Today, we’re introducing the Autoblocks AI Product Playground – a tool for product teams to collaboratively evaluate and manage parts of their AI product pipeline.
The AI Product Playground enables teams to inject domain expertise where it’s needed most, in areas like prompt engineering, output quality evaluation, and test case curation, helping them move faster and with more confidence.
Developers can use our TypeScript and Python SDKs to surface any part of their AI product pipeline to a customizable and collaborative UI. This allows them to maintain full control over the underlying code, while enabling seamless collaboration with non-technical stakeholders.
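As a rough illustration of the pattern (not the actual Autoblocks SDK API; see the docs for real function names), the idea is that the tunable parts of a pipeline, such as the prompt template, model settings, and retrieval parameters, are declared as a typed configuration that the playground UI can read, edit, and version, while the code that consumes them stays entirely in your codebase.

```typescript
// Hypothetical sketch of "surfacing" pipeline parameters to a collaborative UI.
// The actual Autoblocks SDK API differs; this only illustrates the shape of the idea.

// The parameters an SME should be able to edit and test from the UI.
interface AnswerStepConfig {
  promptTemplate: string; // e.g. "Context:\n{{context}}\n\nQuestion: {{question}}"
  model: string;          // illustrative model name
  temperature: number;
  retrievalTopK: number;
}

// Defaults live in code and are versioned alongside it.
const defaults: AnswerStepConfig = {
  promptTemplate: "Context:\n{{context}}\n\nQuestion: {{question}}",
  model: "gpt-4o",
  temperature: 0.2,
  retrievalTopK: 5,
};

// In the real SDK, a call like this would register the step and its editable
// parameters with the playground UI; here it's a stand-in that just returns defaults.
async function getPlaygroundConfig(stepId: string): Promise<AnswerStepConfig> {
  // Placeholder: the UI-managed, versioned values for `stepId` would be fetched here.
  return defaults;
}

// The pipeline step reads its parameters at runtime, so edits made in the UI
// flow through the entire pipeline rather than a detached playground.
export async function answerStep(question: string, context: string): Promise<string> {
  const cfg = await getPlaygroundConfig("answer-step");
  const prompt = cfg.promptTemplate
    .replace("{{context}}", context)
    .replace("{{question}}", question);
  // A real implementation would call your LLM with cfg.model and cfg.temperature.
  return `[${cfg.model} @ temp=${cfg.temperature}] ${prompt.slice(0, 40)}...`;
}
```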
The Autoblocks AI Product Playground is:
Collaborative: Non-technical stakeholders can modify and test the surfaced pipeline parameters, including prompts, directly from the UI.
Robust: Changes are automatically versioned and protected against backward-incompatible updates, keeping your code safe.
Reliable: Tests run through your entire AI product system, in your own codebase, so outputs are representative of what your users would see in production.
And it retains the parts of prompt playgrounds that everyone loves: simplicity, instant feedback, and accessibility.
The collaborative evaluation platform
At Autoblocks, we believe speed and reliability in AI product development are byproducts of effective collaboration.
The best AI product teams have highly collaborative workflows, allowing different stakeholders to optimally contribute to the process.
Conversely, teams that struggle with speed or reliability often suffer from poor collaboration between developers and non-technical stakeholders.
That’s why we’ve embedded seamless collaboration into the entire Autoblocks platform:
Human reviews & alignment
Autoblocks enables SMEs to easily review and evaluate model outputs. We help you align automated evaluator results with expert opinions, so your evaluators stay reliable over time.
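The core idea behind alignment is simple to sketch: periodically compare your automated evaluators’ verdicts against SME reviews of the same outputs, and treat disagreement as a signal that the evaluator needs tuning. The snippet below is a generic illustration of that check, not Autoblocks platform code.

```typescript
// Generic illustration of evaluator/human alignment, not Autoblocks platform code.

interface ReviewedOutput {
  outputId: string;
  humanPassed: boolean;     // SME verdict from a human review session
  evaluatorPassed: boolean; // verdict from an automated evaluator (e.g. an LLM judge)
}

// Fraction of outputs where the automated evaluator agrees with the SME.
function agreementRate(reviews: ReviewedOutput[]): number {
  if (reviews.length === 0) return 1;
  const agreed = reviews.filter((r) => r.humanPassed === r.evaluatorPassed).length;
  return agreed / reviews.length;
}

// If agreement drops below a threshold, the evaluator has drifted from expert
// judgment and its prompt or criteria should be revisited before you trust it.
const reviews: ReviewedOutput[] = [
  { outputId: "a", humanPassed: true, evaluatorPassed: true },
  { outputId: "b", humanPassed: false, evaluatorPassed: true },
  { outputId: "c", humanPassed: true, evaluatorPassed: true },
];

if (agreementRate(reviews) < 0.9) {
  console.warn("Automated evaluator is out of alignment with SME reviews");
}
```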
Test case curation
Autoblocks makes it simple to curate relevant test cases based on clusters of real user interactions. You can even use AI to generate additional synthetic test cases.
Autoblocks CLI
For developers, our CLI streamlines evaluator setup and interaction with the Autoblocks platform and API directly from the terminal.
Get started
By making collaboration central to the AI product evaluation process, Autoblocks empowers teams to improve AI products more quickly and reliably.
We can't wait to see what you create with the new AI Product Playground.