Join the waitlist for Windows

Infrastructure for verified CUA datasets.

Turning real desktop workflows into verified CUA training data.

Desktop is a Windows app where signed-in users capture local workflows, run computer-use agents, and manage workflow data for training, evaluation, and RL.

For frontier labs that need private and operational desktop workflow data.

workflow evidence pipeline

real_workflow_to_verified_task_package

Real workflow

human run

Task package

trajectory + outcome

Verifier audit

pass/fail + reward

Eval report

pass@k + failures

what is desktop?

From local workflows to CUA-ready datasets.

Desktop is an end-to-end platform for turning desktop workflows into verified CUA training data.

Teams can capture workflows, normalize traces, label outcomes with verifiers, train models, evaluate performance, and run computer-use agents from one UI.

what makes data good?

Computer-use agents need real workflow data.

Synthetic tasks do not teach agents how real work breaks.

The strongest signal does not come from generic synthetic data or benchmark-shaped tasks. It comes from real workflows: people solving real problems across messy tools, changing screens, mistakes, corrections, and final outcomes.

That is the data Desktop is built to produce.

synthetic tasks

clean DOM
known state
toy workflow
vs

verified workflow packages

task goal
action trajectory
verifier
eval / RL signal

source

PDFs / Excel / portals

traces

screens + actions

failures

model weak points

reward

verified outcomes

Does the model improve?

Better workflow data. Better CUA models.

Desktop turns operational workflows into verified training and evaluation data, so a base model can improve on the same kind of task.

The hard part is not recording workflows. It is verifying the unverifiable: proving what happened, whether it worked, and whether the model improved.

same workflow family

Base model to workflow-trained

+34 pts
[Figure: Measured improvement from verified workflow data. Line chart of task success (y-axis, 0% to 100%) against workflow data (x-axis): 52% for the base model, 86% with Desktop data.]
Capture
Verify
Improve
engagement

Pricing.

Individual

20 USD
Get started
  • Windows local runtime
  • Routine and skill capture
  • Exportable workflow packages

Teams

Custom
Contact us
  • Team workflow capture
  • Verifier-ready task packages
  • Model eval reports

Enterprise

Custom
Contact us
  • Private workflow data supply
  • Custom verifier authoring
  • Eval and RL-ready datasets

Want to inspect the package shape first? Explore examples.

faq

Frequently asked questions.

Do I need Desktop?

Yes, if real workflow data matters to you. Individuals can train agents on personal routines or package valuable workflow data. Teams can let non-developers keep working while Desktop turns their work into clean CUA trajectories and verifier-ready task packages. Enterprises can use Desktop when they need recurring, verified workflow data for post-training, evals, and RL.

How do customers work with Desktop?

There are two paths. First, teams can use Desktop as end-to-end workflow data infrastructure: employees keep doing real work while Desktop captures, normalizes, labels, verifies, and packages it for CUA post-training. Second, UseDesktop can supply high-quality verified workflow datasets with proof: active testing, verifier audits, model comparisons, failure traces, and eval/RL signals.

What does Desktop produce?

Verified CUA workflow packages: task definitions, human trajectories, UI states, outcomes, verifiers, eval reports, failure traces, and reward signals.
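As a rough illustration of that package shape, the sketch below models one package as a Python dataclass. Every class and field name here is an assumption chosen for illustration, not Desktop's actual export schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkflowPackage:
    """Hypothetical shape of one verified workflow package (illustrative only)."""
    task_definition: str              # what the human set out to do
    human_trajectory: list            # ordered actions, e.g. {"action": "click"}
    ui_states: list                   # references to captured screens
    outcome: str                      # final observed result
    verifier_id: str                  # which verifier scores attempts
    failure_traces: list = field(default_factory=list)
    eval_report: Optional[dict] = None    # filled in after model evaluation
    reward: Optional[float] = None        # filled in once a verifier has run


def missing_parts(pkg: WorkflowPackage) -> list:
    """Return the names of required parts that are empty."""
    required = ("task_definition", "human_trajectory", "ui_states",
                "outcome", "verifier_id")
    return [name for name in required if not getattr(pkg, name)]
```

A check like `missing_parts` is one simple way to refuse packages that lack a trajectory, screens, or a verifier before they reach training or evaluation.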

Why not just use synthetic tasks?

Synthetic tasks are usually short, clean, and known-state. Real desktop work is long-horizon, messy, and spread across PDFs, Excel, portals, files, and legacy apps.

What makes a package verified?

Every package ties the task goal to a trajectory, final outcome, verifier, failure cases, and scoring signal so model attempts can be evaluated instead of eyeballed.
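To make "evaluated instead of eyeballed" concrete, here is a minimal sketch of a verifier turning one model attempt into a pass/fail label and a reward. The verifier convention, the `invoice_total_matches` check, and the state keys are all hypothetical; the source does not describe how Desktop's verifiers are written.

```python
from typing import Callable

# Assumed convention: a verifier maps a final state to True (pass) or False (fail).
Verifier = Callable[[dict], bool]


def invoice_total_matches(final_state: dict) -> bool:
    """Hypothetical verifier: the exported total must exist and equal the expected total."""
    exported = final_state.get("exported_total")
    return exported is not None and exported == final_state.get("expected_total")


def score_attempt(final_state: dict, verifier: Verifier) -> dict:
    """Turn one attempt into a pass/fail label and a binary reward."""
    passed = verifier(final_state)
    return {"passed": passed, "reward": 1.0 if passed else 0.0}
```

A binary reward is the simplest choice; a real verifier might return partial credit or a failure reason instead.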

How do you prove the data is good?

We measure solvability, ambiguity, verifier false positives and negatives, pass@k across models, failure modes, and contamination risk.
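Two of these measurements can be sketched in a few lines. The first function uses the commonly cited unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n attempts of which c passed; the second compares verifier verdicts against human audit labels. Whether Desktop computes its metrics this way is an assumption, and the function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n attempts, of which c passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of attempts contains no pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def verifier_error_rates(verdicts: list, truth: list) -> tuple:
    """Return (false positive rate, false negative rate) against audit labels."""
    pairs = list(zip(verdicts, truth))
    fp = sum(1 for v, t in pairs if v and not t)   # verifier passed a bad attempt
    fn = sum(1 for v, t in pairs if t and not v)   # verifier failed a good attempt
    negatives = sum(1 for t in truth if not t)
    positives = sum(1 for t in truth if t)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr
```

For example, `pass_at_k(10, 3, 1)` gives 0.3, the plain success rate, while larger k rewards a model that succeeds at least once across several attempts.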

Is this a one-off dataset?

No. The goal is recurring workflow data capacity: as apps and workflows change, the task packages, verifiers, eval sets, and training signals keep improving.