A research lab · Athens / Montreal

The next models will need to move.
We are capturing the data to teach them how.

Language models learned from text the internet had already written down. The next generation of embodied models needs something the internet does not have: high-fidelity recordings of skilled human craft. TurboTune builds those datasets — beginning in the kitchen, where the problem is hardest and the experts are best.

Cookbooks were never the recipe.

The hard part of a craft is the part its practitioners can't put into words.

A chef who has worked a station for ten years knows things that no recipe describes: how the sound of garlic shifts a moment before it burns, the resistance of dough that's been kneaded enough, the exact pressure that separates fillet from bone. This is what philosophers call techne — knowledge that lives in the hands and the senses, not on the page.

Foundation models trained on text have run out of text. The frontier is moving toward embodied, multimodal systems — robots, agents that act in the world — and the bottleneck is no longer compute or architecture. It is data. Specifically: paired vision, motion, force, audio, and intent, captured at the resolution at which skill actually unfolds.

TurboTune is a research lab building those datasets. We capture how experts actually work — at a fidelity that is trainable today and will be more valuable tomorrow than any model trained on it.

Four things the existing manipulation datasets get wrong.

→ 01

Synchronized multimodality

Every stream is captured to a shared timeline with sub-10 ms synchronization: RGB-D, body and hand pose, instrumented utensils, ambient and contact audio, thermal. Modalities are designed in together, not bolted on.
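A minimal sketch of what the shared timeline means in practice, assuming each stream carries hardware timestamps on the station clock; the function names, stream names, and structure are illustrative rather than our production pipeline.

```python
import numpy as np

SYNC_TOLERANCE_S = 0.010  # the sub-10 ms budget described above

def nearest_sample_skew(master_ts, stream_ts):
    """For each tick of the shared station clock, find the nearest sample
    in one modality's stream and return the absolute time offset."""
    master_ts, stream_ts = np.asarray(master_ts), np.asarray(stream_ts)
    idx = np.clip(np.searchsorted(stream_ts, master_ts), 1, len(stream_ts) - 1)
    prev_is_closer = (master_ts - stream_ts[idx - 1]) < (stream_ts[idx] - master_ts)
    nearest = np.where(prev_is_closer, idx - 1, idx)
    return np.abs(stream_ts[nearest] - master_ts)

def check_sync(master_ts, streams):
    """streams maps a modality name to its timestamp array on the shared
    clock. Returns worst-case skew per modality and any over-budget streams."""
    worst = {name: float(nearest_sample_skew(master_ts, ts).max())
             for name, ts in streams.items()}
    offenders = [name for name, skew in worst.items() if skew > SYNC_TOLERANCE_S]
    return worst, offenders
```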

→ 02

Force, not just kinematics

Position alone teaches a model where a hand moves. It cannot teach the difference between dicing an onion and crushing it. We instrument the tools and the surfaces so contact dynamics are first-class data.
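One concrete reading of force as first-class data: once contact force is a recorded channel, contact events can be segmented from the signal itself rather than inferred from video. A sketch under assumed field names, with an illustrative rather than calibrated threshold.

```python
import numpy as np

def contact_segments(force_n, ts, threshold_n=0.5):
    """Segment an instrumented-tool force trace into contact events.
    force_n: normal force in newtons; ts: shared-clock timestamps in seconds.
    Returns (start, end) timestamp pairs for intervals of tool contact."""
    in_contact = np.asarray(force_n) > threshold_n
    edges = np.diff(in_contact.astype(int))
    starts = np.flatnonzero(edges == 1) + 1   # rising edges: contact begins
    ends = np.flatnonzero(edges == -1) + 1    # falling edges (exclusive end)
    if in_contact[0]:
        starts = np.r_[0, starts]
    if in_contact[-1]:
        ends = np.r_[ends, len(in_contact)]
    ts = np.asarray(ts)
    return [(float(ts[s]), float(ts[e - 1])) for s, e in zip(starts, ends)]
```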

→ 03

Skills before recipes

A model cannot learn an entire dish from fifty examples. It can learn atomic skills — knife work, searing, folding, plating — composed into recipes. We design our taxonomy and capture protocol around that.
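A minimal sketch of the skills-before-recipes structure: a recipe annotated as a composition over a small set of atomic skills. Class names and the example skills are illustrative, not our taxonomy v1.

```python
from dataclasses import dataclass, field

@dataclass
class AtomicSkill:
    """The unit a model can actually learn from a modest number of takes."""
    name: str           # e.g. "dice", "sear", "fold"
    tools: list[str]    # instrumented tools involved
    bimanual: bool

@dataclass
class RecipeStep:
    skill: AtomicSkill
    target: str         # what the skill is applied to, e.g. "onion"

@dataclass
class Recipe:
    """A dish is a sequence over the shared skill vocabulary,
    not one monolithic demonstration."""
    name: str
    steps: list[RecipeStep] = field(default_factory=list)

dice = AtomicSkill("dice", tools=["knife", "board"], bimanual=True)
sear = AtomicSkill("sear", tools=["pan", "spatula"], bimanual=False)
mirepoix = Recipe("mirepoix base", [RecipeStep(dice, "onion"),
                                    RecipeStep(dice, "carrot"),
                                    RecipeStep(sear, "aromatics")])
```

The same handful of skills recurs across dishes, which is why many examples of a skill are worth more than many examples of a dish.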

→ 04

Failures included

Most datasets capture only the successful take. We capture the broken sauce, the over-reduced glaze, the recovery — because that is where chefs' judgment is most visible, and where future systems will need it most.
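A sketch of how failed takes and recoveries might be carried in the annotations rather than filtered out; the labels and field names are illustrative, not our released schema.

```python
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"      # e.g. broken sauce, over-reduced glaze
    RECOVERY = "recovery"    # corrective action taken after a failure

@dataclass
class SkillSegment:
    """One annotated span of an episode on the shared clock."""
    skill: str                           # atomic-skill name from the taxonomy
    t_start: float                       # seconds, shared station clock
    t_end: float
    outcome: Outcome
    failure_mode: str | None = None      # set when outcome is not SUCCESS
    recovers_segment: int | None = None  # index of the failed segment this repairs
```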

What a single TurboTune station records.

The full station specification, calibration protocol, and annotation schema are shared with research partners under agreement. A high-level summary, with an illustrative record sketch after the schematic:

  • Vision: multi-view RGB-D, first-person, and wrist cameras. Overhead, lateral, chef's-eye, and hand-mounted streams.
  • Motion: markerless body tracking + inertial gloves. Robust through occlusion, splash, and speed.
  • Force: instrumented tools and surfaces. Knives, spatulas, boards, and pans, with every contact a measured signal.
  • Thermal: cooktop and surface IR. Pan temperature, oil readiness, doneness gradients.
  • Audio: station array + chef lavalier. Sizzle, knife cadence, and narration captured and transcribed.
  • Language: structured chef narration. Real-time verbalization grounded in the action that produced it.
Station schematic, overhead view: a 2.4 m × 0.8 m capture station with four RGB-D cameras, a thermal camera, and an FSR array arranged around the chef, plus inertial gloves, a lavalier, and an array mic.
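Putting the summary together, a hypothetical shape for one synchronized slice of a capture episode. Field names and array shapes are illustrative; the real schema is what we share with partners under agreement.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StationFrame:
    """One synchronized slice of a capture episode on the shared clock."""
    t: float                            # seconds, shared station clock
    rgbd: dict[str, np.ndarray]         # view name -> H x W x 4 (RGB + depth)
    body_pose: np.ndarray               # J x 3 markerless keypoints
    hand_pose: np.ndarray               # 2 x 21 x 3 from the inertial gloves
    tool_force: dict[str, np.ndarray]   # instrumented tool -> force/torque sample
    surface_force: np.ndarray           # FSR array readings
    thermal: np.ndarray                 # cooktop / surface IR image
    audio: np.ndarray                   # station array + lavalier window
    narration: str | None = None        # transcribed chef narration, if any
```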

The kitchen is the hardest manipulation problem with the best teachers.

We are not a food company. We are a manipulation lab. Cooking is the wedge because it concentrates every problem worth solving into one workspace.

Bimanual, deformable, time-bound.

Almost every other manipulation benchmark uses rigid objects. Kitchens force the hard cases — dough, leaves, liquids, raw protein — under irrecoverable time pressure.

Experts are everywhere.

You cannot easily recruit a hundred neurosurgeons. You can recruit a hundred chefs. The talent density is uniquely available, and the variation between traditions is itself signal.

The transfer is broad.

A model that learns to debone a fish has learned about delicate force, tool use, and visual state estimation — all of which transfer to surgery, manufacturing, and care work.

Four kinds of partners. Distinct conversations.

For investors

A defensible data position.

Compute and architecture commoditize. Proprietary, high-fidelity skill data does not. We are happy to walk through the thesis and the moat.

For research labs

Datasets and benchmarks.

Licensed access to capture data, evaluation suites, and station replication kits for academic and industrial research groups.

For chefs

Your craft, preserved.

Compensated, attributed, and protected. We are building the archive your technique deserves — with you as a participant, not a subject.

For platform builders

Model and integration partnerships.

For humanoid, manipulator, and VLA teams who need real-world, real-skill data to push past benchmark saturation.

A short, honest status.

  • Pilot capture station (Active): first instrumented station coming online; initial chef partners onboarding.
  • Skill taxonomy v1 (Active): atomic-skill ontology and annotation schema in iteration with practitioner input.
  • First baseline benchmark (Q3): single bimanual skill, multiple chefs, public evaluation harness.
  • Multi-station deployment (Late 2026): Athens flagship, with planned secondary capture sites in partner kitchens.
  • Open subset release (Planning): a research-grade slice published openly to seed community work.

If any of this fits your work,
we should talk.

We keep conversations short and concrete. Tell us who you are and what you're working on.