Upcube Research
Safety-first, reviewable systems

Pioneering research toward useful, safe, long-context systems

We explore models and methods that help people solve real problems: assistants that keep long context, cite sources, call tools, speak naturally, and generate on-brand visuals—while staying reviewable and safe.

256K context · Grounded search & citations · Tool calling & agents · Voice & images · Safety & evaluations

Focus areas

The streams we publish, evaluate, and ship—optimized for usefulness, reliability, and safety.

Language & Interaction (Upcube-Instruct)

  • Long-context dialogue and structured outputs (JSON/Markdown/LaTeX).
  • Grounded answers with quote-level citations and uncertainty notes (a minimal payload sketch follows this list).
  • Multi-turn state tracking across help, research, and build flows.
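As a concrete illustration of the grounded, structured outputs above, here is a minimal sketch of a quote-level citation payload. The field names (answer, citations, uncertainty) are illustrative assumptions, not a published schema.

```python
# Illustrative only: a minimal grounded-answer payload with quote-level
# citations and an uncertainty note. Field names are hypothetical.
import json

answer = {
    "answer": "The 2023 report attributes the delay to supply constraints.",
    "citations": [
        {
            "source_id": "doc-042",
            "quote": "Shipments slipped by two quarters due to component shortages.",
            "char_span": [1180, 1243],
        }
    ],
    "uncertainty": "Single source; no independent confirmation found.",
}

print(json.dumps(answer, indent=2))
```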

Reasoning & Planning (Upcube-Base + Agents)

  • Tool choice and multi-step execution for real tasks (see the dispatch sketch after this list).
  • Program synthesis, code repair, and data pipelines.
  • Evaluation suites for plan quality, robustness, and cost.
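To make the tool-calling pattern concrete, the following is a minimal sketch of a tool registry and single-step dispatcher of the kind an agent loop might use. The tool names and selection logic are hypothetical, not our production agent stack.

```python
# Illustrative only: a toy tool registry and dispatcher for multi-step plans.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {
    # Toy calculator for trusted arithmetic strings only; never eval untrusted input.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "echo": lambda text: text,
}

def run_step(tool_name: str, argument: str) -> str:
    """Dispatch one planned step to a registered tool and return its output."""
    if tool_name not in TOOLS:
        raise ValueError(f"Unknown tool: {tool_name}")
    return TOOLS[tool_name](argument)

# A two-step "plan": compute a value, then format the result.
subtotal = run_step("calculator", "19.99 * 3")
print(run_step("echo", f"Order subtotal: {subtotal}"))
```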

Vision & Image Generation

  • Photorealistic & stylized outputs with subject/style controls.
  • Brand anchors & captioning for accessibility.
  • Auditability: prompt/seed/config provenance.

Voice & Audio

  • Low-latency, full-duplex conversations with barge-in.
  • Expressive synthesis with controllable prosody.
  • ASR robustness, diarization, and privacy modes.

Safety & Alignment

  • Policy-tuned prompts, refusal calibration, and evals.
  • Red-teaming for sensitive domains; incident playbooks.
  • Role-based controls, audit logs, and data governance.

Systems & Efficiency

  • Mixture-of-Experts with ~32B activated parameters for efficiency.
  • Token-efficient pretraining: stability via optimizer choice and gradient clipping.
  • Throughput/latency work on vLLM, TensorRT-LLM, and SGLang (a minimal vLLM example follows this list).
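For the serving side, a minimal offline-batch example with vLLM's Python API might look like the sketch below. The model identifier is a placeholder, not a released checkpoint, and the sampling settings are arbitrary.

```python
# Illustrative only: offline batch inference with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/your-moe-checkpoint")  # hypothetical model id
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Summarize the attached incident report."], params)
for out in outputs:
    print(out.outputs[0].text)
```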

How we do research

We treat safety and reliability as engineering disciplines—measurable, testable, and continuously improved.

Training & Post-training

  • Mixture-of-Experts with routing tuned for stability and cost.
  • Post-training with preference data and rubric-based feedback.
  • On-policy rollouts for verifiable rewards (math, coding) and rubric-judged tasks (a reward sketch follows this list).
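A verifiable reward in this sense can be as simple as checking a parsed final answer against a reference. The sketch below assumes a hypothetical convention where the model ends its completion with an "ANSWER:" line; it is not our actual reward specification.

```python
# Illustrative only: a binary verifiable reward for math-style rollouts,
# assuming completions end with a line like "ANSWER: 42".
import re

def math_reward(completion: str, reference: float, tol: float = 1e-6) -> float:
    """Return 1.0 if the final ANSWER line matches the reference, else 0.0."""
    match = re.search(r"ANSWER:\s*(-?\d+(?:\.\d+)?)", completion)
    if match is None:
        return 0.0
    try:
        value = float(match.group(1))
    except ValueError:
        return 0.0
    return 1.0 if abs(value - reference) <= tol else 0.0

print(math_reward("Working... 6 * 7 = 42\nANSWER: 42", reference=42.0))  # 1.0
```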

Evaluations

  • Knowledge, reasoning, math/STEM, coding, and tool-use suites.
  • Grounding: citation precision/recall, quote accuracy, coverage (see the metric sketch after this list).
  • Safety: refusal calibration, bias audits, privacy leak tests.
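Citation precision and recall reduce to set comparisons once citations are normalized. The sketch below treats each citation as a (source_id, quote) pair and uses exact matching, which is a simplifying assumption; real scoring typically needs fuzzier quote alignment.

```python
# Illustrative only: citation precision/recall against a curated gold set.
from typing import Set, Tuple

Citation = Tuple[str, str]  # (source_id, quote)

def citation_pr(predicted: Set[Citation], gold: Set[Citation]) -> Tuple[float, float]:
    """Precision: correct / predicted. Recall: correct / gold."""
    if not predicted or not gold:
        return 0.0, 0.0
    correct = len(predicted & gold)
    return correct / len(predicted), correct / len(gold)

pred = {("doc-1", "q1"), ("doc-2", "q7")}
gold = {("doc-1", "q1"), ("doc-3", "q2")}
print(citation_pr(pred, gold))  # (0.5, 0.5)
```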

Reproducibility & Reporting

  • Seeds, configs, and environment manifests captured per run (a manifest sketch follows this list).
  • Action logs: tools called, parameters, and evidence paths.
  • Exportable artifacts: HTML/PDF reports and .yaml pipelines.
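One lightweight way to capture per-run provenance is a YAML manifest written alongside each run. The sketch below uses PyYAML; the file layout and field names are assumptions, not a published format.

```python
# Illustrative only: writing a per-run manifest (seed, config, environment)
# to YAML. Identifiers and field names are hypothetical.
import platform
import sys

import yaml  # pip install pyyaml

manifest = {
    "run_id": "exp-0001",  # hypothetical identifier
    "seed": 1234,
    "config": {"model": "your-org/your-checkpoint", "max_tokens": 256},
    "environment": {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    },
}

with open("run_manifest.yaml", "w") as fh:
    yaml.safe_dump(manifest, fh, sort_keys=False)
```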

Deployment & Systems

  • Inference engines: vLLM, SGLang, KTransformers, TensorRT-LLM.
  • Latency-first voice paths and streaming JSON outputs.
  • Multi-channel delivery with RBAC and audit trails.

Research perspective: “Safely aligning capable systems is a scientific challenge and an engineering discipline. Our aim is practical reliability: assistants that cite sources, ask before acting, and make it easy to see what happened and why.”

— Upcube Research Team

Streams & artifacts

Each stream ships repeatable artifacts—benchmarks, example pipelines, incident templates, and reproducible writeups.

Language & Interaction

Instruction following, structured outputs, long-context dialogue, and grounded generation with citations.

Reasoning & Planning

Tool-using agents, robust planning, program synthesis, and multi-step task execution.

Vision & Image

Image generation and editing with controllable style/subject constraints and provenance.

Voice & Audio

Realtime speech, diarization, low-latency turn-taking, and privacy-preserving pipelines.

Safety & Alignment

Policy calibration, red-team methods, governance, and incident readiness.

Systems & Efficiency

Serving stacks, optimization, cost-aware routing, and reliable deployment patterns.

Recent research highlights

Upcube context upgrade

New weights unlock 256K context across chat, search, voice, and images—longer threads, larger inputs, clearer answers.

Agentic evaluation toolkit

Benchmarks for tool selection, plan repair, and recovery from API errors—reporting success@k and cost curves.
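For reference, empirical success@k can be computed directly from per-task attempt logs: a task counts as solved if any of its first k attempts succeeded. The sketch below is a simple empirical estimate, not the toolkit's exact estimator.

```python
# Illustrative only: empirical success@k over per-task attempt logs.
from typing import List

def success_at_k(attempts_per_task: List[List[bool]], k: int) -> float:
    """Fraction of tasks with at least one success in the first k attempts."""
    solved = sum(any(attempts[:k]) for attempts in attempts_per_task)
    return solved / len(attempts_per_task)

results = [[False, True, True], [False, False, False], [True]]
print(success_at_k(results, k=2))  # 2 of 3 tasks solved within 2 attempts
```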

Grounded search improvements

Better quote attribution, duplicate collapse, and coverage metrics (precision/recall vs. curated corpora).

Get involved

Collaborations

Join evaluations and build the tooling that makes systems safer and easier to audit.

  • Research evaluations and red-team exercises.
  • Domain-specific datasets and ground truth building.
  • Tool/plugin ecosystems for agentic tasks.

Open materials

Notes, benchmarks, and reproducible examples you can run and verify.

  • Research notes, benchmarks, and example pipelines.
  • Reproduction guides and config packs.
  • Safety checklists and incident templates.

Contact Upcube Research

Send a note

Share context, goals, and what you want to evaluate or build.

No spam. We reply to serious requests.

Where to find us

For safety, privacy, and evaluation requests, include your timeline and deployment context.

New York, NY 10005 · USA
If you’re reporting an issue, include: reproduction steps, affected endpoints/pages, timestamps, and any logs you can share.