Upcube Research
Safety-first, reviewable systems

Pioneering research toward useful, safe, long-context systems

We explore models and methods that help people solve real problems: assistants that keep long context, cite sources, call tools, speak naturally, and generate on-brand visuals—while staying reviewable and safe.

256K context · Grounded search & citations · Tool calling & agents · Voice & images · Safety & evaluations

Focus areas

The streams we publish, evaluate, and ship—optimized for usefulness, reliability, and safety.

Language & Interaction (Upcube-Instruct)

  • Long-context dialogue and structured outputs (JSON/Markdown/LaTeX).
  • Grounded answers with quote-level citations and uncertainty notes (a minimal payload sketch follows this list).
  • Multi-turn state tracking across help, research, and build flows.
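As a concrete illustration of the grounded, structured outputs above, here is a minimal sketch of a quote-level citation payload. The field names (answer, citations, uncertainty) are illustrative assumptions, not a published schema.

```python
# Illustrative only: a minimal grounded-answer payload with quote-level
# citations and an uncertainty note. Field names are hypothetical.
import json

answer = {
    "answer": "The 2023 report attributes the delay to supply constraints.",
    "citations": [
        {
            "source_id": "doc-042",
            "quote": "Shipments slipped by two quarters due to component shortages.",
            "char_span": [1180, 1243],
        }
    ],
    "uncertainty": "Single source; no independent confirmation found.",
}

print(json.dumps(answer, indent=2))
```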

Reasoning & Planning (Upcube-Base + Agents)

  • Tool choice and multi-step execution for real tasks (see the dispatch sketch after this list).
  • Program synthesis, code repair, and data pipelines.
  • Evaluation suites for plan quality, robustness, and cost.
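To make the tool-calling pattern concrete, the following is a minimal sketch of a tool registry and single-step dispatcher of the kind an agent loop might use. The tool names and selection logic are hypothetical, not our production agent stack.

```python
# Illustrative only: a toy tool registry and dispatcher for multi-step plans.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {
    # Toy calculator for trusted arithmetic strings only; never eval untrusted input.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "echo": lambda text: text,
}

def run_step(tool_name: str, argument: str) -> str:
    """Dispatch one planned step to a registered tool and return its output."""
    if tool_name not in TOOLS:
        raise ValueError(f"Unknown tool: {tool_name}")
    return TOOLS[tool_name](argument)

# A two-step "plan": compute a value, then format the result.
subtotal = run_step("calculator", "19.99 * 3")
print(run_step("echo", f"Order subtotal: {subtotal}"))
```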

Vision & Image Generation

  • Photorealistic & stylized outputs with subject/style controls.
  • Brand anchors & captioning for accessibility.
  • Auditability: prompt/seed/config provenance.

Voice & Audio

  • Low-latency, full-duplex conversations with barge-in.
  • Expressive synthesis with controllable prosody.
  • ASR robustness, diarization, and privacy modes.

Safety & Alignment

  • Policy-tuned prompts, refusal calibration, and evals.
  • Red-teaming for sensitive domains; incident playbooks.
  • Role-based controls, audit logs, and data governance.

Systems & Efficiency

  • Mixture-of-Experts with ~32B activated parameters for efficiency.
  • Token-efficient pretraining: stability via optimizer choice and gradient clipping.
  • Throughput/latency work on vLLM, TensorRT-LLM, and SGLang (a minimal vLLM example follows this list).
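For the serving side, a minimal offline-batch example with vLLM's Python API might look like the sketch below. The model identifier is a placeholder, not a released checkpoint, and the sampling settings are arbitrary.

```python
# Illustrative only: offline batch inference with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/your-moe-checkpoint")  # hypothetical model id
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Summarize the attached incident report."], params)
for out in outputs:
    print(out.outputs[0].text)
```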

How we do research

We treat safety and reliability as engineering disciplines—measurable, testable, and continuously improved.

Training & Post-training

  • Mixture-of-Experts with routing tuned for stability and cost.
  • Post-training with preference data and rubric-based feedback.
  • On-policy rollouts for verifiable rewards (math, coding) and rubric-judged tasks (a reward sketch follows this list).
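A verifiable reward in this sense can be as simple as checking a parsed final answer against a reference. The sketch below assumes a hypothetical convention where the model ends its completion with an "ANSWER:" line; it is not our actual reward specification.

```python
# Illustrative only: a binary verifiable reward for math-style rollouts,
# assuming completions end with a line like "ANSWER: 42".
import re

def math_reward(completion: str, reference: float, tol: float = 1e-6) -> float:
    """Return 1.0 if the final ANSWER line matches the reference, else 0.0."""
    match = re.search(r"ANSWER:\s*(-?\d+(?:\.\d+)?)", completion)
    if match is None:
        return 0.0
    try:
        value = float(match.group(1))
    except ValueError:
        return 0.0
    return 1.0 if abs(value - reference) <= tol else 0.0

print(math_reward("Working... 6 * 7 = 42\nANSWER: 42", reference=42.0))  # 1.0
```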

Evaluations

  • Knowledge, reasoning, math/STEM, coding, and tool-use suites.
  • Grounding: citation precision/recall, quote accuracy, coverage (see the metric sketch after this list).
  • Safety: refusal calibration, bias audits, privacy leak tests.
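Citation precision and recall reduce to set comparisons once citations are normalized. The sketch below treats each citation as a (source_id, quote) pair and uses exact matching, which is a simplifying assumption; real scoring typically needs fuzzier quote alignment.

```python
# Illustrative only: citation precision/recall against a curated gold set.
from typing import Set, Tuple

Citation = Tuple[str, str]  # (source_id, quote)

def citation_pr(predicted: Set[Citation], gold: Set[Citation]) -> Tuple[float, float]:
    """Precision: correct / predicted. Recall: correct / gold."""
    if not predicted or not gold:
        return 0.0, 0.0
    correct = len(predicted & gold)
    return correct / len(predicted), correct / len(gold)

pred = {("doc-1", "q1"), ("doc-2", "q7")}
gold = {("doc-1", "q1"), ("doc-3", "q2")}
print(citation_pr(pred, gold))  # (0.5, 0.5)
```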

Reproducibility & Reporting

  • Seeds, configs, and environment manifests captured per run (a manifest sketch follows this list).
  • Action logs: tools called, parameters, and evidence paths.
  • Exportable artifacts: HTML/PDF reports and .yaml pipelines.
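One lightweight way to capture per-run provenance is a YAML manifest written alongside each run. The sketch below uses PyYAML; the file layout and field names are assumptions, not a published format.

```python
# Illustrative only: writing a per-run manifest (seed, config, environment)
# to YAML. Identifiers and field names are hypothetical.
import platform
import sys

import yaml  # pip install pyyaml

manifest = {
    "run_id": "exp-0001",  # hypothetical identifier
    "seed": 1234,
    "config": {"model": "your-org/your-checkpoint", "max_tokens": 256},
    "environment": {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    },
}

with open("run_manifest.yaml", "w") as fh:
    yaml.safe_dump(manifest, fh, sort_keys=False)
```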

Deployment & Systems

  • Inference engines: vLLM, SGLang, KTransformers, TensorRT-LLM.
  • Latency-first voice paths and streaming JSON outputs.
  • Multi-channel delivery with RBAC and audit trails.

Research perspective: “Safely aligning capable systems is a scientific challenge and an engineering discipline. Our aim is practical reliability: assistants that cite sources, ask before acting, and make it easy to see what happened and why.”

— Upcube Research Team

Streams & artifacts

Each stream ships repeatable artifacts—benchmarks, example pipelines, incident templates, and reproducible writeups.

Language & Interaction

Instruction following, structured outputs, long-context dialogue, and grounded generation with citations.

Reasoning & Planning

Tool-using agents, robust planning, program synthesis, and multi-step task execution.

Vision & Image

Image generation and editing with controllable style/subject constraints and provenance.

Voice & Audio

Realtime speech, diarization, low-latency turn-taking, and privacy-preserving pipelines.

Safety & Alignment

Policy calibration, red-team methods, governance, and incident readiness.

Systems & Efficiency

Serving stacks, optimization, cost-aware routing, and reliable deployment patterns.

Recent research highlights

Upcube context upgrade

New weights unlock 256K context across chat, search, voice, and images—longer threads, larger inputs, clearer answers.

Agentic evaluation toolkit

Benchmarks for tool selection, plan repair, and recovery from API errors—reporting success@k and cost curves.
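For reference, empirical success@k can be computed directly from per-task attempt logs: a task counts as solved if any of its first k attempts succeeded. The sketch below is a simple empirical estimate, not the toolkit's exact estimator.

```python
# Illustrative only: empirical success@k over per-task attempt logs.
from typing import List

def success_at_k(attempts_per_task: List[List[bool]], k: int) -> float:
    """Fraction of tasks with at least one success in the first k attempts."""
    solved = sum(any(attempts[:k]) for attempts in attempts_per_task)
    return solved / len(attempts_per_task)

results = [[False, True, True], [False, False, False], [True]]
print(success_at_k(results, k=2))  # 2 of 3 tasks solved within 2 attempts
```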

Grounded search improvements

Better quote attribution, duplicate collapse, and coverage metrics (precision/recall vs. curated corpora).

Get involved

Collaborations

Join evaluations and build the tooling that makes systems safer and easier to audit.

  • Research evaluations and red-team exercises.
  • Domain-specific datasets and ground truth building.
  • Tool/plugin ecosystems for agentic tasks.

Open materials

Notes, benchmarks, and reproducible examples you can run and verify.

  • Research notes, benchmarks, and example pipelines.
  • Reproduction guides and config packs.
  • Safety checklists and incident templates.

Contact Upcube Research

Send a note

Share context, goals, and what you want to evaluate or build.

No spam. We reply to serious requests.

Where to find us

For safety, privacy, and evaluation requests, include your timeline and deployment context.

New York, NY 10005 · USA
If you’re reporting an issue, include: reproduction steps, affected endpoints/pages, timestamps, and any logs you can share.