Pioneering research toward useful, safe, long-context systems
We explore models and methods that help people solve real problems: assistants that keep long context, cite sources, call tools, speak naturally, and generate on-brand visuals—while staying reviewable and safe.
Focus areas
The streams we publish, evaluate, and ship—optimized for usefulness, reliability, and safety.
Language & Interaction (Upcube-Instruct)
- Long-context dialogue and structured outputs (JSON/Markdown/LaTeX).
- Grounded answers with quote-level citations and uncertainty notes.
- Multi-turn state tracking across help, research, and build flows.
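For a concrete feel of what grounded, structured output can look like, here is a minimal sketch of an answer object with quote-level citations and an uncertainty note; the schema and field names are illustrative only, not Upcube's production format.

```python
# Illustrative sketch only: field names and structure are hypothetical.
from dataclasses import dataclass, asdict
import json

@dataclass
class Citation:
    source_url: str    # where the supporting quote was found
    quote: str         # exact quoted span backing the claim
    confidence: float  # rough uncertainty note, 0.0-1.0

@dataclass
class GroundedAnswer:
    answer_markdown: str
    citations: list[Citation]

example = GroundedAnswer(
    answer_markdown="The 2024 report notes a 12% rise in usage.",
    citations=[Citation(
        source_url="https://example.org/report-2024",
        quote="usage increased by 12% year over year",
        confidence=0.8,
    )],
)
print(json.dumps(asdict(example), indent=2))  # structured output, ready to render or validate
```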
Reasoning & Planning (Upcube-Base + Agents)
- Tool choice and multi-step execution for real tasks.
- Program synthesis, code repair, and data pipelines.
- Evaluation suites for plan quality, robustness, and cost.
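To make the execution side concrete, here is a toy sketch of a fixed plan running against a small tool registry, with step count as a crude cost proxy; the tools, plan format, and cost accounting are placeholders rather than Upcube's agent framework.

```python
# Toy plan executor: runs (tool, input) steps in order and logs evidence.
# Tool registry and plan format are illustrative placeholders.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"top snippet for {query!r}",
    "read_file": lambda path: f"contents of {path}",
}

def execute_plan(steps: list[tuple[str, str]]) -> tuple[list[str], int]:
    """Return an evidence log and the number of steps taken (a crude cost proxy)."""
    evidence = []
    for tool_name, tool_input in steps:
        result = TOOLS[tool_name](tool_input)
        evidence.append(f"{tool_name}({tool_input!r}) -> {result}")
    return evidence, len(steps)

log, cost = execute_plan([("search", "plan repair benchmarks"), ("read_file", "notes.md")])
print("\n".join(log))
print("cost (steps):", cost)
```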
Vision & Image Generation
- Photorealistic & stylized outputs with subject/style controls.
- Brand anchors & captioning for accessibility.
- Auditability: prompt/seed/config provenance.
Voice & Audio
- Low-latency, full-duplex conversations with barge-in.
- Expressive synthesis with controllable prosody.
- ASR robustness, diarization, and privacy modes.
Safety & Alignment
- Policy-tuned prompts, refusal calibration, and evals.
- Red-teaming for sensitive domains; incident playbooks.
- Role-based controls, audit logs, and data governance.
Systems & Efficiency
- Mixture-of-Experts with ~32B activated parameters for efficiency.
- Token-efficient pretraining: training stability via optimizer choice and gradient clipping.
- Throughput/latency work on vLLM, TensorRT-LLM, and SGLang.
How we do research
We treat safety and reliability as engineering disciplines—measurable, testable, and continuously improved.
Training & Post-training
- Mixture-of-Experts with routing tuned for stability and cost.
- Post-training with preference data and rubric-based feedback.
- On-policy rollouts for verifiable rewards (math, coding) and rubric-judged tasks.
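As a rough intuition for why only a fraction of a Mixture-of-Experts model's parameters are activated per token, here is a toy top-k router in plain Python; real routers are batched tensor operations with load-balancing objectives, and the numbers here are made up.

```python
# Toy top-k expert routing: only the k selected experts run for this token.
# Plain-Python sketch for intuition; not a production router.
import math

def top_k_route(gate_logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(expert_id, weight / total) for expert_id, weight in zip(top, exps)]

# One token's router scores over 4 experts; only 2 are activated.
print(top_k_route([0.1, 2.3, -0.5, 1.7], k=2))
```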
Evaluations
- Knowledge, reasoning, math/STEM, coding, and tool-use suites.
- Grounding: citation precision/recall, quote accuracy, coverage.
- Safety: refusal calibration, bias audits, privacy leak tests.
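As one way to read the grounding metrics named above, here is a minimal sketch of citation precision and recall against a curated reference set; exact-quote matching is a deliberate simplification of how matches would be judged in practice.

```python
# Minimal citation precision/recall: exact-match quotes against a gold set.
def citation_scores(predicted_quotes: set[str], gold_quotes: set[str]) -> dict[str, float]:
    true_positives = len(predicted_quotes & gold_quotes)
    precision = true_positives / len(predicted_quotes) if predicted_quotes else 0.0
    recall = true_positives / len(gold_quotes) if gold_quotes else 0.0
    return {"precision": precision, "recall": recall}

print(citation_scores(
    predicted_quotes={"usage increased by 12%", "launched in 2023"},
    gold_quotes={"usage increased by 12%", "revenue grew 8%"},
))  # {'precision': 0.5, 'recall': 0.5}
```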
Reproducibility & Reporting
- Seeds, configs, and environment manifests captured per run.
- Action logs: tools called, parameters, and evidence paths.
- Exportable artifacts: HTML/PDF reports and .yaml pipelines.
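A small sketch of what capturing seed, config, and environment per run might look like; the file layout and field names are made up for illustration.

```python
# Illustrative per-run manifest; field names and layout are hypothetical.
import json, platform, random, sys

def write_run_manifest(path: str, seed: int, config: dict) -> None:
    random.seed(seed)  # seed the run so it can be repeated
    manifest = {
        "seed": seed,
        "config": config,
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)

write_run_manifest("run_manifest.json", seed=1234, config={"model": "example-model", "max_tokens": 256})
```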
Deployment & Systems
- Inference engines: vLLM, SGLang, KTransformers, TensorRT-LLM.
- Latency-first voice paths and streaming JSON outputs.
- Multi-channel delivery with RBAC and audit trails.
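To illustrate the streaming-output idea, here is a tiny sketch that emits newline-delimited JSON events so a client can render partial results before generation finishes; the event names are placeholders, not a documented wire format.

```python
# Stream structured output as newline-delimited JSON events (placeholder format).
import json
from typing import Iterator

def stream_answer(chunks: list[str]) -> Iterator[str]:
    for chunk in chunks:
        yield json.dumps({"event": "delta", "text": chunk})
    yield json.dumps({"event": "done"})

for line in stream_answer(["The report ", "cites a 12% rise."]):
    print(line)  # each line is a self-contained JSON object
```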
Research perspective: “Safely aligning capable systems is a scientific challenge and an engineering discipline. Our aim is practical reliability: assistants that cite sources, ask before acting, and make it easy to see what happened and why.”
— Upcube Research Team
Streams & artifacts
Each stream ships repeatable artifacts—benchmarks, example pipelines, incident templates, and reproducible writeups.
Language & Interaction
Instruction following, structured outputs, long-context dialogue, and grounded generation with citations.
Reasoning & Planning
Tool-using agents, robust planning, program synthesis, and multi-step task execution.
Vision & Image
Image generation and editing with controllable style/subject constraints and provenance.
Voice & Audio
Realtime speech, diarization, low-latency turn-taking, and privacy-preserving pipelines.
Safety & Alignment
Policy calibration, red-team methods, governance, and incident readiness.
Systems & Efficiency
Serving stacks, optimization, cost-aware routing, and reliable deployment patterns.
Recent research highlights
Upcube context upgrade
New weights unlock 256K context across chat, search, voice, and images—longer threads, larger inputs, clearer answers.
Agentic evaluation toolkit
Benchmarks for tool selection, plan repair, and recovery from API errors—reporting success@k and cost curves.
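For reference, one common reading of success@k is sketched below: a task counts as solved if any of its first k attempts succeeds; real reports may use an unbiased estimator and fold in cost, which this toy version omits.

```python
# Toy success@k: a task is solved if any of the first k attempts succeeded.
def success_at_k(attempt_outcomes: list[list[bool]], k: int) -> float:
    """attempt_outcomes[i][j] is True if attempt j on task i succeeded."""
    solved = sum(any(outcomes[:k]) for outcomes in attempt_outcomes)
    return solved / len(attempt_outcomes)

outcomes = [[False, True, False], [False, False, False], [True, True, True]]
print(success_at_k(outcomes, k=2))  # 2 of 3 tasks solved within 2 attempts
```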
Grounded search improvements
Better quote attribution, duplicate collapse, and coverage metrics (precision/recall vs. curated corpora).
Get involved
Collaborations
Join evaluations and build the tooling that makes systems safer and easier to audit.
- Research evaluations and red-team exercises.
- Domain-specific datasets and ground-truth construction.
- Tool/plugin ecosystems for agentic tasks.
Open materials
Notes, benchmarks, and reproducible examples you can run and verify.
- Research notes, benchmarks, and example pipelines.
- Reproduction guides and config packs.
- Safety checklists and incident templates.
Contact Upcube Research
Send a note
Share context, goals, and what you want to evaluate or build.
Where to find us
For safety, privacy, and evaluation requests, include your timeline and deployment context.