Terminology Reference: Vision-Language-Action (VLA) Systems
This document defines key terms used throughout Module 4 to ensure consistency across all chapters.
Core Terms
VLA (Vision-Language-Action)
An integrated system that connects vision (perception), language (cognition), and action (physical execution) in an embodied robot.
Intent
The interpreted purpose or goal extracted from a natural language command. The cognitive planner consumes the intent to generate action sequences.
Planner
An LLM-based component that translates high-level goals into executable action sequences while accounting for physical constraints. The planner is distinct from the controllers that execute actions.
Action Sequence
A series of executable steps generated by the cognitive planner to achieve a specific goal, respecting physical constraints and safety requirements.
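The intent → planner → action-sequence chain can be sketched in code. This is a minimal illustration under assumed names (`Action`, `ActionSequence`, `plan` are hypothetical, and the hard-coded rule stands in for an LLM-based planner), not a real API:

```python
from dataclasses import dataclass, field

# Hypothetical data model for illustration; not part of any real framework.
@dataclass
class Action:
    name: str                                  # e.g. "navigate", "grasp"
    params: dict = field(default_factory=dict)

@dataclass
class ActionSequence:
    goal: str            # the interpreted intent this sequence achieves
    steps: list[Action]  # ordered, executable steps

def plan(intent: str) -> ActionSequence:
    """Toy stand-in for an LLM-based planner: maps an intent to ordered steps."""
    if intent == "fetch the cup":
        steps = [
            Action("navigate", {"target": "kitchen"}),
            Action("grasp", {"object": "cup"}),
            Action("navigate", {"target": "user"}),
            Action("handover", {"object": "cup"}),
        ]
    else:
        steps = []  # unknown intents yield an empty plan
    return ActionSequence(goal=intent, steps=steps)

seq = plan("fetch the cup")
print([a.name for a in seq.steps])  # → ['navigate', 'grasp', 'navigate', 'handover']
```

Note that `plan` only produces the sequence; executing each `Action` is the controller's job, preserving the planner/controller separation defined above.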
Embodiment
The concept that AI systems behave differently when physically situated in the real world, as opposed to disembodied systems. Embodied systems must account for physical laws, sensor noise, actuator limitations, and safety considerations.
Cognitive Planning
The process by which LLMs function as high-level planners that generate task plans, rather than directly controlling robot actuators. This maintains separation between cognition and execution.
Voice-to-Action Pipeline
A processing chain that converts spoken commands into robot actions through speech recognition, intent mapping, and safety validation before any action is executed.
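The pipeline stages can be sketched as a sequence of functions. All names here are illustrative stubs (no real ASR or ROS 2 API is invoked); the point is the ordering, with the safety gate sitting between intent mapping and execution:

```python
# Minimal sketch of a voice-to-action pipeline; every function is a stub.

def transcribe(audio: bytes) -> str:
    """Stand-in for a speech-recognition (ASR) model."""
    return "move to the table"  # pretend transcription

def map_intent(text: str) -> dict:
    """Stand-in for intent mapping (rule-based here; could be LLM-based)."""
    if text.startswith("move to "):
        return {"intent": "navigate", "target": text.removeprefix("move to ")}
    return {"intent": "unknown"}

def validate(intent: dict) -> bool:
    """Safety gate: only allow-listed intents reach execution."""
    return intent["intent"] in {"navigate", "grasp", "stop"}

def execute(intent: dict) -> str:
    """Stand-in for the controller layer (e.g. a ROS 2 action client)."""
    return f"executing {intent['intent']}: {intent.get('target', '')}"

intent = map_intent(transcribe(b"\x00"))
if validate(intent):
    print(execute(intent))  # → executing navigate: the table
```

An unrecognized or disallowed command fails `validate` and never reaches `execute`, which is the "safety validation before action execution" step in the definition.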
Autonomous Humanoid Architecture
The complete system design integrating perception → cognition → action flow for independent robot operation, incorporating all VLA system components.
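The perception → cognition → action flow can be illustrated as one pass through a control loop. Class and method names below are assumptions for this sketch, with each layer stubbed out:

```python
# Illustrative one-step control loop for the perception -> cognition -> action flow.

class Humanoid:
    def perceive(self) -> dict:
        """Vision layer: return a (stubbed) snapshot of world state."""
        return {"obstacle_ahead": False, "battery": 0.8}

    def decide(self, state: dict) -> str:
        """Cognition layer: choose the next command from the perceived state."""
        if state["battery"] < 0.2:
            return "dock"
        return "stop" if state["obstacle_ahead"] else "advance"

    def act(self, command: str) -> str:
        """Action layer: hand the command to a (stubbed) controller."""
        return f"controller received: {command}"

robot = Humanoid()
state = robot.perceive()       # perception
command = robot.decide(state)  # cognition
print(robot.act(command))      # action → controller received: advance
```

In a real system each layer would be a separate component (sensor drivers, an LLM planner, ROS 2 controllers); the sketch only fixes the direction of data flow between them.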
Important Distinctions
- Planner vs. Controller: LLMs perform cognition and planning; ROS 2 performs execution
- Embodied vs. Disembodied: Physical constraints fundamentally alter how AI systems behave
- Conceptual vs. Implementation: This module emphasizes system-level understanding over technical implementation details