We present the design, integration, and evaluation of RoMan, a full-stack robotic system that conducts autonomous field operations involving physical interaction with its environment. RoMan offers autonomous behaviors that can be triggered by succinct, high-level human input such as “open this box and retrieve the bag inside.” The robot’s behaviors are driven by a set of planners and controllers grounded in perceptual reconstructions of the environment. These behaviors are orchestrated by a behavior tree that translates high-level operator input into programs of increasing sensorimotor expressiveness, ultimately driving the lowest-level controllers. The software system is implemented in ROS as a set of independent processes connected by synchronous and asynchronous communication and distributed across two on-board planning/control computers. The behavior stack drives a novel platform consisting of a pair of custom, 500 Nm/axis manipulators mounted on a rotatable torso aboard a tracked base. The robot’s head is equipped with forward-looking depth cameras, and the arms carry wrist-mounted force-torque sensors and a mix of three- and four-finger grippers. We discuss design and implementation trade-offs affecting the entire hardware-software stack and the high-level manipulation behaviors. We also demonstrate the system on two manipulation tasks: 1) removing heavy debris from a roadway, where 64% of end-to-end autonomous runs required at most one human intervention; and 2) retrieving an item from a closed container, with a fully autonomous success rate of 56%. Finally, we present lessons learned and outline outstanding research problems.
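To make the behavior-tree architecture concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of how a high-level command such as "open this box and retrieve the bag inside" might be decomposed into a tree of sequence and fallback nodes over leaf actions; all node and action names here are illustrative assumptions.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Action:
    """Leaf node wrapping a call to a lower-level planner or controller."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
    def tick(self):
        return self.fn()

class Sequence:
    """Ticks children in order; fails as soon as one child fails."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Selector:
    """Fallback node: tries children in order until one succeeds."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE

# Hypothetical decomposition of "open this box and retrieve the bag inside".
# The Selector encodes a recovery strategy: if the first opening behavior
# fails, the tree falls back to an alternative before giving up.
tree = Sequence([
    Action("locate_box",               lambda: Status.SUCCESS),
    Selector([
        Action("open_lid_single_arm",  lambda: Status.FAILURE),  # stubbed failure
        Action("open_lid_dual_arm",    lambda: Status.SUCCESS),  # stubbed fallback
    ]),
    Action("retrieve_bag",             lambda: Status.SUCCESS),
])
print(tree.tick())  # Status.SUCCESS
```

In a real system each leaf's `fn` would invoke a ROS service or action and return `RUNNING` while a controller executes; the stubs above return fixed statuses so the control-flow semantics are visible.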