Tactile and kinesthetic perception is crucial for human dexterous manipulation, enabling reliable grasping of objects through proprioceptive sensorimotor integration. For robotic hands, even though acquiring such tactile and kinesthetic feedback is feasible, establishing a direct mapping from this sensory feedback to motor actions remains challenging. In this article, we propose a novel glove-mediated tactile–kinematic perception–prediction framework that transfers grasp skills from intuitive, natural human operation to robotic execution via imitation learning; its effectiveness is validated through generalized grasping tasks, including those involving deformable objects. First, we integrate a data glove to capture tactile and kinesthetic data at the joint level. The glove is adaptable to both human and robotic hands, allowing data collection from natural human hand demonstrations across different scenarios. It ensures a consistent raw data format, enabling grasp evaluation for both human and robotic hands. Second, we establish a unified representation of multimodal inputs based on graph structures with polar coordinates. We explicitly integrate the morphological differences into the designed representation, enhancing compatibility across different demonstrators and robotic hands. Furthermore, we introduce tactile–kinesthetic spatio-temporal graph networks, which leverage multidimensional subgraph convolutions and attention-based long short-term memory (LSTM) layers to extract spatio-temporal features from graph inputs and predict node-based states for each hand joint. These predictions are then mapped to final commands through a force-position hybrid mapping. Comparative experiments and ablation studies demonstrate that our approach surpasses other methods in grasp success rate, finger coordination, contact force management, and both grasp and computational efficiency, achieving results closest to human grasping.
The robustness of our approach is also validated through multiple randomized experimental setups, and its generalization capability is tested across diverse objects and robotic hands.
Nonprehensile actions, such as pushing, are crucial for addressing multiobject rearrangement problems. Many traditional methods generate robot-centric actions, which differ from intuitive human strategies and are typically inefficient. To this end, we adopt an object-centric planning paradigm and propose a unified framework for addressing a range of large-scale, physics-intensive nonprehensile rearrangement problems challenged by modeling inaccuracies and real-world uncertainties. By assuming that each object can actively move without being driven by robot interactions, our planner first computes desired object motions, which are then realized through robot actions generated online via a closed-loop pushing strategy. Through extensive experiments and in comparison with state-of-the-art baselines in both simulation and on a physical robot, we show that our object-centric planning framework can generate more intuitive and task-effective robot actions with significantly improved efficiency. In addition, we propose a benchmarking protocol to standardize and facilitate future research in nonprehensile rearrangement.
Cloud robotics allows low-power robots to perform computationally intensive inference tasks by offloading them to the cloud, raising privacy concerns when transmitting sensitive images. Although end-to-end encryption secures data in transit, it does not prevent misuse by inquisitive third-party services since data must be decrypted for processing. This article tackles these privacy issues in cloud-based object detection tasks for service robots. We propose a cotrained encoder-decoder architecture that retains only task-specific features while obfuscating sensitive information, utilizing a novel weak loss mechanism with proposal selection for privacy preservation. A theoretical analysis of the problem is provided, along with an evaluation of the tradeoff between detection accuracy and privacy preservation through extensive experiments on public datasets and a real robot.
Recent work introduced the concept of human teleoperation (HT), where the remote robot typically considered in conventional bilateral teleoperation is replaced by a novice person wearing a mixed-reality head-mounted display and tracking the motion of a virtual tool controlled by an expert. HT has advantages in cost, complexity, and patient acceptance for telemedicine in low-resource communities or remote locations. However, the stability, transparency, and performance of bilateral HT are unexplored. In this article, we therefore develop a mathematical model of the HT system using test data. We then analyze various control architectures with this model and implement them with the HT system, testing volunteer operators and a virtual fixture-based simulated patient to find the achievable performance, investigate stability, and determine the most promising teleoperation scheme in the presence of time delays. We show that instability in HT, while not destructive or dangerous, makes the system impossible to use. However, stable and transparent teleoperation is possible with small time delays ($<200$ ms) through three-channel teleoperation, or with large time delays through model-mediated teleoperation with local pose and force feedback for the novice.
Learning-based motion planning methods have shown significant promise in enhancing the efficiency of traditional algorithms. However, they often face performance degradation in novel environments with drastic scene changes due to the limited generalization ability of deep neural networks. This article introduces a confidence-driven motion planning network (CDMPNet), comprising a feature extraction autoencoder and a confidence-driven sampling network (CDSNet). The autoencoder compresses point clouds into latent vectors. The CDSNet is a closed-form continuous-time neural network, which predicts hyperparameters of an evidential distribution over the subsequent state’s mean and covariance for robot configuration sampling. We also present a CDMPNet-based neural planner and a CDMPNet-guided RRTConnect algorithm. Simulations and ablation studies are conducted on 2-D, 3-D, and 7-D planning tasks to validate the generalization ability of our method. Furthermore, we transfer the approach to a seven-degree-of-freedom Sawyer robotic arm to demonstrate the potential for real-world deployment.
This article introduces GeoVINS, a vision-based navigation framework designed for large-scale global state estimation. By utilizing geographic information from satellite orthoimagery, GeoVINS tackles the scale ambiguity and accumulative drift inherent in standard visual-inertial simultaneous localization and mapping systems, and provides accurate, robust, and real-time global localization. In particular, to address the memory explosion challenge in large-scale localization, we propose a novel “classify-then-retrieve” aerial visual place recognition (VPR) approach, in which geographic locations can be efficiently identified and memory usage can be reduced by three to four orders of magnitude compared to classical retrieval-based approaches. In addition, a hierarchical geographic data association scheme, enhanced by state-of-the-art deep learning-based feature matching, guarantees high efficiency and robustness against variations in appearance and viewpoint. Relying on the obtained three-dimensional (3-D) geographic information, GeoVINS achieves efficient global state initialization and precise motion tracking. To address GPU limitations in embedded devices, a collaborative CPU–GPU utilization approach is proposed, seamlessly integrating asynchronous global information to eliminate accumulative errors. Relying solely on satellite imagery and without requiring any other prior information, GeoVINS achieves rapid place recognition in city-scale unseen environments (e.g., 2500 $\text{km}^{2}$) on an embedded computing device with an inference time of $43\,\mathrm{ms}$, and performs state estimation at a frequency of $25\,\mathrm{Hz}$. The system enables autonomous aerial vehicle navigation as an alternative to global navigation satellite systems.
We propose a new formulation for the multirobot task allocation problem that incorporates 1) complex precedence relationships between tasks, 2) efficient intratask coordination, and 3) cooperation through the formation of robot coalitions. A task graph specifies the tasks and their relationships, and a set of reward functions models the effects of coalition size and preceding task performance. Maximizing task rewards is NP-hard; hence, we propose network flow-based algorithms to approximate solutions efficiently. A novel online algorithm performs iterative reallocation, providing robustness to task failures and model inaccuracies to achieve higher performance than offline approaches. We comprehensively evaluate the algorithms in a testbed with random missions and reward functions and compare them to a mixed-integer solver and a greedy heuristic. In addition, we validate the overall approach in an advanced simulator, modeling reward functions based on realistic physical phenomena and executing the tasks with realistic robot dynamics. Results establish efficacy in modeling complex missions and efficiency in generating high-fidelity task plans while leveraging task relationships.
Ice conditions often require ships to reduce speed and deviate from their main course to avoid hull damage. In addition, broken ice fields are becoming the dominant ice conditions encountered in the Arctic, where the effects of collisions with ice are highly dependent on where contact occurs and on the particular features of the ice floes. In this article, we present AUTO-IceNav, a framework for the autonomous navigation of ships operating in ice floe fields. Trajectories are computed in a receding-horizon manner, where we frequently replan given updated ice field data. During a planning step, we assume a nominal speed that is safe with respect to the current ice conditions, and compute a reference path. We formulate a novel cost function that minimizes the kinetic energy loss of the ship from ship-ice collisions and incorporate this cost as part of our lattice-based path planner. The solution computed by the lattice planning stage is then used as an initial guess in our proposed optimization-based improvement step, producing a locally optimal path. Extensive experiments were conducted both in simulation and in a physical testbed to validate our approach.
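As a toy illustration of folding a collision-energy term into a lattice-style search, the sketch below runs Dijkstra over a small grid whose cells carry a hypothetical "kinetic-energy loss" value. The grid, weights, and energy field are illustrative assumptions, not AUTO-IceNav's actual lattice planner, which uses ship-ice collision models and an optimization-based refinement step:

```python
import heapq

def plan_min_energy_path(energy, start, goal, w_len=1.0, w_energy=5.0):
    """Dijkstra over a 4-connected grid; each edge cost combines distance
    traveled with the energy loss expected in the entered cell (a toy
    stand-in for a ship-ice collision cost in a lattice planner)."""
    rows, cols = len(energy), len(energy[0])
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + w_len + w_energy * energy[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(pq, (nd, (nr, nc)))
    # Reconstruct the path by walking predecessors back to the start.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Toy ice field: the middle column holds a dense floe (high energy loss),
# so the cheapest path detours around it instead of pushing through.
field = [[0.0, 0.9, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.9, 0.0]]
path = plan_min_energy_path(field, (0, 0), (0, 2))
```

With these weights the planner detours through the middle row rather than entering the high-energy cell, mirroring how an energy-loss cost biases a ship's path around dense floes.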
Light detection and ranging (LiDAR) point clouds are essential for autonomous vehicles, but motion distortions from dynamic objects degrade the data quality. While previous work has considered distortions caused by ego motion, distortions caused by other moving objects remain largely overlooked, leading to errors in object shape and position. This distortion is particularly pronounced in high-speed environments, such as highways, and in multi-LiDAR configurations, a common setup for heavy vehicles. To address this challenge, we introduce HiMo, a pipeline that repurposes scene flow estimation for nonego motion compensation, correcting the representation of dynamic objects in point clouds. During the development of HiMo, we observed that existing self-supervised scene flow estimators often produce degenerate or inconsistent estimates under high-speed distortion. We further propose SeFlow++, a real-time scene flow estimator that achieves state-of-the-art performance on both scene flow and motion compensation. Since well-established motion distortion metrics are absent in the literature, we introduce two evaluation metrics: compensation accuracy at the point level and shape similarity of objects. We validate HiMo through extensive experiments on Argoverse 2, ZOD, and a newly collected real-world dataset featuring highway driving and multi-LiDAR-equipped heavy vehicles. Our findings show that HiMo improves the geometric consistency and visual fidelity of dynamic objects in LiDAR point clouds, benefiting downstream tasks such as semantic segmentation and 3-D detection.
VINGS-Mono is a monocular (inertial) Gaussian splatting (GS) SLAM framework designed for large scenes. The framework comprises four main components: visual-inertial odometry (VIO) front end, 2-D Gaussian map, novel view synthesis (NVS) loop closure, and dynamic eraser. In the VIO front end, RGB frames are processed through dense bundle adjustment and uncertainty estimation to extract scene geometry and poses. Based on this output, the mapping module incrementally constructs and maintains a 2-D Gaussian map. Key components of the 2-D Gaussian map include a sample-based rasterizer, score manager, and pose refinement, which collectively improve mapping speed and localization accuracy. This enables the SLAM system to handle large-scale urban environments with up to 50 million Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design a loop-closure module, which innovatively leverages the NVS capabilities of GS for loop-closure detection and correction of the Gaussian map. In addition, we propose a dynamic eraser to address the inevitable presence of dynamic objects in real-world outdoor scenes. Extensive evaluations in indoor and outdoor environments demonstrate that our approach achieves localization performance on par with VIO while surpassing recent GS/NeRF SLAM methods. It also significantly outperforms all existing methods in terms of mapping and rendering quality. Furthermore, we developed a mobile app and verified that our framework can generate high-quality Gaussian maps in real time using only a smartphone camera and a low-frequency IMU sensor. To the best of our knowledge, VINGS-Mono is the first monocular Gaussian SLAM method capable of operating in outdoor environments and supporting kilometer-scale large scenes.
Complying with traffic rules is challenging for automated vehicles, as numerous rules need to be considered simultaneously. If a planned trajectory violates traffic rules, it is common to replan a new trajectory from scratch. We instead propose a trajectory repair technique to save computation time. By coupling satisfiability modulo theories with set-based reachability analysis, we determine if and in what manner the initial trajectory can be repaired. Experiments in high-fidelity simulators and in the real world demonstrate the benefits of our proposed approach in various scenarios. Even in complex environments with intricate rules, we efficiently and reliably repair rule-violating trajectories, enabling automated vehicles to swiftly resume legally safe operation in real time.
Coverage control is the problem of navigating a robot swarm to collaboratively monitor features or a phenomenon of interest not known a priori. The problem is challenging in decentralized settings with robots that have limited communication and sensing capabilities. We propose a learnable perception-action-communication (LPAC) architecture for the problem, wherein a convolutional neural network (CNN) processes localized perception, a graph neural network (GNN) facilitates robot communication, and a shallow multilayer perceptron computes robot actions. The GNN enables collaboration in the robot swarm by computing what information to communicate with nearby robots and how to incorporate received information. Evaluations show that the LPAC models—trained using imitation learning—outperform standard decentralized and centralized coverage control algorithms. The learned policy generalizes to environments different from the training dataset, transfers to larger environments with more robots, and is robust to noisy position estimates. The results indicate the suitability of LPAC architectures for decentralized navigation in robot swarms to achieve collaborative behavior.
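The perception-action-communication pipeline can be sketched with a single decentralized aggregation round over a toy swarm. All shapes, feature dimensions, and (random, untrained) weights below are illustrative assumptions, not the LPAC architecture itself, which uses a trained CNN, GNN, and MLP:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy swarm: 5 robots with 2-D positions and 4-D perception features
# (in an LPAC-style pipeline these features would come from a CNN
# over each robot's local observation).
pos = rng.uniform(0, 10, size=(5, 2))
feat = rng.normal(size=(5, 4))
comm_radius = 6.0

# Build the communication graph from pairwise distances.
dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
adj = (dist < comm_radius) & ~np.eye(5, dtype=bool)

# One GNN-style aggregation round: mean of neighbor features,
# concatenated with the robot's own features (a common
# message-passing scheme; isolated robots keep only their own state).
deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
agg = (adj.astype(float) @ feat) / deg
hidden = np.concatenate([feat, agg], axis=1)        # shape (5, 8)

# A shallow MLP head maps each robot's hidden state to a 2-D action.
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 2))
actions = np.tanh(hidden @ W1) @ W2                  # shape (5, 2)
```

Each robot's action depends only on its own features and those of robots within the communication radius, which is what makes the computation decentralized.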
This article deals with large-scale decentralized task allocation problems for multiple heterogeneous robots. One of the grand challenges in decentralized task allocation is NP-hardness, which burdens both computation and communication. This article proposes a decentralized decreasing threshold task allocation (DTTA) algorithm that enables parallel allocation by leveraging a decreasing threshold to handle this NP-hardness. DTTA relieves both the computation and communication burdens of multiple robots in a decentralized network. In addition, DTTA provides a theoretical guarantee on the quality of the solution for maximizing submodular utility functions. Theoretical analysis indicates that DTTA can provide an optimality guarantee of $(1-\epsilon)/2$ with computation complexity of $O(\min (r^{2}, \frac{r}{\epsilon }\ln \frac{r}{\epsilon }))$ for each robot, where $\epsilon$ is the parameter controlling the decreasing speed of the threshold and $r$ is the number of tasks. To examine the performance of the proposed algorithm, we conduct numerical simulations based on a multitarget surveillance scenario. Simulation results demonstrate that DTTA delivers comparable solution quality significantly faster than state-of-the-art task allocation algorithms. Its advantages are particularly pronounced in large-scale missions with thousands of tasks and robots.
Most of today’s simultaneous localization and mapping (SLAM) approaches learn the map of the environment in the first stage (referred to as mapping) and subsequently use this static map for planning and navigation. This method is suboptimal in dynamic contexts because changes in the environment can result in poor performance of the localization components essential for loop closure detection and relocalization. To address the limitations of the mapping-navigation dualism, continual SLAM has been proposed, which focuses on methods that can continually update the knowledge of the environment and the corresponding map. However, continual SLAM poses challenges, particularly for real-time navigation of large maps, and many of the existing techniques are not yet mature for practical application. In this article, we present a continual learning approach aimed at accurate and efficient robot localization on large maps, advancing the goal of continual SLAM. Our approach incrementally trains a region prediction neural network to recognize familiar places and preselect a subset of map nodes for localization and map optimization. We integrate this method into RTAB-Map, a well-known graph-based SLAM system, and validate its practical applicability through assessments on several real-world SLAM datasets.
This article presents a task-oriented computational framework to enhance visual-inertial navigation (VIN) in robots, addressing challenges such as limited time and energy resources. The framework strategically selects visual features using a mean squared error (MSE)-based, nonsubmodular objective function and a simplified dynamic anticipation model. To address the NP-hardness of this problem, we introduce four polynomial-time approximation algorithms: a classic greedy method with constant-factor guarantees; a low-rank greedy variant that significantly reduces computational complexity; a randomized greedy sampler that balances efficiency and solution quality; and a linearization-based selector based on a first-order Taylor expansion for near-constant-time execution. We establish rigorous performance bounds by leveraging submodularity ratios, curvature, and elementwise curvature analyses. Extensive experiments on both standardized benchmarks and a custom control-aware platform validate our theoretical results, demonstrating that these methods achieve strong approximation guarantees while enabling real-time deployment.
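A classic greedy feature selector can be sketched on a toy information-matrix objective. Note the swap: the log-det utility below is a standard submodular surrogate used here for illustration, not the article's MSE-based nonsubmodular objective, and the feature bank and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feature bank: each visual feature i contributes a rank-1 term
# a_i a_i^T to the information matrix of the state estimate.
features = rng.normal(size=(20, 4))

def logdet_gain(M, a):
    """Marginal gain of adding rank-1 term a a^T to information matrix M."""
    return np.linalg.slogdet(M + np.outer(a, a))[1] - np.linalg.slogdet(M)[1]

def greedy_select(features, budget, prior=1e-3):
    """Classic greedy selection maximizing log-det information, a common
    submodular surrogate for MSE-style estimation objectives."""
    M = prior * np.eye(features.shape[1])  # small prior keeps M invertible
    chosen = []
    for _ in range(budget):
        # Score every unchosen feature by its marginal information gain.
        gains = [(-1.0 if i in chosen else logdet_gain(M, features[i]), i)
                 for i in range(len(features))]
        best_gain, best = max(gains)
        chosen.append(best)
        M = M + np.outer(features[best], features[best])
    return chosen, M

chosen, M = greedy_select(features, budget=5)
```

The low-rank and randomized variants described above would replace the full gain evaluation in the inner loop with cheaper rank-1 updates or sampled candidate sets, respectively.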
To empower mobile robots with usable maps as well as the highest state estimation accuracy and robustness, we present OKVIS2-X: a state-of-the-art multisensor simultaneous localization and mapping (SLAM) system that builds dense volumetric occupancy maps, scales to large environments, and operates in real time. Our unified SLAM framework seamlessly integrates different sensor modalities: visual, inertial, measured or learned depth, LiDAR, and Global Navigation Satellite System (GNSS) measurements. Unlike most state-of-the-art SLAM systems, we advocate using dense volumetric map representations when leveraging depth or range-sensing capabilities. We employ an efficient submapping strategy that allows our system to scale to large environments, showcased in sequences of up to 9 km. OKVIS2-X enhances its accuracy and robustness by tightly coupling the estimator and submaps through map alignment factors. Our system provides globally consistent maps, directly usable for autonomous navigation. To further improve the accuracy of OKVIS2-X, we also incorporate the option of performing online calibration of camera extrinsics. Our system achieves the highest trajectory accuracy on EuRoC against state-of-the-art alternatives, outperforms all competitors on the Hilti22 VI-only benchmark while also proving competitive in the LiDAR version, and showcases state-of-the-art accuracy on the diverse and large-scale sequences of the VBR dataset.
Simultaneous localization and mapping (SLAM) has achieved impressive performance in static environments. However, SLAM in dynamic environments remains an open question. Many methods directly filter out dynamic objects, resulting in incomplete scene reconstruction and limited accuracy of camera localization. Other works represent dynamic objects with point clouds, sparse joints, or coarse meshes, failing to provide a photorealistic representation. To overcome the aforementioned limitations, we propose a photorealistic and geometry-aware red-green-blue-depth (RGB-D) SLAM method based on Gaussian splatting. Our method is composed of three main modules that map the dynamic foreground, including nonrigid humans/quadrupeds and rigid items, reconstruct the static background, and localize the camera. To map the foreground, we focus on modeling the deformations and/or motions. We consider the shape priors of humans/quadrupeds and exploit the geometric and appearance constraints of dynamic Gaussians. For background mapping, we design an optimization strategy between neighboring local maps by integrating an appearance constraint into geometric alignment. As to camera localization, we leverage both the static background and dynamic foreground to increase the number of observations and introduce more constraints. We explore the geometric and appearance constraints by associating 3-D Gaussians with 2-D optical flows and pixel patches. Experiments on extensive real-world datasets demonstrate that our method outperforms state-of-the-art approaches in terms of camera localization and scene mapping.
Contact planning for multilegged robots is a challenging sequential decision-making problem due to the interplay of gaits, footholds, configurations, and physical constraints from both the robot and the environment. Existing multicontact planners often fail to find feasible sequences within a limited time in complex scenarios and to guarantee physical feasibility. We propose a parallel Monte Carlo tree search-based planner that leverages multiconstraint reachability to efficiently generate physically valid contact sequences. The method accelerates planning through a hash-driven parallel approach, prioritizing promising candidates while pruning trapped nodes via valueless-node evaluation. It employs depth-first backup for long-horizon planning and uses virtual loss to balance parallel exploration. To ensure feasible transitions between contact states, we establish comprehensive reachability conditions for multilegged robots, incorporating stability, collision avoidance, kinematics, joint torques, and contact constraints into the planning framework. In experiments in sparse foothold environments, our planner outperforms mainstream contact planning approaches in traversability, solution quality, and physical feasibility, while achieving competitive planning speed. Furthermore, simulation and hardware validation on hexapod and humanoid robots exhibit successful locomotion across various terrains while satisfying constraints.
Deep reinforcement learning (DRL) controllers for quadrupedal locomotion have demonstrated impressive performance on challenging terrains, allowing robots to execute complex skills such as climbing, running, and jumping. However, existing blind locomotion controllers often struggle to ensure safety and efficient traversal through risky gap terrains, which are typically highly complex and require robots to accurately perceive terrain information and select appropriate footholds during locomotion. Meanwhile, existing perception-based controllers still present several practical limitations, including complex multisensor deployment and expensive computing resource requirements. This article proposes a DRL controller named MAstering Risky Gap Terrains (MARG), which integrates terrain maps and proprioception to dynamically adjust actions and enhance the robot's stability in these tasks. During the training phase, our controller accelerates policy optimization by selectively incorporating privileged information (e.g., center of mass, friction coefficients) that is available in simulation but cannot be measured directly in real-world deployments due to sensor limitations. We also design three foot-related rewards to encourage the robot to explore safe footholds. More importantly, a terrain map generation model is proposed to reduce mapping drift and provide accurate terrain maps using only one LiDAR, laying a foundation for zero-shot transfer of the learned policy. The experimental results indicate that MARG maintains stability in various risky terrain tasks.
Robotic fish with hybrid BCF/MPF propulsion achieve efficient and stable swimming through the synergistic control of the pectoral fins and body. However, the control of fin–body coupling with multiple degrees of freedom has been less studied. This study focuses on the development and synergistic control of a robotic fish with coupled fins and body. First, a robotic fish propelled jointly by its pectoral fins and body was designed, and a synergistic fin–body gait controller was constructed using a central pattern generator. Specifically, the control parameters were simplified, and synergistic fin–body movement with multiple degrees of freedom was realized. Second, a dataset was obtained from computational fluid dynamics simulations and 6-D force sensors, and an offline hydrodynamic model, relating the control parameters to the force/torque of the robotic fish, was identified using bidirectional long short-term memory networks; the model parameters were then updated online with experimental data. Finally, a control framework combining the offline–online model with event-triggered nonlinear model predictive control is constructed, which compensates for the driving force of the robotic fish, achieves precise trajectory tracking, and reduces the computational cost.
This study introduces a novel climbing strategy, reconfigurable parallel-type cable-driven climbing, designed for robotic applications on long-span, large-scale bridge stay cables, which has the potential to revolutionize stay cable inspection and maintenance practice. The proposed methodology features a collaborative climbing robot squad (CCRobot-S), which builds upon the design principles of the previous CCRobot series. CCRobot-S implements a parallel-type cable-driven manipulation design, allowing reconfigurable kinematic morphology through its movable anchor bases and enabling its flying platform to cross over stay cables. The collaborative robot squad design frees the dimensions and scale of the robot's reachable workspace and moves only the part of the robotic system that actually needs to move, enhancing working efficiency and climbing agility. The strategy also uses controllable adhesion, rather than friction, for the flying platform's interaction with the bridge cable surface, realizing force multiplication for forceful manipulation. To achieve high efficiency and heavy-duty capacity, we propose applicable climbing frameworks (a zero-downtime climbing gait for cable inspection and a spider-like climbing gait for cable maintenance) and optimization frameworks (optimal anchor configuration for the movable anchor bases and optimal grasp arrangement for the flying gripper). This article explores the design and climbing gaits of CCRobot-S, formulates the CCRobot-S model, comprehensively analyzes its workspace, and presents its climbing strategy and optimization. Extensive experiments have assessed the proposed climbing strategy's effectiveness and showcased CCRobot-S's capabilities.
In this article, we present irrotational contact fields, a framework for generating convex approximations of complex contact models, incorporating experimentally validated models like Hunt and Crossley coupled with Coulomb’s law of friction alongside the principle of maximum dissipation. Our approach is robust across a wide range of stiffness values, making it suitable for both compliant surfaces and rigid approximations. We evaluate these approximations across a wide variety of test cases, detailing properties and limitations. We implement a fully differentiable solution in the open-source robotics toolkit, Drake. Our novel hybrid approach enables efficient computation of gradients for complex geometric models by reusing factorizations from contact resolution. We demonstrate robust simulation of robotic tasks at interactive rates, with accurately resolved stiction and contact transitions, supporting effective sim-to-real transfer.
Impedance control is a widely adopted approach that ensures the compliant behavior of robot manipulators as they interact with their environment according to specifically designed dynamics. For tasks involving six degrees of freedom (DoF), it is crucial to appropriately manage the position and orientation of the end-effector by controlling dynamic behavior. However, describing orientational displacement and designing the corresponding rotational impedance can be challenging, especially when we use a minimal representation. The well-known minimal representation for orientation, the Euler angle, suffers from representation singularity. As a remedy, the quaternion or dual quaternion can be an alternative, but these are nonminimal representations. The lack of a minimal representation free from representation singularity often leads to handling the impedance design by directly defining the potential energy function on the matrix Lie group. This article proposes a framework for six-DoF impedance control design that takes advantage of Lie group theory with a minimal representation known as the exponential coordinate. Since the exponential coordinate can be treated as a Euclidean variable within the injectivity radius, it allows the impedance control to be formulated more systematically and in a familiar manner. In our framework, a detour strategy is utilized; the impedance is designed in the Lie group $SE(3)$, and the control is designed in the Lie algebra $\mathfrak {se}(3)$, which is isomorphic to the vector space $\mathbb {R}^{6}$. The group structure of $SE(3)$ can be maintained using the proposed conversion formula between the Lie group and the Lie algebra, called the differential of the exponential map and its time derivative, with a closed-form expression.
Experiments with a 6-DoF robot manipulator verified that the proposed impedance control framework effectively reflects the $SE(3)$ group structure and achieves the desired dynamic behavior as the functionality of the impedance control with minimal parameters.
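The exponential-coordinate detour can be made concrete on the rotational part: within the injectivity radius, the matrix logarithm maps a rotation to a 3-vector that can be handled like a Euclidean error. A minimal numpy sketch of the exp/log pair on $SO(3)$ (illustrative only, not the article's controller):

```python
import numpy as np

def hat(w):
    """so(3) hat map: 3-vector -> skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: exponential coordinates -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Matrix logarithm: rotation matrix -> exponential coordinates."""
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    return theta / (2 * np.sin(theta)) * np.array(
        [R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])

# Within the injectivity radius (||w|| < pi) the round trip is exact,
# so an orientation error behaves as a Euclidean 3-vector; e.g., a
# rotational "spring" could use tau = -Kr @ log_so3(Rd.T @ R), where
# Rd, R, and Kr are a hypothetical desired pose, current pose, and gain.
w = np.array([0.3, -0.2, 0.5])
R = exp_so3(w)
w_back = log_so3(R)
```

The full $SE(3)$ case adds the translational block and the differential of the exponential map mentioned above, but the same treat-as-Euclidean principle applies.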
Model-based manipulation of deformable objects has traditionally dealt with objects while neglecting their dynamics, thus mostly focusing on very lightweight objects at steady state. At the same time, soft robotic research has made considerable strides toward general modeling and control, despite soft robots and deformable objects being very similar from a mechanical standpoint. In this work, we leverage these recent results to develop a control-oriented, fully dynamic framework of slender deformable objects grasped at one end by a robotic manipulator. We introduce a dynamic model of this system using functional strain parameterizations and describe the manipulation challenge as a regulation control problem. This enables us to define a fully model-based control architecture, for which we analytically prove closed-loop stability and provide sufficient conditions for steady-state convergence to the desired state. This work is intended to be markedly experimental in nature. We provide an extensive experimental validation of the proposed ideas, tasking a robot arm with controlling the distal end of six different cables to a given planar position and orientation in space.
Nonrigid structure-from-motion (NRSfM), a promising technique for addressing the mapping challenges in monocular visual deformable simultaneous localization and mapping, has attracted growing attention. We introduce a novel method, called Con-NRSfM, for NRSfM under conformal deformations, encompassing isometric deformations as a subset. Our approach performs point-wise reconstruction using selected 2-D image warps optimized through a graph-based framework. Unlike existing methods that rely on strict assumptions, such as locally planar surfaces or locally linear deformations, and fail to recover the conformal scale, our method eliminates these constraints and accurately computes the local conformal scale. In addition, our framework decouples constraints on depth and conformal scale, which are inseparable in other approaches, enabling more precise depth estimation. To address the sensitivity of the formulated problem, we employ a parallel separable iterative optimization strategy. Furthermore, a self-supervised learning framework, utilizing an encoder–decoder network, is incorporated to generate dense 3-D point clouds with texture. Simulation and experimental results using both synthetic and real datasets demonstrate that our method surpasses existing approaches in terms of reconstruction accuracy and robustness.
Despite significant advancements in multirobot technologies, efficiently and collaboratively exploring an unknown environment remains a major challenge. In this article, we propose AIM-Mapping, an Asymmetric InforMation enhanced Mapping framework based on deep reinforcement learning. The framework fully leverages the privileged information to help construct the environmental representation as well as the supervised signal in an asymmetric actor–critic training framework. Specifically, privileged information is used to evaluate exploration performance through an asymmetric feature representation module and a mutual information evaluation module. The decision-making network employs the trained feature encoder to extract structural information of the environment and integrates it with a topological map constructed based on geometric distance. By leveraging this topological map representation, we apply topological graph matching to assign corresponding boundary points to each robot as long-term goal points. We conduct experiments in both iGibson simulation environments and real-world scenarios. The results demonstrate that the proposed method achieves significant performance improvements compared to existing approaches.
A differential dynamic programming (DDP)-based framework for inverse reinforcement learning (IRL) is introduced to recover the parameters in the cost function, system dynamics, and constraints from demonstrations. Different from existing work, where DDP was usually used for the inner forward problem, our proposed framework uses it to efficiently compute the gradient required in the outer inverse problem with equality and inequality constraints. The equivalence between the proposed and existing methods based on Pontryagin's maximum principle (PMP) is established. More importantly, using this DDP-based IRL with an open-loop loss function, a closed-loop IRL framework is presented. In this framework, a loss function is proposed to capture the closed-loop nature of demonstrations. It is shown to be better than the commonly used open-loop loss function. We show that the closed-loop IRL framework reduces to a constrained inverse optimal control problem under certain assumptions. Under these assumptions and a rank condition, it is proven that the learning parameters can be recovered from the demonstration data. The proposed framework is extensively evaluated through four numerical robot examples and one real-world quadrotor system. The experiments validate the theoretical results and illustrate the practical relevance of the approach.
Reducing undesirable path crossings among trajectories of different robots is vital in multirobot navigation missions, which not only reduces detours and conflict scenarios, but also enhances navigation efficiency and boosts productivity. Despite recent progress in multirobot path-crossing-minimal (MPCM) navigation, the majority of approaches depend on the minimal squared-distance reassignment of suitable desired points to robots directly. However, if obstacles occupy the passing space, calculating the actual robot-point distances becomes complex or intractable, which may render the MPCM navigation in obstacle environments inefficient or even infeasible. In this article, the concurrent-allocation task execution (CATE) algorithm is presented to address this problem (i.e., MPCM navigation in obstacle environments). First, the path-crossing-related elements in terms of, first, robot allocation, second, desired-point convergence, and third, collision and obstacle avoidance are encoded into integer and control barrier function (CBF) constraints. Then, the proposed constraints are used in an online constrained optimization framework, which implicitly yet effectively minimizes the possible path crossings and trajectory length in obstacle environments by minimizing the desired point allocation cost and slack variables in CBF constraints simultaneously. In this way, the MPCM navigation in obstacle environments can be achieved with flexible spatial orderings. Note that the feasibility of solutions and the asymptotic convergence property of the proposed CATE algorithm in obstacle environments are both guaranteed, and the calculation burden is also reduced by concurrently calculating the optimal allocation and the control input directly without the path planning process.
Finally, extensive simulations and experiments are conducted to validate that the CATE algorithm, first, outperforms the existing state-of-the-art baselines in terms of feasibility and efficiency in obstacle environments, second, is effective in environments with dynamic obstacles and is adaptable for performing various navigation tasks in 2-D and 3-D, third, demonstrates its efficacy and practicality by 2-D experiments with a multi-autonomous mobile robot (AMR) onboard navigation system, and, fourth, provides a possible solution to evade deadlocks and pass through a narrow gap.
In this article, a disturbance observer-based model predictive control (DOB-based MPC) strategy is proposed for the trajectory tracking of cable-driven parallel robots (CDPRs). The original nonlinear optimization problem of the MPC explicitly handles the positive bounded constraints of cable tensions and is transformed into a quadratic program (QP) based on the desired trajectory and offline workspace analysis. Additionally, the uncertainties and external disturbances in the system are considered and derived as the lumped disturbance. Then, a nonlinear DOB is used to estimate the lumped disturbance, and the estimate is accordingly used to enhance the prediction model of the MPC. The control input of the proposed MPC strategy is redesigned to incorporate an auxiliary controller. The estimation error of the DOB and the time-varying characteristics of the disturbance are leveraged to tighten the constraints of the QP less conservatively and generate feasible tubes. Such tube techniques guarantee the recursive feasibility and the input-to-state stability of the proposed MPC strategy. Furthermore, the whole algorithm for deployment, including an online constraint updating method, is developed. Both simulations and experiments are carried out thoroughly, showing that the DOB-based MPC can effectively improve trajectory tracking accuracy and ensure that the cable tensions satisfy the constraints in the case of unknown disturbances and model uncertainties.
This study presents a model predictive path integral (MPPI) method capable of conducting high-frequency real-time model predictive control (MPC) for robot manipulators. Real-time MPC-based manipulation holds significant potential for controlling an end-effector precisely and reactively while satisfying various constraints in dynamic environments. However, the optimization under a complex robot model and various constraints imposes a heavy computational burden, hindering the realization of high-frequency updates. To address this challenge, we propose a single-instance sampling-based MPPI algorithm and dynamic time horizon to significantly reduce the computational burden while enhancing control performance. The performance and efficacy of the proposed method are verified through experiments conducted on a 7-degree-of-freedom robotic arm, along with comparative simulations and analysis.
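The sampling-based update at the heart of MPPI can be stated in a few lines: perturb a nominal control sequence, roll out the dynamics, and average the perturbations with softmax weights on trajectory cost. The sketch below is vanilla MPPI on a hypothetical 1-D double integrator, not the article's single-instance variant or dynamic time horizon; all parameters (`K`, `lam`, `sigma`) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def mppi_step(x0, U, dynamics, cost, K=256, lam=1.0, sigma=0.5):
    """One MPPI update: perturb the nominal control sequence U (length H)
    with K Gaussian samples, roll out, and reweight by a softmax of cost."""
    H = len(U)
    eps = sigma * rng.standard_normal((K, H))
    S = np.zeros(K)
    for k in range(K):
        x = np.array(x0, dtype=float)
        for t in range(H):
            u = U[t] + eps[k, t]
            x = dynamics(x, u)
            S[k] += cost(x, u)
    S -= S.min()                      # numerical stability of the softmax
    w = np.exp(-S / lam)
    w /= w.sum()
    return U + w @ eps                # importance-weighted control update

# Toy setup: drive a 1-D double integrator (pos, vel) toward the origin.
dt = 0.05
dyn = lambda x, u: np.array([x[0] + dt*x[1], x[1] + dt*u])
cst = lambda x, u: 10*x[0]**2 + x[1]**2 + 0.01*u**2

x = np.array([1.0, 0.0])
U = np.zeros(20)
for _ in range(60):                   # receding-horizon loop
    U = mppi_step(x, U, dyn, cst)
    x = dyn(x, U[0])
    U = np.roll(U, -1); U[-1] = 0.0   # warm start: shift the horizon
```

In a real manipulator setting the rollouts would run the robot's forward dynamics with constraint penalties in the cost, which is where the computational burden the article targets comes from.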
This study presents an intelligent bionic amphibious turtle robot (IBATR) featuring a three-degree-of-freedom bionic flipper mechanism, designed to achieve high maneuverability, agility, and adaptive locomotion in dynamic aquatic–terrestrial environments. Specifically, mechanical testing and hydrodynamic analysis validate the robot’s operational capabilities in granular media and aquatic settings. Subsequently, Bayesian optimization generates energy-efficient gait parameters, enabling flexible motion under low-power constraints. To further bridge perception and action, a terrain classification framework is implemented by fusing visual data from an onboard camera and tactile feedback from pressure sensors, enhancing environmental adaptability. This framework utilizes a dual-stream convolutional neural network, achieving 99.17% classification accuracy across four terrestrial substrates and one aquatic condition. Experimental results demonstrate that terrain-aware gait adaptation improves energy efficiency by 19.1% and movement speed by 9.2% compared to static gait configurations. Field tests under wave disturbances further confirm the robot’s capability for seamless land–water transitions. Collectively, this work advances biomimetic robotics by unifying perception-driven control, terrain-optimized actuation, and lightweight structural design, offering novel methodologies for resilient operations in complex amphibious environments.
Evolutionary pressures have pushed humans to become efficient walkers, but inefficient divers. People consume more energy to travel the same distance underwater than on land. In overground locomotion, emerging exoskeletons have reduced the metabolic cost of human movement. Can we also improve the energy economy of underwater locomotion via exoskeletons? Here, we propose an underwater exoskeleton to assist scuba diving with the flutter kick, applying assistive knee extension torque during the strike phase of the diving kick cycle. When divers wore the powered exoskeleton, the average net air cost across six experienced divers was reduced by 22.7 $\pm$ 10.0%, and the peak quadriceps activation was decreased by 20.9 $\pm$ 7.5%, compared with normal diving without the exoskeleton. The average gastrocnemius activation also decreased by 20.6 $\pm$ 5.3%, suggesting that the divers sufficiently utilized the exoskeleton assistance. These results indicate that applying exoskeleton assistance is conducive to improving the endurance of human underwater diving and enhancing our ability to explore the underwater world. Our study extends the application boundary of wearable robots, and provides a reference for the design and assessment of future underwater assistive devices, with the potential to strengthen the connection between humans and the ocean.
Recent advances in underwater robotics highlight the potential of fish-like robots for efficient propulsion. However, their motion performance still lags behind real fish due to an incomplete understanding of fluid dynamics. This article hypothesizes that chordwise-oriented vortices dominate the flow evolution of low-aspect-ratio flapping foils. Based on this, a quasi 3-D hydrodynamic model, the sliding strip discrete vortex method (SSDVM), is developed. SSDVM tracks chordwise-oriented vortex evolution along the chord, enabling hydrodynamic force calculations for heave and pitch motions across various parameters. The model closely agrees with computational fluid dynamics simulations while reducing computational cost. When integrated with a dynamic model, SSDVM accurately predicts robotic fish motion, with simulated speeds closely matching experimental results and achieving a mean absolute percentage error of 6.54%. SSDVM offers an analytical tool for robotic fish hydrodynamics, balancing accuracy and efficiency, with potential applications in optimizing bioinspired underwater propulsion.
With the increasing integration of cyber-physical systems (CPS) into critical applications, ensuring their resilience against cyberattacks is paramount. A particularly concerning threat is the vulnerability of CPS to deceptive attacks that degrade system performance while remaining undetected. This article investigates perfectly undetectable false data injection attacks (FDIAs) targeting the trajectory tracking control of a nonholonomic mobile robot. The proposed attack method utilizes affine transformations of intercepted signals, exploiting weaknesses inherent in the partially linear dynamic properties and symmetry of the nonlinear plant. The feasibility and potential impact of these attacks are validated through experiments using a Turtlebot 3 platform, highlighting the urgent need for sophisticated detection mechanisms and resilient control strategies to safeguard CPS against such threats. Furthermore, a novel approach for detection of these attacks called the state monitoring signature function (SMSF) is introduced. An example SMSF, a carefully designed function resilient to FDIA, is shown to be able to detect the presence of an FDIA through signatures based on system states.
Elastomer-based soft manipulators with fibre-reinforced chambers represent a prevalent design paradigm in soft robotics. These robots incorporate multiple actuation chambers, enabling elongation and bending motions. However, the inherent compliance of the materials and the pressurized chambers inevitably introduce significant nonlinearity to these robots. Moreover, the design of such robots often relies on a trial-and-error approach. Consequently, a comprehensive robot prototyping framework is of paramount importance. To achieve this, we present a static modeling, design, and evaluation framework for soft robots with densely reinforced chambers (i.e., the angle between the reinforcement fibre and the axial direction of the soft robot is $\text{90}^\circ$). We first propose a static analytical modeling framework covering both forward kinematics and tip force generation. This modeling framework accommodates the effects of pressurized chambers and (non)linear material behaviors. Furthermore, our design and evaluation framework incorporates an openly accessible simulation toolbox with a user-friendly graphical interface, along with a physical evaluation platform. The entire framework is validated by eight kinds of manipulators with varying diameters and lengths. Meanwhile, the nonlinearities introduced by geometrical deformation resulting from elongation, the pressurized actuation chambers (i.e., the chamber stiffening effect), and material hyperelasticity are investigated. The results also enable informed decision-making on design specifications prior to robot fabrication.
This article considers the problem of designing motion planning algorithms for control-affine systems that generate collision-free paths from an initial to a final destination and can be executed using safe and dynamically feasible controllers. We introduce the compatible control Lyapunov function control barrier function rapidly exploring random tree (C-CLF-CBF-RRT) algorithm, which produces paths with such properties and leverages rapidly exploring random trees (RRTs), control Lyapunov functions (CLFs), and control barrier functions (CBFs). For linear systems with polytopic and ellipsoidal constraints, C-CLF-CBF-RRT requires solving a quadratically constrained quadratic program at every iteration of the algorithm, which can be done efficiently. We prove the probabilistic completeness of C-CLF-CBF-RRT and showcase its performance in simulation and hardware experiments.
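For background, the CBF ingredient reduces, for a single constraint and single-integrator dynamics, to a QP with a closed-form solution: project the nominal input onto the half-space $\nabla h(x)^{\top} u + \alpha h(x) \geq 0$. The sketch below shows only this generic building block on a hypothetical planar obstacle, not the article's C-CLF-CBF-RRT algorithm or its quadratically constrained program:

```python
import numpy as np

def cbf_filter(u_nom, x, h, grad_h, alpha=1.0):
    """Closed-form solution of the single-constraint CBF QP
        min ||u - u_nom||^2  s.t.  grad_h(x) @ u + alpha*h(x) >= 0
    for single-integrator dynamics x_dot = u."""
    g = grad_h(x)
    slack = g @ u_nom + alpha * h(x)
    if slack >= 0:
        return u_nom                    # nominal input is already safe
    return u_nom - slack * g / (g @ g)  # minimal correction onto the half-space

# Keep a planar robot outside a unit disk centered at the origin.
h = lambda x: x @ x - 1.0               # safe set: h(x) >= 0
grad_h = lambda x: 2.0 * x

x = np.array([1.5, 0.0])
dt = 0.01
for _ in range(400):
    u_nom = -x                          # nominal controller: go to the origin
    x = x + dt * cbf_filter(u_nom, x, h, grad_h)
```

Adding a CLF decrease condition and the polytopic or ellipsoidal constraint geometry described in the abstract is what turns this per-step problem into the quadratically constrained quadratic program solved at each iteration.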
Robotic systems operating in unstructured environments require the ability to switch between compliant and rigid states to perform diverse tasks, such as adaptive grasping, high-force manipulation, shape holding, and navigation in constrained spaces, among others. However, many existing variable stiffness solutions rely on complex actuation schemes, continuous input power, or monolithic designs, limiting their modularity and scalability. This article presents the programmable locking cell (PLC)—a modular, tendon-driven unit that achieves discrete stiffness modulation through mechanically interlocked joints actuated by cable tension. Each unit transitions between compliant and firm states via structural engagement, and the assembled system exhibits high stiffness variation—up to 950% per unit—without susceptibility to damage under high payload in the firm state. Multiple PLC units can be assembled into reconfigurable robotic structures with spatially programmable stiffness. We validate the design through two functional prototypes: first, a variable-stiffness gripper capable of adaptive grasping, firm holding, and in-hand manipulation, and second, a pipe-traversing robot composed of serial PLC units that achieves shape adaptability and stiffness control in confined environments. These results demonstrate the PLC as a scalable, structure-centric mechanism for programmable stiffness and motion, enabling robotic systems with reconfigurable morphology and task-adaptive interaction.
We present HI-SLAM2, a geometry-aware Gaussian SLAM system that achieves fast and accurate monocular scene reconstruction using only RGB input. While existing neural SLAM and 3DGS-based SLAM methods often trade off rendering quality against geometric accuracy, our research demonstrates that both can be achieved simultaneously with RGB input alone. The key idea of our approach is to enhance geometry estimation by combining easy-to-obtain monocular priors with learning-based dense SLAM, and then to use 3-D Gaussian splatting as our core map representation to efficiently model the scene. Upon loop closure, our method ensures on-the-fly global consistency through efficient pose graph bundle adjustment and instant map updates, explicitly deforming the 3-D Gaussian units based on anchored keyframe updates. Furthermore, we introduce a grid-based scale alignment strategy to maintain improved scale consistency in prior depths for finer depth details. Through extensive experiments on the Replica, ScanNet, Waymo Open, ETH3D SLAM, and ScanNet++ datasets, we demonstrate significant improvements over existing neural SLAM methods, even surpassing RGB-D-based methods in both reconstruction and rendering quality.
In this article, we present a framework for multirobot task allocation (MRTA) in heterogeneous teams performing long-endurance missions in dynamic scenarios. Given the limited battery of robots, especially for aerial vehicles, we allow for robot recharges and the possibility of fragmenting and/or relaying certain tasks. We also address tasks that must be performed by a coalition of robots in a coordinated manner. Given these features, we introduce a new class of heterogeneous MRTA problems, which we analyze theoretically and optimally formulate as a mixed-integer linear program (MILP). We then contribute a heuristic algorithm to compute approximate solutions and integrate it into a mission planning and execution architecture capable of reacting to unexpected events by repairing or recomputing plans online. Our experimental results show the relevance of our newly formulated problem in a realistic use case for inspection with aerial robots. We assess the performance of our heuristic solver in comparison with other variants and with exact optimal solutions in small-scale scenarios. In addition, we evaluate the ability of our replanning framework to repair plans online.
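As a point of reference for the exact formulation, a bare-bones MRTA instance can be posed as a MILP and handed to an off-the-shelf solver. The toy below (hypothetical costs; no recharges, fragmentation, relays, or coalitions, which are the article's actual contributions) assigns three tasks to two robots with `scipy.optimize.milp`:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical cost of robot r executing task t (rows: 2 robots, cols: 3 tasks).
C = np.array([[4.0, 1.0, 3.0],
              [2.0, 5.0, 1.0]])
R, T = C.shape
c = C.ravel()                        # x[r, t] in {0, 1}, flattened row-major

# Each task must be assigned to exactly one robot.
A_task = np.zeros((T, R * T))
for t in range(T):
    A_task[t, t::T] = 1.0            # columns for task t across all robots
con = LinearConstraint(A_task, lb=1.0, ub=1.0)

res = milp(c=c, constraints=[con],
           integrality=np.ones(R * T),   # all variables integer (binary via bounds)
           bounds=Bounds(0.0, 1.0))
assignment = res.x.reshape(R, T).round().astype(int)
```

Here each task goes to its cheapest robot for a total cost of 4; the article's MILP layers battery, fragmentation, relaying, and coalition constraints on top of this kind of core assignment structure.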
As the demand for efficient parcel delivery continues to grow in the logistics industry, optimizing multirobot task assignment has become crucial for enhancing overall delivery performance. This article addresses the precedence-constrained multitruck multidrone package delivery task assignment problem, where each truck coordinates with a drone to serve multiple dispersed customers under precedence constraints that specify the required order of service. While trucks deliver packages to designated customers, drones can simultaneously serve other customers, subject to their limited flight endurance and payload capacity. To tackle this challenge, a three-phase heuristic algorithm is proposed to minimize the total delivery time required to serve the last customer while ensuring all precedence constraints are satisfied. In the first phase, an extended minimum marginal cost algorithm is applied to quickly construct truck-only routes that comply with precedence constraints. In the second phase, a splitting algorithm combined with an endurance checking procedure is employed to generate hybrid truck–drone routes considering drone limitations. In the final phase, a variable neighborhood descent approach is introduced to further improve the solution by strategically perturbing the truck-only routes. Extensive simulations and experiments demonstrate that the proposed three-phase heuristic algorithm consistently achieves higher-quality solutions with reduced computation time compared with the widely used adaptive large neighborhood search method.
This article develops a real-time decentralized metric-semantic simultaneous localization and mapping (SLAM) algorithm that enables a heterogeneous robot team to collaboratively construct object-based metric-semantic maps. The proposed framework integrates a data-driven front-end for instance segmentation from either RGBD cameras or light detection and ranging (LiDAR) and a custom back-end for optimizing robot trajectories and object landmarks in the map. To allow multiple robots to merge their information, we design semantics-driven place recognition algorithms that leverage the informativeness and viewpoint invariance of the object-level metric-semantic map for inter-robot loop closure detection. A communication module is designed to track each robot's observations and those of other robots whenever communication links are available. The framework supports real-time, decentralized operation onboard the robots and has been integrated with three types of aerial and ground platforms. We validate its effectiveness through experiments in both indoor and outdoor environments, as well as benchmarks on public datasets and comparisons with existing methods. The framework is open-sourced and suitable for both single-agent and multirobot real-time metric-semantic SLAM applications.
Robotic systems demand accurate and comprehensive 3-D environment perception, requiring simultaneous capture of photorealistic appearance (optical), precise layout shape (geometric), and open-vocabulary scene understanding (semantic). Existing methods typically achieve only partial fulfillment of these requirements while exhibiting optical blurring, geometric irregularities, and semantic ambiguities. To address these challenges, we propose OmniMap. Overall, OmniMap represents the first online mapping framework that simultaneously captures optical, geometric, and semantic scene attributes while maintaining real-time performance and model compactness. At the architectural level, OmniMap employs a tightly coupled 3DGS–Voxel hybrid representation that combines fine-grained modeling with structural stability. At the implementation level, OmniMap identifies key challenges across different modalities and introduces several innovations: adaptive camera modeling for motion blur and exposure compensation, hybrid incremental representation with normal constraints, and probabilistic fusion for robust instance-level understanding. Extensive experiments show OmniMap’s superior performance in rendering fidelity, geometric accuracy, and zero-shot semantic segmentation compared to state-of-the-art methods across diverse scenes. The framework’s versatility is further evidenced through a variety of downstream applications, including multidomain scene Q&A, interactive editing, perception-guided manipulation, and map-assisted navigation.
In this article, we develop and open-source, for the first time, a robust and efficient square-root filter (SRF)-based visual–inertial navigation system (VINS), termed $\sqrt{\text{VINS}}$, which is ultra-fast, numerically stable, and capable of dynamic initialization even under extreme conditions (i.e., an extremely small time window). Despite recent advancements in VINS, resource constraints and numerical instability on embedded (robotic) systems with limited precision remain critical challenges. A square-root covariance-based filter offers a promising solution by providing numerical stability, efficient memory usage, and guaranteed positive semidefiniteness. However, canonical SRFs suffer from inefficiencies caused by disruptions in the triangular structure of the covariance matrix during updates. The proposed method significantly improves VINS efficiency with a novel Cholesky decomposition (LLT)-based SRF update, by fully exploiting the system structure and the SRF to preserve the upper triangular structure of the square-root covariance. Moreover, we design a fast, robust, and dynamic initialization method, which first quickly recovers the minimal states without triangulating 3-D features and then efficiently performs an iterative SRF update to refine the full states, enabling seamless VINS operation even in challenging scenarios. The proposed LLT-based SRF is extensively verified through numerical studies, demonstrating superior numerical stability under challenging conditions and achieving robust, efficient performance on 32-bit single-precision floats, operating at twice the speed of state-of-the-art methods. Our initialization method, tested on both mobile workstations and Jetson Nano computers, achieves a high initialization success rate even within a 100-ms window under minimal conditions. Finally, the proposed $\sqrt{\text{VINS}}$ is extensively validated across diverse scenarios, demonstrating strong efficiency, robustness, and reliability.
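The structural issue the LLT-based update addresses can be seen in the classical alternative: the QR-based square-root measurement update, which re-triangularizes a stacked pre-array at every step. The sketch below is this standard array algorithm (not the article's LLT-based update), keeping $P = U^{\top}U$ in upper-triangular square-root form so that symmetry and positive semidefiniteness hold by construction:

```python
import numpy as np

def srf_update(x, U, H, R_sqrt, z):
    """Square-root Kalman measurement update via one QR factorization.
    U is the upper-triangular square root of the covariance (P = U.T @ U);
    H is the (m, n) measurement Jacobian, R_sqrt a square root of the
    measurement noise covariance, z the measurement."""
    n, m = U.shape[1], len(z)
    # Stack the pre-update factors and re-triangularize with a single QR.
    pre = np.block([[R_sqrt,  np.zeros((m, n))],
                    [U @ H.T, U]])
    _, post = np.linalg.qr(pre)
    S_sqrt = post[:m, :m]          # square root of the innovation covariance
    K_bar  = post[:m, m:]          # transposed, square-root-scaled Kalman gain
    U_new  = post[m:, m:]          # updated upper-triangular sqrt covariance
    innov = z - H @ x
    x_new = x + K_bar.T @ np.linalg.solve(S_sqrt.T, innov)
    return x_new, U_new
```

This QR step is the per-update price canonical SRFs pay; the article's contribution is exploiting the VINS system structure so the triangular form is preserved without that disruption.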
A PneuNet, consisting of a series of interconnected chambers embedded within a soft elastomer, can exhibit diverse deformations. 3-D printing allows for precise control over both material combinations and geometrical configurations, enabling the fabrication of PneuNets with complicated structures and multifunctionality. However, the increased freedom in materials and structures introduced by 3-D printing also presents significant challenges for modeling and design, including material nonlinearities, complex cross-sections, and varying initial curvatures. In this work, we develop 3-D-printed PneuNets with varying initial curvatures and cross-sections, demonstrating finite deformation with multiple complete turns. To model the helical shape, we establish a general nonlinear framework based on the minimum potential energy method. The model is validated by PneuNets with various material combinations and geometrical configurations across a range of constitutive models, including the Mooney–Rivlin, Ogden, neo-Hookean, and Yeoh models. Results show that the nonlinear model, especially with the Mooney–Rivlin constitutive law, accurately captures the deformation without any fitting parameters, achieving an $R^{2}$ value of 0.975, compared to 0.017 for the linear model. Based on the validated model, PneuNets are inverse-designed to achieve desired spatial deformations. Their dynamic responses and payload capacities are also evaluated. We design a 3-D-printed octopus with tentacles composed of PneuNets, capable of mimicking the grasping and movement of a real octopus. In addition, we demonstrate multifunctional capabilities such as fluid transition and sensing. This study lays a solid foundation for the design and application of 3-D-printed PneuNets.
Tactile and proximity sensing is essential for robotic tasks involving human–robot interaction and manipulation. However, existing dual-mode sensors often face challenges such as environmental interference, large sizes, and task-specific limitations. This study proposes a dual-mode photoelectric sensor that integrates tactile and proximity sensing. The tactile sensing mechanism is based on a variable optical path structure, while the proximity sensing relies on surface light reflection. The sensor exhibits a high sensitivity (up to 1.12 V/N), compactness (4 mm thickness), and desirable stability with a drift of less than 1% over 8000 repetitive cycles under pressures ranging from 0 to 63 kPa. A general tactile-proximity servoing framework is also proposed for the dual-mode sensor array, which enables tactile servoing, proximity servoing, and hybrid tactile-proximity servoing. Under this framework, parameters can be flexibly adjusted to adapt to different servoing tasks, including position and orientation control of the robotic arm's end-effector. In more complex robotic tasks, a real-time fruit ripeness classification method is developed based on the proposed sensor. Using the proposed TPNet, the classification method can achieve an accuracy of 94.4% in a four-level tomato ripeness classification task during grasping.
Existing polar robots are constrained by limited energy supply, making it difficult to carry out long-term scientific exploration missions, which highlights an urgent demand for energy conservation. An energy-efficient polar robot with multimode motion is proposed to address this challenge. Both increasing external assistance and reducing the driving force are critical for lowering energy consumption. A foldable sail is designed to provide external assistance. When unfolded, the sail generates assistive force. When folded, it maintains stability in extreme polar climates. The sail shape is designed based on a symmetrically extended NACA0018 airfoil, and the influence of different sail parameters on performance is discussed. The transformable tracks realize switching between traction and sliding modes through the separation of the track and teeth chain, using the sliding mode to reduce driving force. The effect of teeth parameter variations on traction performance is analyzed. The system kinematics and dynamics are modeled, and stability conditions are determined. Based on this, an energy-saving motion control framework for multimode motion is proposed. Finally, experiments are conducted to evaluate the energy-saving contribution of each independent mode under different configurations. Comprehensive experiments in multimode motion demonstrate an overall energy-saving rate of approximately 24%, verifying the effectiveness of the energy-saving motion control strategy. With its energy-saving advantages, this robot shows strong potential for enabling long-term scientific exploration in polar regions.
This article presents a task-space admittance controller applicable to redundant manipulators equipped with torque sensors. It extends Kikuuwe’s (2019) torque-bounded admittance controller, which allows for imposing explicit limits on the joint actuator torques without causing unsafe behaviors, such as oscillation and overshoots. The proposed controller enforces that the end-effector follows predefined task-space dynamics as long as the joint torques are unsaturated and the configuration is away from singularities. The behavior in the nullspace, which arises from the redundant degrees of freedom and singular configurations, is governed by predefined joint-space dynamics. The task-space and joint-space dynamics are combined through a newly proposed continualized pseudoinverse, which employs the singular value decomposition. Results of experiments using a seven-degree-of-freedom Kinova Gen3 robot illustrate the validity of the proposed admittance controller in various scenarios, including the case where the robot is fully stretched.
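The paper's continualized pseudoinverse is SVD-based; as a simpler, classic illustration of the singularity-robust inversion problem it addresses, the sketch below applies the damped least-squares pseudoinverse, q̇ = Jᵀ(JJᵀ + λ²I)⁻¹ẋ, for a 2-D task on a redundant 3-joint arm. This is a well-known alternative technique, not the controller from the article; the Jacobian and damping value are made up.

```python
# Damped least-squares (not the paper's continualized SVD pseudoinverse):
# qdot = J^T (J J^T + lambda^2 I)^{-1} xdot, hard-coded for a 2-D task space
# so the 2x2 inverse can be written out in pure Python.

def damped_pinv_apply(J, xdot, damping=0.1):
    """Map a 2-D task velocity to joint velocities, robust near singularities."""
    m, n = len(J), len(J[0])
    assert m == 2, "this sketch hard-codes the 2x2 inverse"
    # A = J J^T + lambda^2 I  (2x2, symmetric positive definite)
    A = [[sum(J[i][k] * J[j][k] for k in range(n)) + (damping**2 if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    Ainv = [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]
    y = [sum(Ainv[i][j] * xdot[j] for j in range(m)) for i in range(m)]
    return [sum(J[i][k] * y[i] for i in range(m)) for k in range(n)]

J = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5]]  # hypothetical 2x3 Jacobian of a redundant arm
qdot = damped_pinv_apply(J, [0.1, 0.0])
print([round(v, 4) for v in qdot])
```

The damping term trades a small tracking error for bounded joint velocities near singular configurations; the article's continualized pseudoinverse pursues the same goal while remaining continuous across singularities.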
Expressive motion planning for aerial manipulators (AMs) is essential for tackling complex manipulation tasks, yet achieving coupled trajectory planning adaptive to various tasks remains challenging, especially for those requiring aggressive maneuvers. In this work, we propose a novel whole-body integrated motion planning framework for quadrotor-based AMs that leverages flexible waypoint constraints to achieve versatile manipulation capabilities. These waypoint constraints enable the specification of individual position requirements for either the quadrotor or end-effector, while also accommodating higher order velocity and orientation constraints for complex manipulation tasks. To implement our framework, we exploit spatio-temporal trajectory characteristics and formulate an optimization problem to generate feasible trajectories for both the quadrotor and manipulator while ensuring collision avoidance considering varying robot configurations, dynamic feasibility, and kinematic feasibility. Furthermore, to enhance the maneuverability for specific tasks, we employ imitation learning to facilitate the optimization process to avoid poor local optima. The effectiveness of our framework is validated through comprehensive simulations and real-world experiments, where we successfully demonstrate nine fundamental manipulation skills across various environments.
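One elementary building block behind waypoint constraints with higher order (velocity) requirements is a polynomial segment that interpolates position and velocity at both ends. The closed-form cubic below is a standard construction offered only as a hedged illustration; the framework's actual spatio-temporal optimization is far richer, and the boundary values here are invented.

```python
# Standard cubic boundary-value segment (illustrative; not the paper's
# optimizer). Meets position AND velocity constraints at both endpoints.

def cubic_segment(p0, v0, p1, v1, T):
    """Coefficients a0..a3 of p(t) = a0 + a1*t + a2*t^2 + a3*t^3 on [0, T]."""
    a0, a1 = p0, v0
    a2 = (3.0 * (p1 - p0) - (2.0 * v0 + v1) * T) / T**2
    a3 = (2.0 * (p0 - p1) + (v0 + v1) * T) / T**3
    return a0, a1, a2, a3

def evaluate(coeffs, t):
    a0, a1, a2, a3 = coeffs
    return a0 + a1 * t + a2 * t**2 + a3 * t**3

# e.g. quadrotor x-axis: start at 0 m at rest, reach 1 m moving 0.5 m/s in 2 s
c = cubic_segment(0.0, 0.0, 1.0, 0.5, 2.0)
print(round(evaluate(c, 2.0), 6))  # hits the waypoint position: 1.0
```

Chaining such segments while jointly optimizing the segment durations is one common way spatio-temporal trajectory characteristics are exploited in practice.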
Integrating generative models with action chunking has shown significant promise in imitation learning for robotic manipulation. However, the existing diffusion-based paradigm often struggles to capture strong temporal dependencies across multiple steps, particularly when incorporating proprioceptive input. This limitation can lead to task failures, where the policy overfits to proprioceptive cues at the expense of capturing the visually derived features of the task. To overcome this challenge, we propose the deep Koopman-boosted dual-branch diffusion policy (D3P) algorithm. D3P introduces a dual-branch architecture to decouple the roles of different sensory modality combinations. The visual branch encodes the visual observations to indicate task progression, while the fused branch integrates both visual and proprioceptive inputs for precise manipulation. Within this architecture, when the robot fails to accomplish intermediate goals, such as grasping a drawer handle, the policy can dynamically switch to execute action chunks generated by the visual branch, allowing recovery to previously observed states and facilitating retrial of the task. To further enhance visual representation learning, we incorporate a deep Koopman operator module that captures structured temporal dynamics from visual inputs. During inference, we use the test-time loss of the generative model as a confidence signal to guide the aggregation of the temporally overlapping predicted action chunks, thereby enhancing the reliability of policy execution. In simulation experiments across six RLBench tabletop tasks, D3P outperforms the state-of-the-art diffusion policy by an average of 14.6%. On three real-world robotic manipulation tasks, it achieves a 15.0% improvement.
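The confidence-guided aggregation of temporally overlapping action chunks can be sketched as a weighted average per timestep. The exp(-loss) weighting, 1-D actions, and all names below are assumptions for illustration; the paper derives its confidence signal from the generative model's test-time loss but does not prescribe this exact rule.

```python
# Hedged sketch: confidence-weighted aggregation of overlapping action chunks.
# Weighting by exp(-loss) is an illustrative assumption, not D3P's exact rule.
import math

def aggregate_actions(chunks, horizon):
    """chunks: list of (start_step, actions, loss). Returns per-step actions."""
    num = [0.0] * horizon
    den = [0.0] * horizon
    for start, actions, loss in chunks:
        w = math.exp(-loss)  # lower test-time loss -> higher confidence
        for i, a in enumerate(actions):
            t = start + i
            if t < horizon:
                num[t] += w * a
                den[t] += w
    return [n / d if d > 0 else 0.0 for n, d in zip(num, den)]

# two overlapping 3-step chunks; the second has lower loss (more confident)
chunks = [(0, [1.0, 1.0, 1.0], 1.0),
          (1, [0.0, 0.0, 0.0], 0.0)]
agg = aggregate_actions(chunks, horizon=4)
print([round(a, 4) for a in agg])  # overlapping steps lean toward chunk 2
```

Steps covered by a single chunk reproduce that chunk's action; overlapping steps are pulled toward the more confident prediction.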
The global trend in robotics has shifted toward deploying humanoid robots and mobile manipulators in industrial settings to automate repetitive and structured tasks traditionally performed by human workers. However, most tools and equipment are designed for human hands, and current grippers or end-effectors are highly specialized, limiting their ability to fully replace human handling of simple tools and tasks. This study proposes a novel frictional and prismatic pin-array gripper developed for universal gripping and tool manipulation. The pin-array structure of the gripper mimics the behavior of soft grippers while incorporating rigid components, enabling adaptability to various shapes and sizes. Each pin features semiautomatic actuation through a compression spring, supporting the underactuated mechanism. Most existing studies on grippers focus on simple pick-and-place tasks, whereas the proposed gripper extends functionality to practical tool usage. Enabled by the pin-array structure, it provides increased contact surface and support points, ensuring stable gripping and enhanced manipulation performance. In the evaluation, the pin-array gripper achieved a payload capacity of 2400 g, significantly outperforming the conventional RG2-FT gripper and the frictional flat gripper, which reached maximum capacities of 800 and 400 g, respectively. It also exhibited higher grasping forces, measuring 1.17 times greater than the RG2-FT gripper and up to 23 times greater than the frictional flat gripper. For tool manipulation, the pin-array gripper exhibited significantly lower manipulation errors, with 21.67 and 6.59 times fewer errors than the RG2-FT and flat grippers, respectively, when handling the hammer, and 7.69 and 4.45 times fewer for the metal file. In addition, qualitative demonstrations in universal gripping, omnidirectional gripping, and tool usage further validated the gripper's performance in mobile manipulator tasks.