Corrections are made in [1] in the section listing the authors' affiliations. Vidas Raudonis' affiliation was corrected from “$^{4}$SustAInLivWork Center of Excellence, Kaunas, Lithuania” to “$^{4}$SustAInLivWork Center of Excellence, Kaunas, Lithuania and $^{5}$Faculty of Electrical and Electronics Eng., Kaunas University of Technology, Kaunas, Lithuania”.
In this letter, we investigate whether classical function allocation—the principle of assigning tasks to either a human or a machine—holds for physical Human-Robot Collaboration, which is important for providing insights for Industry 5.0 to guide how to best augment rather than replace workers. This study empirically tests the applicability of Fitts' List within physical Human-Robot Collaboration by conducting a user study (N=26, within-subject design) to evaluate four distinct allocations of position/force control between human and robot in an abstract blending task. We hypothesize that the allocation in which the human controls position achieves better performance and receives higher user ratings. When allocating position control to the human and force control to the robot, compared to the opposite case, we observed a significant improvement in preventing overblending. This allocation was also rated better in terms of physical demand and overall system acceptance, while participants experienced greater autonomy, more engagement, and less frustration. An interesting insight was that the supervisory role (when the robot controls both position and force) was rated second best in terms of subjective acceptance. Another surprising insight was that when position control was delegated to the robot, participants perceived much lower autonomy than when force control was delegated to the robot. These findings empirically support applying Fitts' principles to static function allocation for physical collaboration, while also revealing important nuanced user experience trade-offs, particularly regarding perceived autonomy when delegating position control.
Drawing inspiration from piscine locomotion strategies, biomimetic underwater robotic systems demonstrate enhanced operational efficiency and stealth capabilities when executing target tracking missions within dynamic aquatic environments. However, challenges related to their operational speed and control precision hinder the timely and accurate completion of tasks. In this letter, a biomimetic wave-spiral robot with dual-mode propulsion capabilities is developed, and an autonomous switching tracking control algorithm based on deep reinforcement learning (DRL) is proposed. First, the kinematic models of the fins and fin rays in both spiral and wave modes are derived, and a multi-segmented splicing design is employed to produce multi-modal fins, resulting in an integrated propulsion structure compatible with both high-speed and high-maneuverability modes. Second, a tracking control system consisting of two independent DRL policy networks and an adaptive mode switcher is proposed to realize autonomous and stable tracking control. Finally, simulation and swimming pool experiments demonstrate that the wave-spiral robot equipped with the dual-mode adaptive switching algorithm achieved a 95% task completion rate and an average reward of 0.85 in the target tracking control experiment. Moreover, it demonstrates a 29.5% improvement in operation time when dealing with state disturbances and dynamic targets compared with the traditional control method. This study provides a promising and robust solution for developing tracking control for multi-mode biomimetic robots.
We present ClothMate, a general framework for flattening garments of various categories from arbitrary configurations. Prior end-to-end methods are often limited in data efficiency. To address this, ClothMate builds on an intriguing observation: in garment flattening, grasping the same point pair typically results in a consistent fling response, regardless of the current state of the garment. Building on this insight, we adopt a two-stage paradigm: first learning point-wise fling responses under a canonicalized and aligned configuration, then generalizing to arbitrary states via point-to-point projection. This explicit decoupling of state estimation and action prediction mitigates challenges arising from infinite-DOF dynamics and severe self-occlusion, enabling the model to better capture common interaction patterns, thus improving data efficiency to support a broader range of garment categories. Additionally, we propose a novel static pick-and-stretch dual-arm flattening strategy that refines the outcome by heuristically stretching around adaptively selected key points. Compared to the state of the art, ClothMate is trained and evaluated simultaneously on five garment categories, using only 15% of the total data that baselines would require for separate training on all five categories, while achieving higher coverage (91.5% vs. 85.0%) and fewer steps (4.7 vs. 7.0).
Accurate 3D semantic occupancy perception is essential for autonomous driving in complex environments with diverse and irregular objects. While vision-centric methods suffer from geometric inaccuracies, LiDAR-based approaches often lack rich semantic information. To address these limitations, MS-Occ, a novel multi-stage LiDAR-camera fusion framework that includes both middle-stage and late-stage fusion, is proposed, integrating LiDAR's geometric fidelity with camera-based semantic richness via hierarchical cross-modal fusion. The framework introduces innovations at two critical stages: (1) In the middle-stage feature fusion, the Gaussian-Geo module leverages Gaussian kernel rendering on sparse LiDAR depth maps to enhance 2D image features with dense geometric priors, and the Semantic-Aware module enriches LiDAR voxels with semantic context via deformable cross-attention; (2) In the late-stage voxel fusion, the Adaptive Fusion (AF) module dynamically balances voxel features across modalities, while the High Classification Confidence Voxel Fusion (HCCVF) module resolves semantic inconsistencies using self-attention-based refinement. Experiments on two large-scale benchmarks demonstrate state-of-the-art performance. On nuScenes-OpenOccupancy, MS-Occ achieves an Intersection over Union (IoU) of 32.1% and a mean IoU (mIoU) of 25.3%, surpassing the state-of-the-art by $+0.7\%$ IoU and $+2.4\%$ mIoU. Furthermore, on the SemanticKITTI benchmark, our method achieves a new state-of-the-art mIoU of 24.08%, robustly validating its generalization capabilities. Ablation studies further confirm the effectiveness of each individual module, highlighting substantial improvements in the perception of small objects and reinforcing the practical value of MS-Occ for safety-critical autonomous driving scenarios.
Oil spills continuously affect marine ecosystems and require rapid monitoring for effective emergency response. This letter tackles the problem of persistent monitoring for continuously changing and scattered oil spill regions through Entropy-Based Incremental Coverage Path Planning (EICPP). By using contour comparison between monitoring cycles, an incremental coverage mechanism is first introduced to focus on newly emerged oil spill regions. Then, a balanced region division algorithm is incorporated to handle scattered oil spill areas while ensuring equal workload distribution among UAVs. The entropy-based path planning improves Drift Information Freshness (DIF) by prioritizing high-entropy regions under limited UAV resources. We evaluate the robustness and effectiveness of our method across multiple scenarios. Our method demonstrates clear advantages in DIF, achieving 19–25% improvements over strong baselines across different spill scales and about 19.6–24% on real-world oil spill datasets. It also substantially reduces total flight distance while consistently satisfying the 90% coverage requirement.
Serving as a fundamental task in robotic navigation and autonomous driving, occupancy prediction is gaining increasing attention for its fine-grained perception of the 3D environment. Most existing methods rely on dense 3D annotations, which are expensive, labor-intensive, and difficult to scale in real-world applications. Recent studies explore the use of cheaper and easy-to-obtain sparse 2D labels as a more scalable alternative. Though achieving some progress, these methods often underperform compared to fully supervised counterparts due to the lack of supervision in unlabeled regions. To bridge the gap, we propose a self-training framework that generates supervision in unannotated areas. A key component of self-training is the use of a teacher-student framework, where the teacher generates pseudo labels to guide student learning. However, a naive teacher tends to produce predictions that are sparse, noisy, and closely resemble the student's output, making it ineffective for guiding student learning. To ensure effective knowledge transfer, we propose three key strategies: (1) strengthening the supervision signal by integrating prior knowledge to guide the teacher network, (2) improving pseudo label quality by filtering out uncertain predictions, and (3) densifying supervision by aggregating predictions across frames. Experiments on Occ3D-nuScenes and SemanticKITTI demonstrate that our method achieves state-of-the-art performance under the weakly supervised setting. Particularly, it achieves 29.05 mIoU and 33.6 RayIoU on Occ3D-nuScenes, which is comparable to some fully supervised ones.
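The second strategy above, filtering out uncertain teacher predictions, follows a standard confidence-thresholding pattern. The PyTorch sketch below illustrates that generic pattern only; the function name, the 0.9 threshold, and the ignore-index convention are illustrative assumptions, not the paper's implementation.

```python
import torch

def filter_pseudo_labels(logits, conf_thresh=0.9, ignore_index=255):
    """Keep only confident teacher predictions as pseudo labels.

    A minimal sketch of confidence-based filtering: voxels whose top
    softmax probability falls below `conf_thresh` are marked with
    `ignore_index` so they contribute no gradient to the student loss.
    """
    probs = torch.softmax(logits, dim=1)       # (N, C, ...) class probabilities
    conf, pseudo = probs.max(dim=1)            # per-voxel confidence and argmax label
    pseudo[conf < conf_thresh] = ignore_index  # drop uncertain predictions
    return pseudo

# Usage: the student is trained with a cross-entropy that skips ignored voxels, e.g.
# loss = torch.nn.functional.cross_entropy(student_logits, pseudo, ignore_index=255)
```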
Force sensing on the end effector is crucial for mobile, legged robots to adapt to varying terrain and manipulate objects in complex environments. Since efficient legs have to be light and load bearing, force sensors need to have increased load density while still providing accurate multi-axis forces. Here, we demonstrate a low-cost solution to this problem that integrates four sets of strain gauges, analog-to-digital conversion, and data processing into a single leg of a crab-scale robot. The sensor has a tested range of $\pm$50 N for contact force (more than double the weight of a robot made with six such legs) and $\pm$2.5 Nm for torque. Then, we demonstrate that our sensor is accurate compared to standard, bulkier force gauges and precise enough to be used to differentiate terrains and even find objects buried in sand. Importantly, the sensor has the greatest force load density of any available sensor and more than twice the torque load density of the next best option. This approach to integrating sensors into the most distal appendages can be applied to many other inexpensive end effectors including robot hands, toes, feet, and other tools.
This letter presents a modular micro-UAV system based on electro-permanent magnet (EPM) technology, addressing the critical challenges of energy-constrained docking mechanisms in resource-limited micro aerial platforms. Built upon the Crazyflie 2.1 platform, our 75 g modular design (with battery) features an EPM docking mechanism that combines high holding force ($\approx$1.5 N) with zero static power consumption, making it particularly suitable for micro-UAVs with severe energy and payload constraints. Unlike existing approaches that rely on either uncontrollable permanent magnets or power-intensive electromagnets, our EPM mechanism achieves controllable docking/undocking through short current pulses while maintaining connections with no continuous power draw. Each EPM unit weighs only 1.5 g while providing a 100:1 force-to-weight ratio. To accommodate the binary switching characteristics of the EPM mechanism, we implement a tailored two-phase control strategy that ensures smooth docking transitions. Experimental validation demonstrates successful docking during dynamic flight. This work provides a practical hardware solution for self-reconfigurable modular UAV arrays, with significant potential for collaborative aerial missions requiring frequent reconfiguration.
Semantic Scene Completion (SSC) is a task that simultaneously predicts the occupancy and semantic labels of the environment. Compared with separate processing, SSC leverages the coupled nature of scene completion and semantic segmentation. Although this multitask integration can utilize complementarity and correlation between tasks, it also increases the training difficulty. To address this, in this letter, we propose a Semantic Decoupling based Semantic Scene Completion (SD-SSC) network from a single depth image. The semantic segmentation task is decoupled from the semantic scene completion task, and we use 2D and 3D semantic supervision to simplify the scene completion task and improve SSC performance. Specifically, our network first performs 2D semantic segmentation on the depth image and transforms features into 3D voxel space as semantic priors. Then, the 3D SSC is performed based on the voxel features and the flipped Truncated Signed Distance Field (f-TSDF). We use multi-scale 3D semantic supervision to further enhance the semantic information and fuse semantic and geometric features through the Planar Attention Fusion Module (PAFM) to obtain accurate SSC results. The proposed SD-SSC network achieves state-of-the-art performance on the NYU dataset (51.1% mIoU) and the NYUCAD dataset (61.9% mIoU) among all single depth-image based methods. It is even better than most RGB-D fusion-based SSC methods.
Surgical workflow prediction is critical for enhancing safety and providing real-time guidance in Computer-Assisted Surgery (CAS), particularly in laparoscopic and Robot-Assisted Surgery (RAS). We propose a novel visual information-based method for predicting surgical workflows at fine temporal scales. Our model adopts a two-stage training strategy. First, a ResNet50 backbone is trained to extract robust spatial features from individual surgical frames. Then, a predictive network is trained on sequential features. This network integrates a novel Temporal Feature Recorder (TFR) module, which aggregates both local and global temporal information from observed frame sequences, together with a Cross-Attention mechanism that fuses these temporal features with spatial information. A Temporal Convolutional Network (TCN) subsequently leverages this aggregated spatiotemporal representation to predict future feature states and corresponding workflow classifications. We evaluate our method on the JIGSAWS (activity level) and Cholec80 (phase level) datasets. It achieves superior classification accuracy (60.37%), significantly outperforming baselines on JIGSAWS, and delivers competitive accuracy (85.73%) with the lowest prediction error (MAE: 0.3672, MSE: 0.2148) on Cholec80 compared to existing baselines.
This letter presents a unified framework that jointly predicts behavioral intentions and vectorized occupancy, leveraging them as priors to dynamically prune context information during trajectory decoding, thereby enhancing prediction accuracy, interpretability, and efficiency. While most prior work has focused on boosting the precision of multimodal trajectory prediction, explicit modeling of behavioral intentions (e.g., yielding, overtaking) remains underexplored. To this end, we employ a shared context encoder for both intention and trajectory predictions, thereby reducing structural redundancy and information loss. Moreover, we address the lack of ground-truth behavioral intention labels in mainstream datasets (Waymo, Argoverse) by auto-labeling these datasets, thus advancing the community’s efforts in this direction. We further introduce a vectorized occupancy prediction module that infers the probability of each map polyline being occupied by the target vehicle’s future trajectory. By leveraging these intention and occupancy priors, our method conducts dynamic, modality-dependent pruning of irrelevant agents and map polylines in the decoding stage, effectively reducing computational overhead and mitigating noise from non-critical elements. Our approach ranks first among LiDAR-free methods on the Waymo Motion Dataset and achieves SOTA performance on the Waymo Interactive Prediction Dataset. Remarkably, even without model ensembling, our single-model framework improves the softmAP by 10% compared to the previous SOTA method, BETOP, on the Waymo Interactive Prediction Leaderboard. Furthermore, the proposed framework has been successfully deployed on real vehicles, demonstrating its practical effectiveness in real-world applications.
Efficient Multi-Agent Path Finding (MAPF) is pivotal for warehouse logistics. While existing learning-based methods primarily rely on computationally intensive grid-based representations, topological maps offer a more flexible and scalable alternative - though this approach remains understudied. To address this gap, we propose a novel Multi-Agent Reinforcement Learning (MARL) framework for topological MAPF with three key innovations: (1) a graph-structured POMDP formulation utilizing our Breadth-First Neighbor-Limited Search (BFNLS) algorithm to define scalable observation/action spaces while maintaining fixed dimension; (2) a Graph Structure Awareness (GSA) model that combines spectral (eigenvalue-based) and spatial (graph convolutional network-based) analysis to integrate local subgraph features with global topological importance metrics; and (3) a cooperative MARL architecture employing Value Decomposition Networks (VDN) to explicitly model agent dependencies through graph-aware credit assignment. Simulation results show our method achieves higher success rates than baseline methods and better planning efficiency than search-based methods, and real-robot experiments demonstrate its effectiveness in a physical setting.
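The abstract describes BFNLS only at a high level. The sketch below shows one plausible reading: breadth-first expansion capped at a fixed node budget, so every agent observes a constant-dimension local subgraph regardless of map size. The function name, padding convention, and adjacency-list interface are all assumptions.

```python
from collections import deque

def bfnls(graph, start, k_nodes, pad=-1):
    """Collect a fixed-size local neighborhood by breadth-first expansion.

    `graph` maps each vertex to its adjacency list. Expansion stops once
    `k_nodes` vertices have been gathered; shorter results are padded so
    the observation dimension stays constant.
    """
    visited, order = {start}, [start]
    queue = deque([start])
    while queue and len(order) < k_nodes:
        node = queue.popleft()
        for nbr in graph[node]:
            if len(order) == k_nodes:
                break
            if nbr not in visited:
                visited.add(nbr)
                order.append(nbr)
                queue.append(nbr)
    return order + [pad] * (k_nodes - len(order))

# Example on a tiny warehouse graph:
g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfnls(g, start=0, k_nodes=6))  # -> [0, 1, 2, 3, -1, -1]
```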
Language-guided localization within 3D environments continues to pose a significant challenge for autonomous systems, primarily due to the need for precise alignment between sparse point cloud data and inherently ambiguous natural language descriptions. To address this, we present a novel vision-language localization framework, namely FourierPlace, that leverages frequency-domain representations to enhance the alignment of complex geometric features with imprecise linguistic cues. At the core of our approach is the Frequency Fusion Enhancement (FFE) module, which converts raw point cloud data into frequency-domain signals. Complementing this, the Fourier Gate Attention (FGA) mechanism operates on these frequency-domain features to strengthen cross-modal correspondence. Furthermore, we introduce the Hierarchical Language Understanding Network (HiLUNet), which progressively refines linguistic features through a multi-stage architecture. Additionally, the Multiscale Cascade Cross-Attention (MCCA) module incorporates geometric information at multiple scales in the fine stage. Experiments conducted on the KITTI360Pose benchmark demonstrate that FourierPlace achieves state-of-the-art performance, outperforming existing methods with a 3.7% improvement in Top-1 coarse retrieval accuracy and a 4.0% increase in Top-1 localization accuracy within a 15-meter threshold on the test set. Our proposed framework presents a robust and scalable solution for language-guided localization in large-scale autonomous applications, including delivery robotics and augmented reality (AR) navigation systems.
In recent years, Compressed Sensing (CS) has gained significant attention as a method for acquiring high-resolution sensory data with fewer measurements than traditional Nyquist sampling. Simultaneously, autonomous robotic platforms such as drones and rovers have become valuable tools for remote sensing and environmental monitoring tasks, including temperature, humidity, and air quality measurements. This letter presents, to the best of our knowledge, the first study on exploiting the structure of CS measurement matrices to design optimized sampling trajectories for robotic environmental data collection. We introduce a Monte Carlo optimization framework that generates measurement matrices minimizing both the robot's path length and the CS reconstruction error. Central to our approach is Dictionary Learning (DL), which provides a data-driven sparsifying transform to enhance reconstruction accuracy and further reduce the number of required samples. Experiments on $NO_{2}$ pollution and temperature map reconstruction across two geographical areas demonstrate that our method can cut robot travel distance to under 10% of a full-coverage path while improving reconstruction accuracy more than fivefold compared to traditional CS methods and twofold compared to prior Informative Path Planning (IPP) techniques.
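For readers unfamiliar with the DL-plus-CS pipeline the abstract describes, here is a minimal end-to-end sketch on synthetic data using scikit-learn. The shapes, sparsity level, and one-hot sampling matrix are illustrative assumptions; the paper's Monte Carlo trajectory optimization of the measurement matrix is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
n_pixels, n_atoms, n_measurements = 256, 64, 40

# 1) Learn a sparsifying dictionary Psi from historical field maps
#    (stand-in random data here).
training_maps = rng.standard_normal((500, n_pixels))
dl = DictionaryLearning(n_components=n_atoms, transform_algorithm="omp",
                        max_iter=10, random_state=0).fit(training_maps)
Psi = dl.components_.T                      # (n_pixels, n_atoms)

# 2) The sampling matrix Phi selects the cells the robot actually visits;
#    each row is a one-hot indicator of one visited cell.
visited = rng.choice(n_pixels, n_measurements, replace=False)
Phi = np.eye(n_pixels)[visited]

# 3) Reconstruct: y = Phi x = (Phi Psi) s with s sparse, solved via OMP.
x_true = Psi @ (rng.standard_normal(n_atoms) * (rng.random(n_atoms) < 0.1))
y = Phi @ x_true
s_hat = orthogonal_mp(Phi @ Psi, y, n_nonzero_coefs=10)
x_hat = Psi @ s_hat                         # reconstructed environmental map
```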
Robot-assisted in situ bioprinting offers a superior workspace-to-occupied-space ratio and enables direct deposition of bioink onto damaged tissues, surpassing the capabilities of traditional benchtop systems. However, most current platforms lack effective localization strategies to ensure accurate spatial correspondence between the end-effector and the anatomical target, often resulting in mismatches between the printed path and the actual defect. To overcome this limitation, optical tracking from computer-assisted surgery is introduced to enhance printing accuracy in in situ bioprinting procedures. By collecting fiducial points with a probe tracked by an optical camera, the patient's anatomy can be registered to the point cloud of the virtual hand model, providing a spatial reference for precise toolpath generation. The system was validated through in situ printing of two-layer grid scaffolds on a hand-defect phantom using a six-degree-of-freedom robotic arm, demonstrating high geometric fidelity and positional accuracy. These results underscore its potential for tissue reconstruction and future clinical translation.
Microscale metal fabrication for microelectronics, sensors, and MEMS devices requires precise automated positioning and shape control, particularly challenging on conductive substrates where non-transparency limits conventional vision-based positioning. This letter presents a high-precision vision-guided automated electrochemical 3D printing system that integrates meniscus-confined electrodeposition (MCED) with real-time visual feedback for accurate microscale metal fabrication in air. The system employs a cantilevered micropipette design to avoid obstructing the top-view field, combined with a vision-guided tracking algorithm using closed-loop feedback control to achieve 1.24 $\upmu$m positioning accuracy with a relative positioning error of 5.17% on non-transparent conductive substrates. An integrated substrate calibration method based on four-point plane fitting and coordinate transformation compensates for surface irregularities and assembly errors, reducing angular deviations to 3.2%. The self-adjusting voxelated MCED approach utilizes pre-defined trajectories to ensure consistent structural geometry despite environmental variations. Experimental validation demonstrates fabrication of complex microscale structures including intersections, arrays, and controlled electrode interconnections. The integrated vision-guidance, automated calibration, and adaptive deposition control establish this system as an automated solution for refined microscale metal fabrication.
Effective navigation in dynamic three-dimensional (3D) environments is essential for autonomous uncrewed aerial vehicles (UAVs), but existing methods often lack computational efficiency and robustness. To address these challenges, this letter presents a path planning framework that combines a lightweight mapping module and an adaptive planner. A spherical obstacle map is proposed to represent dynamic environments efficiently, converting dense point clouds into sparse spherical representations while estimating obstacle motion for safe planning. An enhanced adaptive artificial potential field based on the spherical obstacle map is introduced to improve path reachability and safety, integrating an attractive force with terminal acceleration to avoid terminal local minima, a repulsive force with a 3D vortex to improve safety, and an emergency-deflection mechanism to mitigate intermediate local minima. In addition, an adaptive path optimization algorithm dynamically tunes planning coefficients to prioritize reachability and safety, while also guaranteeing efficiency. Extensive simulations in dynamic sphere and pedestrian environments, along with a real-world UAV experiment, demonstrate that the proposed framework outperforms existing methods in success rate and safety, while maintaining balance in other performance metrics.
In this paper, we propose a novel real-time disparity refinement method that enables precise structure perception. We construct a compact full-resolution cost volume from residuals around the initial disparity and adaptively eliminate redundant information on a per-pixel basis by leveraging the confidence. The core idea of our method comprises residual cost volume construction and an adaptive range masking strategy. The residual cost volume is constructed from refinement candidates around the initial disparity, based on the assumption that the ground-truth disparity is near the initial disparity. Compared to the conventional cost volume constructed over the entire set of disparity candidates, our approach achieves computational efficiency and maintains precise structural information by operating at full-resolution. Moreover, we propose an adaptive range masking strategy that filters refinement candidates for each pixel by leveraging confidence values. This approach effectively eliminates redundant information present in cost volumes that are composed of uniformly sampled refinement candidates. Experimental results on the Scene Flow and KITTI 2012 benchmarks demonstrate that our method achieves real-time performance and sets a new state-of-the-art among real-time stereo matching algorithms.
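The residual cost volume idea, correlating left features against right features warped by the initial disparity plus a small set of residual offsets, can be sketched as follows in PyTorch. This is a generic illustration under stated assumptions (dot-product correlation, bilinear warping), not the authors' code, and it omits the confidence-based adaptive range masking.

```python
import torch
import torch.nn.functional as F

def residual_cost_volume(feat_l, feat_r, init_disp, offsets):
    """Compact cost volume over residuals around an initial disparity.

    feat_l, feat_r: (B, C, H, W) left/right features; init_disp: (B, 1, H, W).
    For each residual offset d, the right feature map is warped by
    init_disp + d and correlated with the left features, so only a small
    band around the initial estimate is ever evaluated.
    """
    B, C, H, W = feat_l.shape
    xs = torch.arange(W, device=feat_l.device).view(1, 1, 1, W).float()
    ys = torch.arange(H, device=feat_l.device).view(1, 1, H, 1).float().expand(B, 1, H, W)
    costs = []
    for d in offsets:                       # e.g. offsets = range(-4, 5)
        x_src = xs.expand(B, 1, H, W) - (init_disp + d)
        grid = torch.stack([2 * x_src / (W - 1) - 1,
                            2 * ys / (H - 1) - 1], dim=-1).squeeze(1)
        warped = F.grid_sample(feat_r, grid, align_corners=True)
        costs.append((feat_l * warped).mean(dim=1, keepdim=True))  # correlation
    return torch.cat(costs, dim=1)          # (B, len(offsets), H, W)
```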
In multi-automated guided vehicle (AGV) environments, inefficient service placement increases energy consumption and charging cycles, lowering battery lifespan. Consequently, minimizing energy consumption is key for maintaining operational efficiency and sustainability. Additionally, the unpredictable arrival of service requests in multi-AGV systems can lead to system saturation. However, previous research overlooked the energy costs of on-device computation, especially under dynamic service arrivals. To address these challenges, this work proposes an energy minimization service placement algorithm (EMSPA). The results demonstrate that EMSPA outperforms a baseline random selection (RS) algorithm for different numbers of AGVs, services, and tasks per service, reducing normalized energy consumption by up to 2.34% and improving mean service acceptance rates by up to 16.09% with linear execution-time overhead. Further, EMSPA outperforms a queue-aware scheduling and deadlock mitigation strategy (QASDMS) in terms of processing power ratio by over 58.94%.
Recent online methods for HD map construction directly infer local maps from sensor observations, yet suffer from limited perception range, particularly under challenging scenarios such as occlusions by large vehicles or poor visibility in rainy conditions. Inspired by human perception, which incrementally integrates previous observations to form stable prior knowledge about the environment, several methods have proposed utilizing global priors constructed from historical observations to enhance local map inference. However, existing global prior approaches often require significant storage overhead or complex post-processing, limiting their practical real-time usability. To address these challenges, we propose InstGPMap, a novel online framework that explicitly represents and maintains global prior map (GPMap) elements at the instance level. Specifically, our method explicitly leverages historical predictions instead of relying on implicit intermediate representations such as Bird's-Eye-View (BEV) features. We assign consistent instance identifiers (IDs) to map elements detected across frames, enabling direct instance-level association and updating. InstGPMap comprises two core modules: (1) the GlobalMapUpdate Module, which dynamically associates and manages GPMap elements across frames to form instance-level GPMap elements; (2) the PriorMapEncode Module, which encodes these instance-level GPMap elements into track queries, significantly enhancing real-time prediction accuracy. Extensive experiments show that InstGPMap achieves state-of-the-art performance on the nuScenes dataset, with superior accuracy and storage efficiency.
A novel path planning algorithm, RewardRRT, is proposed to address the challenge of multi-degree-of-freedom robot path planning in narrow environments. In this approach, RewardRRT conceptualizes the sampling tree of Rapidly-exploring Random Trees (RRT) as an agent, assigning a reward value function to each sampled state. Simultaneously, the cumulative reward and reward increment are utilized as the state space, with a linear Kalman Filter applied to predict the state transitions. To enhance convergence speed, a bidirectional expansion strategy is implemented, wherein the tree with the lower cumulative reward prediction is prioritized for iteration. Here, the sampling bias is regulated using a sigmoid function and the prediction value. Finally, simulation tests in 4 distinct point cloud scenarios and a real-world experiment are conducted using a self-designed 21-degree-of-freedom wheeled humanoid robot. Compared to the best-performing algorithm from the Open Motion Planning Library (OMPL) in the same scenarios, RewardRRT achieves improvements in speed by 38.45%, 8.18%, 9.88%, and 14.98%, respectively. Furthermore, RewardRRT exhibits an average planning success rate of 88.25%, surpassing OMPL's best-performing algorithm by 29.75%. These results underscore the effectiveness of RewardRRT in solving path planning challenges in narrow environments.
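The sigmoid-regulated sampling bias and reward-based tree prioritization described above can be illustrated in a few lines of Python. The gain parameter and the goal-versus-uniform sampling rule are assumptions, and the Kalman filter that would produce `pred_reward` is omitted.

```python
import math
import random

def goal_bias(pred_reward, k=1.0):
    """Map a reward prediction to a goal-sampling probability.

    The sigmoid squashes the predicted cumulative reward into (0, 1),
    used here as the probability of sampling toward the goal instead of
    uniformly. The gain `k` is an assumed tuning parameter.
    """
    return 1.0 / (1.0 + math.exp(-k * pred_reward))

def choose_tree(pred_a, pred_b):
    """Prioritize the tree with the lower predicted cumulative reward,
    as the bidirectional strategy in the abstract describes."""
    return "A" if pred_a < pred_b else "B"

def sample(goal, sample_uniform, pred_reward):
    return goal if random.random() < goal_bias(pred_reward) else sample_uniform()
```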
Typical LiDAR SLAM architectures feature a front-end for odometry estimation and a back-end for refining and optimizing the trajectory and map, commonly through loop closures. However, loop closure detection in large-scale missions presents significant computational challenges due to the need to identify, verify, and process numerous candidate pairs for pose graph optimization. Keyframe sampling bridges the front-end and back-end by selecting frames for storing and processing during global optimization. This article proposes an online keyframe sampling approach that constructs the pose graph using the most impactful keyframes for loop closure. We introduce the Minimal Subset Approach (MSA), which optimizes two key objectives: redundancy minimization and information preservation, implemented within a sliding window framework. By operating in the feature space rather than 3-D space, MSA efficiently reduces redundant keyframes while retaining essential information. Evaluations on diverse public datasets show that the proposed approach outperforms naive methods in reducing false positive rates in place recognition, while delivering superior ATE and RPE in metric localization, without the need for manual parameter tuning. Additionally, MSA demonstrates efficiency and scalability by reducing memory usage and computational overhead during loop closure detection and pose graph optimization.
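MSA's redundancy minimization operates in feature space rather than 3-D space; a greedy stand-in is sketched below, where a keyframe within the sliding window is kept only if its cosine distance to every already-retained frame exceeds a threshold. This is illustrative only: the descriptor choice and threshold are assumptions, and the actual MSA jointly optimizes redundancy minimization and information preservation.

```python
import numpy as np

def msa_like_select(window_feats, tau=0.3):
    """Greedy sketch of redundancy-minimizing keyframe selection.

    window_feats: (N, D) per-frame descriptors (e.g., learned embeddings)
    inside the sliding window. A frame is kept only if its cosine
    distance to all kept frames exceeds tau, pruning near-duplicates
    while preserving distinctive viewpoints.
    """
    feats = window_feats / (np.linalg.norm(window_feats, axis=1, keepdims=True) + 1e-12)
    kept = []
    for i in range(len(feats)):
        if all(1.0 - feats[i] @ feats[j] > tau for j in kept):
            kept.append(i)
    return kept  # indices of keyframes forwarded to the pose graph
```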
Many real-world tasks, such as assembly, cooking, and object handovers, require bi-manual coordination. Learning such skills via imitation remains challenging due to dataset scarcity, mainly caused by the high cost of bi-manual robotic platforms and barriers to entry in robotics software. To address these challenges, we introduce (1) OpenPyRo-A1, a low-cost, bi-manual humanoid robot priced at approximately \$14K. OpenPyRo-A1 achieves 0.2 mm repeatability and supports a 5 kg payload per arm, and (2) a Python-first distributed control framework for seamless teleoperation, data collection, and policy deployment, designed for ease of use; moreover, the codebase is installable via pip. We conducted imitation learning experiments in both simulation and the real world, integrating the robot with perception models, motion planning, and a large language model. The results demonstrate that OpenPyRo-A1 is a stable, user-friendly, and high-precision dual-arm platform. We expect that the OpenPyRo-A1 hardware, control system, and curated dataset of bi-manual manipulation episodes will advance affordable and scalable dual-arm robotics.
Robot-assisted therapy has long promised to advance stroke rehabilitation by delivering intensive and personalized training, yet its clinical impact remains limited. Closing the sensorimotor loop with brain–computer interfaces offers a better strategy than passive mobilization, directly linking user intent to robotic assistance and potentially driving neuroplasticity. However, a brain-computer interface requires subject-specific calibration that is time-consuming and often impractical. Moreover, brain decoding remains error-prone due to variability of neural signals, thus resulting in unintended robot actions that could reduce engagement during closed-loop control. Here, we demonstrate that a decoder trained on an expert subject can be transferred to naïve users for online control of a rehabilitation exoskeleton in a rest-versus-reaching paradigm, a functional task with clinical relevance. We then characterize error-related potentials arising from expectation mismatches between brain commands and robot actions during closed-loop control. Finally, we show that these mismatches can be reliably decoded in a subject-independent framework (mean area under the receiver operating characteristic curve: 0.77), a crucial step toward rehabilitation scenarios where collecting subject-specific error-related potential data is challenging. Our findings highlight the potential for integrating real-time error-detection to enhance human–robot interaction by correcting unintended robot behaviors, which could significantly improve rehabilitation outcomes where accurate and contingent feedback is essential.
Robotic contact manipulation involves applying controlled forces at contact points to guide an object along a desired trajectory while respecting the underlying physical interactions. This letter presents a novel framework that integrates dynamic modeling and Reinforcement Learning (RL) to achieve robust object pushing with a redundant robotic arm. First, a comprehensive dynamic contact model is formulated, incorporating unilateral constraints and a box friction model to capture the nonlinearities present in real-world contact dynamics. Second, the model is extended to handle multiple simultaneous point contacts, enabling effective trajectory planning and tracking for a redundant robotic arm in multi-contact pushing tasks. Third, an RL strategy is introduced as a residual module that augments a model-based controller to improve pushing performance. Simulation and real-world experiments with a Kinova Gen2 arm demonstrate that the proposed method achieves accurate trajectory following and stable contact interactions, significantly outperforming traditional PD control strategies in dynamic pushing scenarios.
As the core system in autonomous vehicles, Autonomous Driving Systems (ADSs) are highly configurable, where misconfigurations can significantly impact control safety, reliability, and overall performance. Although several testing methods have been proposed to detect misconfiguration-induced violations, most primarily focus on identifying the presence of incorrect configurations rather than pinpointing the specific configuration parameters responsible for these violations. However, identifying and understanding the root causes of misconfiguration-induced violations is essential for effective debugging and rapid system recovery. In this paper, we propose a CounterFactual-based Root Cause Analysis (RCA) method, CF-RCA, to identify the root causes of misconfiguration-induced violations by performing counterfactual attribution. Specifically, CF-RCA first formalizes the relationships between various configuration parameters and violations by learning a structural causal model. Then, based on the causal model, CF-RCA employs counterfactual attribution to estimate the impact of each configuration parameter on violations and identifies the most impactful parameter as the RCA result. We evaluate CF-RCA on the MetaDrive simulator with 12,926 driving scenarios, and the results show CF-RCA can efficiently identify violation-causing parameters, achieving 98.3% accuracy. Finally, the experimental comparisons with existing methods and tests across different ADSs further demonstrate the superiority and generalizability of CF-RCA.
This paper presents a portable, cable-driven upper-limb rehabilitation robot designed for home-based activities of daily living (ADLs). The proposed robot is intended to assist hemiplegia patients with ADL-based rehabilitation, such as drinking water from a cup, in home settings. The robot features a rehabilitation control framework based on dynamic movement primitives (DMPs) that personalizes spatial profiles from healthy-limb demonstrations. It modulates movement velocity and inter-joint coordination online via human–robot interaction. The system was evaluated in a drinking task with ten healthy participants. Compared with a classic (non-adaptive) DMP controller, the proposed robot increased voluntary participation (EMG amplitude: +64.71%) and reduced robotic assistance (assistance force: -56.10%), while also improving inter-joint coordination. These results indicate that the proposed system supports personalized, task-specific assistance and holds promise for home-based upper-limb rehabilitation.
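The DMP machinery the abstract builds on is standard; below is a textbook single-DOF discrete DMP rollout (after Ijspeert et al.), in which learned basis weights shape the spatial profile and changing tau modulates movement speed online, the kind of modulation the abstract describes. Gains and the basis-width heuristic are conventional defaults, not the paper's values.

```python
import numpy as np

def dmp_rollout(y0, g, weights, tau=1.0, dt=0.01,
                alpha_z=25.0, beta_z=6.25, alpha_x=1.0):
    """Integrate a single-DOF discrete dynamic movement primitive.

    A spring-damper pulls y toward the goal g while a learned forcing
    term, encoded by Gaussian basis weights over the canonical phase x,
    shapes the spatial profile demonstrated by the healthy limb.
    """
    n_basis = len(weights)
    centers = np.exp(-alpha_x * np.linspace(0, 1, n_basis))  # basis centers in x
    widths = n_basis ** 1.5 / centers                        # common width heuristic
    y, z, x, traj = y0, 0.0, 1.0, []
    for _ in range(int(tau / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)
        f = (psi @ weights) / (psi.sum() + 1e-10) * x * (g - y0)  # forcing term
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)                       # canonical system
        traj.append(y)
    return np.array(traj)

# With zero weights the rollout is a pure spring-damper reach from y0 to g:
# dmp_rollout(0.0, 1.0, np.zeros(20))
```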
Robotic dressing assistance has the potential to improve the quality of life for individuals with limited mobility. Existing solutions predominantly rely on rigid robotic manipulators, which struggle to handle deformable garments and to ensure safe physical interaction with the human body. Prior robotic dressing methods require excessive operation times, complex control strategies, and constrained user postures, limiting their practicality and adaptability. This letter proposes a novel soft robotic dressing system, the Self-Wearing Adaptive Garment (SWAG), which uses an unfurling and growth mechanism to facilitate autonomous dressing. Unlike traditional approaches, the SWAG conforms to the human body through an unfurling-based deployment method, eliminating skin-garment friction and enabling a safer and more efficient dressing process. We present the working principles of the SWAG, introduce its design and fabrication, and demonstrate its performance in dressing assistance. The proposed system demonstrates effective garment application across various garment configurations, presenting a promising alternative to conventional robotic dressing assistance.
The end-to-end paradigm has emerged as a promising approach to autonomous driving. However, existing single-agent end-to-end pipelines are often constrained by occlusion and limited perception range, resulting in hazardous driving. Furthermore, their closed-box nature prevents the interpretability of the driving behavior, leading to an untrustworthy system. To address these limitations, we introduce Risk Map as Middleware (RiskMM) and propose an interpretable cooperative end-to-end driving framework. The risk map learns directly from the driving data and provides downstream planning with an interpretable spatiotemporal representation of the scenario, derived from upstream perception and from the interactions between the ego vehicle and the surrounding environment. RiskMM first constructs a multi-agent spatiotemporal representation with a unified Transformer-based architecture, then derives risk-aware representations by modeling interactions among surrounding environments with attention. These representations are subsequently fed into a learning-based Model Predictive Control (MPC) module. The MPC planner inherently accommodates physical constraints and different vehicle types and can provide interpretation by aligning learned parameters with explicit MPC elements. Evaluations conducted on the real-world V2XPnP-Seq dataset confirm that RiskMM achieves superior and robust performance in risk-aware trajectory planning, significantly enhancing the interpretability of the cooperative end-to-end driving framework. The codebase will be released to facilitate future research in this field.
This letter presents a novel decentralized approach for achieving emergent behavior in multi-agent systems with minimal information sharing. Building on prior work on simple orbits, our method produces a broad class of stable, periodic trajectories by stabilizing the system around a Lie group-based geometric embedding. Employing the Lie group SO(3), we generate a wider range of periodic curves than existing quaternion-based methods. Furthermore, we exploit SO(3) properties to eliminate the need for velocity inputs, allowing agents to receive only position inputs. We also propose a novel phase controller that ensures uniform agent separation, along with a formal stability proof. Validation through simulations and experiments showcases the method's adaptability to complex low-level dynamics and disturbances.
Hugging provides psychological benefits and is common in supportive dialogue, leading to probabilistic models for a robot’s intra-hug gestures (e.g., patting, rubbing). However, the psychological effects of this human-derived model have not been adequately verified. In this study, we aimed to clarify the psychological effects by implementing a model and a dialogue scenario for organizing user worries and goals in the huggable robot. We experimentally evaluated the effectiveness of the model-based system by comparing it with a system that performs random gestures according to uniform distributions without human characteristics. The results showed that participants who used the system with the model perceived the robot as significantly easier to use, felt that the robot was friendlier, and rated the overall goodness of the interaction session higher. Additionally, the model demonstrated a significant reduction in negative user comments regarding the frequency of gestures. Our quantitative and qualitative findings will help design interactions with huggable robots for mental health support.
Transformable robots adapt to various environments by changing their shape or functionality. Such robots can further expand their task range by replacing their end-effectors (EEs). In this letter, we propose an adaptive-limb transformable robot capable of replacing multiple types of mounted EEs. First, the robot's 7-degree-of-freedom (DoF) limbs can reach multiple types of EEs mounted on the front body surface and replace them using a single limb without relying on external devices. Second, we develop a compact Lock-Spin mechanism that integrates a locking mechanism into the rotor of the motor to enable continuous rotation. Experimental results demonstrate that the proposed transformable robot can replace EEs on-site and that this replacement enables locomotion and manipulation adapted to the environment.
Robust and accurate localization technologies are crucial for autonomous vehicles and mobile robots. Precisely perceiving the 3D environment and its semantic attributes in the real world greatly enhances the ability of these systems to perform localization tasks effectively. This letter proposes GS-Loc, the first visual relocalization framework based on a vision foundation model that utilizes 3D Gaussian Splatting (3DGS) as a map representation. GS-Loc leverages the powerful visual feature extraction capabilities of a vision foundation model as a Global Locator, performing global descriptor matching between query images and a reference image database. This enables each query image to be matched with the optimal candidate image in the map and an initial pose within the map. Leveraging the high-dimensional feature embeddings extracted by vision foundation models, we propose the Foundation Feature-Consistent Matcher (FFC-Matcher), which establishes a feature similarity filtering mechanism to extract geometrically consistent feature subsets from high-quality images and depth maps rendered by the 3DGS model, thereby enabling pose refinement through 2D-3D feature matching. To enhance the efficiency of GS-Loc, we introduce sparsification of the 3DGS model by retaining only the Gaussians that contain relevant feature information. Extensive evaluations on the KITTI360 and RobotCar datasets show that even with a model size reduced by approximately 50%, our approach still achieves state-of-the-art performance compared to other methods.
In the field of safe navigation for mobile robots, control barrier functions (CBFs) have garnered significant attention due to their ability to transform complex safety constraints into real-time solvable optimization problems. In this letter, we propose a novel Lyapunov-based CBF framework. It offers the following key advantages: (1) Using a single Control Lyapunov Function (CLF), this method synthesizes spatially shifted CBFs to construct an expansive safe invariant set in obstacle-dense environments. (2) The framework is capable of incorporating existing approaches for constructing quadratic CLFs, making it applicable to a wide range of complex nonlinear systems and enhancing its generality and extensibility. (3) It enables real-time synthesis of CBFs and ensures safety in large-scale 3D environments through efficient CBF-based quadratic programming (CBF-QP). (4) The method ensures safety while inheriting the stability properties of the CLF, allowing the asymptotic convergence of the system state to equilibrium, thus unifying safety and motion stability. To validate its efficacy, we rigorously tested the framework in both simulations and hardware experiments.
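The CBF-QP safety filter at the core of advantage (3) is standard, and with a single constraint it even admits a closed form. The sketch below shows that generic filter for control-affine dynamics; it does not implement the paper's spatially shifted CBF synthesis from a CLF, and the class-K gain is an assumed constant.

```python
import numpy as np

def cbf_qp_filter(u_nom, grad_h, f, g, h_val, gamma=1.0):
    """Minimally modify a nominal input to satisfy one CBF constraint.

    Solves  min ||u - u_nom||^2  s.t.  Lf h + Lg h u + gamma * h >= 0
    for dynamics xdot = f(x) + g(x) u. With a single affine constraint
    the QP reduces to a halfspace projection, so no solver is needed.
    """
    a = grad_h @ g                    # Lg h: input coefficients of the constraint
    b = grad_h @ f + gamma * h_val    # Lf h + gamma * h
    if a @ u_nom + b >= 0:            # nominal input is already safe
        return u_nom
    return u_nom - a * (a @ u_nom + b) / (a @ a)  # project onto constraint boundary
```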
This letter presents Decoupled STAR (DSTAR), a novel reconfigurable robot fitted with a sprawling mechanism that allows the wheel rotation axes to vary relative to the body, and two independently activated four-bar extension mechanisms (FBEM). These mechanisms enable the robot to move its center of mass (COM) in any direction, and increase its maneuvering capabilities by selecting a variety of locomotion gaits. A kinematic model of the robot and a quasi-static force analysis are used to optimize the design and evaluate its motor requirements. Experiments demonstrate that combining the sprawling mechanism with FBEM enables the DSTAR to both crawl and drive, overcome a wide range of challenging obstacles, and improve its climbing capability by 66% compared to symmetric FBEM designs (such as RSTAR). The robot can crawl and maneuver over rough terrain using its unique turtle-gait method, roll sideways to surmount wall obstacles up to 20 cm high, travel horizontally across uneven ground, and switch between wheels and whegs to adapt to different terrain types, including dirt, stones, and grass (see the attached video).
Robots in uncertain real-world environments must perform both goal-directed and exploratory actions. However, most deep learning-based control methods neglect exploration and struggle under uncertainty. To address this, we adopt deep active inference, a framework that accounts for human goal-directed and exploratory actions. Yet, conventional deep active inference approaches face challenges due to limited environmental representation capacity and high computational cost in action selection. We propose a novel deep active inference framework that consists of a world model, an action model, and an abstract world model. The world model encodes environmental dynamics into hidden state representations at slow and fast timescales. The action model compresses action sequences into abstract actions using vector quantization, and the abstract world model predicts future slow states conditioned on the abstract action, enabling low-cost action selection. We evaluate the framework on object-manipulation tasks with a real-world robot. Results show that it achieves high success rates across diverse manipulation tasks and switches between goal-directed and exploratory actions in uncertain settings, while making action selection computationally tractable. These findings highlight the importance of modeling multiple timescale dynamics and abstracting actions and state transitions.
Loco-manipulation demands coordinated whole-body motion to manipulate objects effectively while maintaining locomotion stability, presenting significant challenges for both planning and control. In this work, we propose a whole-body model predictive control (MPC) framework that directly optimizes joint torques through full-order inverse dynamics, enabling unified motion and force planning and execution within a single predictive layer. This approach allows emergent, physically consistent whole-body behaviors that account for the system’s dynamics and physical constraints. We implement our MPC formulation using open software frameworks (Pinocchio and CasADi), along with the state-of-the-art interior-point solver Fatrop. In real-world experiments on a Unitree B2 quadruped equipped with a Unitree Z1 manipulator arm, our MPC formulation achieves real-time performance at 80 Hz. We demonstrate loco-manipulation tasks that demand fine control over the end-effector’s position and force to perform real-world interactions like pulling heavy loads, pushing boxes, and wiping whiteboards.
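To give a flavor of what a CasADi-based MPC formulation looks like (the whole optimal control problem is written symbolically and handed to a solver), here is a toy double-integrator MPC with input bounds. It is a minimal stand-in under stated assumptions, not the letter's full-order inverse-dynamics formulation with Pinocchio dynamics and the Fatrop solver; horizon, weights, and bounds are illustrative.

```python
import casadi as ca

N, dt = 20, 0.05
opti = ca.Opti()
X = opti.variable(2, N + 1)        # state trajectory: [position; velocity]
U = opti.variable(1, N)            # control trajectory: force/torque
x0 = opti.parameter(2)             # measured initial state

opti.subject_to(X[:, 0] == x0)
for k in range(N):                 # explicit-Euler dynamics constraints
    opti.subject_to(X[:, k + 1] == X[:, k] + dt * ca.vertcat(X[1, k], U[0, k]))
opti.subject_to(opti.bounded(-5, U, 5))            # actuator limits

opti.minimize(ca.sumsqr(X) + 1e-2 * ca.sumsqr(U))  # drive state to the origin
opti.solver("ipopt")               # an interior-point solver; Fatrop plugs in similarly
opti.set_value(x0, [1.0, 0.0])
sol = opti.solve()
print(sol.value(U[:, 0]))          # first control applied in receding horizon
```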
Soft robots possess an inherent mechanical compliance that enables safe and adaptive interaction with delicate structures and confined environments. However, their virtually infinite degrees of freedom introduce significant challenges in achieving precise and repeatable motion control. In laser-assisted surgeries, it is critical to ensure both safe tissue interaction and high precision laser targeting for effective and controlled intervention. In this work, a soft–rigid hybrid robot with onboard proprioceptive optical sensing for laser-assisted surgery is presented. The robot guides a simulated laser via fiber steering. Two model-free controllers, an adaptive Jacobian and a multi-layer perceptron (MLP), were implemented and compared for both position and speed control. Under position control, the robot tracked complex planar and spatial trajectories with an average error below 0.14 mm. Using speed control, the robot followed a circular trajectory at up to 1 mm/s with sub-millimeter accuracy. It also maintained stable behavior under unexpected external disturbances. Clinical feasibility was demonstrated by operating the robot on an in-vitro tissue phantom doped with a photochromic dye. Direct visualization of the simulated laser path on the tissue phantom demonstrated that the robot can maintain an approximately constant tip-to-target distance and spot size along the trajectory.
This letter provides a systematic method for continuously identifying human joint impedance parameters during physical interaction with a robotic system without the need for external sensors. To this end, several identification methods combining payload identification methods with online identification techniques are proposed and compared. The passive behavior of the human joint is modeled by classical spring-damper-inertia equations. Monte Carlo simulations are first carried out to compare the expected performance of the proposed methods. Next, experimental validations are conducted on two robotic systems interacting with elements simulating a passive human operator with varying parameters. Results show that the proposed identification combining a separate identification of robot and human parameters with an exponentially-weighted-past recursive least squares method gives the best overall results in terms of accuracy. A preliminary example of identifying the wrist mechanical impedance of two non-disabled subjects during flexion/extension motions is provided. The proposed methods show promising results in continuous monitoring of the human operator’s state during physical human-robot interaction, which could be used to detect long-term fatigue or rehabilitation performance.
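The exponentially-weighted-past recursive least squares estimator named above is a classic; a minimal implementation is sketched below. The forgetting factor and initialization are illustrative defaults, and for the spring-damper-inertia model the regressor phi would stack the joint's acceleration, velocity, and position so that theta tracks inertia, damping, and stiffness.

```python
import numpy as np

class ForgettingRLS:
    """Recursive least squares with an exponential forgetting factor.

    Tracks slowly varying parameters theta of a linear regression
    y = phi^T theta, with lam < 1 discounting past samples so the
    estimate can follow parameter drift (e.g., changing joint impedance).
    """
    def __init__(self, n_params, lam=0.98, p0=1e3):
        self.theta = np.zeros(n_params)
        self.P = np.eye(n_params) * p0   # large initial covariance: weak prior
        self.lam = lam

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        k = self.P @ phi / (self.lam + phi @ self.P @ phi)  # gain vector
        self.theta += k * (y - phi @ self.theta)            # innovation correction
        self.P = (self.P - np.outer(k, phi @ self.P)) / self.lam
        return self.theta

# Per sample: rls.update([acc, vel, pos], torque) -> [inertia, damping, stiffness]
```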
LiDAR-inertial odometry (LIO) has been widely used in robotics due to its high accuracy. However, its performance degrades in degenerate environments, such as long corridors and high-altitude flights, where LiDAR measurements are imbalanced or sparse, leading to ill-posed state estimation. In this letter, we present LODESTAR, a novel LIO method that addresses these degeneracies through two key modules: degeneracy-aware adaptive Schmidt-Kalman filter (DA-ASKF) and degeneracy-aware data exploitation (DA-DE). DA-ASKF employs a sliding window to utilize past states and measurements as additional constraints. Specifically, it introduces degeneracy-aware sliding modes that adaptively classify states as active or fixed based on their degeneracy level. Using the Schmidt-Kalman update, it partially optimizes active states while preserving fixed states. These fixed states influence the update of active states via their covariances, serving as reference anchors, akin to a lodestar. Additionally, DA-DE prunes less-informative measurements from active states and selectively exploits measurements from fixed states, based on their localizability contribution and the condition number of the Jacobian matrix. Consequently, DA-ASKF enables degeneracy-aware constrained optimization and mitigates measurement sparsity, while DA-DE addresses measurement imbalance. Experimental results show that LODESTAR outperforms existing LiDAR-based odometry methods and degeneracy-aware modules in terms of accuracy and robustness under various degenerate conditions.
In this letter, we propose a safety-critical compliant control strategy designed to strictly enforce interaction force constraints during the physical interaction of robots with environments. The interaction force constraint is interpreted as a new force-constrained control barrier function (FC-CBF) by exploiting a generalized contact model that incorporates prior information about the environment, e.g., its prior stiffness, into the robot kinematics. The difference between the real environment and the generalized contact model is approximated by constructing an uncertainty observer, and its estimation error is quantified on the basis of Lyapunov theory. By interpreting strict interaction safety specifications as a dynamic constraint and restricting the desired joint angular velocities in kinematics, the proposed approach modifies nominal compliant controllers using quadratic programming, ensuring adherence to interaction force constraints in partially uncertain environments. The strict force constraint and the stability of the closed-loop system are rigorously analyzed. Experimental tests using a UR3e industrial robot with different environments verify the effectiveness of the proposed method in achieving the force constraints.
This letter presents the mechatronic design and implementation of a hybrid-driven disc-shaped autonomous underwater vehicle (HD-AUV). The hybrid-driven system integrates a buoyancy adjustment system and propeller thrusters, enabling the HD-AUV to achieve both high maneuverability motion and energy-efficient gliding. These capabilities correspond to two distinct motion modes: AUV mode and glider mode. In AUV mode, the HD-AUV leverages the rotational symmetry of its disc-shaped design to achieve four-degree-of-freedom (4-DOF) motion control. This configuration, supported by four propeller thrusters, facilitates high-maneuverability actions, including fixed-point hovering and in-place turning. In glider mode, the integration of rotatable dorsal fins and horizontal propeller thrusters enables the HD-AUV to transition seamlessly between diving and ascending phases without the conventional mass-shifting mechanisms used in torpedo-type gliders. Numerical simulations are conducted to evaluate the steady glide performance of the HD-AUV, focusing on lift-to-drag ratios and hydrodynamic coefficients. Comprehensive pool experiments, encompassing multi-DOF maneuvers and gentle gliding, demonstrate the exceptional locomotion capabilities of the HD-AUV and validate the accuracy of the proposed dynamic model. These hybrid motion modes hold significant promise for underwater operations in complex seafloor environments.
High-quality motion planning for mobile manipulators remains a challenging task due to the high dimensionality and complex constraints involved. While existing methods perform well in specific scenarios, their efficiency and the quality of the resulting paths and trajectories often degrade in dense or irregular environments. In this letter, we propose a unified motion planning framework for mobile manipulators. It consists of two parts: a sampling-based path planner with feasibility-aware focus regions for efficient path generation, and a WA-QN optimizer that adaptively filters and aggregates curvature information to enhance convergence in high-dimensional spaces. Comprehensive experiments in simulated and real-world scenarios demonstrate that our method achieves faster planning, higher success rates, and superior path and trajectory quality, while exhibiting strong generalization across diverse environments.
Residual Reinforcement Learning (RL) is a popular approach for adapting pretrained policies by learning a lightweight residual policy that provides corrective actions. While Residual RL is more sample-efficient than fine-tuning the entire base policy, existing methods struggle with sparse rewards and are designed for deterministic base policies. We propose two improvements to Residual RL that further enhance its sample efficiency and make it suitable for stochastic base policies. First, we leverage uncertainty estimates of the base policy to focus exploration on regions in which the base policy is not confident. Second, we propose a simple modification to off-policy residual learning that allows the residual policy to observe base actions and better handle stochastic base policies. We evaluate our method with both Gaussian-based and diffusion-based stochastic base policies on tasks from Robosuite and D4RL, and compare against state-of-the-art fine-tuning methods, demo-augmented RL methods, and other Residual RL methods. Our algorithm significantly outperforms existing baselines in a variety of simulation benchmark environments. We also deploy our learned policies in the real world to demonstrate their robustness with zero-shot sim-to-real transfer.
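A minimal sketch of the two proposed changes as described, assuming a Gaussian base policy that exposes its mean and standard deviation: exploration noise on the residual side is scaled by the base policy's uncertainty, and the residual conditions on the sampled base action. The interfaces and the particular scaling rule are assumptions, not the paper's exact algorithm.

import numpy as np

def act(state, base_policy, residual_policy, rng):
    """Compose base and residual actions; exploration noise on the
    residual side is scaled by the base policy's own uncertainty."""
    a_base, std_base = base_policy(state)    # stochastic base: mean, std
    # the residual observes the sampled base action, so off-policy
    # learning can account for the base policy's stochasticity
    a_res = residual_policy(np.concatenate([state, a_base]))
    noise = rng.normal(0.0, std_base)        # explore more where the
    return a_base + a_res + noise            # base policy is unsure

# toy stand-ins for the two policies
rng = np.random.default_rng(0)
base = lambda s: (np.zeros(2), np.full(2, 0.1))   # Gaussian base policy
res = lambda x: 0.05 * np.tanh(x[-2:])            # small residual net
action = act(np.ones(4), base, res, rng)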
Monocular visual odometry (VO) is accurate in controlled settings yet drifts sharply under aggressive motion and sensor noise. We offer a fundamental rethinking of VO robustness as a training-schedule problem rather than an architectural challenge, introducing a novel dual-paradigm curriculum learning framework that operates at both trajectory and loss-component levels. (i) A motion-based curriculum orders trajectories by measured motion complexity. (ii) A hierarchical component curriculum adaptively re-weights optical-flow, pose, and rotation losses via Self-Paced and in-training Reinforcement Learning (RL) schedulers. Integrated into an unmodified DPVO baseline, these strategies cut TartanAir ATE by 33% with only 31% extra training wall-time, and reach baseline accuracy 47% faster (Self-Paced). Without fine-tuning, the same models improve zero-shot performance on EuRoC (13% ATE reduction), TUM-RGBD (9%; 46% on dynamic scenes), KITTI (21%), and ICL-NUIM (32%). We show that explicit difficulty progression or adaptive loss weighting provides a practical, zero-inference-overhead path to robust monocular VO and could extend to other geometric vision tasks.
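As one concrete instance of the loss-component curriculum, here is a classic hard self-paced weighting rule; the threshold schedule and loss values are illustrative, and the paper's RL-based scheduler is not shown.

import numpy as np

def self_paced_weights(losses, lam):
    """Hard self-paced weighting: a loss component participates only if
    it is currently 'easy enough' (< lam); lam grows during training."""
    return (np.asarray(losses) < lam).astype(float)

# toy schedule over flow / pose / rotation losses (values illustrative)
components = {"flow": 0.4, "pose": 1.2, "rotation": 2.5}
for epoch, lam in enumerate([0.5, 1.5, 3.0]):
    w = self_paced_weights(list(components.values()), lam)
    total = float(np.dot(w, list(components.values())))
    print(f"epoch {epoch}: weights {w}, weighted loss {total:.2f}")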
Robots executing iterative tasks in complex, uncertain environments require control strategies that balance robustness, safety, and high performance. This letter introduces a safe information-theoretic learning model predictive control (SIT-LMPC) algorithm for iterative tasks. Specifically, we design an iterative control framework based on an information-theoretic model predictive control algorithm to address a constrained infinite-horizon optimal control problem for discrete-time nonlinear stochastic systems. An adaptive penalty method is developed to ensure safety while balancing optimality. Trajectories from previous iterations are utilized to learn a value function using normalizing flows, which enables richer uncertainty modeling compared to Gaussian priors. SIT-LMPC is designed for highly parallel execution on graphics processing units, allowing efficient real-time optimization. Benchmark simulations and hardware experiments demonstrate that SIT-LMPC iteratively improves system performance while robustly satisfying system constraints.
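For readers unfamiliar with the information-theoretic MPC family, the sketch below shows one MPPI-style update with a fixed soft penalty on a state constraint; the dynamics, costs, and penalty weight are toy assumptions, and the paper's normalizing-flow value function, adaptive penalty, and GPU parallelism are omitted.

import numpy as np

def mppi_step(u_nom, x0, dynamics, cost, lam=1.0, K=256, sigma=0.5,
              rng=np.random.default_rng(0)):
    """One information-theoretic MPC update: roll out K perturbed
    control sequences, weight them by exp(-cost/lam), then average."""
    T = len(u_nom)
    eps = rng.normal(0.0, sigma, size=(K, T))
    S = np.empty(K)
    for k in range(K):
        x, S[k] = x0, 0.0
        for t in range(T):
            x = dynamics(x, u_nom[t] + eps[k, t])
            S[k] += cost(x)
    w = np.exp(-(S - S.min()) / lam)
    w /= w.sum()
    return u_nom + w @ eps        # importance-weighted control update

# toy 1-D double integrator driven to the origin, with a soft penalty
# (weight rho) on violating the state constraint x[0] <= 1.2
dyn = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])
rho = 100.0
cost = lambda x: x[0] ** 2 + 0.1 * x[1] ** 2 + rho * max(0.0, x[0] - 1.2) ** 2
u = np.zeros(20)
for _ in range(10):
    u = mppi_step(u, np.array([1.0, 0.0]), dyn, cost)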
This letter presents empirical research on the non-reproducibility of light detection and ranging (LiDAR)-inertial odometry (LIO) systems. Although the LIO community has made commendable efforts toward reproducible localization accuracy, noteworthy non-reproducibility remains, hindering a fair evaluation of method effectiveness. To better understand this non-reproducibility, we first define it and introduce a quantitative criterion for identifying noteworthy non-reproducibility. We then identify five significant non-deterministic implementations found in state-of-the-art LIO systems and present solutions for converting these non-deterministic implementations into deterministic ones. A general procedure is also introduced to identify and pinpoint non-deterministic implementations, regardless of whether they are covered in this letter. Extensive experiments demonstrate that the non-deterministic implementations are the major, and potentially sole, causes of non-reproducibility under constant experimental conditions. Additionally, the non-reproducibility is noteworthy in datasets obtained from low-vertical-resolution LiDARs or recorded in geometrically degenerate scenes.
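One family of non-determinism such systems can contain is unordered parallel reductions: floating-point addition is not associative, so accumulating the same residuals in a different order changes the result. The snippet below demonstrates the effect and the fixed-order fix; it is a generic illustration, not one of the letter's five specific implementations.

import numpy as np

# Floating-point addition is not associative, so a parallel reduction
# that accumulates residuals in a thread-dependent order can change
# the result from run to run; fixing the order restores determinism.
rng = np.random.default_rng(42)
residuals = rng.normal(0.0, 1.0, 1_000_000).astype(np.float32)

order_a = np.arange(residuals.size)         # one "thread schedule"
order_b = rng.permutation(residuals.size)   # a different schedule

sum_a = np.sum(residuals[order_a], dtype=np.float32)
sum_b = np.sum(residuals[order_b], dtype=np.float32)
print(sum_a == sum_b)   # typically False: same data, different result

# deterministic variant: always reduce in one fixed (e.g., sorted) order
sum_det = np.sum(np.sort(residuals), dtype=np.float32)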
Recent foundation models demonstrate strong generalization capabilities in monocular depth estimation. However, directly applying these models to Full Surround Monocular Depth Estimation (FSMDE) presents two major challenges: (1) high computational cost, which limits real-time performance, and (2) difficulty in estimating metric-scale depth, as these models are typically trained to predict only relative depth. To address these limitations, we propose a novel knowledge distillation strategy that transfers robust depth knowledge from a foundation model to a lightweight FSMDE network. Our approach leverages a hybrid regression framework that combines a knowledge distillation scheme, traditionally used in classification, with a depth binning module to enhance scale consistency. Specifically, we introduce a cross-interaction knowledge distillation scheme that distills the scale-invariant depth bin probabilities of a foundation model into the student network while guiding it to infer metric-scale depth bin centers from ground-truth depth. Furthermore, we propose view-relational knowledge distillation, which encodes structural relationships among adjacent camera views and transfers them to enhance cross-view depth consistency. Experiments on DDAD and nuScenes demonstrate the effectiveness of our method compared to conventional supervised methods and existing knowledge distillation approaches. Moreover, our method achieves a favorable trade-off between performance and efficiency, meeting real-time requirements.
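A minimal sketch of the bin-probability distillation idea as described: the student's bin probabilities are pulled toward the teacher's scale-invariant ones with a KL term, while metric depth, recovered as the probability-weighted sum of learned bin centers, is supervised by ground truth. Shapes, loss weights, and the L1 regression term are assumptions; the cross-interaction and view-relational components are not shown.

import torch
import torch.nn.functional as F

def bin_kd_losses(student_logits, student_centers, teacher_probs, gt_depth):
    """KL-distill scale-invariant bin probabilities from the teacher,
    while supervising metric depth through the student's bin centers.
    Shapes: logits/probs (B, N, H, W), centers (B, N), depth (B, H, W)."""
    log_p = F.log_softmax(student_logits, dim=1)
    kd = F.kl_div(log_p, teacher_probs, reduction="batchmean")
    # metric depth as the probability-weighted sum of learned bin centers
    depth = (log_p.exp() * student_centers[..., None, None]).sum(dim=1)
    reg = F.l1_loss(depth, gt_depth)
    return kd + reg

B, N, H, W = 2, 64, 24, 48
loss = bin_kd_losses(torch.randn(B, N, H, W),
                     torch.rand(B, N).cumsum(dim=1),   # monotone centers
                     torch.softmax(torch.randn(B, N, H, W), dim=1),
                     torch.rand(B, H, W) * 50.0)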
This letter presents a data-driven control optimization framework for flexible joint robots (FJRs) based on frequency response function (FRF) data, enabling automated controller synthesis without explicit model identification. Unlike conventional model-based approaches that rely on accurate parameter estimation, the proposed method directly utilizes measured FRF data and formulates the controller design as a convex optimization problem. The controller maximizes control bandwidth while ensuring stability across a wide range of configurations. Experimental validation on an FJR demonstrates superior tracking accuracy, vibration suppression, and robustness compared to model-based methods. Furthermore, a high-speed drumming task demonstrates the controller's ability to handle repeated impacts and inertia variations, highlighting the potential of FRF-based control for fast and precise operation of flexible robotic systems.
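To illustrate the data-driven flavor, the sketch below tunes a PI controller directly on FRF samples, keeping the highest-gain candidate whose sensitivity peak stays within a bound over the measured grid; this brute-force search merely stands in for the letter's convex program, and the plant FRF, bound, and gain grid are invented for illustration.

import numpy as np

# Tune PI gains directly on measured FRF samples P(jw_k): keep the
# highest-gain candidate (a crude bandwidth proxy) whose sensitivity
# peak stays below M_s at every measured frequency.
w = np.logspace(-1, 2, 400)
s = 1j * w
P = 1.0 / (s ** 2 + 0.2 * s + 1.0)   # synthetic "measured" FRF
M_s = 1.6                            # sensitivity-peak bound

best = None
for kp in np.linspace(0.1, 20.0, 60):
    for ki in np.linspace(0.1, 20.0, 60):
        C = kp + ki / s              # PI controller on the FRF grid
        S = 1.0 / (1.0 + P * C)      # sensitivity computed from data only
        if np.max(np.abs(S)) <= M_s and (best is None or kp > best[0]):
            best = (kp, ki)
print("selected PI gains:", best)

No parametric plant model is ever fit: stability margins are checked pointwise on the measured frequency grid, which is the core appeal of FRF-based synthesis.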