Science sub-journal cover story: a robot takes the lead in surgery, with zero failures across 8 operations!

  

  Last week, the cover article of Science Robotics, the robotics journal in the Science family, reported a major result: SRT-H (Hierarchical Surgical Robot Transformer), a surgical robot system from a Johns Hopkins University team, autonomously completed the key steps of 8 cholecystectomies with a 100% success rate!

  One commenter online asked, "Are surgeons about to be replaced?"

  Surgeons' reply: no need to panic...


  

  Notably, SRT-H is not new at the hardware level: it runs on a da Vinci Research Kit (dVRK) Si dual-arm system, which is adapted from Intuitive Surgical's da Vinci Si surgical robot.

  Its core breakthrough lies in the software and AI architecture. Through a language-guided imitation learning framework, it "learns" the procedure from videos of human surgery, and it relies on a hierarchical control system to make decisions on its own. Put simply, it behaves like a "robot surgeon" that can not only understand the surgical scene but also plan and execute actions autonomously.


  

  "Surgical thinking" driven by hierarchical control and language

  The difficulties this research faced come in layers.

  First, real surgery varies enormously in appearance, anatomy and morphology, for example in the diameter, length and spacing of the bile duct and artery, which severely tests the robot's perception.

  So how does SRT-H overcome these difficulties? Its core innovation is the combination of a hierarchical architecture with language-conditioned imitation learning.


  

  Its high-level (HL) control module acts as the "commander". Built on a Transformer model, the same kind of architecture that underlies ChatGPT, it generates task instructions (such as "clip the cystic duct") and corrective instructions (such as "re-clip" or "move the robot arm left") in natural language. These instructions both plan the surgical steps and correct mistakes in real time when low-level execution goes wrong. The low-level (LL) control module is the "executor": on receiving a language instruction, it generates a specific motion trajectory and drives the robotic arms through fine manipulations such as grasping, clipping and cutting.
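
  The division of labor between the two levels can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SRT-H's actual code: the class name, the observation flags, the motion primitives and both rule-based "models" are hypothetical placeholders for what are learned networks in the real system. The point it shows is that the only thing passed from the high level to the low level is a sentence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a dict of flags for the camera observation and
# a list of named motion primitives for the motion trajectory.
Observation = Dict[str, bool]
Trajectory = List[str]


@dataclass
class HierarchicalPolicy:
    """Two-level controller whose levels communicate only through text."""
    high_level: Callable[[Observation], str]
    low_level: Callable[[Observation, str], Trajectory]

    def step(self, obs: Observation) -> Tuple[str, Trajectory]:
        instruction = self.high_level(obs)            # plan or correct, in words
        return instruction, self.low_level(obs, instruction)


def toy_high_level(obs: Observation) -> str:
    """Rule-based placeholder for the learned Transformer 'commander'."""
    if obs.get("arm_too_far_right", False):
        return "move the robot arm left"              # corrective instruction
    if not obs.get("clip_applied", False):
        return "clip the cystic duct"                 # task instruction
    return "done"


def toy_low_level(obs: Observation, instruction: str) -> Trajectory:
    """Lookup-table placeholder for the learned 'executor'."""
    primitives = {
        "move the robot arm left": ["shift_left"],
        "clip the cystic duct": ["approach", "close_clip", "retract"],
    }
    return primitives.get(instruction, [])


policy = HierarchicalPolicy(toy_high_level, toy_low_level)
print(policy.step({"arm_too_far_right": True}))
print(policy.step({"clip_applied": False}))
```

  Because the interface is plain text, a human could in principle replace the output of `toy_high_level` with a typed or spoken instruction without touching the low level.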

  The clever part of this design is language's role as a bridge: it is not only the medium through which the high and low levels communicate, but also an intuitive interface for human intervention. Surgeons can adjust the operation in natural language at any time, and the system "remembers" these interventions through a DAgger-style loop, using them for continued learning and optimization.

  More notable still, SRT-H is trained on ordinary RGB images alone, with no depth sensors or complex segmentation modules, which greatly lowers the hardware barrier to clinical translation.
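
  For readers unfamiliar with DAgger (dataset aggregation), the toy loop below shows the general idea: let the current policy act, have an expert label the states the policy actually reached, add those labels to the dataset, and retrain. It is a sketch of the generic technique, not of SRT-H's training pipeline; the five-state "procedure", the action names and the memorizing `train` function are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

GOAL = 4  # hypothetical: states 0..4 stand for progress through a procedure


def expert(state: int) -> str:
    """Stand-in for the human supervisor's corrective label."""
    return "advance" if state < GOAL else "hold"


def train(dataset: List[Tuple[int, str]]) -> Dict[int, str]:
    """Toy 'training': memorize the latest label seen for each state."""
    return {state: action for state, action in dataset}


def rollout(policy: Dict[int, str], horizon: int = 6) -> List[int]:
    """Run the CURRENT policy and record the states it actually visits."""
    state, visited = 0, []
    for _ in range(horizon):
        visited.append(state)
        if policy.get(state, "hold") == "advance":   # unknown state: do nothing
            state = min(state + 1, GOAL)
    return visited


def dagger(initial_data: List[Tuple[int, str]], rounds: int):
    dataset = list(initial_data)
    policy = train(dataset)
    for _ in range(rounds):
        visited = rollout(policy)                           # learner acts
        dataset.extend((s, expert(s)) for s in visited)     # expert labels
        policy = train(dataset)                             # retrain on the aggregate
    return policy, dataset


# The initial demonstration only covers the first state.
policy, dataset = dagger([(0, "advance")], rounds=4)
print(policy)
```

  Each round the policy gets one state further before stalling, the expert labels the stalled state, and the next policy no longer stalls there. That is the sense in which corrections are "remembered".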

  How does SRT-H perform?

  SRT-H autonomously completed all 17 tasks on 8 previously unseen ex vivo pig gallbladders, a 100% success rate. Procedures averaged about 5 minutes 17 seconds, and the system automatically corrected about 6 errors. Variants lacking high-level DAgger data or wrist cameras performed significantly worse in both success rate and error recovery, which confirms the importance of the hierarchical design and the wrist cameras.



  

  How are the 17 steps executed accurately?

  In the cholecystectomy experiment, SRT-H performed the most critical stage of the operation, isolating the gallbladder and resecting it from the liver. This stage comprises 17 consecutive steps, each of which reflects the accuracy of its autonomous decision-making:

  Identifying anatomical structures: Using its built-in camera system, the robot automatically locates key structures such as the gallbladder-liver boundary, the cystic duct and the artery from differences in tissue color and morphology. Even when duct diameter and artery spacing differ significantly between samples, it identifies them accurately.

  Precise grasping and traction: Following surgical protocol, the robotic arm uses grasping forceps to hold the bottom of the gallbladder and lift it, putting the tissue under tension to expose the ducts and blood vessels. If the initial grasp is off target, the robot automatically adjusts and repositions to keep traction stable. This step requires fine coordination between the two arms, much like a human surgeon's "left hand holds, right hand operates".

  Closing vessels and ducts: The robot has to place 6 titanium clips at specific locations on the cystic duct and artery. It follows the surgical protocol strictly, clipping the correct positions in order. Every clip is visually verified after placement, and any deviation is corrected immediately, so that no clip is misplaced and none is missed.

  Cutting and removing the gallbladder: Once clipping is complete, the robot uses micro-scissors to separate the gallbladder from the liver along the planned incision line, monitoring the surrounding tissue in real time to avoid damaging adjacent structures such as the liver. In the end the gallbladder is fully freed, and the entire process requires no manual intervention.
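
  The "place, verify, correct" pattern described for the clips can be written as a simple retry loop. The sketch below is illustrative only: the one-dimensional positions in millimeters, the tolerance, the retry budget and the simulated misses are all hypothetical, and the real system verifies placement from camera images rather than from a known coordinate.

```python
from typing import Callable, List, Sequence, Tuple

TOLERANCE_MM = 0.5   # hypothetical acceptance threshold
MAX_ATTEMPTS = 3     # hypothetical retry budget per clip


def place_with_verification(
    targets: Sequence[float],
    attempt: Callable[[float, int], float],
    max_attempts: int = MAX_ATTEMPTS,
) -> List[Tuple[float, int]]:
    """Place clips in order; re-attempt any clip that fails verification.

    Returns (target, attempts_used) for every clip, so nothing is skipped.
    """
    log = []
    for target in targets:
        for attempt_no in range(1, max_attempts + 1):
            placed = attempt(target, attempt_no)
            if abs(placed - target) <= TOLERANCE_MM:    # verification step
                log.append((target, attempt_no))
                break
        else:
            # Never continue silently past an unverified clip.
            raise RuntimeError(f"clip at {target} mm could not be verified")
    return log


def toy_attempt(target: float, attempt_no: int) -> float:
    """Simulated placement: two of the six clips miss on the first try."""
    miss = 1.5 if (attempt_no == 1 and target in (10.0, 22.0)) else 0.0
    return target + miss


clip_targets = [4.0, 7.0, 10.0, 16.0, 19.0, 22.0]   # six hypothetical sites
log = place_with_verification(clip_targets, toy_attempt)
print(log)
```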


  

  Compared with human experts (about 12-19 seconds per subtask), SRT-H is 30%-60% slower at completing tasks, but its trajectories are shorter, with less jitter and smoother motion. This suggests distinctive advantages in precision and consistency, and it may become a capable assistant to surgeons in the future, especially for repetitive procedures or where staff are in short supply.
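
  To make "shorter trajectory" and "less jitter" concrete, here is one common way such quantities can be computed from a sampled tool-tip path. The article does not say which metrics the study used, so both definitions below (total path length, and mean second difference as a jitter proxy) are assumptions, and the two sample paths are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def path_length(points: Sequence[Point]) -> float:
    """Total distance travelled along the sampled path."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def mean_second_difference(points: Sequence[Point]) -> float:
    """Jitter proxy: average size of p[i+1] - 2*p[i] + p[i-1].

    It is zero for evenly sampled straight-line motion and grows as the
    path changes direction or speed from sample to sample.
    """
    if len(points) < 3:
        return 0.0
    norms = []
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        diff = [n - 2 * c + p for p, c, n in zip(prev, cur, nxt)]
        norms.append(math.sqrt(sum(d * d for d in diff)))
    return sum(norms) / len(norms)


# Two made-up paths between the same start and end x-coordinates.
straight: List[Point] = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
zigzag: List[Point] = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0)]

for name, path in (("straight", straight), ("zigzag", zigzag)):
    print(name, round(path_length(path), 3), mean_second_difference(path))
```

  On these inputs the straight path is both shorter and has zero jitter by this measure, while the zigzag path scores higher on both.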

  How far is it from clinical use?

  Close in some respects, far in others.

  So far, SRT-H has only practiced on ex vivo gallbladders, that is, tissue already removed from the body, with no blood flow, no breathing and no organ motion, which is like operating on a static model. Operating on a living patient brings far more trouble: organs move with breathing and the heartbeat, sudden bleeding can blur the field of view, and a patient's anatomy may look nothing like the textbook (for example, the cystic duct may wind around the artery).

  Why start with cholecystectomy? Because the gallbladder's appearance is relatively regular and the steps are fixed. Switch to an appendectomy, where just finding the appendix can sometimes take hours, and the robot cannot cope yet. For now it also only performs clipping and cutting; it has not practiced complex steps such as dissecting the gallbladder free or controlling bleeding.

  Even once it has practiced every step, two hurdles remain. One is whether it can handle unexpected events (such as sudden heavy bleeding). The other is ethics and trust: if the robot cuts in the wrong place, who is responsible? It is like today's assisted driving: very capable in theory, but would you dare to let go completely?

  Still, the research team is optimistic, predicting that as the technology matures, small-scale human trials could begin within 10 years. In the short term, autonomous robots will remain confined to the laboratory, and mainstream clinical practice will still be human-controlled, robot-assisted surgery; in the long run, their adaptability and safety should gradually improve.

  Back to the original question: will robots replace surgeons?

  For now, a collaborative model looks more likely.

  Today's robots can already help reduce the number of assistants a surgeon needs; in the future, robots may take over all the supporting work, such as holding retractors, exposing the field of view and clamping, while surgeons concentrate on the key decisions.

  The assistant's role may be replaced, but the surgeon becomes more important: after all, deciding whether or not to cut still depends on a person. It is like today's kitchens: the kitchen hands may be replaced by machines, but the head chef at the wok will always be the soul.