Last week, Science Robotics, the robotics journal of the Science family, ran a cover article on SRT-H (Hierarchical Surgical Robot Transformer), a surgical robot from a Johns Hopkins University team that autonomously completed the key steps of 8 cholecystectomies with a 100% success rate!
One netizen asked: "Are surgeons about to be replaced?"
A surgeon replied: no need to panic...

Notably, SRT-H's hardware is not new. It is built on the da Vinci Research Kit (dVRK) Si, a dual-arm research system adapted from Intuitive Surgical's da Vinci Si surgical robot.
Its core breakthrough lies in the software and AI architecture: a language-guided imitation learning framework lets it "learn" the procedure from videos of human surgery, and a hierarchical control system lets it make decisions on its own. Put simply, it acts like a "robot surgeon" that can not only understand the surgical scene but also plan and execute actions by itself.

"Surgical thinking" driven by hierarchical control and language
The difficulties this research faced came "layer upon layer".
First, real surgery involves huge variation in appearance, anatomy and morphology, such as the diameter, length and spacing of the cystic duct and artery, which is a severe test of the robot's perception.
So how does SRT-H overcome these difficulties? Its core innovation is the combination of a hierarchical architecture with language-conditioned imitation learning.

Its high-level (HL) control module acts as the "commander". Built on a large Transformer model, the same family of architecture behind ChatGPT, it generates task instructions (such as "clip the cystic duct") and corrective instructions (such as "move the left arm to the left and re-clip") in natural language. These instructions not only plan the surgical steps but also correct errors in real time when the low level makes a mistake during execution. The low-level (LL) control module acts as the "executor": on receiving a language instruction, it generates a specific motion trajectory and drives the robotic arms through fine operations such as grasping, clipping and cutting.
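The commander/executor split can be pictured as a simple loop: at every step the high level looks at the camera images and emits an instruction in words, and the low level turns those words into motion. The sketch below is a toy illustration under stated assumptions, not the team's implementation: `run_episode`, `Instruction` and the frame/waypoint types are hypothetical names, and scripted lambdas stand in for the learned Transformer policies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: a frame is an RGB image (rows of pixels) and a
# low-level action is a short list of (x, y, z) tool-tip waypoints.
Frame = List[List[Tuple[int, int, int]]]
Waypoint = Tuple[float, float, float]


@dataclass
class Instruction:
    text: str            # e.g. "clip the cystic duct"
    is_correction: bool  # True for corrective instructions such as "move left"


def run_episode(
    high_level: Callable[[List[Frame]], Optional[Instruction]],
    low_level: Callable[[List[Frame], str], List[Waypoint]],
    observe: Callable[[], List[Frame]],
    execute: Callable[[List[Waypoint]], None],
    max_steps: int = 100,
) -> List[Instruction]:
    """Alternate HL and LL until the high level returns None (task finished)."""
    issued: List[Instruction] = []
    for _ in range(max_steps):
        frames = observe()                # current camera images
        instruction = high_level(frames)  # HL "commander": what to do next, in words
        if instruction is None:
            break
        issued.append(instruction)
        trajectory = low_level(frames, instruction.text)  # LL "executor": words -> motion
        execute(trajectory)
    return issued


# Toy usage: scripted lambdas replace the learned policies.
script = iter([
    Instruction("grab the gallbladder", False),
    Instruction("move the left arm to the left", True),
    None,
])
issued = run_episode(
    high_level=lambda frames: next(script),
    low_level=lambda frames, text: [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    observe=lambda: [],
    execute=lambda trajectory: None,
)
print([i.text for i in issued])  # ['grab the gallbladder', 'move the left arm to the left']
```

Because the high level re-reads the scene before every instruction, a correction is just another instruction issued when the images show something went wrong; the low level never needs to know whether it is doing a planned step or a fix.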
The cleverness of this design is that language serves as a bridge: it is not only the medium between the high and low levels but also an intuitive interface for human intervention. Doctors can adjust the operation in natural language at any time, and the system "remembers" these interventions through DAgger-style (Dataset Aggregation) loops, using them for continuous learning and improvement.
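The idea of "remembering" interventions can be sketched as a DAgger-style round: run the current policy, have an expert relabel the situations it actually ran into, add those labels to the training set, and retrain. Classic DAgger asks the expert to label every visited state; the toy below only records states where the expert disagrees, a simplification in the spirit of human-gated variants. Everything here is hypothetical: observations are strings rather than images, "training" is a lookup table, and `surgeon`, `train` and `dagger_round` are illustrative names, not the paper's code.

```python
from typing import Callable, List, Optional, Tuple

Obs = str                  # hypothetical stand-in for camera frames
Example = Tuple[Obs, str]  # (observation, instruction label)


def train(dataset: List[Example]) -> Callable[[Obs], str]:
    """Toy 'training': memorize the latest label seen for each observation."""
    table = dict(dataset)  # later examples override earlier ones
    return lambda obs: table.get(obs, "unknown")


def dagger_round(
    dataset: List[Example],
    policy: Callable[[Obs], str],
    expert: Callable[[Obs, str], Optional[str]],
    visited: List[Obs],
) -> List[Example]:
    """Roll out the policy, collect expert relabels on visited states, aggregate."""
    new_examples = []
    for obs in visited:
        fix = expert(obs, policy(obs))
        if fix is not None:  # the expert only speaks up when the policy is wrong
            new_examples.append((obs, fix))
    return dataset + new_examples  # aggregate: old data is kept, not replaced


def surgeon(obs: Obs, predicted: str) -> Optional[str]:
    # Hypothetical instructions a human would give in natural language.
    correct = {"duct exposed": "clip the cystic duct",
               "clip misplaced": "remove the clip and re-clip lower"}
    return None if predicted == correct[obs] else correct[obs]


dataset = [("duct exposed", "clip the cystic duct")]
policy = train(dataset)
dataset = dagger_round(dataset, policy, surgeon, ["duct exposed", "clip misplaced"])
policy = train(dataset)
print(len(dataset), policy("clip misplaced"))  # 2 remove the clip and re-clip lower
```

The key design point is aggregation: corrections are appended to, not swapped in for, the original demonstrations, so the retrained policy keeps what it already knew while learning the new situation.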
Even more notable, SRT-H is trained solely on ordinary RGB images, with no depth sensors or complex segmentation modules, which greatly lowers the hardware barrier to clinical translation.
How does SRT-H perform?
SRT-H autonomously completed all 17 tasks on 8 previously unseen ex vivo pig gallbladders, with a 100% success rate. An average procedure took about 5 minutes 17 seconds, and on average the robot automatically corrected about 6 errors along the way.
Compared with ablated variants, such as versions trained without high-level DAgger data or without wrist cameras, SRT-H performed significantly better in both success rate and error recovery, confirming the importance of the hierarchical design and the wrist cameras.

How does it achieve precise operation across 17 steps?
In the cholecystectomy experiment, SRT-H performed the most critical stage of the operation: isolating the gallbladder and removing it from the liver. This stage comprises 17 consecutive steps, each of which tests the precision of its autonomous decision-making:
Identifying anatomical structures: using its onboard camera system, the robot automatically locates the boundary between the gallbladder and the liver, the cystic duct, the artery and other key structures from differences in tissue color and shape. Even when duct diameter and arterial spacing vary significantly between samples, it identifies them accurately.
Precise grasping and retraction: following surgical protocol, the robotic arm uses grasping forceps to hold the bottom of the gallbladder and lift it, putting the tissue under tension to expose the ducts and vessels. If the initial grasp is off target, the robot automatically adjusts and repositions to ensure stable traction. This step requires fine coordination between the two arms, much like a human surgeon's "left hand holds, right hand operates" approach.
Closing vessels and ducts: the robot must place 6 titanium clips at specific locations on the cystic duct and artery, clipping each correct position in turn as surgical protocol requires. Every clip is visually verified after placement and adjusted immediately if it is off, ensuring no misplaced clips and no omissions.
Cutting and removing the gallbladder: once clipping is complete, the robot uses micro-scissors to separate the gallbladder from the liver along the planned incision line, monitoring surrounding tissue in real time to avoid damaging adjacent structures such as the liver. In the end, the gallbladder is completely freed, and the entire process requires no manual intervention.
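The "act, verify visually, retry if off" pattern described in these steps can be sketched as a simple sequencer. This is an illustration under assumptions: the step list below is a coarse, hypothetical grouping of the phases named above (it is not the paper's actual list of 17 subtasks), and `attempt` stands in for executing a step and checking the result in the camera image.

```python
from typing import Callable, Dict, List


def run_procedure(
    steps: List[str],
    attempt: Callable[[str, int], bool],
    max_retries: int = 3,
) -> Dict[str, int]:
    """Run steps in order; a step passes only when its (hypothetical) visual
    check succeeds. Returns how many attempts each step needed."""
    attempts_used: Dict[str, int] = {}
    for step in steps:
        for n in range(1, max_retries + 1):
            if attempt(step, n):  # True means the visual check passed
                attempts_used[step] = n
                break
        else:  # no break: every retry failed, so stop rather than continue blindly
            raise RuntimeError(f"step failed after {max_retries} attempts: {step}")
    return attempts_used


# Illustrative phases only, with one entry per titanium clip.
steps = (["identify anatomy", "grasp and retract gallbladder"]
         + [f"place clip {i}" for i in range(1, 7)]
         + ["cut and remove gallbladder"])


def toy_attempt(step: str, n: int) -> bool:
    # Pretend clip 3 lands off target once and is corrected on the retry.
    return not (step == "place clip 3" and n == 1)


used = run_procedure(steps, toy_attempt)
print(used["place clip 3"], sum(used.values()))  # 2 10
```

Stopping with an error when retries run out, instead of moving on, mirrors the "no misplaced clips and no omissions" requirement: a later step should never start on top of an unverified earlier one.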

Compared with human experts (about 12-19 seconds per subtask), SRT-H is 30%-60% slower, but its trajectories are shorter, with less jitter and smoother movements. This suggests it has distinctive advantages in precision and consistency, and it may become a surgeon's right-hand assistant in the future, especially where operations are highly repetitive or staff are in short supply.
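"Shorter trajectory" and "less jitter" can be made concrete with two common motion metrics: total path length and mean squared jerk (the third time-derivative of position, where lower means smoother). The article does not say exactly which metrics the researchers used, so treat the choice below as an assumption; the two paths are made-up examples with the same endpoints, sampled once per time step.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def path_length(traj: List[Point]) -> float:
    """Total distance travelled between consecutive tool-tip samples."""
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))


def mean_squared_jerk(traj: List[Point], dt: float) -> float:
    """Mean squared jerk via third-order finite differences; lower = smoother."""
    values = []
    for i in range(len(traj) - 3):
        p0, p1, p2, p3 = traj[i:i + 4]
        jerk = [(p3[k] - 3 * p2[k] + 3 * p1[k] - p0[k]) / dt ** 3 for k in range(3)]
        values.append(sum(c * c for c in jerk))
    return sum(values) / len(values) if values else 0.0


# Two hypothetical paths with the same endpoints.
smooth = [(float(i), 0.0, 0.0) for i in range(11)]
jittery = [(float(i), 0.5 if i % 2 else 0.0, 0.0) for i in range(11)]
for name, traj in [("smooth", smooth), ("jittery", jittery)]:
    print(name, round(path_length(traj), 2), mean_squared_jerk(traj, dt=1.0))
# smooth 10.0 0.0
# jittery 11.18 4.0
```

The zig-zag path covers about 12% more distance and has nonzero jerk even though it ends in the same place, which is the kind of difference the comparison with human experts describes.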
How far is it from clinical use?
Close in one sense, still far off in another.
So far, SRT-H has only been tested on ex vivo gallbladders, meaning tissue removed from the body: no blood flow, no breathing, no organ movement. It is the equivalent of operating on a "static model".
Operating on living patients is much harder: organs move with breathing and heartbeat, sudden bleeding can obscure the view, and a patient's anatomy may look completely different from the "textbook" (for example, a cystic duct that winds around the artery).
Why start with cholecystectomy? Because the gallbladder's appearance is relatively regular and the steps are fixed. In an appendectomy, by contrast, it sometimes takes several hours just to find the appendix, which the robot cannot yet handle. For now it performs only the "clip and cut" stage and has not yet attempted more complex steps such as dissecting the gallbladder free or stopping bleeding.
Even once it can perform every step, two hurdles remain. One is whether it can handle emergencies, such as sudden heavy bleeding; the other is ethics and trust. If the robot cuts in the wrong place, who is responsible? It is like today's intelligent driving: very impressive on paper, but would you dare to hand over full control?
Still, the research team is optimistic: as the technology matures, small-scale human trials may be possible within 10 years. In the short term, autonomous robots will remain in the lab, and mainstream clinical practice will still be human-controlled, robot-assisted surgery; in the long run, their adaptability and safety should gradually improve.
Back to the original question: will robots replace surgeons?
For now, a "collaboration model" looks more likely.
Today's robots can already reduce the number of assistants a surgeon needs; in the future, robots may take over all the auxiliary work, such as retracting tissue, exposing the field of view and clamping, leaving doctors to concentrate on key decisions.
The assistant's role may be replaced, but the surgeon only becomes more important: in the end, deciding whether to cut still rests with people. It is like today's kitchens: helpers may be replaced by machines, but the head chef at the stove will always be the soul of the kitchen.