A truly capable AI agent needs both intelligence and the ability to act: it has to understand deeply what the user wants and then actually get things done. That's how Yuan Yuan, dean of Alibaba Research Institute, described the ideal form of AI agents during a recent seminar on "AI Agent Behavior Safety and Development."

She stressed that while AI agents hold huge promise for turning AI tech into real money and dramatically improving user experience, their rollout shouldn't upend existing rules, governance, or business ecosystems. Instead, everything should happen in a safe, controllable way, with deep collaboration across the industry so that phone makers and app developers can both share in the benefits of advanced models.
Over the past year, the big question in the market has been: "Can AI really handle tasks for us?" We've seen different approaches emerge.
Some products try to "take over" phone operations by understanding what's on the screen and mimicking human taps and swipes to do things like edit videos, book tickets, or order food across different apps. These are usually called GUI Agents (short for Graphical User Interface Agents).
Technically, they work by having the AI "see" the screen, figure it out, and then simulate clicks and gestures. But this immediately raises a bunch of tricky issues: Who gives it permission? Who's responsible if it messes up? What services can it access? And who keeps it in check?
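The see, figure out, and simulate cycle described above can be sketched as a simple loop. This is a minimal illustration only, not how any particular product works: `Action`, `run_gui_agent`, and the three callables (`capture_screen`, `decide`, `perform`) are hypothetical names, and a real agent would put a vision model and an OS-level input service behind them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "tap", "swipe", or "done" (hypothetical action kinds)
    target: str = ""   # hypothetical label of the on-screen element


def run_gui_agent(
    capture_screen: Callable[[], str],
    decide: Callable[[str, str], Action],
    perform: Callable[[Action], None],
    goal: str,
    max_steps: int = 10,
) -> List[Action]:
    """Observe the screen, choose an action, simulate it, and repeat."""
    history: List[Action] = []
    for _ in range(max_steps):          # hard cap so the agent cannot loop forever
        screen = capture_screen()       # 1. "see" the screen
        action = decide(screen, goal)   # 2. figure out the next step
        if action.kind == "done":
            break
        perform(action)                 # 3. simulate the tap or swipe
        history.append(action)
    return history
```

The returned history is a record of what the agent did, but nothing in the loop itself settles who authorized those actions or who answers for a mistake, which is where the questions above come in.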
At the seminar, experts noted that this screen-based approach has real short-term value: it lets AI agents jump into everyday use without forcing big changes to existing apps. But in the long run, it has built-in limits around reliability, speed, and how well it can be governed. Many see it as more of a temporary bridge than the endgame.
Jiao Haitao, a professor at China University of Political Science and Law, argued that permissions for AI agents should be handled scenario by scenario. Critical actions need a second confirmation from the user, and things involving personal attributes, subjective decisions, or social interactions shouldn't be delegated at all. He pointed out the challenges with double-authorization rules, noting that not every scenario should rely on a third-party platform alone, and suggested the industry work together through negotiation and standards to sort this out gradually.
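One way to picture the tiered permissions Jiao describes is a small lookup that sorts each requested action into one of three outcomes. This is a rough sketch under stated assumptions: the category labels, the examples of "critical" actions, and the default of allowing anything unlisted are all illustrative choices, not something proposed at the seminar.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "ask the user to confirm again"
    REFUSE = "not delegable"


# Hypothetical labels for the three kinds of matters that should stay with the user.
NON_DELEGABLE = {"personal_attributes", "subjective_decision", "social_interaction"}

# Illustrative examples of critical actions; the real list would come from
# industry negotiation and standards.
CRITICAL = {"payment", "account_change"}


def authorize(category: str) -> Decision:
    """Map an action category to an authorization outcome."""
    if category in NON_DELEGABLE:
        return Decision.REFUSE
    if category in CRITICAL:
        return Decision.CONFIRM
    # Assumption: anything not listed is treated as routine.
    return Decision.ALLOW
```

For example, `authorize("payment")` yields `Decision.CONFIRM`, while `authorize("social_interaction")` yields `Decision.REFUSE`. A more cautious design could default unlisted categories to `CONFIRM` instead.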
Phone manufacturers are actively exploring their own paths for AI agents.
In a recent media briefing, Jiang Yuchen, OPPO's director of smart product R&D for ColorOS, told reporters that products like the Doubao phone have had a positive impact on the industry and ecosystem by pushing things forward. But she was clear: "It's not the final form of an AI phone; it's still basically a method of operating the old GUI interfaces."
For OPPO, whether to take the GUI route isn't a matter of ideology; it's first and foremost an engineering and scale problem. Jiang explained that something like Doubao can afford to be more aggressive as an engineering prototype, but big phone makers deal with massive user bases. "If you launch a feature and the next day most services stop working properly, that's a quality incident for us; we can't accept that." At that scale, the impact of any unstable system-level capability gets magnified quickly.
In her view, screen-operation-based agents are "somewhat of an intermediate stage." The mainstream path ahead will lean more toward A2A (Agent-to-Agent) collaboration, where AI agents talk directly to each other and work together.
She emphasized that in this evolution, what really matters for phone makers isn't model size or parameter count but their deep, long-term understanding of users. "We don't think the large model is the soul of the phone," Jiang said. "We believe 'memory' is the soul. Once your phone truly understands you, it's really hard to switch to something else."
At the seminar, several experts agreed that the real challenge for AI agents isn't just "can it get the job done?" but also defining clear boundaries for what they can do and how they're managed.
Yuan Yuan noted that the current GUI wave has created a healthy "catfish effect," shaking up and energizing the whole industry. But she urged China's AI sector not to get stuck on the GUI path. Instead, build on it to find better routes that balance safety and progress. She pointed to Apple's approach as a good example: It sets up open API-based collaboration between agents and apps, while using screen awareness to keep safety boundaries intact and precisely pass user intent to apps, making them smarter at carrying out commands.
Wang Yue, deputy director of the Information Systems Institute at Tsinghua University's Department of Electronic Engineering, sees AI agents as a major turning point: AI systems are starting to directly interact with the outside world. This won't just change how information systems are built; it could reshape economic operations too. He warned that truly disruptive innovation risks weakening system manageability and eroding trust foundations, so we need better authorization mechanisms and A2A checks and balances, eventually moving toward market-driven, competition-based credibility systems.
Overall, the conversation is shifting from whether AI can act on phones to how to do it responsibly and sustainably, and A2A-style collaboration between agents looks like a promising way to get there without breaking everything else.
