郭立 (leeguoo)

# Desktop Apps Are Actually Web Pages: Once You Realize This, Operating Them Is No Longer “Desktop Automation”

The first time I connected chrome-use to the Slack desktop app and got back a snapshot full of named buttons, I was taken aback. Then it clicked: the Slack desktop app is a web page to begin with. An Electron app is the Chromium rendering engine wrapped in a native shell, with a clearly structured DOM inside. This article explains why that turns the traditionally messy work of “desktop automation” into operating a web page, and covers one trade-off compared with automating a regular browser.

Jul 2, 2026


The first time I connected chrome-use to the Slack desktop app and snapshot returned a bunch of named buttons, I was taken aback: this was no different from operating a web page. Later it clicked, because the Slack desktop app is a web page to begin with.

An Electron app is the Chromium rendering engine plus a Node.js runtime, wrapped in a native window. The Slack, VS Code, Discord, Figma, and Notion you think of as “desktop software” all render their interfaces as HTML in Chromium; the native window is just a shell around it. Once you accept this, something that used to be very hard collapses into something simple: operating a desktop app reduces to operating a web page.

Illustration: lifting the shell off Electron apps such as Slack and VS Code reveals Chromium web pages inside. chrome-use connects through the CDP remote debugging port and locates buttons by element references such as @e1 and @e2 rather than by screenshot coordinates.

## How Awkward Desktop Automation Used to Be

First, let’s talk about how hard it was to make a program operate a desktop app before this path existed.

One approach was to take screenshots, find the pixel coordinates of a button, and click there. Change the resolution, switch the theme, or move the window, and the coordinates are all wrong. More importantly, the program does not “understand” what is on the interface; it only sees an image.

Another approach was to use the system accessibility APIs, such as the Accessibility API on macOS or UI Automation on Windows. These can retrieve the control tree, but each platform has its own API, and the information Electron apps expose through them is often incomplete, because their “controls” are actually divs, not native controls.

Both approaches are laborious, and both ignore the same fact: the app already contains a well-structured, semantic DOM, yet you are trying to reach it from the outside, through pixels or through a translation layer.

## Chromium Comes With a Door

Because the rendering engine is Chromium, Electron apps speak the Chrome DevTools Protocol (CDP) out of the box. Launch one with --remote-debugging-port and it opens a CDP port that exposes its internal DOM directly:

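```bash
# macOS: launch Slack with Chromium's remote debugging port open on 9222
open -a "Slack" --args --remote-debugging-port=9222
# attach chrome-use to that port
chrome-use connect 9222
# take a snapshot of the app's UI (named elements with @ref handles)
chrome-use snapshot -i
```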

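Give each app its own port so they do not collide. On macOS, quit the app first: if it is already running, open just brings the existing window forward and the --args flag never reaches it. The documentation site has launch commands for other platforms, so I won’t repeat them here.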
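One practical wrinkle: the app needs a moment to start, so running chrome-use connect immediately after open can race the port coming up. Below is a minimal sketch of a wait loop, assuming curl is installed. It polls /json/version, an HTTP endpoint that Chromium’s DevTools server exposes next to the WebSocket protocol. The helper name wait_for_cdp is hypothetical and not part of chrome-use.

```bash
# Sketch: wait until a Chromium-based app's CDP endpoint answers.
# wait_for_cdp is a hypothetical helper, not a chrome-use command.
# Assumes curl is on PATH.
#
# wait_for_cdp PORT [TIMEOUT_SECONDS]
#   Polls http://127.0.0.1:PORT/json/version about once per second.
#   On success prints the JSON (browser version, webSocketDebuggerUrl, ...)
#   and returns 0; after TIMEOUT_SECONDS without an answer returns 1.
wait_for_cdp() {
  local port="$1" timeout="${2:-15}" elapsed=0 body
  while [ "$elapsed" -lt "$timeout" ]; do
    # -s: no progress output; -f: fail on HTTP errors; cap each attempt at 1s
    if body=$(curl -sf --max-time 1 "http://127.0.0.1:${port}/json/version"); then
      printf '%s\n' "$body"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "no CDP endpoint on port ${port} after ${timeout}s" >&2
  return 1
}
```

Chained after the launch command as wait_for_cdp 9222 && chrome-use connect 9222, the connect step only runs once the port is actually serving. A fixed sleep would also work, but polling adapts to slow and fast launches alike.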

What snapshot returns is the accessibility view of that DOM: every button and input field is a named element, located by @ref. You are not guessing that “the Send button is probably around those pixels in the lower right.” You get the “Send” button itself. Moving the window or changing the theme does not matter, because you are bound to the element, not to coordinates. For an agent, this difference is fundamental: it can now read what is actually in the desktop app instead of clicking blindly at a screenshot.

## One Trade-Off You Need to Know

There is one difference from browser scenarios worth calling out separately, because it is easy to take for granted.

When it drives your everyday Chrome, chrome-use goes through a browser extension plus native messaging. It does not touch the debugging port, so there is no “allow remote debugging” prompt and no protocol-level traces such as Runtime.enable. But Electron apps cannot install extensions from the Chrome Web Store, so here the only way in is --remote-debugging-port. In other words, the “zero trace” advantage of the browser setup does not carry over: you have to open a debugging port yourself and connect to it.

For automating “your own apps” such as Slack or VS Code, this is not a problem at all: you are not trying to fool anyone; you are automating tools you are already logged into. But if you come to Electron with the browser-world expectation of being “undetectable,” you need to adjust that expectation first: this is “connecting into your own app,” not “invisibly operating someone else’s website.”

## An Interesting Detail: Apps Can Contain Apps

Electron apps often embed <webview> elements. For example, a third-party app panel opened inside Slack is essentially another independent web page. In CDP, each of these is a separate target. After connecting, run chrome-use tab first: the main window, the settings window, and those webviews each show up as their own target, and you can switch into each one to operate it.

So an Electron app is not “one web page”; it is “a stack of web pages inside a native shell.” Once you understand this, scenarios that used to be headache-inducing, such as multiple windows and nested panels, simply become a matter of switching to the correct target.
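chrome-use tab is the convenient way to see that stack. To look at the same target list at the protocol level, Chromium’s DevTools HTTP server also serves /json/list, one JSON object per target with a "type" field such as page or webview. Below is a rough sketch, assuming curl; the helper names cdp_targets and cdp_target_types are hypothetical, and the grep-based parse is a shortcut that a real script would replace with a JSON parser such as jq.

```bash
# Sketch: inspect the CDP targets an Electron app exposes, without chrome-use.
# cdp_targets and cdp_target_types are hypothetical helpers.
# Assumes curl; the grep/sed parse is a shortcut, not a real JSON parser,
# and would be fooled by a title that itself contains "type": "...".

# cdp_targets PORT: print the raw /json/list output of a running app
cdp_targets() {
  curl -sf --max-time 2 "http://127.0.0.1:${1}/json/list"
}

# cdp_target_types: read /json/list output on stdin,
# print the "type" value of each target, one per line
cdp_target_types() {
  grep -o '"type": *"[^"]*"' | sed 's/.*: *"\([^"]*\)"$/\1/'
}
```

Piping cdp_targets 9222 | cdp_target_types | sort | uniq -c gives a quick count of pages versus webviews before you decide which target to switch into.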

## Boundaries

This approach only applies to apps based on the Chromium engine. Pure native apps written with Swift/AppKit do not have a CDP port and cannot be connected to this way. For those, you still need to fall back to the system accessibility APIs or use another tool. For native apps on iPhone, I use a different stack.

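A quick rule of thumb: cross-platform desktop apps that ship one codebase everywhere, such as Slack, VS Code, Discord, Figma, and Notion, are very often Electron. Spotify is the usual exception: it is built on the Chromium Embedded Framework (CEF) rather than Electron, though its interface is still Chromium underneath.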
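On macOS you do not have to guess: an app bundle carries its runtime in Contents/Frameworks. Below is a minimal heuristic sketch, assuming Electron apps ship “Electron Framework.framework” and CEF apps ship “Chromium Embedded Framework.framework” there. The helper name detect_app_runtime is hypothetical, and an “other” result only means neither framework was found, not that the app is definitely native.

```bash
# Sketch (macOS bundles): guess an app's runtime from the framework it ships.
# detect_app_runtime is a hypothetical helper; this is a heuristic only.
#
# detect_app_runtime PATH_TO_APP  ->  prints electron | cef | other
detect_app_runtime() {
  local fw="$1/Contents/Frameworks"
  if [ -d "$fw/Electron Framework.framework" ]; then
    echo electron      # Electron bundles its own Chromium + Node.js
  elif [ -d "$fw/Chromium Embedded Framework.framework" ]; then
    echo cef           # Chromium Embedded Framework (e.g. Spotify)
  else
    echo other         # no Chromium framework found in the usual place
  fi
}

# Example: survey everything in /Applications
# for app in /Applications/*.app; do
#   printf '%s\t%s\n' "$(detect_app_runtime "$app")" "$app"
# done
```

Checking the bundle is cheaper and more reliable than judging by whether an app “looks” cross-platform, and it tells you up front whether launching with --remote-debugging-port is worth trying.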

After realizing that “desktop apps are actually web pages,” my first move when I meet a desktop tool is to check whether it is Electron. If it is, operating it is no longer the messy kind of “desktop automation” I have always disliked, but the familiar web automation I already know well.
