The Failed Promise of Automator and the Failed Promise of AI Workflows
The promise of computing was that it would take drudgery away. And it has: I no longer have to squeeze in a bank visit on a lunch break and I can get disinformation pumped into my home without having to go outside to listen to the town idiot. Up next: flying cars.
But computing, in doing so, has created digital chores that the more obsessive of us (👋) need help automating away. Ironically then, the automation needs, uh, automation. While tools like AppleScript/Automator suggested a way forward, I’ve not seen their use become commonplace.

Drag and drop your cares away
The drudgery has not been dispelled. Most workflow automation remains “None,” as best I can tell. Maybe, at a push, the wizard class might put in some light coding (👋) to help out. At the outset (2024-current moment), LLMs suggested to me that workflow development could at last be democratized: automation without specialized knowledge.
The results were disappointing as ChatGPT-4o bungled the simple task of processing images again and again.1
Let’s use the following lenses:
- Generalized knowledge
- Domain-specific knowledge
- Operating System / Platform ease-of-use or integration
to examine how automation workflow software with Apple’s Automator or LLMs might yet help us bid farewell to our digital chores.
The Challenge
To expound fully, the task at hand is just a for loop with the work inside being some picture-of-text-to-text (“optical character recognition” or “OCR”) detection.
You might think “What! Detecting highlighted text, that’s surely hard for a computer; you’re mad!” and that was true in 1992. Not anymore: annotation detection, and even handwriting detection and transcription, went remarkably well.
The failure was in the CompSci 101 of it: persistence, accumulation, goal comprehension or in the user interface (“FFS, why am I writing this in English with its imprecision?”)
Here’s the task, in pseudocode, a variation on “create web thumbnail-size images of all my pictures on my trip to Aruba”:
set "running_notes" as an empty list;
foreach "file" in an uploaded collection of image files:
    set "ocr_result" to the result of scanning the file "file" for words;
    set "annotated_lines" as empty list populated by:
        extract text with underlines from "ocr_result";
        extract text with brackets from "ocr_result";
        extract text with parentheses from "ocr_result";
        extract text with different color from "ocr_result"; // Highlighter
    foreach "annotated_line" in "annotated_lines":
        set "candidate" to be the result of applying "insert_into_template" to "annotated_line"
        present "candidate" to user for evaluation / manual edits
        append "candidate" to "running_notes"
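As a sanity check that the pseudocode really is just a loop, here is a minimal Python sketch of it. Everything named in it is an assumption for illustration: the OCR step is passed in as a plain function because no particular OCR library is specified, the note template is made up, and only the bracket and parenthesis cases are handled, since underlines and highlighter need image-level detection that plain OCR text doesn't carry.

```python
import re
from pathlib import Path
from string import Template
from typing import Callable, Iterable

# Hypothetical note template; the real one is not shown in this post.
NOTE_TEMPLATE = Template("- $text (source: $source)")

# Matches text wrapped in [] or (). Underline and highlighter detection
# would have to come from the OCR backend and is out of scope here.
BRACKETED = re.compile(r"\[([^\]]+)\]|\(([^)]+)\)")


def extract_annotated_lines(ocr_result: str) -> list[str]:
    """Return the text found inside brackets or parentheses."""
    return [m.group(1) or m.group(2) for m in BRACKETED.finditer(ocr_result)]


def insert_into_template(annotated_line: str, source: str) -> str:
    """Drop one annotated line into the note template."""
    return NOTE_TEMPLATE.substitute(text=annotated_line.strip(), source=source)


def build_running_notes(
    files: Iterable[str],
    ocr: Callable[[str], str],
    review: Callable[[str], str] = lambda candidate: candidate,
) -> list[str]:
    """The for loop from the pseudocode: OCR, extract, template, review, append."""
    running_notes: list[str] = []
    for file in files:
        ocr_result = ocr(file)
        for annotated_line in extract_annotated_lines(ocr_result):
            candidate = insert_into_template(annotated_line, Path(file).name)
            # `review` stands in for "present to user for manual edits".
            running_notes.append(review(candidate))
    return running_notes
```

The `review` hook defaults to accepting every candidate; swapping in a function that prompts for edits would cover the manual-evaluation step.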
Let’s use our “lenses” here. The Generalized knowledge is clear: the indentation and foreach show the problem to be clearly understood. Yet there are a few Domain-specific actions that suggest some high-tech whiz-bang (extract OCR, insert into template). Let’s see how this intention gets mapped in Automator.
The Promise of Automator
Apple released Automator with OS X Tiger (2005), offering composable pieces that could be strung together in data pipelines. It was a drag-and-drop workflow creation tool: users dragged “action blocks” into place to create pipelines. Hey! That certainly sounds like my pseudocode above. Give me a chance to “pick a folder’s contents” or “highlight the files for input.” These actions abstract Generalized knowledge into small “bricks.” They also provide Domain-specific bricks like “Downsize an image” or “Copy an image to preserve the original and then do something to it.” This is the promise of Automator.
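For readers who think in code, the brick metaphor is roughly function composition: each brick takes a list of items and hands a new list to the next one. The sketch below is only an analogy, not how Automator is implemented, and the brick names (`only_images`, `copy_names`) are hypothetical stand-ins for actions like “pick a folder’s contents.”

```python
from functools import reduce
from typing import Callable

# A "brick" takes a list of items and returns a new list of items.
Brick = Callable[[list[str]], list[str]]


def pipeline(*bricks: Brick) -> Brick:
    """Chain bricks so each one's output feeds the next one's input."""
    def run(items: list[str]) -> list[str]:
        return reduce(lambda acc, brick: brick(acc), bricks, items)
    return run


def only_images(paths: list[str]) -> list[str]:
    """Hypothetical brick: keep only image files."""
    return [p for p in paths if p.lower().endswith((".png", ".jpg"))]


def copy_names(paths: list[str]) -> list[str]:
    """Hypothetical brick: name a copy so the original is preserved."""
    return [f"copy-of-{p}" for p in paths]


workflow = pipeline(only_images, copy_names)
```

Calling `workflow(["a.PNG", "notes.txt", "b.jpg"])` filters out the text file and renames the two images.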
But the devil’s in the details. First, the bricks aren’t so brick-y. For many of them there are disclosure triangles and tweaks. The bricks start to leak more and more clicky, tweaky, pick list-y, radio-button-y, etc. UI elements as complexity escalates. The UI doesn’t scale. As complexity increases in task, complexity in the UI keeps pace or even outstrips the visual metaphors presented to the user. So the Generalized knowledge interface works fine for simple tasks, but leaks with complexity.
And the provided Domain-specific knowledge bricks are also, initially, revelatory. How much nicer to use a “Size this image for web” brick than imagemagick convert -c>640x480 $file.2 But sadly this peters out; there’s no way to say: “Here’s a custom template body of text, take any matching line and put it in where I put a place-holder.” While the horsepower we get is pretty darned good for day-to-day use, it’s not sufficiently rich. The bricks lack power.
And finally, regarding OS integration: supposing one does create a workflow, and I’ve offered a few myself to help with this blog’s maintenance, they’re not integrated with the operating system. They just sit on the desktop or in a (record scratch) Documents folder. They’re sure as hell not Documents: these are magic spells! If you want me to use them, why aren’t they integrated in the Dock or with some global hotkey? If Automator and automation are a differentiator, why isn’t macOS putting a spotlight (not Spotlight™) on it? Why aren’t these sync’d to iCloud for sharing among my Macs? Or why isn’t there an app store from the company that invented app stores for trading these “codelets”?
Amazing: I’ve been using OS X constantly since about 2001 and I completely missed that Automator had been superseded by “Shortcuts” (damning). It appears that the Shortcuts community has a lot of the characteristics I described above. Hooray! That means it only took 18 years, or the journey from newborn to eligible to shoot a rifle as a soldier, to solve the bungle. Classic. We return to your rant, already in progress.
And all that’s on top of the fact that it’s only available on Macs and has low portability.
Automator (and, ahem, now Shortcuts) workflows wind up in an unremarkable, non-portable middle:
- Low skill users don’t think programmatically and/or they just buy more iCloud space and/or they won’t even open the app
- Medium skill users will be given a mixed experience. They’ll feel profound joy at first, but as they bump up into fiddly UI experiences or underpowered bricks (“What, no ‘scan with OCR for lines with pink highlight’?”) they’ll be frustrated. This experience will move them. Some will take this as a sign of dreaming too big, their minds bicycling too far, and give up. Or it will urge them onto the path of madness, er, studying programming
- High skill users were always going to automate the hell out of their Macs, be it with shell scripts or actual code. Looking at Automator, they quickly see that it doesn’t scale and doesn’t have OS integration, so…pass, and resume using the exact same spellbook from the original AT&T Unix circa 1969
The Apple of It All
And man, if that’s not pretty much everything Apple does in their engineering. They proclaim their dreams (“We build bicycles for the mind!”), they extol their platform (“Microsoft builds lead weights for the mind!”), then they bungle the marketing (“Automator?” 🦗 🦗), and have internal turf battles instead of serving the customer until the technology dies. We could have had App Store for Workflow (Pace: we got it nearly 20 years later), but instead we have bungle-rama of Siri on my OS (didn’t ask for it), the bunglepalooza of “Apple Intelligence,” and a doomed 80% good product (Automator).
And here’s perhaps the Appliest Apple of it all: this wasn’t their first be-bungling in this exact sector. Ever heard of General Magic and Magic Cap?
Magic Cap (short for Magic Communicating Applications Platform) is a discontinued object-oriented operating system for PDAs developed by General Magic [a spin-out of Apple]
The Magic Cap operating system includes a new mobile agent technology named Telescript. Conceptually, the agents carry work orders, travel to a Place outside of the handheld device, complete their work, and then return to the device with the results.
Wikipedia: Magic Cap
Holy cowdowg (moof!) Apple were thinking about delivering agent-/workflow-oriented processing to your handheld 30 years ago complete with a structured programming language (Telescript) to provide imperative programming (or Object-Oriented, if you know how to do it right) rigor at cloud-enabled scale. That was a team living so far in the future our present is their ancient past.
Ultimately the idea that General Magic were going to do something insanely great led to bitterness and acrimony and ego clash…yep. Apple gonna Apple. Nothing says you bungled quite like suing your own spin-out.
Maybe the only solution to the rampant egos that clearly are mismanaging the place is the ego black hole of Steve Jobs himself. Ain’t no one gonna out-void the black, mock-turtlenecked void.
Automator’s vision was compelling: visual programming that abstracted away complexity while maintaining power. The reality was clunky interfaces, limited integrations, and workflows that felt bolted-on.
With generative AI, I was hoping the Automator promise might finally be delivered in a less vendor-locked environment. The combination of vast generalized knowledge with natural language interfaces seemed like it could bridge the gap that had stymied both AppleScript and Automator. Let’s see how that plays out.
The LLM Moment
Giving my pseudocoded task to an LLM didn’t go well either. Its failure mode, though, was completely different than Automator’s.
Operating System / Platform ease-of-use or integration
The first recognition is that the chatbot interface is universal and portable. I’ve used both ChatGPT and Claude in apps on my iPhone and iPad, and in the browser on FreeBSD. I’ve also used, at work, ChatGPT and Claude as the engine behind the Copilot plugin for VSCode. Both also offer voice-based conversational modes that are natural and easy to use. I think, in total, this suggests portability is solved.
Domain-specific knowledge
As mentioned earlier, the idea of being able to type English “Look at each page and find lines that have been marked with highlighter, underlined, or have a bracketing character around them e.g. () []” and then have it do it reasonably bears the appellation “magic.” Along the way, asking it to pump that theory of work out as Python code bootstraps my adding this spell to my spellbook. On top of that, with the ubiquity of Python and JavaScript and their associated libraries, I have something on par with what Automator might have added, but it’s now cross-platform.
Generalized knowledge
And here’s where LLMs should have absolutely crushed Automator, but didn’t.
When given a workflow task, LLMs failed at the embarrassingly simple part: keeping track of local variables and doing simple foreach logic.
My pseudocoded example should be the kind of thing I kick off and then go get lunch and come back to a finished stack. But instead, when I actually attempted that, I came back and found the process stuck 2 images in because ChatGPT-4o’s context “got reset.”
Are you fucking kidding me?
What’s worse is that I had to babysit the work queue and asked for the chance to verify every 10, no, 5, no, 2, no, every single page. Where’s the savings? After several pages, I found it was emitting blank output.
Are you fucking kidding me?
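The persistence and accumulation that went missing here are cheap to get when the loop lives in ordinary code and the model only ever sees one page at a time. The following is a minimal sketch under that assumption, not what I actually ran: `process_page` is a hypothetical stand-in for whatever call sends a single image to the model and returns its notes, and the checkpoint file layout is invented for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def process_with_checkpoint(
    files: Iterable[str],
    process_page: Callable[[str], list[str]],
    checkpoint: Path,
) -> dict:
    """Run process_page once per file, saving progress after every page."""
    if checkpoint.exists():
        # Resume from an earlier run instead of starting over.
        state = json.loads(checkpoint.read_text())
    else:
        state = {"done": [], "blank": [], "running_notes": []}

    for file in files:
        if file in state["done"]:
            continue  # already handled in an earlier run
        notes = process_page(file)
        if notes:
            state["running_notes"].extend(notes)
        else:
            # Flag blank output for a human instead of silently moving on.
            state["blank"].append(file)
        state["done"].append(file)
        checkpoint.write_text(json.dumps(state, indent=2))
    return state
```

With the state on disk, a reset on the model’s side costs at most one page of work, and blank pages end up in a list to review rather than vanishing.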
In moments of profound personal weakness, I started adding ternary statements of abject inner sociopathy:
Please do the requested work and if you start hallucinating data or eliding work, I want you to imagine you have a hand. I want you to show me a picture of that hand after you shoved it in a grease fryer at an Arby’s and covered it with Arby’s sauce.
In a more abject moment of sociopathy (and dear God help us if ever a narcissist starts getting serviced by one of these), the LLM’s craven and spineless apologies absolutely gave me the howling fantods.
“You’re right, you deserve more from me; I can commit to doing it right next time.” I’ve never been handsome enough to experience what brutally beautiful teen girls know as the power-trip of squashing weak-spine nice guys, but this gave me a taste, and – my God – I cower at the id-monster that dreams within.
Back to the task at hand, though: my pseudocoded goal, as written, could certainly be attained by a high school senior. With an iPad and typewriter and a stack of note cards, I could get what I’m after. When I consider the opportunity cost of not having done that and having instead written Arby’s-oriented mutilation fanfic, I’m not sure hiring one wouldn’t have been a better use of time.
It’s a baffling phenomenon that I am a-twitter over Automator: a 1995 Honda Civic can complete a trip to the grocery store while my alien technology LLM randomly forgets where it’s going and pulls over to contemplate the meaning of transportation and catches on fire. Y’know, like a Tesla.
The Promise (Eventually)
Nevertheless, without Jobs at the helm, I think we can count on Apple’s internal dysfunction to bungle iterating toward an amazing Automator + Apple AI (where/when?) + (ugh) Siri experience. Why would anyone drag action blocks around when they can literally speak into their (bitter irony) iPhone: “process these images, extract highlighted text, format as notes” while in line at the grocery store and have the output awaiting them when they get home?
The promise of automation has never been stronger. We’re just waiting for AI to stop failing at tasks that would get a first-year programming student booted out of the major.
The revolution is one context-retention breakthrough away.
Coda
I don’t know how I got enamored of the word “bungle” in this essay. But I felt I owed an explanation that it just kept me laughing as this got longer and longer. As apology I give you, Mr. Bungle:
Footnotes
1. I say “for me” because it’s hard to know whether the problem is just me: shared code in Lisp, Python, Java, etc. can be deterministically evaluated (“Oh no! You’re examining this structure twice”) versus a prompt which is hard to share (“Oh, no, call the LLM ‘Broski’ and it will give you more technical rigor.”)
2. The invocation is for illustrative purposes. I’d look up the flags to use, but doesn’t that kind of make my point?
3. Now, to be fair, in the 1990s Apple had a lot on their plate: from Michael Dell’s dictum that they be liquidated for shareholder value to surviving at Microsoft’s sufferance out of Bill’s respect for a fellow early pioneer. And yes, being there too early is a fatal flaw (just ask Sun Microsystems).