ChatGPT Shortcut

JAN 3 2023

My primary way of interacting with automated assistance is through Siri. I used a Nest Pod (or whatever they're called now) for a year or two, but found it fell short due to lack of deep integration with my accounts and my devices' ecosystems. Captured within the tender walls of Apple's garden once again...

As of late, I have been using ChatGPT for various tasks in an effort to understand its inner workings in a practical, or even emotional, sense. ChatGPT outperforms the other assistants on the majority of tasks that depend on neither liveness nor action. This was the category of tasks for which I wouldn't bother querying Siri or Google: they were guaranteed to disappoint.

I wanted to build a Raspberry Pi-based text-to-speech interface for ChatGPT, but it felt too much like creating an imaginary friend and I wasn't prepared to deal with the social ramifications. Instead, I chose to start with a ChatGPT Siri integration.

Shortcuts

There are four ways to integrate custom functionality into Siri's interface:

The first three options are a heavy lift, equivalent to creating an iOS app. Since I had little interest in sinking a week into a prototype, Shortcuts was the only option. Shortcuts provides an AppleScript-like GUI programming interface for iOS and macOS.

Within Shortcuts, there are two options for running arbitrary or external code:

"Run Script Over SSH"
"Get contents of URL"

"Run Script Over SSH" is an appealing workflow. I'd be able to do all of my API integration through a normal dev workflow of updating a binary on a server. If my aim had been to extensively modify the API requests or maintain a continuous context, this would likely have been my preferred choice. In the interest of expediency, I stuck with "Get contents of URL".

OpenAI API

The OpenAI API offers a series of models with varying degrees of competence and cost. The OpenAI docs describe these models, and the full list can be retrieved with the following query:

curl https://api.openai.com/v1/models -H 'Authorization: Bearer <OpenAI API Key>'
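For scripting the same lookup, the query above can be sketched in Python using only the standard library. The helper names (`build_models_request`, `model_ids`, `list_models`) are hypothetical. The response shape, a top-level `data` list whose entries carry an `id`, is an assumption about the endpoint's list format and is not shown in this post.

```python
import json
import urllib.request

MODELS_URL = "https://api.openai.com/v1/models"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the GET request that the curl command above sends."""
    return urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list[str]:
    """Return the sorted model ids from a decoded /v1/models response.

    Assumes the payload looks like {"data": [{"id": "..."}, ...]}.
    """
    return sorted(entry["id"] for entry in payload.get("data", []))


def list_models(api_key: str) -> list[str]:
    """Send the request and return the model ids (needs network access and a valid key)."""
    with urllib.request.urlopen(build_models_request(api_key)) as response:
        return model_ids(json.load(response))
```

Keeping the request building and the parsing separate from the network call means both can be checked without spending API credits.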

text-davinci-003 appears to be the biggest and baddest, presumably what ChatGPT is using.

The API can be queried with the following POST request.

curl https://api.openai.com/v1/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <OpenAI API Key>" \
    -d '{"model": "text-davinci-003", "prompt": "<Prompt text>", "max_tokens": 2048}'
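The same POST request can be sketched in Python. This is a minimal illustration and not part of the Shortcut itself. The function name `build_completion_request` is hypothetical, and the defaults simply mirror the curl command above. One practical difference from pasting a prompt into a shell string is that `json.dumps` escapes quotes and newlines in the prompt.

```python
import json
import urllib.request

COMPLETIONS_URL = "https://api.openai.com/v1/completions"


def build_completion_request(
    api_key: str,
    prompt: str,
    model: str = "text-davinci-003",
    max_tokens: int = 2048,
) -> urllib.request.Request:
    """Build the POST request equivalent to the curl command above."""
    # json.dumps escapes any quotes or newlines inside the prompt.
    body = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        COMPLETIONS_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; that step needs network access and a valid API key.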

Response for the prompt "Say this is a test":

{
    "id": "cmpl-6Ur8qzP3KBxKunnoYolInCwxZ2H56",
    "object":"text_completion",
    "created":1672812136,
    "model":"text-davinci-003",
    "choices":[{"text":"\n\nThis is indeed a test","index":0,"logprobs":null,"finish_reason":"length"}],
    "usage": {
        "prompt_tokens": 5,
        "completion_tokens": 7,
        "total_tokens": 12
    }
}
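Only a few fields of this response matter for an assistant-style interface. As a rough sketch, the lookups can be written in Python against the sample response above. The helper names `completion_text` and `was_truncated` are hypothetical.

```python
# The sample response above, written as a Python dictionary.
SAMPLE_RESPONSE = {
    "id": "cmpl-6Ur8qzP3KBxKunnoYolInCwxZ2H56",
    "object": "text_completion",
    "created": 1672812136,
    "model": "text-davinci-003",
    "choices": [
        {
            "text": "\n\nThis is indeed a test",
            "index": 0,
            "logprobs": None,
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 7, "total_tokens": 12},
}


def completion_text(response: dict) -> str:
    """Return the first choice's text with surrounding whitespace removed."""
    return response["choices"][0]["text"].strip()


def was_truncated(response: dict) -> bool:
    """True when the first choice stopped because it hit the token limit."""
    return response["choices"][0]["finish_reason"] == "length"


print(completion_text(SAMPLE_RESPONSE))  # prints: This is indeed a test
print(was_truncated(SAMPLE_RESPONSE))    # prints: True
```

The leading `\n\n` in the completion text is why the sketch strips whitespace before displaying the result.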

"Programming"

I strung together a text input request, a URL POST request, and a few dictionary parses, then displayed the result.
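The exact Shortcut actions aren't listed in this post, so the following is only a rough Python analogue of that flow: take a prompt, POST it, then walk the response dictionary down to the text. The names `post_json` and `ask` are hypothetical, and the `send` parameter exists only so the flow can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable

API_URL = "https://api.openai.com/v1/completions"


def post_json(url: str, api_key: str, payload: dict) -> dict:
    """Stand-in for "Get contents of URL": POST JSON and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def ask(
    prompt: str,
    api_key: str,
    send: Callable[[str, str, dict], dict] = post_json,
) -> str:
    """Mirror the Shortcut: POST the prompt, then parse the dictionaries."""
    payload = {"model": "text-davinci-003", "prompt": prompt, "max_tokens": 2048}
    reply = send(API_URL, api_key, payload)  # the URL POST request
    first_choice = reply["choices"][0]       # dictionary parse: "choices", first item
    return first_choice["text"].strip()      # dictionary parse: "text"
```

Calling `print(ask(input("Prompt: "), api_key))` with a real key would cover the remaining two steps, the text input request and the displayed result.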

Here's what I ended up with.

Result

Concluding

Your first two questions might be:

why is the input interface a text box?
why is the output interface a text box?
this isn't what you set out to do!

And you'd be right. During testing, I realized Shortcuts was ill-equipped to give reasonable UX for either of these features.

Input: Speech input can only be added through the "Dictation" widget. Dictation's speech-to-text is highly error-prone, and ChatGPT doesn't handle malformed input well.

Output: When using ChatGPT, you want a quick iteration cycle so you can stop and adjust if the response trends down the wrong path or you've hit OpenAI's prompt moderation. 200-word responses take too long on Siri text-to-speech, which offers no look-ahead, so I used a visual format.

To conclude this experiment: a Shortcuts version of ChatGPT is strictly worse than a home screen bookmark to chat.openai.com. I'm excited to see the formation of novel interfaces around modern LLMs. I suspect they'll bear minimal resemblance to their predecessors.