Alexa Skill Idea: MU* Client
-
I've been doing quite a bit of research into Alexa Skills recently. Interesting stuff! NodeJS is an everyday tech layer for me, so I started digging further. I eventually came across a few choose-your-own-adventure games and thought... MU* might transfer well to audiobook format!
I immediately wondered whether I should use MUX's HTML flag/XML feature, or whether Rhost or Penn offer something similar. Penn's WebSocket support is enticing. JSON support makes everything so much easier.
I imagine it wouldn't be much different from adding support for screen readers. Admittedly, I haven't played much with Alexa myself yet; it just recently popped up during a client interview, and my ADHD took off with it! I wanted to get a bit of community feedback before I fully committed to the idea, or moved on.
I thought it'd be a great interface for the visually impaired.
Thoughts? Suggestions?
-
@Kumakun I have no idea how this would work in any way, but it sounds like a very interesting concept. Anything that aids in accessibility is always going to be a bonus.
-
My only thought is that the last time I toyed with Alexa skills—which, to be fair, was about two years ago—the speech recognition could only handle 10-second chunks of audio. Plus, the skills system had a request/response mechanic, which meant that while Alexa was waiting for your next request, the Echo couldn't interrupt to start saying something else. (I.e., the text from the MU* server would have to be processed in chunks, and once Alexa was waiting for your input, any further text from the server would have to be stored and spoken after the user input was done.)
So unless the skills system has changed—which is possible!—I don't really see an efficient way to actually enter longer poses/pages/etc. Plus, you would have to have a command like "continue" to start speaking the next lines of output once Alexa paused to wait for your own input, if you didn't want to enter a pose or page or whatever. And you'd need to come up with some logic as to when Alexa stopped talking and waited for that input. If there are 14 new lines of output that have come in, do you say all of them? Stop and wait for input after a maximum of 10 lines, then speak the remainder after the user has spoken their input? Stuff like that.
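To make that concrete, here's roughly what that buffering could look like in a Node.js skill handler. This is only a sketch: the pendingLines session attribute and the fetchPendingLines() call (standing in for whatever backend actually holds the game connection) are both invented for illustration.

```js
// Sketch of a "continue" intent that speaks at most N buffered lines per
// response, holding the remainder for the next request.
const Alexa = require('ask-sdk-core');

const MAX_LINES_PER_RESPONSE = 10;

const ContinueIntentHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'ContinueIntent';
  },
  async handle(handlerInput) {
    const attrs = handlerInput.attributesManager.getSessionAttributes();

    // Lines held back from last time, plus whatever has arrived since.
    // fetchPendingLines() is a hypothetical call to the game backend.
    const pending = (attrs.pendingLines || [])
      .concat(await fetchPendingLines(attrs.sessionToken));

    // Speak a bounded chunk now; store the rest for the next "continue".
    const toSpeak = pending.slice(0, MAX_LINES_PER_RESPONSE);
    attrs.pendingLines = pending.slice(MAX_LINES_PER_RESPONSE);
    handlerInput.attributesManager.setSessionAttributes(attrs);

    const speech = toSpeak.length > 0 ? toSpeak.join(' ') : 'Nothing new yet.';
    return handlerInput.responseBuilder
      .speak(speech)
      .reprompt('Say continue, or speak a command.')
      .getResponse();
  },
};
```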
But if there is now a good way around those limitations, it could be really neat for some folks!
-
@Sparks Wow! Great input! I'm going to have to look into that further myself!
-
@Sparks Can Alexa hold onto a connection the way a MUSH client or a web browser's WebSocket does? I didn't really think it worked that way, and thought it would need a completely asynchronous API. But I confess I haven't really looked into it much, so I could be completely wrong.
-
@faraday said in Alexa Skill Idea: MU* Client:
@Sparks Can Alexa hold onto a connection the way a MUSH client or a web browser's WebSocket does? I didn't really think it worked that way, and thought it would need a completely asynchronous API. But I confess I haven't really looked into it much, so I could be completely wrong.
The Alexa skills themselves cannot, no—or at least could not do so two years ago, which was the last time I mucked about with Alexa skills. You would indeed need an asynchronous API.
But it's trivial for a skill to make a remote web connection, so what you'd really have to do is implement a backend server which handles the connection to the game on the skill's behalf. When you connected, the skill would send a 'connect to this IP and port' request and get back a unique session token, and then all your further operations—'get status and/or pending lines', 'send line to game', 'disconnect', etc.—would just be calls to that web service, which would talk to the game on your behalf. Bundle the responses up into a JSON or XML format and the skill can easily parse them. (Obviously you would want some sort of additional security key, so that merely stealing someone else's session token isn't enough to take over their session.)
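Sketched from the skill's side, that web service might look something like this; every endpoint path and field name here is invented purely for illustration (Node 18+, using the built-in fetch):

```js
// Hypothetical proxy API for a skill backend; none of these endpoints exist.
const base = 'https://mu-proxy.example.com';

async function demo() {
  // 1. Ask the proxy to open a connection to the game on our behalf;
  //    it hands back a session token (plus a secret to deter token theft).
  let res = await fetch(`${base}/connect`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ host: 'game.example.org', port: 4201 }),
  });
  const { sessionToken, secret } = await res.json();

  // 2. Send a line to the game through the proxy.
  await fetch(`${base}/send`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ sessionToken, secret, line: ':waves.' }),
  });

  // 3. Fetch status and any pending output lines, bundled up as JSON.
  res = await fetch(`${base}/pending`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ sessionToken, secret }),
  });
  const { connected, lines } = await res.json();
  console.log(connected, lines);
}

demo();
```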
It's a bit convoluted, yes, but that part would be doable. It's the actual skill UX that gives me more technical concerns.
-
@Sparks Yeah, that does sound feasible, but it's still not something that your off-the-shelf Penn/Tiny/Rhost game is going to support. You'd have to have that intermediate server to manage the connections to the various games. That's a bit more involved than just making an Alexa skill to talk to games directly, and it raises privacy concerns and so forth.
And I share your concern about the skill UX too. Having to say "get me new activity from <game>" over and over is not great UX.
-
@faraday said in Alexa Skill Idea: MU* Client:
And I share your concern about the skill UX too. Having to say "get me new activity from <game>" over and over is not great UX.
Yeah. The Choose-Your-Own-Adventure skills are simpler, because each 'page' of the choose-your-own-adventure is a predefined block of text, and each page has a finite number of predefined options. So you have a much easier flow.
Request: "play CYOA"
Response: first page of CYOA is read, possible choices are listed, and Alexa now waits for input
Request: makes a choice from the list
Response: next page of CYOA is read, choices are listed, and Alexa now waits for inputAnd so on. There's a clear and defined request/response cycle; each response is known in full, and there's a clear place to stop and wait for input.
With a MU* client, you don't have any convenient predefined place to transition from 'response' back to waiting for a request. Nor do you have any way to move from waiting-for-a-request back to the response phase—and thus continue speaking the text from the server—unless the user issues a request. That necessitates the skill having a "no pose yet" or "continue" type request, in addition to a 'send <whatever should be transcribed and sent>' type command.
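In Alexa terms, the interaction model would need roughly this shape. The intent names and sample utterances here are invented; AMAZON.SearchQuery is the real Alexa slot type for capturing free-form text, and it requires a carrier phrase like "send" in each sample:

```js
// Sketch of the skill's interaction model, written as a JS object for
// readability (the real thing is a JSON file in the skill package).
const interactionModel = {
  languageModel: {
    invocationName: 'mu client',
    intents: [
      // The 'nothing to send, keep reading' request.
      { name: 'ContinueIntent', samples: ['continue', 'keep going', 'no pose yet'] },
      // The 'transcribe this and send it to the game' request.
      {
        name: 'SendIntent',
        slots: [{ name: 'text', type: 'AMAZON.SearchQuery' }],
        samples: ['send {text}', 'pose {text}', 'page {text}'],
      },
    ],
  },
};
```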
And this is without getting into transcription issues. "page Faraday Hello" is easy to transcribe from dictation, because Faraday is a known word. But "page Pax Hello." might well transcribe as "page Packs Hello." instead, given that "packs" (as in, "Jim packs the boxes.") is a far more common word than "pax" is. And if you have people with completely unique names that aren't likely to be in a transcription database, it's likely to be even more difficult.
Transcription of small notes into Evernote or similar is a much easier task, since if you dictated, "Remember to talk to Pax about Atlantis support for AresMUSH by Tuesday." and it transcribed the note as "Remember to talk to packs about Atlantis support for aeries mush by Tuesday.", you can easily edit that note in Evernote after the fact to correct the mis-transcription.
But transcribing a page or pose for this scenario is one-and-done: you don't have any way to edit it if the transcription turns out wrong—or even to reliably tell whether or not it did. Even if you say "page Pax Hello." and it transcribes it as "page packs Hello.", then somewhere in that blob of text the game returns you'll hear "No such player 'packs' found." Which, sadly, when spoken aloud still sounds like "No such player 'Pax' found." (Oh, dear, I vanished from the game!)
This isn't to say it's not an interesting set of problems to potentially solve, just that I think the "how do you design the UX for this in the most usable fashion" question is a much harder one than "how do you implement the backend".
-
@faraday said in Alexa Skill Idea: MU* Client:
@Sparks Yeah, that does sound feasible, but it's still not something that your off-the-shelf Penn/Tiny/Rhost game is going to support. You'd have to have that intermediate server to manage the connections to the various games. That's a bit more involved than just making an Alexa skill to talk to games directly, and it raises privacy concerns and so forth.
And I share your concern about the skill UX too. Having to say "get me new activity from <game>" over and over is not great UX.
Yeah, the UX is something that I've been really thinking about. It could get hella repetitive, VERY quickly. It looks like you can activate a skill and then just use relevant utterances. Interesting. That might cut down on some of it. Plus, I think if new poses announced their existence first, and you then told the skill to read them, it could be the audio version of the 'more' command.
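Something like this, maybe, reusing the hypothetical pendingLines session attribute from the sketch upthread; announce first, read on demand:

```js
// Sketch of the 'audio more' idea: announce that new poses exist, and only
// read them once the user asks. Assumes the same invented pendingLines
// session attribute as the earlier buffering sketch.
async function handleNewActivity(handlerInput) {
  const attrs = handlerInput.attributesManager.getSessionAttributes();
  const count = (attrs.pendingLines || []).length;
  const speech = count > 0
    ? `You have ${count} new poses waiting. Say "read" to hear them.`
    : 'Nothing new yet.';
  return handlerInput.responseBuilder
    .speak(speech)
    .reprompt('Say read, or speak a command.')
    .getResponse();
}
```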
I just did a quick search on Alexa and WebSockets, and it looks like they're actively supported by Alexa/AWS services now. I'll have to look into the limits on data chunk size, but I doubt a text pose is going to be greater than what it can handle. The Alexa Skills Kit has come a long way in the past few years!
-
@Kumakun said in Alexa Skill Idea: MU* Client:
I just did a quick search on Alexa and WebSockets, and it looks like they're actively supported by Alexa/AWS services now. I'll have to look into the limits on data chunk size, but I doubt a text pose is going to be greater than what it can handle. The Alexa Skills Kit has come a long way in the past few years!
AWS has supported WebSockets in the AWS API Gateway service and similar stuff since... I wanna say December of last year? But I didn't remember anything about Alexa in that particular announcement. If Alexa skills have now got a way to use persistent WebSockets to maintain a connection from request to request, that definitely makes it somewhat easier; it used to be that you had to use a proxy server for Alexa to do anything with WebSockets.
(Though every MU* that supports WebSockets uses a different protocol over that socket; the skills would have to be specific to the server family—like one for PennMUSH, etc.—or else you'd end up using a proxy backend to connect to games anyway. Even if that proxy server used WebSockets rather than being polled, which makes it marginally less horrible, it's still a privacy concern for some players, since that proxy server could log everything going through it: passwords in the 'connect' command, pages to people, the @mail you read, etc.)
And yeah, I assumed the UX would be utterance-based; you would definitely not want to leave the context of the MU* client skill while using it. But even maintaining context, you have that request/response UX cycle to deal with, which is the bigger issue.
And transcription seems the worst bit of all. Leaving aside the 10-second limit on audio transcription, there's the matter of syntax on the average MUSH. Pose and page and look and movement are probably easy enough, but many standard MUSH commands could be absolutely wretched for an interface like this. Imagine trying to tell it how to transcribe +bboard commands to write a post, or to dictate the sort of commands that many WoD games use in their chargen systems.
And I don't even want to think about stuff like Arx's plots or goals system, where the syntax can get particularly convoluted: something like
goals/rfp <goal>,<story-beat>=<IC description of goal achievement>/<OOC note to staff about goal achievement>
(...which I may even have gotten wrong, because I'm not on the game to check the helpfile right now!) is not the most friendly single-line command syntax to remember even when typing. Trying to dictate that to Alexa could end up being an infuriating experience.
That said, despite all these hassles, it's an interesting project to tackle! If you choose to go forward with it, I wish you luck; it'd be an interesting result to see, and might well be useful to folks out there!
-
@Sparks said in Alexa Skill Idea: MU* Client:
That said, despite all these hassles, it's an interesting project to tackle! If you choose to go forward with it, I wish you luck; it'd be an interesting result to see, and might well be useful to folks out there!
Yeah, I have concerns about it being feasible from a technical and UX perspective, but I think it's a worthwhile goal to try out.
-
@Sparks said in Alexa Skill Idea: MU* Client:
If Alexa skills have now got a way to use persistent WebSockets to maintain a connection from request to request, that definitely makes it somewhat easier; it used to be that you had to use a proxy server for Alexa to do anything with WebSockets.
And transcription seems the worst bit of all. Leaving aside the 10-second limit on audio transcription, there's the matter of syntax on the average MUSH.
I'm really going to have to look into the time limit and connectivity. It's worth researching now, though, if there's interest. Now to see if the tech can handle it and the UX isn't a nightmare. I'm going to have to experiment soon. I have a feeling that I'd have to write some custom backend to make it work... if it will work. That, and not being able to recognize 'original' names might be a deal-breaking problem. We'll see!