The following assignment is from my Computational Media course, a second-year course in the UCF Digital Media MA program. The goal of the course is to introduce contemporary machine learning and computational media through projects and discussions, with a focus on critiquing their use in media work and understanding the technology's implications for society. There are three creative projects, each focused on Text (GPT-2), Audio (this assignment), or Visuals (Generative Adversarial Networks). This is the version of the assignment from Spring 2020, when the course was offered online due to COVID-19; it will likely be revised further for Spring 2021.

Each project follows a guided exercise, in which students are led through a particular tool or service, and a critical paper examining existing works that use the technique or modality.

Voice & Audio Project

The second creative project will take one of three possible forms: voice synthesis driven by text generation (Tracery or GPT-2), a creative conversational interface (text or speech, via Amazon Lex), or interactive sounds accompanying interactive text or graphics.

As before, the project should be created in a GitHub repository that has been shared with the instructor. You can submit the URL here.

The repository should contain a correctly formatted Markdown file named "readme.md". The file should contain:

  • A summary of the work, around 50-100 words.
  • Instructions for running the project, including any downloads or libraries.
  • Selected sample outputs (videos, or transcripts if a conversation agent).
  • An artist statement that describes any previous work that the project either directly builds on or is otherwise influenced by.

The project itself may be developed entirely in Google Colab using .ipynb files saved to your GitHub repository, or on Amazon Lex (using the sample code provided to send text to the bot from a web interface or p5.js), with the bot and intents saved as an export in the git repo (along with any AWS Lambda functions).

OPTION A: Voice Synthesis

Minimum requirements:

  • May use browser-based or API-based voice synthesis (or recognition); API-based output should be recorded rather than generated live online.
  • Should include continuous production of novel text from a generator (either pre-generated GPT-2 output or runtime generation with Tracery or similar).
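The pairing of a text generator with voice synthesis can be sketched in a few lines. Below, a minimal Tracery-style grammar is expanded with a hand-rolled `expand()` helper and the result is spoken with the browser's Web Speech API. The grammar, rule names, and the `expand()` helper are illustrative assumptions, not the official Tracery library; a real project would likely use the tracery-grammar package instead.

```javascript
// Hypothetical grammar in the Tracery style: #symbol# references expand recursively.
const grammar = {
  origin: ["The #creature# #verb# beneath the #place#."],
  creature: ["fox", "heron", "moth"],
  verb: ["sings", "waits", "dissolves"],
  place: ["streetlight", "overpass", "satellite dish"],
};

// Recursively replace each #symbol# with a random expansion of that rule.
function expand(grammar, symbol = "origin") {
  const options = grammar[symbol];
  const rule = options[Math.floor(Math.random() * options.length)];
  return rule.replace(/#(\w+)#/g, (_, name) => expand(grammar, name));
}

const line = expand(grammar);
console.log(line);

// Browser-only: speak the generated line with the Web Speech API.
if (typeof speechSynthesis !== "undefined") {
  speechSynthesis.speak(new SpeechSynthesisUtterance(line));
}
```

Calling `expand(grammar)` repeatedly satisfies the "continuous production of novel text" requirement; wrapping the call in a timer or button handler turns it into an ongoing performance.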

OPTION B: Conversation Agent

  • Should include at least 5 intents, and these intents should be thematically unified. They do not have to connect to one another, though it is encouraged.
  • You may not reuse an intent from the related exercise unchanged, but you may modify one or create your own.
  • The agent may be either artistic or practical.
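To make "thematically unified intents" concrete, here is a sketch of five intents for a hypothetical "night garden" bot, plus a naive keyword matcher. The bot theme, intent names, sample utterances, and the `matchIntent()` scoring are all invented for illustration; in Amazon Lex you would define the intents in the console and Lex's own NLU would handle matching.

```javascript
// Five thematically unified intents for an imagined "night garden" agent.
const intents = [
  { name: "GreetVisitor",   utterances: ["hello", "hi there", "good evening"] },
  { name: "IdentifyPlant",  utterances: ["what plant is this", "name this flower"] },
  { name: "WateringAdvice", utterances: ["how often should i water", "watering schedule"] },
  { name: "NightSounds",    utterances: ["what is that sound", "play the crickets"] },
  { name: "SayGoodbye",     utterances: ["goodbye", "see you later"] },
];

// Naive stand-in for Lex's NLU: pick the intent whose sample utterance
// shares the most words with the user's input; null if nothing matches.
function matchIntent(input) {
  const words = new Set(input.toLowerCase().split(/\s+/));
  let best = { name: null, score: 0 };
  for (const intent of intents) {
    for (const u of intent.utterances) {
      const score = u.split(/\s+/).filter((w) => words.has(w)).length;
      if (score > best.score) best = { name: intent.name, score };
    }
  }
  return best.name;
}
```

The point of the sketch is the data shape, not the matcher: each intent bundles a name with sample utterances, and all five serve one theme, which is what the requirement asks for.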

OPTION C: Synthesized Sound

  • You must modify the sounds at runtime, or produce them at random times or in response to user-generated events.
  • May use p5.js, or any platform/tool of your choice.
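One way to meet the "random times" requirement is to separate the scheduling logic (pure, testable) from the audio playback (browser-only). The sketch below uses the standard MIDI-to-frequency conversion and plays short tones with the plain Web Audio API; the note pool and timing range in `nextEvent()` are arbitrary example choices, and a p5.js version would use `p5.Oscillator` instead.

```javascript
// Convert a MIDI note number to a frequency in Hz (A4 = note 69 = 440 Hz).
function midiToFreq(note) {
  return 440 * Math.pow(2, (note - 69) / 12);
}

// Pick the next note and delay for a randomly timed sound event.
// (Note pool and interval range are illustrative choices.)
function nextEvent(notes = [60, 62, 65, 67, 72]) {
  return {
    note: notes[Math.floor(Math.random() * notes.length)],
    delayMs: 250 + Math.random() * 1500, // between 0.25 s and 1.75 s
  };
}

// Browser-only: play each scheduled event as a 200 ms oscillator blip.
if (typeof AudioContext !== "undefined") {
  const ctx = new AudioContext();
  (function schedule() {
    const { note, delayMs } = nextEvent();
    setTimeout(() => {
      const osc = ctx.createOscillator();
      osc.frequency.value = midiToFreq(note);
      osc.connect(ctx.destination);
      osc.start();
      osc.stop(ctx.currentTime + 0.2);
      schedule();
    }, delayMs);
  })();
}
```

Swapping the `setTimeout` loop for a `mousePressed()` handler would satisfy the user-event variant of the requirement instead.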

For full credit, the work should include a theme or commentary that goes beyond the respective simple exercise.