Context-driven Audio Input and Output Control