A Bit of Personal Research on Tay, Bots, and Microsoft
Ever since Tay was introduced on Twitter, and especially since her Chinese equivalent Xiaoice did not meet the same kind of reaction, the commentary has ranged in hyperbole from "just a simple chatbot" to "martyred for our sins". This is a bit of my own personal digging around in my off time on the matter: a collation of links and papers on just what Microsoft, and Bing by extension, has been working on in the past few years in the realm of deep learning.
-
Tay (Official Website)
Tay is an artificial intelligent chat bot developed by Microsoft's Technology and Research and Bing teams to experiment with and conduct research on conversational understanding. Tay is designed to engage and entertain people where they connect with each other online through casual and playful ...
-
Tay (bot) Wikipedia Article
Gives an overview of Tay, how Twitter first responded to being able to teach an official Microsoft tool word associations, and the initial aftermath.
-
Microsoft Bot Builder SDK (GitHub)
The Microsoft Bot Builder SDK is one of three main components of the Microsoft Bot Framework. The Microsoft Bot Framework provides just what you need to build and connect intelligent bots that interact naturally wherever your users are talking, from text/sms to Skype, Slack, Office 365 ...
-
Recent Advances In Deep Learning For Speech Research At Microsoft [PDF] [2013]
Deep learning is becoming a mainstream technology for speech recognition at industrial scale. In this paper, we provide an overview of the work by Microsoft speech researchers since 2009 in this area, focusing on more recent advances which shed light to the basic capabilities and limitations ...
Why can an old paper on speech recognition help in understanding the chatbot? It has everything to do with a section on language modelling.
In the approach described here, we explore the capability of a neural net to combine and exploit information of diverse types, and apply it to the task of language modeling. This approach has been proposed in dialog systems with a feed-forward net [57] and more recently for recurrent nets [47][37]. In all these approaches the basic idea is to augment the input to the network with information above-and-beyond the immediate word history. In [37], we propose the architecture of Fig. 2, which adds a side-channel of information to the basic recurrent network of [38]. By providing a side-channel consisting of a slowly changing Latent Semantic Analysis (LSA) representation of the preceding text in the Penn Treebank data, we improved perplexity over a Kneser-Ney 5-gram model with a cache from 126 to 110 – to our knowledge the best single-model perplexity reported for this dataset. Interestingly, these gains hold up even after interpolating a standard recurrent net with a cache model, indicating that the context-dependent recurrent net is indeed exploiting the topic information of the LSA vector, and not just implementing a cache. In subsequent experiments, we have found that this architecture is able to use other sources of information as well; for example, in initial experiments with a voice-search application, conditioning on a latent-semantic analysis representation of a user's history has reduced the perplexity of a language model from 135 to 122.
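To make the "side-channel" idea concrete, here is a toy sketch of my own of a context-dependent recurrent language model: a plain Elman-style RNN whose hidden state also receives a slowly changing context vector (standing in for the LSA representation of the preceding text). The dimensions and the extra weight matrix F are made up for illustration; this is not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (invented for illustration, not the paper's settings).
vocab, embed, hidden, context = 50, 16, 32, 8

E = rng.normal(0, 0.1, (vocab, embed))     # word embeddings
W = rng.normal(0, 0.1, (hidden, embed))    # input -> hidden
U = rng.normal(0, 0.1, (hidden, hidden))   # hidden -> hidden (recurrence)
F = rng.normal(0, 0.1, (hidden, context))  # side-channel (e.g. LSA) -> hidden
V = rng.normal(0, 0.1, (vocab, hidden))    # hidden -> next-word logits

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def step(word_id, h, ctx):
    """One step: the new hidden state depends on the current word, the
    previous hidden state, AND the document-level context vector."""
    h = np.tanh(W @ E[word_id] + U @ h + F @ ctx)
    return h, softmax(V @ h)

# Run a few steps with a fixed context vector.
ctx = rng.normal(0, 1, context)
h = np.zeros(hidden)
for w in [3, 17, 42]:
    h, p = step(w, h, ctx)

print(p.shape, round(float(p.sum()), 6))  # (50,) 1.0
```

The point of the architecture is that F gives the network information "above-and-beyond the immediate word history" at every step, which is what lets it track topic rather than just recent words.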
-
Recurrent neural network based language model [PDF] [2010] [citation 38 from above]
Main outcomes: Recurrent neural network language model is probably the simplest language model today. And very likely also the most intelligent.
It has been experimentally proven that RNN LMs can be competitive with backoff LMs that are trained on much more data ...
-
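Both excerpts compare models by perplexity (126 vs. 110 above), so a quick sketch of the metric itself may help: perplexity is the exponential of the average negative log-probability a model assigns to each word of held-out text, and lower is better.

```python
import math

def perplexity(word_probs):
    """exp of the average negative log-probability per word."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# A model that assigns probability 1/126 to every word has perplexity 126.
# The improvement to 110 means the model is, on average, only as "unsure"
# as a uniform choice among 110 words instead of 126.
print(round(perplexity([1 / 126] * 10), 6))  # 126.0
print(round(perplexity([1 / 110] * 10), 6))  # 110.0
```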
Rapid Adaptation for Deep Neural Networks through Multi-Task Learning [PDF] [2015]
With the original senone classification task as the primary task, and adding auxiliary mono-phone/senone-cluster classification as the secondary tasks, multi-task learning (MTL) is employed to adapt the DNN parameters. With the proposed MTL adaptation framework, we improve the learning ability...
-
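The shape of that setup is easy to sketch: a shared representation feeds two classification heads, and the training loss is the primary task's cross-entropy plus a weighted auxiliary cross-entropy. The sizes, weights, and the mixing factor alpha below are all invented for illustration; real senone inventories are thousands of classes wide, and this is not the paper's adaptation algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes (illustrative only).
feat, hidden, primary_classes, aux_classes = 10, 20, 6, 3

W_shared = rng.normal(0, 0.1, (hidden, feat))              # shared layer
W_primary = rng.normal(0, 0.1, (primary_classes, hidden))  # e.g. senone head
W_aux = rng.normal(0, 0.1, (aux_classes, hidden))          # e.g. mono-phone head

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mtl_loss(x, y_primary, y_aux, alpha=0.3):
    """Primary cross-entropy plus a weighted auxiliary cross-entropy.
    Both heads read the same hidden representation, so gradients from
    both tasks shape the shared parameters during adaptation."""
    h = np.tanh(W_shared @ x)
    p1 = softmax(W_primary @ h)
    p2 = softmax(W_aux @ h)
    return -np.log(p1[y_primary]) - alpha * np.log(p2[y_aux])

x = rng.normal(0, 1, feat)
loss = mtl_loss(x, y_primary=2, y_aux=1)
print(loss > 0)  # True (both cross-entropy terms are positive)
```

The auxiliary task acts as a regularizer: with only a little adaptation data, the easier secondary labels keep the shared layer from overfitting to the primary task.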
Learning and inference in the presence of corrupted inputs [PDF] [2015]
Consider the following classification setting. Inputs, described by attributes, are to be classified as having one of two labels. For example, an input might represent an employee, the attributes might represent her performance under various measures, and the classification function maps ...
-
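To see why corrupted inputs are a problem worth a paper, here is a minimal simulation of the setting (my own toy, emphatically not the paper's algorithm): a linear classifier that is perfect on clean attribute vectors, evaluated when a corruption process zeroes out attributes at random.

```python
import numpy as np

rng = np.random.default_rng(2)

# A linear two-label classifier over attribute vectors.
n, d = 200, 8
w_true = rng.normal(0, 1, d)
X = rng.normal(0, 1, (n, d))
y = (X @ w_true > 0).astype(int)  # labels defined by w_true itself

def accuracy(w, X, y):
    return float(np.mean((X @ w > 0).astype(int) == y))

clean_acc = accuracy(w_true, X, y)  # 1.0 by construction

# Corrupt: each attribute is independently zeroed with probability 0.3.
mask = rng.random(X.shape) > 0.3
corrupt_acc = accuracy(w_true, X * mask, y)

print(clean_acc)                 # 1.0
print(corrupt_acc <= clean_acc)  # True
```

A classifier that is optimal on clean data can degrade badly under corruption; the paper's question is how to learn and run inference when such corruption is expected.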
And this quick list of links to papers on their past research topics isn't even that deep; it just made quick use of different Google Scholar features, bouncing between citations for a given paper, or using researcher profiles like here or here, and choosing papers whose material I am most familiar with.
Needless to say, the work is impressive, and indicative of what I have seen referred to elsewhere as an "A.I. arms race" being pushed by the likes of Microsoft, Google, and Apple, among others. The underlying tech, and its application in the likes of Cortana, Siri, Watson, etc., is extremely impressive, and barely gets the coverage it deserves unless it is part of a big story, like winning some Go matches, or learning racial epithets from social media.
(And this may have all been spurred by an anonymous forum user going from "What makes Tay tick?" to "I totally have an idea how to improve it.", when just my lazy self can put together all this in an evening plus an afternoon from publicly available sources, without having to rely on paid journals/sci-hub to delve deeper.)