Published 8 years ago by Project2501

A Bit of Personal Research on Tay, Bots, and Microsoft

Ever since Tay was introduced to Twitter (and her Chinese counterpart, Xiaoice, did not meet the same kind of reaction), there has been a lot of hyperbole, ranging from "just a simple chatbot" to "martyred for our sins". This is a bit of my own personal digging around on the matter in my off time: a collation of links and papers on just what Microsoft, and Bing by extension, has been working on over the past few years in the realm of deep learning.

  • Why an old paper on speech recognition can help in understanding the chatbot

    And it has everything to do with a section on language modelling (a rough code sketch of the idea follows after this list).

    In the approach described here, we explore the capability of a neural net to combine and exploit information of diverse types, and apply it to the task of language modeling. This approach has been proposed in dialog systems with a feed-forward net [57] and more recently for recurrent nets [47][37]. In all these approaches the basic idea is to augment the input to the network with information above-and-beyond the immediate word history. In [37], we propose the architecture of Fig. 2, which adds a side-channel of information to the basic recurrent network of [38]. By providing a side-channel consisting of a slowly changing Latent Semantic Analysis (LSA) representation of the preceding text in the Penn Treebank data, we improved perplexity over a Kneser-Ney 5-gram model with a cache from 126 to 110 – to our knowledge the best single-model perplexity reported for this dataset. Interestingly, these gains hold up even after interpolating a standard recurrent net with a cache model, indicating that the context-dependent recurrent net is indeed exploiting the topic information of the LSA vector, and not just implementing a cache. In subsequent experiments, we have found that this architecture is able to use other sources of information as well; for example, in initial experiments with a voice-search application, conditioning on a latent-semantic analysis representation of a user’s history has reduced the perplexity of a language model from 135 to 122.

  • And this quick set of links to papers on their past research topics isn't even that deep; it just made quick use of different scholar.google features, bouncing between the citations for a given paper or using researcher profiles like here or here, and choosing the papers whose material I am most familiar with.

    Needless to say, the work is impressive, and indicative of what I think I have seen referred to elsewhere as an "A.I. arms race" being pushed by the likes of Microsoft, Google, and Apple, among others. The underlying tech, and its application in the likes of Cortana, Siri, Watson, etc., is extremely impressive, and barely gets the coverage it deserves unless it is part of a big story, like winning some Go matches or learning racial epithets from social media.

    (And this may all have been spurred by an anonymous forum user going from "What makes Tay tick?" to "I totally have an idea how to improve it," when just my lazy self can put together all of this in an evening plus an afternoon from publicly available sources, without having to rely on paid journals/sci-hub to delve deeper.)
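To make the quoted passage a little more concrete, here is a minimal sketch of that side-channel idea: a recurrent language model whose input (and output layer) is augmented at every time step with a slowly changing context vector, such as an LSA representation of the preceding text. This is my own illustrative reconstruction, not Microsoft's code; the use of PyTorch, the class name ContextRNNLM, and all of the dimensions are assumptions on my part.

```python
# Sketch of a context-dependent recurrent language model (my own assumption-laden
# reconstruction of the "side-channel" idea, not code from the paper).
import torch
import torch.nn as nn


class ContextRNNLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, context_dim=40, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The recurrent layer sees the word embedding concatenated with the
        # side-channel context vector at every time step.
        self.rnn = nn.RNN(embed_dim + context_dim, hidden_dim, batch_first=True)
        # The output layer is also conditioned on the context vector, so the
        # topic signal reaches the prediction "above and beyond" the word history.
        self.out = nn.Linear(hidden_dim + context_dim, vocab_size)

    def forward(self, words, context):
        # words:   (batch, seq_len) token ids
        # context: (batch, context_dim), e.g. an LSA vector of the preceding text
        emb = self.embed(words)                                 # (B, T, E)
        ctx = context.unsqueeze(1).expand(-1, emb.size(1), -1)  # (B, T, C)
        hidden, _ = self.rnn(torch.cat([emb, ctx], dim=-1))     # (B, T, H)
        return self.out(torch.cat([hidden, ctx], dim=-1))       # (B, T, V)


def perplexity(logits, targets):
    # Perplexity is exp(average per-token cross-entropy) -- the same quantity
    # the quoted passage reports improving from 126 to 110.
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    return torch.exp(loss).item()


# Toy usage: two sequences of 5 tokens, a 1000-word vocabulary, 40-dim "LSA" vectors.
model = ContextRNNLM(vocab_size=1000)
words = torch.randint(0, 1000, (2, 5))
lsa = torch.randn(2, 40)
logits = model(words, lsa)                        # (2, 5, 1000)
print(perplexity(logits[:, :-1], words[:, 1:]))   # next-word perplexity
```

The point of the sketch is only to show where the side-channel enters: the same context vector is broadcast across the whole sequence, so the topic information changes slowly relative to the word-by-word recurrence, which is what lets it act as more than a cache.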

 