<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>XRCE - Seminars</title>
    <link>http://www.xrce.xerox.com/</link>
    <description/>
    <language>en-GB</language>
    <managingEditor>ezp@xrce.xerox.com ()</managingEditor>
    <pubDate>Tue, 05 Sep 2017 10:44:38 +0000</pubDate>
    <lastBuildDate>Tue, 05 Sep 2017 10:44:38 +0000</lastBuildDate>
    <generator>eZ Components Feed dev (http://ezcomponents.org/docs/tutorials/Feed)</generator>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <atom:link href="http://www.xrce.xerox.com/rss/feed/seminars" rel="self" type="application/rss+xml"/>
    <item>
      <title>Universal, unsupervised, and understandable deep image representations</title>
      <link>http://www.xrce.xerox.com/Our-Research/Seminars/2017/Universal-unsupervised-and-understandable-deep-image-representations</link>
      <description>
&lt;div class="object-right"&gt;&lt;div class="content-view-embeddedmedia"&gt;
	&lt;div class="class-image"&gt;

		&lt;div class="attribute-image"&gt;
			&lt;p&gt;
						  				    		

    
        
    
                                                                                                                                            &lt;img src="http://saturne.xrce.xerox.com/var/siteaccesses/storage/images/media/public-media/images/university-oxford/1703009-1-eng-GB/University-Oxford_medium.jpg" width="200" height="200"  style="border: 0px  ;" alt="University Oxford" title="University Oxford" /&gt;
            
    
    
    			  							&lt;/p&gt;
		&lt;/div&gt;

			&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;&lt;p&gt;Monday, June 26th, 11am&lt;/p&gt;&lt;p&gt;Speaker:&amp;nbsp;&lt;a href="http://www.robots.ox.ac.uk/~vedaldi/" target="_self"&gt;Andrea Vedaldi&lt;/a&gt;, associate professor at&amp;nbsp;&lt;i&gt;University of Oxford&lt;/i&gt;, U.K.&lt;/p&gt;&lt;p&gt;Abstract:&amp;nbsp;Modern deep neural networks have taken computer vision by storm. In this talk, I will demonstrate a few applications of large-scale neural nets to problems such as object recognition and text spotting. I will also discuss one of the main difficulties in applying these models to new tasks, namely the need to provide very large datasets of annotated images, and show how this problem can be alleviated by the use of synthetic data. I will then discuss some of our recent research at the core of this technology, aimed at solving these problems at a more fundamental level, focusing on three areas: unsupervised representations, universal representations, and understanding representations.&lt;/p&gt;</description>
      <author>ezp@xrce.xerox.com (Irene MAXWELL)</author>
      <guid isPermaLink="false">ef42b9f61881aeaa48ccd35a11aa11c2</guid>
      <pubDate>Tue, 13 Jun 2017 09:57:10 +0000</pubDate>
    </item>
    <item>
      <title>Learning novel spatio-temporal representations with and without language</title>
      <link>http://www.xrce.xerox.com/Our-Research/Seminars/2017/Learning-novel-spatio-temporal-representations-with-and-without-language</link>
      <description>
&lt;div class="object-right"&gt;&lt;div class="content-view-embeddedmedia"&gt;
	&lt;div class="class-image"&gt;

		&lt;div class="attribute-image"&gt;
			&lt;p&gt;
						  				    		

    
        
    
                                                                                                                                            &lt;img src="http://saturne.xrce.xerox.com/var/siteaccesses/storage/images/media/public-media/images/univwersity-amsterdam/1703005-1-eng-GB/univwersity-Amsterdam_large.jpg" width="300" height="30"  style="border: 0px  ;" alt="univwersity Amsterdam" title="univwersity Amsterdam" /&gt;
            
    
    
    			  							&lt;/p&gt;
		&lt;/div&gt;

			&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;&lt;p&gt;Thursday, June 22nd, 11am&lt;/p&gt;&lt;p&gt;Speaker:&amp;nbsp;&lt;a href="http://www.egavves.com/" target="_self"&gt;Efstratios Gavves&lt;/a&gt;, assistant professor at&amp;nbsp;&lt;i&gt;University of Amsterdam&lt;/i&gt;, The Netherlands&lt;/p&gt;&lt;p&gt;Abstract: In recent years, rethinking video-related tasks and their optimal representations has attracted significant attention, including but not limited to action and event recognition, visual object tracking, and spatio-temporal localization. In this talk, I will present our CVPR 2017 accepted work on spatio-temporal representations with and without the use of language.&lt;/p&gt;&lt;p&gt;I will first discuss our latest work on rethinking visual object tracking. The basic assumption made by visual trackers over the last 30 years has been that the target is specified by a bounding box in the first frame. Rather than specifying the target in the first frame of a video by a bounding box, we propose to track the object based on a natural language specification of the target. Tracking by natural language specification not only allows for a more natural human-machine interaction but also improves tracking results. Most importantly, it enables novel tracking scenarios, for instance live-stream tracking or simultaneous tracking across multiple videos.&lt;/p&gt;&lt;p&gt;Next, I focus on event detection. By definition, events are a complex combination of actions, which can be described textually in significant detail, e.g. as in &lt;a href="http://www.wikihow.com/" target="_self"&gt;WikiHow&lt;/a&gt;. In this work, we show that such external databases that textually describe events allow for improved representation learning. By casting novel events, via their textual descriptions, as a mixture of known events, we can retrieve relevant videos even with zero exemplars at training time.&lt;/p&gt;&lt;p&gt;Last, I present a new self-supervised methodology specialized for video, relying on a novel auxiliary task called odd-one-out learning, in which the machine is asked to identify the unrelated (odd) element from a set of otherwise related elements. Adapting this to video, we sample sub-videos with the correct and the wrong (odd) order of frames. Our learning machine is implemented as a multi-stream convolutional neural network, which is learned end-to-end and generalizes to other related tasks such as action recognition.&lt;/p&gt;</description>
      <author>ezp@xrce.xerox.com (Claudia HEYER)</author>
      <guid isPermaLink="false">33ab623aee088aff06ec5c7fdbe6fcbc</guid>
      <pubDate>Tue, 13 Jun 2017 09:08:39 +0000</pubDate>
    </item>
    <item>
      <title>Technical mediation of social cognition: minimalist experimental paradigm of perceptual interactions</title>
      <link>http://www.xrce.xerox.com/Our-Research/Seminars/2017/Technical-mediation-of-social-cognition-minimalist-experimental-paradigm-of-perceptual-interactions</link>
      <description>&lt;p&gt;Monday, June 12th, 11am.&lt;/p&gt;
&lt;div class="object-right"&gt;&lt;div class="content-view-embeddedmedia"&gt;
	&lt;div class="class-image"&gt;

		&lt;div class="attribute-image"&gt;
			&lt;p&gt;
						  				    		

    
        
    
                                                                                                                                            &lt;img src="http://saturne.xrce.xerox.com/var/siteaccesses/storage/images/utc/1683435-1-eng-GB/utc_medium.jpg" width="200" height="70"  style="border: 0px  ;" alt="UTC logo" title="UTC logo" /&gt;
            
    
    
    			  							&lt;/p&gt;
		&lt;/div&gt;

			&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;&lt;p&gt;Speaker: Charles Lenay,&amp;nbsp;Université de technologie de Compiègne,&amp;nbsp;France&lt;/p&gt;&lt;p&gt;After presenting the context, issues and ambitions of technological research in the Humanities as promoted by the COSTECH team at UTC, I will give some examples of research in cognitive science. These studies of perceptual supplementation systems, at once scientific, technological and philosophical, are in line with the so-called &amp;quot;4E&amp;quot; approaches in cognitive science (Enactive, Embodied, Embedded, Extended). The aim is to construct very simple experimental situations that highlight the dynamics of interaction constituting the skills of social cognition, such as the &amp;quot;recognition of others&amp;quot;, the &amp;quot;perception of their intentions&amp;quot; or the &amp;quot;imitation of their facial expressions&amp;quot;.&lt;/p&gt;</description>
      <author>ezp@xrce.xerox.com (Claudia HEYER)</author>
      <guid isPermaLink="false">74aabe54c8ecabd3963a852c44ddb915</guid>
      <pubDate>Sun, 04 Jun 2017 20:46:51 +0000</pubDate>
    </item>
    <item>
      <title>Evolving dialogue with machines</title>
      <link>http://www.xrce.xerox.com/Our-Research/Seminars/2017/Evolving-dialogue-with-machines</link>
<description>&lt;p&gt;Thursday, June 1st at 11am.&lt;/p&gt;&lt;p&gt;Speaker:&amp;nbsp;Milica Gasic, lecturer, University of Cambridge, U.K.&lt;/p&gt;&lt;p&gt;Abstract:&amp;nbsp;&lt;/p&gt;&lt;p&gt;In the last decade we have witnessed machine learning trigger a revolution in dialogue research. Using a variety of reinforcement and supervised learning methods and innovative architectures, we can now build fully data-driven dialogue systems. These techniques are part of a user-in-the-loop framework, where systems can be deployed quickly, handle speech recognition errors gracefully, and learn continuously from interaction with real users.&lt;/p&gt;&lt;p&gt;Methods based on Gaussian processes are particularly effective as they enable good models to be estimated from limited training data. Furthermore, they provide an explicit estimate of uncertainty, which is particularly useful for reinforcement learning. This talk explores the additional steps necessary to extend these methods to support adaptation to different dialogue domains and more data-efficient learning, an important step for scaling up and building evolving systems.&lt;/p&gt;&lt;p&gt;The final part of the talk will focus on the evolution of the next generation of spoken dialogue systems. These systems will need to operate on large and dynamic domains and, more importantly, be capable of conducting rich and natural interaction. We will present a research roadmap towards this goal. A typical application where such a level of complexity is needed is a mental health application, which we will discuss to illustrate the need for such research.&lt;/p&gt;</description>
      <author>ezp@xrce.xerox.com (Irene MAXWELL)</author>
      <guid isPermaLink="false">70b0d8511dafa254b6336f8f914bbcb7</guid>
      <pubDate>Sun, 04 Jun 2017 20:44:06 +0000</pubDate>
    </item>
    <item>
      <title>Sensitivity classification for assisting digital sensitivity review of government documents</title>
      <link>http://www.xrce.xerox.com/Our-Research/Seminars/2017/Sensitivity-classification-for-assisting-digital-sensitivity-review-of-government-documents</link>
<description>&lt;p&gt;Wednesday, May 31st at 11am.&lt;/p&gt;&lt;p&gt;Speaker:&amp;nbsp;&lt;a href="http://www.dcs.gla.ac.uk/~graham/" target="_self"&gt;Graham McDonald&lt;/a&gt;, doctoral candidate,&amp;nbsp;&lt;i&gt;University of Glasgow&lt;/i&gt;, U.K.&lt;/p&gt;&lt;p&gt;Freedom of Information (FOI) laws legislate that government documents should be opened to the public. However, many government documents contain sensitive information, such as personal information or information that would be likely to damage the international relations of countries if it were released. Therefore, sensitive information is exempt from release through FOI, and all government documents must be sensitivity reviewed to ensure that no such sensitivities are released to the public. With the emergence of born-digital government documents in recent decades, traditional (paper-based) sensitivity review processes are no longer viable, and there is a timely need for automatic sensitivity classification techniques to assist the sensitivity review of born-digital documents. In this talk, I will begin by providing an overview of some of the main characteristics of sensitive information and the associated challenges for automatic sensitivity classification, before presenting some of our recent work developing sensitivity classifiers that use syntactic evidence from sequences of parts of speech, and semantic evidence derived from word embeddings, to effectively classify personal information and international relations sensitivities in government documents.&lt;/p&gt;</description>
      <author>ezp@xrce.xerox.com (Irene MAXWELL)</author>
      <guid isPermaLink="false">239347d089116a052d9ce597de81d72a</guid>
      <pubDate>Sun, 04 Jun 2017 20:38:02 +0000</pubDate>
    </item>
    <item>
      <title>Towards joint understanding of images and language</title>
      <link>http://www.xrce.xerox.com/Our-Research/Seminars/2017/Towards-joint-understanding-of-images-and-language</link>
      <description>
&lt;div class="object-right"&gt;&lt;div class="content-view-embeddedmedia"&gt;
	&lt;div class="class-image"&gt;

		&lt;div class="attribute-image"&gt;
			&lt;p&gt;
						  				    		

    
        
    
                                                                                                                                            &lt;img src="http://saturne.xrce.xerox.com/var/siteaccesses/storage/images/media/public-media/images/university-of-illinois-at-urbana-champaign/1681168-1-eng-GB/University-of-Illinois-at-Urbana-Champaign_medium.jpg" width="200" height="59"  style="border: 0px  ;" alt="University-of-Illinois-at-Urbana-Champaign" title="University-of-Illinois-at-Urbana-Champaign" /&gt;
            
    
    
    			  							&lt;/p&gt;
		&lt;/div&gt;

			&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;&lt;p&gt;Monday, May 29th at 2:00 PM&lt;/p&gt;&lt;p&gt;Speaker:&amp;nbsp;Svetlana Lazebnik, associate professor at&amp;nbsp;&lt;i&gt;University of Illinois at Urbana-Champaign&lt;/i&gt;, Urbana, IL, U.S.A.&lt;/p&gt;&lt;p&gt;Abstract:&amp;nbsp;Numerous real-world tasks can benefit from practical systems that can identify objects in scenes based on language and understand language grounded in visual context. This presentation will focus on my group's recent work on developing systems for jointly modeling images and language. I will talk about neural models for learning cross-modal embeddings for text-to-image and image-to-text search, and about the challenging task of grounding, or localizing, textual mentions of entities in an image. Finally, I will discuss applications of our models to automatic image description and visual question answering.&lt;/p&gt;</description>
      <author>ezp@xrce.xerox.com (Claudia HEYER)</author>
      <guid isPermaLink="false">b6383408e841118bd7ac36aa7e518b0c</guid>
      <pubDate>Wed, 24 May 2017 09:41:46 +0000</pubDate>
    </item>
    <item>
      <title>Neural belief tracker: data-driven dialogue state tracking using semantically specialised vector spaces</title>
      <link>http://www.xrce.xerox.com/Our-Research/Seminars/2017/Neural-belief-tracker-data-driven-dialogue-state-tracking-using-semantically-specialised-vector-spaces</link>
      <description>
&lt;div class="object-right"&gt;&lt;div class="content-view-embeddedmedia"&gt;
	&lt;div class="class-image"&gt;

		&lt;div class="attribute-image"&gt;
			&lt;p&gt;
						  				    		

    
        
    
                                                                                                                                            &lt;img src="http://saturne.xrce.xerox.com/var/siteaccesses/storage/images/media/public-media/images/university-cambridge-logo/1680946-1-eng-GB/University-Cambridge-logo_medium.jpg" width="200" height="42"  style="border: 0px  ;" alt="University Cambridge logo" title="University Cambridge logo" /&gt;
            
    
    
    			  							&lt;/p&gt;
		&lt;/div&gt;

			&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;&lt;p&gt;18&lt;sup&gt;th&lt;/sup&gt; May, 2017&lt;/p&gt;&lt;p&gt;Speaker:&amp;nbsp;&lt;a href="https://sites.google.com/site/nikolamrksic/" target="_blank"&gt;Nikola Mrkšić&lt;/a&gt;
, doctoral candidate, University of Cambridge, Cambridge, U.K.&lt;/p&gt;&lt;p&gt;Abstract: One of the core components of modern spoken dialogue systems is the belief tracker, which estimates the user's goal at every step of the dialogue. However, most current approaches have difficulty scaling to larger, more complex dialogue domains. This is due to their dependency on either: a) Spoken Language Understanding models that require large amounts of annotated training data; or b) hand-crafted lexicons for capturing some of the linguistic variation in users' language. We propose a novel Neural Belief Tracking (NBT) framework which overcomes these problems by building on recent advances in representation learning. NBT models reason over pre-trained, semantically specialised word vectors, learning to compose them into distributed representations of user utterances and dialogue context. Our evaluation on two datasets shows that this approach surpasses past limitations, matching the performance of state-of-the-art models which rely on hand-crafted semantic lexicons and outperforming them when such lexicons are not provided. Finally, we will discuss how the properties of the underlying vector spaces impact model performance, and how the fact that the proposed model operates purely over word vectors allows immediate deployment of belief tracking models for other languages.&lt;/p&gt;</description>
      <author>ezp@xrce.xerox.com (Claudia HEYER)</author>
      <guid isPermaLink="false">3e8d0de1d50950cc9f289963ef144e7a</guid>
      <pubDate>Thu, 04 May 2017 12:51:00 +0000</pubDate>
    </item>
    <item>
<title>E2D2: Episodic exploration for deep deterministic policies for StarCraft</title>
      <link>http://www.xrce.xerox.com/Our-Research/Seminars/2017/E2D2-Episodic-exploration-for-deep-deterministic-policies-for-Starcraft</link>
      <description>
&lt;div class="object-right"&gt;&lt;div class="content-view-embeddedmedia"&gt;
	&lt;div class="class-image"&gt;

		&lt;div class="attribute-image"&gt;
			&lt;p&gt;
						  				    		

    
        
    
                                                                                                                                            &lt;img src="http://saturne.xrce.xerox.com/var/siteaccesses/storage/images/media/public-media/images/fair/1671062-1-eng-GB/FAIR_medium.jpg" width="200" height="39"  style="border: 0px  ;" alt="FAIR logo" title="FAIR logo" /&gt;
            
    
    
    			  							&lt;/p&gt;
		&lt;/div&gt;

			&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;&lt;p&gt;28&lt;sup&gt;th&lt;/sup&gt;&amp;nbsp;April, 2017 &amp;nbsp; &amp;nbsp; 11:00 AM&lt;/p&gt;&lt;p&gt;Speaker:&amp;nbsp;&lt;a href="https://research.fb.com/people/synnaeve-gabriel/" target="_self"&gt;Gabriel Synnaeve&lt;/a&gt;
, research scientist at&amp;nbsp;&lt;i&gt;Facebook AI Research&lt;/i&gt;, New York, NY, U.S.A.&lt;/p&gt;&lt;p&gt;Abstract:&amp;nbsp;We consider scenarios from the real-time strategy game StarCraft as new benchmarks for reinforcement learning algorithms. We propose micromanagement tasks, which present the problem of the short-term, low-level control of army members during a battle. From a reinforcement learning point of view, these scenarios are challenging because the state-action space is very large, and because there is no obvious feature representation for the state-action evaluation function. We describe our approach to tackling the micromanagement scenarios with deep neural network controllers trained on raw state features given by the game engine. In addition, we present a heuristic reinforcement learning algorithm which combines direct exploration in the policy space and backpropagation. This algorithm allows for the collection of traces for learning using deterministic policies, which appears much more efficient than, for example, ε-greedy exploration. Experiments show that with this algorithm, we successfully learn non-trivial strategies for scenarios with armies of up to 15 agents, where both Q-learning and REINFORCE struggle.&lt;/p&gt;</description>
      <author>ezp@xrce.xerox.com (Claudia HEYER)</author>
      <guid isPermaLink="false">7a6b1ec2285a75d754672b59fd0916a0</guid>
      <pubDate>Thu, 20 Apr 2017 12:20:48 +0000</pubDate>
    </item>
    <item>
      <title>Computer Vision @ Scale</title>
      <link>http://www.xrce.xerox.com/Our-Research/Seminars/2017/Computer-Vision-Scale</link>
      <description>
&lt;div class="object-right"&gt;&lt;div class="content-view-embeddedmedia"&gt;
	&lt;div class="class-image"&gt;

		&lt;div class="attribute-image"&gt;
			&lt;p&gt;
						  				    		

    
        
    
                                                                                                                                            &lt;img src="http://saturne.xrce.xerox.com/var/siteaccesses/storage/images/facebook-ai-research/1671040-1-eng-GB/Facebook-AI-Research_medium.jpg" width="200" height="39"  style="border: 0px  ;" alt="Facebook AI Research logo" title="Facebook AI Research logo" /&gt;
            
    
    
    			  							&lt;/p&gt;
		&lt;/div&gt;

			&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;&lt;p&gt;27&lt;sup&gt;th&lt;/sup&gt; April, 2017 &amp;nbsp; &amp;nbsp;11:00 AM&lt;/p&gt;&lt;p&gt;Speaker:&amp;nbsp;&lt;a href="https://research.fb.com/people/paluri-manohar/" target="_self"&gt;Manohar Paluri&lt;/a&gt;
, research lead at&amp;nbsp;&lt;i&gt;Facebook AI Research&lt;/i&gt;, Menlo Park, CA, U.S.A.&lt;/p&gt;&lt;p&gt;Abstract:&amp;nbsp;Over the past five years the community has made significant strides in the field of Computer Vision. Thanks to large-scale datasets, specialized computing in the form of GPUs, and many breakthroughs in modeling better ConvNet architectures, Computer Vision systems in the wild at scale are becoming a reality. At Facebook AI Research, we want to embark on the journey of making breakthroughs in the field of AI and using them for the benefit of connecting people and helping remove barriers to communication. In that regard, Computer Vision plays a significant role, as the media content coming to Facebook is ever increasing and building models that understand this content is crucial to achieving our mission of connecting everyone. In this talk I will give an overview of how we think about problems related to Computer Vision at Facebook and touch on various aspects of supervised, semi-supervised, and unsupervised learning. I will move between various research efforts involving representation learning, highlight some large-scale applications that use the technology, and talk about the limitations of current systems.&lt;/p&gt;</description>
      <author>ezp@xrce.xerox.com (Claudia HEYER)</author>
      <guid isPermaLink="false">152e35c886db6f7712ef04b49caa5379</guid>
      <pubDate>Thu, 20 Apr 2017 12:17:21 +0000</pubDate>
    </item>
    <item>
      <title>Massive online analytics for the Internet of Things</title>
      <link>http://www.xrce.xerox.com/Our-Research/Seminars/2017/Massive-online-analytics-for-the-Internet-of-Things</link>
      <description>
&lt;div class="object-right"&gt;&lt;div class="content-view-embeddedmedia"&gt;
	&lt;div class="class-image"&gt;

		&lt;div class="attribute-image"&gt;
			&lt;p&gt;
						  				    		

    
        
    
                                                                                                                                            &lt;img src="http://saturne.xrce.xerox.com/var/siteaccesses/storage/images/media/public-media/images/telecom_paristech/1613337-1-eng-GB/telecom_parisTech_small.png" width="100" height="100"  style="border: 0px  ;" alt="Telecom ParisTech logo image" title="Telecom ParisTech logo image" /&gt;
            
    
    
    			  							&lt;/p&gt;
		&lt;/div&gt;

			&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;&lt;p&gt;Speaker:&amp;nbsp;&lt;a href="http://albertbifet.com/" target="_blank"&gt;Albert Bifet&lt;/a&gt;, professor at&amp;nbsp;&lt;i&gt;Telecom ParisTech&lt;/i&gt;, Paris, France&lt;br /&gt;Abstract:&amp;nbsp;Big Data and the Internet of Things (IoT) have the potential to fundamentally shift the way we interact with our surroundings. The challenge of deriving insights from the IoT has been recognized as one of the most exciting opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution of such data streams over time, i.e. with concepts that drift or change completely, is one of the core issues in stream mining. In this talk, I will present an overview of data stream mining and introduce some popular open source tools for data stream mining.&lt;/p&gt;</description>
      <author>ezp@xrce.xerox.com (Claudia HEYER)</author>
      <guid isPermaLink="false">a3984637b4954f0e435bb41730558a78</guid>
      <pubDate>Wed, 08 Mar 2017 15:43:02 +0000</pubDate>
    </item>
  </channel>
</rss>