Approaching Active Learning as a Reinforcement Learning Problem


There are situations in which unlabeled data is abundant but labeling it is expensive. In such a scenario the learning algorithm can actively query the user for labels. This type of iterative supervised learning is called active learning. Since the learner chooses the examples, the number of examples needed to learn a concept can often be much lower than the number required in normal supervised learning. With this approach there is a risk that the algorithm might focus on unimportant or even invalid examples.



Motivation : A major problem with machine learning approaches is the high cost of collecting labeled examples. Most of the time it is easy to amass vast quantities of unlabeled data (images and videos off the web, speech signals from microphone recordings, and so on) but costly to obtain their labels. Active learning seeks to make efficient use of the labeler's time by querying for the most informative samples, those whose labels let the learner classify the rest of the data accurately after only a few queries. So finding the sample to query (either incrementally or collectively) is the main aim of an Active Learning framework.

Current Work : Recent research in Active Learning follows two main directions. The first is to search the hypothesis space efficiently, selecting at each step the sample that most reduces the current search space. The second is to exploit the cluster structure in the data.

Our Approach : In this context we are trying to model Active Learning as a Reinforcement Learning problem, where our goal is to find the next unlabeled sample to query. A Reinforcement Learning agent tries to find actions in an environment so as to maximize some notion of long-term reward. In other words, it attempts to find a policy that maps states of the world to the actions to be taken in those states. Here, the action the agent has to learn is picking the next sample to query. We model this problem by treating the informativeness of the chosen sample as a form of reward and the current sets of labeled and unlabeled data as the observation. Since answering each query is a costly operation, we should also try to decrease the number of queries made. Currently, there is no fixed way of deciding when to stop requesting more labels. So we also try to find a measure of the adequacy of the current set of labeled samples, which can help us decide when to stop requesting more labels.
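
The formulation above can be sketched as a toy environment. This is only an illustration under stated assumptions, not the project's implementation: the class name ActiveLearningEnv, the 1-nearest-neighbour learner, and the 0/1 "surprise" reward (1 when the current learner would have mislabeled the queried sample) are all hypothetical stand-ins for whichever informativeness measure is eventually chosen.

```python
import numpy as np


class ActiveLearningEnv:
    """Toy pool-based active learning environment (hypothetical sketch).

    Observation: indices of the labeled set and of the unlabeled pool.
    Action: position in the unlabeled pool of the sample to query.
    Reward: assumed proxy for informativeness (see _informativeness).
    """

    def __init__(self, X, y, initial_labeled):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)  # hidden labels, revealed only when queried
        self.labeled = list(initial_labeled)
        seen = set(self.labeled)
        self.unlabeled = [i for i in range(len(self.X)) if i not in seen]

    def observation(self):
        # Copies, so callers cannot mutate the environment's state.
        return list(self.labeled), list(self.unlabeled)

    def _informativeness(self, idx):
        # Assumption: a label is informative (reward 1.0) if the current
        # 1-nearest-neighbour prediction disagrees with it, else 0.0.
        dists = np.linalg.norm(self.X[self.labeled] - self.X[idx], axis=1)
        predicted = self.y[self.labeled][np.argmin(dists)]
        return float(predicted != self.y[idx])

    def step(self, action):
        idx = self.unlabeled.pop(action)
        reward = self._informativeness(idx)  # computed before adding the label
        self.labeled.append(idx)
        done = len(self.unlabeled) == 0
        return self.observation(), reward, done
```

A policy would call step repeatedly with its chosen action and could stop early once the rewards stay near zero, which is one possible (assumed) reading of the stopping question raised above.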

Minutes of Meetings

January 13, 2009

Discussion in the Meeting :

  • Evaluation methods of a classifier- Either use a validation set or find the expected number of labels required for learning the concept
  • Active learning for multiclass classification problems with a fixed input distribution can be done by RL techniques in which the informativeness measures (margin, least confidence, entropy, similarity to the labelled data, etc.) form the state.
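
Three of the informativeness measures mentioned above can be computed from a classifier's predicted class probabilities. The sketch below assumes a probability matrix with one row per unlabeled sample and at least two classes; the function names, and the idea of stacking the measures into a feature vector for the RL state, are illustrative assumptions. Similarity to the labelled data is omitted because the minutes do not say how it would be measured.

```python
import numpy as np


def least_confidence(probs):
    """1 - probability of the most likely class (higher = more uncertain)."""
    return 1.0 - probs.max(axis=1)


def margin(probs):
    """Gap between the two most likely classes (lower = more uncertain)."""
    ordered = np.sort(probs, axis=1)
    return ordered[:, -1] - ordered[:, -2]


def entropy(probs):
    """Shannon entropy of the predicted distribution, in nats."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


def state_features(probs):
    """Stack the measures column-wise: one feature row per sample."""
    probs = np.asarray(probs, dtype=float)
    return np.column_stack(
        [least_confidence(probs), margin(probs), entropy(probs)]
    )
```

For example, a sample predicted as (0.5, 0.5) has least confidence 0.5, margin 0, and entropy ln 2, while a confident (0.9, 0.1) prediction scores lower on uncertainty by all three measures.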

Progress after the meeting:

  • Read Literature survey by Burr Settles [1]
  • Reading the paper "An Analysis of Active Learning Strategies for Sequence Labeling Tasks"[2] and other works of Settles.

January 12, 2009

Discussed the scope of POMDPs

November 17, 2008

Discussion in the Meeting :

  • POMDP representation for the AL problem

Progress after the meeting:

November 03, 2008

Discussion in the Meeting :

  • discussed the significance of taking some exploratory moves in sample selection.
  • Simulated Annealing - Find initial temperature from the labelled data and update temperature based on the smoothness of unlabelled data
  • online AL with multiple classifiers
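
One common way to make exploratory moves in sample selection is Boltzmann (softmax) sampling over informativeness scores, with a temperature that is lowered over time as in simulated annealing. The sketch below is only that generic technique: the function name is hypothetical, and how the initial temperature is derived from the labelled data or updated from the smoothness of the unlabelled data is left open, as in the minutes.

```python
import numpy as np


def boltzmann_select(scores, temperature, rng):
    """Sample an index with probability proportional to exp(score / T).

    High T gives near-uniform (exploratory) choices; low T gives
    near-greedy choices of the highest-scoring sample.
    """
    scores = np.asarray(scores, dtype=float)
    logits = (scores - scores.max()) / temperature  # shift for stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs)), probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = [0.1, 0.9, 0.5]
    temperature = 1.0
    for _ in range(5):
        choice, _ = boltzmann_select(scores, temperature, rng)
        print(f"T={temperature:.3f} -> queried sample {choice}")
        temperature *= 0.5  # assumed geometric cooling, for illustration only
```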

Progress After the Meeting :

  • Read about maximum curiosity methods (an expensive form of Active learning in which, for each unlabelled sample, every possible class it can take is considered and the resulting improvement in classifier accuracy is determined for each case. The sample which can obtain the maximum improvement in accuracy is selected for labelling.)
  • SUMO Matlab toolbox: used mainly for surrogate modelling. Active learning is used to generate a new point which can reduce the error in the model
  • Started implementation in C (using libsvm)- switching to MATLAB now.
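
The maximum curiosity idea described above can be sketched as follows. This is a minimal illustration, not the method from the literature that was read: it assumes a nearest-centroid classifier, a held-out validation set for measuring accuracy, at least one labelled sample per class, and averaging the accuracy change over the candidate classes (the minutes do not specify how the per-class improvements are combined). All function names are hypothetical.

```python
import numpy as np


def accuracy(X_lab, y_lab, X_val, y_val, classes):
    """Validation accuracy of a nearest-centroid classifier."""
    centroids = np.array([X_lab[y_lab == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_val[:, None, :] - centroids[None, :, :], axis=2)
    predictions = classes[dists.argmin(axis=1)]
    return float((predictions == y_val).mean())


def max_curiosity_query(X_lab, y_lab, X_pool, X_val, y_val):
    """Return the pool index with the largest mean accuracy gain, plus all gains.

    For each pool sample, every class is tried as a hypothetical label,
    the classifier is refit, and the change in accuracy is recorded.
    The cost is one refit per (sample, class) pair, hence "expensive".
    """
    X_lab, y_lab = np.asarray(X_lab, dtype=float), np.asarray(y_lab)
    X_pool = np.asarray(X_pool, dtype=float)
    X_val, y_val = np.asarray(X_val, dtype=float), np.asarray(y_val)
    classes = np.unique(y_lab)
    base = accuracy(X_lab, y_lab, X_val, y_val, classes)
    gains = []
    for x in X_pool:
        X_new = np.vstack([X_lab, x])
        per_class = [
            accuracy(X_new, np.append(y_lab, c), X_val, y_val, classes) - base
            for c in classes
        ]
        gains.append(float(np.mean(per_class)))  # assumed aggregation
    return int(np.argmax(gains)), gains
```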

October 06, 2008

Discussion in the Meeting :

  • Electronic personal assistants - CALO and electronic elves
  • Prepare summary report
  • Siemens data of medical images - terms of use

Progress After the Meeting :

  • preparing report
  • reading a survey of Active learning by Anitha Krishnakumar[3]

September 22, 2008

Discussion in the Meeting :

  • Feasibility of synthesizing a data sample in a kernel based model
  • Finding a way to handle newly arriving data (query it, store it for future consideration, or discard it)

Progress After the Meeting :

  • Read papers-
    • Analysis of a greedy active learning strategy - Sanjoy Dasgupta
  • Read an overview of papers-
    • A. Kapoor and R. Greiner. Reinforcement learning for active model selection. In Proc. ACM SIGKDD Workshop on Utility-based Data Mining, 2005. There is a budget - cost involved in getting the feature values - RL is used to find out how to use the budget(as it is an MDP) - the method turned out to be inferior to other spending policies
    • Sebastian Thrun, Exploration in Active Learning(1995)
      • Earlier definitions and approaches to Active learning
      • Mainly RL and ANN approaches(paper focused on ANN based approaches)

  • to read - Reinforcement learning with immediate rewards and linear hypotheses(2003) by Naoki Abe, Alan W. Biermann, Philip M. Long, Algorithmica
  • Potential Domain that came to my mind: Medical Image Classification
    • Labelling is an expensive work
    • Online nature of the problem


  • Even in the case of a set of available labelled data, can we remove noisy data if we use Active Learning(assuming 0 labelled data)?
    • Ans: One possibility is to query for more samples around suspected noisy data in order to reduce variance. This is related to variance-based exploration strategies in RL. There is a paper by Kaelbling, I cannot recall the title immediately, on using confidence intervals of value estimates to drive exploration. Similarly, one could think of using confidence intervals on class labels to drive active learning.

September 18, 2008

Discussion in the Meeting :

  • Reinforcement Learning and Sequential Decision Problems
  • Obtain Decision theoretic formulation for the Active Learning Problem
  • Ad and Click through problem and Ranvir's Dataset.

August 29, 2008

Discussion in the Meeting :

  • Applicability of RL in Online Active Learning
  • How to increase the Long Term Reward
  • Tradeoff between Sample Complexity and Time Complexity
  • Choose samples that enable better function approximation
  • Try to get a Domain and look for existing Active Learning Packages

Progress After the Meeting :

  • Survey of existing Active Learning Packages

August 18, 2008

Discussion in the Meeting :

  • Co-Learning, Bias-Variance Trade off
  • RL modeling of Active Learning problem
  • Possibility of infinite actions and relation among actions
  • Read Learning to Learn

Progress After the Meeting :

  • Read Paper on Active Kernel Learning and Hierarchical Sampling for Active Learning

Interesting links

Papers of Sanjoy Dasgupta

Home page of Claire Monteleoni

Langford's blog
