Imagine your phone could recognize your mood. As your personal assistant, it could then play you a happy song, hint at some good news from your friends, or even call the police in case of an existential threat. The sensory armament of today's devices gives them quite a good interface to tap into your emotional state.
These extreme capabilities are still science fiction, but they build on a technology already in use: sentiment analysis. Brand marketers rely heavily on sentiment analysis to track customer opinions about product- and company-related news. Using natural language processing, text mining, and data mining, these digital mercenaries gather, categorize, and analyze the social media content, online comments, and product reviews that consumers write about a given brand or product. These customer opinions are valuable; they are the holy grail of marketing.
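The core idea of categorizing text by sentiment can be illustrated with a toy lexicon-based scorer. This is a minimal sketch of the principle only; the word lists and function name are made up for illustration, and real systems use trained models, negation handling, and far richer features.

```python
# Toy lexicon-based sentiment scorer -- a sketch of the idea, not a real NLP pipeline.
POSITIVE = {"great", "love", "happy", "excellent", "good"}
NEGATIVE = {"bad", "hate", "terrible", "awful", "poor"}

def sentiment(text):
    words = text.lower().split()
    # Count positive minus negative lexicon hits.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))  # -> positive
print(sentiment("awful screen and bad battery"))  # -> negative
```

A marketer's pipeline applies this kind of classification at scale to streams of reviews and tweets, then aggregates the labels per brand or product.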
But imagine a more private use case of sentiment analysis, something that may still await us in the future.
Twitter Sentiment Analysis for Security-Related Information Gathering, A. Jurek et al., 2014 IEEE Joint Intelligence and Security Informatics Conference
Probability theory (PT) is not liked by many people. This is mostly because probabilities are kind of messy and not always easy to grasp; they demand reflection. It is easily forgotten or overlooked that our brain has mastered this technique elegantly in our daily routines. But the brain displays its mastery mostly in silent mode, so our dependence and reliance on probabilistic thinking is easily ignored.
One way to remedy this is to think about the concepts of PT. If we consider an event and observe its outcome, we immediately ask for its cause or reason. The underlying assumption is that, in general, nothing comes from nothing. Understanding the processes that lead to different events is the business of science. This is not to say that everything is deterministic in a strong philosophical sense. Consider human actions. Many of them can be said to be quite random, either because their underlying motives and causes are too complex, or because they are truly unpredictable. But this does not mean that no structure can be discovered across a large number of such events. And that is part of what statistical modelling is about.
Probability distributions (PDs) are part of statistical modelling. In simple terms, a probability distribution is a representation of all possible outcomes and their respective probabilities. PDs are a main ingredient of machine learning techniques. Their function there is to represent the observed data, which is finite and independent, in a mathematical form that encodes our assumptions and knowledge about a specific data set. The last part is important! Because we typically don't know the exact nature of the process that generates the data, we have to rely on a good approximation and assume it follows this or that distribution. Pinning down the right distribution for a particular problem can be quite an art. And as you have probably already guessed, there are many PDs out there. So don't be scared to find out more about the Gaussian, Laplace, Exponential or Poisson distributions. There are many interesting things to discover.
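The step of "assuming the data follows this or that distribution" can be made concrete: fit several candidate distributions to the same samples and compare how well each one explains them. A minimal sketch using SciPy, under the assumption that the samples are i.i.d. (the synthetic data and seed are of course made up for illustration):

```python
# Sketch: which candidate distribution describes the observed data best?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # synthetic "observations"

results = {}
for dist in (stats.norm, stats.laplace):
    params = dist.fit(data)                        # maximum-likelihood parameter fit
    results[dist.name] = np.sum(dist.logpdf(data, *params))  # log-likelihood of data

for name, loglik in results.items():
    print(f"{name}: log-likelihood = {loglik:.1f}")
```

Since the synthetic data is Gaussian, the normal fit should score a higher log-likelihood than the Laplace fit; on real data, the ranking is exactly the question you are trying to answer.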
Life is full of challenges. So is the life of intelligent machines. Although some problems tend to be easy, others are horribly hard (-> AI-completeness). Interestingly, problems are categorized very differently by humans and by machines. The question of the meaning of life is trivial for a machine ("=42"), but (mostly) undecidable for humans. Understanding a joke or catching a ball is fairly easy for most of us, but machines are rather clumsy at these tasks.
Fortunately, most hard problems are not without structure. And structure helps in tackling the void of the search space, which can be astronomically large for some problems. To handle problems that are simply too complex in the number of variables, formal science invented a technique called constraint satisfaction. Constraint Satisfaction Problems (CSPs) can be thought of as providing a formal frame to represent and solve many problems in artificial intelligence. Canonically, a CSP requires a value, selected from a given finite domain, to be assigned to each variable in the problem, so that all constraints relating the variables are satisfied. Examples are manifold:
Algorithms for Constraint-Satisfaction Problems: A Survey, Vipin Kumar
Constraint satisfaction problems: Algorithms and applications, S.C. Brailsford
A Filtering Algorithm for Constraints of Difference in CSP, Jean-Charles Regin
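The canonical formulation above (variables, finite domains, constraints over variables) can be sketched with a classic example: coloring the map of Australia so that no two neighboring states share a color. The solver below is a minimal backtracking search, not an optimized algorithm like those in the papers linked here; names and structure are my own illustration.

```python
# Minimal CSP example: map coloring via backtracking search.
# Variables: Australian states; domains: three colors;
# constraint: adjacent states must receive different colors.
NEIGHBORS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
COLORS = ["red", "green", "blue"]

def solve(assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(NEIGHBORS):      # every variable has a value
        return assignment
    var = next(v for v in NEIGHBORS if v not in assignment)
    for color in COLORS:                        # try each value in the domain
        # Check the constraints against already-assigned neighbors.
        if all(assignment.get(nb) != color for nb in NEIGHBORS[var]):
            result = solve({**assignment, var: color})
            if result:
                return result
    return None                                 # dead end -> backtrack

solution = solve()
print(solution)
```

Real solvers add constraint propagation and filtering (the subject of the Régin paper above) to prune the search space before and during backtracking.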
It sounds magical, but it is not science fiction: computation over encrypted data is becoming increasingly attractive. This should not surprise anyone who felt privacy was an underrepresented topic in the contemporary Big Data debates. As machine learning applications sweep across different industries, use cases with strong privacy constraints are getting more and more attention: medical data, financial data, even personal preferences raise the privacy concerns of many data engineers.
Until recently, using classification techniques seemed straightforward: gather quality data, train a model, retrieve classification results. Taking privacy issues into account, all three steps get a different twist. How do you train a model if you don't want your data to be transparent? How do you train your model robustly, so that it can't be influenced in the training phase by malicious third parties (-> secure training and classifier construction, adversarial examples)? How do you keep your model from leaking information about its structure (-> model protection)? How do you prevent attacks on classification transparency (-> differential privacy)?
Privacy-preserving classification and machine learning over encrypted data is therefore an important technological improvement. It is based on a mathematical magic trick that allows operations on fully encrypted data: homomorphic encryption. It permits specific operations, such as addition and multiplication, on encrypted data, which makes it possible to implement many machine learning algorithms in this manner. Sounds cool? Check out the research papers in the link section.
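To make the "addition on encrypted data" trick tangible, here is a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The tiny primes are purely for illustration; a real deployment uses primes of thousands of bits and a vetted library, never hand-rolled crypto like this sketch.

```python
# Toy Paillier cryptosystem: additively homomorphic encryption (Python 3.9+).
import math
import random

def keygen(p=313, q=317):
    # Toy key sizes! Real Paillier uses ~2048-bit primes.
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                                  # standard choice of generator
    # mu = (L(g^lam mod n^2))^{-1} mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)                 # fresh randomness per ciphertext
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

pub, priv = keygen()
c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
# Multiplying ciphertexts adds the plaintexts: Dec(c1 * c2) = 20 + 22 = 42
print(decrypt(pub, priv, (c1 * c2) % (pub[0] ** 2)))
```

A server holding only `c1` and `c2` can compute the encrypted sum without ever seeing 20 or 22; this is the building block behind the privacy-preserving classifiers in the paper below.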
Machine Learning Classification over Encrypted Data, Raphael Bost et al., Proceedings of the 2015 Network and Distributed System Security Symposium (NDSS 2015)