Thoughts about Intelligent Machines

... and Things that Aspire to Become One!


"Computer science is no more about computers than astronomy is about telescopes." Edsger Dijkstra

Emotions of Computers. The uncanny Science of Sentiment Analysis
13.10.2017

Imagine your phone could recognize your mood. As your personal assistant, it could then play you a happy song, give you a hint about good news from your friends or even call the police in case of an existential threat. All the sensory armament of today's devices gives them quite a good interface to tap into your emotional state.


These extreme capabilities are still science fiction, but they relate to a technology already in use: sentiment analysis. Brand marketers use sentiment analysis heavily to track customers' opinions about product- and company-related news. With natural language processing, text mining, and data mining, these digital mercenaries gather, categorize and analyze the social media content, online comments and product reviews that consumers write about a given brand or product. These customer opinions are valuable, the holy grail of all marketing.
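To make the lexicon-based flavour of this concrete, here is a minimal sketch in Python. The word lists and reviews are made up for illustration; production systems use large curated lexicons or trained models:

```python
# Toy lexicon-based sentiment scorer: count positive vs. negative words.
POSITIVE = {"love", "great", "excellent", "happy", "good"}
NEGATIVE = {"hate", "terrible", "awful", "poor", "bad"}

def sentiment_score(text: str) -> int:
    """Score > 0 suggests positive sentiment, < 0 negative, 0 neutral."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [  # hypothetical product reviews
    "I love this phone, the camera is excellent",
    "terrible battery and awful support",
]
for review in reviews:
    print(f"{sentiment_score(review):+d}  {review}")
```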


But imagine a more private use case of sentiment analysis, something that maybe awaits us in the near future.

Links:
  • Monitor the public mood
  • Events analytics
  • Sentiment analysis of social media content
  • Lexicon-based approach to sentiment analysis of Twitter content
Statistical Modelling. When everything becomes a Probability Distribution
11.10.2017

Probability theory (PT) is not liked by many people. This is mostly because probabilities are kind of messy and not always easy to grasp: they need reflection. It is easily forgotten or overlooked that our brain masters this technique elegantly during our daily routines. But our brain displays its mastery mostly in silent mode, so our dependence and reliance on probabilistic thinking is easily ignored.


One way to remedy this is to think about the concepts of PT. If we consider an event and observe its outcome, we immediately ask for its cause or reason. The underlying assumption is that, in general, nothing comes from nothing. Understanding the processes that lead to different events is the business of science. This is not to say that everything is deterministic in a strong philosophical sense. Consider human actions. Most of them can be said to be quite random, either because their underlying motives and causes are too complex, or because they are truly unpredictable. But this does not mean that no structure can be discovered across a large number of these events. And this is part of what statistical modelling is about.
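A minimal simulation illustrates this (plain Python; the coin and its bias are of course made up): a single flip is unpredictable, but the frequency over many flips reliably approaches the underlying bias.

```python
import random

random.seed(42)
true_bias = 0.6  # hypothetical probability of "heads"

# Each flip on its own is random; structure emerges only in aggregate.
flips = [1 if random.random() < true_bias else 0 for _ in range(10_000)]

for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} flips: observed frequency = {sum(flips[:n]) / n:.3f}")
```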


Part of statistical modelling are probability distributions (PD). In simple terms, they are a representation of all possible outcomes and their respective probabilities. PDs are a main ingredient of machine learning techniques. Their function there is to represent the observed data, which is finite and assumed to be independent, in some mathematical form that encodes our assumptions and knowledge about a specific data set. The last part is important! Because we typically don't know the exact nature of the process that generates this data, we have to rely on a good approximation and assume it follows this or that distribution. Pinning down this distribution for a particular problem can be quite an art. And as you have probably already guessed, there are many PDs out there. So don't get scared to find out more about the Gaussian, Laplace, Exponential or Poisson distributions. There are many interesting things to discover.
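As a small sketch of what "assuming a distribution" means in practice, consider count data that we choose to model as Poisson (numpy and scipy assumed available; the data here is synthetic):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for observed counts (e.g. events per day).
rng = np.random.default_rng(0)
data = rng.poisson(lam=4.2, size=500)

# Modelling assumption: the counts follow a Poisson distribution.
# Its maximum-likelihood rate estimate is simply the sample mean.
lam_hat = data.mean()
print(f"estimated rate: {lam_hat:.2f}")

# The fitted model now assigns probabilities even to unseen outcomes,
# e.g. the chance of observing 10 or more events on a single day.
p_tail = 1 - stats.poisson.cdf(9, mu=lam_hat)
print(f"P(X >= 10) under the fitted model: {p_tail:.4f}")
```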

Constraint Satisfaction. A Lesson for Life
05.10.2017

Life is full of challenges. So is the life of intelligent machines. Although some problems tend to be easy, others are horribly hard (-> AI-completeness). Interestingly, problems are categorized very differently by humans and by machines. The question of the meaning of life is trivial for a machine ("=42"), but (mostly) undecidable for humans. Understanding a joke or catching a ball is fairly easy for most of us, but machines are rather clumsy at these tasks.


Fortunately, most hard problems are not without structure. And structure is good for tackling the void of the search space, which can be astronomically huge for some problems. To tackle problems that are simply too complex in the number of variables, formal science invented a technique called constraint satisfaction. Constraint satisfaction problems (CSP) provide a formal frame to represent and solve certain problems in artificial intelligence. Canonically, a CSP requires a value, selected from a given finite domain, to be assigned to each variable in the problem, so that all constraints relating the variables are satisfied. Examples are manifold:

  • Floor Plan Design
  • Scheduling and Timetabling Tasks
  • Diagnostic Reasoning
  • Cryptarithmetic puzzles and games (like Sudoku)
Constraint satisfaction problems are combinatorial in nature. The question whether a CSP has a solution at all is unfortunately NP-complete, which means an efficient algorithm for all these problems is unlikely to exist. Nonetheless, methods like intelligent backtracking and soft constraints have been invented to tackle them in practice; a minimal backtracking solver is sketched below. If you want to find out more, check out the research papers in the link section and have fun with CSPs!
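Here is that sketch in Python, applied to a made-up map-coloring instance (regions, neighbours and colors are all hypothetical):

```python
# A minimal CSP solver: plain backtracking over variable assignments.
# Toy instance: color a small map so that neighbouring regions differ.

NEIGHBOURS = {  # variables are regions, edges are "must differ" constraints
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
DOMAIN = ["red", "green", "blue"]  # finite domain for every variable

def consistent(region, color, assignment):
    # Constraint check: no two neighbouring regions share a color.
    return all(assignment.get(n) != color for n in NEIGHBOURS[region])

def backtrack(assignment):
    if len(assignment) == len(NEIGHBOURS):
        return assignment  # every variable has a consistent value
    region = next(r for r in NEIGHBOURS if r not in assignment)
    for color in DOMAIN:
        if consistent(region, color, assignment):
            assignment[region] = color
            result = backtrack(assignment)
            if result:
                return result
            del assignment[region]  # undo and try the next value
    return None  # dead end: triggers backtracking in the caller

print(backtrack({}))  # e.g. {'A': 'red', 'B': 'green', 'C': 'blue', 'D': 'red'}
```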

Homomorphic Encryption. Magic in the Crypto-Realm
01.10.2017

It sounds magical, but it's not science fiction. Computation over encrypted data is becoming increasingly sexy. This should not surprise anyone who feels privacy is an underrepresented topic in the contemporary Big Data debates. As machine learning applications sweep across different industries, use cases with strong privacy constraints are getting more and more attention: medical data, financial data, even personal preferences raise the privacy awareness of many data engineers.


Until recently, using classification techniques seemed straightforward: gather quality data, train a model, retrieve classification results. Taking privacy issues into account, all three steps get a different twist. How do you train a model if you don't want your data to be transparent? How do you train your model robustly, so that it can't be influenced in the training phase by malicious third parties (-> secure training and classifier construction, adversarial examples)? How do you keep your model from leaking information about its structure (-> model protection)? How do you prevent classification results from leaking information about individuals in the data (-> differential privacy)?


Privacy-preserving classification and machine learning over encrypted data is therefore an important technological improvement. And it is based on a mathematical magic trick that allows operations on fully encrypted data: homomorphic encryption. It permits specific operations, such as addition and multiplication, on encrypted data, which makes it possible to implement many machine learning algorithms in this manner. Sounds cool? Check out the research papers in the link section.
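As a minimal sketch, the python-paillier library ("phe", assumed installed via pip install phe) implements the Paillier scheme. Note that Paillier is only additively homomorphic, not fully homomorphic: we can add ciphertexts and scale them by plaintext constants, which is already enough to evaluate a linear model on encrypted features:

```python
# Privacy-preserving scoring with python-paillier ("phe").
# All features and weights below are hypothetical.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Client side: encrypt private features before handing them out.
features = [2.5, -1.0, 3.0]
enc_features = [public_key.encrypt(x) for x in features]

# Server side: evaluate a plaintext linear model on ciphertexts only.
weights = [0.4, 1.2, -0.5]
enc_score = public_key.encrypt(0.1)  # start from the model bias
for w, x in zip(weights, enc_features):
    enc_score = enc_score + x * w    # ciphertext * plaintext scalar

# Client side: only the private key holder can read the result.
print(private_key.decrypt(enc_score))  # -> -1.6
```

Fully homomorphic schemes additionally allow multiplying ciphertexts with each other, at a considerable performance cost.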

Links
Computer Science TU Darmstadt