Machine Learning, Data Mining, OCR

Machine Learning:
“Give the computer the ability to learn without being explicitly programmed.” – Arthur Samuel, 1959

These all use Machine Learning mechanisms:
– recommendations from Netflix and Amazon
– handwriting recognition, Computer Vision, Natural Language Processing (NLP)
– data mining to classify the huge databases found in biology, engineering and the Web

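Three main categories of learning:
Supervised: a mentor supplies examples together with the expected answers, and the learning rules are derived from them.
There are two categories of supervised learning problems: regression problems (predict a number) and classification problems (predict a category).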
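
To make the difference between the two kinds of supervised problem concrete, here is a minimal sketch in plain Python. The data (surfaces, prices, hours, passed) and the function names are invented for illustration, and the threshold classifier is a deliberately crude stand-in for a real classification algorithm.

```python
# Supervised learning: every training example comes with its expected answer.

def fit_line(xs, ys):
    """Regression: least-squares fit of y = a*x + b (the output is a number)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def fit_threshold(xs, labels):
    """Classification: place a threshold halfway between the two class means.

    Assumes labels are 0 or 1 and that class 1 has the larger mean.
    """
    class0 = [x for x, label in zip(xs, labels) if label == 0]
    class1 = [x for x, label in zip(xs, labels) if label == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2

# Hypothetical toy data: surface of a flat -> price (regression).
surfaces = [20, 40, 60, 80]
prices = [100, 200, 300, 400]
a, b = fit_line(surfaces, prices)
predicted_price = a * 50 + b  # a number

# Hypothetical toy data: hours of study -> passed or not (classification).
hours = [1, 2, 8, 9]
passed = [0, 0, 1, 1]
threshold = fit_threshold(hours, passed)

def classify(x):
    """Return 1 (passed) or 0 (failed): a category, not a number."""
    return 1 if x > threshold else 0
```

In both cases the "mentor" is the list of known answers (prices, passed); only the type of the predicted value changes.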

Reinforcement: the user gives feedback on the results, and that feedback is retained for future predictions.

Unsupervised: this is clustering, that is, grouping similar items together, except that the software itself finds the attributes on which to group. Clustering is mostly used in data mining to look for correlations that are not obvious a priori.
So far, I have done supervised learning, and I am looking at how to introduce reinforcement.
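
As an illustration of clustering, here is a minimal k-means sketch in plain Python. k-means is only one clustering algorithm among many, and it is my choice for the example. The points and the naive initialization (the first k points) are assumptions made to keep the example short and deterministic.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Group points into k clusters without any labels."""
    centroids = list(points[:k])  # naive initialization, for illustration only
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 2: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical 2-D data with two visible groups.
points = [(1, 1), (1.5, 2), (8, 8), (9, 9), (1, 0.5), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
```

Nobody tells the program which point belongs where: the two groups emerge from the distances alone.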

General overview of the topic:
http://blogs.msdn.com/b/big_data_france/archive/2013/04/24/apprentissage-automatique-machine-learning.aspx

https://www.rocq.inria.fr/axis/modulad/archives/numero-42/WORD-11Articles/2.%20GAM-VOVK-final.pdf
http://web.stanford.edu/~hastie/Papers/svmtalk.pdf

In every case, machine learning relies on statistics. In fact, it is prediction based on statistical inference.
Several methods exist:
– Naive Bayes classification
– Support Vector Machines
– Boosting
– Nearest neighbors
– Clustering
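
Of the methods listed above, nearest neighbors is probably the easiest to sketch in a few lines. The version below is a minimal illustration in plain Python; the training data and the function name knn_predict are invented for the example.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the label of `query` from its k nearest training examples.

    train: list of (features, label) pairs, features being tuples of numbers.
    """
    by_distance = sorted(
        train,
        key=lambda example: sum((a - b) ** 2 for a, b in zip(example[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled examples.
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]
label_near_a = knn_predict(train, (2, 2))
label_near_b = knn_predict(train, (7, 9))
```

There is no training phase here: the "model" is simply the stored examples, and the prediction is a vote among the closest ones.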

Goal: build an intuition for each method in order to use the toolkits that are springing up everywhere (Azure Machine Learning, Google Prediction API, …).
Otherwise, it would take two years of math courses and exercises before starting anything, and there is no time for that.

The technological advances made in the modern era have raised a number of key problems.

Foremost among these is that a great volume of information is generated and accessible via the internet and other media, but there are few ways to determine whether that information is of good quality. It is necessary to find a way to refine the available information into usable knowledge. This is where Machine Learning (ML) comes into play. Machine Learning is a system whereby a set of algorithms is applied to data and, through an iterative process, helps turn that data into more useful, higher-quality information. Machine Learning has a number of vital applications and can be used for pressing real-world problems, so much so that it is replacing traditional statistical predictive models in many fields.

Learning from its mistakes
In essence, the standard Machine Learning process works as follows: information is assembled and fed through a series of algorithms, and the results they produce are used iteratively to adjust those algorithms, so the machine learns from its mistakes. We have improved upon this basic design by incorporating ensemble classification techniques as an alternative to the standard iterative process. These techniques include attribute/feature selection, supervised/semi-supervised learning, clustering, classification, association rules, filters and estimators. They bring far greater clarity to the information that is fed into the algorithms and result in a much more comprehensive understanding of the relative merits of various classifiers. This information is then very useful for making quick and clear decisions, which is valuable in the business world, where information turnover is very fast.
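
The basic iterative loop described above (predict, compare with the expected answer, adjust) can be illustrated with a perceptron, one of the simplest algorithms that literally updates itself only when it makes a mistake. This is a generic textbook sketch in plain Python, not the method used by the team mentioned in the text; the data and parameter values are invented.

```python
def predict(weights, bias, features):
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else -1

def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """examples: list of (features, label) pairs with label +1 or -1."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            if predict(weights, bias, features) != label:
                # Mistake: nudge the weights toward the correct answer.
                mistakes += 1
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * label
        if mistakes == 0:  # a full pass without error: stop adjusting
            break
    return weights, bias

# Hypothetical, linearly separable toy data.
examples = [((2, 1), 1), ((3, 2), 1), ((-1, -2), -1), ((-2, -1), -1)]
weights, bias = train_perceptron(examples)
```

Each wrong prediction changes the weights; each correct one leaves them alone. Training stops as soon as a whole pass over the data produces no mistake.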

Dealing with uncertainty
The merit of our experts’ method of ensemble classification is that it enables a user to quickly combine multiple classifiers across a great number of different feature sets in order to produce a highly accurate ensemble classifier. These classifiers include Support Vector Machines, Fuzzy Decision Trees, Naive Bayes classifiers and Neural Networks. This enables the Machine Learning system to deal with a greater degree of uncertainty in the data it is presented with, and to learn more quickly and accurately. The concept has many applications in reverse engineering, where the level of uncertainty and the volume of data are very high, and similarly in software mining, where existing software can be examined and used to produce models such as entity-relationship diagrams.
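
The text does not say how the classifiers are combined, so the following is only a sketch of the simplest possible ensemble: a majority vote. The three rule-based "classifiers" and the transaction fields (amount, foreign, hour) are hypothetical stand-ins for real models such as Support Vector Machines or Naive Bayes classifiers.

```python
from collections import Counter

# Three deliberately weak, hypothetical classifiers. Each one looks at a
# different feature and returns "fraud" or "ok".
def by_amount(transaction):
    return "fraud" if transaction["amount"] > 1000 else "ok"

def by_country(transaction):
    return "fraud" if transaction["foreign"] else "ok"

def by_hour(transaction):
    return "fraud" if transaction["hour"] < 6 else "ok"

def majority_vote(classifiers, sample):
    """Ensemble prediction: the label chosen by most classifiers."""
    votes = Counter(classifier(sample) for classifier in classifiers)
    return votes.most_common(1)[0][0]

ensemble = [by_amount, by_country, by_hour]

suspicious = {"amount": 5000, "foreign": True, "hour": 14}
ordinary = {"amount": 20, "foreign": True, "hour": 12}

verdict_suspicious = majority_vote(ensemble, suspicious)
verdict_ordinary = majority_vote(ensemble, ordinary)
```

With an odd number of voters and two labels there is never a tie, and a single classifier that is wrong on a given sample (here by_country on the ordinary transaction) is outvoted by the other two.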

Our team of experts has successfully applied these algorithms in a number of different fields: equity trading, fraud detection, NLP, predictive coding in risk models, auto-discovery and environmental modelling applications. In all these fields, which yield a great amount of data and high levels of uncertainty, these algorithms have been very successful. They are key to accurately assessing large quantities of raw data, and as data volumes keep growing, it will become more and more crucial to develop new and more effective algorithms.

Coursera – Machine Learning:
https://www.coursera.org/course/ml
