Abstract: Machine learning systems have made rapid progress in the past few years, as evidenced by the remarkable feats they have accomplished in fields as diverse as computer vision and reinforcement learning. Yet, as impressive as these achievements are, they rely on learning algorithms that require orders of magnitude more data than a human learner would. This disparity could be rooted in many different factors. In this talk, we will draw on the hypothesis that compositional learning, that is, the ability to recombine previously acquired skills and knowledge to solve new problems, could be one important element of fast and efficient learning (Lake et al., 2017). In this direction, we will discuss our ongoing efforts towards building systems that can learn in compositional ways. Concretely, we will present a simple benchmark based on function composition to measure the compositionality of learning systems, and use it to draw insights into whether current learning systems learn, or can learn, in a compositional manner.
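To make the idea concrete, here is a minimal sketch of what a function-composition test of compositionality might look like (this is our own illustration, not the benchmark presented in the talk): a learner acquires two primitive functions from examples, and we check whether it can evaluate their composition on the basis of those two skills alone, without any training on composed examples.

```python
# Illustrative sketch (not the speaker's benchmark): a learner "memorizes"
# each primitive function as a lookup table from (input, output) examples,
# and the compositional test asks for g(f(x)) on inputs never seen composed.

def train_lookup(examples):
    """'Learn' a function as a lookup table from (input, output) pairs."""
    return dict(examples)

f_table = train_lookup([(x, x + 1) for x in range(10)])   # f(x) = x + 1
g_table = train_lookup([(x, 2 * x) for x in range(11)])   # g(x) = 2x

def compose(g, f):
    """Recombine two learned skills into a new one: (g o f)(x) = g(f(x))."""
    return lambda x: g[f[x]]

h = compose(g_table, f_table)
# A compositional learner should get g(f(x)) = 2(x + 1) right everywhere
# that both primitives are defined, with no additional training.
assert all(h(x) == 2 * (x + 1) for x in range(10))
```

A neural learner trained end-to-end on f and g separately would take the place of the lookup tables here; the benchmark question is whether it passes the final check.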
Lake, Brenden M., et al. “Building machines that learn and think like people.” Behavioral and Brain Sciences 40 (2017).
Biography: Germán Kruszewski is a postdoctoral researcher joining Facebook in February 2017. He obtained his PhD at the University of Trento under the direction of Marco Baroni, where he worked in the area of distributional semantics, studying the strengths and limitations of distributional models in accounting for the richness of human conceptual knowledge.
Title: Scaling-up Sentiment Analysis through Continuous Learning
Abstract: Sentiment analysis (SA), or opinion mining, is the computational study of people’s opinions, sentiments, emotions, and evaluations. Due to numerous research challenges and almost unlimited applications, SA has been a very active research area in natural language processing and text mining. In this talk, I will first give a brief introduction to SA, and then move on to discuss some major difficulties with the current technologies when one wants to perform sentiment analysis in a large number of domains, e.g., all products sold in a large retail store. To tackle this scaling-up problem, I will describe our recent work on lifelong machine learning (LML), or lifelong learning, which tries to enable machines to learn like humans, i.e., to learn continuously, retaining or accumulating the knowledge learned in the past and using that knowledge to help future learning and problem solving. This paradigm is quite suitable for SA and can help scale up SA to a large number of domains with little manual involvement.
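As a toy illustration of the lifelong-learning idea described above (our own sketch, not the speaker's LML system), consider a word-counting sentiment model that retains evidence from every domain it has seen, so that knowledge accumulated in past domains transfers to a new domain with no extra labeling:

```python
# Toy sketch of lifelong sentiment learning (not the speaker's system):
# the model accumulates word counts from positive and negative reviews
# across domains, and reuses that retained knowledge in new domains.
from collections import Counter

class LifelongSentiment:
    def __init__(self):
        self.pos = Counter()   # word counts accumulated from positive reviews
        self.neg = Counter()   # word counts accumulated from negative reviews

    def learn_domain(self, labeled_reviews):
        """Absorb a new domain's labeled data into the retained knowledge."""
        for text, label in labeled_reviews:
            target = self.pos if label == "pos" else self.neg
            target.update(text.lower().split())

    def classify(self, text):
        """Score a text with knowledge accumulated over all past domains."""
        score = sum(self.pos[w] - self.neg[w] for w in text.lower().split())
        return "pos" if score >= 0 else "neg"

# Learn from one domain (electronics) ...
model = LifelongSentiment()
model.learn_domain([("great battery excellent screen", "pos"),
                    ("terrible battery awful screen", "neg")])
```

After training on electronics, shared sentiment words such as “great” or “awful” let the model classify reviews in an unseen domain (e.g., books) without new labels, which is the scaling-up benefit the abstract points to.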
Bing Liu is a distinguished professor of Computer Science at the University of Illinois at Chicago. He received his Ph.D. in Artificial Intelligence from the University of Edinburgh. His research interests include sentiment analysis, lifelong learning, data mining, machine learning, and natural language processing (NLP). He has published extensively in top conferences and journals. Two of his papers have received 10-year Test-of-Time awards from KDD. He has also authored four books: two on sentiment analysis, one on lifelong learning, and one on Web mining. Some of his work has been widely reported in the press, including a front-page article in the New York Times. In professional service, he served as the Chair of ACM SIGKDD (ACM Special Interest Group on Knowledge Discovery and Data Mining) from 2013 to 2017. He has also served as program chair of many leading data mining conferences, including KDD, ICDM, CIKM, WSDM, SDM, and PAKDD, as associate editor of leading journals such as TKDE, TWEB, DMKD, and TKDD, and as area chair or senior PC member of numerous NLP, AI, Web, and data mining conferences. He is a Fellow of ACM, AAAI, and IEEE.
Title: Textometry as an expert-analysis tool: an application to crisis negotiation.
To assess the relevance of textometric practice in real-world settings and as a tool for expert analysis, we will study actual exchanges involving negotiators from police intervention forces, in contexts of barricaded subjects, hostage-taking, terrorism, or suicidal intent with a high level of danger.
We will therefore approach negotiation through the dynamics of lexical choice, seeking to map the lexicon, classify text segments, and compare the profiles of speakers and situations.
In doing so, we propose to answer the following questions:
- Are there recurring themes across crises?
- Is there a lexical chronology of a crisis?
- How are emotions managed?
- What is specific to “radicalized” situations?
Objectivizing the exchanges and bringing formal sequences to light can then support diagnosis, with the aim of deriving concrete elements for after-action review and for formalizing the practices of professional negotiators.
George K. Mikros
The aim of this presentation is to describe a new method of attributing texts to their real authors using combined author profiles, modern computational stylistic methods based on shallow text features (n-grams), and machine learning algorithms. Until recently, authorship attribution and author profiling were considered similar methods (nearly identical feature sets and classification algorithms) with different aims: the former seeks to identify the author’s identity, while the latter detects the author’s characteristics, such as gender, age, or psychological profile. Both methods have been used independently, targeting different research goals and different real-life tasks.
However, in this talk we will present a unified methodological framework in which standard authorship attribution methodology and author profiling are combined, so that we can approach more effectively open or semi-open authorship attribution problems, a category known as authorship verification that is particularly difficult to tackle with current computational stylistic methods. More specifically, we will present preliminary research results from the application of this blended methodology to a real semi-open authorship problem, the Ferrante authorship case. Using a corpus of 40 modern Italian literary authors compiled by Arjuna Tuzzi and Michele Cortelazzo from the University of Padua (Tuzzi & Cortelazzo, under review), we will explore the dynamics of author profiling by gender, age, and region, and the various ways in which the extracted profiles can be combined to infer the identity of the real author behind Ferrante’s books.
Moreover, we will extend this methodology and validate its usefulness on social media texts using the English Blog Corpus (Argamon, Koppel, Pennebaker, & Schler, 2007). Using simulated scenarios of authorship attribution cases (one in which the real author is included in the training data, and one in which the real author is missing from the training corpus), we will further evaluate the usefulness of the proposed blended methodology, which opens exciting new possibilities for investigating author identities in both closed and open authorship attribution tasks.
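To give a flavor of the shallow n-gram features the talk builds on, here is a minimal character n-gram attribution sketch (our own illustration of a standard common-n-grams approach, not the speaker's blended method): each candidate author is represented by a frequency profile of their most common character 3-grams, and a disputed text is attributed to the author with the nearest profile.

```python
# Minimal character n-gram authorship attribution sketch (illustrative;
# not the blended profiling method presented in the talk).
from collections import Counter

def ngram_profile(text, n=3, top=50):
    """Relative frequencies of the `top` most common character n-grams."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.most_common(top)}

def profile_distance(p, q):
    """Symmetric relative distance over the union of two profiles."""
    keys = set(p) | set(q)
    return sum(((p.get(k, 0.0) - q.get(k, 0.0))
                / ((p.get(k, 0.0) + q.get(k, 0.0)) / 2)) ** 2
               for k in keys)

def attribute(disputed, candidates, n=3):
    """Attribute `disputed` to the candidate author with the nearest profile."""
    target = ngram_profile(disputed, n)
    return min(candidates,
               key=lambda a: profile_distance(target,
                                              ngram_profile(candidates[a], n)))

# Hypothetical two-author example with stylistically distinct texts.
candidates = {"A": "the cat sat on the mat. " * 30,
              "B": "lorem ipsum dolor sit amet. " * 30}
```

In the unified framework discussed in the talk, profiles like these would additionally be combined with predicted author characteristics (gender, age, region) before the final attribution decision.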
Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2007). Mining the blogosphere: Age, gender and the varieties of self-expression. First Monday, 12(9). http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2003/1878
Tuzzi, A., & Cortelazzo, M. A. (under review). What is Elena Ferrante? A Comparative Analysis of a Secretive Bestselling Italian Writer. Digital Scholarship in the Humanities.
George K. Mikros is currently Professor of Computational and Quantitative Linguistics and Chair of the Department of Italian Language and Literature in the National and Kapodistrian University of Athens, and Director of the Computational Stylistics Lab. He also holds an adjunct faculty position at the Department of Applied Linguistics, in the Master of Arts Online Program of the University of Massachusetts, Boston. He graduated from the Department of Italian Language and Literature (Aristotle University of Thessaloniki) in 1992. He completed his Ph.D. thesis in 1999 at the Department of Linguistics (National and Kapodistrian University of Athens). In 1992 he joined the Institute for Language and Speech Processing (ILSP), and from 1996 to 1998 he was head of the Institute’s Liaison Office. Since 1999 he has coordinated the Course Unit ISP12 “Understanding the Language and Civilization: From Latin to Modern Spanish” in the undergraduate Program “Hispanic Language and Civilisation Studies” at the Hellenic Open University, and since 2016 he has been the Director of the Program. He has participated in many European and national research programs in the area of human language technology. He was elected Member at Large (2007-2009), General Secretary (2012-2016), and Vice President (2016-present) of the International Association of Quantitative Linguistics (IQLA). He also serves as a program committee member for many international conferences in the area of computational linguistics. He publishes in international journals and peer-reviewed conference proceedings in the following areas: computational linguistics, computational stylistics, corpus linguistics, quantitative linguistic data analysis, stylometry, automatic authorship attribution, opinion mining, and sentiment analysis.
Title: From text to concepts and back: going multilingual with BabelNet in a step or two
Multilinguality is a key feature of today’s Web, and it is this feature that we leverage and exploit in our research work at the Sapienza University of Rome’s Linguistic Computing Laboratory, which I am going to overview and showcase in this talk. I will describe the most recent developments of the BabelNet technology. I will introduce BabelNet live, the largest continuously updated multilingual encyclopedic dictionary, and then discuss a range of cutting-edge industrial use cases implemented by Babelscape, our Sapienza startup company, including multilingual interpretation of terms, multilingual concept and entity extraction from text, and cross-lingual text similarity.
Roberto Navigli is Professor of Computer Science at the Sapienza University of Rome, where he heads the multilingual Natural Language Processing group. He is one of the few Europeans to have received two prestigious ERC grants in computer science, namely an ERC Starting Grant on multilingual word sense disambiguation (2011-2016) and an ERC Consolidator Grant on multilingual language- and syntax-independent open-text unified representations (2017-2022). He was also a co-PI of a Google Focused Research Award on NLP. In 2015 he received the META prize for groundbreaking work in overcoming language barriers with BabelNet, a project also highlighted in The Guardian and Time magazine, and winner of the Artificial Intelligence Journal prominent paper award 2017. Based on the success of BabelNet and its multilingual disambiguation technology, he co-founded Babelscape, a Sapienza startup company which enables language processing in hundreds of languages.