My research interest lies in organizing information in large-scale information environments and extracting knowledge from raw datasets. I design, build, and experiment with Web-based data infrastructures, investigate solutions for (predictive) data analytics scenarios, and study the design of the interaction between users and data-driven applications. Currently, I concentrate on the following research topics:
- Large-scale information environments: I participate in ResourceSync, which is a data synchronization framework for the Web. Previously, I co-designed and developed data.europeana.eu, which exposes metadata of 20 million texts, images, videos and sounds gathered from institutions all over Europe as Open Data. To demonstrate how open, structured Web vocabularies could be used in information retrieval, we built the Lucene-SKOS query expansion module for Apache Lucene and Solr. We also experimented with user-interest models to improve named entity disambiguation on short social media texts such as tweets (RESLVE).
- Data analytics: Recently, we applied machine-learning techniques for automatically identifying useful comments in Flickr Commons. In another project, we investigated quality issues in SKOS vocabularies and provided qSKOS, which is a tool that supports users in identifying quality issues in their vocabularies. In our DSNotify work we addressed the problem that links in distributed open data networks can break over time and proposed a change detection framework that informs data-consuming actors about various types of change events.
- User Interaction and Human Factors: In the Maphub project, we built a system that allows people to cross-reference historical maps with resources in Web-based knowledge graphs and studied the interaction between users and the enabling technique, which we call Semantic Tagging. In this context, I also contribute to the W3C Open Annotation working group. I also support the design of Meketre, which aims at helping Egyptologists to organize, analyze, and share collected materials and data.
Bernhard Haslhofer works as a Data Scientist at AIT Austrian Institute of Technology. Previously, he was an EU Marie Curie Fellow Postdoc and Lecturer at Cornell University Information Science and the University of Vienna. He received his PhD from the University of Vienna, and a master's degree and diploma in Economics and Computer Science from the Technical University of Vienna. His research interests lie in organizing information in large-scale information environments and extracting knowledge from raw datasets. He designs, builds, and experiments with Web-based data infrastructures, investigates solutions for (predictive) data analytics scenarios, and studies the design of the interaction between people (users) and data-driven applications.
Recent Publications
Momeni, Elaheh and Haslhofer, Bernhard and Tao, Ke and Houben, Geert-Jan: Sifting useful comments from Flickr Commons and YouTube. In: International Journal on Digital Libraries 1-19, 2014.
Isaac, Antoine and Haslhofer, Bernhard: Europeana Linked Open Data - data.europeana.eu. In: Semantic Web 4(3), IOS Press, 2013.
Haslhofer, Bernhard and Warner, Simeon and Lagoze, Carl and Klein, Martin and Sanderson, Robert and Van de Sompel, Herbert and Nelson, Michael: Web Synchronization Simulations using the ResourceSync Framework. Technical Report, University of Vienna, 2013.
Mader, Christian and Haslhofer, Bernhard: Perception and Relevance of Quality Issues in Web Vocabularies. In: I-Semantics, Graz, Austria, 2013.
Momeni Roochi, Elaheh and Tao, Ke and Haslhofer, Bernhard and Houben, Geert-Jan: Identification of Useful User Comments in Social Media: A Case Study on Flickr Commons. In: ACM/IEEE Joint Conference on Digital Libraries (JCDL 2013), Indianapolis, USA, 2013. Student Best Paper Award Nominee
Haslhofer, Bernhard and Martins, Flávio and Magalhães, João: Using SKOS vocabularies for improving Web Search. In: Web of Linked Entities (WoLE) Workshop, co-located with WWW 2013, Rio de Janeiro, 2013.
Murnane, Elizabeth L and Haslhofer, Bernhard and Lagoze, Carl: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text. In: Web of Linked Entities (WoLE) Workshop, co-located with WWW 2013, Rio de Janeiro, 2013. Best Paper
Murnane, Elizabeth L and Haslhofer, Bernhard and Lagoze, Carl: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text. In: WWW 2013, Poster Track, Rio de Janeiro, 2013.
Haslhofer, Bernhard and Warner, Simeon and Lagoze, Carl and Klein, Martin and Sanderson, Robert and Nelson, Michael L and Van de Sompel, Herbert: ResourceSync: Leveraging Sitemaps for Resource Synchronization. In: WWW 2013, Developers Track, Rio de Janeiro, 2013.
Haslhofer, Bernhard and Robitza, Werner and Lagoze, Carl and Guimbretiere, Francois: Semantic Tagging on Historical Maps. In: ACM Web Science 2013, Paris, 2013.
Klein, Martin and Sanderson, Robert and Van de Sompel, Herbert and Warner, Simeon and Haslhofer, Bernhard and Lagoze, Carl and Nelson, Michael L: A Technical Framework for Resource Synchronization. In: D-Lib Magazine, 19 (1). p. 3, 2013.
ResourceSync Framework Specification (co-editor): describes a synchronization framework for the web consisting of various capabilities that allow third party systems to remain synchronized with a server's evolving resources.
Open Annotation Data Model (contributor): specifies an interoperable framework for creating associations between related resources, annotations, using a methodology that conforms to the Architecture of the World Wide Web. Open Annotations can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource.
Fall 2012 (Cornell)
INFO/CS 4302 - Web Information Systems, instructor. This course introduces technologies for building data-centric information systems on the World Wide Web, shows the practical applications of such systems, and discusses their design and their social and policy context by examining cross-cutting issues such as citizen science, data journalism and open government.
INFO 5900 - Independent Research, instructor.
Spring 2012 (Cornell)
CS 5999 - Master of Engineering Project, co-instructor.
Fall 2011 (Cornell)
INFO/CS 4302 - Web Information Systems, co-instructor. Examines technologies for building data-centric information systems on the World Wide Web, discusses the social and policy context from which they arose, shows the practical applications of such systems, and explores cross-cutting issues in this context. Topics: Internet and Web foundations, structured Web data, RESTful Web Services, Linked Data, Knowledge Organization on the Web, Citizen Science, Human Computation.
CS 5999 - Master of Engineering Project, co-instructor.
Before (University of Vienna)
Multimedia Information Systems 2 (2007-2011), co-instructor. A master's-level course in Media Informatics examining technologies and available applications for building (multimedia) Web information systems. Focus on XML, Semantic Web technologies, and metadata standards.
Multimedia Information Retrieval (2009-2011), co-instructor. An advanced masters-level course focusing on the principles of information retrieval in distributed environments such as the Web, with a special focus on multimedia information.
Information System Technologies for Multimedia Applications (2008-2010). An undergraduate course focusing on the technical properties of various media types (image, audio, video) and their technical processing (e.g., with Java Media Framework) in multimedia applications.
Media Informatics Student Projects (2008-2011).
Modeling Techniques and Methods (2007-2011), co-instructor. An undergraduate introductory course covering basic data modeling standards such as EER, UML, etc.
Grants and Projects
SciLink (03/2011 - ongoing), an EU PEOPLE International Outgoing Fellowship (Marie Curie) grant carried out at Cornell University and the University of Vienna. Research on (i) interactive link discovery in scholarly publication processes, (ii) strategies for maintaining link integrity, and (iii) novel Web-based resource aggregation and presentation interfaces for scholarly publication workflows.
ResourceSync (12/2011 - ongoing), a joint NISO and Open Archives Initiative (OAI) project funded by the Sloan Foundation. In this project we research, develop, prototype, test, and deploy mechanisms for the large-scale synchronization of web resources. Building on the OAI-PMH strategies for synchronizing metadata, this project will enhance that specification using modern web technologies and will allow for the synchronization of the objects themselves, not just their metadata.
Maphub (12/2011 - 02/2013), an experiment funded by the Andrew W. Mellon Foundation. We examined the application of Open Annotations in the context of historic map material. Our goal was to design and build a collaborative Web environment in which scholars and citizens can contribute their knowledge to digitized high-resolution online maps. We experimented with designs that integrate the annotation process with the re-use of data from public data sources, such as Wikipedia.
MEKETRE (07/2009 - 12/2012), an interdisciplinary Austrian Research Fund (FWF) project with the Institute for Egyptology at the University of Vienna. It aimed at building a collaborative Web-based solution for efficiently organizing the collected and digitized content objects from the Egyptian Middle Kingdom period by means of open, collaboratively developed vocabularies.
EuropeanaConnect (05/2009 - 10/2011), an EU eContentplus funded project that supported the development of Europeana, which enables people to explore the digital resources of Europe's museums, libraries, archives and audio-visual collections.
BRICKS (01/2003 - 10/2007), an EU FP6 Integrated Project that aimed at building the infrastructure for integrating cultural heritage institutions across Europe. My work focused on the metadata management task, whose goal was to provide a flexible, distributed metadata storage solution that meets the heterogeneous requirements of the institutions involved in BRICKS.
Open Humanities Award, 2013
"Certificate of Appreciation", awarded by the University of Vienna, Faculty of Computer Science, 2010 and 2011
Invited Talks / Research Visit / etc.
Maphub - Annotations and Semantic Tags on Historical Maps. Stanford University - Open Annotation Rollout. April 2013, Palo Alto, USA. (slides)
Old Maps, Annotations, and Open Data Networks. Harvard University. January 2013, Cambridge, USA. (slides)
Research visit at Los Alamos National Laboratory, May 2012
Linked Data and SKOS. Workshop on Physics Classification. December 2011, Boston, USA. (slides)
Linked Data in Scholarly Communication. AAHEP5 Information Provider Summit, Cornell University. October 2011, Ithaca, USA. (slides)
Metadata is back! Keynote at Semantic Web Technologies for Libraries and Readers Workshop, co-located with JCDL 2011. June 2011, Ottawa, Canada. (slides)
Linked Data als Perspektive für die bibliothekarische Inhaltserschließung. (German) Österreichisches Online-Informationstreffen und Österreichischer Dokumentartag (ODOK), 2010, Leoben, Austria. (slides)
CIDOC CRM in Practice - Experiences, Problems, and Possible Solutions. Workshop Vernetzte Datenwelten, Deutsches Archäologisches Institut (DAI), 2009, Berlin, Germany. (slides)
Linked Data Tutorial. Vlaams Theater Instituut, 2009, Brussels, Belgium. (slides)
11th International Conference on Web Engineering (ICWE 2011), Doctoral consortium co-chair
Very Large Databases Conference (VLDB 2007), local organization
Linked Data Camp 2009, Museumsquartier (MQ) Vienna
Web of Data Practitioner’s Days 2008, University of Vienna
Reviewing Activities / Program Committee Memberships
Multimedia Tools and Applications (2010, 2011)
International Journal on Digital Libraries (2009, 2012)
ACM Computing Surveys (2009)
World Wide Web (WWW) - Demo track (2012)
International Conference on Semantic Systems (I-Semantics), 2010
Linked Data Triplification Challenge, 2011