My research interest lies in organizing information in large-scale information environments and extracting knowledge from raw datasets. I design, build, and experiment with Web-based data infrastructures, investigate solutions for (predictive) data analytics scenarios, and study the design of the interaction between users and data-driven applications. Currently, I concentrate on the following research topics:

  • Large-scale information environments: I participate in ResourceSync, which is a data synchronization framework for the Web. Previously, I co-designed and developed, which exposes metadata of 20 million texts, images, videos and sounds gathered from institutions all over Europe as Open Data. To demonstrate how open, structured Web vocabularies could be used in information retrieval, we built the Lucene-SKOS query expansion module for Apache Lucene and Solr. We also experimented with user-interest models to improve named entity disambiguation on short social media texts such as Twitter (RESLVE).
  • Data analytics: Recently, we applied machine-learning techniques for automatically identifying useful comments in Flickr Commons. In another project, we investigated quality issues in SKOS vocabularies and provided qSKOS, which is a tool that supports users in identifying quality issues in their vocabularies. In our DSNotify work we addressed the problem that links in distributed open data networks can break over time and proposed a change detection framework that informs data-consuming actors about various types of change events.
  • User Interaction and Human Factors: In the Maphub project, we built a system that allows people to cross-reference historical maps with resources in Web-based knowledge graphs and studied the interaction between users and the enabling technique, which we call Semantic Tagging. In this context, I also contribute to the W3C Open Annotation working group. I also support the design of Meketre, which aims at helping Egyptologists to organize, analyze, and share collected materials and data.

Short Bio

Bernhard Haslhofer works as a Data Scientist at the AIT Austrian Institute of Technology. Previously, he was an EU Marie Curie Fellow Postdoc and Lecturer at Cornell University Information Science and the University of Vienna. He received his PhD from the University of Vienna, and a master's degree and diploma in Economics and Computer Science from the Technical University of Vienna. His research interests lie in organizing information in large-scale information environments and extracting knowledge from raw datasets. He designs, builds, and experiments with Web-based data infrastructures, investigates solutions for (predictive) data analytics scenarios, and studies the design of the interaction between people (users) and data-driven applications.

Recent Publications

Isaac, Antoine and Haslhofer, Bernhard: Europeana Linked Open Data - Semantic Web 4(3), IOS Press, 2013.

Haslhofer, Bernhard and Warner, Simeon and Lagoze, Carl and Klein, Martin and Sanderson, Robert and Van de Sompel, Herbert and Nelson, Michael L: Web Synchronization Simulations using the ResourceSync Framework. Technical Report, University of Vienna, 2013.

Mader, Christian and Haslhofer, Bernhard: Perception and Relevance of Quality Issues in Web Vocabularies. In: I-Semantics 2013, Graz, Austria.

Momeni Roochi, Elaheh and Tao, Ke and Haslhofer, Bernhard and Houben, Geert-Jan: Identification of Useful User Comments in Social Media: A Case Study on Flickr Commons. In: ACM/IEEE Joint Conference on Digital Libraries (JCDL 2013), Indianapolis, USA. (Student Best Paper Award Nominee)

Haslhofer, Bernhard and Martins, Flávio and Magalhães, João: Using SKOS vocabularies for improving Web Search. In: Web of Linked Entities (WoLE) Workshop, co-located with WWW 2013, Rio de Janeiro (2013)

Murnane, Elizabeth L and Haslhofer, Bernhard and Lagoze, Carl: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text. In: Web of Linked Entities (WoLE) Workshop, co-located with WWW 2013, Rio de Janeiro (2013) (Best Paper)

Murnane, Elizabeth L and Haslhofer, Bernhard and Lagoze, Carl: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text. In: WWW 2013, Poster Track, Rio de Janeiro (2013)

Haslhofer, Bernhard and Warner, Simeon and Lagoze, Carl and Klein, Martin and Sanderson, Robert and Nelson, Michael L and Van de Sompel, Herbert: ResourceSync: Leveraging Sitemaps for Resource Synchronization. In: WWW 2013, Developers Track, Rio de Janeiro (2013)

Haslhofer, Bernhard and Robitza, Werner and Lagoze, Carl and Guimbretiere, Francois: Semantic Tagging on Historical Maps. In: ACM Web Science 2013, Paris (2013)

Klein, Martin and Sanderson, Robert and Van de Sompel, Herbert and Warner, Simeon and Haslhofer, Bernhard and Lagoze, Carl and Nelson, Michael L: A Technical Framework for Resource Synchronization. In: D-Lib Magazine, 19 (1). p. 3 (2013)


ResourceSync Framework Specification (co-editor): describes a synchronization framework for the web consisting of various capabilities that allow third party systems to remain synchronized with a server's evolving resources.

Open Annotation Data Model (contributor): specifies an interoperable framework for creating associations between related resources, annotations, using a methodology that conforms to the Architecture of the World Wide Web. Open Annotations can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource.


Fall 2012 (Cornell)

INFO/CS 4302 - Web Information Systems, instructor. This course introduces technologies for building data-centric information systems on the World Wide Web, shows the practical applications of such systems, and discusses their design and their social and policy context by examining cross-cutting issues such as citizen science, data journalism, and open government.

INFO 5900 - Independent Research, instructor.

Spring 2012 (Cornell)

CS 5999 - Master of Engineering Project, co-instructor.

Fall 2011 (Cornell)

INFO/CS 4302 - Web Information Systems, co-instructor. Examines technologies for building data-centric information systems on the World Wide Web, discusses the social and policy context from which they arose, shows the practical applications of such systems, and goes into cross-cutting issues in this context. Topics: Internet and Web foundations, structured Web data, RESTful Web Services, Linked Data, Knowledge Organization on the Web, Citizen Science, Human Computation.

CS 5999 - Master of Engineering Project, co-instructor.

Before (University of Vienna)

Multimedia Information Systems 2 (2007-2011), co-instructor. A master's-level course in Media Informatics examining technologies and available applications for building (multimedia) Web information systems. Focus on XML, Semantic Web technologies, and metadata standards.

Multimedia Information Retrieval (2009-2011), co-instructor. An advanced master's-level course focusing on the principles of information retrieval in distributed environments such as the Web, with a special focus on multimedia information.

Information System Technologies for Multimedia Applications (2008-2010). An undergraduate course focusing on the technical properties of various media types (image, audio, video) and their technical processing (e.g., with Java Media Framework) in multimedia applications.

Media Informatics Student Projects (2008-2011).

Modeling Techniques and Methods (2007-2011), co-instructor. An undergraduate introductory course covering basic data modeling standards such as EER and UML.

Grants and Projects

SciLink (03/2011 - ongoing), an EU PEOPLE International Outgoing Fellowship (Marie Curie) grant carried out at Cornell University and the University of Vienna. Research on (i) interactive link discovery in scholarly publication processes, (ii) strategies for maintaining link integrity, and (iii) novel Web-based resource aggregation and presentation interfaces for scholarly publication workflows.

ResourceSync (12/2011 - ongoing), a joint NISO and Open Archives Initiative (OAI) project funded by the Sloan Foundation. In this project we research, develop, prototype, test, and deploy mechanisms for the large-scale synchronization of web resources. Building on the OAI-PMH strategies for synchronizing metadata, this project will enhance that specification using modern web technologies and will allow for the synchronization of the objects themselves, not just their metadata.

Maphub (12/2011 - 02/2013), an experiment funded by the Andrew W. Mellon Foundation. We examined the application of Open Annotations in the context of historic map material. Our goal was to design and build a collaborative Web environment in which scholars and citizens can contribute their knowledge to digitized high-resolution online maps. We experimented with designs that integrate the annotation process with the re-use of data from public data sources, such as Wikipedia.

MEKETRE (07/2009 - 12/2012), an interdisciplinary Austrian Research Fund (FWF) project with the Institute for Egyptology at the University of Vienna. It aimed at building a collaborative Web-based solution for efficiently organizing the collected and digitized content objects from the Egyptian Middle Kingdom period by means of open, collaboratively developed vocabularies.

EuropeanaConnect (05/2009 - 10/2011), an EU eContentplus funded project that supported the development of Europeana, which enables people to explore the digital resources of Europe's museums, libraries, archives, and audio-visual collections.

BRICKS (01/2003 - 10/2007), an EU FP 6 Integrated Project that aimed at building the infrastructure for integrating cultural heritage institutions across Europe. My work focused on the metadata management task, whose goal was to provide a flexible, distributed metadata storage solution that meets the heterogeneous requirements of the institutions involved in BRICKS.


Open Humanities Award, 2013

"Certificate of Appreciation", awarded by the University of Vienna, Faculty of Computer Science, 2010, 2011

Invited Talks / Research Visit / etc.

The Story behind Maphub, Open Knowledge Conference (OKCon), September 2013, Geneva. (slides)

Semantic Tagging for old maps...and other things on the Web, The Web As Literature, June 2013, British Library, London. (slides)

Linked (Open) Data. Guest Lecture at Technical University of Vienna, May 2013, Austria (slides)

Maphub and Annotorious. iAnnotate 2013. April 2013, San Francisco, USA. (slides)

Maphub - Annotations and Semantic Tags on Historical Maps. Stanford University - Open Annotation Rollout. April 2013, Palo Alto, USA. (slides)

Old Maps, Annotations, and Open Data Networks. Harvard University. January 2013, Cambridge, USA. (slides)

Research Visit Los Alamos National Labs, May 2012

Linked Data and SKOS. Workshop on Physics Classification. December 2011, Boston, USA. (slides)

Linked Data in Scholarly Communication. AAHEP5 Information Provider Summit, Cornell University. October 2011, Ithaca, USA. (slides)

Metadata is back! Keynote at Semantic Web Technologies for Libraries and Readers Workshop, co-located with JCDL 2011. June 2011, Ottawa, Canada. (slides)

Research on Scholarly Practices and Communication at Cornell Information Science. (with Carl Lagoze) Microsoft Research, May 2011. USA. (video)

Linked Data als Perspektive für die bibliothekarische Inhaltserschließung. (German) Österreichisches Online-Informationstreffen und Österreichischer Dokumentartag (ODOK), 2010, Leoben, Austria. (slides)

Linked Data im Kontext Digitaler Bibliothekssysteme. (German) Semantic Web in Bibliotheken (SWIB), 2009, Cologne, Germany. (slides)

CIDOC CRM in Practice - Experiences, Problems, and Possible Solutions. Workshop Vernetzte Datenwelten, Deutsches Archäologisches Institut (DAI), 2009, Berlin, Germany. (slides)

Linked Data Tutorial. Vlaams Theater Instituut, 2009, Brussels, Belgium. (slides)

Event Organization


11th International Conference on Web Engineering (ICWE 2011), Doctoral consortium co-chair

International Conference on Dublin Core and Metadata Applications (DC 2008), Poster chair

Very Large Databases Conference (VLDB 2007), local organization


Web of Data in the Context of Multimedia at SAMT2009, Graz, Austria (slides: 1, 2)

Semantic Digital Libraries Tutorial (WWW2007, ESWC2007, JCDL2006)


Linked Data Camp 2009, Museumsquartier (MQ) Vienna

Web of Data Practitioner’s Days 2008, University of Vienna

Reviewing Activities / Program Committee Memberships


International Journal on Semantic Web and Information Systems (IJSWIS) (2012)

Multimedia Tools and Applications (2010, 2011)

International Journal on Metadata, Semantics, and Ontologies (2010, 2012)

International Journal on Digital Libraries (2009, 2012)

ACM Computing Surveys (2009)


World Wide Web (WWW) - Demo track (2012)

International Conference on Web Engineering (ICWE), 2011, 2012, 2013

International Conference on Theory and Practice of Digital Libraries (TPDL), 2010, 2011, 2013

Extended Semantic Web Conference (ESWC), 2010, 2012

IEEE Conference on Commerce and Enterprise Computing (CEC), 2010, 2011, 2012

International Conference on Semantic Systems (I-Semantics), 2010

Dublin Core Conference (DC), 2008, 2009, 2010, 2011, 2012, 2013


Linked Data Triplification Challenge, 2011

Workshop on Scripting and Development for the Semantic Web (SFSW), 2009, 2010

Networked Knowledge Organization Systems and Services Workshop (NKOS), 2006, 2007, 2008, 2009, 2010, 2011

International Workshop on Web Semantics (WebS), 2004-2013