24-26 Sep 2019 Berlin (Germany)

The DARIAH Code Sprint 2019

Overall information

The DARIAH code sprint will again be organised by the DESIR project, an offshoot of DARIAH-EU tasked with developing sustainability approaches for the DARIAH research infrastructure in both technological and organisational matters. The code sprint is an opportunity to bring together interested developers and DH-affiliated people, not only from the wider DARIAH community.

We will have three tracks approaching the wider topic of bibliographical metadata from three angles: the extraction of data from PDFs (GROBID), the import and processing of data using BibSonomy, and the visualisation of this data. The common thread is to work with the same bibliographical data throughout the process and to improve the interoperability of the tools.

Although this is already our second code sprint, it is not exclusively addressed to participants of the first one. Everyone is welcome! An affinity for coding in the DH or for technological discussions in general would be helpful.

The DARIAH Code Sprint 2019 will take place in Berlin from 24 to 26 September 2019. More detailed information about the location can be found here.

The documentation of the first DARIAH Code Sprint can be found on this page under Documentation of the first Code Sprint.

 

Track descriptions

 

Track A: Extraction of bibliographical data and citations from PDFs using GROBID

As a result of the first Code Sprint, organised in 2018 by the DESIR project, this track built a tool covering the following functionalities:
1. Citation extraction of PDF files using GROBID;
2. Visualisation of the extracted information directly on the PDF files. This visualisation highlights important information in scientific articles (e.g., authors, title, tables, figures, keywords);
3. Inclusion of some additional information from external services (e.g., affiliation disambiguation, named entity recognition);
4. Integration of all extracted data into a usable PDF viewer.
The tool's URL gives an idea of how it works: first, users upload a scientific article in PDF format; then they click the service buttons as needed to see the highlighted results, showing:
- bibliographical extraction results;
- affiliation processing results;
- named-entity recognition.
For the second code sprint, our focus will be on adding features and capabilities to the demonstrator. For example, authors extracted by GROBID could be linked to digital researcher identifiers (e.g., ORCID iDs). Track A invites participants to contribute creative ideas and to become part of our project.
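To illustrate the kind of extraction the demonstrator builds on, here is a minimal sketch that sends a PDF to a GROBID service and retrieves the extracted header metadata as TEI XML. It assumes a GROBID instance running locally on the default port 8070; the demonstrator adds its visualisation and enrichment layers on top of calls like this one.

```python
# Minimal sketch: send a PDF to a local GROBID service and retrieve TEI XML.
# Assumes a GROBID instance at http://localhost:8070 (the default port).
import requests

GROBID_URL = "http://localhost:8070/api/processHeaderDocument"  # header metadata only

def extract_header_metadata(pdf_path: str) -> str:
    """Upload a PDF and return the extracted bibliographical metadata as TEI XML."""
    with open(pdf_path, "rb") as pdf:
        response = requests.post(GROBID_URL, files={"input": pdf}, timeout=60)
    response.raise_for_status()
    return response.text  # TEI XML containing title, authors, affiliations, etc.

if __name__ == "__main__":
    print(extract_header_metadata("article.pdf"))  # "article.pdf" is a placeholder
```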

 

Track B: Automatic Import of Bibliographic Data into BibSonomy

In this track we aim to extend the tool for the automatic import of bibliographic metadata into BibSonomy. The first version of the tool was created at the DESIR workshop in 2018. Currently, users can upload a PDF file and have its metadata automatically extracted using GROBID. In a further step, users can correct the metadata and save it to BibSonomy. We want to extend the tool with further features:
- Metadata extraction from text files,
- Individual user login for BibSonomy,
- Improved User Interface,
- API.
Feel free to come up with your own ideas for improvement. We look forward to actively discussing all ideas at the beginning of the code sprint.
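As a possible starting point for the text-file import listed above, the following sketch parses raw reference strings with GROBID's citation-processing service before the result would be handed on to BibSonomy. The file name and the surrounding workflow are illustrative assumptions, not part of the existing tool.

```python
# Illustrative sketch: parse raw citation strings from a text file with GROBID.
# Assumes a GROBID service at http://localhost:8070; "references.txt" is a
# hypothetical input file with one reference per line.
import requests

GROBID_CITATION_URL = "http://localhost:8070/api/processCitation"

def parse_citation(raw_citation: str) -> str:
    """Send one raw reference string to GROBID and return the structured TEI XML."""
    response = requests.post(
        GROBID_CITATION_URL,
        data={"citations": raw_citation},
        timeout=30,
    )
    response.raise_for_status()
    return response.text  # a TEI <biblStruct> element describing the reference

if __name__ == "__main__":
    with open("references.txt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                print(parse_citation(line.strip()))
```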

 

Track C: Visualisation of time-dependent graphs of relations

One of the major outcomes of the previous DESIR Code Sprint's Track C was a generic concept of time-dependent graphs of relations and their visual presentation. Examples of such graphs include co-authorship and citation graphs, genealogy trees, and character interaction graphs. From the visual perspective, both the structure and the time characteristics of such graphs play a significant analytical role. Our web-based tool, developed throughout the DESIR project, now provides functionality for visualising bibliographical datasets (e.g. imported via the BibSonomy API or loaded from a file) on top of a generic data model.

Within this Code Sprint we will focus on extending the tool towards new data formats and use cases as well as new visual forms. Participants will have the opportunity to work on mapping different data to the generic graph model and/or on translating data formats into an intermediate RDF description (subject-predicate-object triples; see the sketch below). A bring-your-own-data approach is encouraged. New visual forms will involve modifying the web application's user interface to include additional visualisations of metadata or aggregated information. Experience in Java and/or JavaScript programming is recommended.
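To make the intermediate RDF description mentioned above more concrete, the short sketch below encodes a single time-stamped co-authorship relation as subject-predicate-object triples. The namespace and property names are hypothetical and only serve to illustrate the idea; the tool's actual vocabulary may differ.

```python
# Illustrative sketch: one time-stamped co-authorship relation as RDF triples.
# The namespace and property names (ex:CoAuthorship, ex:participant, ex:year)
# are hypothetical; the tool's actual intermediate vocabulary may differ.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/relations#")

g = Graph()
g.bind("ex", EX)

relation = URIRef("http://example.org/relations#rel1")
g.add((relation, RDF.type, EX.CoAuthorship))                     # relation type
g.add((relation, EX.participant, Literal("Ada Lovelace")))       # first author
g.add((relation, EX.participant, Literal("Charles Babbage")))    # second author
g.add((relation, EX.year, Literal("1843", datatype=XSD.gYear)))  # time characteristic

print(g.serialize(format="turtle"))
```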
