Installing INCEpTION - an Open Source tool for Linguistic Annotation

Posted on Mon 25 July 2022 in Language

In a previous post about Tarkeeb, I mentioned using annotation software to rebuild the Quran Corpus or to build any Arabic corpus. I discussed software like Flat, then mentioned my preferred option for annotation, INCEpTION. Below I explain how to install INCEpTION on Ubuntu 20.04 (you should be able to adapt the commands for most distros).

Installing Java

If you use a system container like LXC, the standard containers don't come with Java installed. This is a good thing, as it allows you to choose OpenJDK over Oracle's Java.

To install OpenJDK, do the following:

sudo apt update
sudo apt upgrade
sudo apt install openjdk-11-jdk

Verify that Java is installed:

java --version

If it didn't work, visit here for debugging.
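If you script the container setup, the version check above can be automated. The following is a minimal sketch, not part of the official instructions: the helper name java_major is hypothetical, and it assumes the first line of the java --version output looks like "openjdk 11.0.15 2022-04-19", which is the usual OpenJDK 11 layout.

```shell
# Hypothetical helper: extract the major version from `java --version` output.
# Assumes the second word of the output is the full version, e.g. 11.0.15.
java_major() {
    # Word-split the output into positional parameters.
    set -- $1
    # Strip everything from the first dot onwards: 11.0.15 -> 11
    echo "${2%%.*}"
}

if command -v java >/dev/null 2>&1; then
    major=$(java_major "$(java --version 2>/dev/null)")
    echo "Found Java major version: ${major:-unknown}"
else
    echo "java is not on PATH; re-run the apt install step"
fi
```

If the script reports that java is not on PATH, the apt install step most likely did not complete.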

Downloading & Running INCEpTION

Create a directory for the standalone Java executable:

mkdir annotation/

Download INCEpTION from here. When I installed the software, I used version 0.18.0. However, version 0.19.3 should also work (I will update this article if it doesn't).

Save the inception-app-webapp-0.19.3-standalone.jar into the annotation/ folder.

On the Downloads page above, the instructions for running the web application are:

java -jar inception-app-webapp-0.19.3-standalone.jar

These instructions assume that you are running the executable locally and can double-click on it. This didn't work for me because I isolated the app into a container.

After a bit of yak-shaving, I learned that INCEpTION is built using the Spring Framework. To run a Spring web application as a remote service and make it accessible from outside a container, use the following command:

java -Djava.awt.headless=true -Dserver.port=8999 -Dserver.address=0.0.0.0 -jar inception-app-webapp-0.19.3-standalone.jar

Binding to 0.0.0.0 and a specific port is self-explanatory. I cannot recall why, but running the application in headless mode was required to make it work. If you can run it without the headless flag, please email me and explain how/why.
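If you expect to change the port or upgrade the JAR later, it can help to keep the launch command in one place. Below is a rough sketch of that idea. The function name inception_cmd is my own invention, and the bind address is passed through server.address, which is the Spring Boot property for the network address the server binds to.

```shell
# Hypothetical helper: print the launch command for a given JAR and port.
# The port defaults to 8999, the value used in this article.
inception_cmd() {
    jar=$1
    port=${2:-8999}
    echo "java -Djava.awt.headless=true -Dserver.port=${port} -Dserver.address=0.0.0.0 -jar ${jar}"
}

# Print the command so it can be reviewed before it is run.
inception_cmd inception-app-webapp-0.19.3-standalone.jar 8999
```

To actually start the application from the annotation/ folder, pass the printed command to the shell, for example with sh -c "$(inception_cmd inception-app-webapp-0.19.3-standalone.jar)".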

Using INCEpTION

The application should now be accessible at the container's IP address on the chosen port, e.g. 10.0.4.44:8999
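The application can take a while to start, so it may be useful to poll the address from the host until it answers. Here is a small sketch, assuming curl is installed on the host. The function names inception_url and wait_for_inception are hypothetical, and the IP address and port are just the example values from this article.

```shell
# Hypothetical helper: build the URL the web UI should answer on.
inception_url() {
    echo "http://$1:${2:-8999}/"
}

# Hypothetical helper: poll the URL until the server answers or the
# attempts run out. Returns 0 once curl receives any HTTP response,
# 1 if every attempt fails.
wait_for_inception() {
    url=$1
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if curl --silent --output /dev/null --max-time 2 "$url"; then
            return 0
        fi
        sleep 2
        i=$((i + 1))
    done
    return 1
}
```

For example, wait_for_inception "$(inception_url 10.0.4.44 8999)" && echo "INCEpTION is up" prints the message once the server responds.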

Thereafter you can look at the User Guide for more information on how to use it.

I will be adding a future blog post about rebuilding the Quran Corpus with INCEpTION. At the moment I am manually adding some content to the corpus data. This data was added to the existing Quran Corpus and specifically relates to the Reference Nodes, Hidden Nodes and Empty Nodes.


The original content of this blog is a Waqf solely for the Pleasure of Allah. You are hereby granted full permission to copy, download, distribute, publish and share this content without modification under condition that full attribution is given to this author by creating a link either above or below the content that links back to the original source of the content. For any questions or ambiguity, you are requested to contact me via email for clarification.