Newsletter #1 - SPIRIT PROTOTYPE 1
The SPIRIT project is taking a disruptive approach to the development, testing, training and evaluation of (and on) a novel system prototype, aimed at determining scalable, privacy-preserving intelligence analysis tools to resolve identities. The first version (v1.0) of the SPIRIT prototype focuses on use cases provided by six (6) SPIRIT LEAs. It is tested on anonymised data sets and combined with functionality developed to securely trawl through open source data behind each partner Force’s firewall and security systems.
In October 2019, the SPIRIT prototype was successfully deployed to the Hellenic Border Police, followed by deployment to Thames Valley Police in February 2020. Additional deployment to at least four (4) LEAs has been delayed, pending the resolution of either technical infrastructure or ethical restrictions at partner Forces’ sites. Basic evaluation and assessment of the current build is ongoing, and the LEAs remain committed to working in partnership with technical partners to maximise SPIRIT tool capability and to scope further development opportunities to enhance and refine the end user experience.
Integrated Ethics & Privacy Protection
SPIRIT PROTOTYPE 1 has been developed in strict compliance with the relevant ethical and legal guidelines, provisions, procedures and protocols that have been identified. The SPIRIT Consortium has followed a regulatory model with internal and external controls. The Ethical lead partner has worked closely with the SPIRIT DPO and the Ethical Advisory Board to comply with the requirements set by the Ethical Panel of the European Commission. A SPIRIT toolkit based on the implementation of a dynamic Data Protection Impact Assessment, Incidental Risks, Incidental Findings Policy, regular ethical audits, algorithm training and algorithm auditing has been put in place. These metrics and procedures to ensure the protection of citizens’ fundamental rights have been collected into the SPIRIT Handbook for legal and ethical compliance.
Keyword-based refined and automated search
The Enhanced Refined Search (ERS) lets the user define an extended set of search parameters and then refine the initial results by selecting those that are relevant to the investigation and should be explored further. Once the user is satisfied with the selection, the list of URLs is made available to the crawler module. The retrieved pages are scraped to extract the relevant text based on examination of the HTML tags. This text is parsed by spaCy, an open source natural language processing tool, producing a range of output that includes lists of entities and noun chunks. The Automated Search is a data acquisition service provided by SPIRIT that retrieves information through third-party and social media APIs. The Twitter API and Google Search API have been integrated in order to download texts, images and videos to be processed by SPIRIT's analysis tools. A search can be performed starting from a Twitter ID, a keyword or a list of keywords.
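As a rough illustration of the tag-based scraping step, the sketch below uses only Python's standard library to keep text found inside a chosen set of HTML tags. The tag sets and the names RelevantTextExtractor and extract_relevant_text are assumptions made for this example; the newsletter does not say which tags SPIRIT examines or how its scraper is built.

```python
from html.parser import HTMLParser

# Assumption: which tags carry "relevant" text is not stated in the newsletter;
# these sets are placeholders chosen for illustration.
RELEVANT_TAGS = {"p", "h1", "h2", "h3", "li"}
SKIPPED_TAGS = {"script", "style"}


class RelevantTextExtractor(HTMLParser):
    """Collects text found inside RELEVANT_TAGS, ignoring scripts and styles."""

    def __init__(self):
        super().__init__()
        self._relevant_depth = 0  # how many relevant tags are currently open
        self._skipped_depth = 0   # how many script/style tags are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skipped_depth += 1
        elif tag in RELEVANT_TAGS:
            self._relevant_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skipped_depth > 0:
            self._skipped_depth -= 1
        elif tag in RELEVANT_TAGS and self._relevant_depth > 0:
            self._relevant_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._relevant_depth > 0 and self._skipped_depth == 0:
            self.chunks.append(text)


def extract_relevant_text(html):
    """Return the list of text fragments found inside relevant tags."""
    parser = RelevantTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

The extracted fragments could then be joined and passed to a spaCy pipeline, whose Doc objects expose entities through `ents` and noun chunks through `noun_chunks`.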
Content Database System
The data collected and used by the other SPIRIT tools is managed in a graph database system that has been designed for the purposes of SPIRIT PROTOTYPE 1. This system consists of three components: 1) The SPIRIT content database, which contains data about investigations, about the media files discovered and used during the investigations, and about the social graph. Logically, this database is represented using the Property Graph data model. 2) The database management system used to store the content database. For this, SPIRIT employs ArangoDB, a scalable multi-model NoSQL system that supports a suitable graph database model as well as desirable transactional properties (ACID). 3) The mediator service, which provides a GraphQL API that enables the other SPIRIT tools to access the content database (queries and updates). This mediator service has been developed as the basis for adding further abstraction layers to the graph database system. Such abstraction layers i) capture the semantics of the data via an ontology (to enable semantics-based graph analysis and sense making) and ii) include computational logic (graph traversal & analytics algorithms).
Multi-purpose web crawler
A web crawler has been integrated in SPIRIT PROTOTYPE 1. The crawler adopts a master-slave architecture in order to manage incoming requests for downloading texts, images and videos from one or more websites. An artificial dataset was created for testing the implemented crawler. A well-known open research dataset (in the area of Named Entity Recognition) was used to generate the contents of the artificial websites. To avoid referring to real identities, all the entities of the dataset were replaced with entities from the Valcri dataset or with other dummy/fictional entities.
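To make the Property Graph data model mentioned above more concrete, here is a minimal in-memory sketch: nodes and edges each carry a label and a dictionary of properties. The class names and the labels used in the usage note are hypothetical and do not reflect the actual SPIRIT schema, ArangoDB collections or GraphQL API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                                  # e.g. a type name for the node
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                                 # id of the node the edge leaves
    target: str                                 # id of the node the edge enters
    label: str                                  # the relationship type
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """A tiny directed property graph kept in memory (illustration only)."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, target, label, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, label, properties))

    def neighbours(self, node_id, edge_label=None):
        """Return ids of nodes reachable over one outgoing edge."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (edge_label is None or e.label == edge_label)
        ]
```

For example, an investigation node could be linked to a media file node with `g.add_edge("inv1", "file1", "USES")`, after which `g.neighbours("inv1", "USES")` lists the files used in that investigation.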
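The master-slave arrangement described above can be sketched with a shared task queue: one coordinating function hands download requests to a pool of worker threads and gathers their results. This is only an assumed outline, not the SPIRIT crawler itself; run_crawl and its fetch parameter are hypothetical names, and the download function is injected so the sketch runs without network access.

```python
import queue
import threading


def run_crawl(urls, fetch, num_workers=3):
    """Coordinator: queue the URLs, start workers, return results keyed by URL.

    `fetch` is any callable taking a URL and returning its content.
    If it raises, the exception object is stored as the result for that URL.
    """
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:          # sentinel: no more work for this worker
                return
            try:
                content = fetch(url)
            except Exception as exc:  # keep crawling even if one page fails
                content = exc
            with lock:
                results[url] = content

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)               # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

Injecting the download function also makes it straightforward to point the crawler at an artificial set of pages during testing, in the spirit of the artificial dataset described above.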
Graph visualization
SPIRIT PROTOTYPE 1 retrieves all data from the SPIRIT services and allows users to develop interactive visualisations, so that they can see relationships and connections within their data set significantly faster. The visualisation is built from nodes and edges. A node represents a single data point, such as a person or a location, while an edge represents a connection between two nodes, such as a communication.
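As a small illustration of the node and edge representation, the sketch below turns a list of connection records into a payload that a graph-drawing front end could consume. The function name to_node_link, the record layout and the field names are assumptions for this example only; the names in the usage note are fictional.

```python
import json
from collections import Counter


def to_node_link(records):
    """Turn (source, target, kind) records into a nodes/edges payload.

    Each node also carries its degree (number of connections), which a
    front end might use, for instance, to size the node on screen.
    """
    degree = Counter()
    edges = []
    for source, target, kind in records:
        edges.append({"source": source, "target": target, "kind": kind})
        degree[source] += 1
        degree[target] += 1
    nodes = [{"id": name, "degree": degree[name]} for name in sorted(degree)]
    return {"nodes": nodes, "edges": edges}
```

Calling `json.dumps(to_node_link([("Alice", "Bob", "call")]))` would serialise one edge and its two nodes for transfer to a visualisation component.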
Some events in which SPIRIT was present:
- TERECOM-2018 and AICOL-2018. JURIX-2018, the 31st international conference on Legal Knowledge and Information Systems, Groningen, Netherlands. Paper presented: Minimisation of Incidental Findings, and Residual Risks for Security Compliance: the SPIRIT Project
- CoU Event, in Thematic Group 4 - Cyber: Crime and Security, Brussels, March 28-29th, 2019.
- Annual School and Expertise Network Day of London Metropolitan University.
- Int. Workshop on Graph Data Management Experiences & Systems (GRADES), which took place during the ACM SIGMOD 2019 conference in Amsterdam. Paper presented: Defining Schemas for Property Graphs by using the GraphQL Schema Definition Language
- 14th meeting of the Community of Users on Secure, Safe and Resilient Societies, a meeting between LEAs and scientists, Brussels, 6th–20th September 2019.
- European workshop on Data Security & Ethics, Brussels, 7th January 2020
See here the original Newsletter #1