In this project, we introduced a complete set of tools for knowledge extraction from geo-referenced data. The tools can be applied to any collection of geo-referenced datasets and cover every stage of the data mining process: data gathering, storage, linking, and finally analysis and visualization. An SQL database provides flexible data storage, and geography serves as the key that links the geo-referenced datasets. Data mining tools from Matlab are integrated into a complete pipeline for analysis and visualization of the linked data. The pipeline is designed to be user-friendly, so that non-experts can readily use it to analyze any linked data. It extracts knowledge from linked data using clustering techniques, principal component analysis and linear regression. The outcomes of these methods are validated through statistical analysis and visualized to ease their interpretation.
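As a rough illustration of the linking and regression stages described above, the following minimal sketch joins two datasets on a shared geographic key and fits a straight line to the linked values. It is not the project's implementation, which relies on an SQL database and Matlab; Python is used here only for illustration. The area codes, the `income` and `health` datasets, their values, and the helper names `link_by_area` and `fit_line` are all hypothetical.

```python
from statistics import mean


def link_by_area(*datasets):
    """Join datasets (dicts keyed by area code) on the areas they all share."""
    shared = set(datasets[0]).intersection(*datasets[1:])
    return {area: tuple(d[area] for d in datasets) for area in sorted(shared)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, plus R^2."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical toy data: one value per geographic area in each dataset.
income = {"A1": 1.0, "A2": 2.0, "A3": 3.0, "A4": 4.0}
health = {"A2": 5.0, "A3": 7.0, "A4": 9.0, "A5": 11.0}

# Only areas present in both datasets survive the link.
linked = link_by_area(income, health)
xs = [row[0] for row in linked.values()]
ys = [row[1] for row in linked.values()]

slope, intercept, r_squared = fit_line(xs, ys)
print(sorted(linked), slope, intercept, r_squared)
```

Because the join keeps only areas that appear in every dataset, coverage gaps in one source shrink the linked sample. R^2 gives a simple check on how well the fitted line describes the linked values.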

The proposed tools were validated by applying them to the case studies of Bristol, Bath and England. By gathering, linking and analyzing large amounts of data, we extracted significant relations among social features, as well as significant patterns in the geographical distribution of the social characteristics of the analyzed populations. In particular, city-level analysis revealed "demographic borders" separating areas with extremely different social characteristics. These borders included both natural features (e.g. the River Avon) and man-made features (e.g. roads, railway tracks) of Bristol and Bath. Some of the borders are still present in today's landscape of the two cities, whereas others no longer exist (e.g. the old Bristol city boundary) and have left their footprint only in the patterns of social characteristics.

Author: Kira Kowalska

Supervisor: Prof. Nello Cristianini