Статьи

neo4j: извлечение подграфа в качестве матрицы смежности и вычисление центральности собственного вектора с помощью JBLAS

Ранее на этой неделе я написал сообщение в блоге, показывающее,  как вычислить центральность собственных векторов матрицы смежности  с использованием  JBLAS,  а следующим шагом была разработка центральности собственных векторов подграфа neo4j.

Было сделано 3 шага:

  1. Экспорт подграфа neo4j в качестве матрицы смежности
  2. Запустите JBLAS, чтобы получить оценки центральности собственного вектора для каждого узла
  3. Запишите эти оценки обратно в neo4j

Я решил использовать  набор данных Пола Ревера  из  блога Киран Хили,  который состоит из людей и групп, членами которых они являются.

Скрипт  для импорта данных на  моей вилке хранилищ почитают .

Импортировав данные, следующим шагом было написать запрос на шифрование, который дал бы мне людей в матрице смежности  с числом в каждом пересечении столбцов / строк, показывающим, сколько общих групп было у этой пары людей.

Я подумал, что будет проще построить этот запрос постепенно, поэтому я начал писать запрос, который будет возвращать одну строку матрицы смежности:

MATCH p1:Person, p2:Person
WHERE p1.name = "Paul Revere"
WITH p1, p2
MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2
 
WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links
ORDER BY p2
RETURN p1, COLLECT(links) AS row

Here we start with Paul Revere and then find the relationships between him and every other person by way of a common group membership.

We use an optional relationship since we need to include a value in each column/row of our adjacency matrix we need to return a 0 value for anyone he doesn’t intersect with.

If we run that query we get back the following:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| p1            | row                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Paul Revere" | [2,1,1,1,1,1,1,1,1,1,1,1,1,1,2,3,1,1,1,1,1,1,3,3,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,2,1,1,2,1,2,1,1,1,1,1,0,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,2,1,1,1,1,1,1,2,1,3,1,3,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,0,1,0,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,2,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,3,1,1,1,1,3,1,1,1,1,0,1,2,1,1,1,1,1,1,1] |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

As it turns outs we’ve only got to remove the WHERE clause and order everybody and we’ve get the adjacency matrix for everyone:

MATCH p1:Person, p2:Person
WITH p1, p2
MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2
 
WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links
ORDER BY p2
RETURN p1, COLLECT(links) AS row
ORDER BY p1
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| p1                      | row                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Abiel Ruddock"         | [0,1,1,1,0,1,0,1,0,0,1,1,1,0,1,2,0,1,0,1,1,1,2,2,1,0,0,1,1,0,1,1,1,1,1,0,0,0,0,1,1,0,0,2,2,0,0,1,1,2,1,1,1,0,1,0,1,1,0,0,2,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,0,2,1,2,1,0,0,0,0,1,1,0,1,0,0,1,0,2,0,0,1,0,0,0,1,0,0,2,0,1,0,1,1,1,0,0,1,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,2,0,0,1,1,0,0,2,0,1,2,1,1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,2,1,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,0,1,0,0,2,1,0,0,1,1,1,1,0,1,0,0,0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,2,1,1,0,0,2,0,1,0,0,0,0,1,0,1,0,1,0,1,0] |
| "Abraham Hunt"          | [1,0,1,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,1,1,0,1,1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,1,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,1,0] |
...
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
254 rows
9897 ms

The next step was to wire up the query results with the JBLAS code that I wrote in the previous post. I ended up with the following:

public class Neo4jAdjacencyMatrixSpike {
    public static void main(String[] args) throws SQLException {
        ClientResponse response = client()
                .resource("http://localhost:7474/db/data/cypher")
                .entity(queryAsJson(), MediaType.APPLICATION_JSON)
                .accept(MediaType.APPLICATION_JSON)
                .post(ClientResponse.class);
 
        JsonNode result = response.getEntity(JsonNode.class);
        ArrayNode rows = (ArrayNode) result.get("data");
 
        List<Double> principalEigenvector = JBLASSpike.getPrincipalEigenvector(new DoubleMatrix(asMatrix(rows)));
 
        List<Person> people = asPeople(rows);
        updatePeopleWithEigenvector(people, principalEigenvector);
 
        System.out.println(sort(people).take(10));
    }
 
    private static double[][] asMatrix(ArrayNode rows) {
        double[][] matrix = new double[rows.size()][254];
        int rowCount = 0;
 
        for (JsonNode row : rows) {
            ArrayNode matrixRow = (ArrayNode) row.get(2);
 
            double[] rowInMatrix = new double[254];
            matrix[rowCount] = rowInMatrix;
            int columnCount = 0;
            for (JsonNode jsonNode : matrixRow) {
                matrix[rowCount][columnCount] = jsonNode.asInt();
                columnCount++;
            }
 
            rowCount++;
        }
        return matrix;
    }
 
    // rest cut for brevity
}

Here we are taking the query and then converting it into an array of arrays before passing it to our JBLAS code to calculate the principal eigenvector. We then return the top 10 people:

Person{name='William Cooper', eigenvector=0.172604992239612, nodeId=68},
Person{name='Nathaniel Barber', eigenvector=0.17260499223961198, nodeId=18},
Person{name='John Hoffins', eigenvector=0.17260499223961195, nodeId=118},
Person{name='Paul Revere', eigenvector=0.17171142003936804, nodeId=207},
Person{name='Caleb Davis', eigenvector=0.16383970722169897, nodeId=71},
Person{name='Caleb Hopkins', eigenvector=0.16383970722169897, nodeId=121},
Person{name='Henry Bass', eigenvector=0.16383970722169897, nodeId=21},
Person{name='Thomas Chase', eigenvector=0.16383970722169897, nodeId=54},
Person{name='William Greenleaf', eigenvector=0.16383970722169897, nodeId=104},
Person{name='Edward Proctor', eigenvector=0.15600043886738055, nodeId=201}

I get back the same 10 people as Kieran Healy although they have different eigenvector values. As far as I understand the absolute value doesn’t matter, what’s more important is the relative score to other people so I think we’re ok.

The final step was to write these eigenvector values back into neo4j which we can do with the following code:

    private static void updateNeo4jWithEigenvectors(List<Person> people) {
        for (Person person : people) {
            ObjectNode request = JsonNodeFactory.instance.objectNode();
            request.put("query", "START p = node({nodeId}) SET p.eigenvectorCentrality={value}");
 
            ObjectNode params = JsonNodeFactory.instance.objectNode();
            params.put("nodeId", person.nodeId);
            params.put("value", person.eigenvector);
 
            request.put("params", params);
 
            client()
                    .resource("http://localhost:7474/db/data/cypher")
                    .entity(request, MediaType.APPLICATION_JSON)
                    .accept(MediaType.APPLICATION_JSON)
                    .post(ClientResponse.class);
        }
    }

Now we might use that eigenvector centrality value in other queries, such as one to show who the most central/potentially influential people are in each group:

MATCH g:Group<-[:MEMBER_OF]-p
 
WITH g.name AS group, p.name AS personName, p.eigenvectorCentrality as eigen
ORDER BY eigen DESC
 
WITH group, COLLECT(personName) AS people
RETURN group, HEAD(people) + [HEAD(TAIL(people))] + [HEAD(TAIL(TAIL(people)))] AS mostCentral
+--------------------------------------------------------------------------+
| group             | mostCentral                                          |
+--------------------------------------------------------------------------+
| "StAndrewsLodge"  | ["Paul Revere","Joseph Warren","Thomas Urann"]       |
| "BostonCommittee" | ["William Cooper","Nathaniel Barber","John Hoffins"] |
| "LoyalNine"       | ["Caleb Hopkins","William Greenleaf","Caleb Davis"]  |
| "LondonEnemies"   | ["William Cooper","Nathaniel Barber","John Hoffins"] |
| "LongRoomClub"    | ["Paul Revere","John Hancock","Benjamin Clarke"]     |
| "NorthCaucus"     | ["William Cooper","Nathaniel Barber","John Hoffins"] |
| "TeaParty"        | ["William Cooper","Nathaniel Barber","John Hoffins"] |
+--------------------------------------------------------------------------+
7 rows
280 ms

Our top ten feature frequently although it’s interesting that only one of them is in the ‘LongRoomClub’ group which perhaps indicates that people in that group are less likely to be members of the other ones.

I’d be interested if anyone can think of other potential uses for eigenvector centrality once we’ve got it back in the graph.

All the code described in this post is on github if you want to take it for a spin.