Ранее на этой неделе я написал сообщение в блоге, показывающее, как вычислить центральность собственных векторов матрицы смежности с использованием JBLAS, а следующим шагом была разработка центральности собственных векторов подграфа neo4j.
Было сделано 3 шага:
- Экспорт подграфа neo4j в качестве матрицы смежности
- Запустите JBLAS, чтобы получить оценки центральности собственного вектора для каждого узла
- Запишите эти оценки обратно в neo4j
Я решил использовать набор данных Пола Ревера из блога Киран Хили, который состоит из людей и групп, членами которых они являются.
Скрипт для импорта данных на моей вилке хранилищ почитают .
Импортировав данные, следующим шагом было написать запрос на шифрование, который дал бы мне людей в матрице смежности с числом в каждом пересечении столбцов / строк, показывающим, сколько общих групп было у этой пары людей.
Я подумал, что будет проще построить этот запрос постепенно, поэтому я начал писать запрос, который будет возвращать одну строку матрицы смежности:
MATCH p1:Person, p2:Person WHERE p1.name = "Paul Revere" WITH p1, p2 MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2 WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links ORDER BY p2 RETURN p1, COLLECT(links) AS row |
Here we start with Paul Revere and then find the relationships between him and every other person by way of a common group membership.
We use an optional relationship since we need to include a value in each column/row of our adjacency matrix we need to return a 0 value for anyone he doesn’t intersect with.
If we run that query we get back the following:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | p1 | row | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Paul Revere" | [2,1,1,1,1,1,1,1,1,1,1,1,1,1,2,3,1,1,1,1,1,1,3,3,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,2,1,1,2,1,2,1,1,1,1,1,0,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,2,1,1,1,1,1,1,2,1,3,1,3,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,0,1,0,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,2,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,3,1,1,1,1,3,1,1,1,1,0,1,2,1,1,1,1,1,1,1] | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |
As it turns outs we’ve only got to remove the WHERE clause and order everybody and we’ve get the adjacency matrix for everyone:
MATCH p1:Person, p2:Person WITH p1, p2 MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2 WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links ORDER BY p2 RETURN p1, COLLECT(links) AS row ORDER BY p1 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | p1 | row | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Abiel Ruddock" | [0,1,1,1,0,1,0,1,0,0,1,1,1,0,1,2,0,1,0,1,1,1,2,2,1,0,0,1,1,0,1,1,1,1,1,0,0,0,0,1,1,0,0,2,2,0,0,1,1,2,1,1,1,0,1,0,1,1,0,0,2,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,0,2,1,2,1,0,0,0,0,1,1,0,1,0,0,1,0,2,0,0,1,0,0,0,1,0,0,2,0,1,0,1,1,1,0,0,1,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,2,0,0,1,1,0,0,2,0,1,2,1,1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,2,1,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,0,1,0,0,2,1,0,0,1,1,1,1,0,1,0,0,0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,2,1,1,0,0,2,0,1,0,0,0,0,1,0,1,0,1,0,1,0] | | "Abraham Hunt" | [1,0,1,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,1,1,0,1,1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,1,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,1,0] | ... +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 254 rows 9897 ms |
The next step was to wire up the query results with the JBLAS code that I wrote in the previous post. I ended up with the following:
public class Neo4jAdjacencyMatrixSpike { public static void main(String[] args) throws SQLException { ClientResponse response = client() .resource("http://localhost:7474/db/data/cypher") .entity(queryAsJson(), MediaType.APPLICATION_JSON) .accept(MediaType.APPLICATION_JSON) .post(ClientResponse.class); JsonNode result = response.getEntity(JsonNode.class); ArrayNode rows = (ArrayNode) result.get("data"); List<Double> principalEigenvector = JBLASSpike.getPrincipalEigenvector(new DoubleMatrix(asMatrix(rows))); List<Person> people = asPeople(rows); updatePeopleWithEigenvector(people, principalEigenvector); System.out.println(sort(people).take(10)); } private static double[][] asMatrix(ArrayNode rows) { double[][] matrix = new double[rows.size()][254]; int rowCount = 0; for (JsonNode row : rows) { ArrayNode matrixRow = (ArrayNode) row.get(2); double[] rowInMatrix = new double[254]; matrix[rowCount] = rowInMatrix; int columnCount = 0; for (JsonNode jsonNode : matrixRow) { matrix[rowCount][columnCount] = jsonNode.asInt(); columnCount++; } rowCount++; } return matrix; } // rest cut for brevity } |
Here we are taking the query and then converting it into an array of arrays before passing it to our JBLAS code to calculate the principal eigenvector. We then return the top 10 people:
Person{name='William Cooper', eigenvector=0.172604992239612, nodeId=68}, Person{name='Nathaniel Barber', eigenvector=0.17260499223961198, nodeId=18}, Person{name='John Hoffins', eigenvector=0.17260499223961195, nodeId=118}, Person{name='Paul Revere', eigenvector=0.17171142003936804, nodeId=207}, Person{name='Caleb Davis', eigenvector=0.16383970722169897, nodeId=71}, Person{name='Caleb Hopkins', eigenvector=0.16383970722169897, nodeId=121}, Person{name='Henry Bass', eigenvector=0.16383970722169897, nodeId=21}, Person{name='Thomas Chase', eigenvector=0.16383970722169897, nodeId=54}, Person{name='William Greenleaf', eigenvector=0.16383970722169897, nodeId=104}, Person{name='Edward Proctor', eigenvector=0.15600043886738055, nodeId=201} |
I get back the same 10 people as Kieran Healy although they have different eigenvector values. As far as I understand the absolute value doesn’t matter, what’s more important is the relative score to other people so I think we’re ok.
The final step was to write these eigenvector values back into neo4j which we can do with the following code:
private static void updateNeo4jWithEigenvectors(List<Person> people) { for (Person person : people) { ObjectNode request = JsonNodeFactory.instance.objectNode(); request.put("query", "START p = node({nodeId}) SET p.eigenvectorCentrality={value}"); ObjectNode params = JsonNodeFactory.instance.objectNode(); params.put("nodeId", person.nodeId); params.put("value", person.eigenvector); request.put("params", params); client() .resource("http://localhost:7474/db/data/cypher") .entity(request, MediaType.APPLICATION_JSON) .accept(MediaType.APPLICATION_JSON) .post(ClientResponse.class); } } |
Now we might use that eigenvector centrality value in other queries, such as one to show who the most central/potentially influential people are in each group:
MATCH g:Group<-[:MEMBER_OF]-p WITH g.name AS group, p.name AS personName, p.eigenvectorCentrality as eigen ORDER BY eigen DESC WITH group, COLLECT(personName) AS people RETURN group, HEAD(people) + [HEAD(TAIL(people))] + [HEAD(TAIL(TAIL(people)))] AS mostCentral |
+--------------------------------------------------------------------------+ | group | mostCentral | +--------------------------------------------------------------------------+ | "StAndrewsLodge" | ["Paul Revere","Joseph Warren","Thomas Urann"] | | "BostonCommittee" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "LoyalNine" | ["Caleb Hopkins","William Greenleaf","Caleb Davis"] | | "LondonEnemies" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "LongRoomClub" | ["Paul Revere","John Hancock","Benjamin Clarke"] | | "NorthCaucus" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "TeaParty" | ["William Cooper","Nathaniel Barber","John Hoffins"] | +--------------------------------------------------------------------------+ 7 rows 280 ms |
Our top ten feature frequently although it’s interesting that only one of them is in the ‘LongRoomClub’ group which perhaps indicates that people in that group are less likely to be members of the other ones.
I’d be interested if anyone can think of other potential uses for eigenvector centrality once we’ve got it back in the graph.
All the code described in this post is on github if you want to take it for a spin.