Enron words
This is the bipartite document–word network of the Enron words dataset. Left
nodes are documents and right nodes are words. Edge weights are multiplicities:
the weight of an edge is the number of times the word occurs in the document.
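
As a rough illustration of this structure, the sketch below builds a tiny document–word network from (document, word, count) triples and derives the volume, the unique edge count, and the weighted degrees. The triples are made-up toy data, not taken from the dataset, and the variable names are hypothetical. It assumes that degrees count edge multiplicities.

```python
from collections import defaultdict

# Hypothetical toy data: (document_id, word_id, count) triples.
# Each triple is one unique edge; count is its multiplicity.
triples = [
    (1, 1, 2),
    (1, 2, 1),
    (2, 2, 3),
    (3, 1, 1),
    (3, 3, 1),
]

left_degree = defaultdict(int)   # documents (left nodes)
right_degree = defaultdict(int)  # words (right nodes)
for doc, word, count in triples:
    # Assumption: a node's degree sums the multiplicities of its edges.
    left_degree[doc] += count
    right_degree[word] += count

volume = sum(count for _, _, count in triples)  # total multiplicity
unique_edges = len(triples)                     # distinct (doc, word) pairs
avg_multiplicity = volume / unique_edges

print(volume, unique_edges, avg_multiplicity)
```
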
Statistics
Size  n =  67,960

Left size  n_{1} =  39,861

Right size  n_{2} =  28,099

Volume  m =  6,412,172

Unique edge count  m̿ =  3,710,420

Wedge count  s =  3,214,624,476

Claw count  z =  2,510,007,422,598

Maximum degree  d_{max} =  7,190

Maximum left degree  d_{1max} =  2,120

Maximum right degree  d_{2max} =  7,190

Average degree  d =  188.704

Average left degree  d_{1} =  160.863

Average right degree  d_{2} =  228.199

Fill  p =  0.00331271

Average edge multiplicity  m̃ =  1.72815

Size of LCC  N =  67,960

Diameter  δ =  6

50-percentile effective diameter  δ_{0.5} =  2.49221

90-percentile effective diameter  δ_{0.9} =  3.60621

Median distance  δ_{M} =  3

Mean distance  δ_{m} =  2.99272

Balanced inequality ratio  P =  0.224254

Left balanced inequality ratio  P_{1} =  0.225645

Right balanced inequality ratio  P_{2} =  0.156346

Relative edge distribution entropy  H_{er} =  0.897344

Power law exponent  γ =  1.26914

Degree assortativity  ρ =  −0.174109

Degree assortativity p-value  p_{ρ} =  0.00000
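
Several of the statistics above can be recomputed from the four primary counts (left size, right size, volume, unique edge count). The sketch below does so. The formulas are assumptions, chosen because they reproduce the listed values; they are not quoted from the dataset's documentation, and the variable names are hypothetical.

```python
# Primary counts as listed in the statistics.
n1, n2 = 39_861, 28_099   # left size (documents), right size (words)
m = 6_412_172             # volume (edges counted with multiplicity)
m_unique = 3_710_420      # unique edge count

# Assumed definitions; each matches the listed value after rounding.
n = n1 + n2                  # size
avg_degree = 2 * m / n       # average degree
avg_left = m / n1            # average left degree
avg_right = m / n2           # average right degree
fill = m_unique / (n1 * n2)  # share of possible document-word pairs present
avg_mult = m / m_unique      # average edge multiplicity

print(n, round(avg_degree, 3), round(avg_left, 3), round(avg_right, 3))
print(round(fill, 8), round(avg_mult, 5))
```
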

References
[1] Jérôme Kunegis. KONECT – The Koblenz Network Collection. In Proc. Int. Conf. on World Wide Web Companion, pages 1343–1350, 2013.

[2] M. Lichman. UCI Machine Learning Repository, 2013.