NY Times

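This is the bipartite document–word network of the NY Times bag-of-words dataset. Left nodes are documents and right nodes are words. An edge connects a document to a word that occurs in it, and the multiplicity of an edge is the number of occurrences.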


Internal name: bag-nytimes
Name: NY Times
Data source: http://archive.ics.uci.edu/ml/datasets/Bag+of+Words
Availability: Dataset is available for download
Consistency check: Dataset passed all tests
Category: Text network
Node meaning: Document, word
Edge meaning: Occurrence
Network format: Bipartite, undirected
Edge type: Unweighted, multiple edges
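
As an illustration of the structure described above, the following minimal sketch reads a tiny document–word listing into a bipartite multigraph and derives a few counts from it. It assumes the source file uses a "docword" layout of three header lines (number of documents, vocabulary size, number of non-zero entries) followed by `docID wordID count` triples. The sample data, the function names `read_docword` and `summarize`, and the choice to count degrees with multiplicity are all assumptions made for this example, not part of the dataset description.

```python
from collections import Counter
from io import StringIO

# Hypothetical miniature input in the assumed "docword" layout:
# three header lines, then one "docID wordID count" triple per line.
SAMPLE = """\
3
4
5
1 1 2
1 3 1
2 1 1
2 4 3
3 2 1
"""


def read_docword(handle):
    """Return (n_docs, n_words, edges); edges maps (doc, word) -> multiplicity."""
    n_docs = int(handle.readline())
    n_words = int(handle.readline())
    nnz = int(handle.readline())
    edges = {}
    for line in handle:
        if not line.strip():
            continue  # tolerate blank lines
        doc, word, count = map(int, line.split())
        edges[(doc, word)] = edges.get((doc, word), 0) + count
    if len(edges) != nnz:
        raise ValueError("header entry count does not match the data")
    return n_docs, n_words, edges


def summarize(edges):
    """Compute volume, unique edge count and maximum degrees (with multiplicity)."""
    left = Counter()   # degree of each document (left node)
    right = Counter()  # degree of each word (right node)
    for (doc, word), count in edges.items():
        left[doc] += count
        right[word] += count
    return {
        "volume": sum(edges.values()),   # edges counted with multiplicity
        "unique_edges": len(edges),      # distinct (document, word) pairs
        "max_left_degree": max(left.values()),
        "max_right_degree": max(right.values()),
    }


n_docs, n_words, edges = read_docword(StringIO(SAMPLE))
stats = summarize(edges)
print(stats)
```

Storing the multiplicity as a value on each distinct (document, word) pair keeps both the volume and the unique edge count available from a single dictionary.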


Size n = 401,388
Left size n1 = 299,752
Right size n2 = 101,636
Volume m = 99,542,125
Unique edge count m̿ = 69,679,427
Wedge count s = 621,479,000,671
Claw count z = 8,494,942,350,924,751
Maximum degree dmax = 108,622
Maximum left degree d1max = 2,017
Maximum right degree d2max = 108,622
Average degree d = 495.990
Average left degree d1 = 332.082
Average right degree d2 = 979.398
Average edge multiplicity m̃ = 1.428 57
Size of LCC N = 401,388
Diameter δ = 7
50-Percentile effective diameter δ0.5 = 1.873 39
90-Percentile effective diameter δ0.9 = 2.889 94
Median distance δM = 2
Mean distance δm = 2.486 43
Balanced inequality ratio P = 0.296 549
Left balanced inequality ratio P1 = 0.403 163
Right balanced inequality ratio P2 = 0.122 051
Power law exponent γ = 1.196 59
Degree assortativity ρ = −0.053 058 2
Degree assortativity p-value pρ = 0.000 00
Spectral separation |λ1[A] / λ2[A]| = 1.852 81
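
Several of the averages listed above can be reproduced from the basic counts. The sketch below assumes the usual definitions for a bipartite network (average degree 2m / n, average left degree m / n1, average right degree m / n2, average edge multiplicity m / m̿); the variable names are chosen for this example only.

```python
# Basic counts copied from the statistics above.
n1 = 299_752            # left size (documents)
n2 = 101_636            # right size (words)
m = 99_542_125          # volume (edges counted with multiplicity)
m_unique = 69_679_427   # unique edge count

# Derived quantities, under the assumed definitions.
n = n1 + n2                       # size
avg_degree = 2 * m / n            # each edge contributes to two node degrees
avg_left_degree = m / n1          # every edge has exactly one left endpoint
avg_right_degree = m / n2         # every edge has exactly one right endpoint
avg_multiplicity = m / m_unique   # average number of parallel edges per pair

print(f"n  = {n:,}")
print(f"d  = {avg_degree:.3f}")
print(f"d1 = {avg_left_degree:.3f}")
print(f"d2 = {avg_right_degree:.3f}")
print(f"average multiplicity = {avg_multiplicity:.5f}")
```

The results agree with the listed values to the precision shown, which is a quick consistency check on the table.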


Degree distribution

Cumulative degree distribution

Lorenz curve

Spectral distribution of the adjacency matrix

Spectral distribution of the normalized adjacency matrix

Spectral distribution of the Laplacian

Spectral graph drawing based on the adjacency matrix

Spectral graph drawing based on the normalized adjacency matrix

Zipf plot

Hop distribution

Edge weight/multiplicity distribution

Matrix decomposition plots



[1] Jérôme Kunegis. KONECT – The Koblenz Network Collection. In Proc. Int. Conf. on World Wide Web Companion, pages 1343–1350, 2013.
[2] M. Lichman. UCI Machine Learning Repository, 2013.