PubMed

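This is the bipartite document–word network of the PubMed bag-of-words dataset. Left nodes are documents and right nodes are words. Each edge denotes an occurrence of a word in a document, so the multiplicity of an edge is the number of times that word occurs in that document.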
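The following is a minimal sketch of how such a document–word multigraph could be built from bag-of-words triples. It assumes the input follows the layout described by the UCI data source (three header lines giving the number of documents, the number of words and the number of nonzero entries, followed by `docID wordID count` lines); the function name `read_docword` and the tiny in-memory sample are hypothetical and only serve as illustration. Degrees are counted with multiplicity here, which is an assumption that matches the averages reported in the statistics.

```python
import io
from collections import Counter


def read_docword(stream):
    """Parse bag-of-words triples into (n_docs, n_words, edges).

    Assumed layout: three header lines (documents, words, nonzero
    entries), then one "docID wordID count" triple per line.
    `edges` maps (doc_id, word_id) to the edge multiplicity.
    """
    n_docs = int(stream.readline())
    n_words = int(stream.readline())
    nnz = int(stream.readline())
    edges = Counter()
    for line in stream:
        if not line.strip():
            continue  # tolerate blank lines
        doc_id, word_id, count = map(int, line.split())
        edges[(doc_id, word_id)] += count
    if len(edges) != nnz:
        raise ValueError("header and body disagree on the number of entries")
    return n_docs, n_words, edges


# Hypothetical toy input: 3 documents, 4 words, 5 distinct document-word pairs.
sample = io.StringIO("3\n4\n5\n1 1 2\n1 3 1\n2 2 1\n3 1 4\n3 4 1\n")
n_docs, n_words, edges = read_docword(sample)

volume = sum(edges.values())   # number of edges counted with multiplicity
unique_edges = len(edges)      # number of distinct document-word pairs

# Degrees counted with multiplicity (assumption, see the lead-in).
left_degree = Counter()
right_degree = Counter()
for (doc_id, word_id), multiplicity in edges.items():
    left_degree[doc_id] += multiplicity
    right_degree[word_id] += multiplicity

print("volume:", volume, "unique edges:", unique_edges)
print("max left degree:", max(left_degree.values()))
print("max right degree:", max(right_degree.values()))
```

For the full dataset the same function could be applied to an opened file instead of the in-memory sample; a dictionary keyed by node pairs is chosen only for readability and would need a more compact representation at the scale of this network.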

Metadata

Code: PM
Internal name: bag-pubmed
Name: PubMed
Data source: http://archive.ics.uci.edu/ml/datasets/Bag+of+Words
Availability: Dataset is available for download
Consistency check: Dataset passed all tests
Category: Text network
Node meaning: Document, word
Edge meaning: Occurrence
Network format: Bipartite, undirected
Edge type: Unweighted, multiple edges

Statistics

Size n = 8,341,043
Left size n1 = 8,200,000
Right size n2 = 141,043
Volume m = 737,869,083
Unique edge count m̿ = 483,450,157
Wedge count s = 42,676,090,519,343
Maximum degree dmax = 2,323,263
Maximum left degree d1max = 436
Maximum right degree d2max = 2,323,263
Average degree d = 176.925
Average left degree d1 = 89.9840
Average right degree d2 = 5,231.52
Average edge multiplicity m̃ = 1.52626
Size of LCC N = 8,341,043
50-Percentile effective diameter δ0.5 = 1.82250
90-Percentile effective diameter δ0.9 = 3.73123
Mean distance δm = 2.76415
Balanced inequality ratio P = 0.283092
Left balanced inequality ratio P1 = 0.413586
Right balanced inequality ratio P2 = 0.102089
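Several of the statistics above follow from the basic counts by simple arithmetic. The sketch below recomputes them under the assumption that the average degree is 2m / n, the average left and right degrees are m / n1 and m / n2, and the average edge multiplicity is the volume divided by the unique edge count. These formulas are inferred from the listed values rather than quoted from the source, and the variable names are illustrative.

```python
# Basic counts taken from the statistics listed above.
n1 = 8_200_000          # left size (documents)
n2 = 141_043            # right size (words)
m = 737_869_083         # volume (edges counted with multiplicity)
m_unique = 483_450_157  # unique edge count

# Derived quantities (assumed definitions, see the lead-in).
n = n1 + n2                           # size
avg_degree = 2 * m / n                # every edge contributes to two nodes
avg_left_degree = m / n1              # every edge has exactly one left endpoint
avg_right_degree = m / n2             # every edge has exactly one right endpoint
avg_multiplicity = m / m_unique       # occurrences per distinct document-word pair

print(f"size n                    = {n:,}")
print(f"average degree            = {avg_degree:.3f}")
print(f"average left degree       = {avg_left_degree:.4f}")
print(f"average right degree      = {avg_right_degree:,.2f}")
print(f"average edge multiplicity = {avg_multiplicity:.5f}")
```

The recomputed values agree with the listed ones up to rounding, which suggests that degrees in this table are counted with multiplicity.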

Plots

Degree distribution
Cumulative degree distribution
Lorenz curve
Spectral distribution of the adjacency matrix
Spectral distribution of the normalized adjacency matrix
Spectral distribution of the Laplacian
Spectral graph drawing based on the adjacency matrix
Spectral graph drawing based on the normalized adjacency matrix
Hop distribution
Edge weight/multiplicity distribution
Matrix decompositions plots

References

[1] Jérôme Kunegis. KONECT – The Koblenz Network Collection. In Proc. Int. Conf. on World Wide Web Companion, pages 1343–1350, 2013.
[2] M. Lichman. UCI Machine Learning Repository, 2013.