year | nb of papers from the metadata | nb of papers in PDF | nb of papers in XML (= output of PDFBox) | nb of non-empty papers in the extraction result | nb of papers with an abstract (from extraction) | nb of papers with references (from extraction) | nb of unknown words | nb of known words | nb of words in the content | noise evaluation (%) = nb of known words / nb of words in the content | silence evaluation (%) = nb of non-empty papers in the extraction result / nb of papers in PDF | combined evaluation of noise and silence (%) = harmonic mean of the two previous columns | nb of English papers | nb of French papers | nb of papers in another language (es+de+ru) |
1996 | 32 | 32 | 32 | 32 | 4 | 24 | 3358 | 142994 | 146352 | 97.706 | 100.000 | 98.839 | 32 | 0 | 0 |
2000 | 40 | 40 | 40 | 40 | 8 | 26 | 5961 | 186473 | 192434 | 96.902 | 100.000 | 98.427 | 39 | 0 | 1 |
2006 | 22 | 22 | 22 | 22 | 18 | 21 | 1523 | 77848 | 79371 | 98.081 | 100.000 | 99.031 | 22 | 0 | 0 |
2008 | 40 | 40 | 40 | 40 | 33 | 39 | 1899 | 139560 | 141459 | 98.658 | 100.000 | 99.324 | 40 | 0 | 0 |
2010 | 37 | 37 | 37 | 37 | 33 | 36 | 2676 | 147789 | 150465 | 98.222 | 100.000 | 99.103 | 37 | 0 | 0 |
2012 | 28 | 28 | 28 | 28 | 23 | 24 | 1692 | 89662 | 91354 | 98.148 | 100.000 | 99.065 | 28 | 0 | 0 |
2014 | 28 | 28 | 28 | 28 | 28 | 28 | 2274 | 107495 | 109769 | 97.928 | 100.000 | 98.953 | 28 | 0 | 0 |
total | 227 | 227 | 227 | 227 | 147 | 198 | 19383 | 891821 | 911204 | 97.873 | 100.000 | 98.925 | 226 | 0 | 1 |
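The derived columns can be recomputed from the raw counts. The sketch below is an illustration, not the original evaluation script. It assumes that the combined column is the harmonic mean of the noise and silence percentages, because that formula reproduces every reported value to three decimals. It also assumes that the total row pools the raw counts across years. The names `YearCounts`, `noise_score`, `silence_score`, `combined_score` and `pooled` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class YearCounts:
    """Raw counts for one row of the table (hypothetical container)."""
    year: str
    pdf_papers: int        # nb of papers in PDF
    non_empty_papers: int  # nb of non-empty papers in the extraction result
    unknown_words: int     # nb of unknown words
    known_words: int       # nb of known words

    @property
    def content_words(self) -> int:
        # In every row of the table, content words = known + unknown words.
        return self.known_words + self.unknown_words


def noise_score(c: YearCounts) -> float:
    """Percentage of known words among the words of the content."""
    return 100.0 * c.known_words / c.content_words


def silence_score(c: YearCounts) -> float:
    """Percentage of PDF papers that yielded a non-empty extraction result."""
    return 100.0 * c.non_empty_papers / c.pdf_papers


def combined_score(c: YearCounts) -> float:
    """Assumed combination: harmonic mean of the noise and silence scores."""
    n, s = noise_score(c), silence_score(c)
    return 2 * n * s / (n + s)


def pooled(rows: list[YearCounts], label: str = "total") -> YearCounts:
    """Sum raw counts across years (assumed to be how the total row is built)."""
    return YearCounts(
        label,
        sum(r.pdf_papers for r in rows),
        sum(r.non_empty_papers for r in rows),
        sum(r.unknown_words for r in rows),
        sum(r.known_words for r in rows),
    )


# Counts copied from the table above.
ROWS = [
    YearCounts("1996", 32, 32, 3358, 142994),
    YearCounts("2000", 40, 40, 5961, 186473),
    YearCounts("2006", 22, 22, 1523, 77848),
    YearCounts("2008", 40, 40, 1899, 139560),
    YearCounts("2010", 37, 37, 2676, 147789),
    YearCounts("2012", 28, 28, 1692, 89662),
    YearCounts("2014", 28, 28, 2274, 107495),
]

if __name__ == "__main__":
    # Print year | noise | silence | combined, rounded like the table.
    for row in ROWS + [pooled(ROWS)]:
        print(f"{row.year} | {noise_score(row):.3f} | "
              f"{silence_score(row):.3f} | {combined_score(row):.3f}")
```

Pooling the counts matters for the total row. The reported total noise (97.873) comes from the summed word counts, whereas averaging the seven yearly percentages would give about 97.95.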