Year | Papers in metadata | Papers in PDF | Papers in XML (PDFBox output) | Non-empty papers after extraction | Papers with an abstract (after extraction) | Papers with references (after extraction) | Unknown words | Known words | Total words in content | Noise evaluation (%) = known words / total words in content | Silence evaluation (%) = non-empty papers after extraction / PDF papers | Combined evaluation of noise and silence (%) | English papers | French papers | Papers in another language (es+de+ru) |
2003 | 18 | 18 | 17 | 17 | 16 | 17 | 1430 | 81772 | 83202 | 98.281 | 94.444 | 96.325 | 17 | 0 | 0 |
2004 | 21 | 21 | 20 | 20 | 20 | 20 | 1441 | 98732 | 100173 | 98.561 | 95.238 | 96.871 | 20 | 0 | 0 |
2005 | 33 | 33 | 32 | 32 | 30 | 30 | 2006 | 149547 | 151553 | 98.676 | 96.970 | 97.816 | 31 | 0 | 1 |
2006 | 27 | 27 | 26 | 26 | 22 | 23 | 1376 | 101108 | 102484 | 98.657 | 96.296 | 97.463 | 24 | 0 | 2 |
2007 | 22 | 22 | 21 | 21 | 17 | 18 | 1329 | 82062 | 83391 | 98.406 | 95.455 | 96.908 | 19 | 0 | 2 |
2008 | 21 | 21 | 21 | 21 | 20 | 20 | 1625 | 104410 | 106035 | 98.467 | 100.000 | 99.228 | 21 | 0 | 0 |
2009 | 18 | 18 | 18 | 18 | 17 | 17 | 1177 | 71499 | 72676 | 98.380 | 100.000 | 99.184 | 18 | 0 | 0 |
2010 | 15 | 15 | 15 | 15 | 13 | 13 | 947 | 60732 | 61679 | 98.465 | 100.000 | 99.226 | 15 | 0 | 0 |
2011 | 21 | 21 | 21 | 21 | 19 | 19 | 1563 | 100955 | 102518 | 98.475 | 100.000 | 99.232 | 21 | 0 | 0 |
2012 | 20 | 20 | 19 | 19 | 16 | 17 | 906 | 69630 | 70536 | 98.716 | 95.000 | 96.822 | 19 | 0 | 0 |
2013 | 21 | 21 | 21 | 21 | 19 | 19 | 1270 | 80127 | 81397 | 98.440 | 100.000 | 99.214 | 21 | 0 | 0 |
2014 | 25 | 25 | 25 | 25 | 24 | 24 | 1527 | 96494 | 98021 | 98.442 | 100.000 | 99.215 | 25 | 0 | 0 |
total | 262 | 262 | 256 | 256 | 233 | 237 | 16597 | 1097068 | 1113665 | 98.510 | 97.710 | 98.108 | 251 | 0 | 5 |
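
The three evaluation columns can be recomputed from the raw counts. The sketch below is a minimal illustration: the noise and silence formulas follow the column headers, while the harmonic mean used for the combined column is an assumption inferred from the figures (it reproduces the tabulated values to three decimals) and is not stated in the table itself. The function names are hypothetical.

```python
def noise_evaluation(known_words: int, total_words: int) -> float:
    """Percentage of known words among all words of the extracted content."""
    if total_words == 0:
        return 0.0
    return 100.0 * known_words / total_words


def silence_evaluation(non_empty_papers: int, pdf_papers: int) -> float:
    """Percentage of PDF papers that yielded a non-empty extraction result."""
    if pdf_papers == 0:
        return 0.0
    return 100.0 * non_empty_papers / pdf_papers


def combined_evaluation(noise: float, silence: float) -> float:
    """Harmonic mean of the two percentages (assumed, inferred from the data)."""
    if noise + silence == 0:
        return 0.0
    return 2.0 * noise * silence / (noise + silence)


# Row for 2003: 81772 known words out of 83202, 17 non-empty papers out of 18 PDFs.
noise_2003 = noise_evaluation(81772, 83202)
silence_2003 = silence_evaluation(17, 18)
combined_2003 = combined_evaluation(noise_2003, silence_2003)
print(f"{noise_2003:.3f} {silence_2003:.3f} {combined_2003:.3f}")
```

Applied to the total row (1097068 known words out of 1113665, 256 non-empty papers out of 262 PDFs), the same functions give values matching the table after rounding to three decimals.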