year | nb of papers from the metadata | nb of papers in PDF | nb of papers in XML (= output of PDFBox) | nb of papers with a non-empty extraction result | nb of papers with an abstract (from extraction) | nb of papers with references (from extraction) | nb of unknown words | nb of known words | nb of words of the content (unknown + known) | evaluation of noise = percentage of known words among the words of the content (nb of known words / nb of words of the content) | evaluation of silence = percentage of PDF papers with a non-empty extraction result (nb of non-empty papers / nb of papers in PDF) | combined evaluation of noise and silence (harmonic mean of the two percentages) | nb of English papers | nb of French papers | nb of papers in another language (es+de+ru) |
2004 | 1 | 1 | 1 | 1 | 0 | 1 | 166 | 8361 | 8527 | 98.053 | 100.000 | 99.017 | 1 | 0 | 0 |
2005 | 5 | 5 | 5 | 5 | 0 | 5 | 819 | 56537 | 57356 | 98.572 | 100.000 | 99.281 | 5 | 0 | 0 |
2006 | 7 | 7 | 7 | 7 | 0 | 7 | 1247 | 80964 | 82211 | 98.483 | 100.000 | 99.236 | 7 | 0 | 0 |
2007 | 12 | 12 | 12 | 12 | 0 | 12 | 2775 | 146159 | 148934 | 98.137 | 100.000 | 99.060 | 12 | 0 | 0 |
2008 | 3 | 3 | 3 | 3 | 0 | 3 | 1043 | 38501 | 39544 | 97.362 | 100.000 | 98.664 | 3 | 0 | 0 |
2009 | 2 | 2 | 2 | 2 | 0 | 2 | 284 | 24987 | 25271 | 98.876 | 100.000 | 99.435 | 2 | 0 | 0 |
2010 | 3 | 3 | 3 | 3 | 0 | 3 | 636 | 29316 | 29952 | 97.877 | 100.000 | 98.927 | 3 | 0 | 0 |
2011 | 20 | 20 | 20 | 20 | 0 | 20 | 3803 | 224326 | 228129 | 98.333 | 100.000 | 99.159 | 20 | 0 | 0 |
2012 | 8 | 8 | 8 | 8 | 0 | 8 | 1853 | 103243 | 105096 | 98.237 | 100.000 | 99.111 | 8 | 0 | 0 |
2013 | 21 | 21 | 21 | 21 | 0 | 21 | 4995 | 302824 | 307819 | 98.377 | 100.000 | 99.182 | 21 | 0 | 0 |
total | 82 | 82 | 82 | 82 | 0 | 82 | 17621 | 1015218 | 1032839 | 98.294 | 100.000 | 99.140 | 82 | 0 | 0 |
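The three evaluation columns are derived from the raw counts. The following is a minimal Python sketch that recomputes them. It assumes that noise = 100 × known words / content words, that silence = 100 × non-empty papers / PDF papers, and that the combined score is the harmonic mean of the two. These formulas are inferred from the header and reproduce the published values to three decimals, but they are not the original scripts. The names `YearCounts`, `noise_score`, `silence_score`, `combined_score`, `evaluate` and `total` are hypothetical.

```python
from typing import NamedTuple


class YearCounts(NamedTuple):
    # Hypothetical container for the raw counts of one table row.
    year: str
    pdf_papers: int
    non_empty_papers: int
    unknown_words: int
    known_words: int


def noise_score(known_words: int, content_words: int) -> float:
    """Percentage of content words that are known words (higher means less noise)."""
    return 100.0 * known_words / content_words


def silence_score(non_empty_papers: int, pdf_papers: int) -> float:
    """Percentage of PDF papers that yielded a non-empty extraction (higher means less silence)."""
    return 100.0 * non_empty_papers / pdf_papers


def combined_score(noise: float, silence: float) -> float:
    """Harmonic mean of the two percentages, in the style of an F1 score."""
    return 2.0 * noise * silence / (noise + silence)


def evaluate(row: YearCounts) -> tuple[float, float, float]:
    # Words of the content = unknown words + known words (holds for every row of the table).
    content_words = row.unknown_words + row.known_words
    noise = noise_score(row.known_words, content_words)
    silence = silence_score(row.non_empty_papers, row.pdf_papers)
    return noise, silence, combined_score(noise, silence)


def total(rows: list[YearCounts]) -> YearCounts:
    # The total row sums the raw counts; its scores are then recomputed, not averaged.
    return YearCounts(
        "total",
        sum(r.pdf_papers for r in rows),
        sum(r.non_empty_papers for r in rows),
        sum(r.unknown_words for r in rows),
        sum(r.known_words for r in rows),
    )


# Raw counts copied from the table: (year, PDF papers, non-empty papers, unknown words, known words).
ROWS = [
    YearCounts("2004", 1, 1, 166, 8361),
    YearCounts("2005", 5, 5, 819, 56537),
    YearCounts("2006", 7, 7, 1247, 80964),
    YearCounts("2007", 12, 12, 2775, 146159),
    YearCounts("2008", 3, 3, 1043, 38501),
    YearCounts("2009", 2, 2, 284, 24987),
    YearCounts("2010", 3, 3, 636, 29316),
    YearCounts("2011", 20, 20, 3803, 224326),
    YearCounts("2012", 8, 8, 1853, 103243),
    YearCounts("2013", 21, 21, 4995, 302824),
]

if __name__ == "__main__":
    for row in ROWS + [total(ROWS)]:
        noise, silence, combined = evaluate(row)
        print(f"{row.year} | {noise:.3f} | {silence:.3f} | {combined:.3f}")
```

The sketch uses a harmonic mean rather than an arithmetic mean. This keeps the combined score close to the weaker of the two measures. With silence at 100 for every year, the combined value therefore tracks the noise score.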