year | nb of papers in the metadata | nb of papers in PDF | nb of papers in XML (output of PDFBox) | nb of non-empty papers in the extraction result | nb of papers with an abstract (from extraction) | nb of papers with references (from extraction) | nb of unknown words | nb of known words | nb of words in the content | evaluation of noise (%) = known words / words in the content × 100 | evaluation of silence (%) = non-empty extracted papers / PDF papers × 100 | combined evaluation of noise and silence (%) | nb of English papers | nb of French papers | nb of papers in another language (es, de, ru) |
2006 | 25 | 25 | 25 | 25 | 25 | 25 | 4058 | 256029 | 260087 | 98.440 | 100.000 | 99.214 | 4 | 21 | 0 |
2007 | 21 | 21 | 21 | 21 | 21 | 21 | 2908 | 204679 | 207587 | 98.599 | 100.000 | 99.295 | 4 | 17 | 0 |
2008 | 24 | 24 | 24 | 24 | 24 | 24 | 5451 | 270852 | 276303 | 98.027 | 100.000 | 99.004 | 3 | 21 | 0 |
2009 | 27 | 27 | 27 | 27 | 27 | 27 | 6216 | 302966 | 309182 | 97.990 | 100.000 | 98.985 | 8 | 19 | 0 |
2010 | 14 | 14 | 14 | 14 | 14 | 14 | 2828 | 154452 | 157280 | 98.202 | 100.000 | 99.093 | 2 | 12 | 0 |
2011 | 20 | 20 | 20 | 20 | 20 | 20 | 5947 | 241144 | 247091 | 97.593 | 100.000 | 98.782 | 4 | 16 | 0 |
2012 | 14 | 14 | 14 | 14 | 14 | 14 | 2903 | 164980 | 167883 | 98.271 | 100.000 | 99.128 | 7 | 7 | 0 |
2013 | 12 | 12 | 12 | 12 | 12 | 12 | 2390 | 143461 | 145851 | 98.361 | 100.000 | 99.174 | 1 | 11 | 0 |
2014 | 11 | 11 | 11 | 11 | 11 | 11 | 2344 | 126715 | 129059 | 98.184 | 100.000 | 99.084 | 0 | 11 | 0 |
2015 | 9 | 9 | 9 | 9 | 9 | 9 | 2077 | 95176 | 97253 | 97.864 | 100.000 | 98.921 | 1 | 8 | 0 |
total | 177 | 177 | 177 | 177 | 177 | 177 | 37122 | 1960454 | 1997576 | 98.142 | 100.000 | 99.062 | 34 | 143 | 0 |
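The noise and silence columns follow directly from the counts in each row, and the number of words in the content is always the sum of unknown and known words. The formula behind the combined column is not stated. The published values match the harmonic mean of the two scores rather than their arithmetic mean: for 2011, the arithmetic mean would be 98.797 instead of the reported 98.782. The sketch below recomputes three rows under that assumption. The function names are hypothetical, and the harmonic mean is inferred from the numbers rather than confirmed by the source.

```python
# Minimal sketch: recompute the evaluation columns from the raw counts.
# Function names are hypothetical; the "combined" score is ASSUMED to be the
# harmonic mean of the noise and silence scores (inferred from the table values).

def noise_score(known_words: int, content_words: int) -> float:
    """Percentage of known words among all words of the extracted content."""
    return 100.0 * known_words / content_words

def silence_score(non_empty_papers: int, pdf_papers: int) -> float:
    """Percentage of PDF papers whose extraction result is non-empty."""
    return 100.0 * non_empty_papers / pdf_papers

def combined_score(noise: float, silence: float) -> float:
    """Harmonic mean of the two percentages (assumption, consistent with the table)."""
    return 2.0 * noise * silence / (noise + silence)

# (year, non-empty papers, PDF papers, unknown words, known words), copied from the table
ROWS = [
    ("2006", 25, 25, 4058, 256029),
    ("2011", 20, 20, 5947, 241144),
    ("total", 177, 177, 37122, 1960454),
]

if __name__ == "__main__":
    for year, non_empty, pdf, unknown, known in ROWS:
        content = unknown + known  # words in the content = unknown + known words
        n = noise_score(known, content)
        s = silence_score(non_empty, pdf)
        print(f"{year}: noise={n:.3f} silence={s:.3f} combined={combined_score(n, s):.3f}")
```

Rounded to three decimals, the printed values match the 2006, 2011 and total rows. Because silence is 100.000 in every row, the combined score depends only on the noise score here.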