year | papers in the metadata | papers in PDF | papers in XML (PDFBox output) | non-empty papers after extraction | papers with an abstract (from extraction) | papers with references (from extraction) | unknown words | known words | words in the content | noise evaluation (%) = known words / words in the content | silence evaluation (%) = non-empty papers after extraction / PDF documents | combined noise and silence evaluation (%) | English papers | French papers | papers in another language (es+de+ru) |
2005 | 19 | 19 | 19 | 19 | 15 | 18 | 2759 | 143201 | 145960 | 98.110 | 100.000 | 99.046 | 19 | 0 | 0 |
2006 | 26 | 26 | 26 | 26 | 24 | 25 | 3576 | 173193 | 176769 | 97.977 | 100.000 | 98.978 | 26 | 0 | 0 |
2007 | 22 | 22 | 22 | 22 | 22 | 22 | 3266 | 178879 | 182145 | 98.207 | 100.000 | 99.095 | 22 | 0 | 0 |
2008 | 23 | 23 | 23 | 23 | 21 | 23 | 2823 | 160420 | 163243 | 98.271 | 100.000 | 99.128 | 23 | 0 | 0 |
2009 | 20 | 20 | 20 | 20 | 19 | 20 | 3205 | 169121 | 172326 | 98.140 | 100.000 | 99.061 | 20 | 0 | 0 |
2010 | 20 | 20 | 19 | 19 | 18 | 19 | 4008 | 164594 | 168602 | 97.623 | 95.000 | 96.294 | 19 | 0 | 0 |
2011 | 26 | 26 | 26 | 26 | 24 | 26 | 5167 | 207070 | 212237 | 97.565 | 100.000 | 98.768 | 26 | 0 | 0 |
2012 | 33 | 33 | 33 | 33 | 32 | 31 | 4514 | 288702 | 293216 | 98.461 | 100.000 | 99.224 | 33 | 0 | 0 |
2013 | 56 | 56 | 56 | 56 | 52 | 56 | 10208 | 541848 | 552056 | 98.151 | 100.000 | 99.067 | 56 | 0 | 0 |
2014 | 31 | 31 | 31 | 31 | 29 | 30 | 6126 | 291911 | 298037 | 97.945 | 100.000 | 98.962 | 31 | 0 | 0 |
2015 | 32 | 32 | 32 | 32 | 32 | 32 | 7431 | 349471 | 356902 | 97.918 | 100.000 | 98.948 | 32 | 0 | 0 |
total | 308 | 308 | 307 | 307 | 288 | 302 | 53083 | 2668410 | 2721493 | 98.049 | 99.675 | 98.856 | 307 | 0 | 0 |
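
The three evaluation columns can be recomputed from the raw counts. The sketch below is a minimal illustration, not the original pipeline: the function names are hypothetical, and the harmonic-mean formula for the combined column is an assumption inferred from the figures in the table (it is not stated in the column header). The 2010 row is used because it is the only year where one PDF produced no extraction result.

```python
def noise_score(known_words: int, total_words: int) -> float:
    """Percentage of known words among all words of the content."""
    return 100.0 * known_words / total_words


def silence_score(non_empty_papers: int, pdf_docs: int) -> float:
    """Percentage of PDF documents that yielded a non-empty extraction."""
    return 100.0 * non_empty_papers / pdf_docs


def combined_score(noise: float, silence: float) -> float:
    """Harmonic mean of the two scores (assumed, inferred from the table)."""
    return 2.0 * noise * silence / (noise + silence)


# Raw counts taken from the 2010 row of the table.
noise = noise_score(known_words=164594, total_words=168602)
silence = silence_score(non_empty_papers=19, pdf_docs=20)
combined = combined_score(noise, silence)

# The table reports 97.623, 95.000 and 96.294 for this row.
print(f"{noise:.3f} | {silence:.3f} | {combined:.3f}")
```

Under this assumption, a single failed extraction lowers the combined figure far more than the usual variation in the share of unknown words, which matches the drop visible in the 2010 row.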