Estimate of extraction quality based on the presence or absence of words in the TagParser lexicon.

year	nb of papers from the metadata	nb of papers in PDF	nb of papers in XML (= output of PDFBox)	nb of non-empty papers as extraction result	nb of papers with an abstract (from extraction)	nb of papers with references (from extraction)	nb of unknown words	nb of known words	nb of words of the content	evaluation of noise = percentage of nb of known words / nb of words of the content	evaluation of silence = percentage of nb of non-empty papers as extraction result / nb of papers in PDF	combined evaluation of noise and silence	nb of English papers	nb of French papers	nb of papers in another language (es+de+ru)
1995	36	36	36	36	0	0	18	800	818	97.800	100.000	98.888	36	0	0
2005	105	105	103	103	89	101	8133	346759	354892	97.708	98.095	97.901	103	0	0
2007	115	115	115	115	105	111	10848	405213	416061	97.393	100.000	98.679	115	0	0
2009	104	104	103	103	88	83	6067	320144	326211	98.140	99.038	98.587	103	0	0
2011	107	107	107	107	107	106	10400	397882	408282	97.453	100.000	98.710	107	0	0
2013	88	88	87	87	82	82	7831	293318	301149	97.400	98.864	98.126	87	0	0
2015	101	101	95	95	84	84	8188	312343	320531	97.445	94.059	95.723	94	0	1
total	656	656	646	646	555	567	51485	2076459	2127944	97.581	98.476	98.026	645	0	1

Note#1: unknown words with an initial lower-case letter are sometimes genuine full-size unknown words, but are often fragments of words that were cut because PDFBox misinterpreted the multi-column PDF layout.

Note#2: a paper without any content (i.e. without any body) is not taken into the pipeline. This situation may be the consequence of a processing problem, or the paper may be an invited presentation without any text. In contrast, a paper without an abstract is taken.

Note#3: a paper without any content still has an entry in the metadata; normally, every paper has an entry in the metadata.

Note#4: the combined evaluation is computed as: 2*EvalNoise*EvalSilence / (EvalNoise+EvalSilence), i.e. the harmonic mean of the two evaluations.
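
The three evaluation columns can be re-derived from the raw counts of a row. The sketch below is only an illustration, not the code that produced this report: the function names are hypothetical, and it assumes the denominator of the formula in Note#4 is grouped as (EvalNoise+EvalSilence). It uses the counts of the 2005 row.

```python
def percentage(part, whole):
    """Return part / whole as a percentage (hypothetical helper)."""
    return 100.0 * part / whole


def combined_evaluation(eval_noise, eval_silence):
    """Harmonic mean of the two evaluations, as read from Note#4.

    Assumption: the denominator is (eval_noise + eval_silence).
    """
    return 2.0 * eval_noise * eval_silence / (eval_noise + eval_silence)


# Counts taken from the 2005 row of the table.
known_words = 346759
content_words = 354892
non_empty_papers = 103
pdf_papers = 105

eval_noise = percentage(known_words, content_words)
eval_silence = percentage(non_empty_papers, pdf_papers)
eval_combined = combined_evaluation(eval_noise, eval_silence)

print(f"{eval_noise:.3f}\t{eval_silence:.3f}\t{eval_combined:.3f}")
# prints: 97.708	98.095	97.901
```

The printed values agree with the 2005 row, which suggests the grouped denominator is the intended reading of the formula.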

total elapsed time (read and display included)= 1.0520166666666666 minutes with 8 cores

computed on: Sun Sep 25 19:25:07 CEST 2016 from: METADONNEES_LTC_250614.txt