Estimation of the extraction quality based on the presence or absence of words in the TagParser lexicon.

year	nb of papers from the metadata	nb of papers in PDF	nb of papers in XML (= output of PDFBox)	nb of non-empty papers as extraction result	nb of papers with an abstract (from extraction)	nb of papers with references (from extraction)	nb of unknown words	nb of known words	nb of words of the content	evaluation of noise = percentage of nb of known words / nb of words of the content	evaluation of silence = percentage of non-empty papers as extraction result / PDF docs	combined evaluation of noise and silence	nb of English papers	nb of French papers	nb of papers in another language (es+de+ru)
1996	32	32	32	32	4	24	3358	142994	146352	97.706	100.000	98.839	32	0	0
2000	40	40	40	40	8	26	5961	186473	192434	96.902	100.000	98.427	39	0	1
2006	22	22	22	22	18	21	1523	77848	79371	98.081	100.000	99.031	22	0	0
2008	40	40	40	40	33	39	1899	139560	141459	98.658	100.000	99.324	40	0	0
2010	37	37	37	37	33	36	2676	147789	150465	98.222	100.000	99.103	37	0	0
2012	28	28	28	28	23	24	1692	89662	91354	98.148	100.000	99.065	28	0	0
2014	28	28	28	28	28	28	2274	107495	109769	97.928	100.000	98.953	28	0	0
total	227	227	227	227	147	198	19383	891821	911204	97.873	100.000	98.925	226	0	1

Note#1: unknown words with an initial lower-case letter are sometimes genuine complete words that are missing from the lexicon, but they are often fragments of words that were cut because PDFBox misinterpreted a multi-column PDF layout.

Note#2: a paper without any content (i.e. without any body) is not taken into the pipeline. This situation may be the consequence of a processing problem, or the paper may be an invited presentation without any text. In contrast, a paper without an abstract is taken.

Note#3: a paper without any content still has an entry in the metadata; normally, each paper has an entry in the metadata.

Note#4: the combined evaluation is computed as the harmonic mean of the two scores: 2*EvalNoise*EvalSilence / (EvalNoise+EvalSilence)
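
The three evaluation columns can be recomputed from the raw counts. The following is a minimal sketch in Python, not the code of the original pipeline; the function names (noise_score, silence_score, combined_score) are hypothetical and chosen for illustration only. It assumes the column definitions given in the table header and reads the formula of Note#4 with the denominator in parentheses. The input values are those of the 1996 row.

```python
def noise_score(known_words, total_words):
    """Percentage of content words that are found in the lexicon."""
    return 100.0 * known_words / total_words


def silence_score(non_empty_papers, pdf_papers):
    """Percentage of PDF papers that yield a non-empty extraction result."""
    return 100.0 * non_empty_papers / pdf_papers


def combined_score(noise, silence):
    """Harmonic mean of the two scores, as described in Note#4."""
    if noise + silence == 0:
        return 0.0  # guard against division by zero (assumption, not from the report)
    return 2.0 * noise * silence / (noise + silence)


# Counts copied from the 1996 row of the table.
unknown, known, total = 3358, 142994, 146352
assert unknown + known == total  # the word counts are consistent

noise = noise_score(known, total)
silence = silence_score(32, 32)
combined = combined_score(noise, silence)

print(f"{noise:.3f} {silence:.3f} {combined:.3f}")
# prints: 97.706 100.000 98.839
```

Because the harmonic mean is dominated by the lower of the two values, a year with perfect silence (100.000) still cannot reach a combined score of 100 unless every word is known.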

total elapsed time (read and display included)= 0.5173 minutes with 8 cores

computed on: Sun Sep 25 18:43:13 CEST 2016 from: METADONNEES_INLG_140515.txt