Estimation of the extraction quality based on the presence or absence of words in the TagParser lexicon.

year	nb of papers from the metadata	nb of papers in PDF	nb of papers in XML (= output of PDFBox)	nb of non-empty papers as extraction result	nb of papers with an abstract (from extraction)	nb of papers with references (from extraction)	nb of unknown words	nb of known words	nb of words of the content	evaluation of noise = percentage of nb of known words / nb of words of the content	evaluation of silence = percentage of non-empty papers as extraction result / PDF docs	combined evaluation of noise and silence	nb of English papers	nb of French papers	nb of papers in another language (es+de+ru)
2005	137	137	135	135	129	131	9041	575669	584710	98.454	98.540	98.497	135	0	0
2008	144	144	143	143	135	138	11094	580328	591422	98.124	99.306	98.711	143	0	0
2009	214	214	214	214	212	214	17484	963890	981374	98.218	100.000	99.101	214	0	0
2011	176	176	176	176	176	176	16993	922612	939605	98.191	100.000	99.087	176	0	0
2013	199	199	199	199	198	199	17097	881479	898576	98.097	100.000	99.040	199	0	0
2015	318	318	317	317	314	317	34273	1587453	1621726	97.887	99.686	98.778	317	0	0
total	1188	1188	1184	1184	1164	1175	105982	5511431	5617413	98.113	99.663	98.882	1184	0	0

Note#1: unknown words with an initial lower-case letter are sometimes genuine full-size unknown words, but often they are fragments of words that were cut because PDFBox misinterpreted a multi-column PDF layout.

Note#2: a paper without any content (i.e. without any body) is not taken into the pipeline. This situation may be the consequence of a processing problem, or the paper may be an invited presentation without any text. In contrast, a paper without an abstract is taken.

Note#3: a paper without any content still has an entry in the metadata; normally, each paper has an entry in the metadata.

Note#4: the combined evaluation is computed as the harmonic mean of the two scores: 2*EvalNoise*EvalSilence / (EvalNoise+EvalSilence)
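
The three evaluation columns can be re-derived from the raw counts in the table. The following is a minimal sketch in Python, not the code that produced this report: the function names are hypothetical, and it assumes the combined score is the harmonic mean of the two percentages, which is consistent with the values in the table.

```python
def eval_noise(known_words: int, content_words: int) -> float:
    """Noise score: percentage of content words found in the lexicon."""
    return 100.0 * known_words / content_words


def eval_silence(non_empty_papers: int, pdf_papers: int) -> float:
    """Silence score: percentage of PDF papers that yielded a non-empty extraction."""
    return 100.0 * non_empty_papers / pdf_papers


def eval_combined(noise: float, silence: float) -> float:
    """Harmonic mean of the two scores (assumed reading of Note#4)."""
    if noise + silence == 0:
        return 0.0
    return 2.0 * noise * silence / (noise + silence)


# Counts taken from the 2005 row of the table.
noise_2005 = eval_noise(575669, 584710)
silence_2005 = eval_silence(135, 137)
combined_2005 = eval_combined(noise_2005, silence_2005)

# Prints: 98.454 98.540 98.497
print(f"{noise_2005:.3f} {silence_2005:.3f} {combined_2005:.3f}")
```

The same functions applied to the summed counts of the total row (5511431 known words out of 5617413, 1184 non-empty papers out of 1188 PDFs) give the total-row scores; the totals are computed from summed counts, not by averaging the per-year percentages.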

total elapsed time (read and display included)= 2.864633333333333 minutes with 8 cores

computed on: Sun Sep 25 18:42:42 CEST 2016 from: METADONNEES_IJCNLP_140514.txt