Evaluation Campaign Research Articles

Determining demographics of a document's author (e.g., gender, age) has attracted many studies during the last decade. To solve this author profiling task, various classification models have been proposed based on stylistic features (e.g., function word frequencies, n-grams of letters or words, POS distributions), as well as various vocabulary richness or overall stylistic measures. To determine the targeted category, different distance measures have been suggested without one approach clearly dominating all others. In this paper, 24 distance measures are studied, extracted from five general families of functions. Moreover, six theoretical properties are presented and we show that the Tanimoto or Matusita distance measures respect all proposed properties. To complement this analysis, 13 test collections extracted from the last CLEF evaluation campaigns are employed to evaluate empirically the effectiveness of these distance measures. This test set covers four languages (English, Spanish, Dutch, and Italian) and four text genres (blogs, tweets, reviews, and social media), with two genders and four or five age groups. The empirical evaluations indicate that the Canberra or Clark distance measures tend to produce better effectiveness than the rest, at least in the context of an author profiling task. Moreover, our experiments indicate that having a training set closely related to the test set (e.g., the same collection) has a clear impact on the overall performance. The gender accuracy rate decreases by 7% (19% for age) when training on the same text genre compared to training on the same collection (leave-one-out methodology). Employing a different text genre in the training and test phases tends to hurt the overall performance, with the final accuracy rate decreasing by around 11% for gender classification and 26% for age.
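
As a rough illustration of the two measures the abstract singles out, the sketch below implements the standard Canberra and Clark distances and uses them in a nearest-profile rule. The abstract does not describe the paper's feature representation or classifier, so the relative-frequency vectors, the profile labels "A" and "B", and the helper nearest_profile are all assumptions made for illustration only.

```python
import math


def canberra(x, y):
    """Canberra distance: sum of |x_i - y_i| / (|x_i| + |y_i|).

    Terms where both values are zero are skipped (a common convention).
    """
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def clark(x, y):
    """Clark distance: sqrt of the sum of ((x_i - y_i) / (|x_i| + |y_i|))^2."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += ((a - b) / denom) ** 2
    return math.sqrt(total)


def nearest_profile(doc, profiles, distance):
    """Hypothetical helper: return the label whose profile is closest to doc."""
    return min(profiles, key=lambda label: distance(doc, profiles[label]))


# Made-up relative frequencies of three features for two category profiles.
profiles = {"A": [0.5, 0.3, 0.2], "B": [0.2, 0.3, 0.5]}
doc = [0.4, 0.4, 0.2]

print(nearest_profile(doc, profiles, canberra))
print(nearest_profile(doc, profiles, clark))
```

Both measures normalize each coordinate difference by the magnitude of the two values, so differences on rare features weigh as much as differences on frequent ones; whether that explains the results reported in the abstract is not something the abstract itself states.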

Abstract. The accuracy of solar radiation measurements, for direct (DIR) and diffuse (DIF) radiation, depends significantly on the precision of the operational Sun-tracking device. Thus, rigid targets for instrument performance and operation have been specified for international monitoring networks, e.g., the Baseline Surface Radiation Network (BSRN) operating under the auspices of the World Climate Research Program (WCRP). Sun-tracking devices that fulfill these accuracy requirements are available from various instrument manufacturers; however, none of the commercially available systems comprise an automatic accuracy control system allowing platform operators to independently validate the pointing accuracy of Sun-tracking sensors during operation. Here we present KSO-STREAMS (KSO-SunTRackEr Accuracy Monitoring System), a fully automated, system-independent, and cost-effective system for evaluating the pointing accuracy of Sun-tracking devices. We detail the monitoring system setup, its design and specifications, and the results from its application to the Sun-tracking system operated at the Kanzelhöhe Observatory (KSO) Austrian radiation monitoring network (ARAD) site. The results from an evaluation campaign from March to June 2015 show that the tracking accuracy of the device operated at KSO lies within BSRN specifications (i.e., 0.1° tracking accuracy) for the vast majority of observations (99.8 %). The evaluation of manufacturer-specified active-tracking accuracies (0.02°), during periods with direct solar radiation exceeding 300 W m−2, shows that these are satisfied in 72.9 % of observations. Tracking accuracies are highest during clear-sky conditions and on days where prevailing clear-sky conditions are interrupted by frontal movement; in these cases, we obtain the complete fulfillment of BSRN requirements and 76.4 % of observations within manufacturer-specified active-tracking accuracies. 
Limitations to tracking surveillance arise during overcast conditions and periods of partial solar-limb coverage by clouds. On days with variable cloud cover, 78.1 % (99.9 %) of observations meet active-tracking (BSRN) accuracy requirements while for days with prevailing overcast conditions these numbers reduce to 64.3 % (99.5 %).

Articles published on Evaluation Campaign

A study about the future evaluation of Question-Answering systems

Evaluating Performance of Containerized IoT Services for Clustered Devices at the Network Edge

A wikipedia-based semantic relatedness framework for effective dimensions classification in online reputation management

Rehabilitation-Oriented Serious Game Development and Evaluation Guidelines for Musculoskeletal Disorders.

EVALITA Goes Social: Tasks, Data, and Community at the 2016 Edition

Distance measures in author profiling

Blind Speech Separation and Enhancement With GCC-NMF

An automated method for the evaluation of the pointing accuracy of Sun-tracking devices

Developing a benchmark for emotional analysis of music.

The European Lead Factory: A Blueprint for Public-Private Partnerships in Early Drug Discovery.

Predicting the Best System Parameter Configuration: the (Per Parameter Learning) PPL method

Estimating the Structural Segmentation of Popular Music Pieces Under Regularity Constraints

Enhancing instance search with weak geometric correlation consistency

Feasibility study of a serious game based on Kinect system for functional rehabilitation of the lower limbs

GridBox pilot project

Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription

A Large-Scale Open-Source Acoustic Simulator for Speaker Recognition

A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research

Comparison of ALBAYZIN query-by-example spoken term detection 2012 and 2014 evaluations

A simple and efficient algorithm for authorship verification
