Abstract

In this paper we present our winning solution to Dedicated Task 1 of the Nokia Mobile Data Challenge (MDC). The task is to infer the semantic category of a place from the smartphone sensing data collected at that place. We approach it as a standard supervised learning problem: we extract discriminative features from the sensor data and build classification models with state-of-the-art classifiers (SVM, logistic regression, and the decision-tree family). We have found that feature engineering, that is, constructing features using human heuristics, is very effective for this task. In particular, we propose a novel feature engineering technique, Conditional Feature (CF), a general framework for domain-specific feature construction. In total, we generated 2,796,200 features, and in our final five submissions we used feature selection to retain between 100 and 2000 of them. One of our key findings is that features conditioned on fine-granularity time intervals, e.g., every 30 minutes, are the most effective. Our best 10-fold cross-validation (CV) accuracy on the training set is 75.1%, obtained with Gradient Boosted Trees; the second best is 74.6%, obtained with L1-regularized Logistic Regression. In addition to these results, we briefly report our experience of using the F# language for large-scale (∼70 GB of raw text data) conditional feature construction.
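
To make the idea of a feature conditioned on a time interval concrete, the following is a minimal sketch in Python, not the authors' F# implementation. It assumes a hypothetical record layout of `(place_id, unix_timestamp, value)` and a hypothetical feature naming scheme (`mean|bin=k`); the abstract specifies neither. Each reading is assigned to a 30-minute time-of-day bin, and one mean-value feature is produced per place and bin.

```python
from collections import defaultdict
from datetime import datetime, timezone


def time_bin(unix_ts, minutes=30):
    """Map a Unix timestamp (interpreted in UTC) to a time-of-day bin index."""
    t = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return (t.hour * 60 + t.minute) // minutes


def conditional_features(records, minutes=30):
    """Build per-place features conditioned on the time-of-day bin.

    records: iterable of (place_id, unix_ts, value) tuples (assumed layout).
    Returns {place_id: {"mean|bin=k": mean of values observed in bin k}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for place, ts, value in records:
        b = time_bin(ts, minutes)
        sums[place][b] += value
        counts[place][b] += 1
    return {
        place: {
            f"mean|bin={b}": sums[place][b] / counts[place][b]
            for b in sums[place]
        }
        for place in sums
    }
```

With 30-minute bins there are 48 possible conditions per base statistic, which suggests how conditioning on several variables at once could multiply the feature count into the millions and make feature selection necessary.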
