Term Weighting Vs. Logistic Regression Performance on E-Commerce Data

Sajjad Salehi, Maryam Ghasdimanghootai

doi:10.14419/ijet.v7i4.35.22738

Abstract

Text categorization can be a very difficult problem in many cases. Although many text categorization algorithms have been developed in the history of computer science, they are not always as accurate as we expect. Some are highly accurate in particular cases, while others perform well in different ones. In this work, we compare two well-known text categorization methods: the term weighting algorithm and the logistic regression algorithm. The dataset comes from our previous start-up, “Ume Market Network”, an online peer-to-peer e-commerce system that was synchronized with Facebook sales groups. Every offer in this dataset should be categorized as a sale or a purchase offer; the problem is therefore a classical binary categorization task on a text dataset of formal as well as colloquial expressions in English, Italian, and German. After resolving all the ambiguities, the logistic regression algorithm outperformed the term weighting algorithm by around 25% in accuracy.

Highlights

Collection Frequency Factor: There are many more detailed parameters to consider. For example, the original term frequency, IDF (inverse document frequency), and IDF probability (term relevance) are considered in this method.
In similar research [11], Ifrim, Bakir, and Weikum showed that logistic regression is effective at categorizing documents using variable-length n-grams of words or characters, with learning that involves automatic tokenization.
The standard method of word tokenization is commonly used in text categorization as a means of preparing the training set before the learning algorithm is applied.

Summary

Collection Frequency Factor

There are many more detailed parameters to consider. For example, the original term frequency, IDF (inverse document frequency), and IDF probability (term relevance) are considered in this method. The factors used are TF, the term frequency, and IDF, in which TF is multiplied by an inverse document frequency factor. The IDF factor varies inversely with the number of documents n_i that contain the term t_i in a collection of N documents and is typically computed as log(N/n_i). In similar research [11], Ifrim, Bakir, and Weikum showed that logistic regression is effective at categorizing documents using variable-length n-grams of words or characters, with learning that involves automatic tokenization. They approached this problem with n-gram logistic regression trained by gradient ascent. An offer can also contain pictures, which are a good source from which to extract information.
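As a rough illustration of the two ideas in this summary, the sketch below computes the IDF factor as log(N/n_i), builds TF-IDF vectors, and fits a logistic regression by stochastic gradient ascent on the log-likelihood. It is only a minimal sketch under stated assumptions: the four toy offers, their labels, the learning rate, the number of passes, and all function names are hypothetical and are not taken from the paper. It uses single-word tokens rather than the variable-length n-grams of [11].

```python
import math
from collections import Counter


def idf(docs):
    """IDF per term: log(N / n_i), n_i = number of documents containing the term."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / n) for term, n in doc_freq.items()}


def tf_idf(doc, idf_weights):
    """Multiply the raw term frequency by the IDF factor of each term."""
    tf = Counter(doc)
    return {term: count * idf_weights.get(term, 0.0) for term, count in tf.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, weights, bias):
    """Linear score of a sparse feature vector x (dict of term -> value)."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in x.items())


def train_logistic(features, labels, steps=200, lr=0.5):
    """Stochastic gradient ascent on the log-likelihood of a logistic model."""
    weights, bias = {}, 0.0
    for _ in range(steps):
        for x, y in zip(features, labels):
            err = y - sigmoid(score(x, weights, bias))  # gradient factor (y - p)
            bias += lr * err
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) + lr * err * v
    return weights, bias


# Hypothetical toy offers (assumption): 1 = sale offer, 0 = purchase offer.
docs = [
    "selling bike good price".split(),
    "selling sofa cheap".split(),
    "looking for bike".split(),
    "looking for cheap sofa".split(),
]
labels = [1, 1, 0, 0]

idf_weights = idf(docs)
features = [tf_idf(doc, idf_weights) for doc in docs]
weights, bias = train_logistic(features, labels)
predictions = [int(sigmoid(score(x, weights, bias)) >= 0.5) for x in features]
```

Terms that appear in every document would get an IDF of log(1) = 0 and so contribute nothing to the score, which is the intended effect of the collection frequency factor.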

The Problem

Data Structure

General Statistics about the Dataset

Pictures

Offers on the Platform

Problem Solving Approach

Findings

Conclusion


Journal: International Journal of Engineering & Technology	Publication Date: Nov 30, 2018
License type: cc-by

Similar Papers

Quality-related English text classification based on recurrent neural network
Cheng Liu ... Xiaofang Wang
Journal of Visual Communication and Image Representation | VOL. 71
25 Nov 2019

Transfer learning with reasonable boosting strategy
Lei La ... Yongliang Wang
Neural Computing & Applications | VOL. 24
21 Dec 2012

An improved K-nearest-neighbor algorithm for text categorization
Shengyi Jiang ... Limin Kuang
Expert Systems with Applications | VOL. 39
07 Aug 2011

Multi-label Arabic text categorization: A benchmark and baseline comparison of multi-label learning algorithms
Bassam Al-Salemi ... Shahrul Azman Mohd Noah
Information Processing and Management | VOL. 56
22 Oct 2018
