Abstract

Enriching web shop pages with structured data has recently become popular in e-commerce, driven mainly by search engines that favour such pages. Although structured data in e-commerce is mostly generated automatically by shop extensions, it covers only a small share of the market, which is a major obstacle for applications that operate on aggregated data. At the same time, more than 90% of product detail pages on the web are generated by only 7 e-commerce systems, while little research addresses methods for detecting e-commerce systems automatically. Automated detection would make it possible to design system-specific extractors that increase the amount of structured data in e-commerce. We therefore propose a novel approach to this problem that filters features generated from HTML tag attributes with an e-commerce-specific whitelist. We evaluate 6 classification algorithms on the problem and discuss their computational effort. We show that the approach can detect the 6 most important e-commerce systems with an F1-score of 0.9 by analyzing only one HTML page per web shop. We evaluate our findings on an independent dataset and on reference shop sites.
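
The feature-generation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitelist terms, the tokenization rule, and the names `WHITELIST`, `AttributeTokenCollector`, and `whitelist_features` are assumptions made for illustration, since the abstract does not specify them. The sketch collects tokens from HTML tag attribute values on a single page and keeps only those found in the whitelist; the resulting counts could then be passed to any classifier.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical whitelist; the actual e-commerce terms used in the paper
# are not given in the abstract.
WHITELIST = {"cart", "product", "price", "checkout", "basket", "sku"}


class AttributeTokenCollector(HTMLParser):
    """Collects lowercase alphanumeric tokens from all tag attribute values."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        for _name, value in attrs:
            if value is None:  # attributes without a value, e.g. <input disabled>
                continue
            # Split on anything that is not a lowercase letter or digit
            # (assumed tokenization rule).
            self.tokens.extend(re.split(r"[^a-z0-9]+", value.lower()))


def whitelist_features(html, whitelist=WHITELIST):
    """Return counts of whitelisted attribute tokens for one HTML page."""
    collector = AttributeTokenCollector()
    collector.feed(html)
    return Counter(t for t in collector.tokens if t in whitelist)


if __name__ == "__main__":
    page = '<div class="product-price" id="cart_total"><a href="/checkout">Buy</a></div>'
    print(whitelist_features(page))
```

Restricting the features to a whitelist keeps the feature space small, which is one plausible reason a single page per shop can suffice; the abstract itself does not state the motivation in these terms.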
