Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples

Daiho Uhm, Sunghae Jun

doi:10.3390/fi14070211

Abstract

Due to the expansion of the internet, we encounter various types of big data, such as web documents and sensing data. Compared to traditional small data such as experimental samples, big data provide more chances to find hidden and novel patterns through analysis with statistics and machine learning algorithms. However, as the use of big data increases, problems also arise. One of them is the zero-inflated problem in structured data preprocessed from big data: most count values are zeros because a specific word is found in only some documents. Patent data are especially affected by this problem because most of them are text documents. In this paper, we focus on patent keyword analysis as a form of text big data analysis, where we encounter the zero-inflated problem just as with other text data. To solve this problem, we propose generating synthetic samples using statistical inference and a tree structure. Using patent documents and simulation data, we verify the performance and validity of the proposed method.
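
To make the zero-inflated problem concrete, the following is a minimal sketch of how a document-term count matrix ends up dominated by zeros. It is not the authors' method or data: the toy corpus and the helper names `document_term_matrix` and `zero_fraction` are assumptions introduced purely for illustration.

```python
from collections import Counter

# Toy corpus, made up for illustration (NOT the patent data used in the paper).
docs = [
    "battery electrode lithium cell",
    "wireless antenna signal receiver",
    "lithium battery charging circuit",
    "image sensor pixel array",
]


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


def zero_fraction(matrix):
    """Share of cells in the count matrix that are exactly zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


vocabulary, dtm = document_term_matrix(docs)
print(f"{len(vocabulary)} terms, zero fraction = {zero_fraction(dtm):.2f}")
```

Even in this tiny example each document uses only a few of the terms in the vocabulary, so most cells are zero. With real patent corpora the vocabulary is far larger, and the share of zeros would typically be higher still.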

Journal: Future Internet	Publication Date: Jul 16, 2022
Citations: 4	License type: CC BY 4.0

Similar Papers

Discussion on geological science big data and its applications
Chonglong Wu ... Gang Liu
Chinese Science Bulletin | VOL. 61
16 May 2016

Big data phenotyping in rare diseases: some ethical issues
Nina Hallowell ... Christoffer Nellåker
Genetics in Medicine | VOL. 21
01 Feb 2019

Machine learning algorithms for Big Data analytics including deep learning
Shaveta Malik ... Rohit Sahoo
24 Aug 2022

Zero-Inflated Text Data Analysis using Generative Adversarial Networks and Statistical Modeling
Sunghae Jun
Computers | VOL. 12
10 Dec 2023
