The Exponential Distribution of the Order of Demonstrative, Numeral, Adjective and Noun
ABSTRACT The frequency of the preferred order for a noun phrase formed by demonstrative, numeral, adjective and noun has received significant attention over the last two decades. We investigate the actual distribution of the 24 possible orders. There is no consensus on whether it is better fitted by an exponential or a power-law distribution. We find that an exponential distribution is a much better model. This finding, together with other circumstances where an exponential-like distribution is found, challenges the view that power-law distributions, e.g. Zipf’s law for word frequencies, are inevitable. We also investigate which of two exponential distributions gives a better fit: an exponential model where all 24 orders have non-zero probability (a geometric distribution truncated at rank 24) or an exponential model where the number of orders that can have non-zero probability is variable (a right-truncated geometric distribution). When consistency and generalizability are prioritized, we find higher support for the exponential model where all 24 orders have non-zero probability. These findings strongly suggest that there is no hard constraint on word order variation, so that unattested orders merely result from undersampling, consistent with Cysouw’s view.
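The model comparison described in the abstract can be illustrated with a minimal sketch. It fits a geometric distribution truncated at rank 24 and a power law truncated at rank 24 by maximum likelihood and compares them with the Bayesian information criterion. The counts are synthetic (drawn from a geometric distribution with a made-up parameter), the grid search stands in for a proper optimizer, and all function names are hypothetical; none of this is taken from the paper's own data or procedure.

```python
import math
import random

N_ORDERS = 24  # number of possible orders of demonstrative, numeral, adjective, noun


def geometric_pmf(q, n=N_ORDERS):
    """Geometric distribution truncated at rank n: p(r) proportional to (1 - q)**(r - 1)."""
    weights = [(1.0 - q) ** (r - 1) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def power_law_pmf(alpha, n=N_ORDERS):
    """Power law truncated at rank n: p(r) proportional to r**(-alpha)."""
    weights = [r ** (-alpha) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(counts, pmf):
    """Multinomial log-likelihood of the rank counts, up to an additive constant."""
    return sum(c * math.log(p) for c, p in zip(counts, pmf) if c > 0)


def fit(counts, family, lo, hi, steps=2000):
    """Grid-search maximum likelihood for a one-parameter family (illustrative only)."""
    grid = [lo + (hi - lo) * i / steps for i in range(1, steps)]
    best = max(grid, key=lambda theta: log_likelihood(counts, family(theta)))
    return best, log_likelihood(counts, family(best))


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better model."""
    return n_params * math.log(n_obs) - 2.0 * loglik


def synthetic_counts(q=0.25, n_obs=500, seed=0):
    """Draw n_obs ranks from a truncated geometric (made-up q) and sort counts by frequency."""
    rng = random.Random(seed)
    draws = rng.choices(range(N_ORDERS), weights=geometric_pmf(q), k=n_obs)
    counts = [draws.count(r) for r in range(N_ORDERS)]
    return sorted(counts, reverse=True)


counts = synthetic_counts()
n_obs = sum(counts)
q_hat, ll_geo = fit(counts, geometric_pmf, 0.0, 1.0)
alpha_hat, ll_pow = fit(counts, power_law_pmf, 0.0, 5.0)
print("geometric: q =", round(q_hat, 3), "BIC =", round(bic(ll_geo, 1, n_obs), 1))
print("power law: alpha =", round(alpha_hat, 3), "BIC =", round(bic(ll_pow, 1, n_obs), 1))
```

Both families have a single free parameter here, so the BIC comparison reduces to comparing maximized log-likelihoods. A model with a variable truncation point, such as the right-truncated geometric mentioned in the abstract, would carry one extra parameter and a correspondingly larger penalty.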
- Research Article
27
- 10.1006/jmva.1995.1027
- Apr 1, 1995
- Journal of Multivariate Analysis
Multivariate Exponential and Geometric Distributions with Limited Memory
- Research Article
3
- 10.25236/ajcis.2023.060208
- Jan 1, 2023
- Academic Journal of Computing & Information Science
The inner probabilistic properties of big data have a great impact on the performance of pattern recognition systems. Jaccard similarity (JS) is one of the most popular statistical metrics for calculating the similarity of objects in the feature extraction process. The paper combines JS with a probabilistic distribution model to explore the effect of the inner properties of big data. It derives the generalized form of JS for the probabilistic model and determines the calculation method of JS for power-law and exponential distributions. Experimental observations showed that a power-law distribution has higher JS than the corresponding exponential distribution, which indicates that a power-law probabilistic structure is a more efficient probability structure. The original normalized data in the MNIST database exhibited a more power-law-like distribution, and the randomly translated data exhibited a more exponential-like distribution. The MNIST data with the power-law-like property have higher JS and are more efficient compared to the translated data. Thus, these observations provide possible guidelines for efficient information coding and processing methods.
- Research Article
28
- 10.1016/j.epsl.2006.03.028
- May 11, 2006
- Earth and Planetary Science Letters
Flexing is not stretching: An analogue study of flexure-induced fault populations
- Research Article
1
- 10.2307/3214846
- Jun 1, 1993
- Journal of Applied Probability
Let {F_n}, n ≥ 0, be a sequence of c.d.f.s and let {R_n}, n ≥ 1, be the sequence of record values in a non-stationary record model where after the (n − 1)th record the population is distributed according to F_n. Then the equidistribution of the nth population and the record increment R_n − R_{n−1} (i.e. R_n − R_{n−1} ~ F_n) characterizes F_n to have an exponentially decreasing hazard function. To be more precise, F_n is the exponential distribution if the support of R_{n−1} generates a dense subgroup in ℝ; otherwise the set of all possible solutions can be obtained in the following way: let, for simplicity, the above additive subgroup be ℤ; then any c.d.f. F satisfying F(0) = 0, F(1) < 1 can be chosen arbitrarily. Setting λ = −log(1 − F(1)), F_n(x) = 1 − (1 − F(x − [x])) exp(−λ[x]) is an admissible solution coinciding with F on the interval [0, 1] ([x] denotes the integer part of x). Simple additional assumptions ensuring that F_n is either exponential or geometric are given. Similar results for exponential or geometric tail distributions based on the independence of R_{n−1} and R_n − R_{n−1} are proved.
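The construction in this abstract can be checked numerically. The sketch below reads the formula in its survival-function form, F_n(x) = 1 − (1 − F(x − [x])) exp(−λ[x]), which is the reading under which F_n coincides with F on [0, 1]. The base c.d.f. F(x) = 0.6·x² and the function names are hypothetical choices made only for illustration.

```python
import math


def base_cdf(x):
    """Hypothetical c.d.f. on [0, 1] with F(0) = 0 and F(1) = 0.6 < 1."""
    return 0.6 * x * x


def extend_cdf(F):
    """Build F_n(x) = 1 - (1 - F(x - [x])) * exp(-lam * [x]) with lam = -log(1 - F(1))."""
    lam = -math.log(1.0 - F(1.0))

    def F_n(x):
        if x < 0:
            return 0.0
        k = math.floor(x)  # integer part [x]
        return 1.0 - (1.0 - F(x - k)) * math.exp(-lam * k)

    return F_n, lam


F_n, lam = extend_cdf(base_cdf)

# F_n agrees with F on [0, 1] ...
print(F_n(0.5), base_cdf(0.5))
# ... and its tail at the integers is geometric: 1 - F_n(k) = (1 - F(1))**k.
print(1.0 - F_n(3), (1.0 - base_cdf(1.0)) ** 3)
```

The geometric tail at integer points is what links this family to the exponential and geometric characterizations mentioned at the end of the abstract.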
- Research Article
17
- 10.13140/rg.2.1.3925.6561
- Apr 19, 2015
The successful targeting of permeable fractures in geothermal fields is aided by understanding the spatial and geometric characteristics of fracture populations. Studies of numerous outcrops, and a limited number of geothermal reservoirs using cores and borehole logs, indicate that fracture frequency and width most commonly follow power-law distributions, with exponential, lognormal, gamma, and power-exponential distributions also reported. This paper presents the first statistical analysis of fracture width and spacing in the high-temperature Rotokawa Geothermal Field, Taupo Volcanic Zone, New Zealand. The fracture dataset comprises: (1) c. 3.6 km of acoustic borehole televiewer (BHTV) logs from three wells and (2) c. 33 m of core. Statistical distributions have been fitted to the BHTV data using a maximum likelihood estimation method and statistical models selected using the Schwarz Bayesian Criterion. Fracture widths observed on BHTV logs range between c. 1–105 mm. Image resolution and sampling bias reduce the useable range of fracture width to less than one order of magnitude (c. 8–50 mm). Over this range, considering the sampling effects and core observations, the fracture width is best modelled by an exponential distribution with coefficients between 0.13±0.01 and 0.29±0.02, which should be treated as a lower bound. Analysis of fracture spacing of the four fracture sets identified on BHTV logs indicates that the dominant set (striking NE–SW) is best modelled by a log-normal distribution, while power-law, power-exponential and gamma are also possible for individual wells. These spacing distributions indicate the presence of a characteristic scale which has not been observed in other geothermal reservoirs hosted in crystalline formations. The characteristic scale may be associated with mechanical interfaces associated with stratigraphic layering, faults, or cooling joints and/or sub-horizontal flow-banding in andesitic formations.
Stratigraphic layering can consist of a succession of lava flows with intercalated breccia layers in the andesites, welding variations in tuffs and sedimentary layering in the sedimentary formations sampled by the BHTV logs. The subordinate fracture set striking N–S is best modelled by a Pareto (power-law) distribution, which suggests that the spacing is more likely to be controlled by tectonic processes than by layering. This N–S fracture set is predominant in only one of the wells studied, which may indicate a structural control on its occurrence in the vicinity of this well. Low fracture spacing (<0.5–5 m) is best modelled by an exponential distribution and higher spacing by lognormal or Pareto (power-law) distributions, except for the N–S striking dataset and the NE–SW striking fracture set in well RK32. The change of distribution model at different scales may be linked to the threshold at which fractures start interacting with each other. This work to date underlines the need to combine data spanning a broad range of length scales to conduct a sound statistical analysis of fracture populations and highlights the control on fracture formation by a combination of processes including tectonics, lava cooling and stress perturbations associated with stratigraphic anisotropy. The resulting distributions provide a basis for simulating and calibrating fracture models of geothermal reservoirs beyond those areas directly sampled with BHTV logs or cores and will integrate variations observed over a range of scales between the study wells.
- Research Article
2
- 10.11614/ksl.2012.45.4.420
- Dec 30, 2012
- Korean Journal of Limnology
An Individual-Based Model (IBM) was developed by employing natural and toxic survival rates of individuals to elucidate the community responses of benthic macroinvertebrates to anthropogenic disturbance in streams. Experimental models (dose-response and relative sensitivity) and mathematical models (power law and negative exponential distribution) were applied to determine the individual survival rates due to acute toxicity in stressful conditions. A power law was additionally used to represent the natural survival rate. Life events, covering movement, exposure to contaminants, death and reproduction, were simulated in the IBM at the individual level at small (1 m) and short (1 week) scales to produce species abundance distributions (SADs) at the community level at large (5 km) and long (1–2 years) scales. Consequently, SADs such as the geometric series, log-series, and log-normal distribution were observed at severely (Biological Monitoring Working Party, BMWP□10), intermediately (BMWP□40) and weakly (BMWP□50) polluted sites, respectively. The results from a power law and negative exponential distribution were suitably fitted to the field data across the different levels of pollution, according to the Kolmogorov-Smirnov test. The IBMs incorporating natural and toxic survival rates of individuals were useful for presenting community responses to disturbances and could be utilized as an integrative tool to elucidate community establishment processes in benthic macroinvertebrates in streams.
- Research Article
5
- 10.1088/2399-6528/ad6ad1
- Aug 1, 2024
- Journal of Physics Communications
A fundamental challenge in the study of probability distributions is the quantification of inequality that is inherently present in them. Some parts of the distribution are more probable and some others are not, and we are interested in the quantification of this inequality through the lens of mathematical diversity, which is a new approach to studying inequality. We offer a theoretical advance, based on case-based entropy and slope of diversity, which addresses inequality for arbitrary probability distributions through the concept of mathematical diversity. Our approach is useful in three important ways: (1) it offers a universal way to measure inequality in arbitrary probability distributions based purely on the entropic uncertainty that is inherent in them and nothing else; (2) it allows us to compare the degree of inequality of arbitrary parts of any distribution (not just tails) and entire distributions alike; and (3) it can glean out empirical rules similar to the 80/20 rule, not just for the power law but for any given distribution or parts thereof. The techniques shown in this paper demonstrate a more general machinery to quantify inequality, compare the degree of inequality of parts or the whole of general distributions, and prove or glean out empirical rules for general distributions based on mathematical diversity. We demonstrate the utility of this new machinery by applying it to the power law, the exponential and the geometric distributions. The 60−40 rule of restricted diversity states that 60 percent or more of cases following a power law (or more generally a right-skewed distribution) reside within 40 percent or less of the lower bound of Shannon equivalent equi-probable (SEE) types as measured by case-based entropy. In this paper, we prove the 60−40 rule for power law distributions analytically. We also show that in all power law distributions, the second half of the distribution is at least 4 times more uniformly distributed than the first.
Lastly, we also show a scale-free way of comparing probability distributions based on the idea of mathematical diversity of parts of a distribution. We use this comparison technique to compare the exponential and power law distribution, and obtain the exponential distribution as an entropic limit of the power law distribution. We also demonstrate that the machinery is applicable to discrete distributions by proving a general result regarding the comparison of parts of the geometric distribution.
- Research Article
- 10.5128/lv26.05
- Oct 31, 2016
- Lähivõrdlusi. Lähivertailuja
On the variation of word order in written L2 Finnish
Finnish word order is known to be syntactically relatively free, but it also has many discourse-conditioned functions (Vilkuna 1989: 9) that form part of the linguistic competency of Finnish native speakers. For those learning Finnish as a second language it can be difficult to recognize which word order is neutral (unmarked) and what interpretation would be triggered using another, rarer (marked) word order in a specific context. In this paper I concentrate on the Finnish existential (‘there is’) sentences, which were gleaned from the so-called Cefling corpus (cf. Martin et al. 2010) containing texts written by two groups: adults and school children. The texts in this corpus were judged as being of levels A1–C2 (adults) and levels A1–B2 (school children) with regard to the Common European Framework of Reference for Languages (CEFR). The most typical word order of the existential sentences is that in which the theme position of the sentence is occupied by a local or possessive adverbial and the subject of the sentence is post-verbal (AVS for short). The theme position can also be empty (VS). Both of these word orders are also unmarked. First, I analyzed the variation of the (A)VS word order statistically. The marked variations of the (A)VS word order become more common as the writing skills (according to CEFR levels) increase. Statistically highly significant differences were found between the levels A & B as well as between the levels B & C in the adult group. The variation in the school children group was not statistically significant. I then analyzed more closely the use of the marked SV-order in text context, using the so-called field description of word order (‘sanajärjestyksen kenttäkuvaus’) as presented in the ISK (2004: 1306–1345). (Cf. Vilkuna 1989 for the nearest equivalent of this model in English.)
The unmarked VS-order sentence is sometimes considered as being “themeless”, since the theme field is not occupied. If the theme field is empty, the subject in the SV-type sentence could occupy the theme field. But also the so-called pre-field preceding the theme field could be occupied by the subject, if the theme field were not really empty. This might seem marginal, but could also have an influence on the interpretation of the sentence in context. The text samples reveal that a suitable theme can often be found for the empty theme slot – at least in the case of SV-order – in the text preceding this sentence. In this case the “empty” theme field could be occupied by this continuous theme, and the subject (rheme) would be in the pre-field. This word order is clearly marked and brings a contrastive or a convincing tone to the text. The text samples show that at least some of the higher-level L2 Finnish learners are able to use the marked SV order in texts this way quite correctly. The subject of the existential sentences is normally interpreted as a rheme or “new information”. In some cases, however, the text samples show that the subject of the existential SV sentence is not actually always a rheme in the prefield: it has been at least indirectly mentioned in the text before and perhaps that is why it rather seems to occupy the theme slot in some SV-order existential sentences. There are also some specific verbs with which the SV-order in the existential sentences seems to be well-established without necessarily being the marked order. This, as well as some learning-related issues of word order, requires further investigation.
- Research Article
7
- 10.1142/s0129183122500310
- Oct 6, 2021
- International Journal of Modern Physics C
Extensive real data reveal that individuals exhibit heterogeneous contacting frequency in social systems. We propose a mathematical model to investigate the effects of heterogeneous contacting on information spreading in metapopulation networks. In the proposed model, we assume that the number of contacting (NOC) follows a specific distribution: normal, exponential, or power-law. We utilize the Markov chain method to study the information spreading dynamics and find that the mean and variance have no significant effect on the outbreak threshold for any of the considered distributions. Under the same values of the NOC distribution’s mean and variance, the information prevalence is largest when the NOC follows the normal distribution, second-largest for the exponential distribution, and smallest for the power-law distribution. When the NOC obeys the normal distribution, experimental results show that the information prevalence decreases with the heterogeneity of individual contact ability. We observe similar phenomena when the NOC follows a power-law or an exponential distribution. Furthermore, a larger mean of the individual contact capacity distribution results in higher information prevalence.
- Research Article
1
- 10.6186/ijims.2014.25.2.6
- Jul 1, 2014
- International journal of information and management sciences
The monetary amount of customers' purchases and interpurchase time are two related and important variables in the realm of business marketing. Yet most research has formulated them independently in prediction models. This paper proposes a prediction model of customer monetary spending using information on interpurchase time. Unlike previous research, we model interpurchase time with a geometric distribution. Moreover, our monetary prediction model combines interpurchase time and an underlying (basic unit) monetary amount, which is assumed to follow a log-normal distribution. This study collects empirical data to validate the proposed model and estimate its parameters. We also compare our results with those obtained when interpurchase time follows an exponential distribution. The results show that our proposed model performs better at monetary forecasting than the exponential model does.
- Research Article
- 10.19139/soic-2310-5070-2308
- Feb 1, 2026
- Statistics, Optimization & Information Computing
Bivariate Gumbel’s exponential distribution is one of the most popular continuous bivariate distributions. Comprehensive studies have been done on the bivariate Gumbel’s exponential model during the past few decades. In this paper, we derive a generalized version of the bivariate Gumbel’s exponential model through entropy optimization, and we call this model the q-bivariate Gumbel’s exponential model. One of the major properties of the q-bivariate Gumbel’s exponential model is that its marginal densities are q-exponential distributions. Its survival function, distribution function and density function can be expressed in terms of the q-exponential function, which is the q-analogue of the exponential function and has several applications in various fields. Different properties and a characterisation theorem of this distribution are discussed. To illustrate the use of the proposed model, the unknown parameters are estimated using the method of maximum likelihood. A likelihood ratio test is carried out to test the goodness of fit of the q-bivariate Gumbel’s exponential distribution and to verify its compatibility with the existing bivariate Gumbel’s exponential model. To assess the practical applicability of the q-bivariate Gumbel’s exponential model, a simulation study and a real data application have been carried out. From this study, we can conclude that the q-bivariate Gumbel’s exponential model shows a better fit than the bivariate Gumbel’s exponential model.
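For readers unfamiliar with the q-exponential function mentioned above, here is a minimal sketch using the convention common in the Tsallis-statistics literature, e_q(x) = [1 + (1 − q)x]_+^(1/(1 − q)), which reduces to exp(x) as q tends to 1. Whether the paper uses exactly this convention is an assumption, and the function name `q_exp` is hypothetical.

```python
import math


def q_exp(x, q):
    """q-exponential e_q(x) = [1 + (1 - q) * x]_+ ** (1 / (1 - q)); equals exp(x) at q = 1."""
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Cut-off for q < 1; for q > 1 the function diverges as the base approaches 0,
        # which is represented here by infinity (a choice made for this sketch).
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))


# As q approaches 1 the q-exponential approaches the ordinary exponential.
for q in (0.5, 0.9, 0.999, 1.0):
    print(q, q_exp(0.5, q), math.exp(0.5))
```

For q > 1 the function q_exp(−x, q) decays as a power of x rather than exponentially, which is why q-exponential marginals can interpolate between exponential-like and power-law-like tails.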
- Research Article
42
- 10.1007/bf01189237
- Mar 1, 1994
- Queueing Systems
Failures of machines have a significant effect on the behavior of manufacturing systems. As a result it is important to model this phenomenon. Many queueing models of manufacturing systems do incorporate the unreliability of the machines. Most models assume that the times to failure and the times to repair of each machine are exponentially distributed (or geometrically distributed in the case of discrete-time models). However, exponential distributions do not always accurately represent actual distributions encountered in real manufacturing systems. In this paper, we propose to model failure and repair time distributions by generalized exponential (GE) distributions (or generalized geometric distributions in the case of a discrete-time model). The GE distribution can be used to approximate distributions with any coefficient of variation greater than one. The main contribution of the paper is to show that queueing models in which failure and repair times are represented by GE distributions can be analyzed with the same complexity as if these distributions were exponential. Indeed, we show that failures and repair times represented by GE distributions can (under certain assumptions) be equivalently represented by exponential distributions.
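To make the GE distribution concrete, the sketch below samples from the form commonly used in the queueing literature, F(t) = 1 − τ·exp(−τ·t/m) with τ = 2/(C² + 1), where m is the mean and C² the squared coefficient of variation. This parametrization is an assumption (the abstract does not state it), and the function names are hypothetical. Under this form a GE variate is zero with probability 1 − τ and exponential with rate τ/m otherwise.

```python
import random


def ge_sample(mean, scv, rng):
    """Draw one GE variate with the given mean and squared coefficient of variation (scv >= 1).

    Assumed form: F(t) = 1 - tau * exp(-tau * t / mean), tau = 2 / (scv + 1).
    """
    if scv < 1.0:
        raise ValueError("this form of the GE distribution needs scv >= 1")
    tau = 2.0 / (scv + 1.0)
    if rng.random() >= tau:
        return 0.0  # zero-length branch, taken with probability 1 - tau
    return rng.expovariate(tau / mean)  # exponential branch, taken with probability tau


def sample_moments(mean, scv, n=100_000, seed=1):
    """Return the sample mean and sample squared coefficient of variation of n draws."""
    rng = random.Random(seed)
    xs = [ge_sample(mean, scv, rng) for _ in range(n)]
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return m, var / (m * m)


m, c2 = sample_moments(mean=2.0, scv=4.0)
print("sample mean:", round(m, 3), "sample scv:", round(c2, 3))
```

With scv = 1 the zero branch disappears and the sampler reduces to an ordinary exponential, matching the abstract's remark that the GE family covers coefficients of variation greater than one.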
- Research Article
15
- 10.1007/s00024-020-02576-z
- Sep 3, 2020
- Pure and Applied Geophysics
The 27 November 2019 Mw 6.0 earthquake that occurred in the southwestern part of the Hellenic Arc near Crete Island provided evidence of the high potential for strong earthquakes and active seismicity in the Hellenic Arc. In addition, tsunamis have been reported to occur in the region after major earthquakes in the historical past, so the seismic hazard of the Hellenic Arc should be evaluated in detail. The aim of this study is to evaluate the seismic hazard of the Hellenic Arc more reliably and accurately by estimating the conditional probabilities of a strong earthquake based on Weibull, gamma, log-normal, exponential, Rayleigh, and inverse Gaussian distribution models for the inter-event time of Mw ≥ 6.0 earthquakes that occurred between 1900 and 2019 in the study area. The fit between each model and the data was tested using four different test criteria, namely the log-likelihood value, Akaike information criterion, Bayesian information criterion, and Kolmogorov–Smirnov test. According to the results, the inverse Gaussian distribution was selected as the best, the log-normal distribution as the second best, the Weibull and gamma distributions as intermediate, and the Rayleigh and exponential distributions as the poorest and second poorest models, respectively. The conditional probability of an earthquake with magnitude Mw ≥ 6.0 is estimated to be higher than 0.70 according to all of the models used in this study for the base year te = 0 (te = 2015) and t > 5 years (t > 2020). Moreover, the results obtained based on the inverse Gaussian, exponential, log-normal, and Weibull distribution models are close to each other and are higher than 0.60 for te = 0 and t ≥ 3 years (t ≥ 2018). The outcomes of this study when using the different distribution models will contribute to assessments of the seismic as well as tsunami hazards for the region.
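The conditional probabilities mentioned above follow from an inter-event time c.d.f. F through the standard relation P(event within t | no event for te) = (F(te + t) − F(te)) / (1 − F(te)). The sketch below applies it to an exponential and a Weibull model. The parameter values are hypothetical and are not the fitted values of the study; the function names are also illustrative.

```python
import math


def exponential_cdf(rate):
    """C.d.f. of an exponential inter-event time with the given rate (events per year)."""
    return lambda t: 1.0 - math.exp(-rate * t) if t > 0 else 0.0


def weibull_cdf(shape, scale):
    """C.d.f. of a Weibull inter-event time with the given shape and scale (years)."""
    return lambda t: 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def conditional_probability(cdf, elapsed, horizon):
    """P(event within `horizon` years | no event during the `elapsed` years since the last one)."""
    survival = 1.0 - cdf(elapsed)
    return (cdf(elapsed + horizon) - cdf(elapsed)) / survival


expo = exponential_cdf(rate=0.2)            # hypothetical rate
weib = weibull_cdf(shape=2.0, scale=8.0)    # hypothetical shape and scale

for elapsed in (0.0, 5.0, 10.0):
    print(elapsed,
          round(conditional_probability(expo, elapsed, 5.0), 3),
          round(conditional_probability(weib, elapsed, 5.0), 3))
```

The exponential column does not change with the elapsed time because the exponential distribution is memoryless, whereas a Weibull model with shape above one assigns a growing probability as time since the last event increases. This difference is one reason the choice among candidate distributions matters for hazard estimates.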
- Research Article
13
- 10.1186/1471-244x-13-281
- Nov 4, 2013
- BMC Psychiatry
Background: Accumulating evidence has shown a universality in the temporal organization of activity and rest among animals ranging from mammals to insects. Previous reports in both humans and mice showed that rest bout durations followed long-tailed (i.e., power-law) distributions, whereas activity bouts followed exponential distributions. We confirmed similar results in the fruit fly, Drosophila melanogaster. Conversely, another report showed that the awakening bout durations, which were defined by polysomnography in bed, followed power-law distributions, while sleeping periods, which may correspond to rest, followed exponential distributions. This apparent discrepancy has been left to be resolved. Methods: Actigraphy data from healthy and disordered children were analyzed separately for two periods: time out of bed (UP period) and time in bed (DOWN period). Results: When data over a period of 24 h were analyzed as a whole, rest bouts showed a power law distribution as previously reported. However, when UP and DOWN period data were analyzed separately, neither showed power law properties. Using a newly developed strict method, only 30% of individuals satisfied the power law criteria, even when the 24 h data were analyzed. The human results were in contrast to the Drosophila results, which revealed clear power-law distributions for both day time and night time rest through the use of a strict method. In addition, we analyzed the actigraphy data from patients with childhood type chronic fatigue syndrome (CCFS), and found that they showed differences from healthy controls when their UP and DOWN data were analyzed separately. Conclusions: These results suggested that the DOWN sleep, the bout distribution of which showed exponential properties, contributes to the production of long-tail distributions in human rest periods. We propose that separate analysis of UP and DOWN period data is important for understanding the temporal organization of activity.