
Prevalence of Most Popular Words within each Scientific Discipline

Bio_mostshared Medicine_mostshared Psych_mostshared Physics_mostshared

All visualisations were generated using the matplotlib package in Python.

The purpose of these visualisations was to gain an insight into the prevalence of the top 5 technical words among articles, in comparison to the most prevalent words within each science subject. We also wanted to see how the occurrence of these technical words measured against the average number of articles in which a word was found.

These visualisations suggest that Medicine had the highest average number of words shared between individual articles. There are many possible reasons for this, and one should be cautious about assigning a definitive causal explanation. Perhaps it is indicative of the homogeneity of Medicine as an academic subject. However, with topic areas ranging from neuroanatomy to pharmacology, Medicine carries an enormous breadth. Chances are, therefore, that this high average is a result of my own selection biases rather than a reflection of Medicine as a linguistically connected subject. Either way, a larger and more detailed study should be carried out to verify our results.

In terms of technical word occurrence across the academic subjects, in every case the top technical word occurred in articles from its subject at an above-average rate relative to the average word. However, to gauge a more accurate and informative picture of word occurrence, more technical words should be investigated. In isolation, these visualisations don’t reveal much about how the prevalence of these technical words within one subject relates to an interdisciplinary context. That being said, they do offer an interesting insight into the occurrence of these words at a fundamental level (i.e. within their own academic subject).

Due to constraints on time and processing power, it was not feasible to search the entire collection of words for each article and individually count the technical words. For this reason, it was decided to look at the top 5 technical words as opposed to all of them.
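As a rough illustration of the comparison described above, the sketch below counts how many articles each word appears in, then sets the top technical words against the average across all words. The toy articles, the technical-word list and the function names are all invented for the example and are not our project data or code. The plotting helper assumes matplotlib is installed and is only one possible way of drawing such a chart.

```python
from collections import Counter


def article_counts(articles):
    """Map each word to the number of articles that contain it."""
    counts = Counter()
    for text in articles:
        # set() so a word counts once per article, however often it repeats
        counts.update(set(text.lower().split()))
    return counts


def top_technical(counts, technical_words, n=5):
    """Return the n technical words found in the most articles."""
    found = [(w, counts[w]) for w in technical_words if counts[w] > 0]
    return sorted(found, key=lambda pair: (-pair[1], pair[0]))[:n]


def average_count(counts):
    """Average number of articles a word appears in."""
    return sum(counts.values()) / len(counts)


def plot_top_words(top, average):
    """Bar chart of the top words, with the average drawn as a dashed line."""
    import matplotlib.pyplot as plt  # imported here so the counting works without it

    words = [w for w, _ in top]
    values = [c for _, c in top]
    plt.bar(words, values)
    plt.axhline(average, linestyle="--", label="average word")
    plt.ylabel("Number of articles containing the word")
    plt.legend()
    plt.show()


# Hypothetical toy data, not taken from the study
articles = [
    "the ligand binds the receptor",
    "the receptor and the ligand interact",
    "the cell membrane holds the receptor",
]
technical = ["ligand", "receptor", "membrane", "enzyme"]

counts = article_counts(articles)
top = top_technical(counts, technical)
average = average_count(counts)
```

Calling plot_top_words(top, average) would then draw the bars with the average as a reference line.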

(Isabelle Blackmore)

 

The Prevalence of words shared amongst all Arts Subjects

wordsinall

These two graphs are an extension of those above (Prevalence of Most Popular Words within each Arts/Scientific Subject). Having determined the most popular technical words within each subject, we tested whether their frequency held up when the scope of articles studied was expanded. The graphs above plot data from the 10 subject-specific articles, while these look at the 40 articles within a discipline.

Looking at these two types of graph, we observe that the popularity of a technical word within its subject does not necessarily correlate with its popularity within the discipline. For example, ‘contra’ is the most used technical word in Visual Arts, which implies its importance within the subject. However, it is not listed on this chart, which means that the word has less significance in other subjects. In fact, only 2 of the words in this chart were found in the one above. What we take away here is that the popularity of a technical word within a subject cannot be extrapolated to its popularity within a discipline.

What this visualisation does not account for is the number of occurrences within an article. For instance, if the word logic was used 50 times in one article but only once in another, both articles will be reflected equally on this chart since, as the axis shows, it counts the number of articles that contain a word and not the frequency of the word. Despite this, we decided that the number of articles was sufficient to gauge the extent of shared vocabulary, as it is the use of another subject’s technical word, whether 50 times or once, that indicates the transferability of technical words between subjects.
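The difference between the two ways of counting can be shown with a minimal sketch. The two articles and the function name below are made up for illustration and are not part of our project code.

```python
from collections import Counter


def count_both_ways(articles, word):
    """Return (articles containing the word, total occurrences of the word)."""
    containing = 0
    occurrences = 0
    for text in articles:
        n = Counter(text.lower().split())[word]
        occurrences += n
        if n > 0:
            containing += 1
    return containing, occurrences


# Hypothetical articles: 'logic' appears 50 times in one and once in the other
articles = ["logic " * 50, "a short note on logic"]
in_articles, total = count_both_ways(articles, "logic")
print(in_articles, total)  # 2 51
```

Our charts use the first number only, so both articles contribute equally.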

Word Clouds

Here are our word cloud visualisations!

We intended these word clouds to be a graphic representation of the extent to which technical words are shared amongst subjects. As mentioned, word clouds are a gauge of frequency, so in the clouds below, the bigger a word is, the more often it was used throughout the articles. By frequency, we mean the number of articles that mention the word and not the number of times the word appears.

In all, we created 4 word clouds:

1. No. of articles that mention Sciences’ technical words across all articles

swmcloud

2. No. of articles that mention Arts’ technical words across all articles

awncloud

3. No. of articles that mention Social Sciences’ technical words across all articles

allwmcloud

4. No. of articles that mention all technical words across all articles

allwm

Comments

With the first three visualisations, we can identify which technical words were mentioned in the most articles. This leads us to review the technical words within each discipline that are most commonly used in academia.

For Science, the word used most frequently in the articles is ligand.

For Arts, the word used most frequently in the articles is epistemic.

For Social Sciences, the word used most frequently in the articles is social.

Word clouds allow us to contextualise this frequency in relation to other technical words within the discipline. Relative to other forms of visualisation, they are the most accessible to the layman while still providing an adequate perspective on what is being looked at. For instance, someone can easily tell that ligand appeared more than twice as often as cell without having to study the axis, as they would if this were visualised by a chart.

When grouped together, though (4.), it is obvious that Social Sciences’ technical words appear much more frequently than the others. While epistemic is the most frequently used technical word from the Arts, it is barely visible next to the technical words from Social Sciences (it sits above the ‘s’ in social). In one sense, the word cloud format may be read as misleading, since a comparison between 2. and 3. seems to imply that social and epistemic have similar frequency counts when in reality they do not. On the other hand, it is this same format, in cloud 4., that exposes the difference in frequency between these two words. When contrasting visualisations, it is important to take the context of their data into account and not to compare visualisations that might operate on different scales.
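The effect of scale can be sketched in a few lines. The article counts and the word policy below are invented for the example, and the linear scaling is an assumption; it is not necessarily how our word cloud tool sizes its words.

```python
def font_sizes(counts, max_count, largest=100):
    """Scale each count to a font size, with max_count drawn at `largest`."""
    return {word: round(largest * n / max_count) for word, n in counts.items()}


# Hypothetical article counts, not the study's figures
arts = {"epistemic": 8, "contra": 4}
social = {"social": 80, "policy": 40}

# Separate clouds: each is scaled to its own most common word
arts_alone = font_sizes(arts, max(arts.values()))
social_alone = font_sizes(social, max(social.values()))

# Combined cloud: every word is scaled to the single largest count
combined = {**arts, **social}
together = font_sizes(combined, max(combined.values()))

print(arts_alone["epistemic"], social_alone["social"])  # 100 100
print(together["epistemic"], together["social"])        # 10 100
```

Drawn separately, epistemic and social are the same size; drawn on a shared scale, epistemic shrinks to a tenth of the size of social.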

Word Cloud 4., however, very clearly shows us that the discipline whose technical words are most used in articles of other disciplines is Social Sciences. The sheer size of the Social Sciences’ technical words relative to even the most commonly occurring technical words from the other disciplines reflects this.

These clouds are a similar visualisation of the same data found in the graphs titled ‘The Prevalence of words shared amongst all Arts/Scientific Subjects’, except that they are not limited to the top 5 shared technical words. They are limited, however, in that their frequency count is simply the total number of articles that mention a word, regardless of the number of subjects those articles come from. For example, if ligand appears in 10 Biology articles while another word appears in 2 articles in each of 5 subjects, the two words will be the same size on the cloud. Perhaps an alternative would be to count the number of subjects a word is mentioned in rather than the number of articles. Even then, the visualisation would have its own bias, in that a word with only 1 mention in 1 article within a subject would be the same size as one with 100 mentions across 10 articles within a subject.
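To make the ligand example concrete, here is a small sketch that reports both measures side by side. The per-subject numbers are hypothetical, and the function is an illustration and not something we ran on our data.

```python
def spread(mentions):
    """mentions maps subject -> number of articles in that subject using the word.

    Returns (total articles, number of subjects with at least one article).
    """
    total_articles = sum(mentions.values())
    subjects = sum(1 for n in mentions.values() if n > 0)
    return total_articles, subjects


# Hypothetical mentions, following the ligand example
ligand = {"Biology": 10}
other_word = {
    "Biology": 2,
    "Medicine": 2,
    "Psychology": 2,
    "Physics": 2,
    "Visual Arts": 2,
}

print(spread(ligand))      # (10, 1)
print(spread(other_word))  # (10, 5)
```

Both words come out at 10 articles, so they would be drawn at the same size, yet one is confined to a single subject and the other reaches five.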


Hypothesis 2. The lower a discipline’s level of technicality, the more likely its technical words are shared with another discipline.

Here, we ranked the level of technicality from highest to lowest as follows: Science, Social Sciences and Arts. These visualisations, however, imply that Social Sciences’ technical words are shared the most. Considering the ratio of technical to non-technical words within the 3 disciplines, perhaps the low ratio for the Arts suggests the specificity of the Arts’ technical words and hence illustrates why, even though on the whole it is the least technical discipline, the Arts’ technical words are less likely to be shared.