The aim of the {naturaList} package is to classify occurrence records
according to the confidence in their species identification. Records are
ranked into up to six levels of confidence. Additionally, the
{naturaList} package provides tools to filter occurrence data based on
these classification levels, identify possible specialists in the taxon,
and evaluate the effects of the filtering procedure on different
descriptors of the spatial distribution of species occurrence records
(area of distribution and niche breadth). With the {naturaList} package,
users can filter large occurrence datasets based on well-established and
clear criteria, evaluate the possible effects of data processing on
downstream analyses, and explore spatial occurrence data through an
interactive interface.
The core function of {naturaList} is classify_occ(). The rationale of
the classification is that the most reliable identification of a
specimen is made by a specialist in the taxon. To classify an occurrence
at this level of confidence, classify_occ() needs an occurrence dataset
and a specialist dataset. The other levels in which data can be
classified are derived from information contained in the occurrence
dataset. The default order for classification in confidence levels is:
Users can alter this order depending on their objectives, except for Level 1, which is always a species determined by a specialist.
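As a sketch of how the order might be changed, the call below assumes that classify_occ() accepts a crit.levels argument taking a character vector of level labels in decreasing order of confidence, as we understand from ?classify_occ. The labels used here are our reading of that help page and are not stated in this text, so check them against your installed version.

```r
library(naturaList)
data("A.setosa")
data("speciaLists")

# Assumed labels for the six criteria (see ?classify_occ);
# "det_by_spec" stays first because Level 1 cannot be moved
custom.order <- c("det_by_spec", "sci_collection", "not_spec_name",
                  "image", "field_obs", "no_criteria_met")

# Classify with the custom order instead of the default one
occ.custom <- classify_occ(A.setosa, speciaLists, crit.levels = custom.order)
table(occ.custom$naturaList_levels)
```

Here records preserved in a scientific collection would be ranked above those identified by a non-specialist, which may suit analyses that prioritise vouchered specimens.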
As an example, we will use the datasets included in {naturaList}:
A.setosa, as the occurrence dataset, and speciaLists, as the specialist
dataset. A.setosa contains occurrence records for Alsophila setosa, a
tree fern of the Brazilian Atlantic Forest. This dataset was downloaded
from the Global Biodiversity Information Facility (GBIF). speciaLists is
a dataset of specialists in ferns and lycophytes of Brazil, which we
gathered from the authors of this paper.
# Load package and data
library(naturaList)
data("A.setosa")
data("speciaLists")
# see the size of datasets
dim(A.setosa) # see ?A.setosa for details
dim(speciaLists) # see ?speciaLists for details
Classification using the default order of confidence levels
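A minimal sketch of this step, assuming that calling classify_occ() with only the occurrence and specialist datasets applies the default order:

```r
library(naturaList)
data("A.setosa")
data("speciaLists")

# Classify every record, leaving all other arguments at their defaults
occ.class <- classify_occ(A.setosa, speciaLists)

# The assigned level is stored in the 'naturaList_levels' column
head(occ.class$naturaList_levels)
```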
You can check how many occurrences were classified in each level:
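One way to count them is to tabulate the naturaList_levels column of the classify_occ() output. The object occ.class is recreated here only so that the snippet runs on its own.

```r
library(naturaList)
data("A.setosa")
data("speciaLists")
occ.class <- classify_occ(A.setosa, speciaLists)

# Number of occurrence records assigned to each confidence level
level.counts <- table(occ.class$naturaList_levels)
level.counts
```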
You can easily create a specialist dataset using create_spec_df(). You
just need to provide a character vector with the names of specialists,
and the output is a dataset formatted to be used in classify_occ().
In this example, we use the names of four famous Brazilian musicians. Note that the names can include Latin accent marks, and even a nickname (e.g. Tom Jobim).
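A sketch of such a call follows. The four names are our own illustrative picks, and we assume create_spec_df() takes the character vector of names as its first argument.

```r
library(naturaList)

# Hypothetical input: full names with accent marks and a nickname
br.musicians <- c("Caetano Veloso", "Antônio Carlos Tom Jobim",
                  "Gilberto Gil", "Vinícius de Moraes")

# Build a specialist dataset in the format expected by classify_occ()
spec.df <- create_spec_df(br.musicians)
spec.df
```

The resulting object could then be passed to classify_occ() in place of speciaLists.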
It might occur that some strings in the ‘identifiedBy’ column of the
occurrence dataset do not correspond to a taxonomist name. Strings such
as "Unknown" are often included in the ‘identifiedBy’ data field. It is
therefore important that such strings be ignored by classify_occ();
otherwise, this function could flag an occurrence record as determined
by a taxonomist when it was not.
To cope with this issue, get_det_names() can be used to check which
strings are not taxonomist names. This function returns all unique
strings in the ‘identifiedBy’ column of the dataset. Based on this list
of names, you can create a character vector with the strings to be
ignored by classify_occ() and provide it to the ignore.det.names
argument. See ?classify_occ for more details.
# check whether there are strings that are not taxonomist names
get_det_names(A.setosa)
# store these strings in an object
ig.names <- c("Sem Informação", "Anonymous")
# use 'ignore.det.names' to ignore those strings in classify_occ()
occ.class <- classify_occ(A.setosa, speciaLists, ignore.det.names = ig.names)
table(occ.class$naturaList_levels)