This markdown document demonstrates the workflow for isolating the
sound-scattering signal from an acoustic dataset reconstructed from
screenshots. The processing starts with the dataset generated by the
screenshot processing and returns a dataset in which the acoustic
sound-scattering signal contained in the screenshot is isolated from
other image features such as noise and the sea floor. The final
dataset can be used, for example, to analyse swarm shapes and their
vertical position using the Centre of Mass or measures of dispersal. All
files and R scripts necessary to replicate this tutorial can be found
under https://sandbox.zenodo.org/record/1184381.
Before we process the data, we load all the relevant functions for
the tutorial and set up the framework for the analysis.
# function for downsampling the original data
source('functions/downsampling_acoustics.r')
# function for sea floor detection
source('functions/sea_floor_max_sv.r')
# function to visualise the product of the sea floor detection
source('functions/sea_floor_plot.r')
# function to plot the echogram
source('functions/acoustics_plot.r')
# function to calculate backscattering intensity
source('functions/power_cal_trans.r')
# function to search interval (used for noise estimation)
source('functions/matrix_search_mean.r')
# function for noise removal
source('functions/noise_removal.r')
# function to remove all non-biomass-signal features
source('functions/isolate_signal.r')
The functions are contained within the functions folder of the
tutorial.
The processing starts by creating a tibble that stores the
file-specific input parameter values used by the sea floor detection
and denoising algorithms, so that the settings can be tracked. If
data/meta_data.xlsx already exists, it is loaded instead. At the end of
the processing, this tibble is saved as meta_data.xlsx in the "data" folder.
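if (!file.exists('data/meta_data.xlsx')) {
  # '\\.csv$' only matches files ending in .csv
  # ('.csv' alone would match any character followed by "csv")
  meta_data <- tibble::tibble(
    filePath = list.files(
      'data',
      pattern = '\\.csv$', full.names = TRUE, recursive = TRUE
    ),
    fileName = list.files(
      'data',
      pattern = '\\.csv$', full.names = FALSE, recursive = TRUE
    )
  ) |>
    dplyr::mutate(
      # character columns, filled with the record's start/end time later
      dateTimeStart = NA_character_,
      dateTimeStop = NA_character_
    )
} else {
  meta_data <- openxlsx::read.xlsx(
    'data/meta_data.xlsx'
  )
}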
❗ ❗ ❗ The next step is to select the file to be processed. Here, we
pick the row number of the respective file in the meta_data object. In
this case, it has to be 1, 2 or 3. At this point, the script could also
be adapted to loop over multiple files.
fileNumber <- 2
Then we import the respective ‘.csv’ file and store the record’s
start and end time in the meta_data data frame.
# import the acoustic data
acoustics <- readr::read_csv(
meta_data$filePath[fileNumber],
col_types = readr::cols(
biomassScore = readr::col_number(),
dateTime = readr::col_character(),
depth = readr::col_number(),
depthFilter = readr::col_number(),
interpSeafloor = readr::col_number(),
seaFloorDepth = readr::col_number(),
timeBin = readr::col_number()
)
)
## add start and end time to meta_data
# start time
meta_data$dateTimeStart[fileNumber] <- acoustics |>
dplyr::filter(dateTime == min(dateTime)) |>
dplyr::distinct(dateTime) |>
dplyr::pull(dateTime)
# end time
meta_data$dateTimeStop[fileNumber] <- acoustics |>
dplyr::filter(dateTime == max(dateTime)) |>
dplyr::distinct(dateTime) |>
dplyr::pull(dateTime)
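Because dateTime is imported as character, min() and max() compare the strings alphabetically. This matches chronological order only for zero-padded, ISO-8601-like timestamps. A minimal base-R sketch illustrates the pitfall and a parsing-based alternative. The timestamps below are hypothetical and their format is an assumption, not taken from the data:

```r
# Hypothetical zero-padded timestamps (format assumed for illustration)
dateTime <- c("2020-01-15 08:06:00", "2020-01-15 08:03:00", "2020-01-15 10:00:00")

# for this layout, alphabetical order equals chronological order
startTime <- min(dateTime)
stopTime <- max(dateTime)

# without zero padding, alphabetical comparison fails: "10:00" sorts before "9:00"
badMin <- min(c("9:00", "10:00"))

# comparing parsed POSIXct values does not depend on the string layout
parsed <- as.POSIXct(dateTime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
parsedStart <- format(min(parsed), "%Y-%m-%d %H:%M:%S")
```

If the real timestamps are not zero-padded, parsing dateTime before taking the minimum and maximum is the safer choice.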
The dataset consists of eight variables. During processing, we need
the first four: the sampling depth, the relative time (scaled from 0 to
1 between start and end), the relative backscattering score (0 to 1) and
the date-time information.
The following variables specify the downsampling input
parameters. The depth resolution is given in meters. The temporal
resolution is given as a character string with a unit, here minutes.
The depth resolution should not be finer than the input resolution of
the dataset, which is 0.6667 meters in this example. To apply a uniform
time and depth resolution to all files, we store this information in the
meta_data tibble.
# set temporal resolution of the processed dataset (other possible time units: sec, hours)
tempResolution <- '3 min'
# set depth resolution of the processed dataset in meters
depthResolution <- 2/3
# local time offset to UTC in hours
utc_diff <- +1
# store information in the meta_data-tibble
meta_data$tempResolution <- tempResolution
meta_data$depthResolution <- depthResolution
meta_data$utc_diff <- utc_diff
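The downsampling function requires several input arguments:

- data: the imported acoustic data
- start_time: the start time of the record, taken from the meta_data object
- end_time: the end time of the record, taken from the meta_data object
- temp_resolution: the temporal resolution of the processed dataset
- depth_resolution: the depth resolution of the processed dataset in meters
- local_time_zone: the local time offset to UTC in hours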
# downsample the input data
acoustics <- downsampling_acoustics(
data = acoustics,
start_time = meta_data$dateTimeStart[fileNumber],
end_time = meta_data$dateTimeStop[fileNumber],
temp_resolution = tempResolution,
depth_resolution = depthResolution,
local_time_zone = utc_diff
)
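Downsampling can be pictured as assigning every sample to a time bin and a depth bin, then averaging the scores within each bin. The following base-R sketch is hypothetical and is not the implementation of downsampling_acoustics(). It assumes that dateTime is already POSIXct, and it ignores the start/end times and the UTC offset. It relies on the fact that cut() accepts interval strings such as "3 min", the same format used for tempResolution above.

```r
# Hypothetical sketch: bin samples in time and depth and average per bin.
downsample_sketch <- function(data, temp_resolution = "3 min", depth_resolution = 2/3) {
  # assign each sample to a time bin; cut() understands strings such as "3 min"
  time_bin <- cut(data$dateTime, breaks = temp_resolution)
  # assign each sample to a depth bin, labelled floor(depth / resolution) * resolution
  depth_bin <- floor(data$depth / depth_resolution) * depth_resolution
  # average the scores per time/depth bin, ignoring missing values
  aggregate(
    data["biomassScore"],
    by = list(dateTime = time_bin, depth = depth_bin),
    FUN = mean, na.rm = TRUE
  )
}

# toy example: four samples at depth 1 and two at depth 3, spread over 200 s
example <- data.frame(
  dateTime = as.POSIXct("2020-01-15 08:00:00", tz = "UTC") + c(0, 60, 120, 200, 0, 60),
  depth = c(1, 1, 1, 1, 3, 3),
  biomassScore = c(0.2, 0.4, 0.6, 1.0, 0.1, NA)
)
ds <- downsample_sketch(example, temp_resolution = "3 min", depth_resolution = 2)
```

The three samples within the first three minutes at depth 1 collapse into one averaged value. The NA at depth 3 is ignored when averaging.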
Before we start the sea floor detection, we make sure that the
data are in the correct rectangular format. More precisely, every depth
level needs an assigned backscattering value at every time point,
regardless of whether that value is NA or not. The lines below
transform the data accordingly.
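# sort by time and, within each time step, by descending depth value
# (arrange() ignores groups by default, so both keys are given explicitly)
acoustics <- acoustics |>
  dplyr::arrange(dateTime, dplyr::desc(depth))
# rectangular frame: every combination of time step and depth level
frameExp <- acoustics |>
  tidyr::expand(dateTime, depth) |>
  dplyr::arrange(dateTime, dplyr::desc(depth))
# biomass data as a depth x time matrix (missing combinations become NA)
acousticsMatrix <- acoustics |>
  dplyr::select(c(depth, timeBin, biomassScore)) |>
  tidyr::pivot_wider(names_from = timeBin, values_from = biomassScore, values_fill = NA) |>
  dplyr::arrange(dplyr::desc(depth)) |>
  dplyr::select(-depth) |>
  as.matrix()
# combine the frame with the column-wise flattened matrix and rebuild timeBin
acoustics <- tibble::tibble(
  frameExp, biomassScore = as.vector(acousticsMatrix)
) |>
  dplyr::mutate(
    timeBin = rep(seq_len(dplyr::n_distinct(dateTime)), each = dplyr::n_distinct(depth))
  ) |>
  dplyr::relocate(timeBin, .before = dateTime)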
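The same idea can be illustrated with a few lines of base R on a hypothetical three-sample example. First build every time-depth combination. Then left-join the measurements so that missing combinations become NA. Finally fill a depth x time matrix column by column, which is the order in which as.vector() reads the matrix back in the code above. This is a sketch of the concept, not a replacement for the tidyverse code.

```r
# Hypothetical long-format echogram: depth -2 is missing at time step t2
long <- data.frame(
  dateTime = c("t1", "t1", "t2"),
  depth = c(-1, -2, -1),
  biomassScore = c(0.3, 0.5, 0.7)
)

# all combinations of time step and depth level (the rectangular frame)
frame <- expand.grid(
  dateTime = unique(long$dateTime),
  depth = sort(unique(long$depth), decreasing = TRUE),
  stringsAsFactors = FALSE
)

# left join: combinations without a measurement get NA
rect <- merge(frame, long, by = c("dateTime", "depth"), all.x = TRUE)
rect <- rect[order(rect$dateTime, -rect$depth), ]
rownames(rect) <- NULL

# depth x time matrix, filled column by column (one column per time step)
sv_matrix <- matrix(
  rect$biomassScore,
  nrow = length(unique(rect$depth)),
  dimnames = list(unique(rect$depth), unique(rect$dateTime))
)
```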
First, we add default values for the sea floor detection parameters to the meta_data tibble.
# create new columns to store the function inputs,
# here they get assigned default values that can be changed in case they do not produce the desired result
if(!file.exists('data/meta_data.xlsx')) {
meta_data <- meta_data |>
dplyr::mutate(
seaFloor_art_min_good_sv = 0.95,
seaFloor_discrim_level = 0.6,
seaFloor_end_depth = -300,
seaFloor_start_depth = -40
)
}
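sea_floor_max_sv is a slightly modified version of the max_Sv
algorithm documented in the Echoview help. The algorithm detects the
sea floor based on selection criteria related to the maximum relative
backscattering intensity within a predefined depth window
(seaFloor_start_depth, seaFloor_end_depth). In the following, only the
sea floor detection takes place. The sea floor itself is removed later
in the workflow, after the denoising step. The function takes the
following input arguments:

- data: the acoustic data
- seaFloor_art_min_good_sv: minimum good fit for the sea floor
- seaFloor_discrim_level: discrimination level that separates the sea floor from the water body
- seaFloor_end_depth: end depth of the search window [m]
- seaFloor_start_depth: starting depth of the search window [m]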
acoustics <- sea_floor_max_sv(
data = acoustics
)
# visualize the results
plotSeaFloor <- sea_floor_plot(acoustics)
The algorithm misclassifies the sea floor in some parts of the
dataset (red line). This means that the settings were not optimal and
need to be tweaked to achieve a better result. If the classification
works well with the default settings, the following steps are not
necessary.
# Minimum good fit for the sea floor
seaFloor_art_min_good_sv <- 0.95
# discrimination level (separates the sea floor and the water body)
seaFloor_discrim_level <- 0.85
# starting depth of the search window [m]
seaFloor_start_depth <- -140
# Store information in meta_data
meta_data$seaFloor_art_min_good_sv[fileNumber] <- seaFloor_art_min_good_sv
meta_data$seaFloor_discrim_level[fileNumber] <- seaFloor_discrim_level
meta_data$seaFloor_start_depth[fileNumber] <- seaFloor_start_depth
# repeat sea floor detection with updated parameter values
acoustics <- sea_floor_max_sv(
data = acoustics,
seaFloor_art_min_good_sv = seaFloor_art_min_good_sv,
seaFloor_discrim_level = seaFloor_discrim_level,
seaFloor_start_depth = seaFloor_start_depth
)
# visualize the results
plotSeaFloor <- sea_floor_plot(acoustics)
The refined parameter values significantly improved the sea floor detection quality.
As mentioned before, the detection algorithm does not remove the parts of the dataset below the sea floor. Instead, it returns an object that contains the original input data and, in a new column, a suggested sea floor depth for each time point. Another column stores the detection quality as a binary value (“good” or “bad”), which indicates the confidence of the classification. When no sea floor is visible at a given time point, the sea floor is typically placed at the maximum depth value (252 m in this example) and flagged “bad”, because the algorithm has low confidence that this depth represents the actual sea floor.
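To make the selection logic more concrete, here is a deliberately simplified, hypothetical sketch of a max-Sv-style pick for a single ping. It is not the code of sea_floor_max_sv(). It rests on three assumptions:

- depths are negative, as the default window from -40 to -300 m suggests;
- seaFloor_art_min_good_sv acts as the minimum peak value for a "good" pick;
- seaFloor_discrim_level is the threshold used when stepping back from the peak towards the surface.

As described above, pings without a sufficiently strong peak fall back to the deepest sample and are flagged "bad".

```r
# Hypothetical, simplified max-Sv-style sea floor pick for one ping.
# NOT sea_floor_max_sv(); parameter meanings are assumptions.
pick_sea_floor <- function(depth, score, start_depth = -40, end_depth = -300,
                           min_good = 0.95, discrim_level = 0.6) {
  # restrict the search to the depth window (depths negative, downwards)
  in_window <- depth <= start_depth & depth >= end_depth
  d <- depth[in_window]
  s <- score[in_window]
  # no sufficiently strong peak: fall back to the deepest sample, flagged "bad"
  if (length(s) == 0 || all(is.na(s)) || max(s, na.rm = TRUE) < min_good) {
    return(list(seaFloor = min(depth), quality = "bad"))
  }
  # order samples from shallow to deep
  o <- order(d, decreasing = TRUE)
  d <- d[o]
  s <- s[o]
  # start at the strongest echo ...
  i <- which.max(s)
  # ... and step back towards the surface while the signal stays strong
  while (i > 1 && !is.na(s[i - 1]) && s[i - 1] >= discrim_level) {
    i <- i - 1
  }
  list(seaFloor = d[i], quality = "good")
}

# toy ping: weak water column, strong echo peaking at -60 m with a rising edge
depth <- -(1:100)
score <- numeric(100)
score[58:60] <- c(0.7, 0.8, 1.0)
score[61:70] <- 0.9
good <- pick_sea_floor(depth, score)
bad <- pick_sea_floor(depth, rep(0.1, 100))
```

In this toy ping, the back-step moves the pick from the peak at -60 m up to -58 m, where the signal first exceeds the discrimination level.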
In the denoising part, the biomass signal is isolated from
instrument noise. The algorithm, described in De Robertis and
Higginbottom (2007, doi: https://doi.org/10.1093/icesjms/fsm112),
assumes that a portion of the signal is dominated by background noise.
It therefore estimates the background noise for every sample and
subtracts the estimated noise from the received signal.
The algorithm consists of three main steps, which are executed below:

1. The relative backscattering score is transformed into a value in dB
   (decibel). This is also where the conversion from a relative intensity
   (0-1) to an absolute intensity (-70 to -10 dB in this case) takes place.
   The lower and upper boundaries of the decibel values should be set
   according to the colour bar settings used to visualise the acoustic
   data in the screenshots.
2. The converted signal is used to estimate the noise for every sample
   by averaging over a particular grid cell. The size of the grid cell can
   be selected individually for each dataset.
3. The estimated noise is subtracted from the measured backscattering
   value. When the estimated noise is equal to or larger than the
   backscattering value, the result is NA. This is frequently the case
   when no biomass is detected.
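The three steps can be sketched for a single ping as follows. This is a hypothetical illustration in the spirit of De Robertis and Higginbottom (2007). It is not the code of power_cal_trans(), matrix_search() or noise_removal(), and it rests on four assumptions:

- the score is mapped linearly onto the colour bar range;
- a simple time-varied gain, 20 log10(range) + 2 * alpha * range, is used with a placeholder absorption coefficient alpha instead of the sea-water parameters;
- the noise is averaged over blocks of samples along one ping;
- the noise estimate is capped at noise_max, which is how we assume noiseMax is used.

```r
# step 1: map relative scores (0-1) linearly onto the colour bar range (dB)
score_to_db <- function(score, db_min = -70, db_max = -10) {
  db_min + score * (db_max - db_min)
}

# time-varied gain applied to Sv (range in m, absorption alpha in dB/m)
tvg <- function(range_m, alpha) 20 * log10(range_m) + 2 * alpha * range_m

# step 2: noise estimate for one ping: remove the TVG, average the linear
# power over blocks of n_depth samples, take the quietest block, cap it at
# noise_max and add the TVG back to express the noise as Sv
estimate_noise_sv <- function(sv_db, range_m, alpha, n_depth = 10, noise_max = -125) {
  power_lin <- 10^((sv_db - tvg(range_m, alpha)) / 10)
  block <- ceiling(seq_along(power_lin) / n_depth)
  block_means <- tapply(power_lin, block, mean, na.rm = TRUE)
  noise_power <- min(10 * log10(min(block_means, na.rm = TRUE)), noise_max)
  noise_power + tvg(range_m, alpha)
}

# step 3: subtract the noise in the linear domain; NA where noise >= signal
remove_noise <- function(sv_db, noise_db) {
  diff_lin <- 10^(sv_db / 10) - 10^(noise_db / 10)
  out <- rep(NA_real_, length(diff_lin))
  ok <- !is.na(diff_lin) & diff_lin > 0
  out[ok] <- 10 * log10(diff_lin[ok])
  out
}

# usage with synthetic data: pure noise at -140 dB plus one echo of -40 dB
r <- 1:100       # range in m (must be > 0)
alpha <- 0.05    # placeholder absorption coefficient in dB/m
sv <- -140 + tvg(r, alpha)
sv[50] <- -40
noise <- estimate_noise_sv(sv, r, alpha)
corrected <- remove_noise(sv, noise)
```

Taking the quietest block as the noise level reflects the assumption stated above that part of the signal is dominated by background noise. The echo at 50 m stays essentially unchanged after subtraction, while samples where the noise is at least as strong as the signal become NA.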
# store the input values used for the denoising in meta_data
if(!file.exists('data/meta_data.xlsx')) {
meta_data <- meta_data |>
dplyr::mutate(
temperature = 1, # in °C
salinity = 34, # in PSU
ph = 8.1,
frequency = 200000, # in 1/sec
pulseDuration = 0.001024, # in sec
scaling = TRUE,
decibelMax = -10, # in dB
decibelMin = -70, # in dB
noiseMax = -125 # in dB
)
}
# transform relative backscattering intensity to decibel values based on physical properties of the sea water
acoustics <- power_cal_trans(
acoustics = acoustics,
temperature = meta_data$temperature[fileNumber], # in °C
salinity = meta_data$salinity[fileNumber], # in PSU
pH = meta_data$ph[fileNumber],
frequency = meta_data$frequency[fileNumber], # in 1/sec
pulseDuration = meta_data$pulseDuration[fileNumber], # in sec
scaling = meta_data$scaling[fileNumber],
decibelMax = meta_data$decibelMax[fileNumber], # in dB
decibelMin = meta_data$decibelMin[fileNumber] # in dB
)
# matrix_search to estimate the noise
acoustics <- matrix_search(
data = acoustics,
noiseMax = meta_data$noiseMax[fileNumber]
)
# remove the noise, isolate the biomass signal from noise
acoustics <- noise_removal(acoustics, scaling = TRUE)
## [1] "347848 values are not available"
# visualize results
acousticsPlot <- acoustics_plot(
data = acoustics, variable = acoustics$biomassScoreDenoised
)
The algorithm clearly isolated the biomass signal from other image
features. What remains is surface noise as well as the sea floor. These
parts are removed in the following:
# define the depth window that should contain the biomass signal (optional)
upper_limit <- 10
lower_limit <- 240
# save settings in meta_data
meta_data$upper_limit <- upper_limit
meta_data$lower_limit <- lower_limit
# isolate the biomass signal, remove sea floor and signal outside of the depth-window
acoustics <- isolate_signal(
data = acoustics,
upper_limit = upper_limit,
lower_limit = lower_limit
)
# visualize result
acousticsPlot_final <- acoustics_plot(
data = acoustics, variable = acoustics$biomassScoreDenoised
)
Now, the biomass signal is isolated from the rest.
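The masking performed by isolate_signal() can be pictured as setting every value outside the depth window or below the detected sea floor to NA. Masking rather than dropping rows keeps the rectangular time-depth grid intact. The following hypothetical base-R sketch is not the function's actual code. It compares absolute depths, so it works whether depths are stored as positive or negative numbers. It also keeps samples for which no sea floor depth is available, which is a design choice of the sketch.

```r
# Hypothetical sketch of the masking logic; not the actual isolate_signal().
isolate_sketch <- function(data, upper_limit = 10, lower_limit = 240) {
  d <- abs(data$depth)
  # keep samples inside the depth window and above the sea floor
  keep <- d >= upper_limit & d <= lower_limit &
    (is.na(data$seaBottom) | d < abs(data$seaBottom))
  # mask instead of dropping rows, so the grid stays rectangular
  data$biomassScoreDenoised[!keep] <- NA
  data
}

# toy data: near-surface noise, water column, below sea floor, outside window
example <- data.frame(
  depth = c(-5, -50, -150, -245),
  seaBottom = c(-120, -120, -120, -120),
  biomassScoreDenoised = c(0.9, 0.4, 0.8, 0.2)
)
isolated <- isolate_sketch(example)
```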
The final dataset that is exported below contains seven variables. The
first four, dateTime, timeBin, depth and biomassScore, were already
contained in the dataset before processing. Three variables were added
during processing: biomassScoreDenoised, correctSv and seaBottom.
biomassScoreDenoised is the final product of the signal isolation
process and comes in relative units (0-1). correctSv is the product of
the denoising and comes in decibel units. Visually, it is similar to
biomassScoreDenoised, but it still contains the sea floor. seaBottom is
the identified sea floor depth at a given time point.
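As mentioned in the introduction, the final dataset can be used to describe the vertical position of the signal, for example by its Centre of Mass, i.e. the backscatter-weighted mean depth per time step. The following is a minimal sketch. It assumes that the weights are the linear values of correctSv (10^(correctSv/10)), restricted to samples where biomassScoreDenoised is not NA, so that the sea floor does not contribute. The function name and the toy data are hypothetical.

```r
# Hypothetical sketch: backscatter-weighted mean depth per time step.
centre_of_mass <- function(data) {
  # keep only samples that survived the signal isolation
  ok <- !is.na(data$biomassScoreDenoised) & !is.na(data$correctSv)
  w <- 10^(data$correctSv[ok] / 10)   # linear backscatter as weights
  z <- abs(data$depth[ok])            # depth as positive meters
  time <- data$dateTime[ok]
  num <- tapply(z * w, time, sum)
  den <- tapply(w, time, sum)
  setNames(as.vector(num / den), names(num))
}

# toy data: the strong sea floor echo at -250 m is masked (NA) and ignored
example <- data.frame(
  dateTime = c("t1", "t1", "t2", "t2", "t2"),
  depth = c(-10, -20, -10, -20, -250),
  biomassScoreDenoised = c(0.5, 0.5, 0.8, 0.3, NA),
  correctSv = c(-50, -50, -40, -50, -20)
)
cm <- centre_of_mass(example)
```

Weighting in the linear domain rather than in dB means that a sample 10 dB stronger counts ten times as much. At t2, this pulls the Centre of Mass towards 10 m.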
At the end of the processing, the data are exported as an
RDS-file.
# extract the recording period from the file name (used to name the exported files)
fileStamp <- stringr::str_extract(
  meta_data$fileName[fileNumber],
  '\\d{4}-\\d{2}-\\d{2}_\\d{2}_\\d{2}_\\d{2}_to_\\d{4}-\\d{2}-\\d{2}_\\d{2}_\\d{2}_\\d{2}'
)
# make sure the export folder exists
dir.create('export', showWarnings = FALSE)
# save the processed acoustics file
saveRDS(
  acoustics |>
    dplyr::select(
      c(dateTime, timeBin, depth,
        biomassScore, biomassScoreDenoised, correctSv,
        seaBottom)
    ),
  paste0('export/acoustics_processed_', fileStamp, '.RDS')
)
# picture of the echogram
ggplot2::ggsave(
  paste0('export/acoustics_processed_', fileStamp, '.png'),
  plot = acousticsPlot_final,
  width = 40,
  height = 4,
  units = 'in',
  limitsize = FALSE,
  type = 'cairo'
)
# meta data
openxlsx::write.xlsx(
meta_data,
'data/meta_data.xlsx'
)