Denoising Tutorial

Description

This markdown document demonstrates the workflow for isolating the sound-scattering signal from an acoustic dataset reconstructed from screenshots. The processing starts with the dataset generated by the screenshot processing and returns a dataset in which the acoustic sound-scattering signal contained in the screenshot is isolated from other image features such as noise and the sea floor. The final dataset can be used, for example, to analyse swarm shapes and their vertical position with the Centre of Mass or measures of dispersal. All files and R scripts necessary to replicate this tutorial can be found at https://sandbox.zenodo.org/record/1184381.

Setup

Before we process the data, we load all the relevant functions for the tutorial and set up the framework for the analysis.

# function for downsampling the original data
source('functions/downsampling_acoustics.r')
# function for sea floor detection 
source('functions/sea_floor_max_sv.r')
# function to visualise the product of the sea floor detection
source('functions/sea_floor_plot.r')
# function to plot the echogram
source('functions/acoustics_plot.r')
# function to calculate backscattering intensity
source('functions/power_cal_trans.r')
# function to search interval (used for noise estimation)
source('functions/matrix_search_mean.r')
# function for noise removal
source('functions/noise_removal.r')
# function to remove all non-biomass-signal features
source('functions/isolate_signal.r')

The functions are contained within the functions folder of the tutorial.

Loading

The processing starts by creating a tibble that stores the file-specific input parameter values used for the sea floor detection and denoising algorithms, so that the settings can be tracked. If a metadata file from an earlier run already exists, it is loaded instead. At the end of the processing, this table is saved as meta_data.xlsx in the “data” folder.

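# create the metadata table on the first run, otherwise load the stored version
if (!file.exists('data/meta_data.xlsx')) {
  meta_data <- tibble::tibble(
    # the pattern is a regular expression: match only names ending in '.csv'
    filePath = list.files(
      'data',
      pattern = '\\.csv$', full.names = TRUE, recursive = TRUE
    ),
    fileName = list.files(
      'data',
      pattern = '\\.csv$', full.names = FALSE, recursive = TRUE
    )
  ) |>
    dplyr::mutate(
      # placeholders, filled in after the import of the respective file
      dateTimeStart = NA_character_,
      dateTimeStop = NA_character_
    )
} else {
  meta_data <- openxlsx::read.xlsx(
    'data/meta_data.xlsx'
  )
}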

❗ ❗ ❗ The next step is to select the file to be processed by picking its row number in the meta_data object. In this case it has to be 1, 2, or 3. The script could also be adapted at this point to loop over multiple files.
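
One way to adapt the script to several files is to wrap the processing steps in a function and call it once per row of the metadata table. The following is only a sketch: process_file is a hypothetical placeholder that is not part of the functions folder, and meta_toy with its made-up file names stands in for the real meta_data object.

```r
# toy stand-in for the metadata table (file names are made up)
meta_toy <- data.frame(
  filePath = c('data/a.csv', 'data/b.csv', 'data/c.csv'),
  processed = FALSE
)

# hypothetical wrapper: in practice it would contain the import, downsampling,
# sea floor detection and denoising steps of this tutorial
process_file <- function(file_path) {
  message('processing ', file_path)
  TRUE
}

# loop over all rows instead of fixing a single file number
for (fileNumber in seq_len(nrow(meta_toy))) {
  meta_toy$processed[fileNumber] <- process_file(meta_toy$filePath[fileNumber])
}
```

In the remainder of the tutorial a single file is processed.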

fileNumber <- 2

Then we import the respective ‘.csv’-file and store the record’s start and end time in the meta_data data frame.

# import the acoustic data
acoustics <- readr::read_csv(
  meta_data$filePath[fileNumber],
  col_types = readr::cols(
    biomassScore = readr::col_number(), 
    dateTime = readr::col_character(), 
    depth = readr::col_number(), 
    depthFilter = readr::col_number(),
    interpSeafloor = readr::col_number(), 
    seaFloorDepth = readr::col_number(), 
    timeBin = readr::col_number()
  )
) 
## add start and end time to the meta_data
# start time
meta_data$dateTimeStart[fileNumber] <- acoustics |>
  dplyr::filter(dateTime == min(dateTime)) |>
  dplyr::distinct(dateTime) |>
  dplyr::pull(dateTime)
# end time
meta_data$dateTimeStop[fileNumber] <- acoustics |>
  dplyr::filter(dateTime == max(dateTime)) |>
  dplyr::distinct(dateTime) |>
  dplyr::pull(dateTime)

Head of the Acoustic Data

Tail of the Acoustic Data

The data set consists of eight variables. During processing, we need the first four: the sampling depth, the relative time (scaling from 0-1; start-end), the relative backscattering score from 0-1 and the datetime information.

Downsampling

Downsampling Input

The following variables specify the downsampling input parameters. The depth resolution is given in meters and the temporal resolution in minutes. The depth resolution should not be finer than the input resolution, which is 0.6667 meters in this example. To have a uniform time and depth resolution for all files, we store the information in the meta_data tibble.

# set temporal resolution of the processed dataset (other possible time units: sec, hours)
tempResolution  <- '3 min' 

# set depth resolution of the processed dataset in meters 
depthResolution <- 2/3

# local time offset to UTC in hours
utc_diff <- +1

# store information in the meta_data-tibble
meta_data$tempResolution <- tempResolution 
meta_data$depthResolution <- depthResolution
meta_data$utc_diff <- utc_diff
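
The internals of downsampling_acoustics are not reproduced in this document. Conceptually, downsampling to a coarser grid means assigning every sample to a time bin and a depth bin and averaging the score within each bin. The sketch below illustrates this idea in base R on made-up data. The helper downsample_sketch, its arguments and the toy values are assumptions for illustration and may differ from what the tutorial's function does. The time step is given in seconds, so 180 corresponds to '3 min'.

```r
# toy data: one sample per minute at four depths (values are made up)
times <- as.POSIXct('2020-01-01 00:00:00', tz = 'UTC') + 60 * (0:5)
toy <- data.frame(
  dateTime = times[rep(seq_along(times), each = 4)],
  depth = rep(c(0.5, 1.0, 1.5, 2.0), times = 6)
)
toy$biomassScore <- seq(0, 1, length.out = nrow(toy))

# hypothetical helper: mean score per time bin (seconds) and depth bin (metres)
downsample_sketch <- function(data, time_step_sec, depth_step) {
  # floor every time stamp and depth to the start of its bin
  data$timeBinStart <- floor(as.numeric(data$dateTime) / time_step_sec) * time_step_sec
  data$depthBin <- floor(data$depth / depth_step) * depth_step
  # average the score within each bin
  out <- aggregate(biomassScore ~ timeBinStart + depthBin, data = data, FUN = mean)
  # convert the bin start from seconds back to a date-time
  out$timeBinStart <- as.POSIXct(out$timeBinStart, origin = '1970-01-01', tz = 'UTC')
  out
}

toy_down <- downsample_sketch(toy, time_step_sec = 180, depth_step = 1)
```

The 24 toy samples are reduced to two time bins and three depth bins.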

The downsampling function requires the following input arguments:

the dataset to be downsampled (data)
the starting time stored in the meta_data-object (start_time)
the end time (end_time)
the selected temporal resolution (temp_resolution)
the selected depth resolution (depth_resolution)
the local time zone relative to the UTC in hours (local_time_zone)

# downsample the input data
acoustics <- downsampling_acoustics(
  data = acoustics,
  start_time = meta_data$dateTimeStart[fileNumber],
  end_time = meta_data$dateTimeStop[fileNumber], 
  temp_resolution = tempResolution,
  depth_resolution = depthResolution,
  local_time_zone = utc_diff
)

Glimpse of the processed and downsampled data:

Sea floor detection

Before we start the sea floor detection, we make sure that the data are in the correct rectangular format. More precisely, every depth level needs to have an assigned backscattering value at every time point, regardless of whether that value is NA. The lines below transform the data accordingly.

# sort the data by descending depth value
acoustics <- acoustics |>
  dplyr::group_by(dateTime) |>
  dplyr::arrange(dplyr::desc(depth)) |>
  dplyr::ungroup()
# frame with every combination of time point and depth level
frameExp <- acoustics |>
  tidyr::expand(dateTime, depth) |>
  dplyr::arrange(dateTime, dplyr::desc(depth))
# biomass data as matrix (rows: depth levels, columns: time bins),
# combinations without a measurement are filled with NA
acousticsMatrix <- acoustics |>
  dplyr::select(c(depth, timeBin, biomassScore)) |>
  tidyr::pivot_wider(names_from = timeBin, values_from = biomassScore, values_fill = NA) |>
  dplyr::select(-depth) |>
  as.matrix()
# combine the expanded frame with the biomass data (matrix flattened column by column)
acoustics <- tibble::tibble(
  frameExp, biomassScore = as.vector(acousticsMatrix)
) |>
  dplyr::mutate(
    timeBin = rep(seq_along(unique(dateTime)), each = length(unique(depth)))
  ) |>
  dplyr::relocate(timeBin, .before = dateTime)
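
The idea of the rectangular format can be checked on a small example. The sketch below uses base R instead of the tidyr functions above and made-up values: every combination of time bin and depth receives a row, and combinations without a measurement are filled with NA.

```r
# toy data with one missing time/depth combination (values are made up)
toy <- data.frame(
  timeBin = c(1, 1, 2),
  depth = c(-1, -2, -1),
  biomassScore = c(0.2, 0.4, 0.6)
)

# all combinations of time bin and depth
full_grid <- expand.grid(
  timeBin = sort(unique(toy$timeBin)),
  depth = sort(unique(toy$depth), decreasing = TRUE)
)

# left join: combinations without a measurement receive NA
toy_rect <- merge(full_grid, toy, by = c('timeBin', 'depth'), all.x = TRUE)

# order by time bin and descending depth value
toy_rect <- toy_rect[order(toy_rect$timeBin, -toy_rect$depth), ]
```

The result has four rows, and the score for time bin 2 at a depth of -2 is NA.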

Sea floor detection using default settings

First we add the default values to the metadata data frame.

# create new columns to store the function inputs, 
# here they get assigned default values that can be changed in case they do not produce the desired result 
if(!file.exists('data/meta_data.xlsx')) {
  meta_data <- meta_data |>
    dplyr::mutate(
      seaFloor_art_min_good_sv = 0.95,
      seaFloor_discrim_level = 0.6,
      seaFloor_end_depth = -300,
      seaFloor_start_depth = -40
    )
}

The detection and removal of the sea floor is an important step in the process of isolating the biomass signal. The algorithm implemented in sea_floor_max_sv is a slightly modified version of the max_Sv algorithm documented in the Echoview help. It detects the sea floor based on selection criteria related to the maximum relative backscattering intensity within a pre-defined depth window (seaFloor_start_depth, seaFloor_end_depth). At this stage, only the sea floor detection takes place. The sea floor removal is part of the denoising step described later.
The sea floor detection function requires the following input arguments:

the data set itself (data)
a minimum relative backscattering intensity that could represent the sea floor (seaFloor_art_min_good_sv)
a relative backscattering intensity to discriminate sea floor from water/biomass (seaFloor_discrim_level)
end depth of the search interval - defined by the maximum depth (seaFloor_end_depth)
start of the search interval (seaFloor_start_depth)
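
To make the role of these arguments more concrete, the following sketch applies a strongly simplified maximum-backscatter pick to a single time point. It is an illustration under assumptions and not the code of sea_floor_max_sv: the helper pick_sea_floor, its return format and the toy values are hypothetical, and depths are assumed to be negative values in meters.

```r
# hypothetical helper: pick the sea floor at a single time point
# depth: negative values in metres, score: relative backscatter (0-1)
pick_sea_floor <- function(depth, score,
                           min_good_sv = 0.95, discrim_level = 0.6,
                           start_depth = -40, end_depth = -300) {
  # restrict the search to the depth window
  in_window <- which(depth <= start_depth & depth >= end_depth & !is.na(score))
  if (length(in_window) == 0) {
    return(list(seaBottom = NA_real_, quality = 'bad'))
  }
  # sample with the maximum backscatter inside the window
  i_max <- in_window[which.max(score[in_window])]
  if (score[i_max] < min_good_sv) {
    # no sufficiently strong echo: fall back to the deepest sample
    return(list(seaBottom = min(depth[in_window]), quality = 'bad'))
  }
  # move upwards from the maximum while the score stays at or above the
  # discrimination level; the last such sample marks the sea floor surface
  shallower <- in_window[depth[in_window] >= depth[i_max]]
  shallower <- shallower[order(depth[shallower])]
  below_level <- which(score[shallower] < discrim_level)
  top <- if (length(below_level) == 0) length(shallower) else max(1, below_level[1] - 1)
  list(seaBottom = depth[shallower[top]], quality = 'good')
}

# toy time point: weak water-column signal, strong echo from -100 m downwards
toy_depth <- seq(-10, -150, by = -10)
toy_score <- ifelse(toy_depth <= -100, 0.9, 0.1)
toy_score[toy_depth == -110] <- 1

toy_pick <- pick_sea_floor(toy_depth, toy_score)
toy_pick_empty <- pick_sea_floor(toy_depth, rep(0.1, length(toy_depth)))
```

In the first case the pick is the shallowest sample of the strong echo, flagged as good. In the second case no sample reaches the minimum score, so the deepest sample is returned and flagged as bad.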

acoustics <- sea_floor_max_sv(
  data = acoustics
)

# visualize the results
plotSeaFloor <- sea_floor_plot(acoustics)

The algorithm misclassifies the sea floor in some parts of the dataset (red line). This means that the default settings are not optimal and need to be adjusted to achieve a better result. If the classification works well with the default settings, the following steps are not necessary.

Sea floor detection using refined parameter settings

# Minimum good fit for the sea floor
seaFloor_art_min_good_sv <- 0.95

# discrimination level (separates the sea floor and the waterbody)
seaFloor_discrim_level   <- 0.85

# starting depth of the search window [m]
seaFloor_start_depth     <- -140

# Store information in meta_data
meta_data$seaFloor_art_min_good_sv[fileNumber] <- seaFloor_art_min_good_sv
meta_data$seaFloor_discrim_level[fileNumber]   <- seaFloor_discrim_level
meta_data$seaFloor_start_depth[fileNumber]     <- seaFloor_start_depth

# repeat sea floor detection with updated parameter values
acoustics <- sea_floor_max_sv(
  data = acoustics,
  seaFloor_art_min_good_sv = seaFloor_art_min_good_sv,
  seaFloor_discrim_level = seaFloor_discrim_level,
  seaFloor_start_depth = seaFloor_start_depth
)

# visualize the results
plotSeaFloor <- sea_floor_plot(acoustics)

The refined parameter values visibly improve the quality of the sea floor detection.

As mentioned before, the detection algorithm does not remove parts of the dataset that are below the sea floor. Instead, it returns an object that contains the original input data and, in a new column, a suggested sea floor depth for each timepoint. Another column stores the detection quality as a binary value (“good”, “bad”), which is a measure of the confidence in the classification. When no sea floor is visible at a given timepoint, the sea floor is typically placed at the maximum depth value (252 m in this example) with “bad” detection quality, because the algorithm has low confidence that this depth represents the actual sea floor.

Overview of the resulting dataset

Denoising

In the denoising part, the biomass signal is isolated from instrument noise. The algorithm, described in De Robertis and Higginbottom (2007, doi: https://doi.org/10.1093/icesjms/fsm112), assumes that a portion of the signal is dominated by background noise. Therefore, it estimates the background noise for every sample and subtracts the calculated noise value from the received signal.

The algorithm consists of three main steps that are executed below. In the first step, the relative backscattering score is transformed into a value in dB (decibel). This is also where the conversion from a relative intensity (0-1) to an absolute intensity (-70 to -10 dB in this case) takes place. The lower and upper boundaries of the decibel values should be set according to the colourbar settings used for the visualization of the acoustic data in the screenshots. In the second step, the converted signal is used to estimate the noise for every sample by averaging over a particular grid cell. The size of the grid cell can be selected individually for each data set. In the third step, the estimated noise is subtracted from the measured backscattering value. When the estimated noise is equal to or larger than the backscattering value, NA is produced. This is frequently the case when no biomass is detected.
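
The three steps can be illustrated on a small made-up echogram. The sketch below is a simplification under several assumptions and not the code of power_cal_trans, matrix_search or noise_removal: the relative score is assumed to map linearly onto the decibel range, the range-dependent corrections based on the physical properties of the sea water are omitted, and the grid cell is reduced to blocks of two depth samples within one time point.

```r
# toy echogram in relative units: rows = depth samples, columns = time points
set.seed(1)
score <- matrix(runif(40, 0.05, 0.15), nrow = 8, ncol = 5)
score[3:4, 2:4] <- 0.8  # a made-up swarm

# step 1: relative score (0-1) to decibel, assuming a linear colour scale
toy_db_min <- -70
toy_db_max <- -10
sv_db <- toy_db_min + score * (toy_db_max - toy_db_min)

# step 2: noise estimate per time point: average in the linear domain over
# blocks of two depth samples and take the quietest block
sv_lin <- 10^(sv_db / 10)
block <- rep(seq_len(nrow(sv_lin) / 2), each = 2)
block_mean <- apply(sv_lin, 2, function(ping) tapply(ping, block, mean))
noise_lin <- apply(block_mean, 2, min)

# step 3: subtract the noise in the linear domain; samples that do not
# exceed the noise estimate become NA
signal_lin <- sweep(sv_lin, 2, noise_lin, FUN = '-')
signal_lin[signal_lin <= 0] <- NA
sv_denoised <- 10 * log10(signal_lin)
```

In this toy example the strong samples of the swarm remain practically unchanged, while the weakest background samples are set to NA.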

Denoising Settings - Default

# store the input values used for the denoising in meta_data
if(!file.exists('data/meta_data.xlsx')) {
  meta_data <- meta_data |>
    dplyr::mutate(
      temperature = 1, # in °C
      salinity = 34, # in PSU
      ph = 8.1,
      frequency = 200000, # in 1/sec
      pulseDuration = 0.001024,  # in sec
      scaling = TRUE,
      decibelMax = -10, # in dB
      decibelMin = -70, # in dB
      noiseMax = -125 # in dB
    )
}

# transform relative backscattering intensity to decibel values based on physical properties of the sea water
acoustics <- power_cal_trans(
  acoustics = acoustics,
  temperature = meta_data$temperature[fileNumber], # in °C
  salinity = meta_data$salinity[fileNumber], # in PSU
  pH = meta_data$ph[fileNumber],
  frequency = meta_data$frequency[fileNumber], # in 1/sec
  pulseDuration = meta_data$pulseDuration[fileNumber],  # in sec
  scaling = meta_data$scaling[fileNumber],
  decibelMax = meta_data$decibelMax[fileNumber], # in dB
  decibelMin = meta_data$decibelMin[fileNumber] # in dB
)

# matrix_search to estimate the noise
acoustics <- matrix_search(
  data = acoustics,
  noiseMax = meta_data$noiseMax[fileNumber]
)

# remove the noise, isolate the biomass signal from noise
acoustics <- noise_removal(acoustics, scaling = TRUE)

## [1] "347848 values are not available"

# visualize results
acousticsPlot <- acoustics_plot(
  data = acoustics, variable = acoustics$biomassScoreDenoised
)

The algorithm isolated the biomass signal from the background noise. What remains are the surface noise and the sea floor. These parts are removed in the following step:

Isolating the biomass signal

# define the depth window within which the biomass signal should be contained (optional)
upper_limit <- 10
lower_limit <- 240

# save settings in meta_data
meta_data$upper_limit <- upper_limit
meta_data$lower_limit <- lower_limit

# isolate the biomass signal, remove sea floor and signal outside of the depth-window
acoustics <- isolate_signal(
  data = acoustics,
  upper_limit = upper_limit,
  lower_limit = lower_limit
)

# visualize result
acousticsPlot_final <- acoustics_plot(
  data = acoustics, variable = acoustics$biomassScoreDenoised
)

Now, the biomass signal is isolated from the rest.
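
The masking idea behind this step can be sketched in a few lines of base R. This is not the code of isolate_signal: the toy values are made up, depths are assumed to be positive values in meters, and the variables toy_upper and toy_lower are hypothetical stand-ins for the limits used above.

```r
# toy data for one time point; depths are positive metres here (assumption)
toy <- data.frame(
  depth = c(5, 50, 100, 200, 245),
  biomassScoreDenoised = c(0.3, 0.6, 0.5, 0.9, 0.9),
  seaBottom = 180
)

toy_upper <- 10
toy_lower <- 240

# keep samples inside the depth window and above the detected sea floor
keep <- toy$depth >= toy_upper &
  toy$depth <= toy_lower &
  toy$depth < toy$seaBottom

# everything else is set to NA
toy$biomassScoreDenoised[!keep] <- NA
```

Only the samples at 50 m and 100 m keep their value. The sample at 5 m lies above the window, the one at 200 m lies below the sea floor, and the one at 245 m lies below the window.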

Final Data

The final dataset contains several new variables. The first four variables (dateTime, timeBin, depth and biomassScore) were already contained in the dataset before processing. The variables added during processing are biomassScoreDenoised, correctSv and seaBottom. biomassScoreDenoised is the final product of the signal isolation process and comes in relative units (0-1). correctSv is the product of the denoising and comes in decibel units. Visually, it is similar to biomassScoreDenoised, but it still contains the sea floor. seaBottom is the identified sea floor depth at a given timepoint.

Head

At the end of the processing, the data are exported as an RDS-file.
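
The export code derives the name of the output file from the time span contained in the input file name. How the regular expression behaves can be checked in base R on a made-up file name. The name below is hypothetical and only follows the pattern that the expression expects.

```r
# hypothetical input file name following the expected pattern
toy_name <- 'acoustics_2020-01-15_08_00_00_to_2020-01-15_20_00_00.csv'

# same pattern as in the export code: start and end time stamp joined by '_to_'
stamp <- '\\d{4}-\\d{2}-\\d{2}_\\d{2}_\\d{2}_\\d{2}'
time_span <- regmatches(
  toy_name,
  regexpr(paste0(stamp, '_to_', stamp), toy_name)
)

# resulting export path
toy_path <- paste0('export/acoustics_processed_', time_span, '.RDS')
```

If a file name does not contain such a time span, stringr::str_extract returns NA, so the export name would end in NA. It is therefore worth checking the file names before running the export.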

# save the processed acoustics file 
saveRDS(
  acoustics |>
    dplyr::select(
      c(dateTime, timeBin, depth,
      biomassScore, biomassScoreDenoised, correctSv,
      seaBottom)
    ),
  paste0(
    'export/acoustics_processed_', stringr::str_extract(
      meta_data$fileName[fileNumber], 
      '\\d{4}-\\d{2}-\\d{2}_\\d{2}_\\d{2}_\\d{2}_to_\\d{4}-\\d{2}-\\d{2}_\\d{2}_\\d{2}_\\d{2}'
    ), '.RDS'
  )
)

# picture of the echogram
ggplot2::ggsave(
  paste0(
    'export/acoustics_processed_',
    stringr::str_extract(
      meta_data$fileName[fileNumber],
      '\\d{4}-\\d{2}-\\d{2}_\\d{2}_\\d{2}_\\d{2}_to_\\d{4}-\\d{2}-\\d{2}_\\d{2}_\\d{2}_\\d{2}'
    ),
    '.png'
  ),
  plot = acousticsPlot_final,
  width = 40,
  height = 4,
  units = 'in',
  limitsize = FALSE,
  type = 'cairo'
)

# meta data
openxlsx::write.xlsx(
  meta_data,
  'data/meta_data.xlsx'
)
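
As mentioned in the description, the processed dataset can be used to describe the vertical position of the signal, for example with the Centre of Mass and a measure of dispersal. The sketch below computes both per time bin on made-up values. The helper centre_of_mass is hypothetical, and using the relative denoised score as weight is an assumption that may need to be adapted to the analysis at hand.

```r
# toy data: two time bins with made-up denoised scores
toy <- data.frame(
  timeBin = rep(1:2, each = 3),
  depth = rep(c(10, 20, 30), times = 2),
  biomassScoreDenoised = c(0.25, 0.5, 0.25, NA, 0.25, 0.75)
)

# hypothetical helper: backscatter-weighted mean depth and its spread (inertia)
centre_of_mass <- function(depth, weight) {
  ok <- !is.na(weight)
  cm <- sum(depth[ok] * weight[ok]) / sum(weight[ok])
  inertia <- sum(weight[ok] * (depth[ok] - cm)^2) / sum(weight[ok])
  c(centreOfMass = cm, inertia = inertia)
}

# one row of results per time bin
toy_cm <- t(sapply(
  split(toy, toy$timeBin),
  function(d) centre_of_mass(d$depth, d$biomassScoreDenoised)
))
```

For the first time bin the centre of mass lies at 20 m with an inertia of 50 m², for the second at 27.5 m with an inertia of 18.75 m².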