extract_features
Module Documentation
These functions extract features from the sEMG signal, capturing information in both time and frequency domains. The main function to do this is extract_features
. Within this call, individual features are calculated by their own functions, allowing them to be incorporated into your own workflow.
Module Structure
Basic Time-Series Statistics
The extract_features
function calculates some basic statistics that don't involve their own functions. This includes:
- Minimum voltage (using
np.min
) - Maximum voltage (using
np.max
) - Mean voltage (using
np.mean
) - Standard deviation of voltage (using
np.std
) - Skew of voltage (using
scipy.stats.skew
) - Kurtosis of voltage (using
scipy.stats.kurtosis
) - Maximum frequency (using
np.max
)
Skew and kurtosis are less common, and have a more detailed explanation below.
Skew
Skewness describes the symmetry of a dataset, considered more skewed the less symmetrical the left and right distributions of the median are.
Skew is calculated as follows:
<-- Mean <-- Standard deviation <-- Mode <-- Median
Skew is calculated with scipy.stats.skew
Kurtosis
Kurtosis describes the amount of data in the tails of a bell curve of a distribution.
Kurtosis is calculated as follows:
<-- Mean <-- Standard deviation <-- Number of data points
Kurtosis is calculated with scipy.stats.kurtosis
calc_iemg
Description
Calculates the Integrated EMG (IEMG) of the signal. The IEMG measures the area under the curve of the signal, which can provide useful information about muscle activity. In an EMG signal, the IEMG describes when the muscle begins contracting, and is related to the signal sequence firing point (Phinyomark et al., 2009).
calc_iemg(Signal, col, sampling_rate)
Theory
In the reference, the IEMG does not account for the sampling rate. Two signal recordings with the same shape but different sampling rate would have different results since we are integrating with respect to time. As such, the calculation made here will include multiplying by sampling rate.
The IEMG is calculated as follows:
<-- Sampling rate <-- Number of data points
(Spiewak et al., 2018)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
sampling_rate
: int/float
- Sampling rate of the
Signal
. This is the number of entries recorded per second, or the inverse of the difference in time between entries.
Raises
An exception is raised if col
is not a column of Signal
.
An exception is raised if sampling_rate
is less or equal to 0.
Returns
IEMG
: float
- The IEMG of
Signal
.
Example
# Calculate the IEMG of Signal, for column 'EMG_zyg'
IEMG = EMGFlow.calc_iemg(Signal, 'EMG_zyg', 2000)
calc_mav
Description
Calculates the Mean Absolute Value (MAV) of the signal. In an EMG signal, the MAV describes the muscle contraction level (Phinyomark et al., 2009).
calc_mav(Signal, col)
Theory
The MAV is calculated as follows:
<-- Number of data points
(Tkach et al., 2010)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
MAV
: float
- The MAV of
Signal
.
Example
# Calculate the MAV of Signal, for column 'EMG_zyg'
MAV = EMGFlow.calc_mav(Signal, 'EMG_zyg', 2000)
calc_mmav1
Description
Calculates the Modified Mean Absolute Value 1 (MMAV1) of the signal. The MMAV1 is an alteration of MAV that gives more weight to values in the middle of the signal to reduce error from the beginning and end of the signal.
calc_mmav1(Signal, col)
Theory
The MMAV1 is identical to MAV, except it introduces a weight to the calculation. Values are given a weight of 1 when they are between the 25th and 75th percentile, and 0.5 outside.
The MMAV1 is calculated as follows:
<-- Number of data points
(Chowdhury et al., 2013)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
MMAV1
: float
- The MMAV1 of
Signal
.
Example
# Calculate the MMAV1 of Signal, for column 'EMG_zyg'
MMAV1 = EMGFlow.calc_mmav1(Signal, 'EMG_zyg', 2000)
calc_mmav2
Description
Calculates the Modified Mean Absolute Value 2 (MMAV2) of the signal. The MMAV2 is an alteration of MAV that gives more weight to values in the middle of the signal to reduce error from the beginning and end of the signal.
calc_mmav2(Signal, col)
Theory
The MMAV2 is identical to MAV, except it introduces a weight to the calculation. Values are given a weight of 1 when they are between the 25th and 75th percentile, a weight of
The MMAV2 is calculated as follows:
<-- Number of data points
(Hamedi et al., 2014)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
MMAV2
: float
- The MMAV2 of
Signal
.
Example
# Calculate the MMAV2 of Signal, for column 'EMG_zyg'
MMAV2 = EMGFlow.calc_mmav2(Signal, 'EMG_zyg', 2000)
calc_ssi
Description
Calculates the Simple Square Integral (SSI) of the signal. In an EMG signal, the SSI describes the energy of the signal (Phinyomark et al., 2009).
calc_ssi(Signal, col, sampling_rate)
Theory
In the reference, the SSI does not account for the sampling rate. Two signal recordings with the same shape but different sampling rate would have different results since we are integrating with respect to time. As such, the calculation made here will include multiplying by sampling rate.
The SSI is calculated as follows:
<-- Sampling rate <-- Number of data points
(Spiewak et al., 2018)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
sampling_rate
: int/float
- Sampling rate of the
Signal
. This is the number of entries recorded per second, or the inverse of the difference in time between entries.
Raises
An exception is raised if col
is not a column of Signal
.
An exception is raised if sampling_rate
is less or equal to 0.
Returns
SSI
: float
- The SSI of
Signal
.
Example
# Calculate the SSI of Signal, for column 'EMG_zyg'
SSI = EMGFlow.calc_ssi(Signal, 'EMG_zyg', 2000)
calc_var
Description
Calculates the Variance (VAR) of the signal. In an EMG signal, the VAR describes the power of the signal (Phinyomark et al., 2009).
calc_var(Signal, col)
Theory
The VAR is calculated as follows:
(Spiewak et al., 2018)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
VAR
: float
- The VAR of
Signal
.
Example
# Calculate the VAR of Signal, for column 'EMG_zyg'
VAR = EMGFlow.calc_var(Signal, 'EMG_zyg')
calc_vorder
Description
Calculates the V-Order of a signal. The V-Order is an alteration of VAR that takes the square root of the result.
calc_vorder(Signal, col)
Theory
The V-Order is calculated using the
The V-Order is calculated as follows:
(Tkach et al., 2010)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
VOrder
: float
- The V-Order of
Signal
.
Example
# Calculate the V-Order of Signal, for column 'EMG_zyg'
VOrder = EMGFlow.calc_vorder(Signal, 'EMG_zyg')
calc_rms
Description
Calculates the Root Mean Square (RMS) of a signal. In an EMG signal, the RMS provides information about the constant force, and non-fatiguing contractions of the muscles (Phinyomark et al., 2009).
calc_rms(Signal, col)
Theory
The RMS is calculated as follows:
<-- Number of data points
(Spiewak et al., 2018)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
RMS
: float
- The RMS of
Signal
.
Example
# Calculate the RMS of Signal, for column 'EMG_zyg'
RMS = EMGFlow.calc_rms(Signal, 'EMG_zyg')
calc_wl
Description
Calculates the Waveform Length (WL) of a signal. The WL provides information about the amplitude, frequency, and duration of the signal.
calc_wl(Signal, col)
Theory
The WL is calculated as follows:
<-- Number of data points
(Spiewak et al., 2018)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
WL
: float
- The WL of
Signal
.
Example
# Calculate the WL of Signal, for column 'EMG_zyg'
WL = EMGFlow.calc_wl(Signal, 'EMG_zyg')
calc_wamp
Description
Calculate the Willison Amplitude (WAMP) of a signal. The WAMP measures the number of times an EMG amplitude exceeds a given threshold. In an EMG signal, the WAMP describes the firing of Motor Unit Action Potentials (MUAP), and muscle contraction level (Phinyomark et al., 2009).
calc_wamp(Signal, col, threshold)
Theory
Thresholds for the WAMP are commonly chosen within the 50-100 mV range. The WAMP counts the number of recorded times in the signal that the charge is greater than the threshold, so it can be affected by the sampling rate. If datasets with different sampling rates are being compared, it may be beneficial to normalize each result by the length of the signal.
When choosing a value, pass it in terms of the same units being used in the data.
The WAMP is calculated as follows:
<-- Number of data points <-- Voltage change threshold
(Tkach et al., 2010)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
threshold
: float
- Threshold of the WAMP.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
WAMP
: int
- The WAMP of
Signal
.
Example
# Calculate the WAMP of Signal, for column 'EMG_zyg'
WAMP = EMGFlow.calc_wamp(Signal, 'EMG_zyg', 55)
calc_log
Description
Calculates the Log-Detector (LOG) of a signal. The LOG provides an estimate of the force exerted by the muscle.
calc_log(Signal, col)
Theory
The LOG is calculated as follows:
<-- Number of data points
(Tkach et al., 2010)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
LOG
: float
- The LOG of
Signal
.
Example
# Calculate the LOG of Signal, for column 'EMG_zyg'
LOG = EMGFlow.calc_log(Signal, 'EMG_zyg')
calc_mfl
Description
Calculates the Maximum Fractal Length (MFL) of a signal. The MFL measures the activation of low-level muscle contractions.
calc_mfl(Signal, col)
Theory
The MFL is calculated as follows:
<-- Number of data points
(Too et al., 2019)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
MFL
: float
- The MFL of
Signal
.
Example
# Calculate the MFL of Signal, for column 'EMG_zyg'
MFL = EMGFlow.calc_mfl(Signal, 'EMG_zyg')
calc_ap
Description
Calculates the Average Power (AP) of a signal. The AP measures the energy distribution of the signal.
calc_ap(Signal, col)
Theory
The AP is calculated as follows:
<-- Number of data points
(Too et al., 2019)
Parameters
Signal
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
col
: str
- Column of
Signal
the feature is calculated from.
Raises
An exception is raised if col
is not a column of Signal
.
Returns
AP
: float
- The AP of
Signal
.
Example
# Calculate the AP of Signal, for column 'EMG_zyg'
AP = EMGFlow.calc_ap(Signal, 'EMG_zyg')
calc_spec_flux
Description
Calculates the Spectral Flux of a signal. Spectral Flux measures the change in spectrums between two signals, or two sections of a signal.
calc_spec_flux
behaves differently, depending on if diff
is a Pandas dataframe or float. A dataframe treats diff
as a second signal, and finds the spectral flux between it and Signal1
. Providing a float Signal1
into the first
The call to calc_spec_flux
within extract_features
uses a default value of diff=0.5
.
calc_spec_flux(Signal1, diff, col, sampling_rate, diff_sr=None)
Theory
Spectral Flux is used to compare the spectral compositions of two signals. This is complicated to implement, so there is no one function to handle this.
Spectral Flux is calculated as follows:
(Giannakopoulos & Pikrakis, 2014)
Parameters
Signal1
: pd.DataFrame
- A Pandas dataframe containing a 'Time' column, and additional columns for signal data.
diff
: pd.DataFrame, float
- Either a new signal dataframe, or a decimal indicating the percentage of the signal to split.
col
: str
- Column of
Signal
the feature is calculated from.
sampling_rate
: int/float
- Sampling rate of the
Signal
. This is the number of entries recorded per second, or the inverse of the difference in time between entries.
diff_sr
: int, float, optional (None)
- Sampling rate for
diff
, if it has a different sampling rate thanSignal1
. The default is None, in which case ifdiff
is a dataframe, the sampling rate is assumed to be the same asSignal1
.
Raises
An exception is raised if col
is not a column of Signal1
.
An exception is raised if sampling_rate
is less or equal to 0.
An exception is raised if diff
is a float, but isn't between 0 and 1.
An exception is raised if diff
is a dataframe and does not contain col
.
An exception is raised if diff_sr
is less or equal to 0.
An exception is raised if diff
is an invalid data type.
Returns
flux
: float
- The Spectral Flux of
Signal
.
Example
# Calculate the Spectral Flux of two signals
flux1 = EMGFlow.calc_spec_flux(Signal1, Signal2, 'EMG_zyg', 2000)
# Calculate the Spectral Flux of one signal divided at the halfway point
flux2 = EMGFlow.calc_spec_flux(Signal1, 0.5, 'EMG_zyg', 2000)
calc_mdf
Description
Calculates the Median Frequency (MDF) of a Signal.
Theory
MDF is the frequency on the power spectrum that can divide it into two regions of equal total power.
Since it may not be possible to perfectly divide the spectrum into two regions of exactly equal power, this function finds the frequency that divides it the best.
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
Raises
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
Returns
med_freq
: float
- The MDF of
psd
.
Example
# Calculate the MDF of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
MDF = EMGFlow.calc_mdf(psd)
calc_mnf
Description
Calculates the Mean Frequency (MNF) of a Signal.
Theory
MNF is the mean frequency on the power spectrum, weighted by the power of each frequency.
MNF is calculated as follows:
(Phinyomark et al., 2009)
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
Raises
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
Returns
mean_freq
: float
- The MNF of
psd
.
Example
# Calculate the MNF of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
MNF = EMGFlow.calc_mnf(psd)
calc_twitch_ratio
Description
Calculates the Twitch Ratio of a Signal based on its power distribution.
calc_twitch_ratio(psd, freq=60)
Theory
This metric uses a proposed muscle separation theory put forward by this project. This measure is typically used in audio feature extraction, separating the high and low frequencies. This kind of separation is not typically done in EMG feature extraction. However, literature suggests that there are both high and low frequency muscle activations that can be separated by an approximately 60Hz threshold (Hegedus et al., 2020). Since these muscles are present in the face (McComas, 1998), this experimental feature is being added by applying this audio feature in a new context.
Twitch Ratio is an adaptation of Alpha Ratio (Eyben et al., 2016).
Twitch Ratio is calculated as follows:
<-- Power of normalized PSD at frequency <-- Minimum frequency of the PSD <-- Threshold frequency of the PSD <-- Maximum frequency of the PSD
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
freq
: int, float, optional (60)
- Frequency threshold separating fast-twitching (high-frequency) muscles from slow-twitching (low-frequency) muscles. The default is 60.
Raises
An exception is raised if freq
is less or equal to 0.
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
Returns
twitch_ratio
: float
- The Twitch Ratio of
psd
.
Example
# Calculate the Twitch Ratio of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
twitch_ratio = EMGFlow.calc_twitch_ratio(psd)
calc_twitch_index
Description
Calculates the Twitch Index of a Signal based on its power distribution.
calc_twitch_index(psd, freq=60)
Theory
This metric uses a proposed muscle separation theory put forward by this project. This measure is typically used in audio feature extraction, separating the high and low frequencies. This kind of separation is not typically done in EMG feature extraction. However, literature suggests that there are both high and low frequency muscle activations (slow twitching muscles, and fast twitching muscles) that can be separated by an approximately 60Hz threshold. As such, this experimental feature is being added by applying this audio feature in a new context.
Twitch Index is an adaptation of the Hammarberg index (Eyben et al., 2016).
Twitch Index is calculated as follows:
<-- Power of normalized PSD at frequency <-- Minimum frequency of the PSD <-- Threshold frequency of the PSD <-- Maximum frequency of the PSD
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
freq
: int, float, optional (60)
- Frequency threshold separating fast-twitching (high-frequency) muscles from slow-twitching (low-frequency) muscles. The default is 60.
Raises
An exception is raised if freq
is less or equal to 0.
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
Returns
twitch_index
: float
- The Twitch Ratio of
psd
.
Example
# Calculate the Twitch Index of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
twitch_index = EMGFlow.calc_twitch_index(psd)
calc_twitch_slope
Description
Calculates the Twitch Slope of a Signal based on its power distribution.
calc_twitch_slope(psd, freq=60)
Theory
This metric uses a proposed muscle separation theory put forward by this project. This measure is typically used in audio feature extraction, separating the high and low frequencies. This kind of separation is not typically done in EMG feature extraction. However, literature suggests that there are both high and low frequency muscle activations (slow twitching muscles, and fast twitching muscles) that can be separated by an approximately 60Hz threshold. As such, this experimental feature is being added by applying this audio feature in a new context.
Twitch Slope is an adaptation of spectral slope (Eyben et al., 2016).
calc_twitch_slope
uses the np.linalg.lstsq
to find the slope of the line of best fit for the two regions separated by frequency, and returns both values.
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
freq
: int, float, optional (60)
- Frequency threshold separating fast-twitching (high-frequency) muscles from slow-twitching (low-frequency) muscles. The default is 60.
Raises
An exception is raised if freq
is less or equal to 0.
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
Returns
fast_slope
: float
- The Twitch Slope of the fast-twitching muscles of
psd
.
slow_slope
: float
- The Twitch Slope of the slow-moving muscles of
psd
Example
# Calculate the Twitch Index of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
fast_slope, slow_slope = EMGFlow.calc_twitch_slope(psd)
calc_sc
Description
Calculates the Spectral Centroid (SC) of a signal. The SC is the "center of mass" of a signal after a Fourier transform is applied.
calc_sc(psd)
Theory
SC is calculated as follows:
<-- Power of normalized PSD at frequency <-- Minimum frequency of the PSD <-- Maximum frequency of the PSD
(Roldán Jiménez et al., 2019)
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
Raises
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
Returns
SC
: float
- The SC of
psd
.
Example
# Calculate the SC of Signal, for column 'EMG_zyg'
psd = (Signal['EMG_zyg'], 2000)
SC = EMGFlow.calc_sc(psd)
calc_sf
Description
Calculates the Spectral Flatness (SF) of a signal. The SF measures noise in the magnitude spectrum.
calc_sf(psd)
Theory
SF is calculated as follows:
<-- th element of PSD strength <-- Number of elements in PSD
(Nagineni et al., 2018)
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
Raises
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
Returns
SF
: float
- The SF of
psd
.
Example
# Calculate the SF of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
SC = EMGFlow.calc_sf(psd)
calc_ss
Description
Calculates the Spectral Spread (SS) of a signal. SS is also called the "instantaneous bandwidth", and measures the standard deviation around the SC.
calc_ss(psd)
Theory
SS is calculated as follows:
(Nagineni et al., 2018)
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
Raises
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
Returns
SS
: float
- The SS of
psd
.
Example
# Calculate the SS of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
SS = EMGFlow.calc_ss(psd)
calc_sd
Description
Calculates the Spectral Decrease (SD) of a signal. SD is the decrease of the slope of the spectrum with respect to frequency.
calc_sd(psd)
Theory
SD is calculated as follows:
(Nagineni et al., 2018)
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
Raises
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
Returns
SD
: float
- The SD of
psd
.
Example
# Calculate the SD of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
SD = EMGFlow.calc_sd(psd)
calc_se
Description
Calculates the Spectral Entropy (SE) of a signal. SE is the Shannon entropy of the spectrum.
calc_se(psd)
Theory
SE is calculated as follows:
(Llanos et al., 2017)
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
Raises
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
Returns
SE
: float
- The SE of
psd
.
Example
# Calculate the SE of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
SEntropy = EMGFlow.calc_se(psd)
calc_sr
Description
Calculates the Spectral Rolloff (SR) of a signal. The spectral rolloff point is the frequency of the PSD where 85% of the total spectral energy lies below it.
calc_sr(psd, percent=0.85)
Theory
The actual threshold for SR can be set manually, but literature suggests that 85% is the best point.
(Tjoa, 2022)
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
percent
: float
- The percentage of power that should be below the SR point. The default is 0.85.
Raises
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
An exception is raised if percent
is not between 0 and 1.
Returns
SR
: float
- The SR of
psd
.
Example
# Calculate the SR of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
SR = EMGFlow.calc_sr(psd)
calc_sbw
Description
Calculates the Spectral Bandwidth (SBW) of a signal. The SBW calculates the difference between the upper and lower freqencies in the frequency band.
calc_sbw(psd, p=2)
Theory
SBW has a parameter
SBW is calculated as follows:
(Tjoa, 2022)
Parameters
psd
: pd.DataFrame
- A Pandas dataframe containing a 'Frequency' and 'Power' column. Should be normalized.
p
: int, optional (2)
- Order of the SBW. The default is 2, which gives the standard deviation around the centroid.
Raises
An exception is raised if psd
does not only have columns 'Frequency' and 'Power'.
An exception is raised if p
is not greater than 0.
Returns
SBW
: float
- The SBW of
psd
.
Example
# Calculate the SBW of Signal, for column 'EMG_zyg'
psd = EMGFlow.emg_to_psd(Signal['EMG_zyg'], 2000)
SBW = EMGFlow.calc_sbw(psd)
extract_features
Description
Extracts usable features from two sets of signal files (before and after being smoothed). Writes output to a new folder directory, as specified by the 'Feature' key value of the path_names
dictionary. Output is both saved to the disk as a plaintext file, 'Features.csv', and is also returned as a dataframe object.
Components of a Signal
dataframe:
- Has a column named
Time
containing time indexes Time
indexes are all equally spaced apart- Has one (or more) columns with any other name, holding the value of the electrical signal read at that time
All files contained within the folder and subfolder with the proper extension are assumed to be signal files. All signal files within the folder and subfolders should have the same change in time between entries.
extract_features(path_names, sampling_rate, cols=None, expression=None, file_ext='csv', short_name=True)
Theory
This function requires a path to smoothed and unsmoothed data. This is because while time-series features are extracted from smoothed data, spectral features are not. High-frequency components of the signal can be lost in the smoothing, and we want to ensure the spectral features are as accurate as possible.
Parameters
path_names
: dict-str
- A dictionary of file locations with keys for the stage in the processing pipeline.
sampling_rate
: int, float
- Sampling rate of the signal files. This is the number of entries recorded per second, or the inverse of the difference in time between entries.
cols
: str, optional (None)
- List of columns of the signal files to extract features from. The default is None, in which case features are extracted from every column except for 'Time'.
expression
: str, optional (None)
- A regular expression. If provided, will only extract features from files whose names match the regular expression. The default is None.
file_ext
: str, optional ('csv')
- File extension for files to read. Only reads files with this extension. The default is 'csv'.
short_names
: bool, optional (True)
- If true, makes the key column of the feature files the relative path of the file. If false, uses the full system path. The default is True.
Raises
A warning is raised if expression
does not match with any files in the folders provided.
An exception is raised if 'Bandpass', 'Smooth' or 'Feature' are not keys of the path_names
dictionary provided.
An exception is raised if the 'Bandpass' and 'Smooth' filepaths do not contain the same files.
An exception is raised if a file cannot not be read in the 'Bandpass' or 'Smooth' filepaths.
An exception is raised if an unsupported file format was provided for file_ext
.
An exception is raised if expression
is not None or a valid regular expression.
Returns
Features
: pd.DataFrame
- A Pandas dataframe of feature data for each file read. Each row is a different file analyzed, marked by the 'File_ID' column. Additional columns show the values of the features extracted by the function.
Example
path_names = EMGFlow.make_paths()
EMGFlow.make_sample_data(path_names)
sampling_rate = 2000
cols = ['EMG_zyg', 'EMG_cor']
# Extracts all features from the files in the 'Bandpass' path and the 'Smooth'
# path. Assumes the same files are in both paths.
features = EMGFlow.extract_features(path_names, sampling_rate, cols)
Sources
Chowdhury, R. H., Reaz, M. B. I., Ali, M. A. B. M., Bakar, A. A. A., Chellappan, K., & Chang, Tae. G. (2013). Surface Electromyography Signal Processing and Classification Techniques. Sensors (Basel, Switzerland), 13(9), 12431–12466. https://doi.org/10.3390/s130912431
Eyben, F., Scherer, K. R., Schuller, B. W., Sundberg, J., André, E., Busso, C., Devillers, L. Y., Epps, J., Laukka, P., Narayanan, S. S., & Truong, K. P. (2016). The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing. IEEE Transactions on Affective Computing, 7(2), 190–202. https://doi.org/10.1109/TAFFC.2015.2457417
Giannakopoulos, T., & Pikrakis, A. (2014). Introduction to Audio Analysis. In T. Giannakopoulos & A. Pikrakis (Eds.), Introduction to Audio Analysis (pp. 59–103). Academic Press. https://doi.org/10.1016/B978-0-08-099388-1.00004-2
Hamedi, M., Salleh, S.-H., Astaraki, M., Noor, A. M., & Harris, A. R. A. (2014). Comparison of Multilayer Perceptron and Radial Basis Function Neural Networks for EMG-Based Facial Gesture Recognition. In H. A. Mat Sakim & M. T. Mustaffa (Eds.), The 8th International Conference on Robotic, Vision, Signal Processing & Power Applications (Vol. 291, pp. 285–294). Springer Singapore. https://doi.org/10.1007/978-981-4585-42-2_33
Hegedus, A., Trzaskoma, L., Soldos, P., Tuza, K., Katona, P., Greger, Z., Zsarnoczky-Dulhazi, F., & Kopper, B. (2020). Adaptation of Fatigue Affected Changes in Muscle EMG Frequency Characteristics for the Determination of Training Load in Physical Therapy for Cancer Patients. Pathology Oncology Research, 26(2), 1129–1135. https://doi.org/10.1007/s12253-019-00668-3
Llanos, F., Alexander, J. M., Stilp, C. E., & Kluender, K. R. (2017). Power spectral entropy as an information-theoretic correlate of manner of articulation in American English. The Journal of the Acoustical Society of America, 141(2), EL127–EL133. https://doi.org/10.1121/1.4976109
McComas, A. J. (1998). Oro-facial muscles: Internal structure, function and ageing. Gerodontology, 15(1), 3–14. https://doi.org/10.1111/j.1741-2358.1998.00003.x
Nagineni, S., Taran, S., & Bajaj, V. (2018). Features based on variational mode decomposition for identification of neuromuscular disorder using EMG signals. Health Information Science and Systems, 6(1), 13. https://doi.org/10.1007/s13755-018-0050-4
Phinyomark, A., Limsakul, C., & Phukpattaranont, P. (2009). A Novel Feature Extraction for Robust EMG Pattern Recognition. 1(1).
Roldán Jiménez, C., Bennett, P., Ortiz García, A., & Cuesta Vargas, A. I. (2019). Fatigue Detection during Sit-To-Stand Test Based on Surface Electromyography and Acceleration: A Case Study. Sensors (Basel, Switzerland), 19(19), 4202. https://doi.org/10.3390/s19194202
Spiewak, C., Islam, M. R., Assad-Uz-Zaman, M., & Rahman, M. (2018). A Comprehensive Study on EMG Feature Extraction and Classifiers. Open Access Journal of Biomedical Engineering and Its Applications, 1. https://doi.org/10.32474/OAJBEB.2018.01.000104
Tjoa, S. (2022). Spectral Features. Music Information Retrieval. https://musicinformationretrieval.com/spectral_features.html
Tkach, D., Huang, H., & Kuiken, T. A. (2010). Study of stability of time-domain features for electromyographic pattern recognition. Journal of NeuroEngineering and Rehabilitation, 7, 21. https://doi.org/10.1186/1743-0003-7-21
Too, J., Abdullah, A. R., Mohd Saad, N., & Tee, W. (2019). EMG Feature Selection and Classification Using a Pbest-Guide Binary Particle Swarm Optimization. Computation, 7(1), Article 1. https://doi.org/10.3390/computation7010012